Post on 07-Apr-2018
8/4/2019 Duplicate Content & ECom
Is Your E-commerce System Harming Your Search Engine Rankings?
www.altruik.com
Hamlet Batista, Chief Search Strategist, hbatista@altruik.com
Copyright 2011 Altruik, Inc.

Table of Contents
What Is Duplicate Content?
How Duplicate Content Affects Your Search Engine Rankings
How To Put An End To Duplicate Content So You Can Reclaim Your Ranking
When Duplicate Content Is Not Really Duplicate At All
Sound Like Too Much Manual Labor?
Will You Profit From Addressing Duplicate Content Issues?
Here's What You Should Do Next

As an online retailer, your search engine strategy is your business strategy. Have you noticed your search engine rankings slipping away recently? Do you wonder what the cause might be? It is critical that every page selling your products ranks as highly as possible in search engines like Google. That's why it is important that you optimize your site for search engine spiders, especially if you are using a CMS (content management system).

There is a hidden danger, an issue that affects the majority of e-commerce websites, which most business owners don't know about until it is too late. That problem is duplicate content.

If you have multiple copies of the same page, different URLs that point to the same content, or navigation systems that track your users, there is a good chance that you have an issue with duplicate content, too.

Most content management systems, as useful as they are, are surprisingly not designed with SEO in mind. Your CMS features tools that make finding products easier for visitors to your website. But those same features, which duplicate product pages into multiple categories, often make it difficult for Google to crawl, index, and rank all of the pages on your site.

Duplicate content causes serious problems because it:
- Weakens the rank of your most popular pages
- Sends Google on a wild goose chase, causing it to abandon your site altogether
- Blocks large portions of your website from getting indexed
- Prevents your most profitable pages from reaching the top of Google's rankings
- Cripples your best link-building efforts

Duplicate content problems are like leaky faucets. As more sites link to your duplicate URLs, the reputation and rank of your top-selling product pages go down the drain. Products that once ranked very high suddenly begin tumbling down the rankings, and your competition gains the upper hand. The question now is: how can you identify duplicate content and patch up the leaks that are ruining your search engine rankings?

Keep on reading, because we're going to teach you what most people don't know about the mess their CMS is leaving behind. Your priority is to patch these leaks before they drown your entire online business. With the right tools, you can build an even stronger search presence.

If you have a duplicate content problem, huge portions of your website might not be in Google's index.

What Is Duplicate Content?
First, the basics. Duplicate content is any page on the Internet that is either exactly the same as or nearly identical to another page. Google compares the text of multiple pages to determine a match. If the written content is exactly the same, or almost exactly the same, Google considers the newer page to be duplicate content.
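
To make "almost exactly the same" concrete, here is a minimal Python sketch of one common way to compare two pieces of text: break each into overlapping three-word sequences and measure how many they share (Jaccard similarity). This is only an illustration. It is not Google's actual algorithm, and the function names, the sample descriptions, and the 0.8 threshold are our own assumptions.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences ("shingles") in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def is_near_duplicate(a, b, threshold=0.8):
    """Hypothetical cutoff: treat pages scoring at or above it as near duplicates."""
    return similarity(a, b) >= threshold


# Two made-up product descriptions that differ only in the color word.
black = "classic leather shoe with a cushioned insole and a low heel in black"
blue = "classic leather shoe with a cushioned insole and a low heel in blue"

print(round(similarity(black, blue), 2))  # 0.83
print(is_near_duplicate(black, blue))     # True
```

The two sample descriptions share 10 of their 12 distinct three-word sequences, so they score about 0.83 and are flagged as near duplicates.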

Most duplicate content is created when your CMS allows visitors (and Google) to access the same page from different URLs. Let's say your online store has a category for shoes and another category for all products in the color black. The same pair of black shoes can be accessed from two different category combinations, one in which the user selects "shoes" first, and another in which the user chooses "black" first.

These pages are almost identical. The URLs are different but lead you to the same product, the Jessica Simpson Women's Leve Black Leather shoe. In each example, users selected various categories in different orders and were able to access the same content via different paths.

Google's search engine robot crawls your website like a nosy visitor, following each link for every category. It will find the same page twice, once under one combination of categories, and again under the other. You don't actually have two copies of the same page, but your CMS setup certainly makes it look like you do.
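
You can look for this kind of exact duplication yourself. The sketch below, written in Python, groups URLs whose page text is identical by hashing the text. The URLs and page text are hypothetical, and a real audit would first fetch the pages and strip navigation markup; this only shows the grouping step.

```python
import hashlib
from collections import defaultdict


def find_duplicate_urls(pages):
    """Group URLs whose page text is exactly identical.

    pages maps each URL to the text found at that URL.
    Returns a list of URL groups, one group per duplicated text.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical crawl result: the same shoe reached through two category orders.
pages = {
    "http://www.example.com/shoes/black/leve": "Leve black leather shoe",
    "http://www.example.com/black/shoes/leve": "Leve black leather shoe",
    "http://www.example.com/shoes/red/pump": "Red suede pump",
}

print(find_duplicate_urls(pages))
# [['http://www.example.com/black/shoes/leve', 'http://www.example.com/shoes/black/leve']]
```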

Duplicate content is also created when:
- You use multiple subdomains. Google thinks you have duplicate content when you put the same page on http://example.com as you do on http://www.example.com. That "www." makes a big difference to Google.
- Your CMS creates separate pages for different product colors. Google can't tell the difference between an image of a blue shoe and a red shoe (it relies on textual descriptions). It will conclude that one of these pages is a duplicate.
- Your CMS dynamically generates pages as your users click on links. A good example of this is a calendar that creates a new page every time you click on the "next month" link.
- You, or people linking to your pages, add extra parameters to URLs (sometimes for tracking), creating multiple URLs that direct Google to the same page over and over again.

As you can see, duplicate content can arise from a variety of sources. Each of these is another leak in your faucet. It creates a number of nasty problems, both for Google's search engine robot and for other search engines. Sometimes it sends the robot on an endless chase that Google eventually abandons, and at other times it simply dilutes the reputation of all your affected pages. When only a small portion of your site makes it into the search engine rankings, your overall ranking suffers.

Page reputation is diluted when the same content is accessible through multiple URLs. You can recapture reputation and prevent duplicate content by consolidating non-canonical versions with 301 redirects. Source: Google's SEO Report Card, Google Webmaster Central

In the next section, we'll show you how duplicate content prevents your most profitable pages from making it to the top of Google's rankings.

Glossary
301 Redirect: An HTTP status code. Automatically redirects users to a specific URL.
200: An HTTP status code. A successful request; content is returned.

How Duplicate Content Affects Your Search Engine Rankings
Now we're ready to see how duplicate content affects Google's impression of your content, wreaking havoc on your search rankings in the process.

How duplicate content dilutes the ranking of your top-rated pages
Let's say you just wrote a popular article that went viral. Would you rather see the entire article getting a million views, or would you prefer to split the article in two and assign 500,000 views to each section? If you chose the former option, you're on the right track. As a single page receives more views, it increases the chances of receiving natural links. More people share the page, blog about it, and link to it.

You want your website listed in the prime real estate of the results page. Splitting links will dilute the rankings of your strongest pages.

When you have multiple versions of the same article, video, or page, Google splits your reputation between all of the pages. Your duplicate pages siphon off a large portion of your inbound links, and it takes longer for your article, video, or page to rank highly in Google. No matter how many links you get, some of them are going down the drain.

How duplicate content cripples your best link-building efforts
Consider another example. A Doggy Care website using a CMS creates two URLs for "dog bone": one under the category "food" and another under the category "treats." To a search engine, the result is once again duplicate content.

That's only the half of it. What happens when customers really like the dog bone and want to tell others about it? They link to it on their website. However, because there are two different pages created by the CMS for the same dog bone, they might link to either one of them. A product that would have received 100 links only receives half that. The rest leak over to the duplicate page.

When you're trying to rank highly in Google, you must avoid wasting your links and reputation on duplicate pages. If these duplicates make it into Google's index, they will almost certainly be filtered out of the rankings. 100% of your inbound links should go to the same page. In Google's eyes, that gives you 100% of the reputation.
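
The dog bone arithmetic can be sketched in a few lines of Python. The paths, the even 50/50 split, and the function name are assumptions made for illustration; the point is only that links counted per URL add back up once every duplicate is mapped to its canonical page.

```python
from collections import Counter


def consolidate_links(link_counts, canonical_of):
    """Sum inbound links per canonical URL.

    link_counts maps each URL to its inbound link count.
    canonical_of maps each duplicate URL to the URL it should be merged into.
    """
    totals = Counter()
    for url, count in link_counts.items():
        totals[canonical_of.get(url, url)] += count
    return dict(totals)


# Hypothetical split: 100 links divided between two category URLs.
link_counts = {"/food/dog-bone": 50, "/treats/dog-bone": 50}
canonical_of = {"/treats/dog-bone": "/food/dog-bone"}

print(consolidate_links(link_counts, canonical_of))  # {'/food/dog-bone': 100}
```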

How duplicate content sends Google on a wild goose chase
Another problem to consider can be even more tragic for your search rankings. What might happen if Google decides that your website is composed mostly of duplicate content? The short answer is that it will stop indexing your pages and move on to other websites. Here is what Matt Cutts, head of Google's Webspam team, has to say about duplicate content:

"Imagine we crawl three pages from a site, and then we discover that the two other pages were duplicates of the third page. We'll drop two out of the three pages and keep only one, and that's why it looks like it has less good content. So we might tend to not crawl quite as much from that site ... [T]he fact that you had duplicate content and we discarded those pages meant you missed an opportunity to have other pages with good, unique quality content show up in the index."

There are a number of scenarios in which Google's robot will give up crawling your website, leaving vast numbers of pages completely out of the index and your site flagged as mostly spam. Here are some of the most common scenarios caused by your CMS:
- Your CMS creates a calendar that generates a new page for a new month every time you click on the "next month" link. Because your website keeps generating a new link every time Googlebot follows the "next month" link, Googlebot keeps following this link as long as it can and eventually times out.
- Your website features a guided navigation shopping cart with categories for different brands and types of products. Because the products and categories are linked to each other (often in very complex ways), Googlebot keeps following the links in circles until it times out.
- Your website uses a session ID in its URLs to track users who have cookies disabled (jsessionid is a common example of an in-URL session ID that gets indexed as duplicate content). If these IDs are present in the path_info portion of your URL, they are particularly dangerous.

This last one can be particularly nasty. When a search engine bot crawls the site, it acts like a user with browser cookies disabled. Each time Googlebot requests a page, it is given a new page with a new jsessionid. This quickly causes the bot to see millions of pages that are identical, differing only in the URL: an infinite space that Googlebot treats as duplicate content.
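
To see why session IDs multiply URLs, consider this small Python sketch. It removes a ";jsessionid=..." path parameter and shows that several crawled URLs collapse into one. The URLs and session values are made up, and the pattern only handles the path-parameter form of jsessionid, not a session ID passed as an ordinary query parameter.

```python
import re

# Matches ";jsessionid=<value>" in the path, up to the next "?", "#", "/" or ";".
_JSESSIONID = re.compile(r";jsessionid=[^?#/;]*", re.IGNORECASE)


def strip_session_id(url):
    """Return the URL with any ;jsessionid=... path parameter removed."""
    return _JSESSIONID.sub("", url)


# Hypothetical URLs handed to a crawler on three separate requests.
crawled = [
    "http://www.example.com/shoes.jsp;jsessionid=A1B2C3",
    "http://www.example.com/shoes.jsp;jsessionid=D4E5F6",
    "http://www.example.com/shoes.jsp;jsessionid=G7H8I9",
]

unique_pages = {strip_session_id(url) for url in crawled}
print(unique_pages)  # {'http://www.example.com/shoes.jsp'}
```

Three different URLs turn out to be a single page, which is exactly what the crawler cannot know in advance.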

Once Googlebot understands that it is going in circles (or down an endless drain, as in the calendar example), it concludes that your site is composed mostly of duplicate content and stops crawling your website. This is a very bad thing, and it can cause large portions of your site to go unnoticed. You can make vast improvements in your search engine rankings by tackling just this problem alone.

Now that you understand how duplicate content can harm your search engine rankings, we want to show you what you can do to stop your CMS from creating so much of it. You'll be happy to know that all of these problems can be solved, and you can use automated tools to help you handle most of them.

Source: Google's SEO Report Card, Google Webmaster Central

How To Put An End To Duplicate Content So You Can Reclaim Your Ranking
As you have already seen, duplicate content problems happen all on their own. If you don't do something to address them before they affect your ranking, your competitors will gain the edge. There are solutions to duplicate content problems, and we'll take a look at how to solve the most dangerous ones.

Is Your Content Accessible From Multiple Subdomains?
As we discussed earlier, when your website is accessible from multiple subdomains (for example, both example.com and www.example.com), Google treats the content on one of the subdomains as a duplicate. It can also happen when your CMS uses multiple URLs to point to the same content. If Google follows the links http://www.dogtoys.com/chewybone.php and http://www.dogtoys.com/bones/chewybone.php to the same page, Google will index a duplicate page for one of the URLs.

But the fix is relatively easy. You just need to tell Google which subdomain contains the original source material. There are two ways to do this:
1. Implement 301 redirects to send people to the right subdomain with the original content.
2. Or use Google's Webmaster Tools to choose which domain contains the original content. This process is sometimes called canonicalization.
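
The first option can be sketched as a small Python function that decides, for any requested URL, whether to answer with a 301 redirect to the canonical host. In practice this rule usually lives in your web server or CMS configuration; the host names and the function name here are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"     # assumed: the host you want indexed
NON_CANONICAL_HOSTS = {"example.com"}  # assumed: hosts serving the same content


def redirect_for(url):
    """Return (301, target) for a non-canonical host, else (200, None)."""
    parts = urlsplit(url)
    if parts.hostname in NON_CANONICAL_HOSTS:
        # Keep the path and query so every page maps to its own canonical page.
        target = urlunsplit(
            (parts.scheme, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
        )
        return 301, target
    return 200, None


print(redirect_for("http://example.com/shoes?color=black"))
# (301, 'http://www.example.com/shoes?color=black')
print(redirect_for("http://www.example.com/shoes"))
# (200, None)
```

Redirecting page to page, rather than sending everything to the home page, keeps each product URL's links pointed at the matching canonical product page.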

When this page is selected in the search engine results page, users are automatically directed to the canonical URL.

Once you have indicated to Google where it can find the original content, it will no longer index your duplicate subdomain. Because your primary (canonical) page will be the only page that can get links and reputation, it will start performing much better in the search engine rankings. Congratulations: you've just fixed one of the leaks in your faucet.

Are some pages near duplicates of others? What to do when your product descriptions only differ by a few words.
This problem usually affects online retailers who sell many different versions of the same product. Perhaps you sell a golden chocolate basket, a silver chocolate basket, and a bronze chocolate basket. If the only difference between one product description and the next is the color or the image, you need to indicate this to Google so that it does not conclude that you have duplicate content.

You can do this by using the rel="canonical" link tag on the pages with the near-identical content. Make sure you place this tag somewhere in the <head> section of these near-duplicate pages, just as you would with meta tags.

Whenever you use this tag, you are telling Google that the current page is either a duplicate or a near-duplicate, and the original page can be found at the address you have specified.
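
As a rough illustration, the Python sketch below builds the tag for the three chocolate basket pages. The URLs are hypothetical, and choosing the plain "chocolate-basket" page as the original is our own assumption; your CMS templates would normally emit this tag for you.

```python
from html import escape


def canonical_link_tag(canonical_url):
    """Build the <link> tag that belongs in the <head> of a near-duplicate page."""
    return '<link rel="canonical" href="{}" />'.format(escape(canonical_url, quote=True))


# Hypothetical variant pages that should all point at one original page.
variants = [
    "http://www.example.com/chocolate-basket?color=gold",
    "http://www.example.com/chocolate-basket?color=silver",
    "http://www.example.com/chocolate-basket?color=bronze",
]
original = "http://www.example.com/chocolate-basket"

for page in variants:
    print(page, "->", canonical_link_tag(original))
```

Every variant page carries the same tag, so all three tell Google that the original lives at one address.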

Do your URLs contain extra parameters for tracking and sorting? They might accidentally convince Google that you have a duplicate content problem.
Some shopping carts add parameters to your URLs for the purposes of sorting, dividing products into pages by category, and tracking users. Google's search engine robot unwittingly follows all of these URLs, and it keeps finding more duplicate content. If you don't tell Google which parameters to ignore, Googlebot will keep spinning its proverbial wheels. Here's what you can do: use Google Webmaster Tools to define which parameters Google should ignore when crawling your website.
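
Before you decide which parameters to ignore, it helps to see how many of your URLs collapse once those parameters are removed. The Python sketch below strips a list of parameters from a URL. The parameter names in IGNORED_PARAMS and the sample URLs are assumptions; substitute the names your own shopping cart uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of parameters that do not change what the page shows.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sort", "sessionid"}


def strip_ignored_params(url):
    """Return the URL without the parameters listed in IGNORED_PARAMS."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(strip_ignored_params(
    "http://www.example.com/shoes?color=black&sort=price&utm_source=newsletter"
))
# http://www.example.com/shoes?color=black
```

The color parameter survives because it changes which product is shown; the sorting and tracking parameters do not, so they are dropped.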

How to stop Google from going on a wild goose chase.
Sometimes Google finds large sections of your website that contain links to pages with no original content. This is called the "infinite space" problem because Googlebot gets stuck in these sections, continually crawling the same series of dynamically generated pages or URLs with session IDs and tracking parameters, over and over again. As we discussed, often the culprit is the jsessionid parameter. Thankfully, there is a way to stop it.

Google knows about the infinite space problem, and will tell you if your website has this issue when you log in to Google Webmaster Tools. Specifically, it will list which links lead to an infinite space, and offer a few tips to patch things up.

Once you've found the links that lead to an infinite space, do one of the following:
- Set the rel attribute in the suspicious link to "nofollow". When you do this, your new link should look like the following:
  <a href="http://www.calendar.com/nextmonth.php" rel="nofollow">next month</a>
- Block the infinite space URLs in your robots.txt file.
- Make it impossible for search engines to extract these URLs. You can do this by hiding them within JavaScript.
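
For the robots.txt option, you can check a rule before you publish it. The Python sketch below uses the standard library's robots.txt parser to confirm that a rule blocks the calendar's "next month" URLs while leaving other pages crawlable. The rule itself and the events.php URL are assumptions made for this example; your own file will list your own paths.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content that blocks the calendar's "next month" pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /nextmonth.php
"""


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the rules in ROBOTS_TXT allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable("http://www.calendar.com/nextmonth.php?month=2011-07"))  # False
print(is_crawlable("http://www.calendar.com/events.php"))                   # True
```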

Now that you have the tools to clean up duplicate content, in the next section we'll consider a few important cases where duplicate content is not only acceptable, but necessary.

Google Webmaster Tools allows users to define which parameters Google should ignore when crawling a website.

When Duplicate Content Is Not Really Duplicate At All
Sometimes you end up with exact duplicate pages for legitimate reasons. This is no crime, of course, but it does require you to let Google know so that your site may be indexed appropriately by the search engine robot. It also prevents your website from being flagged as mostly duplicate content. Here's the fix:

Use a 301 redirect if you have duplicate pages that just can't be avoided.
Using a 301 redirect not only sends your users to the canonical page, it also tells Google that the page is an exact or near duplicate. Google continues to crawl your site because you are no longer using up its bandwidth unnecessarily.

There are also two minor cases worth understanding where duplicate content can actually help your rankings. Keep in mind, these are very specific and do not apply to every website.

You don't have to worry about localized content on international domains.
What happens when you host the same content on different regional servers and international domains? For example, suppose you copy the same content on http://www.example.com to your local servers at http://www.example.fr. Will the content make it into the search engine results page abroad, or will it also be deemed duplicate content?

In this case, there is nothing to worry about. Search engines like Google do not treat the same content as duplicate content when it is hosted on different international domains.

That said, the issues concerning subdomains that we discussed previously also apply to your international websites. That means you will have to go through the time-consuming task of canonicalizing your URLs so that they all point to the same international pages, just like you did on your home website domain.

Sometimes you don't need to consolidate your duplicate pages. Here's how to know when...
As you've learned, in most cases it is beneficial for a single page to garner the highest possible rank. After all, if this page features one of your bestselling products, you are practically guaranteed more sales. But there is one case when using canonical tags and giving all of your reputation to a single page isn't the best idea.

When your visitors really care about your product's attributes (e.g. the product's color), it might be smart to separate your pages. Let's return to the example about shoes. If your online store offers the same shoe in multiple colors, and you have found that customers are specifically searching for products in the color turquoise, you might benefit from treating each color of the product as a separate page.

Both your "shoes" and your "turquoise" pages will get traffic from color-based searches. Your competitors are probably doing the same thing.

Whenever you separate your pages, you need to make them stand on their own. Your product page for the turquoise shoes must be distinct enough from the page for the black shoes to pass Google's duplicate content filter. Otherwise, Google will not rank the page at all. It is not enough to swap out a few words and reorganize paragraphs to create a new description. Google is too smart for that. You'll need to rewrite each new product description from scratch.

It bears repeating that this is an exceptional case. You must really understand your customers and, more importantly, pay attention to their search behavior.

If your customers are not typically searching for different variations of the same product, it is safe to use canonical tags and consolidate duplicate content. But if they usually search for items by their color, size, weight, etc., you should keep the pages separate and write new descriptions to individualize the content.

Sound Like Too Much Manual Labor? There's Good News. Most Of It Can Be Automated.
It doesn't matter if you own one website or many websites on several international domains. By now, you have the knowledge to understand and tackle the problem of duplicate content. However, you have probably realized just how time-consuming the process of consolidating your content can be. Do you really want to go through every duplicate or near-duplicate page, every subdomain, and every extra parameter in your URLs?

You are a businessperson, so like us your answer will be an emphatic NO! You have better ways of spending your valuable time. Luckily for you, we developed our Lighthouse software originally to solve our own duplicate content problems. We were slaving away, consolidating content for one of our clients, and we simply grew tired of the whole process. You can manually implement only so many 301 redirects before you start thinking, "There has to be a better way."

Lighthouse does everything we've discussed so far, and it does a few more things beyond the scope of this paper. Here is a quick rundown:
- Automated 301 redirects and rel="canonical" tags. Lighthouse spots your duplicate pages and automatically implements 301 redirects and rel="canonical" tags.
- Automated robots.txt analysis. Lighthouse finds and corrects problems with sitemap accessibility, infinite spaces, and crawl delays.

We understand how all of this can seem like a huge project at first. That's why we'd like to show you a way to measure the direct business benefit you'll get from tackling each of these issues head on. In the next section, you'll learn what you need to know before you decide to launch an all-out assault on your website's duplicate content.

Will You Profit From Addressing Duplicate Content Issues? Here's a Surefire Way to Know.
It's one thing to suspect you have a problem. It's quite another to know the severity of the problem and identify where it is located. You wouldn't fix a faucet that isn't leaking, so why would you tackle a duplicate content problem that is practically nonexistent? We want to show you how to measure the direct business benefit you'll get from patching up the leaks your CMS leaves behind. It works wonders for us, and we are sure it will for you too.

Step one: establish a baseline for measurement. First, determine how many pages your site has. Add up your product pages, category pages, and ancillary pages. The total number is your real number of site pages. Consider two key monthly metrics: 1) revenue per page (total site revenue divided by pages indexed in Google) and 2) searches per page (total search clicks to your site divided by pages indexed in Google).

Step two: implement the change and wait. Fix your duplicate content issues, or hire a professional to do it for you. Then sit back and wait at least one month before you make another measurement. Sometimes it takes a while before Google returns to crawl the extra pages on your site.

Step three: look for an increase in active pages and traffic. What you measure next depends on your goals. If you are looking primarily for increased revenue, as we all are, you want to see an increase in the number of pages indexed. Compare your revenue per page before and after the duplicate content fix. The second metric to consider is search clicks per page. You'll notice an increase here if your site suffered from duplicate pages that divided your audience and your links, reducing your primary page reputation in Google.

If all went well, your canonical pages will rank higher, and as they perform better you'll also increase your revenue. You should see an increase in the number of unique pages receiving regular search traffic as well as an overall increase in traffic to your website. This increase usually happens because Google crawled more of your website and more of your pages made it onto the search engine results page.
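
The two metrics from step one, and the before-and-after comparison from step three, can be worked through in a few lines of Python. All of the figures below are invented for illustration, and the function names are our own; plug in your own monthly revenue, search clicks, and indexed page counts.

```python
def per_page_metrics(total_revenue, search_clicks, pages_indexed):
    """Compute the two monthly metrics: revenue per page and searches per page."""
    if pages_indexed <= 0:
        raise ValueError("pages_indexed must be positive")
    return {
        "revenue_per_page": total_revenue / pages_indexed,
        "searches_per_page": search_clicks / pages_indexed,
    }


def change(before, after):
    """Difference in each metric between two measurements."""
    return {name: after[name] - before[name] for name in before}


# Hypothetical baseline month, then a month measured after the fix.
before = per_page_metrics(total_revenue=50000, search_clicks=20000, pages_indexed=1000)
after = per_page_metrics(total_revenue=66000, search_clicks=27500, pages_indexed=1100)

print(before)                 # {'revenue_per_page': 50.0, 'searches_per_page': 20.0}
print(after)                  # {'revenue_per_page': 60.0, 'searches_per_page': 25.0}
print(change(before, after))  # {'revenue_per_page': 10.0, 'searches_per_page': 5.0}
```

In this made-up case more pages were indexed and each indexed page earned more, which is the pattern you hope to see after a successful cleanup.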

Here's What You Should Do Next
By now you realize that duplicate content problems happen all on their own, and it is up to you to stop them before you lose your rankings to the competition. Even if you take care of every piece of duplicate content today, you still will have to deal with it periodically in the future. The more content you add to your website, the more likely duplicate pages will pop up. It's nice to have a way to constantly keep it in check.

Most business owners wait until their next website redesign to start tackling their duplicate content problems, but this approach comes at a huge cost. Each low-ranking page amounts to customers who never made it to your store. Can you really afford to lose a single sale between now and your next redesign?

Automatic duplicate content management is the only solution that makes sense. When you allow our Lighthouse software to consolidate your content as you create it, your pages start to rank better right out of the gate. You don't have to stop what you are doing to handle a situation that can easily get out of control. It's something we like to call peace of mind.

If you are interested in ridding your site of duplicate content problems for good, we encourage you to give us a call. We'll tell you more about Lighthouse and how you can use it to take care of your duplicate content automatically. Why go through page after page when software can do all the dirty work? We created Lighthouse because you've got better things to do.