Technical SEO | Joomla Day Chicago 2012


JoomlaDay! Chicago 2012

Jessica Dunbar

Get Your Nerd On: Technical SEO

What we will cover

Accessibility, Indexability, On-Page Ranking Factors


What is Robots.txt?

Used to restrict search engine crawlers from accessing sections of your website.

The robots exclusion protocol (REP), or robots.txt, is a text file webmasters create to instruct robots (typically search engine robots) on how to crawl and index pages on their website.
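To make the idea concrete, here is a minimal sketch that checks URLs against a robots.txt file using Python's standard `urllib.robotparser`. The rules and the example.com URLs are assumptions chosen for illustration (they resemble a typical Joomla setup, but they are not from the talk).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /administrator/
Disallow: /tmp/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A crawler that honors the file should skip the admin area...
print(parser.can_fetch("Googlebot", "https://example.com/administrator/index.php"))
# ...but may crawl ordinary content pages.
print(parser.can_fetch("Googlebot", "https://example.com/blog/technical-seo"))
```

Running a check like this against your own file before deploying it is a cheap way to confirm you have not blocked pages you want crawled.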

Robots Cheat Sheet

Robots Meta Tags

The robots meta tag is used to tell search engine crawlers if they are allowed to index a specific page and follow its links.

Superior to robots.txt.

A noindex value tells engines they can visit the page, but they are not allowed to display the URL in results.
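As a rough illustration, the sketch below reads the robots meta tag out of a page with Python's standard `html.parser`. The sample page and the helper name `robots_directives` are assumptions made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, follow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in the HTML (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page: crawlers may visit and follow links, but should not index it.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
directives = robots_directives(PAGE)
print("indexable:", "noindex" not in directives)
print("links followed:", "nofollow" not in directives)
```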

What are HTTP Status Codes?

HyperText Transfer Protocol (or HTTP) response status codes are returned whenever search engines or website visitors make a request to a web server. These three-digit codes indicate the response and status of HTTP requests.



200 or 2xx: These codes indicate success.


301: The data requested has been assigned a new URI; the change is permanent.

3xx are types of redirection codes.

301: This and all future requests should be directed to the given URI.


302: The data requested actually resides under a different URL; however, the redirection may be altered on occasion.

302 requires the client to perform a temporary redirect (the original describing phrase was "Moved Temporarily").

Make life easy: pass your link juice and use a 301.


404: The requested resource could not be found.

Importance of 404s: do not redirect 404s to the home page, and do not serve them with a 200 status.
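One way to audit this is to request a URL that should not exist and inspect what comes back. The sketch below only covers the decision logic; the function name, its arguments, and the example.com URLs are assumptions for illustration, and the status and final URL would come from whatever HTTP client you use.

```python
def missing_page_problem(status: int, final_url: str, home_url: str):
    """Describe what is wrong with the response to a nonexistent URL.

    Returns None when the site answered with a proper 404.
    """
    if final_url.rstrip("/") == home_url.rstrip("/"):
        return "missing page was redirected to the home page"
    if status == 200:
        return "missing page answered with 200 (a soft 404)"
    if status == 404:
        return None
    return f"missing page answered with unexpected status {status}"


HOME = "https://example.com/"  # hypothetical site

print(missing_page_problem(404, "https://example.com/no-such-page", HOME))
print(missing_page_problem(200, "https://example.com/", HOME))
print(missing_page_problem(200, "https://example.com/no-such-page", HOME))
```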


5xx codes indicate cases in which the server is aware that it has encountered an error or is otherwise incapable of performing the request.


503: The server is currently unavailable (because it is overloaded or down for maintenance).
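The status code families above can be summarized in a few lines of Python. This is only a sketch: the category labels are this example's own wording, while the reason phrases come from the standard library's `http.HTTPStatus`.

```python
from http import HTTPStatus


def status_category(code: int) -> str:
    """Map a status code to the broad family discussed above."""
    if 200 <= code < 300:
        return "success"
    if code == 301:
        return "permanent redirect"
    if 300 <= code < 400:
        return "temporary or other redirect"
    if code == 404:
        return "not found"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "unknown"


for code in (200, 301, 302, 404, 503):
    print(code, HTTPStatus(code).phrase, "->", status_category(code))
```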

What are Sitemaps?

Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling.

Your site has dynamic content.

Your site has pages that aren't easily discovered by Googlebot during the crawl process, for example, pages featuring rich AJAX or images.

Manually submit and check over your sitemap. Auto-generated sitemaps can have baggage. The sitemap should contain no errors and no 301s.
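A minimal sketch of building a clean XML sitemap follows, using the standard `xml.etree.ElementTree` module and the sitemaps.org 0.9 namespace. The page list, its status codes, and the `build_sitemap` helper are assumptions for illustration; the point is that only URLs answering 200 are listed, so redirects and errors stay out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages) -> str:
    """Build sitemap XML from (url, status) pairs, keeping only 200 responses."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, status in pages:
        if status != 200:
            continue  # leave 301s, 404s and other non-200 URLs out
        url_element = ET.SubElement(urlset, "url")
        ET.SubElement(url_element, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical crawl results for an example site.
PAGES = [
    ("https://example.com/", 200),
    ("https://example.com/blog/technical-seo", 200),
    ("https://example.com/old-page", 301),
    ("https://example.com/missing", 404),
]

sitemap_xml = build_sitemap(PAGES)
print(sitemap_xml)
```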

What is site architecture?

Your site architecture defines the overall structure of your website, including its vertical depth (how many levels it has) as well as its horizontal breadth at each level.

When evaluating your site architecture, identify how many clicks it takes to get from the homepage to other important pages. Also, evaluate how well pages are linking to others in the site's hierarchy, and make sure the most important pages are prioritized in the architecture.

Ideally, you want to strive for a flatter site architecture that takes advantage of both vertical and horizontal linking opportunities.

It's about getting the best, most relevant content in front of users and reducing the number of times they have to click to find it. The same applies to search engines: by flattening your site architecture, you can make potential gains in indexation metrics such as the number of pages generating search engine traffic and the number of pages in a search engine index.
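Click depth can be measured with a breadth-first search over the site's internal links. The sketch below assumes you already have a link graph (for example, from a crawl); the `SITE` dictionary and its paths are made up for illustration.

```python
from collections import deque


def click_depths(links: dict, home: str) -> dict:
    """Return the minimum number of clicks from the homepage to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph: page -> pages it links to.
SITE = {
    "/": ["/services", "/blog"],
    "/services": ["/services/seo"],
    "/blog": ["/blog/technical-seo"],
    "/services/seo": ["/contact"],
}

depths = click_depths(SITE, "/")
for page, depth in sorted(depths.items(), key=lambda item: item[1]):
    print(depth, page)
```

Pages with a large depth are candidates for extra internal links from higher up in the hierarchy, which is what flattening the architecture amounts to in practice.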

Flash and JavaScript Navigation

Although search engine crawlers are getting smarter, it is still safer to avoid Flash and JavaScript navigation than to fix it.

Site Performance

Users have a very limited attention span, and if your site takes too long to load, they will leave.

Search engine crawlers have a limited amount of time that they can allocate to each site on the Internet.


Indexability: of the pages that search engines are allowed to access, how many are actually being indexed by the search engines?

Site: Command

The index and actual counts are roughly equivalent - this is the ideal scenario.

The index count is significantly smaller than the actual count - this scenario indicates that the search engines are not indexing many of your site's pages.

The index count is significantly larger than the actual count - this scenario usually suggests that your site is serving duplicate content.
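The three scenarios can be expressed as a simple comparison. In this sketch the 20% tolerance that stands in for "roughly equivalent" and "significantly" is an arbitrary assumption, as are the function name and the sample counts.

```python
def compare_index_count(indexed: int, actual: int, tolerance: float = 0.2) -> str:
    """Classify the relationship between the site: count and the real page count."""
    if actual <= 0:
        raise ValueError("actual page count must be positive")
    ratio = indexed / actual
    if ratio < 1 - tolerance:
        return "under-indexed: many pages are not in the index"
    if ratio > 1 + tolerance:
        return "over-indexed: likely duplicate content"
    return "roughly equivalent: ideal"


print(compare_index_count(980, 1000))
print(compare_index_count(300, 1000))
print(compare_index_count(2500, 1000))
```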

Search Engine Penalties

Hopefully, you never have to deal with this. But if you think your site has been penalized, here are 4 steps to help you fix the situation:

Make Sure You've Been Penalized

Be sure you have actually been penalized. Use the previous accessibility checks.

Reason(s) for the Penalty

Step 2: Identify the Reason(s) for the Penalty

Once you're sure the site has been penalized, you need to investigate the root cause for the penalty. If you receive a formal notification from a search engine, this step is already complete.

Fix the Site's Penalized Behavior

Step 3: Fix the Site's Penalized Behavior

Step 4: Beg for forgiveness

On-Page Ranking Factors

Of the on-page ranking factors, we'll focus on URLs and duplicate content.

Best Practice URLs

Is the URL short and user-friendly?

Does the URL include relevant keywords?

Is the URL using subfolders instead of subdomains?

Does the URL avoid using excessive parameters?

Is the URL using hyphens to separate words?
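Most of this checklist can be automated with `urllib.parse`. The sketch below is a rough heuristic only: the length and parameter limits are arbitrary assumptions, the subdomain test is crude (it would misjudge hosts such as example.co.uk), and keyword relevance is left to a human.

```python
from urllib.parse import parse_qs, urlsplit


def url_issues(url: str, max_length: int = 75, max_params: int = 2) -> list:
    """Return a list of checklist items the URL appears to violate."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    issues = []
    if len(url) > max_length:
        issues.append("long URL")
    # Crude check: more than one dot and no "www." prefix suggests a subdomain.
    if host.count(".") > 1 and not host.startswith("www."):
        issues.append("subdomain instead of subfolder")
    if len(parse_qs(parts.query)) > max_params:
        issues.append("excessive parameters")
    if "_" in parts.path or "%20" in parts.path:
        issues.append("words not separated by hyphens")
    return issues


# Hypothetical URLs for illustration.
for candidate in (
    "https://www.example.com/services/technical-seo",
    "https://blog.example.com/index.php?option=com_content&view=article&id=12&Itemid=3",
    "https://www.example.com/technical_seo_tips",
):
    print(candidate, "->", url_issues(candidate) or "looks fine")
```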

What Is Duplicate Content?

Duplicate content exists when any two (or more) pages share the same or similar content.

True Duplicates

A true duplicate is any page that is 100% identical (in content) to another page. These pages only differ by the URL:

Near Duplicates

A near duplicate differs from another page (or pages) by a very small amount; it could be a block of text, an image, or even the order of the content:

Cross-domain Duplicates

A cross-domain duplicate occurs when two websites share the same piece of content:

These duplicates could be either true or near duplicates.
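The difference between true and near duplicates can be sketched in code. This is one possible approach, not a description of how any search engine works: true duplicates are detected by hashing whitespace-normalized text, and near duplicates by Jaccard similarity over three-word shingles. The 0.7 threshold and the sample texts are assumptions.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace, so identical content matches."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' shingle sets (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def classify_pair(a: str, b: str, near_threshold: float = 0.7) -> str:
    if fingerprint(a) == fingerprint(b):
        return "true duplicate"
    if similarity(a, b) >= near_threshold:
        return "near duplicate"
    return "distinct"


PAGE_A = "Technical SEO covers accessibility, indexability and on-page ranking factors for your Joomla site."
PAGE_B = "Technical SEO covers accessibility, indexability and on-page ranking factors for your Joomla website."
PAGE_C = "Sitemaps inform search engines about pages available for crawling."

print(classify_pair(PAGE_A, PAGE_A))
print(classify_pair(PAGE_A, PAGE_B))
print(classify_pair(PAGE_A, PAGE_C))
```

The same comparison applies to cross-domain duplicates; only the source of the two texts changes.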

www vs. Non-www

For sitewide duplicate content, this is probably the biggest culprit.

Staging Servers

Trailing Slashes ("/")

Secure (https) Pages

Home-page Duplicates

Duplicate Paths

Having duplicate paths to a page is perfectly fine, but when duplicate paths generate duplicate URLs, then you've got a problem.
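Several of the sitewide sources listed above (www vs. non-www, trailing slashes, https, and home-page duplicates) come down to one page being reachable at many URLs. The sketch below collapses such variants to a single form. The choice of https, the www host, and no trailing slash as the preferred version is an assumption for this example, as is treating index.php as the home-page file; pick whichever version suits your site and apply it consistently.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str, preferred_host: str = "www.example.com") -> str:
    """Normalize scheme, host, home-page file and trailing slash (ports are dropped)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host  # fold www and non-www together
    path = parts.path or "/"
    if path.endswith("/index.php"):
        path = path[: -len("index.php")]  # /index.php and / are the same page
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/")  # drop trailing slashes except on the root
    return urlunsplit(("https", host, path, parts.query, ""))


# Hypothetical variants of the same home page.
variants = [
    "http://example.com",
    "http://www.example.com/",
    "https://example.com/index.php",
]
print({canonical_url(v) for v in variants})
```

On a live site the same mapping would typically be enforced with 301 redirects, so that crawlers only ever see one URL per page.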

Product Variations

Product variant pages are pages that branch off the main product page and differ only by an option. Example: the iPod Nano, where every page is the same apart from the color.

Geo-keyword Variations

Back in the good old days, you could just copy all of your pages hundreds of times, add a city name to the URL, and use find and replace.

Content wins here; nowadays you need to get creative.

Other Thin Content

Scraped Content

Scraped content is just copied content, except that you didn't ask permission. It's illegal. STOP IT.

How To Find

Google Webmaster Tools

In Google Webmaster Tools, you can pull up a list of duplicate TITLE tags and Meta Descriptions Google has crawled. This is a good starting point.
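If you have your own crawl data, the same kind of report can be approximated locally. This sketch groups URLs by title; the `(url, title)` input format and the sample pages are assumptions, and it does not use or imitate any Google API.

```python
from collections import defaultdict


def duplicate_titles(pages) -> dict:
    """Group URLs by normalized TITLE text and keep titles used more than once."""
    by_title = defaultdict(list)
    for url, title in pages:
        key = " ".join(title.split()).lower()  # ignore case and extra whitespace
        by_title[key].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


# Hypothetical crawl output.
CRAWL = [
    ("https://www.example.com/", "Technical SEO Services"),
    ("https://www.example.com/index.php", "Technical SEO Services"),
    ("https://www.example.com/blog", "Blog"),
    ("https://www.example.com/services", "technical  SEO services"),
]

for title, urls in duplicate_titles(CRAWL).items():
    print(title, "->", urls)
```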

Google's Site: Command

When you already have a sense of where you might be running into trouble and need to take a deeper look, Google's site: command is very powerful.

SEOmoz Campaign Manager

Your Own Brain


Contact: [email protected]