Meta & X Robots vs Robots.txt - Ari Roth at KahenaCon Advanced 2015


Transcript of Meta & X Robots vs Robots.txt - Ari Roth at KahenaCon Advanced 2015

Page 1

@aroth26 bit.ly/robots-txt-meta-robots

Meta (& X) Robots vs Robots.txt

Ari Roth, KahenaCon Advanced

Page 2

Ari Roth

• Senior Digital Marketing Manager at Jerusalem’s DriveHill Media
  • SEO
  • PPC
  • Ecommerce consulting
  • Email marketing
• Admitted SEO Geek

Page 3

Meta Robots Tag vs Robots.txt

• “…it does seem that even senior SEOs don't fully understand the differences between robots.txt and meta tags.”
• It’s true! Neither I nor another senior SEO who was asked knew the differences!

Page 4

Meta Robots = X-Robots

Page 5

Meta Robots

Page 6

X-Robots
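These two slides presumably showed the syntax; for reference, the meta robots tag is placed in a page’s HTML head, while X-Robots is delivered as an HTTP response header, which also works for non-HTML files such as PDFs. A minimal sketch with illustrative directives:

```html
<!-- Meta robots: a tag inside the <head> of the HTML page itself -->
<meta name="robots" content="noindex,nofollow">

<!-- X-Robots: the same directives sent as an HTTP response header
     instead of markup (works for PDFs, images, etc.):

     X-Robots-Tag: noindex, nofollow
-->
```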

Page 7

Meta & X Robots Rules

• Default is index,follow
• Noindex – search engines should remove the page from their results.
• Nofollow – links on the page pass no PageRank and will not be crawled from that page.
  • Linked pages can still be crawled and indexed if linked to elsewhere
  • PROTIP: Make sure “nofollow” pages don’t have any external links pointing at them, so as not to lose that link equity.

Page 8

Speed of Deindexation with Meta Robots

• Google says that this will happen the next time they crawl the page, but many claim it goes slower in reality.
• For speedier removal, use the Search Console URL removal tool if feasible
  • Very difficult if you want to remove hundreds / thousands of pages
  • Takes up to 24 hours in my experience
• PROTIP: If the need for removal is urgent and the number of URLs is huge, submit a custom XML sitemap containing only the noindexed URLs to urge a faster crawl (not recommended in general)
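A sketch of what that one-off sitemap might look like; the filename and URLs here are hypothetical placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- deindex-sitemap.xml: lists ONLY the noindexed URLs, submitted in
     Search Console temporarily to prompt a faster recrawl -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://site.com/old-page-1.html</loc></url>
  <url><loc>https://site.com/old-page-2.html</loc></url>
</urlset>
```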

Page 9

Robots.txt

Page 10

The Basics of Robots.txt

• Must be at the top-level directory of the subdomain:
  • site.com/robots.txt for site.com
  • www.site.com/robots.txt for www.site.com
  • NO: site.com/subfolder/robots.txt
  • NO: a robots.txt on the root domain if you want it to apply to a subdomain, or vice versa
• Controls crawling, NOT indexing:

User-agent: *
Disallow: /blocked-page.html

END RESULT: /blocked-page.html will not be crawled, but it can still show up in the index!

• PRO: More important pages get crawled more often.
• CON: No crawling means that Google cannot see any of the page’s content.
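The crawl-blocking rule above can be sanity-checked with Python’s standard-library robots.txt parser; site.com is a placeholder domain:

```python
import urllib.robotparser

# The two-line robots.txt from the slide
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /blocked-page.html",
])

# The blocked page may not be crawled...
print(rp.can_fetch("*", "https://site.com/blocked-page.html"))  # False
# ...but any other URL on the site may be
print(rp.can_fetch("*", "https://site.com/other-page.html"))    # True
```

Note that this only answers the crawling question; as the slide says, a disallowed URL can still end up in the index.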

Page 11

Robots Can’t Crawl – So What?

• If indexed, no meta data for new URLs
• Previously indexed URLs with new blocks?
  • See example to the left
  • The previous title stays
  • A “description unavailable” message appears
• Obviously not great for ranking or CTR

Page 12

Robots Can’t Crawl – So What?

• No credit for links (!!!)
• PROTIP: If blocked pages have attracted / are attracting links, consider unblocking them and using 301 redirects if you don’t want them in organic search.
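One way to implement that tip, assuming an Apache server and hypothetical paths (mod_alias’s Redirect directive; adjust for your stack):

```apache
# After removing the robots.txt block, permanently redirect the page
# so its accumulated link equity flows to a live URL
Redirect 301 /blocked-page.html https://site.com/replacement-page.html
```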

Page 13

Robots Can’t Crawl – So What?

• Mass Search Console error messages about blocked JavaScript & CSS files
  • Implications for mobile-friendly SEO
• Gary Illyes: use this magic rule:

User-agent: Googlebot
Allow: .js
Allow: .css

• Problem is… it doesn’t work like that, Gary!

Page 14

The Gary Illyes Problem

• PROTIP #1: Use a wildcard with Gary’s rule:

Allow: *.js

• PROTIP #2: Make sure to repeat the allow rule for each disallowed folder, otherwise the disallow rule will override the shorter allow rule:

Disallow: /admin/
Allow: /admin/*.js

• See the testing I did on this subject on the DriveHill Media blog.
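Putting PROTIPs #1 and #2 together, a robots.txt with more than one blocked folder would repeat the wildcard allows under each disallowed path (folder names are hypothetical):

```
User-agent: *
Disallow: /admin/
Allow: /admin/*.js
Allow: /admin/*.css
Disallow: /private/
Allow: /private/*.js
Allow: /private/*.css
```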

Page 15

The Gary Illyes Problem• Adding instructions just for Googlebot will cause Googlebot to ignore the directives for all bots (User-agent: *).

• PROTIP #3: If you want Googlebot to follow those rules, you need to copy and paste them into the Googlebot section (separated by a line break) as well!
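In other words, once a Googlebot group exists, the generic group no longer applies to Googlebot, so the file has to duplicate the rules (hypothetical folders again):

```
User-agent: *
Disallow: /admin/
Allow: /admin/*.js

User-agent: Googlebot
Disallow: /admin/
Allow: /admin/*.js
```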

Page 16

The Controversial Rule• You *can* noindex something in robots.txt!

User-agent: *
Noindex: /blocked-page.html

• Stone Temple test: worked for 11 of 12 sites!

• John Mueller: “I’d really avoid using the noindex there (in robots.txt)”

• PROTIP: Use meta robots or X-Robots for indexation where possible, but don’t write off the robots.txt option if the others can’t be done.


Page 17

Speed of Deindexation with Robots.txt

• Never deindexed with a “Disallow” rule
  • Just buried – try a “site:site.com inurl:pattern” search
• With the “Noindex” rule, see the Stone Temple test
  • 11/12 deindexed overall, most within two weeks, the rest within a month total

Page 18

Which Rule Wins? Google vs Other Crawlers

• Google uses the length of the path as a guide to follow the most specific rule.
  • Longer = better.
  • Wildcards (*) are of undefined length – use the testing tool!
  • See chart to the right

• Other crawlers use order of commands, so list allow directives first:

YES:

Allow: /admin/*.js
Disallow: /admin/

NO:

Disallow: /admin/
Allow: /admin/*.js
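Python’s standard-library parser happens to be one of those order-based, first-match implementations (and it does not expand * wildcards, so this sketch uses literal paths; site.com and the folders are placeholders). It illustrates why allow-first ordering matters for non-Google crawlers:

```python
import urllib.robotparser

def parser_for(lines):
    # Build a parser directly from a list of robots.txt lines
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(lines)
    return rp

# Allow listed first: the carve-out is matched before the broad block
good = parser_for(["User-agent: *",
                   "Allow: /admin/public/",
                   "Disallow: /admin/"])

# Broad block listed first: a first-match parser never reaches the Allow
bad = parser_for(["User-agent: *",
                  "Disallow: /admin/",
                  "Allow: /admin/public/"])

url = "https://site.com/admin/public/page.html"
print(good.can_fetch("*", url))  # True  - Allow matched first
print(bad.can_fetch("*", url))   # False - Disallow matched first
```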

Page 19

Using BOTH Robots.txt & Meta Robots?

• You may think:
  • I’ll use Robots.txt to stop crawling
  • I’ll use Meta Robots to stop indexing
  • Best of both worlds!

• WRONG!

• The Robots.txt block means Google can’t crawl the page to see the Meta Robots tag, so it’s as if it does not exist, and you will still see a result like the one to the left.

• SOLUTION: Remove Robots.txt block or add “Noindex” rule:

Disallow: /blocked-page.html
Noindex: /blocked-page.html

Page 20

Takeaways

1. Robots.txt & the Meta Robots tag don’t do the same things, so make sure you’re using the right one to fulfill your goals.
2. Make sure the robots.txt rules you add don’t have unintended consequences.
3. Consider using the “Noindex:” rule in robots.txt as a last resort.
4. Make sure you aren’t losing link equity via robots.txt or meta robots blocks.

Page 21

Thank You!

• Senior Marketing Manager - DriveHill Media
• Find the full presentation here
• Tweet me here