Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

Page 1: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

#SMX #25A Christine Smith @websmithc

Thousands of Pages Missing from Google SERPs

...and how to prevent the problem!

Page 2: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

IBM’s self-support site

• >1M technical documents
• Self-support for server and software admins
• Searchable by error codes, etc.

http://www-01.ibm.com/support/docview.wss?uid=swg21363866

Page 3: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

Thousands of pages missing from SERPs!

Page 4: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

Google monthly referrals dropped 28%

Page 5: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

The Good News: The pages were fine

✓ Pages/URLs displayed correctly
✓ Redirects were working normally
  • 302 www.ibm.com -> www-01.ibm.com (multiple infrastructures)
✓ Canonical URLs were correct
  <link href="http://www.ibm.com/support/docview.wss?uid=swg21363866" rel="canonical"/>
✓ Robots.txt was not blocking anything

Page 6: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

Sitemaps were not optimal

• Only 10% of Sitemap URLs were indexed
• Regenerated sitemaps
  ✓ Corrected URLs to match canonical URLs
  ✓ Improved to 60% of Sitemap URLs indexed
  ✓ Today, 88% are indexed
• But, still no Google referral improvement
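
The "match canonical URLs" step can be checked mechanically. The following is a minimal Python sketch, not the tooling used at IBM: it assumes you already have the HTML for each sitemap URL, and the names `CanonicalFinder`, `find_canonical`, and `sitemap_mismatches` are hypothetical helpers introduced for illustration.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link .../> tags also arrive here via the default
        # handle_startendtag implementation.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def sitemap_mismatches(pages: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (sitemap_url, canonical_url) pairs that disagree.

    `pages` maps each sitemap URL to the HTML served for it (assumed input).
    """
    mismatches = []
    for sitemap_url, html in pages.items():
        canonical = find_canonical(html)
        if canonical is not None and canonical != sitemap_url:
            mismatches.append((sitemap_url, canonical))
    return mismatches
```

With the canonical tag shown earlier in the deck, a sitemap entry on www-01.ibm.com would be reported as a mismatch, while the same document listed under www.ibm.com would pass.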

Page 7: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

Engaged Google

• Opened Google Site Search support ticket
• Google findings:
  • A sampling of the missing URLs were:
    1. Marked as duplicates of the Support Registration page, and
    2. Last crawled five months before!

Page 8: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

In other words…

• Since Google’s index marked the pages as duplicates…

Thousands of Support pages were effectively deindexed

Page 9: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

Meanwhile, according to Google

• There was no way to get a list of all URLs affected
• Panda was updated and data refreshed within a few weeks

Page 10: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

We kept pulling every lever available…

• Submitted a Manual Actions Reconsideration Request
  • Some technical docs were incorrectly flagged
• Requested increased crawl rate for the domain
  • Hoping the pages would be revisited

Page 11: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

Google referrals increased 22% the following month

Page 12: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

What was the fix?

• Traffic bounced back... almost overnight.
• GSS ticket was closed, but not resolved
• Was it...
  • Increased crawl rate?
  • Better sitemaps?
  • Panda and data refresh?
  • Normal re-crawl of the pages?

Page 13: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

More importantly: What was the cause?

• Likely culprit:
  • Faulty redirect or
  • A bad site maintenance redirect
• Typical server response during outage:
  • 404 or 500 or 504 HTTP response
  • Or 302 redirect to a maintenance page

Page 14: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

Site maintenance – the right way

• Give a 503 Service Unavailable HTTP response!
  • Retry-After time in the header is helpful
  • Tells Google to come back later
• Do not set all 5xx responses to 503
  • Google will ignore and assume site is down

Reference: http://googlewebmastercentral.blogspot.com/2011/01/how-to-deal-with-planned-site-downtime.html
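
To make the Retry-After advice concrete, here is a small Python sketch of the status and headers a maintenance response might carry. It is illustrative only: `maintenance_headers` is a hypothetical helper, and the one-hour window and the date in the usage note are made-up values. HTTP allows Retry-After to be either a number of seconds or an HTTP-date, so the sketch supports both forms.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Optional, Tuple


def maintenance_headers(
    retry_after_seconds: Optional[int] = None,
    back_at: Optional[datetime] = None,
) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a planned-maintenance response.

    Pass `back_at` (a timezone-aware datetime) for an HTTP-date value,
    or `retry_after_seconds` for a delay in seconds.
    """
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if back_at is not None:
        # HTTP-dates are always expressed in GMT.
        as_utc = back_at.astimezone(timezone.utc)
        headers["Retry-After"] = format_datetime(as_utc, usegmt=True)
    elif retry_after_seconds is not None:
        headers["Retry-After"] = str(int(retry_after_seconds))
    return 503, headers
```

For example, `maintenance_headers(retry_after_seconds=3600)` asks crawlers to come back in an hour, while `maintenance_headers(back_at=datetime(2016, 1, 10, 2, 0, tzinfo=timezone.utc))` names the time the site is expected back.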

Page 15: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

How to indicate site maintenance

• Some web platforms automatically give 503 HTTP response during an upgrade
  • e.g. WordPress
• Others require workarounds, like
  • Apache, IHS (IBM), IIS (Microsoft) rewrite rules
  • Akamai logic

Page 16: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

Update httpd.conf during maintenance

RewriteEngine on
RewriteCond %{REQUEST_URI} !^(/maintenance.html)$
#RewriteRule ^(.*)$ - [R=503]
#Planned outage for /services/ path
RewriteRule ^(/services.*)$ - [R=503]
#Planned outage for /support/ path
RewriteRule ^(/support.*)$ - [R=503]
ErrorDocument 503 /maintenance.html

•  Requires Apache / IBM HTTP Server restart to update httpd.conf before and after maintenance
•  Remember to configure both port 80 and port 443
•  If the entire site is down, return a 503 response on robots.txt

Page 17: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

Akamai flow for 503 HTTP response

During planned outage, upload a filename matching the application path to a NetStorage /maintenance/ directory

Custom Flow
•  If Origin gives 500 or 504 response, then
   •  If the maintenance file exists, then
      •  Serve maintenance page with 503 response
      •  Set retry-after to 1 day
   •  Otherwise, serve error page matching Origin response (500/504)
•  Otherwise, serve Origin response
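
The custom flow above can be read as a small decision function. The sketch below restates it in plain Python purely to make the branching explicit. It is not Akamai configuration syntax, and `EdgeResponse`, `edge_response`, and the page labels are hypothetical names chosen for this illustration.

```python
from typing import Dict, NamedTuple

ONE_DAY_SECONDS = 24 * 60 * 60  # "Set retry-after to 1 day"


class EdgeResponse(NamedTuple):
    status: int              # status code sent to the client
    page: str                # "maintenance", "error", or "origin"
    headers: Dict[str, str]  # extra headers added at the edge


def edge_response(origin_status: int, maintenance_file_exists: bool) -> EdgeResponse:
    """Decide what the edge serves, following the slide's flow."""
    if origin_status in (500, 504):
        if maintenance_file_exists:
            # Planned outage: maintenance page with 503 and Retry-After.
            return EdgeResponse(
                503, "maintenance", {"Retry-After": str(ONE_DAY_SECONDS)}
            )
        # Unplanned failure: error page matching the origin's status.
        return EdgeResponse(origin_status, "error", {})
    # Origin is healthy (or failing in some other way): pass it through.
    return EdgeResponse(origin_status, "origin", {})
```

Note that the 503 is only produced when the maintenance file is present, which matches the deck's advice to reserve 503 for planned outages.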

Page 18: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

Lessons Learned

✓  Never give 200 or 301 HTTP response when site is down!
   ✓  Confuses the crawler/indexer
   ✓  May cause pages to be “deindexed” as duplicates
✓  Configure 503 HTTP responses only during planned outages and…
   ✓  Don’t forget to remove when maintenance is over!
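
These lessons can be turned into a quick pre-maintenance check. The Python sketch below classifies a response observed while the site is down against the advice in this deck. `audit_outage_response` is a hypothetical helper, and the warning texts paraphrase the slides rather than any Google documentation.

```python
from typing import Dict, List


def audit_outage_response(status: int, headers: Dict[str, str]) -> List[str]:
    """Return warnings for a response served while the site is down.

    An empty list means the response matches the deck's advice.
    """
    header_names = {name.lower() for name in headers}
    if status == 503:
        if "retry-after" in header_names:
            return []
        return ["503 without Retry-After: crawlers are not told when to come back"]
    if status in (200, 301, 302):
        # The failure mode described in the deck: outage pages served or
        # redirected as if they were real content.
        return [
            f"{status} during an outage: pages may be treated as "
            "duplicates of the maintenance page"
        ]
    return [f"{status} during an outage: serve 503 for planned maintenance instead"]
```

For example, a 302 to a maintenance page yields one warning, while a 503 with a Retry-After header yields none.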

Page 19: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

THANK YOU!

SEE YOU @SMX WEST SAN JOSE, CA

MARCH 1-3, 2016

Page 20: Thousands of Pages Missing From Google SERPs...And How to Prevent the Problem! By Christine Smith

References: Rewrite Rules for 503 Response

•  Apache: https://gist.github.com/jjulian/1889874
•  IBM HTTP Server (IHS): http://www.ibm.com/support/docview.wss?uid=swg21397422
•  Microsoft IIS: http://serverfault.com/questions/483145/how-to-add-a-site-wide-downtime-error-message-in-iis-with-a-custom-503-error-co