1. 2 Google Session 1.About MIT’s Google Search Appliance (GSA) 2.Adding Google search to your web...


Goooogle at MIT

IT Partners, April 2005

Suzana Lisanti, Hubert Pham

[email protected]


Google Session

1. About MIT’s Google Search Appliance (GSA)

2. Adding Google search to your web site

3. Customizing search results

4. Tips on improving a site’s rankings

5. Q&A – actually, ask questions anytime!


MIT's Google Configuration

• MIT license is for 3M documents

• Two collections of 1.5M documents each

• MIT has over 1M web pages on 1,000 web servers

• Google follows links from the MIT Home Page

• web.mit.edu – crawled three times a week

• Other MIT web servers – crawled twice a week


MIT Google does

• Performs twice as well as Inktomi in a “blind test”

• Indexes 220 different file formats

• Provides control over our own crawling schedule

• Allows user customization of search results format

• Indexes certificate-restricted content (not implemented yet)


MIT Google does NOT

• Cache old pages

• Index image files (our decision)

• Index image ALT tags (Google’s decision)

• Allow us to fiddle with the relevancy algorithm

• Tell you “who’s linking to my page” because the GSA does not share that information across collections.

When your pages move, we recommend using a 301 redirect.


MIT Google does NOT index

• Java, Perl, Python documentation

• Debian, GNU/Linux mirrors

• URLs containing these strings:

– sipb.mit.edu

– dev.mit.edu

– net.mit.edu

– lees.mit.edu

– ops.mit.edu

– classics.mit.edu

– hypermail

– pipermail

• Certificate-protected pages

• "No robots" sites, "no index" pages

• Dynamically generated pages containing ‘?’ (except by request)

• URLs containing cgi-bin

• URLs containing /afs/


Telling Google not to index

• No robots in server

• No robots in locker/directory

• No robots in html file

• No index, follow
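
The "No index, follow" option on this slide corresponds to the standard robots meta tag placed in a page's head. A minimal sketch, assuming the crawler honors that standard tag; the page title and body text are placeholders, not taken from any MIT page:

```html
<html>
  <head>
    <title>Example page</title>
    <!-- Ask crawlers not to index this page, but still follow its links -->
    <meta name='robots' content='noindex, follow'/>
  </head>
  <body>
    <p>Placeholder content that should stay out of the search index.</p>
  </body>
</html>
```

Changing the content value to 'noindex, nofollow' would also ask crawlers not to follow the links on the page.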


Avg. daily views - January 2005

[Chart: average daily views by hour of day, from 5:00 through 3:00; vertical axis from 0 to 30,000]

Total queries Jan 1 - 26: 340,656


Gooogle search forms


Simple search form


Sample search code

<form method='get' action='http://gb-server.mit.edu/search'>
  <!-- The user's query -->
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <input type='hidden' name='site' value='mit'/>
  <input type='hidden' name='client' value='mit'/>
  <!-- XSLT stylesheet used to turn the XML results into HTML -->
  <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/xsl/google-mit.xsl'/>
  <input type='hidden' name='output' value='xml_no_dtd'/>
  <input type='hidden' name='as_dt' value='i'/>
  <!-- Restrict the search to one directory tree -->
  <input type='hidden' name='as_sitesearch' value='web.mit.edu/newsoffice'/>
</form>



Restrict to one directory tree

• name='as_sitesearch' value='<yoururl>'

– Use web.mit.edu/newsoffice, not web/newsoffice

• The slash / matters

– web.mit.edu/newsoffice to include sub-directories

– web.mit.edu/newsoffice/ to exclude sub-directories

• as_sitesearch allows you to specify one directory (and all its sub-directories) as the domain to be searched; you cannot specify multiple disparate directories using this option

• If you want the search feature on your site to search the entire MIT web site, delete this parameter.
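
As an illustration of the points above, here is a sketch of a form restricted to a single directory tree. It reuses the parameters from the sample search code; the directory web.mit.edu/mydept is a hypothetical placeholder, so substitute your own URL.

```html
<!-- Search only web.mit.edu/mydept and everything beneath it.
     "mydept" is a placeholder directory used for illustration. -->
<form method='get' action='http://gb-server.mit.edu/search'>
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <input type='hidden' name='site' value='mit'/>
  <input type='hidden' name='client' value='mit'/>
  <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/xsl/google-mit.xsl'/>
  <input type='hidden' name='output' value='xml_no_dtd'/>
  <input type='hidden' name='as_dt' value='i'/>
  <!-- No trailing slash: sub-directories are included.
       Use value='web.mit.edu/mydept/' to exclude them. -->
  <input type='hidden' name='as_sitesearch' value='web.mit.edu/mydept'/>
</form>
```

Removing the as_sitesearch input altogether makes the same form search the entire MIT web site.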



Restrict to multiple directories or servers


• Contact [email protected] and we will create a subcollection for you.

• A subcollection is a list of URL patterns that can be referred to by a single name, such as "Library".


Advanced search example


Gooogle Custom Results

You can customize the look and feel of Google’s search results by providing a stylesheet.


Site-wide MIT template


IS&T custom results


IS&T Search


IS&T Custom Results


Customizing results

• You provide the header and footer (HTML) wrapper, and any desired content formatting

• Google provides the raw data (XML)

[Diagram: Google results data wrapped in your HTML header/footer]


Results content “title” only


How customization works

• The form points to an XSLT stylesheet

• Google returns results to query in XML

• An XSLT document translates the XML into your custom HTML

[Diagram: a search query goes to the MIT-Google index; XML search results + XSLT stylesheet = HTML results]


Notes

• It is not necessary to customize the results.

– You can place a search form on your site, and Google will use the site-wide MIT XSLT stylesheet.

• Updates to the Google service may require you to make changes in your stylesheet.

– Subscribe to [email protected]

• WCS will provide fee-based production services for custom search results.


How to customize the results

• Plan how you want the results to look

• Copy the MIT Google XSLT stylesheet: http://web.mit.edu/xsl/google-mit.xsl

• Save it to web-readable space, naming it google-mysite.xsl


Point to your XSL

<form method='get' action='http://gb-server.mit.edu/search'>
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <input type='hidden' name='site' value='mit'/>
  <input type='hidden' name='client' value='mit'/>
  <!-- Your custom XSLT stylesheet replaces the site-wide one -->
  <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/my_dept/google-mydept.xsl'/>
  <input type='hidden' name='output' value='xml_no_dtd'/>
</form>

• Update your search form to point the MIT-Google server to your custom XSLT style sheet.


Step-by-step customization

See

http://web.mit.edu/ist/google/stylesheets.html


Documentation

• http://web.mit.edu/ist/google/

(Includes the “official” Google documentation, including their XML specification; also XSLT tips.)

• Search Engine Submission Tips http://searchenginewatch.com/webmasters/

• Using CSS for an Effective SEO Campaign http://www.alistapart.com/articles/seo/


Support

• The MIT Google team will help you create a Google search form and will answer questions sent to [email protected]

• WCS offers fee-based production services for custom search results



Q&A