1. 2 Google Session 1.About MIT’s Google Search Appliance (GSA) 2.Adding Google search to your web...
cynthia-robertson
2
Google Session
1. About MIT’s Google Search Appliance (GSA)
2. Adding Google search to your web site
3. Customizing search results
4. Tips on improving a site’s rankings
5. Q&A – actually, ask questions anytime!
3
MIT's Google Configuration
• MIT license is for 3M documents
• Two collections of 1.5M documents each
• MIT has over 1M web pages on 1,000 web servers
• Google follows links from the MIT Home Page
• web.mit.edu – crawled three times a week
• Other MIT web servers – crawled twice a week
4
MIT Google does
• Performs twice as well as Inktomi in a “blind test”
• Indexes 220 different file formats
• Provides control over our own crawling schedule
• Allows user customization of search results format
• Can index certificate-restricted content (not implemented yet)
5
MIT Google does NOT
• Cache old pages
• Index image files (our decision)
• Index image ALT tags (Google’s decision)
• Allow us to fiddle with the relevancy algorithm
• Tell you “who’s linking to my page” because the GSA does not share that information across collections.
When your pages move, we recommend using a 301 redirect.
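The 301 recommendation can be illustrated with a minimal sketch. On a real server you would normally configure redirects in the web server itself. The Python WSGI application below is only a stand-in that shows what a permanent redirect looks like in the HTTP response. The MOVED table and its URLs are hypothetical.

```python
# Hypothetical map of moved pages: old path -> new absolute URL.
MOVED = {
    "/olddept/index.html": "http://web.mit.edu/newdept/index.html",
}


def app(environ, start_response):
    """WSGI app that answers moved pages with 301 Moved Permanently."""
    path = environ.get("PATH_INFO", "/")
    target = MOVED.get(path)
    if target is not None:
        # A 301 tells crawlers the move is permanent, so they can
        # update their index to point at the new URL.
        start_response("301 Moved Permanently",
                       [("Location", target), ("Content-Type", "text/plain")])
        return [f"Moved permanently to {target}\n".encode("utf-8")]
    # Anything not in the table is simply reported as missing.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found\n"]
```

To try it locally, serve it with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and request /olddept/index.html.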
6
MIT Google does NOT index
• Java, Perl, Python documentation
• Debian, GNU/Linux mirrors
• URLs containing these strings:
– sipb.mit.edu
– dev.mit.edu
– net.mit.edu
– lees.mit.edu
– ops.mit.edu
– classics.mit.edu
– hypermail
– pipermail
• Certificate-protected pages
• Sites and pages marked "no robots" or "noindex"
• Dynamically generated pages (URLs containing '?'), except by request
• URLs containing cgi-bin
• URLs containing /afs/
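The URL-based rules in this list can be modeled as a simple filter. This is a hypothetical sketch that covers only the substring and '?' rules. Certificate-protected pages and "no robots" markers depend on information the URL alone does not carry. The function name and the dynamic_allowed flag are illustrative, not part of the appliance's configuration.

```python
# Substrings from the "does NOT index" list. This is a hypothetical model
# of those rules, not the Google Search Appliance's configuration format.
EXCLUDED_SUBSTRINGS = (
    "sipb.mit.edu", "dev.mit.edu", "net.mit.edu", "lees.mit.edu",
    "ops.mit.edu", "classics.mit.edu", "hypermail", "pipermail",
    "cgi-bin", "/afs/",
)


def is_excluded(url: str, dynamic_allowed: bool = False) -> bool:
    """Return True if the URL falls under one of the URL-based exclusions.

    dynamic_allowed stands in for the "except by request" case: set it
    to True for a site whose '?' URLs were explicitly requested.
    """
    if any(fragment in url for fragment in EXCLUDED_SUBSTRINGS):
        return True
    # Dynamically generated pages ('?' in the URL) are skipped by default.
    if "?" in url and not dynamic_allowed:
        return True
    return False
```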
7
Telling Google not to index
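• "No robots" rule for an entire server (robots.txt)
• "No robots" rule for a locker or directory
• "No robots" meta tag in an HTML file
• "noindex, follow": keep the page out of the index but still follow its links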
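The server-level and page-level controls can be checked with Python's standard library. This sketch uses `urllib.robotparser` for a robots.txt file and `html.parser` for a robots meta tag. The robots.txt contents and the "gsa-crawler" user-agent string are assumptions for illustration. With `User-agent: *` the agent name does not affect the result.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a departmental server: keep every crawler
# out of /private/ while leaving the rest of the site crawlable.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""


def crawl_allowed(robots_txt: str, url: str, agent: str = "gsa-crawler") -> bool:
    """Return True if the robots.txt text permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # content="noindex, follow" -> {"noindex", "follow"}
            for token in attributes.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def page_directives(html_text: str) -> set:
    """Return the set of robots directives declared in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives
```

A page whose directives include "noindex" but not "nofollow" matches the last bullet. It stays out of the index, but its links are still crawled.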
8
Avg. daily views - January 2005
[Chart: average views by hour of day, 5:00 through 03:00; y-axis 0 to 30,000]
Total queries Jan 1 - 26: 340,656
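As a quick check on the total, the daily average follows from dividing by the number of days. This sketch assumes "Jan 1 - 26" counts both endpoints and that the year is 2005, as in the chart title.

```python
from datetime import date

TOTAL_QUERIES = 340_656  # from the slide: total queries, Jan 1 - 26

# Inclusive day count: Jan 1 through Jan 26 is 26 days.
days = (date(2005, 1, 26) - date(2005, 1, 1)).days + 1
avg_per_day = TOTAL_QUERIES / days
print(f"{days} days, about {avg_per_day:,.0f} queries per day")
```

This prints "26 days, about 13,102 queries per day".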
9
Google search forms
10
Simple search form
11
Sample search code
<form method='get' action='http://gb-server.mit.edu/search'>
  <!-- the search box and submit button -->
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <!-- which MIT Google collection (site) and front end (client) to use -->
  <input type='hidden' name='site' value='mit'/>
  <input type='hidden' name='client' value='mit'/>
  <!-- XSLT stylesheet that turns the XML results into HTML -->
  <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/xsl/google-mit.xsl'/>
  <input type='hidden' name='output' value='xml_no_dtd'/>
  <!-- restrict results to one directory tree; delete as_sitesearch to search all of MIT -->
  <input type='hidden' name='as_dt' value='i'/>
  <input type='hidden' name='as_sitesearch' value='web.mit.edu/newsoffice'/>
</form>
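Submitting the sample form is an ordinary HTTP GET, so the same search can be expressed as a URL. The sketch builds that URL with `urllib.parse.urlencode`, using the field names and values from the sample form. The function name is hypothetical. Leaving out `sitesearch` mirrors deleting the as_sitesearch parameter to search all of MIT.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://gb-server.mit.edu/search"


def build_search_url(query, sitesearch=None,
                     stylesheet="http://web.mit.edu/xsl/google-mit.xsl"):
    """Return the GET URL a browser would request for this search form."""
    params = {
        "q": query,
        "btnG": "Search",
        "site": "mit",
        "client": "mit",
        "proxystylesheet": stylesheet,
        "output": "xml_no_dtd",
    }
    if sitesearch:
        # Include only results under this directory tree.
        params["as_dt"] = "i"
        params["as_sitesearch"] = sitesearch
    # urlencode escapes '/' and ':' and turns spaces into '+'.
    return SEARCH_URL + "?" + urlencode(params)


print(build_search_url("student housing", sitesearch="web.mit.edu/newsoffice"))
```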
12
Restrict to one directory tree
• name='as_sitesearch' value='<yoururl>'
– Use the full host name: web.mit.edu/newsoffice, not web/newsoffice
• The trailing slash matters:
– web.mit.edu/newsoffice includes sub-directories
– web.mit.edu/newsoffice/ excludes sub-directories
• as_sitesearch lets you specify one directory (and all its sub-directories) as the domain to be searched; you cannot specify multiple disparate directories with this option.
• If you want the search feature on your site to search the entire MIT web
site, delete this parameter.
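The trailing-slash rule can be sketched as a matching function. This is a hypothetical model of the behavior the slide describes, not the appliance's actual matching code. It also assumes that matching respects directory boundaries, so "newsoffice" does not match "newsoffice2".

```python
from urllib.parse import urlsplit


def in_scope(url: str, sitesearch: str) -> bool:
    """Model of as_sitesearch scoping as described on the slide.

    'web.mit.edu/newsoffice'  -> the directory and all its sub-directories
    'web.mit.edu/newsoffice/' -> the directory only, no sub-directories
    """
    parts = urlsplit(url)
    location = parts.netloc + parts.path  # host + path, scheme dropped
    if sitesearch.endswith("/"):
        if not location.startswith(sitesearch):
            return False
        # Anything left after the prefix must not go into a sub-directory.
        return "/" not in location[len(sitesearch):]
    return location == sitesearch or location.startswith(sitesearch + "/")
```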
13
Restrict to multiple directories or servers
• Contact [email protected] and we will create a subcollection for you.
• A subcollection is a list of URL patterns that can be referred to by a single name, such as "Library".
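A subcollection is a named list of URL patterns, so it can be pictured as a lookup table. This sketch simplifies the patterns to plain URL prefixes. The appliance's own pattern syntax may be richer. The "Library" entries are hypothetical examples, not the actual MIT subcollection.

```python
from urllib.parse import urlsplit

# Hypothetical subcollection definitions: name -> URL prefixes.
SUBCOLLECTIONS = {
    "Library": ["libraries.mit.edu/", "web.mit.edu/hypothetical-archive/"],
}


def in_subcollection(url, name, subcollections=SUBCOLLECTIONS):
    """Return True if the URL matches any prefix in the named subcollection."""
    parts = urlsplit(url)
    location = parts.netloc + parts.path  # host + path, scheme dropped
    return any(location.startswith(prefix)
               for prefix in subcollections.get(name, []))
```

Unlike as_sitesearch, one name here can cover several unrelated servers and directories.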
14
Advanced search example
15
Google Custom Results
You can customize the look and feel of Google’s search results by providing a stylesheet.
16
Site-wide MIT template
17
IS&T custom results
18
IS&T Search
19
IS&T Custom Results
20
Customizing results
• You provide the header and footer (HTML) wrapper, and any desired content formatting
• Google provides the raw data (XML)
[Diagram: Google results data (XML) wrapped in your HTML header/footer]
21
Results content “title” only
22
How customization works
• The form points to an XSLT stylesheet
• Google returns results to query in XML
• An XSLT document translates the XML into your custom HTML
[Diagram: Search Query → MIT-Google Index → Search Results <XML/> + <XSLT> Stylesheet = HTML Results]
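Python's standard library has no XSLT processor. As a swapped-in technique, the sketch below does the stylesheet's job with `xml.etree.ElementTree`. It parses the XML results and wraps them in your own header and footer, showing titles only as in the "title only" example. The element names GSP, RES, R, U and T follow Google's XML results format as we understand it. Treat those names, the sample data and the header/footer as assumptions, and check them against the XML specification in the documentation.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sample of the XML the search service returns.
SAMPLE_XML = """<GSP>
  <Q>housing</Q>
  <RES>
    <R N="1"><U>http://web.mit.edu/housing/</U><T>MIT Housing</T></R>
    <R N="2"><U>http://web.mit.edu/newsoffice/</U><T>MIT News Office</T></R>
  </RES>
</GSP>"""

# Your own HTML wrapper (hypothetical).
HEADER = "<html><body><h1>My Department Search</h1>"
FOOTER = "</body></html>"


def render_results(xml_text, header=HEADER, footer=FOOTER):
    """Turn XML search results into an HTML list of linked titles."""
    root = ET.fromstring(xml_text)
    items = []
    for result in root.iter("R"):
        url = result.findtext("U", default="")
        title = result.findtext("T", default=url)
        items.append(f'<li><a href="{escape(url)}">{escape(title)}</a></li>')
    # Google supplies the data; you supply the wrapper and formatting.
    return header + "<ol>" + "".join(items) + "</ol>" + footer


print(render_results(SAMPLE_XML))
```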
23
Notes
• It is not necessary to customize the results.
– You can place a search form on your site, and Google will use the site-wide MIT XSLT stylesheet.
• Updates to the Google service may require you to make changes in your stylesheet.
– Subscribe to [email protected]
• WCS will provide fee-based production services for custom search results.
24
How to customize the results
• Plan how you want the results to look
• Copy the MIT Google XSLT stylesheet
http://web.mit.edu/xsl/google-mit.xsl
• Save it in web-readable space, naming it
google-mysite.xsl
25
Point to your XSL
<form method='get' action='http://gb-server.mit.edu/search'>
  <input type='text' name='q' size='32' maxlength='255' value=''/>
  <input type='submit' name='btnG' value='Search'/>
  <input type='hidden' name='site' value='mit'/>
  <input type='hidden' name='client' value='mit'/>
  <input type='hidden' name='proxystylesheet' value='http://web.mit.edu/my_dept/google-mydept.xsl'/>
  <input type='hidden' name='output' value='xml_no_dtd'/>
</form>
• Update your search form to point the MIT-Google server to your custom XSLT style sheet.
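If several pages or departments need the same form, the markup can be generated with the stylesheet URL as the only variable. This is a hypothetical helper. It reproduces the fields of the form for a custom stylesheet and HTML-escapes the stylesheet URL before placing it in an attribute.

```python
from html import escape

SEARCH_ACTION = "http://gb-server.mit.edu/search"


def search_form(stylesheet_url: str) -> str:
    """Return search-form HTML that points the server at a custom XSLT stylesheet."""
    hidden = [
        ("site", "mit"),
        ("client", "mit"),
        ("proxystylesheet", stylesheet_url),
        ("output", "xml_no_dtd"),
    ]
    lines = [
        f"<form method='get' action='{SEARCH_ACTION}'>",
        "<input type='text' name='q' size='32' maxlength='255' value=''/>",
        "<input type='submit' name='btnG' value='Search'/>",
    ]
    for name, value in hidden:
        # escape() protects the attribute if the URL contains & or quotes.
        lines.append(f"<input type='hidden' name='{name}' value='{escape(value)}'/>")
    lines.append("</form>")
    return "\n".join(lines)


print(search_form("http://web.mit.edu/my_dept/google-mydept.xsl"))
```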
26
Step-by-step customization
See
http://web.mit.edu/ist/google/stylesheets.html
27
Documentation
• http://web.mit.edu/ist/google/
(Includes the “official” Google documentation, such as their XML specification, plus XSLT tips.)
• Search Engine Submission Tips: http://searchenginewatch.com/webmasters/
• Using SS for an Effective SEO Campaign: http://www.alistapart.com/articles/seo/
28
Support
• The MIT Google team will help you create a Google search form and will answer questions sent to [email protected]
• WCS offers fee-based production services for custom search results
29
Q&A