Integrating Google Search Appliance with Mura CMS

Post on 16-Jan-2015


An overview of integrating Google Search Appliance with Mura CMS. Presented at MuraCon 2012 by Ajay Sathuluri.


Integrating Google Search Appliance with Mura CMS

Ajay Sathuluri (@sathuluri)

Ajay Sathuluri
Sr. Architect at ICF International
Using ColdFusion since ’98
Server tuning, administration, load testing
I like spending time with my kids and wife.

About Me

Google Search Appliance
  Configuring a Crawl
  Control Access to Content
  Configuring Database Crawl
  Collections / Front Ends
  Crawl Diagnostics

Configuring GSA with Mura CMS
  Plugin (FW/1)
  Search
  Search Results

What are we covering?

Google Search Appliance - Home

Before starting a crawl, you must configure the crawl path so that it includes only the information you want to make available in search results.

Use the Crawl and Index > Crawl URLs page in the Admin Console to enter URLs.

URLs are case-sensitive.

Configure your network to disallow search appliance connectivity outside of your intranet.

Configuring a Crawl

Google Search Appliance – Crawl URL

Demo

Configuring a Crawl

robots.txt
meta tag
no_crawl directories

Control Access to Content

robots.txt

The Google Search Appliance always obeys the rules in robots.txt, and it is not possible to override this behavior.

A robots.txt file is not mandatory. When present, it is located in the web server's root directory.

For the search appliance to be able to access the robots.txt file, the file must be public.

The file includes one or more Disallow: or Allow: rules under a User-agent: line, for example:

  User-agent: gsa-crawler
  Disallow: /personal_records/
  Disallow: /admin/
  Allow: /
  Allow: /personal_records/mypersonal.doc

Control Access to Content (2)
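As a rough way to sanity-check rules like the ones on this slide before a crawl, the sketch below feeds them to Python's standard urllib.robotparser. It is not part of the original presentation. The host intranet.example.com and the two paths are made-up placeholders. Parsers differ in how they resolve overlapping Allow and Disallow rules, so the sketch only checks paths that a single rule decides, and the appliance's own Crawl Diagnostics remain the authority on what it actually crawls.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the slide above.
ROBOTS_TXT = """\
User-agent: gsa-crawler
Disallow: /personal_records/
Disallow: /admin/
Allow: /
Allow: /personal_records/mypersonal.doc
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical intranet host and paths, used only for illustration.
BASE = "http://intranet.example.com"
admin_allowed = parser.can_fetch("gsa-crawler", BASE + "/admin/users.cfm")
home_allowed = parser.can_fetch("gsa-crawler", BASE + "/index.cfm")

print("admin page crawlable:", admin_allowed)
print("home page crawlable:", home_allowed)
```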

meta tag

Use a robots meta tag to prevent the search appliance crawler (as well as other crawlers) from indexing or following links in a specific HTML page.

Embed the robots meta tag in the head of the HTML page.

The search appliance crawler obeys the index, noindex, follow, and nofollow keywords in meta tags.

  <meta name="robots" content="index, nofollow">
  <meta name="robots" content="noindex, nofollow">

Control Access to Content (3)
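To audit which pages carry these directives, something like the following sketch can be used. It is an illustration only, not code from the presentation: it uses Python's standard html.parser to collect the keywords from any robots meta tag in a page, and the names RobotsMetaParser and robots_directives are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the keywords found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Keywords are comma-separated; normalise case and whitespace.
        for part in content.split(","):
            keyword = part.strip().lower()
            if keyword:
                self.directives.add(keyword)


def robots_directives(html: str) -> set:
    """Return the set of robots keywords declared in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
found = robots_directives(page)
print("may index:", "noindex" not in found)
print("may follow links:", "nofollow" not in found)
```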

no_crawl directories

The Google Search Appliance does not crawl any directories named "no_crawl".

You can prevent the search appliance from crawling files and directories by:

  Creating a directory called "no_crawl".
  Putting the files and subdirectories you do not want crawled under the no_crawl directory.

Control Access to Content (4)
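The no_crawl rule can be approximated with a small check like the one below, for example to predict which URLs in a site map would be skipped. This is a sketch under the assumption that the rule applies to any directory segment named exactly no_crawl. The function name in_no_crawl_directory and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def in_no_crawl_directory(url: str) -> bool:
    """Return True if any directory segment of the URL path is named no_crawl.

    Assumption: the last path segment is treated as a file name, so only
    the segments before it count as directories.
    """
    path = urlparse(url).path
    directories = path.split("/")[:-1]
    return "no_crawl" in directories


# Hypothetical URLs, used only for illustration.
skipped = in_no_crawl_directory("http://intranet.example.com/docs/no_crawl/draft.pdf")
crawled = in_no_crawl_directory("http://intranet.example.com/docs/public/guide.pdf")
print(skipped, crawled)
```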

Database data source information enables the search appliance to access content stored in a database.

To configure a database crawl, provide database data source information.

Use the Crawl and Index > Databases page in the Admin Console.

After you create a new database data source, click the Sync link to start a database crawl.

Configuring Database Crawl

Google Search Appliance – Databases

A collection lets you search over a specific part of the index.

For example, you may want to create a products collection or a faq collection that supports searches that are only within the products or faqs part of your index.

The maximum number of collections for a search appliance is 200.

Use the Crawl and Index > Collections page. In the Collection Name text box, type a name for the new collection.

Manage a collection by:
  Editing a collection
  Exporting and importing a collection configuration
  Deleting a collection

Collections

Google Search Appliance – Collections

A front end enables you to change the look and feel of the search and search result pages your users access.

You can customize these pages to display your organization's colors, fonts, and design. If you have multiple collections, you can make each front end appear in a different format, and have its own configuration options.

Use the Serving > Front Ends page. In the Front End Name field, enter a name for the new front end.

Manage a front end by:
  Editing a front end
  Deleting a front end

Front Ends
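Collections and front ends come together in the search request itself. The sketch below builds a request URL, assuming the parameter names as I recall them from the search protocol reference listed under Resources: q for the query, site for the collection, client for the front end, output=xml_no_dtd for raw XML, and start/num for paging. Verify them against your appliance version. The host gsa.example.com, the products collection, and the function name build_search_url are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(host, query, collection="default_collection",
                     frontend="default_frontend", start=0, num=10):
    """Build a search request URL for one collection and one front end."""
    params = {
        "q": query,              # search terms typed by the user
        "site": collection,      # which collection (part of the index) to search
        "client": frontend,      # which front end's configuration to apply
        "output": "xml_no_dtd",  # ask for raw XML results
        "start": start,          # zero-based offset of the first result
        "num": num,              # number of results per page
    }
    return f"http://{host}/search?{urlencode(params)}"


# Hypothetical appliance host and collection name.
url = build_search_url("gsa.example.com", "mura cms", collection="products")
print(url)
```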

Google Search Appliance – Front Ends

Crawl diagnostics provide detailed information about appliance crawl status for a domain, host, directory, or URL.

Crawl Diagnostics

Google Search Appliance - Crawl Diagnostics

Google Search Appliance – Secret Recipe

"The appliance uses a sophisticated algorithm to generate the results bla… bla ..."

Deploy Mura Plugin

Mura – Plugin

Search Code

GSA Plugin - Search

Search results code

GSA Plugin - Results
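The plugin's own results code was shown on the slide and is not in this transcript. As a stand-in, the sketch below shows the general idea in Python rather than the plugin's ColdFusion: parse the XML the appliance returns and pull out each result's URL, title, and snippet. It assumes the element names as I recall them from the XML reference listed under Resources (R for a result with its rank in the N attribute, U for the URL, T for the title, S for the snippet). The sample XML is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample shaped like an appliance XML response (assumed element names).
SAMPLE_XML = """<GSP VER="3.2">
  <Q>mura</Q>
  <RES SN="1" EN="2">
    <M>2</M>
    <R N="1"><U>http://www.example.com/a</U><T>First page</T><S>Snippet one</S></R>
    <R N="2"><U>http://www.example.com/b</U><T>Second page</T><S>Snippet two</S></R>
  </RES>
</GSP>"""


def parse_results(xml_text: str) -> list:
    """Return one dict per <R> element with rank, url, title and snippet."""
    root = ET.fromstring(xml_text)
    results = []
    for r in root.iter("R"):
        results.append({
            "rank": int(r.get("N", "0")),
            "url": r.findtext("U", default=""),
            "title": r.findtext("T", default=""),
            "snippet": r.findtext("S", default=""),
        })
    return results


results = parse_results(SAMPLE_XML)
for item in results:
    print(item["rank"], item["title"], item["url"])
```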

DEMO

GSA Plugin – DEMO


http://docs.getmura.com/
http://www.getmura.com/marketplace/apps/fw1-plugin-template/
https://developers.google.com/search-appliance/documentation/614/
https://developers.google.com/search-appliance/documentation/614/xml_reference
http://www.robotstxt.org/meta.html
http://muracms.com/forum/

Resources

Thanks to Oğuz Demirkapi for helping to prepare the presentation.

Acknowledgements

Q & A

?