Open Search Server documentation

PRELIMINARY DRAFT

Emmanuel Keller, Author
Emmanuel Gosse, Author
Sebastien Andrivet, Translator, proofreader

InfoPro Digital, 12-14, rue Mederic, Paris, France
http://www.open-search-server.com
© InfoPro Digital 2009


Open Search Server (OSS) is search engine software released under the GPL v3 open source licence. Built using the best open source technologies available, Open Search Server is a stable, high-performance piece of software. It is both a modern search engine and a suite of high-powered full text search algorithms.


Quick start

Installing the JDK Software

Open Search Server (OSS) requires a Java™ runtime environment (JRE) version 5 or newer.

1. Download the JDK software from either Sun Microsystems or IBM.
   • Sun Microsystems provides its JRE (or JDK) for the Windows™, Linux and Solaris™ operating systems: Sun download page.
   • IBM® provides its JRE for the AIX™ and Linux operating systems: IBM developer kit.
2. Select an appropriate JRE/JDK version and download it.
3. Install the JRE/JDK using the installation instructions.
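
The version requirement above can be checked from a script. The following is a minimal sketch, assuming the usual `java -version` banner of the form `java version "1.6.0_14"`; the function names are illustrative and are not part of OSS.

```python
import re


def parse_java_major(version_text):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Releases before Java 9 report "1.x" (for example "1.6.0_14" is Java 6).
    if first == 1 and second is not None:
        return int(second)
    return first


def meets_requirement(version_text, minimum=5):
    """Return True if the reported version is at least `minimum`."""
    major = parse_java_major(version_text)
    return major is not None and major >= minimum
```

To use it, capture the output of `java -version` (the JVM writes this banner to standard error) and pass the text to `meets_requirement`.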

Setting up the environment variables on a Windows™ System

On Windows, the only thing to do is to add an environment variable named JAVA_HOME.

1. Right-click My Computer.
2. Select Properties.
3. Select the Advanced tab.
4. Select Environment Variables.
5. Edit or create an entry named JAVA_HOME.
6. JAVA_HOME must point to the JDK installation folder, for example: C:\Program Files\Java\jdk1.6.0_14

Setting up environment variables on a UNIX System

You have to define the JAVA_HOME environment variable and add the JDK bin directory to PATH.

1. Set JAVA_HOME. Replace [jdk-path] with the location of your JDK, for example: /usr/jdk/jdk1.6.0_14
   • Korn or bash shells: export JAVA_HOME=[jdk-path]
   • Bourne shell:
       JAVA_HOME=[jdk-path]
       export JAVA_HOME
   • C shell: setenv JAVA_HOME [jdk-path]
2. Set PATH.
   • Korn or bash shells: export PATH=$JAVA_HOME/bin:$PATH
   • Bourne shell:
       PATH=$JAVA_HOME/bin:$PATH
       export PATH
   • C shell: setenv PATH $JAVA_HOME/bin:$PATH
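
Whichever shell or operating system is used, the result can be verified in the same way. The sketch below is an illustration only (the helper name `check_java_home` is hypothetical): it reports whether JAVA_HOME is set, whether it contains a bin directory, and whether that directory appears in PATH.

```python
import os


def check_java_home(environ):
    """Return a list of problems found with JAVA_HOME and PATH (empty if none)."""
    java_home = environ.get("JAVA_HOME")
    if not java_home:
        return ["JAVA_HOME is not set"]
    problems = []
    bin_dir = os.path.join(java_home, "bin")
    if not os.path.isdir(bin_dir):
        problems.append("no bin directory under JAVA_HOME: " + bin_dir)
    # PATH entries are separated by ':' on Unix and ';' on Windows.
    path_entries = environ.get("PATH", "").split(os.pathsep)
    if bin_dir not in path_entries:
        problems.append("PATH does not contain " + bin_dir)
    return problems
```

Calling `check_java_home(os.environ)` in a new shell or command prompt shows whether the settings were picked up; an empty list means that nothing was found to be wrong.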

Downloading Open Search Server

Download the appropriate package file for your environment.

1. Go to the download page: http://sourceforge.net/project/showfiles.php?group_id=260863
2. Select the release version you need. Usually you will be offered the following options:
   • Beta: the beta version, the latest stage of the development cycle.
   • Stable / Release: stable releases, intended for production use.
   • Unstable / Alpha: usually the latest trunk version.
3. Choose the appropriate file / archive:

   Option | Description
   documentation.pdf | The documentation you are reading now, in PDF format.
   open-search-server-XXX.zip | Open Search Server archive in ZIP format.
   open-search-server-XXX.tar.gz | Open Search Server archive in tar.gz format.
   open-search-server-XXX.war | You can use the war file if you want to deploy it manually on an application server.

4. The download process should start immediately after you click on the name of the file.

Extracting the open-search-server folder from the archive

Use your favorite tool to uncompress the archive and extract the open-search-server folder.

• Windows / Mac: double-clicking on the archive will usually decompress it and extract the folder.
• ZIP archive on a Unix system: you can use the unzip command line utility, for example: unzip open-search-server-XXX.zip
• TAR.GZ archive on a Unix system: you can use the tar command line utility, for example: tar -zxvf open-search-server-XXX.tar.gz
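
The same extraction can be scripted in a portable way. This is a minimal sketch using only the Python standard library; `extract_archive` is an illustrative helper, not an OSS tool, and it should only be used on archives downloaded from a trusted source.

```python
import tarfile
import zipfile


def extract_archive(archive_path, destination):
    """Extract a .zip or .tar.gz archive into the `destination` directory."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(destination)
    elif tarfile.is_tarfile(archive_path):
        # "r:*" lets tarfile detect the compression (gzip for .tar.gz).
        with tarfile.open(archive_path, "r:*") as archive:
            archive.extractall(destination)
    else:
        raise ValueError("unsupported archive format: " + archive_path)
```

For example, `extract_archive("open-search-server-XXX.zip", ".")` plays the role of the unzip command shown above, with XXX standing for the release you downloaded.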

Launching Open Search Server

Start the server by executing the start batch file.

• On Windows, run the file start.bat as a command.
• On Unix/Linux/Mac OS, open a shell and execute start.sh.

The server is now running and listens on TCP port 8080.
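
To confirm that the server is accepting connections, you can probe the port. The sketch below assumes the default port 8080 mentioned above; `is_port_open` is an illustrative helper that only tests TCP connectivity, not whether OSS itself answers correctly.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

`is_port_open("localhost", 8080)` can be called in a loop after launching the start script, since the server may need a few seconds before it starts listening.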

Displaying the web interface

Open a compatible web browser (Internet Explorer, Firefox/Mozilla, Safari), then enter a URL matching your server.

1. Open your favorite web browser.
2. Enter a URL matching your server:
   • If the server runs on your desktop machine, you can use: http://localhost:8080
   • If OSS runs on a remote server, build the appropriate URL like this: http://[server-hostname]:8080

Setting up the index directory

You must provide a path to the directory where you want to store the index data. We recommend that you start with the web_crawler folder provided in the examples folder.

Enter the absolute path of the index directory.

• On Unix/Linux/Mac systems, enter the absolute path, for example: /home/me/open-search-server/examples/web_crawler

• On Windows systems, enter a Windows UNC pathname, for example: \\ComputerName\SomeFolder\open-search-server\examples\web_crawler
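
Because the path must be absolute, it can be worth checking it before typing it into the interface. This is a small sketch based on Python's pathlib; the function name is hypothetical and the check is purely syntactic (it does not verify that the directory exists).

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_absolute_index_path(path, windows=False):
    """Return True if `path` is absolute under the rules of the target platform."""
    if windows:
        # Accepts UNC names (\\host\share\...) as well as drive paths (C:\...).
        return PureWindowsPath(path).is_absolute()
    return PurePosixPath(path).is_absolute()
```

Both example paths above pass this check, whereas a relative path such as examples/web_crawler does not.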

Entering the URL of the web site to be crawled

The pattern list lets you decide which URLs will be crawled. Only URLs that match these patterns will be indexed.

1. Select the Crawler panel.
2. Then, select the Web sub-panel.
3. Finally, select the Pattern list sub-panel.
4. Enter, for example, http://www.open-search-server.com*
5. Click on the Add button.
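
To get a feel for what such a pattern selects, the sketch below treats `*` as "any run of characters". That wildcard semantics is an assumption made for illustration; the exact matching rules applied by the OSS crawler may differ, and the function names are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Compile a URL pattern in which '*' stands for any run of characters."""
    # Escape every literal part so that '.', '?' and similar characters match themselves.
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts))


def url_matches(url, patterns):
    """Return True if `url` matches at least one pattern of the list."""
    return any(pattern_to_regex(p).fullmatch(url) for p in patterns)
```

Under this assumption, the pattern from step 4 accepts every URL that begins with http://www.open-search-server.com and rejects URLs from other hosts.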

Starting the crawl process

The crawl process will download and index the URL(s) you inserted in the pattern list.

1. Select the Crawl process sub-panel.
2. Click on the Not running - Click to start button.
3. Later, you can click on the same button to stop the crawl.


Querying the index

You can use the web interface to query the data in your index.

1. Select the Query panel.
2. Load the predefined search query template.
3. Enter a word in the field named Enter the query, for example: open
4. Click on the Search button.

Testing the XML API

Try the same request using the XML API to get an XML result.

1. Open a new window in your web browser.
2. Enter the following URL: http://localhost:8080/select?qt=search&q=open
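
The same request can also be sent from a script rather than a browser. The following is a minimal sketch using only the Python standard library; the helper names are illustrative, and the base URL is the local one used in this quick start.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(base_url, params):
    """Append the /select path and the encoded query parameters to `base_url`."""
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def fetch_xml(url, timeout=10):
    """Send a GET request and return the response body decoded as UTF-8."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

`build_select_url("http://localhost:8080", {"qt": "search", "q": "open"})` produces the URL of step 2, which can then be passed to `fetch_xml` while the server is running.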


API Search / Select

API Search/Select is the interface used to query the OSS search engine. The call is sent through an HTTP request; both POST and GET are available. The engine answers with an XML result.

Url call

The basic relative URL is: /select

Example

http://localhost:8080/OpenSearchServer/select?q=test&qt=search

Parameters

Note: Parameters have to be encoded in UTF-8.

Name | Description | Type | Default value | Needed?
q | Searches for keywords. Ex.: q=try | Text | - | yes (or query)
query | Same as parameter q. Ex.: query=try | Text | - | yes (or q)
qt | Loads a query predefined in the index configuration file config.xml. Ex.: qt=requestName | Text | - | no
start | Indicates the rank of the first result shown. This parameter allows for pagination. Ex.: start=10 | Number | 0 | no
rows | Indicates the number of records to be returned. Combined with the 'start' parameter, it allows for pagination. Ex.: rows=5 | Number | 10 | no
lang | Indicates the language of the keywords passed to parameter q. The engine will use the matching analyzer. Ex.: lang=fr | Text | - | no
collapse.mode | Chooses the collapsing method. Ex.: collapse.mode=optimized | off, optimized or full | - | no
collapse.field | Activates collapsing on the field passed as a parameter. Ex.: collapse.field=hostname | Field name | - | no
collapse.max | Indicates the number of documents to send before collapsing is activated. Ex.: collapse.max=2 | Number | 2 | no
delete | If this parameter is passed, the documents returned by the query are removed. Ex.: &delete | - | - | no
noCache | Disables the cache (for the current call only). Ex.: &noCache | - | - | no
debug | Enables the debug information in the result. Ex.: &debug | - | - | no
fq | Adds a filter to the current call. The parameter can be used several times in the same call for successive filters. Ex.: fq=date:20101201&fq=color:red | Text | - | no
rf | Adds one or more fields to send. Ex.: &rf=date&rf=color | Text (field name) | - | no
fl | Same as parameter rf. | Text | - | no
sort | Controls the order of the results. Prefix the field with + or - to sort in ascending or descending order. Ex.: &sort=-date&sort=color | Text | - | no
facet | Enables faceting for the field passed as a parameter. You can add a number in parentheses to specify the minimum count. Ex.: &facet=color or &facet=color(2) | Text(Number) | - | no
facet.multi | Same as parameter facet, for use with fields containing multiple values (multi-valued fields). Ex.: &facet.multi=color or &facet.multi=color(2) | Text(Number) | - | no
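
Several of these parameters are usually combined in one call. The sketch below is an illustration built on the parameter names listed above; the helper functions are hypothetical, and the page arithmetic assumes that `start` is a zero-based rank, which the default value 0 suggests. Passing the parameters as a list of pairs lets `fq`, `rf` and `sort` be repeated, and `urlencode` percent-encodes the values in UTF-8 as the note above requires.

```python
from urllib.parse import urlencode


def page_params(page, rows=10):
    """Return the start/rows pairs for a zero-based page number."""
    return [("start", page * rows), ("rows", rows)]


def build_query(keywords, page=0, rows=10, filters=(), returned_fields=(), sorts=()):
    """Build the query string of a /select call, repeating fq, rf and sort as needed."""
    pairs = [("q", keywords)]
    pairs += page_params(page, rows)
    pairs += [("fq", f) for f in filters]
    pairs += [("rf", f) for f in returned_fields]
    pairs += [("sort", s) for s in sorts]
    # urlencode encodes each value in UTF-8 and escapes reserved characters.
    return urlencode(pairs)
```

One practical consequence of the encoding step: an ascending sort such as +date is sent as sort=%2Bdate, because a raw + in a query string is normally decoded as a space.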

XML result

Note: The answer is in XML format encoded in UTF-8.
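
A client can load the answer with any standard XML parser. The sketch below deliberately assumes nothing about the element names of the OSS result, which are not reproduced in this transcript: it only parses the bytes and reports the root tag and the number of direct children. The function names are illustrative.

```python
import xml.etree.ElementTree as ET


def parse_response(raw_bytes):
    """Parse an XML response body and return its root element."""
    # ElementTree honours the encoding named in the XML declaration and
    # assumes UTF-8 when the declaration does not name one.
    return ET.fromstring(raw_bytes)


def summarize(root):
    """Return (root tag, number of direct children) without assuming a schema."""
    return root.tag, len(root)
```

Passing the undecoded response bytes to `parse_response` is usually preferable to decoding them first, because the parser then applies the encoding declared in the document itself.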


War deployment guide

This first version of the installation guide shows that it takes only a few minutes to get an OSS server running and ready to be used.

1. Install Apache Tomcat or another Java application server. This installation guide assumes that it is already installed; please refer to the standard installation procedure on the corresponding website: http://tomcat.apache.org/index.html. Use version 5 or newer.

2. Deploy the OSS war file: put oss.war in the 'tomcat/webapps' directory of Tomcat. You can rename it as you want (but keep the 'war' extension!). Ex.: oss.war

3. Configure the war in Tomcat: in the 'tomcat/conf/Catalina/localhost/' directory, create an XML file with the same name as the war file of step 2 (but with the 'xml' extension!). Example: oss.xml

   <Context docBase="oss.war" debug="0" crossContext="true">
     <Environment name="JaeksoftSearchServer/configfile"
                  type="java.lang.String"
                  value="/mnt/all_oss/oss1/config.xml"
                  override="true" />
   </Context>

4. Configure the physical index: in any folder you like (there are no special requirements), for instance '/mnt/all_oss/', create the directory that will hold your physical index, for instance oss1 (to match the previous steps).
   a) Put the file config.xml in it (do not change its name!). Note that oss.xml refers to it.
   b) Create a single folder named 'index' in oss1. At server start, empty index files will automatically be added inside it.

Example of a basic config.xml:

<configuration>
  <indices>
    <index name="index" searchCache="100" filterCache="100" fieldCache="500" />
  </indices>
  <schema>
    <analyzers>
      <analyzer name="StandardAnalyzer" tokenizer="LetterOrDigitTokenizerFactory">
        <filter class="LowerCaseFilter" />
        <filter class="ISOLatin1AccentFilter" />
      </analyzer>
      <analyzer name="TextAnalyzer" tokenizer="LetterOrDigitTokenizerFactory">
        <filter class="LowerCaseFilter" />
      </analyzer>
      <analyzer name="TextAnalyzer" lang="en" tokenizer="LetterOrDigitTokenizerFactory">
        <filter class="LowerCaseFilter" />
        <filter class="SnowballEnglishFilter" />
      </analyzer>
      <analyzer name="TextAnalyzer" lang="fr" tokenizer="LetterOrDigitTokenizerFactory">
        <filter class="LowerCaseFilter" />
        <filter class="ISOLatin1AccentFilter" />
        <filter class="FrenchStemFilter" />
      </analyzer>
      <analyzer name="TextAnalyzer" lang="de" tokenizer="LetterOrDigitTokenizerFactory">
        <filter class="LowerCaseFilter" />
        <filter class="ISOLatin1AccentFilter" />
        <filter class="SnowballGermanFilter" />
      </analyzer>
      <analyzer name="TextAnalyzer" lang="nl" tokenizer="LetterOrDigitTokenizerFactory">
        <filter class="LowerCaseFilter" />
        <filter class="ISOLatin1AccentFilter" />
        <filter class="DutchStemFilter" />
      </analyzer>
      <analyzer name="TextAnalyzer" lang="es" tokenizer="LetterOrDigitTokenizerFactory">
        <filter class="LowerCaseFilter" />
        <filter class="ISOLatin1AccentFilter" />
        <filter class="SnowballSpanishFilter" />
      </analyzer>
      <analyzer name="TextAnalyzer" lang="it" tokenizer="LetterOrDigitTokenizerFactory">
        <filter class="LowerCaseFilter" />
        <filter class="ISOLatin1AccentFilter" />
        <filter class="SnowballItalianFilter" />
      </analyzer>
      <analyzer name="TextAnalyzer" lang="pt" tokenizer="LetterOrDigitTokenizerFactory">
        <filter class="LowerCaseFilter" />
        <filter class="ISOLatin1AccentFilter" />
        <filter class="SnowballPortugueseFilter" />
      </analyzer>
      <analyzer name="TextAnalyzer" lang="no" tokenizer="LetterOrDigitTokenizerFactory">
        <filter class="LowerCaseFilter" />
        <filter class="ISOLatin1AccentFilter" />
        <filter class="SnowballNorwegianFilter" />
      </analyzer>
      <analyzer name="TextAnalyzer" lang="se" tokenizer="LetterOrDigitTokenizerFactory">
        <filter class="LowerCaseFilter" />
        <filter class="ISOLatin1AccentFilter" />
        <filter class="SnowballSwedishFilter" />
      </analyzer>
      <analyzer name="TextAnalyzer" lang="fi" tokenizer="LetterOrDigitTokenizerFactory">
        <filter class="LowerCaseFilter" />
        <filter class="ISOLatin1AccentFilter" />
        <filter class="SnowballFinnishFilter" />
      </analyzer>
    </analyzers>
    <fields default="content" unique="url">
      <field name="lang" indexed="yes" stored="yes" />
      <field name="title" analyzer="TextAnalyzer" indexed="yes" stored="compress" termVector="positions_offsets" />
      <field name="titleExact" analyzer="StandardAnalyzer" indexed="yes" stored="compress" termVector="positions_offsets" />
      <field name="content" analyzer="TextAnalyzer" indexed="yes" stored="compress" termVector="positions_offsets" />
      <field name="contentExact" analyzer="StandardAnalyzer" indexed="yes" stored="compress" termVector="positions_offsets" />
      <field name="contentBaseType" indexed="yes" stored="yes" />
      <field name="url" indexed="yes" stored="yes" />
      <field name="urlSplit" indexed="yes" stored="no" analyzer="TextAnalyzer" termVector="positions_offsets" />
      <field name="urlExact" indexed="yes" stored="no" analyzer="StandardAnalyzer" termVector="positions_offsets" />
      <field name="metaDescription" indexed="no" stored="compress" />
      <field name="metaKeywords" indexed="no" stored="compress" />
      <field name="host" indexed="yes" stored="yes" />
    </fields>
  </schema>
  <parsers>
    <parser class="com.jaeksoft.searchlib.parser.HtmlParser" sizeLimit="8388608">
      <contentType>text/html</contentType>
    </parser>
    <parser class="com.jaeksoft.searchlib.parser.PdfParser" sizeLimit="8388608">
      <contentType>application/pdf</contentType>
    </parser>
    <parser class="com.jaeksoft.searchlib.parser.DocParser" sizeLimit="8388608">
      <contentType>application/msword</contentType>
    </parser>
    <parser class="com.jaeksoft.searchlib.parser.PptParser" sizeLimit="8388608">
      <contentType>application/vnd.ms-powerpoint</contentType>
    </parser>
  </parsers>
</configuration>
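
Before starting the server, it can help to confirm that an edited config.xml is still well-formed and internally consistent. The sketch below is an illustration only: it relies on the element and attribute names visible in the example above (analyzer, field, fields/unique), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_config(xml_text):
    """Return the analyzers and fields declared in a config.xml document."""
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is not well-formed
    analyzers = [(a.get("name"), a.get("lang")) for a in root.iter("analyzer")]
    fields = [f.get("name") for f in root.iter("field")]
    fields_node = root.find("schema/fields")
    unique = fields_node.get("unique") if fields_node is not None else None
    return {"analyzers": analyzers, "fields": fields, "unique": unique}


def undeclared_analyzers(xml_text):
    """Return the names of analyzers referenced by a field but never declared."""
    root = ET.fromstring(xml_text)
    declared = {a.get("name") for a in root.iter("analyzer")}
    used = {f.get("analyzer") for f in root.iter("field") if f.get("analyzer")}
    return sorted(used - declared)
```

Reading the file with `open(path, encoding="utf-8").read()` and passing the text to `undeclared_analyzers` returns an empty list when every analyzer referenced by a field is declared.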

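
Steps 3 and 4 of the deployment guide can be scripted. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the paths are the ones used as examples in this guide, and the attribute is written docBase, the spelling used in Tomcat's Context documentation.

```python
import os
from xml.sax.saxutils import quoteattr


def context_xml(war_name, config_path):
    """Return the text of a Tomcat context file pointing OSS at `config_path`."""
    # quoteattr adds the surrounding quotes and escapes special characters.
    return (
        '<Context docBase=%s debug="0" crossContext="true">\n'
        '  <Environment name="JaeksoftSearchServer/configfile"\n'
        '               type="java.lang.String"\n'
        '               value=%s override="true" />\n'
        '</Context>\n'
    ) % (quoteattr(war_name), quoteattr(config_path))


def prepare_index_home(index_home):
    """Create `index_home` with its 'index' sub-folder; return the config.xml path."""
    os.makedirs(os.path.join(index_home, "index"), exist_ok=True)
    return os.path.join(index_home, "config.xml")
```

A deployment script could call `prepare_index_home("/mnt/all_oss/oss1")`, copy config.xml to the returned path, and write the result of `context_xml("oss.war", ...)` to tomcat/conf/Catalina/localhost/oss.xml.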