Apache Solr + ajax solr

Post on 18-Jun-2015

Apache Solr and ajax-solr overview

Solr (pronounced "solar") is an open source enterprise search platform from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. [...] Solr is the most popular enterprise search engine. Solr 4 adds NoSQL features.

What is Solr (1)

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Apache Tomcat or Jetty. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it usable from most popular programming languages.

(source: wikipedia)

What is Solr (2)

Installing Solr (speedrun)

# tar zxvf solr-4.10.0.tgz

# mv solr-4.10.0 /opt/

Solr comes preconfigured. To fine-tune the Solr fields, modify the file:

/opt/solr-4.10.0/example/solr/collection1/conf/schema.xml

Configuring Solr

Running Solr - Jetty

Solr includes a preconfigured Jetty installation. To run it:

# cd /opt/solr-4.10.0/example/
# java -jar start.jar

Create a Tomcat context file:

/var/lib/tomcat6/conf/Catalina/localhost/mysolr.xml

(better yet, create it somewhere else and link it there)

Running Solr - Tomcat (1)

Context file content:

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/opt/solr-4.10.0/example/webapps/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/opt/solr-4.10.0/example/solr" override="true"/>
</Context>

Running Solr - Tomcat (2)

Tomcat: find the app “mysolr” in the manager webapp

Jetty: http://localhost:8983/solr/

Solr Admin interface

Untar solr-4.10.0.tgz, then copy the extracted directory once per instance, giving each copy its own name.

# tar zxvf solr-4.10.0.tgz
# cp -a solr-4.10.0 /opt/mysolr1
# cp -a solr-4.10.0 /opt/mysolr2

Multiple Solr instances (1)

Create multiple context files with different names. Each of them must point to a different Solr installation.

/var/lib/tomcat6/conf/Catalina/localhost/solrApp1.xml
/var/lib/tomcat6/conf/Catalina/localhost/solrApp2.xml

Multiple Solr instances (2)

Change the content of each context file to point to the right Solr installation. File solrApp1.xml:

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/opt/mysolr1/example/webapps/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/opt/mysolr1/example/solr" override="true"/>
</Context>

Multiple Solr instances (3)

Change the content of each context file to point to the right Solr installation. File solrApp2.xml:

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/opt/mysolr2/example/webapps/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/opt/mysolr2/example/solr" override="true"/>
</Context>

Multiple Solr instances (4)

The Solr schema must have a unique key field, identified in the schema.xml file by something like:

<uniqueKey>id</uniqueKey>

where id is the field defined by:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

Solr Schema (1)

For most of the other fields of the indexed resources, you will use the Solr dynamic fields defined in the schema, like the following ones:

<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_is" type="int" indexed="true" stored="true" multiValued="true"/>
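To illustrate how a field name is mapped to one of these patterns, here is a toy matcher. It is only a sketch of the idea, not Solr's implementation: the `*_s` entry and the sample field names are assumptions added for illustration, and only patterns of the form `*suffix` are handled.

```javascript
// Toy model of dynamic field resolution; NOT Solr's implementation.
// The "*_s" entry is an assumed pattern, added only for illustration.
const dynamicFields = [
  { pattern: "*_i", type: "int", multiValued: false },
  { pattern: "*_is", type: "int", multiValued: true },
  { pattern: "*_s", type: "string", multiValued: false },
];

// Return the first dynamic field whose "*suffix" pattern matches the
// field name, or null when no pattern applies.
function resolveDynamicField(fieldName) {
  const match = dynamicFields.find((field) => {
    const suffix = field.pattern.slice(1); // drop the leading "*"
    return fieldName.length > suffix.length && fieldName.endsWith(suffix);
  });
  return match || null;
}

console.log(resolveDynamicField("page_count_i")); // matched by "*_i"
console.log(resolveDynamicField("title"));        // no dynamic field applies
```

With a naming convention like this, a document can carry new fields such as `page_count_i` without any change to schema.xml.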

Solr Schema (2)

Solr field types are defined in the schema too:

<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>

Solr Schema (3)

Text types can do more, such as defining how the text is tokenized:

<!-- A text field that only splits on whitespace for exact matching of words -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

Solr Schema (4)

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter [...]/>
    <!-- +other filters -->
  </analyzer>
  <analyzer type="query">
    <tokenizer [...]/>
    <filter [...]/>
    <!-- +other filters -->
  </analyzer>
</fieldType>

Solr Schema (5)

Analyzers are components that pre-process input text at index time and/or at search time. It's important to use the same or similar analyzers that process text in a compatible manner at index and query time. For example, if an indexing analyzer lowercases words, then the query analyzer should do the same to enable finding the indexed words.

(from https://wiki.apache.org/solr/)
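The lowercase example can be reproduced with a toy analyzer. This is a sketch only: `analyze` and `matches` are hypothetical helpers that imitate a whitespace tokenizer followed by an optional lowercase filter, and they do not use any Solr code.

```javascript
// Toy analyzer: a whitespace tokenizer optionally followed by a lowercase filter.
function analyze(text, { lowercase }) {
  const tokens = text.split(/\s+/).filter((token) => token.length > 0);
  return lowercase ? tokens.map((token) => token.toLowerCase()) : tokens;
}

// Index one text, then check whether every analyzed query term was indexed.
function matches(indexedText, queryText, indexOptions, queryOptions) {
  const indexedTerms = new Set(analyze(indexedText, indexOptions));
  return analyze(queryText, queryOptions).every((term) => indexedTerms.has(term));
}

const text = "Apache Solr Overview";

// Both sides lowercase: the query term "solr" is found among the indexed terms.
const compatible = matches(text, "SOLR", { lowercase: true }, { lowercase: true });

// Only the index side lowercases: the query term "SOLR" was never indexed.
const mismatched = matches(text, "SOLR", { lowercase: true }, { lowercase: false });

console.log(compatible, mismatched); // prints: true false
```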

Solr Schema (6)

Tokenizer examples:

● <tokenizer class="solr.WhitespaceTokenizerFactory"/>

● <tokenizer class="solr.StandardTokenizerFactory"/>

● <tokenizer class="solr.LetterTokenizerFactory"/>

● <tokenizer class="solr.LowerCaseTokenizerFactory"/>

Solr Schema (7)

Filter examples:

● <filter class="solr.LowerCaseFilterFactory"/>

● <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>

● <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />

● <filter class="solr.GermanNormalizationFilterFactory"/>

● <filter class="solr.GermanLightStemFilterFactory"/>

Solr Schema (8)

Although the schema is already set up, you should really go through it and get rid of all the unnecessary stuff!

Get rid of all the fields and types you don't need: the schema will be easier to maintain, and everything will look much clearer!

Solr Schema (9)

Documents can be indexed into Solr in many ways, depending on your application technology.

● In PHP: you can use Solarium, a Solr client library

● In Java: you can use SolrJ

● Using the Data Import Request Handler

Indexing Solr (1)

Synchronous indexing:

In PHP or Java, you listen for database insertions, deletions and updates, and call the related Solr APIs (possibly through the client libraries).
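As a rough sketch of that mapping, the function below turns a database change event into a Solr JSON update command (`add` or `delete`). The event shape (`type`, `row`) is hypothetical, and the sketch is written in JavaScript only to stay consistent with the AJAX Solr examples later on; in PHP or Java a client library would normally build these commands for you.

```javascript
// Map a database change event to a Solr JSON update command.
// The event shape ({ type, row }) is a hypothetical convention for this sketch.
function toSolrCommand(event) {
  switch (event.type) {
    case "insert":
    case "update":
      // Adding a document whose unique key already exists replaces the old one,
      // so inserts and updates can share the same command.
      return { add: { doc: event.row } };
    case "delete":
      return { delete: { id: String(event.row.id) } };
    default:
      throw new Error("Unknown event type: " + event.type);
  }
}

const command = toSolrCommand({
  type: "update",
  row: { id: "42", user_s: "alice", my_field_i: 8 },
});
console.log(JSON.stringify(command));
```

The resulting JSON would then be sent to Solr's update handler with a JSON content type, followed by a commit so the change becomes visible to searches.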

Indexing Solr (2)

Synchronous indexing, pitfalls:

What if Solr is down? - find a way to be sure the sync is done!
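One possible answer is an outbox: keep every update in a queue until Solr has accepted it, and retry later when delivery fails. The class below is a minimal sketch under strong assumptions: `SolrOutbox` and its `send` callback are hypothetical names, the queue lives in memory only (a real one would be stored durably, for example in the database), and delivery is synchronous to keep the example short.

```javascript
// Minimal in-memory outbox: commands stay queued until delivery succeeds.
class SolrOutbox {
  constructor(send) {
    this.send = send;  // function that delivers one command; throws on failure
    this.pending = [];
  }

  enqueue(command) {
    this.pending.push(command);
  }

  // Deliver queued commands in order. Stop at the first failure so that
  // ordering is preserved, and return how many commands are still waiting.
  flush() {
    while (this.pending.length > 0) {
      try {
        this.send(this.pending[0]);
      } catch (err) {
        break;
      }
      this.pending.shift();
    }
    return this.pending.length;
  }
}

// Usage with a stubbed sender that simulates Solr being down, then back up.
let solrIsUp = false;
const delivered = [];
const outbox = new SolrOutbox((command) => {
  if (!solrIsUp) throw new Error("Solr unreachable");
  delivered.push(command);
});

outbox.enqueue({ delete: { id: "42" } });
const waitingWhileDown = outbox.flush();     // command stays queued
solrIsUp = true;
const waitingAfterRecovery = outbox.flush(); // queue drains
console.log(waitingWhileDown, waitingAfterRecovery);
```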

Indexing Solr (3)

AJAX Solr loosely follows the Model-view-controller pattern. The ParameterStore is the model, storing the Solr parameters and, thus, the state of the application. The Manager is the controller; it talks to the ParameterStore, sends requests to Solr, and delegates the response to the widgets for rendering. The widgets are the views, each rendering a part of the interface.
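The division of roles can be sketched with three toy classes. These are not the AJAX Solr classes and do not reproduce its API; `ToyParameterStore`, `ToyManager`, `ToyResultWidget` and the stubbed search function are hypothetical names used only to show who talks to whom.

```javascript
// Toy model of the three roles; these classes are NOT the AJAX Solr API.

// Model: stores the Solr parameters, i.e. the state of the application.
class ToyParameterStore {
  constructor() {
    this.params = {};
  }
  set(name, value) {
    this.params[name] = value;
  }
  toQueryString() {
    return Object.keys(this.params)
      .map((name) => encodeURIComponent(name) + "=" + encodeURIComponent(this.params[name]))
      .join("&");
  }
}

// Controller: reads the store, runs the request, hands the response to the widgets.
class ToyManager {
  constructor(store, search) {
    this.store = store;
    this.search = search; // hypothetical function: query string -> response object
    this.widgets = [];
  }
  addWidget(widget) {
    this.widgets.push(widget);
  }
  doRequest() {
    const response = this.search(this.store.toQueryString());
    this.widgets.forEach((widget) => widget.render(response));
  }
}

// View: renders one part of the interface (here, just a string).
class ToyResultWidget {
  constructor() {
    this.output = "";
  }
  render(response) {
    this.output = response.docs.map((doc) => doc.id).join(", ");
  }
}

// Usage with a stubbed search function instead of a live Solr server.
const store = new ToyParameterStore();
store.set("q", "*:*");
const manager = new ToyManager(store, () => ({ docs: [{ id: "a" }, { id: "b" }] }));
const resultWidget = new ToyResultWidget();
manager.addWidget(resultWidget);
manager.doRequest();
console.log(resultWidget.output);
```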

Ajax-Solr (1)

AJAX Solr is a JavaScript application. It offers an autocomplete feature that searches in multiple fields (reported in the result list), a tag cloud based on faceted search, and result display.

Ajax-Solr (2)

Download the zip from https://github.com/evolvingweb/ajax-solr/

Unzip it:

# unzip ajax-solr-master.zip

Ajax-Solr - Deployment (1)

Use /examples/reuters-requirejs/ as a starting point for your application.

In /examples/reuters-requirejs/js/reuters.js, set the Solr address:

solrUrl: 'http://localhost:8983/solr/',

Ajax-Solr - Deployment (2)

In the same reuters.js, set the fields to be used. Just before:

Manager.addWidget(new AjaxSolr.TagcloudWidget

set the fields variable to the fields you want to use in the tag clouds, e.g.

var fields = ['notebook_label_s', 'user_s'];

Ajax-Solr - Deployment (3)

Now we set the parameters for the facets (used in the tag clouds):

var params = {
  facet: true,
  'facet.field': [ 'notebook_label_s', 'user_s' ],
  'facet.limit': 20,
  'facet.mincount': 1,
  'f.notebook_label_s.facet.limit': 50,
  'f.user_s.facet.limit': 50,

Ajax-Solr - Deployment (4)

We may add other parameters to the facet query here

'fq': 'basket_id_s:' +basket_id

or

'fq': 'my_field_i:8'
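To see what such a parameter object amounts to on the wire, the sketch below serializes it into a Solr select URL, repeating array-valued parameters such as `facet.field`. The helper `buildSelectUrl`, the `select` path and the `q` and `wt` values are assumptions made for illustration; AJAX Solr builds its own request internally.

```javascript
// Build a Solr select URL from a params object.
// Array values are sent as repeated parameters (e.g. two facet.field entries).
function buildSelectUrl(solrUrl, params) {
  const query = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    const values = Array.isArray(value) ? value : [value];
    values.forEach((item) => query.append(name, String(item)));
  }
  return solrUrl + "select?" + query.toString();
}

const url = buildSelectUrl("http://localhost:8983/solr/", {
  q: "*:*",      // assumed: match all documents
  wt: "json",    // assumed: ask for a JSON response
  facet: true,
  "facet.field": ["notebook_label_s", "user_s"],
  "facet.limit": 20,
  "facet.mincount": 1,
  fq: "my_field_i:8",
});
console.log(url);
```

The filter query (`fq`) narrows the result set without affecting the main query, which is why it suits constraints such as a basket or a fixed field value.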

Ajax-Solr - Deployment (5)

In the call

Manager.addWidget(new AjaxSolr.AutocompleteWidget({

set the fields to be used for the autocomplete:

var fields = ['notebook_label_s', 'predicate_label_s'];

Ajax-Solr - Deployment (6)

In the index.html example page everything is already set up. You just need to customize the tag clouds by adding or modifying the tagcloud elements, like:

<h2>Notebooks</h2>
<div class="tagcloud" id="notebook_label_s"> </div>
<h2>Users</h2>
<div class="tagcloud" id="user_s"> </div>

Ajax-Solr - Deployment (7)

One last thing you may want to do is customize the results output. In the /examples/reuters-requirejs/widgets/ResultWidget.js file, modify the content of:

template: function (doc) {[...]}
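A customized template could look roughly like the sketch below, which returns an HTML string for one Solr document. The markup, the `escapeHtml` helper and the choice of fields (`id`, `notebook_label_s`, `user_s`) are assumptions for illustration; the function is shown standalone here, while in ResultWidget.js it would be the body of the `template` property.

```javascript
// Escape text before placing it into HTML, so field values cannot inject markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical result template: one block of HTML per Solr document.
// The field names (id, notebook_label_s, user_s) are assumptions.
function template(doc) {
  return (
    '<div class="result">' +
    "<h2>" + escapeHtml(doc.notebook_label_s) + "</h2>" +
    '<p class="user">' + escapeHtml(doc.user_s) + "</p>" +
    '<p class="id">' + escapeHtml(doc.id) + "</p>" +
    "</div>"
  );
}

const html = template({ id: "42", notebook_label_s: "Notes & drafts", user_s: "<alice>" });
console.log(html);
```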

Ajax-Solr - Deployment (8)

Solr: http://lucene.apache.org/solr/

Ajax-Solr: https://github.com/evolvingweb/ajax-solr/

Solarium: http://www.solarium-project.org/

Solr wiki: http://wiki.apache.org/solr/

Resources