Dev8d Apache Solr Tutorial
Small wins in a small time with Apache Solr
= Upayavira =
Who am I?
My (Buddhist) name is Upayavira
Consultant with Sourcesense, specialising in search and operational technologies
A member of the Apache Software Foundation
Who are Sourcesense?
Open Source integrator, specialising in:
Search
Business Intelligence
Content Management
Application Lifecycle Management
Offices in London, Amsterdam, Milan and Rome
Committers and Contributors
Search:
Lucene/Solr – contributor
Hibernate Search – committer
Lucene Infinispan integration – lead developer
Apache UIMA – committer
CMS:
Apache Chemistry – contributor
Apache Jackrabbit – contributor
JBoss GateIn Portal – committer
OpenSSO-Alfresco - contributor
What is Lucene?
Lucene is a Java information retrieval library
Provides free text search facilities
Started in 2000, by Doug Cutting
A project of the Apache Software Foundation
It is designed to be embedded in Java apps
What is Solr?
Solr is an enterprise search server based on Lucene
Wraps Lucene with a RESTful web interface
Provides configurable schema
Provides replication functionality
Solr Design
Solr instance
UpdateRequestHandler
SearchHandler
User queries
Lucene index
content application
Prerequisites
Java, preferably Java 6
Latest Apache Solr, currently 3.3
http://www.sourcesense.com/dev8d-solr.zip
Prerequisites
Extract your Solr distribution
At a command prompt:
cd into the unzipped distribution directory
cd into the example directory
Enter: java -jar start.jar
Visit http://localhost:8983/solr/ in a browser. If you see a welcome message, your Solr works
Unpack your dev8d-solr.zip file
At another command prompt, cd into your dev8d-solr directory
Checking Solr Works
Visit http://localhost:8983/solr/admin/
You should see the Solr admin page.
Click statistics link
You'll see NumDocs: 0
There's nothing in the index, so searches won't show much
So we need to index some sample content
Indexing Sample Content
In your dev8d-solr directory (extracted from the zip), at a command prompt:
java -jar post.jar wikipedia-basic.xml
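As a rough sketch of what the post.jar step amounts to: the file is sent as an HTTP POST to the update handler, followed by a commit so the documents become searchable. The endpoint below is the example server's default; the helper names are made up for illustration, and this is not the post.jar implementation itself.

```python
import urllib.request

# Assumption: the example server's default update endpoint.
UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_body, url=UPDATE_URL):
    """Build (but do not send) a POST request carrying Solr update XML."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_file(path, url=UPDATE_URL):
    """Send one XML file, then a <commit/> so the new documents are visible."""
    with open(path, encoding="utf-8") as handle:
        requests = [
            build_update_request(handle.read(), url),
            build_update_request("<commit/>", url),
        ]
    for request in requests:
        with urllib.request.urlopen(request) as response:
            response.read()

# Usage, with Solr running: post_file("wikipedia-basic.xml")
```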
Searching
http://localhost:8983/solr/select?q=*:*
Searching
http://localhost:8983/solr/select?q=computers
Searching
http://localhost:8983/solr/select?q=computer systems
Searching
http://localhost:8983/solr/select?q=computers OR systems
Searching
http://localhost:8983/solr/select?q=computers AND systems
Searching
http://localhost:8983/solr/select?q="computer systems"
Searching
http://localhost:8983/solr/select?q="computer systems"~10
Searching
http://localhost:8983/solr/select?q=computers NOT data
Searching
http://localhost:8983/solr/select?q=computers -data
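The queries above contain spaces and quotes, which a browser encodes for you. When calling Solr from code, the query string needs percent-encoding. A minimal sketch, assuming the example server URL; the select_url helper is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the example server's select handler.
SELECT_URL = "http://localhost:8983/solr/select"


def select_url(query):
    """Return a select URL with the q parameter percent-encoded."""
    return SELECT_URL + "?" + urlencode({"q": query})


# The query strings used on the slides above.
queries = [
    "*:*",
    "computer systems",
    "computers OR systems",
    "computers AND systems",
    '"computer systems"',
    '"computer systems"~10',
    "computers NOT data",
    "computers -data",
]

for query in queries:
    print(select_url(query))
```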
Searching
http://localhost:8983/solr/select/?q=computers&fl=title
Searching
http://localhost:8983/solr/select/?q=computers&fq=author:yobot
Searching
http://localhost:8983/solr/select/?q=computers&fq=author:yobot&fl=title,author
Searching
http://localhost:8983/solr/select/?q=computers&rows=10&start=10&fl=title
Searching
http://localhost:8983/solr/select/?q=title:system&fl=title
Searching
http://localhost:8983/solr/select/?q=computers&fl=title,author&sort=author+desc
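The fl, fq, rows, start and sort parameters above combine naturally into a paging helper. This is a sketch only: the page_url function and its defaults are assumptions, while the parameter names are the ones used on the slides.

```python
from urllib.parse import urlencode

# Assumption: the example server's select handler.
SELECT_URL = "http://localhost:8983/solr/select/"


def page_url(query, page, page_size=10, fields=("title",), filters=(), sort=None):
    """Build a select URL for one page of results (pages count from 1)."""
    params = [
        ("q", query),
        ("rows", page_size),                # how many documents to return
        ("start", (page - 1) * page_size),  # offset of the first document
        ("fl", ",".join(fields)),           # which stored fields to return
    ]
    # Each filter query narrows the result set without affecting scoring.
    params += [("fq", filter_query) for filter_query in filters]
    if sort:
        params.append(("sort", sort))
    return SELECT_URL + "?" + urlencode(params)


# Second page of ten titles, as in the rows=10&start=10 example.
print(page_url("computers", page=2))
# Filtered by author and sorted, as in the fq and sort examples.
print(page_url("computers", 1, fields=("title", "author"),
               filters=["author:yobot"], sort="author desc"))
```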
Advanced Searching
http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author
Advanced Searching
http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=lex
Advanced Searching
http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count
Advanced Searching
http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.mincount=2
Advanced Searching
http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3
Advanced Searching
http://localhost:8983/solr/select/?q=computers&facet=true&facet.field=author&rows=0&facet.sort=count&facet.limit=3&debugQuery=true
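To use facet counts from code, it helps to ask for JSON (wt=json, added here as an assumption; the slides use the default XML). In the JSON format the counts for a field arrive as a flat list alternating value and count. The sketch below builds the facet URL and pairs that list up; the sample response is invented for illustration.

```python
import json
from urllib.parse import urlencode

# Assumption: the example server's select handler.
SELECT_URL = "http://localhost:8983/solr/select/"


def facet_url(query, field, limit=None, mincount=None):
    """Build a facet-only request (rows=0) sorted by count, in JSON."""
    params = [("q", query), ("facet", "true"), ("facet.field", field),
              ("rows", 0), ("facet.sort", "count"), ("wt", "json")]
    if limit is not None:
        params.append(("facet.limit", limit))
    if mincount is not None:
        params.append(("facet.mincount", mincount))
    return SELECT_URL + "?" + urlencode(params)


def facet_counts(response_text, field):
    """Turn the flat [value, count, value, count, ...] list into pairs."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))


# Invented sample response; real values depend on the indexed content.
SAMPLE = '{"facet_counts": {"facet_fields": {"author": ["yobot", 3, "someone", 2]}}}'
print(facet_url("computers", "author", limit=3))
print(facet_counts(SAMPLE, "author"))
```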
Advanced Searching
http://localhost:8983/solr/select?q=computer&wt=json
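With wt=json the response is easy to consume from a script. A minimal sketch of reading the total hit count and the titles; the sample response below is invented and trimmed to the parts used, and the summarize helper is hypothetical.

```python
import json


def summarize(response_text):
    """Return (total hits, list of titles) from a wt=json select response."""
    data = json.loads(response_text)
    result = data["response"]
    titles = [doc.get("title") for doc in result["docs"]]
    return result["numFound"], titles


# Invented sample; a real response also carries a responseHeader section.
SAMPLE = """
{"response": {"numFound": 2, "start": 0, "docs": [
    {"title": "Computer"},
    {"title": "Computer science"}
]}}
"""

total, titles = summarize(SAMPLE)
print(total, titles)
```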
Advanced Searching
http://localhost:8983/solr/select?q=computer&wt=javabin
Advanced Searching
http://localhost:8983/solr/select?q=computer&hl=true&hl.fl=text
Advanced Searching
Look for the highlighting list after the main response
Nothing there, because the 'text' field is not stored
Edit the 'text' field in schema.xml, changing it to stored="true"
Restart Solr, then reindex (java -jar post.jar wikipedia-enhanced.xml)
Advanced Searching
http://localhost:8983/solr/select?q=computer&hl=true&hl.fl=text
You should now see highlighted content
Advanced Searching
http://localhost:8983/solr/select?q=computer&hl=true&hl.fl=text&hl.simple.pre=<b>&hl.simple.post=</b>
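The <b> markers in the URL above need encoding when sent from code. The sketch below builds the highlighting request and reads the snippets back, assuming wt=json is added; in that format the highlighting section is keyed by document id, then by field. The document ids and snippet text in the sample are invented.

```python
import json
from urllib.parse import urlencode

# Assumption: the example server's select handler.
SELECT_URL = "http://localhost:8983/solr/select"


def highlight_url(query, field="text", pre="<b>", post="</b>"):
    """Build a highlighting request with custom pre/post markers."""
    params = [("q", query), ("hl", "true"), ("hl.fl", field),
              ("hl.simple.pre", pre), ("hl.simple.post", post), ("wt", "json")]
    return SELECT_URL + "?" + urlencode(params)


def snippets(response_text, field="text"):
    """Map each document id to its highlighted snippets for one field."""
    data = json.loads(response_text)
    highlighting = data.get("highlighting", {})
    return {doc_id: fields.get(field, []) for doc_id, fields in highlighting.items()}


# Invented sample: doc "13" has no snippets, as happens for a non-stored field.
SAMPLE = '{"highlighting": {"12": {"text": ["a <b>computer</b> is"]}, "13": {}}}'
print(highlight_url("computer"))
print(snippets(SAMPLE))
```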
Advanced Searching
Indexing
Indexing
Load wikipedia-basic.xml into a text editor or web browser
Load wikipedia-enhanced.xml into a text editor or browser
Load example/solr/conf/schema.xml into a text editor
Indexing
schema.xml defines field types and fields used in Solr
Equivalent to your database schema in an RDBMS
Indexing
Change this field in schema.xml to be of type "string" and add multiValued="true":
<field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
Indexing
Now add this to the <fields> section of schema.xml:
<field name="source" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="text_general" type="text_general" indexed="true" stored="true" multiValued="true"/>
Now search for the "text_general" field type definition, further up in the file.
Indexing
At the bottom of schema.xml, before the closing </schema> tag, add the following:
<copyField source="text" dest="text_general"/>
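A typo in a copyField source or dest is an easy mistake when editing the schema by hand. The sketch below is a small sanity check, not a Solr tool: it parses a schema and reports copyField endpoints that name no declared field. The fragment is a cut-down stand-in for the real file, and the check ignores dynamic fields and wildcard sources.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for schema.xml, holding only the fields discussed above.
SCHEMA_FRAGMENT = """
<schema name="example" version="1.4">
  <fields>
    <field name="text" type="text" indexed="true" stored="true" multiValued="true"/>
    <field name="category" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="source" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="text_general" type="text_general" indexed="true" stored="true" multiValued="true"/>
  </fields>
  <copyField source="text" dest="text_general"/>
</schema>
"""


def undefined_copy_targets(schema_xml):
    """Return (attribute, name) pairs for copyField ends with no <field>."""
    root = ET.fromstring(schema_xml)
    defined = {field.get("name") for field in root.iter("field")}
    problems = []
    for copy_field in root.iter("copyField"):
        for attribute in ("source", "dest"):
            name = copy_field.get(attribute)
            if name not in defined:
                problems.append((attribute, name))
    return problems


print(undefined_copy_targets(SCHEMA_FRAGMENT))
```

To check a real file, read example/solr/conf/schema.xml and pass its contents to the same function.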
Indexing
In your window where Solr is running, press CTRL+C to stop Solr, and then restart it with:
java -jar start.jar
Indexing
At your command prompt, in the dev8d-solr directory, execute:
java -jar post.jar wikipedia-enhanced.xml
More Advanced Searching
http://localhost:8983/solr/select?q=computer%20AND%20babbage&facet=true&facet.field=category&facet.mincount=1
More Advanced Searching
http://localhost:8983/solr/terms?terms.fl=text&terms=true&terms.limit=20
More Advanced Searching
http://localhost:8983/solr/terms?terms.fl=text_general&terms=true&terms.limit=20
More Advanced Searching
http://localhost:8983/solr/terms?terms.fl=text_general&terms=true&terms.limit=20&terms.prefix=at
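The terms.prefix request above is the basis of a simple type-ahead suggester: send whatever the user has typed so far as the prefix. A minimal sketch of building those URLs; the terms_url helper is hypothetical, and the parameters are the ones shown on the slides.

```python
from urllib.parse import urlencode

# Assumption: the /terms handler of the example server.
TERMS_URL = "http://localhost:8983/solr/terms"


def terms_url(field, prefix=None, limit=20):
    """Build a terms request, optionally restricted to a prefix."""
    params = [("terms.fl", field), ("terms", "true"), ("terms.limit", limit)]
    if prefix:
        params.append(("terms.prefix", prefix))
    return TERMS_URL + "?" + urlencode(params)


# One request per keystroke as the user types "at".
for typed in ("a", "at"):
    print(terms_url("text_general", prefix=typed))
```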
Indexing
Index segmentation: merge factor
Index optimisation: <optimize/>
schema.xml
Equivalent to RDBMS schema
Seen it before!
Let's look through it in more detail...
solrconfig.xml
Configures the components available to a Solr system
Specific to a Solr 'core', as is schema.xml
In same directory as schema.xml
Let's look through it in more detail...
Hints and Tips
Hints and Tips: Prototyping
Velocity response writer (/browse)
Data Import Handler (DIH)
XSLTUpdateRequestHandler (Solr 3.4)
Hints and Tips: Architecture
A RESTful service
An index, not a data store: keep ability to re-index
Don't make Solr do things you wouldn't have MySQL do
Hints and Tips: Security
There is none
So use a firewall
Beware what Solr internals you expose:
Query syntax
qt= parameter (e.g. qt=update)
Fake document level security with role fields and filter queries
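One way to sketch the role-field idea: each document carries a field listing the roles allowed to see it, and the application (never the end user) appends a filter query built from the current user's roles. The field name "role" and the helper names are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumption: the example server's select handler.
SELECT_URL = "http://localhost:8983/solr/select"


def quote_phrase(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def role_filter(roles, field="role"):
    """Build a filter matching documents visible to any of the given roles."""
    if not roles:
        # Fail closed: no roles means no search, not an unfiltered one.
        raise ValueError("refusing to search without any roles")
    return field + ":(" + " OR ".join(quote_phrase(role) for role in roles) + ")"


def secured_url(user_query, roles):
    """Combine the user's query with a server-side role filter."""
    params = [("q", user_query), ("fq", role_filter(roles))]
    return SELECT_URL + "?" + urlencode(params)


print(role_filter(["staff", "admin"]))
print(secured_url("computers", ["staff"]))
```

This only works if Solr itself is not reachable by users, since anyone who can send their own request can simply leave the filter out.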
Hints and Tips: Scaling
Index too large: distributed search
Too much traffic: replicated search
How much is too much: unanswerable!
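For the "index too large" case, a distributed search is an ordinary query with a shards parameter listing the cores to consult; the instance receiving the request merges the results. A sketch of building such a request, with made-up host names:

```python
from urllib.parse import urlencode


def distributed_url(coordinator, shards, query):
    """Build a query that the coordinator fans out to every listed shard."""
    # Shards are given as host:port/path, without the http:// scheme.
    params = [("q", query), ("shards", ",".join(shards))]
    return "http://" + coordinator + "/select?" + urlencode(params)


# Hypothetical hosts, for illustration only.
SHARDS = [
    "solr1.example.org:8983/solr",
    "solr2.example.org:8983/solr",
    "solr3.example.org:8983/solr",
]

print(distributed_url("localhost:8983/solr", SHARDS, "computers"))
```

For the "too much traffic" case the query itself does not change; identical replicas sit behind a load balancer instead.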
Time for Questions
And your questions are...
Thank you
Solr Host Configuration
[Diagram: searches; shard 1, shard 2, shard 3]
Solr Host Configuration
[Diagram: co-ordinator; shard 1, shard 2, shard 3]
Solr Host Configuration
[Diagram: load balancer; co-ordinator; shard 1, shard 2, shard 3]
Solr Host Configuration
[Diagram: load balancer; two sets, each with a co-ordinator and shard 1, shard 2, shard 3]