Coffee at DBG- Solr introduction

Apache Solr, prepared by Nithin S and Sajin TM, Digital Brand Group


An event held at DBG about Apache Solr, as part of the Coffee at DBG program.


Introduction

Apache Solr is a search server written in Java and built on Lucene, the Java search library.

Solr is open source, returns results through an HTTP web service as JSON or XML, and supports UTF-8.
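Because Solr answers plain HTTP requests, any language with an HTTP client can use it. The Python sketch below assumes the default core URL http://localhost:8983/solr/collection1 of the Solr 4.3 example; the helper names select_url, parse_json_response and search are hypothetical. The wt=json parameter asks Solr for a JSON response, in which the matching documents sit under response.docs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base_url: str, query: str, wt: str = "json") -> str:
    """Build a select URL; wt chooses the response format (json or xml)."""
    return f"{base_url}/select?{urlencode({'q': query, 'wt': wt})}"


def parse_json_response(body: str):
    """Return (numFound, docs) from a Solr JSON response body."""
    response = json.loads(body)["response"]
    return response["numFound"], response["docs"]


def search(base_url: str, query: str):
    """Fetch and parse results over HTTP (requires a running Solr)."""
    with urlopen(select_url(base_url, query)) as resp:
        return parse_json_response(resp.read().decode("utf-8"))
```

Usage, with the server running: `count, docs = search("http://localhost:8983/solr/collection1", "video")`. Keeping the parsing separate from the HTTP call makes it easy to test without a server.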

Who uses Solr?

eBay, HP, The Guardian, Cisco, AT&T, Intuit, Ford. More public users are listed at http://wiki.apache.org/solr/PublicServers

What is Lucene?

Lucene is a text search library written in Java. It is fast and feature-rich, with an active Apache development community. It uses an inverted index: content is indexed by its terms (words), so each term points to the documents that contain it.
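To make the inverted index idea concrete, here is a deliberately simplified Python sketch. Real Lucene also records term frequencies and positions and uses configurable analyzers; here tokenization is only lowercasing and splitting on non-alphanumeric characters, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is not modified
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

A query never scans the documents themselves; it only looks up each term and intersects the resulting id sets, which is why inverted indexes stay fast as the collection grows.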

Requirements

Server: Solr 4.3.0, a Java servlet container (Tomcat or Jetty), and Java 1.6 or above.

Client: any system that can post and get data over HTTP.


Solr Model

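Common terms

Schema - the definition of the fields and their types; comparable to a database table.

Core - a container for an index together with its schema and configuration.

Collection - a logical index that can be spread over multiple cores.

DIH - the DataImportHandler, used to import data, for example from a database.

Request handlers - StandardRequestHandler, DisMaxRequestHandler (searches across multiple fields), IndexInfoRequestHandler.

Response writers - xml, json, python, ruby.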

Start server

Start Solr (from the example directory): java -jar start.jar

This starts the Jetty application server on port 8983 and uses your terminal to display Solr's logging output.

Index your data (from example/exampledocs): java -jar post.jar *.xml

Admin interface: http://localhost:8983/solr
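The post.jar step can be approximated in Python: each XML file is POSTed to the update handler, and a final <commit/> makes the documents searchable. This is a sketch, not post.jar itself; the update URL http://localhost:8983/solr/update is assumed to be the example's default-core endpoint, and the helper names are hypothetical. Only send_all needs a running server.

```python
import glob
from urllib.request import Request, urlopen

# Assumption: default-core update URL of the Solr 4.3 example server.
UPDATE_URL = "http://localhost:8983/solr/update"
XML_HEADERS = {"Content-Type": "text/xml; charset=utf-8"}


def xml_post_requests(pattern: str, update_url: str = UPDATE_URL) -> list:
    """Build one POST per file matching `pattern`, followed by a commit."""
    reqs = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "rb") as f:
            reqs.append(Request(update_url, data=f.read(), headers=XML_HEADERS))
    # The commit makes the newly added documents visible to searches.
    reqs.append(Request(update_url, data=b"<commit/>", headers=XML_HEADERS))
    return reqs


def send_all(reqs) -> None:
    """Send the requests in order (requires a running Solr)."""
    for req in reqs:
        with urlopen(req) as resp:
            resp.read()
```

Usage: `send_all(xml_post_requests("*.xml"))` from the exampledocs directory.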

Basic Directory Structure

The Solr home directory typically contains the following subdirectories:

conf/ This directory is mandatory and must contain your solrconfig.xml and schema.xml. Any other optional configuration files are also kept here.

data/ This is the default location where Solr keeps your index, and it is used by the replication scripts for handling snapshots. You can override this location in conf/solrconfig.xml. Solr creates this directory if it does not already exist.

lib/ This directory is optional. If it exists, Solr loads any JARs found in it and uses them to resolve any plugins specified in your solrconfig.xml or schema.xml (e.g. analyzers, request handlers). Alternatively, you can use the <lib> syntax in conf/solrconfig.xml to point Solr to your plugins. See the example conf/solrconfig.xml file for details.
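A small Python check for the layout described above, assuming conf/ sits directly under the directory being checked, as on this slide. The function name is hypothetical. It only reports a missing conf/ directory or missing configuration files, because data/ is created by Solr on demand and lib/ is optional.

```python
from pathlib import Path


def check_solr_home(home: str) -> list[str]:
    """Return a list of problems found in a Solr home directory layout."""
    conf = Path(home) / "conf"
    if not conf.is_dir():
        return ["missing mandatory conf/ directory"]
    problems = []
    for name in ("solrconfig.xml", "schema.xml"):
        if not (conf / name).is_file():
            problems.append(f"missing conf/{name}")
    # data/ and lib/ are deliberately not checked (see above).
    return problems
```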

PHP Clients

solr-php-client and the PECL extension for Solr.

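Structuring Solr schema

Field options

indexed - the field can be searched (and sorted or faceted on).

stored - the original value can be returned in search results.

multiValued - the field can hold more than one value per document.

compressed - the stored value is compressed (an option in older Solr releases).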
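Field options are set as attributes on <field> elements in schema.xml. The Python sketch below generates such an element with ElementTree; the helper name is hypothetical, the field name cat and type string are only illustrative, and compressed is left out of this sketch.

```python
import xml.etree.ElementTree as ET


def field_element(name: str, field_type: str, indexed: bool = True,
                  stored: bool = True, multi_valued: bool = False) -> str:
    """Return a schema.xml <field/> definition with the given options."""
    attrs = {
        "name": name,
        "type": field_type,
        "indexed": str(indexed).lower(),           # searchable
        "stored": str(stored).lower(),             # returned in results
        "multiValued": str(multi_valued).lower(),  # may hold several values
    }
    return ET.tostring(ET.Element("field", attrs), encoding="unicode")
```

Usage: `field_element("cat", "string", multi_valued=True)` yields a field that is searchable, returned in results, and can hold several categories per document.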

Indexing options

add/update - adds a document to Solr or updates an existing one. Additions and updates are not available for searching until a commit takes place.

commit - tells Solr that all changes made since the last commit should be made available for searching.

optimize - restructures Lucene's files to improve search performance. Optimizing is generally worthwhile once indexing has completed; if there are frequent updates, schedule optimization for low-usage times. An index does not need to be optimized to work properly, and optimization can be time-consuming.

delete - can be specified by id or by query. Delete by id removes the document with the specified id; delete by query removes all documents matched by the query.
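Each of these operations is an XML message posted to the update handler with Content-Type text/xml. This Python sketch builds the messages with ElementTree; the helper names are hypothetical and actually sending them is left out.

```python
import xml.etree.ElementTree as ET


def add_xml(docs: list[dict]) -> str:
    """<add> message: each dict becomes a <doc>; list values repeat the field."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            for v in value if isinstance(value, list) else [value]:
                ET.SubElement(doc_el, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="unicode")


def delete_by_id_xml(doc_id: str) -> str:
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = doc_id
    return ET.tostring(delete, encoding="unicode")


def delete_by_query_xml(query: str) -> str:
    delete = ET.Element("delete")
    ET.SubElement(delete, "query").text = query
    return ET.tostring(delete, encoding="unicode")


COMMIT_XML = "<commit/>"      # make pending changes searchable
OPTIMIZE_XML = "<optimize/>"  # restructure the index files
```

ElementTree escapes special characters such as & for you, which is easy to get wrong when concatenating strings by hand.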

Upload data

Supported formats: XML, JSON, CSV and javabin. Rich documents such as Microsoft Office files and PDFs are also supported.

Upload a CSV file:

curl "http://localhost:8983/solr/collection1/update/csv" -H "Content-type: text/csv; charset=utf-8" --data-binary @D:/Projects/solr-4.3.0/example/exampledocs/books.csv

Commit the changes so they become searchable:

http://localhost:8983/solr/collection1/update?stream.body=%3Ccommit/%3E
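The same two steps in Python, as a sketch assuming the collection1 core URL from the curl command; the helper names are hypothetical. It also shows where %3Ccommit/%3E comes from: it is <commit/> URL-encoded.

```python
from urllib.parse import quote
from urllib.request import Request

CORE_URL = "http://localhost:8983/solr/collection1"  # core from the example above


def csv_upload_request(csv_bytes: bytes, core_url: str = CORE_URL) -> Request:
    """POST CSV data to the CSV update handler, like the curl command."""
    return Request(f"{core_url}/update/csv", data=csv_bytes,
                   headers={"Content-Type": "text/csv; charset=utf-8"})


def commit_url(core_url: str = CORE_URL) -> str:
    """Commit via stream.body; quote() encodes <commit/> as %3Ccommit/%3E."""
    return f"{core_url}/update?stream.body={quote('<commit/>')}"
```

Usage: pass the request to `urllib.request.urlopen`, then open `commit_url()` once all data is uploaded.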

Query parameters

q - The query to search with in Solr. See "Lucene QueryParser Syntax" in Resources for a full description of the syntax. Sorting is requested with the separate sort parameter, giving the name of an indexed, non-tokenized field and a direction; the older syntax of appending sort information to q after a semicolon is deprecated. The default sort is score desc, i.e. by descending score.

Example: q=myField:Java AND otherField:developerWorks&sort=date asc searches the two fields specified and sorts the results by a date field.

start - The starting offset into the result set, useful for paging through results. The default value is 0.

Example: start=15 returns results starting with the sixteenth ranked result (the offset is zero-based).

rows - The maximum number of documents to return. The default value is 10.

Example: rows=25

fq - An optional filter query. Results are restricted to documents that also match the filter query. Filter queries are cached by Solr, which makes them very useful for speeding up complex queries. Any valid query that could be passed in the q parameter can be used, excluding sort information.

hl - When hl=true, highlighted snippets are included in the query response. The default is false. See the Solr Wiki section on highlighting parameters for more options (in Resources).

Example: hl=true

fl - A comma-separated list of the fields to return in the document results. "*" is the default and means all fields; "score" additionally returns each document's score.

Example: fl=*,score
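Because start is a zero-based offset, page n with a page size of rows begins at (n - 1) * rows. A hypothetical Python helper that builds the query string and passes any other parameters (fq, hl, fl, sort) through unchanged:

```python
from urllib.parse import urlencode


def page_params(query: str, page: int, rows: int = 10, **extra) -> str:
    """Query string for 1-based page number `page`; start is a 0-based offset."""
    params = {"q": query, "start": (page - 1) * rows, "rows": rows}
    params.update(extra)  # e.g. fq=..., hl="true", fl="*,score", sort="date asc"
    return urlencode(params)
```

Usage: `page_params("myField:Java", 2, rows=25, sort="date asc", fl="*,score")` requests results 26 to 50. Parameter names containing dots (such as hl.fl) cannot be keyword arguments and would need to be added to the dictionary directly.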

Search

Full text search: http://localhost:8983/solr/select?q=Searchtext

Search only within a field: http://localhost:8983/solr/select?q=fieldname:searchtext

Control which fields are displayed in the result: http://localhost:8983/solr/select?q=video&fl=id,category

Range queries on fields: http://localhost:8983/solr/select?q=price:[0 TO 400]&fl=id,name,price

More like this (MLT): http://localhost:8983/solr/select?q=Searchtext&mlt=true&mlt.fl=headline&mlt.mindf=1&mlt.mintf=1&fl=id,score&rows=100

More information on how MLT works and the available options can be found at http://wiki.apache.org/solr/MoreLikeThis
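A browser encodes these URLs automatically, but when building them in code, spaces and brackets in values such as price:[0 TO 400] must be encoded. A Python sketch with hypothetical helper names, assuming the select URL used in the examples above:

```python
from urllib.parse import urlencode

SELECT_URL = "http://localhost:8983/solr/select"  # as in the examples above


def range_query(field: str, low, high) -> str:
    """Inclusive range clause, e.g. price:[0 TO 400]."""
    return f"{field}:[{low} TO {high}]"


def select(params: dict) -> str:
    """Encode parameters so spaces, brackets and commas are URL-safe."""
    return f"{SELECT_URL}?{urlencode(params)}"
```

Usage: `select({"q": "Searchtext", "mlt": "true", "mlt.fl": "headline", "mlt.mindf": 1, "mlt.mintf": 1, "fl": "id,score", "rows": 100})` rebuilds the MLT example with proper encoding.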


Sample search result

Faceted search

http://localhost:8983/solr/query?q=camera&facet=true&facet.field=manu
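In a JSON response, facet results appear under facet_counts.facet_fields, and by default each field's counts are a flat list alternating term and count. A Python sketch that pairs them up; the function name is hypothetical and the sample values in the usage line are made up.

```python
import json


def facet_counts(body: str, field: str) -> list[tuple[str, int]]:
    """Turn the flat [term, count, term, count, ...] list into (term, count) pairs."""
    flat = json.loads(body)["facet_counts"]["facet_fields"][field]
    return list(zip(flat[0::2], flat[1::2]))
```

Usage: `facet_counts('{"facet_counts":{"facet_fields":{"manu":["canon",2,"sony",1]}}}', "manu")` gives `[("canon", 2), ("sony", 1)]`.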

Features

Hit highlighting, autosuggest, spelling suggestions, spatial search

Removing data from the index

curl "http://localhost:8983/solr/collection1/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>*:*</query></delete>"

The query *:* matches every document, so this empties the index; commit=true makes the deletion visible immediately.


Thank you