Apache Solr for TYPO3 CMS 101

Apache Solr 101 Killing the Vampires of Search Cluj, 2013 Olivier Dobberkau

description

The TYPO3 extension EXT:solr adds a fast, precise and extensible modern search to TYPO3 CMS. This presentation covers the current development status of the extension and its add-ons. It gives an overview of common indexing strategies and offers insights into best practices for your implementation.

Transcript of Apache Solr for TYPO3 CMS 101

Page 1: Apache Solr for TYPO3 CMS 101

Apache Solr 101
Killing the Vampires of Search

Cluj, 2013
Olivier Dobberkau

Page 2: Apache Solr for TYPO3 CMS 101

Some Vampirology first

● Nosferatu
● Dracula
● van Helsing
● Selene
● Edward & Bella

http://en.wikipedia.org/wiki/Vampire_film

Page 3: Apache Solr for TYPO3 CMS 101

Agenda

● About me
● History of EXT:solr
● Current status
● Solr Basics
● Caveats
● Books & Documents

Page 4: Apache Solr for TYPO3 CMS 101

About me

Olivier Dobberkau
CEO of dkd Internet Service GmbH
Research and Development
Over 10 years of TYPO3 CMS
Member of the T3A
[email protected]: T3RevNeverEnd

Page 5: Apache Solr for TYPO3 CMS 101

Scratching ..... the TYPO3 CMS search itch

Page 6: Apache Solr for TYPO3 CMS 101

History of EXT:solr

We all know when a solution fails ...

Page 7: Apache Solr for TYPO3 CMS 101

History of EXT:solr

● Indexed Search gave us some pain
● First prototype 2009
● What you get in one or two days of work
● Started funding of development
● Over 70 sponsors
● It's possible to offer services around it
● Support and consulting available

Page 8: Apache Solr for TYPO3 CMS 101

Current Status

Version 2.8.2 was released November 2012
Introduced the Add-ons for additional features
Supported TYPO3 CMS Versions: 4.5, 4.6 & 4.7
Supported Solr Server: 3.6.2 (Time flies when you are having fun!)

Page 9: Apache Solr for TYPO3 CMS 101

The last TER Release

TER: 2.8.3

Introduces support for TYPO3 CMS Versions 4.5 - 6.1
Loads of bug-fixes

Maintenance Release

Page 10: Apache Solr for TYPO3 CMS 101

Next Major Version

EXT:solr 3.x will be the next version
Release will hopefully be soon(tm)

Will have no new features on the TYPO3 side
Support for TYPO3 CMS 4.5 - 6.1
Add Apache Solr 4.4 as a Server

Page 11: Apache Solr for TYPO3 CMS 101

Roadmap for EXT:solr 4.x

● Backend parts of the EXT all in Extbase
● Templates go FLUID
● Frontend goes Extbase
● 4.x will be 6.2 only!
● Effort estimated 2 to 4 man months

Page 12: Apache Solr for TYPO3 CMS 101

The EXT:solr ecosystem

The base is EXT:solr
Features are added through Add-ons
● EXT:solrfile (File-Indexing for CMS 4.5 - 4.7)
● EXT:solrdam (File-Indexing with DAM)
● EXT:solrfal (File-Indexing for CMS 6.1 & 6.2)
● EXT:solrmlt (More like this)
● EXT:solrgrouping
● EXT:tika (Extracting Service)

Page 13: Apache Solr for TYPO3 CMS 101

EXT:solr

So what does it do?
● Indexing
● Querying
● Results Listing
● Logging / Analysis

Page 14: Apache Solr for TYPO3 CMS 101

Indexing

● Indexing of pages
● Indexing of TCA records
● Indexing of Files (Add-On)
● Index Queue
○ List of all items to be indexed
○ Every time an item is touched/changed, an update is sent to the Solr server
○ No need for a crawler / instant results
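
The Index Queue idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the extension's implementation: the class `IndexQueue`, its methods, and the table names `pages` and `tt_news` are hypothetical and chosen for illustration. It shows how a change to a record becomes one pending entry that a worker later sends to the search server.

```python
from collections import OrderedDict


class IndexQueue:
    """Toy model of an index queue: one pending entry per changed record."""

    def __init__(self):
        # Keyed by (table, uid) so repeated edits to one record collapse
        # into a single pending update.
        self._pending = OrderedDict()

    def mark_changed(self, table, uid):
        """Register that a record was touched or changed."""
        key = (table, uid)
        self._pending.pop(key, None)  # move to the end on repeated edits
        self._pending[key] = True

    def process(self, send_update, limit=50):
        """Send up to `limit` pending items; return how many were sent."""
        sent = 0
        while self._pending and sent < limit:
            (table, uid), _ = self._pending.popitem(last=False)
            send_update(table, uid)  # hypothetical callback to the server
            sent += 1
        return sent


queue = IndexQueue()
queue.mark_changed("pages", 12)
queue.mark_changed("tt_news", 7)
queue.mark_changed("pages", 12)  # edited again before the worker ran

updates = []
sent_count = queue.process(lambda table, uid: updates.append((table, uid)))
print(sent_count, updates)
```

Because only changed records enter the queue, no crawler has to rediscover the site, which is the point made on the slide.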

Page 15: Apache Solr for TYPO3 CMS 101

Indexing

● Indexing is very easy and can be achieved through simple TypoScript configuration

● Additionally you can use Apache Nutch to index non TYPO3 websites

● Support for more than 30 Languages

Page 16: Apache Solr for TYPO3 CMS 101

Querying

● Easy to set up
● Apply Lucene query language if you want to search for specific items (e.g. only news)
● You can tell Solr to boost results if query terms are in the fields you are searching
● Use elevation to rank terms
● Correct stemming available
● Range queries (intelligent dates)
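
As a rough illustration of the points above, the following Python sketch assembles Solr request parameters for a restricted, boosted query with a date range. The parameters `q`, `fq`, `defType` and `qf` and the `[NOW-1YEAR TO NOW]` range syntax are standard Solr; the field names `type`, `title`, `content` and `created`, the boost value, and the helper `build_query` are assumptions made for this example and may not match your schema.

```python
from urllib.parse import urlencode


def build_query(term, record_type=None, newer_than=None):
    """Build a Solr query string (hypothetical helper, assumed field names)."""
    params = [
        ("q", term),
        ("defType", "edismax"),
        # Boost matches in the title five times over matches in the content.
        ("qf", "title^5 content"),
    ]
    if record_type:
        # Filter query in Lucene syntax: restrict to one record type.
        params.append(("fq", f"type:{record_type}"))
    if newer_than:
        # Range query using Solr date math, e.g. "1YEAR" or "7DAYS".
        params.append(("fq", f"created:[NOW-{newer_than} TO NOW]"))
    return urlencode(params)


query_string = build_query("vampire", record_type="news", newer_than="1YEAR")
print(query_string)
```

The resulting string would be appended to the select handler URL of your Solr core; the URL itself is deliberately left out because it depends on the installation.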

Page 17: Apache Solr for TYPO3 CMS 101

Results Listing

● Results can be fully individualized
○ Templates for different results types
● Sorting of the Results List
○ Relevance
○ Date
○ Title
○ any other field
● Can be toggled

Page 18: Apache Solr for TYPO3 CMS 101

Result Listings

● Facets
○ Filter the results based on attributes
○ Hierarchical facets
● Suggestions / Autocomplete
● Stopwords
● Protected words
● Did you mean?
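
What a facet does can be shown with a small, self-contained Python sketch. It is a conceptual model only: Solr computes facet counts inside the index, while this example simply counts values in a list. The sample documents and the helpers `facet_counts` and `apply_filter` are made up for illustration.

```python
from collections import Counter

# Made-up search results with a "type" attribute to facet on.
documents = [
    {"title": "Nosferatu", "type": "pages"},
    {"title": "Dracula", "type": "news"},
    {"title": "Van Helsing", "type": "news"},
    {"title": "Selene", "type": "files"},
]


def facet_counts(docs, field):
    """Count how many results carry each value of `field`."""
    return Counter(doc[field] for doc in docs)


def apply_filter(docs, field, value):
    """Keep only results whose `field` equals the selected facet value."""
    return [doc for doc in docs if doc[field] == value]


counts = facet_counts(documents, "type")
news_only = apply_filter(documents, "type", "news")
print(counts)
print([doc["title"] for doc in news_only])
```

The counts are what a visitor sees next to each facet option, and clicking an option corresponds to applying the filter.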

Page 19: Apache Solr for TYPO3 CMS 101

Logging / Analysis

● Built-in query logging
● Can be used with your favorite Analytics suite
● Feature-rich analysis & debugging options

Page 20: Apache Solr for TYPO3 CMS 101

Caveats

● Junk in / Junk out
● Get your data right
● A String is not Text
○ Be aware of the difference between Strings and Text
○ Protect proper names from stemming
○ Example
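
The slide's own example is not part of this transcript. As a stand-in, here is a minimal Python sketch of the difference, assuming a deliberately naive analyzer: a string field is matched as one exact value, while a text field is lowercased, split into tokens and stemmed. The stemmer, the function names and the protected word list are invented for illustration; Solr's real analyzers are far more elaborate.

```python
def naive_stem(token):
    """Very crude stemmer: strips a trailing 'ing' or 's' from longer words."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze_text(value, protected=frozenset()):
    """Lowercase, split on whitespace, and stem unless the token is protected."""
    tokens = value.lower().split()
    return [t if t in protected else naive_stem(t) for t in tokens]


def string_matches(stored, query):
    """A string field only matches the exact, complete value."""
    return stored == query


def text_matches(stored, query, protected=frozenset()):
    """A text field matches if every analyzed query term occurs in the field."""
    stored_terms = set(analyze_text(stored, protected))
    return all(t in stored_terms for t in analyze_text(query, protected))


stored = "Van Helsing hunts vampires"
print(string_matches(stored, "vampire"))   # exact comparison fails
print(text_matches(stored, "vampire"))     # analyzed terms overlap
print(analyze_text("Van Helsing"))                           # name gets stemmed
print(analyze_text("Van Helsing", protected={"helsing"}))    # name kept intact
```

The last two lines show why proper names should be protected: without protection the toy stemmer mangles "Helsing" into "hels".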

Page 21: Apache Solr for TYPO3 CMS 101

Caveats

● Synonyms are nice, but don't abuse them
● Don't confuse Solr with a Database
○ %WORD% does not work
● Search with “WORD” if you want your query to remain untouched
● * works only at the end of a word
○ cat* will find catapult, cats, catastrophe etc.
○ *cat will yield no results
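
The wildcard and quoting rules can be modelled with a short Python sketch. It mirrors the behaviour described on the slide rather than Solr's internals; whether leading wildcards work in practice depends on the Solr version and field configuration. The helpers `prefix_search` and `quote_phrase` and the term list are hypothetical.

```python
def prefix_search(terms, pattern):
    """Match a trailing-wildcard pattern such as 'cat*' against indexed terms."""
    if pattern.startswith("*"):
        # Leading wildcard: modelled as "no results", as stated on the slide.
        return []
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return [t for t in terms if t.startswith(prefix)]
    return [t for t in terms if t == pattern]


def quote_phrase(text):
    """Wrap user input in double quotes, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


terms = ["catapult", "cats", "catastrophe", "bobcat"]
print(prefix_search(terms, "cat*"))
print(prefix_search(terms, "*cat"))
print(quote_phrase("WORD"))
```

Unlike SQL's `LIKE '%WORD%'`, the match is made against indexed terms, which is why "bobcat" is not found by `cat*`.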

Page 22: Apache Solr for TYPO3 CMS 101

Caveats

● Beware of indexing time
○ Pages index slower than TCA records
○ Files might be too big for initial settings

Page 23: Apache Solr for TYPO3 CMS 101

Some web resources

● You will find a lot of information about the Apache Solr Extension at www.typo3-solr.com
● http://forge.typo3.org/projects/show/extension-solr
● Mailing List / Newsgroup / Forums
● Afraid of Solr? Try www.hosted-solr.com

Page 24: Apache Solr for TYPO3 CMS 101

Books & Documentation

● Taming Text
● Apache Solr Cookbook
● Administering Solr
● Apache Solr 4.x
● WIKI of Apache Solr: https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

Page 25: Apache Solr for TYPO3 CMS 101

Merci! Thank you!