Apache Solr for TYPO3 CMS 101
Apache Solr 101
Killing the Vampires of Search
Cluj, 2013
Olivier Dobberkau
Some Vampirology first
● Nosferatu
● Dracula
● van Helsing
● Selene
● Edward & Bella
http://en.wikipedia.org/wiki/Vampire_film
Agenda
● About me
● History of EXT:solr
● Current status
● Solr Basics
● Caveats
● Books & Documents
About me
Olivier Dobberkau
CEO of dkd Internet Service GmbH
Research and Development
Over 10 years of TYPO3 CMS
Member of the T3A
[email protected]: T3RevNeverEnd
Scratching ..... the TYPO3 CMS search itch
History of EXT:solr
We all know when a solution fails ...
History of EXT:solr
● Indexed Search gave us some pain
● First prototype in 2009
● What you get in one or two days of work
● Started funding of development
● Over 70 sponsors
● It's possible to offer services around it
● Support and consulting available
Current Status
Version 2.8.2 was released in November 2012
Introduced the Add-ons for additional features
Supported TYPO3 CMS versions: 4.5, 4.6 & 4.7
Supported Solr server: 3.6.2 (Time flies when you are having fun!)
The last TER Release
TER: 2.8.3
Introduced support for TYPO3 CMS versions 4.5 - 6.1
Loads of bug fixes
Maintenance Release
Next Major Version
EXT:solr 3.x will be the next version
Release will hopefully be soon(tm)
Will have no new features on the TYPO3 side
Support for TYPO3 CMS 4.5 - 6.1
Adds Apache Solr 4.4 as a server
Roadmap for EXT:solr 4.x
● Backend parts of the EXT all in Extbase
● Templates go FLUID
● Frontend goes Extbase
● 4.x will be 6.2 only!
● Effort estimated at 2 to 4 man months
The EXT:solr ecosystem
The base is EXT:solr
Features are added through Add-ons
● EXT:solrfile (File-Indexing for CMS 4.5 - 4.7)
● EXT:solrdam (File-Indexing with DAM)
● EXT:solrfal (File-Indexing for CMS 6.1 & 6.2)
● EXT:solrmlt (More like this)
● EXT:solrgrouping
● EXT:tika (Extracting Service)
EXT:solr
So what does it do?
● Indexing
● Querying
● Results Listing
● Logging / Analysis
Indexing
● Indexing of pages
● Indexing of TCA records
● Indexing of files (Add-On)
● Index Queue
  ○ List of all items to be indexed
  ○ Every time an item is touched/changed, an update is sent to the Solr server
  ○ No need for a crawler / instant results
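The Index Queue idea can be sketched in a few lines of Python. This is a toy model only, not EXT:solr's actual implementation: the class name, the method names, and the dictionary standing in for the Solr server are all assumptions made for illustration.

```python
from collections import OrderedDict


class IndexQueue:
    """Toy index queue: remembers which items changed and still need indexing."""

    def __init__(self):
        # (item_type, uid) -> latest record data, ordered by first-queued time.
        self._pending = OrderedDict()

    def mark_changed(self, item_type, uid, data):
        # Called whenever a record is touched; repeated changes to the
        # same item collapse into a single queue entry with the newest data.
        self._pending[(item_type, uid)] = data

    def process(self, send_to_solr, limit=50):
        # Send up to `limit` queued items to the search server, oldest first,
        # and remove them from the queue.
        sent = 0
        while self._pending and sent < limit:
            key, data = self._pending.popitem(last=False)
            send_to_solr(key, data)
            sent += 1
        return sent


solr_documents = {}  # hypothetical stand-in for the Solr index


def send_to_solr(key, data):
    solr_documents[key] = data


queue = IndexQueue()
queue.mark_changed("pages", 1, {"title": "Home"})
queue.mark_changed("news", 7, {"title": "Solr released"})
queue.mark_changed("pages", 1, {"title": "Home (edited)"})  # same item, newer data

processed = queue.process(send_to_solr)
```

Because only touched items enter the queue, nothing has to crawl the site to discover changes.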
Indexing
● Indexing is very easy and can be achieved through simple TypoScript configuration
● Additionally you can use Apache Nutch to index non-TYPO3 websites
● Support for more than 30 languages
Querying
● Easy to set up
● Apply the Lucene query language if you want to search for specific items (e.g. only news)
● You can tell Solr to boost results if query terms are in the fields you are searching
● Use elevation to rank terms
● Correct stemming available
● Range queries (intelligent dates)
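As a rough illustration of the querying points above, the following Python sketch builds the query string for a Solr select request. The parameters q, fq, defType and qf are standard Solr request parameters; the field names (type, created, title, content), the boost factors and the helper function itself are assumptions chosen for the example, not EXT:solr's configuration.

```python
from urllib.parse import parse_qs, urlencode


def build_select_query(terms, record_type=None, boosts=None, newer_than=None):
    """Return a URL-encoded query string for a Solr select request."""
    params = [("q", terms), ("defType", "edismax")]
    if boosts:
        # qf lists the fields to search; "field^factor" weights matches in that field.
        qf = " ".join(f"{field}^{factor}" for field, factor in boosts.items())
        params.append(("qf", qf))
    if record_type:
        # fq narrows the result set (e.g. only news) without changing the scores.
        params.append(("fq", f"type:{record_type}"))
    if newer_than:
        # Range query using Solr date math, e.g. "NOW-7DAYS".
        params.append(("fq", f"created:[{newer_than} TO NOW]"))
    return urlencode(params)


query_string = build_select_query(
    "vampire",
    record_type="news",
    boosts={"title": 5, "content": 1},
    newer_than="NOW-7DAYS",
)

# Decode again to inspect what would be sent to the server.
parsed = parse_qs(query_string)
```

Here a match in the (assumed) title field counts five times as much as one in content, and the two filter queries restrict results to news created in the last seven days.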
Results Listing
● Results can be fully individualized
  ○ Templates for different result types
● Sorting of the results list
  ○ Relevance
  ○ Date
  ○ Title
  ○ Any other field
● Can be toggled
Result Listings
● Facets
  ○ Filter the results based on attributes
  ○ Hierarchical facets
● Suggestions / Autocomplete
● Stopwords
● Protected words
● Did you mean?
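What a facet does can be shown with a small Python sketch. It only mimics the idea on an in-memory list; Solr computes facet counts on the server. The sample documents and field names are made up for illustration.

```python
from collections import Counter

# Hypothetical search results, each with attributes usable as facets.
results = [
    {"title": "Dracula", "type": "pages", "language": "en"},
    {"title": "Nosferatu", "type": "news", "language": "de"},
    {"title": "Van Helsing", "type": "news", "language": "en"},
]


def facet_counts(documents, field):
    # Count how many documents carry each value of the given field.
    return Counter(doc[field] for doc in documents)


def apply_facet(documents, field, value):
    # Narrow the result list to documents matching the selected facet value.
    return [doc for doc in documents if doc[field] == value]


type_facet = facet_counts(results, "type")
news_only = apply_facet(results, "type", "news")
```

The counts are what a visitor sees next to each facet option; selecting an option applies the corresponding filter.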
Logging / Analysis
● Built-in query logging
● Can be used with your favorite analytics suite
● Feature-rich analysis & debugging options
Caveats
● Junk in / Junk out
● Get your data right
● A String is not Text
  ○ Be aware of the difference between Strings and Text
  ○ Protect proper names from stemming
  ○ Example
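The difference between a String and a Text field can be imitated in plain Python. The sketch below is a deliberately crude stand-in for Solr's analysis chain: the stemmer, the protected-word list and the function names are all assumptions for illustration, not Solr's real analyzers.

```python
PROTECTED_WORDS = {"mars"}  # hypothetical proper names that must not be stemmed


def naive_stem(token):
    # Deliberately crude stemmer: strips a trailing "s" from longer words,
    # unless the word is protected.
    if token in PROTECTED_WORDS:
        return token
    if len(token) > 3 and token.endswith("s"):
        return token[:-1]
    return token


def analyze_text(value):
    # "Text" field: lowercased, split into tokens, each token stemmed.
    return [naive_stem(token) for token in value.lower().split()]


def analyze_string(value):
    # "String" field: kept as one exact, unmodified term.
    return [value]


def matches(query_terms, indexed_terms):
    # A document matches when every query term appears among its indexed terms.
    return all(term in indexed_terms for term in query_terms)


title = "Killing the Vampires of Search"

text_hit = matches(analyze_text("vampire"), analyze_text(title))
string_hit = matches(analyze_string("vampire"), analyze_string(title))
exact_hit = matches(analyze_string(title), analyze_string(title))
```

In this toy model a Text field finds the title for "vampire", while a String field only matches the complete, exact value. The protected-word list keeps a name such as "Mars" from being cut down to "mar".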
Caveats
● Synonyms are nice, but don't abuse them
● Don't confuse Solr with a database
  ○ %WORD% does not work
● Search with “WORD” if you want your query to remain untouched
● * works only at the end of a word
  ○ cat* will find catapult, cats, catastrophe etc.
  ○ *cat will yield no results
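One rough intuition for why a trailing wildcard is the cheap case: terms that share a prefix sit next to each other in a sorted term list, so a single binary search finds them. The Python sketch below illustrates only that idea; the term list is made up, and this is not a description of how Solr or Lucene store their index internally.

```python
from bisect import bisect_left

# Sorted list of indexed terms, standing in for a search index's term dictionary.
TERMS = sorted(["bobcat", "cat", "catapult", "catastrophe", "cats", "dog"])


def prefix_search(prefix):
    # Binary search jumps to the first term >= prefix; all terms sharing the
    # prefix follow directly, so the scan stops at the first non-match.
    start = bisect_left(TERMS, prefix)
    found = []
    for term in TERMS[start:]:
        if not term.startswith(prefix):
            break
        found.append(term)
    return found


def suffix_scan(suffix):
    # No such shortcut for a leading wildcard: every term has to be inspected.
    return [term for term in TERMS if term.endswith(suffix)]


trailing = prefix_search("cat")  # behaves like cat*
leading = suffix_scan("cat")     # what *cat would have to do
```

A database-style %WORD% pattern falls into the same expensive category as the suffix scan, which is one way to read the advice not to treat Solr like a database.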
Caveats
● Beware of indexing time
  ○ Pages index slower than TCA records
  ○ Files might be too big for initial settings
Some web resources
● You will find a lot of information about the Apache Solr extension at www.typo3-solr.com
● http://forge.typo3.org/projects/show/extension-solr
● Mailing List / Newsgroup / Forums
● Afraid of Solr? Try www.hosted-solr.com
Books & Documentation
● Taming Text
● Apache Solr Cookbook
● Administering Solr
● Apache Solr 4.x
● Wiki of Apache Solr: https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide
Merci!
Thank you!