Drupal 7 and Solr
patrick-morin
Drupal 7 + Solr
Tools’ overview / Integration / Usage / Case studies
Introduction
Digital agency established in 2008 in Mauritius
Recognized as one of the leading offshore web agencies for Drupal
More than 150 projects in Drupal
Introduction
Technical Director at Esokia
10 years of experience in PHP
7 years of experience in Drupal
Introduction
What is Drupal?
Free, community-built website development tool
Modular and extensible content management
Open source
Built on PHP
Created by Dries Buytaert
First release in January 2001
Showcase
A closer look
Key concepts in Drupal
Flexibility / Simplicity / Utility
High standard of usability for developers, administrators, and users
Modularity / Extensibility / Maintainability
Slim and powerful core that can be readily extended through custom modules
Drupal in the future?
Drupal 8 expected around October 2015 (approx.)
Big architectural changes
Built with
What is Solr?
Standalone enterprise search server
Full-text search, faceted navigation
REST-like API
Open source
Written in Java
Supported by the Apache Software Foundation
First release in January 2007
Main Features
Full-Text Search
Powerful matching capabilities including phrases, wildcards, joins, grouping...
Faceted Search
Slicing of data using a large array of faceting algorithms
Main Features
High Volume Traffic
Proven at large scale all over the world
Extensible Plugin Architecture
Well-defined extension points for indexing, analysis, request handling, query parsing...
Main Features
Geospatial Search
Location-based search with built-in spatial support
Rich Document Parsing
Index rich content such as Adobe PDF, Microsoft Word and more
Other features
Query suggestions
Suggestions offered to users as they type their queries
Spell checking
“Did you mean… ?”
Highlighting
Helps users focus on the parts of results that match their search
External configuration via XML
Adjust and extend the setup with XML files
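The “Did you mean… ?” feature boils down to finding indexed terms that are close to a misspelled query term. The Python sketch below illustrates that principle with a plain Levenshtein edit distance over a small, hypothetical vocabulary; Solr's own spellcheck components are configurable and more elaborate, so this is an illustration of the idea, not of Solr's implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def did_you_mean(term, vocabulary, max_distance=2):
    """Return the closest indexed term within max_distance edits, or None."""
    best, best_dist = None, max_distance + 1
    for candidate in sorted(vocabulary):  # sorted so ties resolve deterministically
        d = levenshtein(term.lower(), candidate)
        if d < best_dist:
            best, best_dist = candidate, d
    return best if best_dist <= max_distance else None


# Hypothetical vocabulary of terms already present in the index.
vocab = {"drupal", "solr", "search", "facet", "index"}
print(did_you_mean("serch", vocab))   # search
print(did_you_mean("drupla", vocab))  # drupal
```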
What’s next?
Now in version 5.0 (since February 2015)
Cluster-oriented with ZooKeeper & SolrCloud
Easier installation
Better admin UI
Solr is now a mature product and easier to handle
Drupal & Solr - How does it work?
[Diagram: the Drupal application (with its DB) exchanges data with the Apache Solr server (with its index) over HTTP POST/GET]
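The exchange in the diagram is plain HTTP: Drupal POSTs documents to Solr's update handler and GETs results from its select handler. A minimal Python sketch that only builds (does not send) both requests with the standard library; the host, port and core name `drupal_site` are assumptions, and the JSON update format assumes Solr 4.x or later.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local Solr with one core per application (hypothetical core name).
SOLR_CORE_URL = "http://localhost:8983/solr/drupal_site"


def build_update_request(docs):
    """POST a JSON array of documents to the core's /update handler and commit."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        SOLR_CORE_URL + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_select_request(text, filters=(), rows=10):
    """GET a search from the core's /select handler; each filter becomes an fq param."""
    params = [("q", text), ("wt", "json"), ("rows", str(rows))]
    params += [("fq", f) for f in filters]
    return Request(SOLR_CORE_URL + "/select?" + urlencode(params), method="GET")
```

Passing either request to `urllib.request.urlopen()` would send it to a running Solr instance.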
Drupal & Solr - How does it work?
Drupal sends content to Solr when cron runs.
Each new or updated piece of content is marked for indexing.
Deleted content is removed from the index when cron runs.
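The mark-then-index cycle can be modelled as a small queue. This is a toy Python sketch under stated assumptions: an in-memory dict stands in for the Solr index, `load_entity` is a hypothetical loader, and `batch_size` is an arbitrary per-run limit; it does not reproduce the Apache Solr Search module's own tracking tables.

```python
from dataclasses import dataclass, field


@dataclass
class IndexQueue:
    """Toy model of 'mark now, index on cron' (not the module's actual storage)."""
    pending: set = field(default_factory=set)   # entity ids to (re)index
    deleted: set = field(default_factory=set)   # entity ids to remove
    index: dict = field(default_factory=dict)   # stands in for the Solr index

    def mark_saved(self, entity_id):
        # Creating or updating content only flags it; nothing is sent yet.
        self.deleted.discard(entity_id)
        self.pending.add(entity_id)

    def mark_deleted(self, entity_id):
        self.pending.discard(entity_id)
        self.deleted.add(entity_id)

    def run_cron(self, load_entity, batch_size=50):
        """Send up to batch_size flagged entities, then apply all deletions."""
        batch = sorted(self.pending)[:batch_size]
        for entity_id in batch:
            self.index[entity_id] = load_entity(entity_id)
            self.pending.discard(entity_id)
        for entity_id in list(self.deleted):
            self.index.pop(entity_id, None)
        self.deleted.clear()
        return len(batch)
```

Bounding each cron run by a batch size means a large backlog is indexed over several runs instead of in one long request.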
Drupal & Solr - How does it work?
In Solr, a Document is the unit of search and indexing.
An index consists of one or more Documents, and a Document consists of one or more Fields.
Each Drupal entity is indexed as one Document.
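As a sketch of the entity-to-document mapping, the Python function below flattens one entity (represented as a plain dict) into one Solr document made of fields. The field names (`entity_type`, `label`, `content`, `tags`) and the `type/id` unique key are illustrative assumptions, not the module's exact schema.

```python
def entity_to_document(entity_type, entity):
    """Flatten one entity (a plain dict here) into one Solr document.

    Field names are illustrative assumptions, not the module's exact schema.
    """
    doc = {
        # One unique key per document: entity type + id avoids collisions
        # between, say, node 7 and user 7.
        "id": f"{entity_type}/{entity['id']}",
        "entity_type": entity_type,
        "entity_id": entity["id"],
        "bundle": entity.get("bundle", entity_type),
        "label": entity["title"],
        "content": entity.get("body", ""),
    }
    # Multi-valued data (e.g. tags) becomes one field holding a list of values.
    if entity.get("tags"):
        doc["tags"] = list(entity["tags"])
    return doc


print(entity_to_document("node", {"id": 7, "bundle": "article",
                                  "title": "Hello", "tags": ["drupal", "solr"]}))
```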
Drupal & Solr - How does it work?
Index & Analysis in Solr:
➩ Solr stores terms rather than whole pages and builds an inverted index
➩ All data goes through several transformations during the analysis phase
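A toy Python version of both steps: a small analysis chain (tokenize, lowercase, drop stopwords) followed by an inverted index mapping each term to the documents that contain it. The stopword list and sample documents are made up, and real Solr analysis chains are declared per field type in schema.xml.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "the", "of", "in"}  # assumed tiny stopword list


def analyze(text):
    """Toy analysis chain: tokenize, lowercase, drop stopwords.

    Real Solr chains usually add steps such as stemming or synonym expansion.
    """
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_inverted_index(docs):
    """Map each term to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """AND query: ids of documents containing every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {1: "Drupal and Solr", 2: "The Solr index", 3: "Drupal modules"}
idx = build_inverted_index(docs)
print(idx["solr"])                 # [1, 2]
print(search(idx, "DRUPAL solr"))  # [1]
```

Because the query goes through the same analysis as the documents, "DRUPAL" matches the indexed term "drupal".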
Drupal & Solr - Drupal’s side
One module: Apache Solr Search
Custom XML files for Solr’s configuration
Full entity support
Hooks for indexing, querying, displaying data
Many related modules to extend capabilities
Drupal & Solr - Drupal’s side
Drupal-specific fields for Solr
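The module's schema relies on Solr dynamic fields, whose name prefix encodes the value type and whether the field is single- or multi-valued (for instance `ss_` for a single string, `im_` for multiple integers). The Python helper below mimics that naming scheme for a simplified subset of types; the prefix table is an assumption, and the schema.xml shipped with the module remains the reference.

```python
# Simplified prefix table (assumption: a subset of the module's conventions).
TYPE_PREFIX = {
    "string": "s",
    "integer": "i",
    "boolean": "b",
    "date": "d",
    "text": "t",
}


def index_key(name, index_type="string", multiple=False):
    """Build a dynamic-field name: type letter + 's'/'m' + '_' + field name."""
    prefix = TYPE_PREFIX.get(index_type, "s")  # unknown types fall back to string
    return f"{prefix}{'m' if multiple else 's'}_{name}"


print(index_key("field_tags", "integer", multiple=True))  # im_field_tags
print(index_key("language"))                               # ss_language
```

Encoding the type in the name lets one `*`-pattern rule in schema.xml cover any number of Drupal fields without editing the schema for each new field.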
Drupal & Solr - Drupal’s side
Example of hook
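Drupal hooks are functions that modules implement and that are invoked at defined moments; the Apache Solr Search module exposes hooks such as `hook_apachesolr_index_document_build()` (change a document before indexing) and `hook_apachesolr_query_alter()` (change a query before it is sent). The Python sketch below only imitates that dispatch pattern; the registry, module names, hook names and field names are hypothetical, and it is not Drupal's API.

```python
from collections import defaultdict

_hooks = defaultdict(list)  # hook name -> list of (module, callback)


def implements(module, hook):
    """Decorator registering a module's implementation of a hook."""
    def register(func):
        _hooks[hook].append((module, func))
        return func
    return register


def invoke_all(hook, *args):
    """Call every implementation, sorted by module name for determinism."""
    for _module, func in sorted(_hooks[hook], key=lambda item: item[0]):
        func(*args)


@implements("mymodule", "index_document_build")
def add_company_size(document, entity):
    # Hypothetical: index a preprocessed value as an extra integer field.
    document["is_employees"] = int(entity["employees"])


@implements("othermodule", "query_alter")
def only_published(query):
    # Hypothetical: restrict every search to published content.
    query.setdefault("fq", []).append("bs_status:true")


doc = {"id": "node/1"}
invoke_all("index_document_build", doc, {"employees": "42"})
query = {"q": "solar"}
invoke_all("query_alter", query)
print(doc, query)
```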
Drupal & Solr - Solr’s side
Solr 4.x supported by Drupal
Multicore support (one core per application)
Minimal software dependencies
Drupal & Solr - Solr’s side
Solr Admin UI in 3.x
Drupal & Solr - Solr’s side
Solr Admin UI in 4.x
Drupal & Solr - Solr’s side
Requirements for Solr:
✓ Apache Solr
✓ Java (just the JRE)
✓ … And that’s all!
Solr comes with an embedded Jetty server.
Case Study n°1: La Lettre M
Context
✓ Business directory
➩ Thousands of companies to index
➩ Not only companies, but also their directors
➩ Financial data and number of employees
✓ External database
➩ Huge database with several update sources
✓ Unstructured data
➩ Unusable for faceted search
Solutions
✓ Preprocessing data
➩ Using a hook to format data
✓ Custom Solr field
➩ Created custom data to index in Solr
✓ Custom facets
➩ Added a custom facet for filtering
Solutions
Preprocessing data with a hook implementation
➩ Standardization of financial data
➩ Standardization of number of employees
✓ For faceted search, data was grouped into ranges
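A minimal Python sketch of that preprocessing: normalize a raw employee count, map it to a range label, and count documents per range as a facet would. The range boundaries and sample values are hypothetical, since the case study does not give them; in practice the label would be indexed in a string field so Solr can facet on it.

```python
from collections import Counter

# Hypothetical bucket boundaries; the case study does not give the real ones.
EMPLOYEE_RANGES = [(0, 9, "0-9"), (10, 49, "10-49"), (50, 249, "50-249")]
OPEN_ENDED = "250+"


def employee_range(raw):
    """Normalize a raw integer count (e.g. '1 200' or '35') into a range label.

    Assumes integer counts, possibly with thousands separators; returns None
    for values with no digits so they stay out of the facet.
    """
    digits = "".join(ch for ch in str(raw) if ch.isdigit())
    if not digits:
        return None
    count = int(digits)
    for low, high, label in EMPLOYEE_RANGES:
        if low <= count <= high:
            return label
    return OPEN_ENDED


companies = ["35", "1 200", "n/a", "7", "12"]
facet = Counter(r for r in map(employee_range, companies) if r is not None)
print(facet.most_common())
```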
Solutions
Add a custom indexable Solr field
➩ Hooks
➩ Class functions
✓ New indexable data
Solutions
Create a custom facet for a better user experience
➩ Defining a new widget
➩ Using class inheritance
✓ Better user experience on the frontend
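Class inheritance keeps a custom widget small: only the behaviour that differs is overridden. The Python sketch below shows the idea with a hypothetical base widget that orders facet values by frequency and a range widget that orders them numerically; the class names are made up and this is not Drupal's widget API.

```python
class FacetWidget:
    """Base widget: render facet values as 'label (count)', most frequent first."""

    def __init__(self, counts):
        self.counts = dict(counts)

    def sort_items(self):
        # Highest count first; ties broken alphabetically.
        return sorted(self.counts.items(), key=lambda item: (-item[1], item[0]))

    def render(self):
        return [f"{label} ({count})" for label, count in self.sort_items()]


class RangeFacetWidget(FacetWidget):
    """Range widget: only the ordering changes; ranges appear in numeric order."""

    def sort_items(self):
        def lower_bound(label):
            # '10-49' -> 10, '250+' -> 250
            return int(label.rstrip("+").split("-")[0])
        return sorted(self.counts.items(), key=lambda item: lower_bound(item[0]))


counts = {"10-49": 2, "250+": 1, "0-9": 1}
print(FacetWidget(counts).render())       # ['10-49 (2)', '0-9 (1)', '250+ (1)']
print(RangeFacetWidget(counts).render())  # ['0-9 (1)', '10-49 (2)', '250+ (1)']
```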
Case Study n°2: Eramet
Context
✓ International listed company with official documents
➩ Need to publish financial and official reports
➩ Create public charters and policy documents
➩ Strategic and essential data in documents
➩ French and English documents
Solutions
✓ Extract and index data from documents
➩ Using Apache Tika as a dependency
✓ Distinction between French and English documents
➩ Use of Drupal’s File API and i18n functionality
Solutions
Extract data from documents with Tika
➩ Extract metadata and text
➩ Extracted data is added to the index
➩ Tika is called through Solr’s XML configuration file
✓ Easy to increase search capability
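In Solr, Tika is typically reached through the ExtractingRequestHandler (Solr Cell), declared in solrconfig.xml and commonly mapped to `/update/extract`; `literal.<field>` parameters attach extra values to the document and `extractOnly=true` returns the extracted content without indexing it. The Python sketch below builds such a request without sending it; the core URL, document id and `language` field are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request

SOLR_CORE_URL = "http://localhost:8983/solr/drupal_site"  # hypothetical core


def build_extract_request(file_bytes, content_type, doc_id, language,
                          extract_only=False):
    """POST a binary file to Solr Cell so Tika extracts its text and metadata."""
    params = {
        "literal.id": doc_id,          # unique key chosen by the caller
        "literal.language": language,  # hypothetical extra field in the schema
    }
    if extract_only:
        # Return Tika's output without adding anything to the index.
        params["extractOnly"] = "true"
    else:
        params["commit"] = "true"
    url = f"{SOLR_CORE_URL}/update/extract?{urlencode(params)}"
    return Request(url, data=file_bytes,
                   headers={"Content-Type": content_type}, method="POST")


req = build_extract_request(b"%PDF-1.4 ...", "application/pdf", "file/12", "fr")
print(req.full_url)
```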
Solutions
Separation of files based on language
➩ Creation of an indexable entity
➩ Add language as an indexable field
✓ Default behavior of the Apache Solr Attachments module
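Once the language is an indexed field, separating French and English documents becomes a filter query at search time. A short Python sketch, assuming a hypothetical `language` field holding `fr` or `en`; the fallback for other interface languages is also an assumption.

```python
SUPPORTED_LANGUAGES = ("fr", "en")  # the two document languages in the case study


def language_filter(ui_language, field="language"):
    """Return an fq value restricting results to the visitor's language.

    Falls back to all supported languages when the interface language is not
    one of them. The field name is hypothetical, not the module's schema.
    """
    if ui_language in SUPPORTED_LANGUAGES:
        return f"{field}:{ui_language}"
    return f"{field}:({' OR '.join(SUPPORTED_LANGUAGES)})"


print(language_filter("fr"))  # language:fr
print(language_filter("de"))  # language:(fr OR en)
```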
Interlude
What is Tika?
A content analysis toolkit
Supported by the Apache Software Foundation
Open source
Written in Java
Interlude
Supported formats (non-exhaustive list)
Case Study n°3: GFM
Context
✓ B2B directory
➩ Thousands of entries to index
➩ Cross-referencing capabilities
➩ Sticky and highlighted entries
✓ Migration context
➩ From SQL Server to Solr
✓ Unstructured data
➩ Old database with different maintainers
Solutions
✓ Preprocessing data
➩ Using a hook to format data
✓ Dual Solr index
➩ One for sticky entries, one for standard entries
✓ Usage of taxonomy
➩ Categorized content for cross-referencing
Solutions
Data standardization for search purposes
➩ Storing the number of entries
➩ Managing data updates
✓ Volume and recency as search criteria
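One way to turn recency and volume into ranking criteria is a boost function. The Python sketch reproduces the formula of Solr's `recip(x,m,a,b)` function query, a/(m·x+b), and multiplies it by a logarithm of the entry count; that combination is a hypothetical choice, not the one used in the case study.

```python
import math

MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # about 3.16e10 ms, so m is about 3.16e-11


def recip(x, m, a, b):
    """Same formula as Solr's recip(x,m,a,b) function query: a / (m*x + b)."""
    return a / (m * x + b)


def boost(age_ms, entry_count):
    """Hypothetical combined boost: recent entries and larger volumes rank higher."""
    recency = recip(age_ms, 1 / MS_PER_YEAR, 1.0, 1.0)  # 1.0 today, 0.5 after a year
    volume = math.log1p(entry_count)                    # dampens very large counts
    return recency * volume


print(boost(0, 100), boost(MS_PER_YEAR, 100))
```

In Solr itself, a comparable recency boost is usually written as a function query such as `recip(ms(NOW,<date field>),3.16e-11,1,1)` and passed, for example, through the `bf` parameter of the dismax or edismax parser.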
Solutions
Create specific indexes
➩ Separate Solr queries to limit the results
➩ Maintain a display counter
✓ One query per index, then combine the results
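Combining the two indexes can be sketched as follows: take a limited number of sticky hits, put them first, then append the standard hits that were not already promoted. Here the display counter rotates sticky entries so each one gets exposure, which is an assumption about its purpose; the limit and ids are hypothetical.

```python
from collections import Counter


def combine_results(sticky_hits, standard_hits, display_counter, sticky_limit=2):
    """Merge the results of the two queries: limited sticky entries first, then the rest."""
    # Prefer the sticky entries shown least often so far (ties keep Solr's order,
    # since sorted() is stable).
    ranked = sorted(sticky_hits, key=lambda doc_id: display_counter[doc_id])
    sticky = ranked[:sticky_limit]
    display_counter.update(sticky)
    # Standard results follow, without repeating promoted entries.
    promoted = set(sticky)
    rest = [doc_id for doc_id in standard_hits if doc_id not in promoted]
    return sticky + rest


counter = Counter()
print(combine_results(["s1", "s2", "s3"], ["a", "s1", "b"], counter))  # ['s1', 's2', 'a', 'b']
print(combine_results(["s1", "s2", "s3"], ["a", "s1", "b"], counter))  # ['s3', 's1', 'a', 'b']
```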
Solutions
Turn taxonomy results into search results
➩ Create a query on Solr
✓ Fully transparent for the user
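Serving a taxonomy listing through Solr amounts to building a query filtered on the term. The Python sketch returns Solr select parameters for a term and its direct children; the field name `im_field_category` and the children mapping are hypothetical.

```python
from urllib.parse import urlencode


def taxonomy_query(term_id, children, field="im_field_category", rows=20):
    """Turn a taxonomy term listing into Solr select parameters.

    The term and its direct child terms are ORed in one filter query, so the
    listing also includes content tagged one level deeper in the vocabulary.
    """
    tids = [term_id] + list(children.get(term_id, []))
    return {
        "q": "*:*",  # match everything, then narrow with the filter query
        "fq": f"{field}:({' OR '.join(str(t) for t in tids)})",
        "rows": rows,
        "wt": "json",
    }


params = taxonomy_query(5, {5: [12, 13]})
print(urlencode(params))
```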
Other applications
Training catalog
Search by filtering sessions by topic and/or date
Product catalog
Fine-grained search based on various attributes and scope
Video database
Catch-up TV
User directory
Filtering by function, location...
Other solutions
Open source / Full-text search / Written in C++
Open source / Rich document parsing / Written in C++
Open source / Full-text search / RESTful API
Other solutions
Module available for Drupal
Less popular than Solr
Elasticsearch as an outsider
Questions?
Thank you!