Drupal 7 and SolR

Drupal 7 + Solr Tools’ overview / Integration / Usage / Case studies

Transcript of Drupal 7 and SolR

Page 1: Drupal 7 and SolR

Drupal 7 + Solr

Tools’ overview / Integration / Usage / Case studies

Page 2: Drupal 7 and SolR

Introduction

Digital agency established in Mauritius in 2008

Recognized as one of the most expert offshore web agencies in Drupal

More than 150 projects in Drupal

Page 3: Drupal 7 and SolR

Introduction

Technical Director at Esokia

10 years of experience in PHP

7 years of experience in Drupal

Page 4: Drupal 7 and SolR

Introduction

Page 5: Drupal 7 and SolR

What is Drupal?

Free, community-built website development tool

Modular and extensible content management

Open source

Built on PHP

Created by Dries Buytaert

First release in January 2001

Page 6: Drupal 7 and SolR

Showcase

Page 7: Drupal 7 and SolR

Showcase

Page 8: Drupal 7 and SolR

Showcase

Page 9: Drupal 7 and SolR

A closer look

Page 10: Drupal 7 and SolR

Key concepts in Drupal

Flexibility / Simplicity / Utility

High standard of usability for developers, administrators, and users

Modularity / Extensibility / Maintainability

Slim and powerful core that can be readily extended through custom modules

Page 11: Drupal 7 and SolR

Drupal in the future?

Drupal 8 expected in October 2015 (approx.)

Big architectural changes

Built with

Page 12: Drupal 7 and SolR

What is Solr?

Standalone enterprise search server

Full-text search, faceted navigation

REST-like API

Open source

Written in Java

Supported by the Apache Software Foundation

First release in January 2007

Page 13: Drupal 7 and SolR

Main Features

Full-Text Search

Powerful matching capabilities including phrases, wildcards, joins, grouping...

Faceted Search

Slicing of data using a large array of faceting algorithms
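
As a rough illustration of how a full-text query and facets are requested together, here is a minimal sketch that only builds the URL of a Solr select request. The parameters `q`, `rows`, `wt`, `facet`, `facet.field` and `facet.mincount` are standard Solr query parameters; the host, the core name `drupal` and the facet field names are assumptions made for the example.

```python
from urllib.parse import urlencode


def build_select_url(base_url, keywords, facet_fields, rows=10):
    """Build a Solr /select URL for a keyword query with field facets."""
    params = [
        ("q", keywords),          # full-text query (phrases use double quotes)
        ("rows", rows),           # number of results to return
        ("wt", "json"),           # response format
        ("facet", "true"),        # turn faceting on
        ("facet.mincount", 1),    # hide facet values with no matching document
    ]
    # One facet.field parameter per field to slice the results on.
    params += [("facet.field", name) for name in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core URL and field names, for illustration only.
url = build_select_url(
    "http://localhost:8983/solr/drupal",
    '"annual report"',
    ["bundle", "ss_language"],
)
print(url)
```

Sending this URL with an HTTP GET would return the matching documents plus, for each facet field, the list of values and their counts.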

Page 14: Drupal 7 and SolR

Main Features

High Volume Traffic

Proven at large scale all over the world

Extensible Plugin Architecture

Well-defined extension points for indexing, analysis, request handling, query parsing...

Page 15: Drupal 7 and SolR

Main Features

Geospatial Search

Run location-based searches with built-in support for spatial search

Rich Document Parsing

Index rich content such as Adobe PDF, Microsoft Word and more

Page 16: Drupal 7 and SolR

Other features

Query suggestions

Providing suggestions to users as they type in their queries

Spell checking

“Did you mean… ?”

Highlighting

Help users to focus on their search

External configuration via XML

Adjust and extend setup with XML files

Page 17: Drupal 7 and SolR

What’s next?

Now in version 5.0 (since February 2015)

Cluster-oriented with ZooKeeper & SolrCloud

Easier installation

Better admin UI

Solr is now a mature product that is easier to handle

Page 18: Drupal 7 and SolR

Drupal & Solr - How does it work?

[Diagram: the Drupal application (with its DB) and the Apache Solr server (with its index) communicate over HTTP POST/GET]

Page 19: Drupal 7 and SolR

Drupal & Solr - How does it work?

Drupal sends content to Solr on each cron run.

Each new or updated piece of content is marked for indexing.

Deleted content is removed from the index on the next cron run.
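
The mark-then-sync cycle described on this slide can be sketched as follows. This is a simplified in-memory model, not the module's actual code: the class name, the two dictionaries standing in for the Drupal database and the Solr index, and the `batch_size` parameter are all assumptions made for illustration.

```python
class SearchIndexer:
    """Toy model of cron-driven indexing: changes are queued, cron applies them."""

    def __init__(self):
        self.database = {}     # stand-in for Drupal's content storage
        self.index = {}        # stand-in for the Solr index
        self.to_index = set()  # ids marked for (re)indexing
        self.to_delete = set() # ids to remove from the index

    def save(self, entity_id, text):
        """Create or update content and mark it for indexing."""
        self.database[entity_id] = text
        self.to_index.add(entity_id)
        self.to_delete.discard(entity_id)

    def delete(self, entity_id):
        """Delete content and queue its removal from the index."""
        self.database.pop(entity_id, None)
        self.to_index.discard(entity_id)
        self.to_delete.add(entity_id)

    def cron_run(self, batch_size=50):
        """Send one batch of marked content to the index and apply deletions."""
        batch = sorted(self.to_index)[:batch_size]
        for entity_id in batch:
            self.index[entity_id] = self.database[entity_id]
            self.to_index.discard(entity_id)
        for entity_id in self.to_delete:
            self.index.pop(entity_id, None)
        self.to_delete.clear()
        return len(batch)


indexer = SearchIndexer()
indexer.save(1, "Hello Solr")
indexer.save(2, "Drupal 7")
indexer.cron_run()

indexer.save(1, "Hello again")
indexer.delete(2)
stale = dict(indexer.index)  # the index is unchanged until cron runs again
indexer.cron_run()
print(stale, indexer.index)
```

The point of the sketch is the delay: between two cron runs the index can be out of date, which is why search results may lag behind recent edits.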

Page 20: Drupal 7 and SolR

Drupal & Solr - How does it work?

In Solr, a Document is the unit of search and index.

An index consists of one or more Documents. A Document consists of one or more Fields.

Each Drupal entity is a Document.
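
To make the Document/Field relationship concrete, the sketch below turns a Drupal-like entity into a flat set of fields and serializes a list of such documents as JSON, which is one of the formats Solr's update handler accepts. The entity structure and the field names (`id`, `entity_type`, `bundle`, `label`, `content`) are illustrative assumptions; the real field list depends on the Solr schema in use.

```python
import json


def entity_to_document(entity_type, entity):
    """Map one entity to one Solr document: a flat dict of field values."""
    return {
        "id": f"{entity_type}/{entity['id']}",  # unique key of the document
        "entity_type": entity_type,
        "bundle": entity["bundle"],
        "label": entity["title"],
        "content": entity["body"],
    }


def build_update_payload(documents):
    """Serialize a list of documents as a JSON array for an update request."""
    return json.dumps(documents)


# Hypothetical entity, for illustration only.
node = {"id": 42, "bundle": "article", "title": "Hello", "body": "Drupal meets Solr"}
document = entity_to_document("node", node)
payload = build_update_payload([document])
print(payload)
```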

Page 21: Drupal 7 and SolR

Drupal & Solr - How does it work?

Index & Analysis in Solr:

➩ Solr stores keywords instead of pages and builds an inverted index

➩ All data goes through many transformations during the analysis phase
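
The idea of an inverted index fed by an analysis phase can be shown in a few lines. This is a deliberately tiny sketch, not Solr's implementation: the analysis chain here is only lowercasing, tokenizing and stop-word removal, and the stop-word list is an assumption made for the example.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "and", "the", "of", "in", "is"}


def analyze(text):
    """Analysis phase: lowercase, split into tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOPWORDS]


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every term of the query."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


documents = {
    1: "Drupal is a content management tool",
    2: "Solr is a search server",
    3: "Searching Drupal content with Solr",
}
index = build_inverted_index(documents)
print(search(index, "the Drupal content"))
print(search(index, "Solr"))
```

Because the query goes through the same analysis as the indexed text, "the Drupal content" matches on "drupal" and "content" only. A real analysis chain would add further steps such as stemming, so that "search" and "searching" meet on the same term.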

Page 22: Drupal 7 and SolR

Drupal & Solr - Drupal’s side

One module: Apache Solr Search

Custom XML files for Solr’s configuration

Full entity support

Hooks for indexing, querying, displaying data

Many related modules to extend capabilities

Page 23: Drupal 7 and SolR

Drupal & Solr - Drupal’s side

Drupal’s specific fields for Solr

Page 24: Drupal 7 and SolR

Drupal & Solr - Drupal’s side

Example of hook

Page 25: Drupal 7 and SolR

Drupal & Solr - Drupal’s side

Example of hook

Page 26: Drupal 7 and SolR

Drupal & Solr - Drupal’s side

Example of hook

Page 27: Drupal 7 and SolR

Drupal & Solr - Solr’s side

Solr 4.x supported by Drupal

Multicore support (one core per application)

Least amount of software dependencies

Page 28: Drupal 7 and SolR

Drupal & Solr - Solr’s side

Solr Admin UI in 3.x

Page 29: Drupal 7 and SolR

Drupal & Solr - Solr’s side

Solr Admin UI in 4.x

Page 30: Drupal 7 and SolR

Drupal & Solr - Solr’s side

Requirements for Solr:

✓ Apache Solr

✓ Java (just the JRE)

✓ … And that’s all!

An embedded Jetty comes with Solr.

Page 31: Drupal 7 and SolR

Case Study n°1: La Lettre M

Page 32: Drupal 7 and SolR

Context

✓ Business directory

➩ Thousands of companies to index

➩ Not only companies, but also their directors

➩ Financial data and number of employees

✓ External database

➩ Huge database with different update sources

✓ Unstructured data

➩ Unusable for faceted search

Page 33: Drupal 7 and SolR

Solutions

✓ Preprocessing data

➩ Using hooks to format data

✓ Custom Solr field

➩ Creating custom data to index in Solr

✓ Custom facets

➩ Adding custom facets for filtering purposes

Page 34: Drupal 7 and SolR

Solutions

Preprocessing data with hook implementation

➩ Standardization of financial data

➩ Standardization of number of employees

✓ For faceted search, data was structured into ranges
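
The kind of preprocessing described here, turning free-form values into ranges that a facet can use, might look like the sketch below. The range boundaries, the labels and the raw input formats are assumptions made for the example; the actual project's rules are not given in the slides.

```python
import re
from bisect import bisect_right

# Hypothetical range boundaries and labels for the "number of employees" facet.
EMPLOYEE_BOUNDS = [10, 50, 250, 1000]
EMPLOYEE_LABELS = ["0-9", "10-49", "50-249", "250-999", "1000+"]


def parse_count(raw):
    """Extract an integer from a free-form value such as '1 200 employees'."""
    digits = re.sub(r"\D", "", raw)
    return int(digits) if digits else None


def employee_range(raw):
    """Standardize a raw value into one facet range label."""
    count = parse_count(raw)
    if count is None:
        return "unknown"
    # bisect_right gives the index of the first boundary above the count.
    return EMPLOYEE_LABELS[bisect_right(EMPLOYEE_BOUNDS, count)]


for raw in ["approx. 35", "1 200 employees", "10", "n/a"]:
    print(raw, "->", employee_range(raw))
```

The range label, rather than the raw number, is what gets indexed, so the facet shows a handful of stable choices instead of thousands of distinct values.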

Page 35: Drupal 7 and SolR

Solutions

Add custom indexable Solr field

➩ Hooks

➩ Class functions

✓ New indexable data

Page 36: Drupal 7 and SolR

Solutions

Create custom facet for user experience

➩ Defining new widget

➩ Use class inheritance

✓ Better user experience in frontend

Page 37: Drupal 7 and SolR

Case Study n°2: Eramet

Page 38: Drupal 7 and SolR

Context

✓ International listed company with official documents

➩ Need to publish financial and official reports

➩ Create public charter and policy documents

➩ Strategic and essential data in documents

➩ French and English documents

Page 39: Drupal 7 and SolR

Solutions

✓ Extract and index data from documents

➩ Using Apache Tika as a dependency

✓ Distinction between French and English documents

➩ Use of Drupal’s File API and i18n functionalities

Page 40: Drupal 7 and SolR

Solutions

Extract data from documents with Tika

➩ Extract metadata and text

➩ Extracted data is added to the index

➩ The Tika call is integrated into the Solr XML config file

✓ Easy to increase search capability
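
When Tika is wired into Solr's configuration, extraction is typically exposed through Solr's extracting request handler, conventionally mounted at `/update/extract`. The sketch below only builds the request URL for such a call, as one possible illustration of the setup on this slide. The `literal.*` and `commit` parameters belong to that handler; the host, the core name `drupal`, the document id and the `ss_language` field name are assumptions.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, language, commit=True):
    """Build the URL used to post a file to Solr for Tika-based extraction."""
    params = {
        "literal.id": doc_id,             # unique key given to the document
        "literal.ss_language": language,  # hypothetical language field
        "commit": "true" if commit else "false",
    }
    return f"{base_url}/update/extract?{urlencode(params)}"


# Hypothetical core URL and file id, for illustration only.
extract_url = build_extract_url("http://localhost:8983/solr/drupal", "file/17", "fr")
print(extract_url)
```

The file itself would be sent as the body of an HTTP POST to this URL. Solr then runs Tika on it and indexes the extracted text and metadata along with the literal fields.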

Page 41: Drupal 7 and SolR

Solutions

Separation of files based on language

➩ Creation of indexable entity

➩ Add language as an indexable field

✓ Default behavior of the Apache Solr Attachments module

Page 42: Drupal 7 and SolR

Interlude

What is Tika?

A content analysis toolkit

Supported by the Apache Software Foundation

Open source

Written in Java

Page 43: Drupal 7 and SolR

Interlude

Supported formats (non-exhaustive list)

Page 44: Drupal 7 and SolR

Case Study n°3: GFM

Page 45: Drupal 7 and SolR

Context

✓ B2B directory

➩ Thousands of entries to index

➩ Cross data capabilities

➩ Sticky and highlighted entries

✓ Migration context

➩ From SQL Server to Solr

✓ Unstructured data

➩ Old database with different maintainers

Page 46: Drupal 7 and SolR

Solutions

✓ Preprocessing data

➩ Using hooks to format data

✓ Dual Solr index

➩ One for sticky entries, one for standard entries

✓ Usage of taxonomy

➩ Categorized content for cross data

Page 47: Drupal 7 and SolR

Solutions

Data standardization for search purposes

➩ Storing the number of entries

➩ Managing data updates

✓ Volume and recency as search criteria

Page 48: Drupal 7 and SolR

Solutions

Create specific indexes

➩ Separate Solr queries to limit results

➩ Maintain a display counter

✓ One query per index and combining results
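
Combining the results of the two queries can be sketched as below. This assumes each query has already returned an ordered list of hits; the function name, the `max_sticky` and `limit` parameters and the hit structure are hypothetical, and the display counter mentioned on the slide is left out.

```python
def combine_results(sticky_hits, standard_hits, max_sticky=3, limit=10):
    """Put a limited number of sticky entries first, then standard entries."""
    top_sticky = sticky_hits[:max_sticky]
    seen = {hit["id"] for hit in top_sticky}
    # Skip standard entries that are already shown as sticky.
    rest = [hit for hit in standard_hits if hit["id"] not in seen]
    return (top_sticky + rest)[:limit]


# Hypothetical hits, as they might come back from the two indexes.
sticky = [{"id": "a"}, {"id": "b"}]
standard = [{"id": "b"}, {"id": "c"}, {"id": "d"}]

page = combine_results(sticky, standard)
print([hit["id"] for hit in page])
```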

Page 49: Drupal 7 and SolR

Solutions

Transform taxonomy results into search results

➩ Create a query on Solr

✓ Full transparency for the user

Page 50: Drupal 7 and SolR

Other applications

Training catalog

Search by filtering sessions by topic and/or date

Product catalog

Fine search based on various attributes and scope

Video database

Catch-up TV

User directory

Filtering by function, location...

Page 51: Drupal 7 and SolR

Other solutions

Open source / Full-text search / Written in C++

Open source / Rich document parsing / Written in C++

Open source / Full-text search / RESTful API

Page 52: Drupal 7 and SolR

Other solutions

Module available for Drupal

Less popular than Solr

Elasticsearch as an outsider

Page 53: Drupal 7 and SolR

Questions?

Page 54: Drupal 7 and SolR

Thank you!

Page 55: Drupal 7 and SolR