Apache Solr: beyond the search page, from DrupalCamp London 2014

Post on 30-Nov-2014

A talk on how to modify a default install of Drupal's Apache Solr integration module to provide a customised search experience. http://2014.drupalcamplondon.co.uk/drupalcamp-london-2014/session/apache-solr-beyond-search-page

Apache Solr: Beyond the Search Page.

Rupert Jabelman
drupal.org/IRC: rupertj
Twitter: @rupertjabelman

Who are you again?

I work for Aroq - an online publisher.

We provide daily information in 4 industries: Auto, Food, Drink, Clothing.

Auto is where I spend most of my time, working on QUBE: a site for research and intelligence, rather than news. QUBE uses Solr extensively, with around 65,000 documents in its index.

What’s this about then?

How to use Solr for things that aren’t just the bog standard search page:

1. Content admin page

2. Data visualisation

You can use these techniques to enhance normal search pages too.

A (mostly) bog standard search page.

An admin-focused search

Data visualisation

The Lie.

Differentiating your search pages.

Specialised image search

So what is Solr anyway?

Solr is a web service

And a request looks like this:

start=0&rows=20&&spellcheck=true&q=&fl=id%2Centity_id%2Centity_type%2Cbundle%2Cbundle_name%2Clabel%2Css_language%2Cis_comment_count%2Cds_created%2Cds_changed%2Cscore%2Cpath%2Curl%2Cis_uid%2Ctos_name%2Cteaser%2Czm_parent_entity%2Css_filemime%2Css_file_entity_title%2Css_file_entity_url%2Csm_vid_Segment%2Cim_segment%2Cds_production_dates%2Cds_production_dates_end%2Css_platform_reference%2Css_platform_name%2Css_group_reference%2Css_group_name%2Css_make_reference%2Css_make_name%2Csm_model_name_reference%2Csm_model_name_name%2Csm_parent_model_reference%2Csm_parent_model_name%2Css_codename%2Csm_plant_reference%2Csm_plant_name%2Csm_vid_Production_status%2Cds_next_product_action_date%2Cim_next_product_action%2Csm_vid_Product_action&mm=1&mm=100%25&pf=content%5E2.0&ps=15&hl=true&hl.fl=content&hl.snippets=3&hl.mergeContigious=true&f.content.hl.alternateField=teaser&f.content.hl.maxAlternateFieldLength=256&qf=content%5E40&qf=label%5E5.0&qf=tags_h2_h3%5E3.0&qf=tags_h4_h5_h6%5E2.0&qf=tags_inline%5E1.0&qf=taxonomy_names%5E2.0&qf=tos_name%5E3.0&qf=ts_comments%5E20&facet=true&facet.sort=count&facet.mincount=1&facet.field=im_segment&facet.field=im_country&facet.field=im_production_status&facet.field=sm_plant_reference&facet.field=sm_group_reference&facet.field=%7B%21ex%3Dsm_make_reference%7Dsm_make_reference&facet.field=sm_model_name_reference&facet.field=ss_platform_reference&facet.field=is_eop_year&facet.field=is_sop_year&facet.field=sm_revision_name_formatted&facet.field=is_owner&facet.field=im_field_priority&facet.field=im_field_company_reference&facet.field=im_field_article_type&facet.field=is_uid&facet.field=im_field_intelligence_sector&facet.field=im_field_theme&facet.field=bundle&f.im_segment.facet.limit=-1&f.im_segment.facet.mincount=1&f.im_country.facet.limit=-1&f.im_country.facet.mincount=1&f.im_production_status.facet.limit=50&f.im_production_status.facet.mincount=1&f.sm_plant_reference.facet.limit=100&f.sm_plant_reference.facet.mincount=1&f.sm_group_reference.facet.limit=50&f.sm_group_reference.facet.mincount=1&f.sm_make_reference.facet.limit=50&f.sm_make_reference.facet.mincount=1&f.sm_model_name_reference.facet.limit=50&f.sm_model_name_reference.facet.mincount=1&f.ss_platform_reference.facet.limit=100&f.ss_platform_reference.facet.mincount=1&f.is_eop_year.facet.limit=50&f.is_eop_year.facet.mincount=1&f.is_sop_year.facet.limit=50&f.is_sop_year.facet.mincount=1&f.sm_revision_name_formatted.facet.limit=50&f.sm_revision_name_formatted.facet.mincount=1&f.is_owner.facet.limit=50&f.is_owner.facet.mincount=1&f.im_field_priority.facet.limit=50&f.im_field_priority.facet.mincount=1&f.im_field_company_reference.facet.limit=50&f.im_field_company_reference.facet.mincount=1&f.im_field_article_type.facet.limit=50&f.im_field_article_type.facet.mincount=1&facet.date=ds_created&facet.date=ds_changed&f.ds_created.facet.date.start=1997-01-01T00%3A00%3A00Z%2FYEAR&f.ds_created.facet.date.end=2014-01-01T00%3A00%3A00Z%2B1YEAR%2FYEAR&f.ds_created.facet.date.gap=%2B1YEAR&f.ds_created.facet.limit=50&f.is_uid.facet.limit=50&f.is_uid.facet.mincount=1&f.ds_changed.facet.date.start=2009-01-01T00%3A00%3A00Z%2FYEAR&f.ds_changed.facet.date.end=2014-01-01T00%3A00%3A00Z%2B1YEAR%2FYEAR&f.ds_changed.facet.date.gap=%2B1YEAR&f.ds_changed.facet.limit=50&f.im_field_intelligence_sector.facet.limit=50&f.im_field_intelligence_sector.facet.mincount=1&f.im_field_theme.facet.limit=50&f.im_field_theme.facet.mincount=1&f.bundle.facet.limit=50&f.bundle.facet.mincount=1&sort=ds_changed%20desc&q.alt=%28entity_type%3Aqube_entity%29%20%28bundle%3Aproduction_run%29%20%28im_production_status%3A1167%29&wt=json&json.nl=map

Solr is a web service

And a (helpfully formatted) request looks like this:

q = Toyota
fq = entity_type:qube_entity
fq = bundle:production_run
fq = ss_platform_reference:"qube_entity:3505"
fl = id, entity_id, entity_type, bundle, bundle_name, label, ss_language, is_comment_count, ds_created, ds_changed, score, path, url, is_uid, tos_name, teaser, zm_parent_entity, ss_filemime, ss_file_entity_title, ss_file_entity_url, sm_vid_Segment, im_segment, ds_production_dates, ds_production_dates_end, ss_platform_reference, ss_platform_name, ss_group_reference, ss_group_name, ss_make_reference, ss_make_name, sm_model_name_reference, sm_model_name_name, sm_parent_model_reference, sm_parent_model_name, ss_codename, sm_plant_reference, sm_plant_name, sm_vid_Production_status, ds_next_product_action_date, im_next_product_action, sm_vid_Product_action
mm = 100%
start = 0
rows = 20
spellcheck = true
pf = content^2.0
ps = 15
hl = true
hl.fl = content
hl.snippets = 3
hl.mergeContigious = true
qf = content^40
qf = label^5.0
qf = tags_h2_h3^3.0
qf = tags_h4_h5_h6^2.0
qf = tags_inline^1.0
qf = taxonomy_names^2.0
qf = tos_name^3.0
qf = ts_comments^20
sort = ds_changed desc
wt = json
json.nl = map
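A readable listing like this can be produced from any raw Solr query string by splitting on `&` and URL-decoding each pair. The sketch below is one way to do that in plain PHP. The function name `format_solr_request()` and the shortened sample string are made up for illustration and are not part of the apachesolr module. PHP's built-in `parse_str()` is avoided on purpose, because it keeps only the last value of a repeated parameter such as `fq` or `qf`.

```php
<?php
/**
 * Hypothetical helper: turn a raw Solr query string into "name = value" lines.
 *
 * Repeated parameters (fq, qf, facet.field, ...) are kept as separate lines
 * rather than being collapsed into one value.
 */
function format_solr_request(string $query_string): array {
  $lines = [];
  foreach (explode('&', $query_string) as $pair) {
    // Skip empty pairs, which appear when the string contains "&&".
    if ($pair === '') {
      continue;
    }
    // Split on the first "=" only; a parameter without "=" gets an empty value.
    [$name, $value] = array_pad(explode('=', $pair, 2), 2, '');
    $lines[] = rawurldecode($name) . ' = ' . rawurldecode($value);
  }
  return $lines;
}

// Shortened sample request, invented for this example.
$raw = 'start=0&rows=20&&q=Toyota&fq=bundle%3Aproduction_run'
  . '&qf=content%5E40&qf=label%5E5.0&sort=ds_changed%20desc';

echo implode("\n", format_solr_request($raw)), "\n";
```

Pasting a real request from the Solr log into `$raw` is a quick way to see exactly which fields, boosts and facets a search page is asking for.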

Dynamic Fields

Dynamic fields are used so that any field you’ve added through Field API can be shown in Solr without changing the schema. These have their properties defined by their prefix.

For example:

• im_foo => integer, multi-valued, e.g. taxonomy term IDs.

• sm_bar => string, multi-valued, e.g. taxonomy term names.

• bs_grill => boolean, single, e.g. checkbox fields.

Most fields are added for you, but you can always add your own. In fact, you’ll probably have to.
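The prefix convention can be read mechanically: the first letter gives the type and the second says whether the field is single (`s`) or multi-valued (`m`). Here is a minimal sketch in plain PHP, assuming only the three type letters from the examples above. The function `describe_dynamic_field()` is hypothetical, and other prefixes that appear in the request earlier (such as `ds_`, `ts_` or `tos_`) are deliberately not covered.

```php
<?php
/**
 * Hypothetical helper: describe a dynamic field name by its prefix.
 *
 * Returns NULL for names that do not follow the two-letter prefix pattern
 * or that use a type letter this sketch does not know about.
 */
function describe_dynamic_field(string $field_name): ?array {
  // Illustrative subset only: the type letters shown in the slide examples.
  $types = ['i' => 'integer', 's' => 'string', 'b' => 'boolean'];

  if (!preg_match('/^([a-z])([sm])_(.+)$/', $field_name, $matches)) {
    return null;
  }
  [, $type_letter, $cardinality, $name] = $matches;
  if (!isset($types[$type_letter])) {
    return null;
  }
  return [
    'type' => $types[$type_letter],
    'multiple' => $cardinality === 'm',
    'name' => $name,
  ];
}

foreach (['im_foo', 'sm_bar', 'bs_grill', 'label'] as $field) {
  $info = describe_dynamic_field($field);
  if ($info === null) {
    echo $field, ": not a dynamic field known to this sketch\n";
    continue;
  }
  $cardinality = $info['multiple'] ? 'multi-valued' : 'single';
  echo $field, ': ', $info['type'], ', ', $cardinality, "\n";
}
```

When you add your own data, picking the right prefix is what tells Solr how to treat the value, so it is worth checking the schema that ships with the module for the full list.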

You’ll need to feed Solr more data

Adding extra data.

function hook_apachesolr_index_documents_alter(array &$documents, $entity, $entity_type, $env_id) {
  // Work on the first document built for this entity.
  $document = reset($documents);
  $document->setField('ss_foo', $entity->field_foo…);
}

Most field data is included for you already (with a few exceptions). Data you add here can be anything you like though.

Checking your work: Solr admin.

Modifying queries.

Add extra fields to the results:

function hook_apachesolr_query_alter($query) {
  $query->addParam('fl', 'sm_foo');
}

Changing the query fields:

function hook_apachesolr_query_alter($query) {
  $query->replaceParam('qf', array(
    'label^2.0',
    'content^1.0',
  ));
}

Modifying queries.

Changing the number of results to return:

function hook_apachesolr_query_alter($query) {
  // Solr doesn't have a value for "unlimited"
  // so we'll pass in a Very Large Number.
  $query->addParam('rows', 999999);
}

Changing the sort order (setting a default in this case):

function hook_apachesolr_query_alter($query) {
  // Only apply the default when the user hasn't chosen a sort
  // and there are no search keywords.
  if (!isset($_GET['solrsort']) && ($query->getParam('q') == '')) {
    $query->setSolrSort('ds_changed', 'desc');
  }
}

Adding sorts

Override SolrBaseQuery. The apachesolr_query_class variable controls which class is instantiated.

class MySolrBaseQuery extends SolrBaseQuery {
  protected function defaultSorts() {
    $sorts = parent::defaultSorts();
    // Add in core changed property. Missing by default.
    $sorts['ds_changed'] =
      array('title' => t('Updated Date'), 'default' => 'desc');
    // Return the extended list so the new sort is actually registered.
    return $sorts;
  }
}

apachesolr_sort module can be used to manage sorts:

• Choose a default sort & order.

• Enable / Disable the available sorts.

Bringing it all together.

Create a search page.

Apply the limits you need, e.g. entity type, bundles, etc.

Add the data you need to Solr documents.

Alter the query to add any additional fields you want to display.

Place any facet blocks you want.

Theme it.

An admin-focused search

Data visualisation

Swapping out theme functions.

function hook_apachesolr_search_page_alter(&$build, $search_page) {
  if ($search_page['page_id'] == 'pldb') {
    // If a timeseries view was requested, switch the theme function to one
    // we define that draws timeseries.
    if (qube_pldb_search_results_view() == 'timeseries') {
      $env_id = 'solr';
      if (apachesolr_has_searched($env_id)) {
        $query = apachesolr_current_query($env_id);
        if (qube_pldb_search_results_view_timeseries_allowed($query)) {
          $build['search_results']['#theme'] = 'search_results_timeseries';
        }
….

Fin.

Any questions?