2016-01 Lucene Solr spatial in 2015, NYC Meetup

Post on 16-Apr-2017

Lucene/Solr Spatial in 2015

David Smiley

Search Engineer/Consultant (Freelance)

About David Smiley

• Freelance Search Developer/Consultant
• Expert Lucene/Solr development skills, advice (consulting), training
• Java, spatial, and full-stack experience
• Apache Lucene/Solr committer & PMC member
• Primary author of “Apache Solr Enterprise Search Server”

More Spatial Contributors!

Spatial4j Lucene Solr

David Smiley ✔️ ✔️ ✔️

Ryan McKinley ✔️

Justin Deoliveira ✔️

Mike McCandless ✔️

Nick Knize ✔️

Karl Wright ✔️

Ishan Chattopadhyaya ✔️

Agenda

• New Features / Capabilities
• New Approaches
• Improvements
• Pending

Lucene’s Spatial Module

• Multiple approaches to index spatial data
  abstract class SpatialStrategy (5+ concrete implementations)
• RecursivePrefixTreeStrategy (RPT) is the most prominent and versatile
• Grid based
• Uses Spatial4j lib for shapes, distance calculations, and WKT
• Uses JTS Topology Suite lib for polygons

[Diagram: Shape; SpatialPrefixTree / Cell (Geohash | Quad); PrefixTreeStrategy; IntersectsPrefixTreeFilter, Contains…, Within…]

Topic: New Features

• Heatmaps / grid faceting: Lucene, Solr
• Surface-of-sphere shapes (Geo3d): Lucene
• Accurate indexed geometries: Lucene, Solr
• GeoJSON read/write: Spatial4j

Heatmaps: Spatial Grid Faceting

• Spatial density summary grid faceting, also useful for point-plotting search results
• Usually rendered with a gradient radius
• Lucene & Solr APIs
• Scalable & fast, usually…

v5.2

Heatmaps Under the Hood

• Requires a PrefixTreeStrategy Lucene field (grid based)
• The algorithm enumerates the underlying cells/terms and accumulates the counts in a corresponding grid
• Conceptually facet.method=enum for spatial
• Works on non-point indexed shapes too
• Complexity: O(cells * cellDepthFactor), not O(docs)
• No/low memory; mainly the grid of integers
• Solr will distribute to shards and merge
• Could be faster still; a BFS (vs DFS) layout would be perfect
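The accumulate-then-merge idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not Lucene's implementation: the class name `HeatmapSketch`, the `{col, row, docCount}` triples standing in for indexed cell terms, and the power-of-two grid levels are all hypothetical.

```java
// A simplified illustration of grid faceting; NOT Lucene's actual code.
// Each "cell term" is a hypothetical triple {col, row, docCount} on a fine grid
// with 2^fineLevel cells per side. Work is proportional to cells, not documents.
class HeatmapSketch {

    /** Rolls fine-grid cell counts up into a coarser grid with 2^gridLevel cells per side. */
    static int[][] accumulate(int[][] cellTerms, int fineLevel, int gridLevel) {
        if (gridLevel > fineLevel) {
            throw new IllegalArgumentException("gridLevel must not exceed fineLevel");
        }
        int size = 1 << gridLevel;
        int shift = fineLevel - gridLevel; // how many levels of detail to drop
        int[][] counts = new int[size][size];
        for (int[] term : cellTerms) {
            counts[term[1] >> shift][term[0] >> shift] += term[2];
        }
        return counts;
    }

    /** Merges two same-sized grids by summing them, as a distributed merge step might. */
    static int[][] merge(int[][] a, int[][] b) {
        int[][] out = new int[a.length][];
        for (int r = 0; r < a.length; r++) {
            out[r] = new int[a[r].length];
            for (int c = 0; c < a[r].length; c++) {
                out[r][c] = a[r][c] + b[r][c];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] shard = accumulate(new int[][] {{0, 0, 3}, {1, 1, 2}, {7, 7, 5}}, 3, 1);
        System.out.println(java.util.Arrays.deepToString(merge(shard, shard)));
    }
}
```

The only state held is the grid of integers, which is why memory use stays low regardless of how many documents match.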

Solr Heatmap Faceting

On an RPT field (SpatialRecursivePrefixTreeFieldType)

• prefixTree="packedQuad" (optional)
• Query:
    /select?facet=true
      &facet.heatmap=geo_rpt
      &facet.heatmap.geom=["-180 -90" TO "180 90"]
      &facet.heatmap.format=ints2D (or png)
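The query parameters on this slide need URL encoding when sent from code. A minimal sketch, assuming only the four parameters shown here; the class and method names are hypothetical and no Solr client library is used.

```java
// Assembles the heatmap request path from the slide's parameters.
// HeatmapQuerySketch and heatmapQuery are illustrative names, not part of Solr or SolrJ.
class HeatmapQuerySketch {

    static String heatmapQuery(String field, String geom, String format) {
        return "/select?facet=true"
                + "&facet.heatmap=" + enc(field)
                + "&facet.heatmap.geom=" + enc(geom)
                + "&facet.heatmap.format=" + enc(format);
    }

    // Form-encodes a parameter value (spaces become '+', quotes and brackets are escaped).
    private static String enc(String s) {
        return java.net.URLEncoder.encode(s, java.nio.charset.StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(heatmapQuery("geo_rpt", "[\"-180 -90\" TO \"180 90\"]", "ints2D"));
    }
}
```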

// Normal Solr response...
"facet_counts":{
  ... // facet response fields
  "facet_heatmaps":{
    "loc_srpt":[
      "gridLevel",2,
      "columns",32,
      "rows",32,
      "minX",-180.0,
      "maxX",180.0,
      "minY",-90.0,
      "maxY",90.0,
      "counts_ints2D", [null, null, [0, 0, ... ]]...
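A client usually wants a dense grid rather than the sparse counts_ints2D value. The sketch below assumes, as the sample response suggests, that a null row stands for a row of zeros; the class name and the already-parsed `List<List<Integer>>` input are hypothetical, and JSON parsing is left out.

```java
// Expands an already-parsed counts_ints2D value into a dense rows x columns grid.
// Assumption: a null row (or a null value overall) means "all zeros".
class HeatmapCountsSketch {

    static int[][] densify(java.util.List<java.util.List<Integer>> countsInts2D, int rows, int columns) {
        int[][] grid = new int[rows][columns]; // Java zero-fills new arrays
        if (countsInts2D == null) {
            return grid;
        }
        for (int r = 0; r < rows && r < countsInts2D.size(); r++) {
            java.util.List<Integer> row = countsInts2D.get(r);
            if (row == null) {
                continue; // leave the row at zero
            }
            for (int c = 0; c < columns && c < row.size(); c++) {
                grid[r][c] = row.get(c);
            }
        }
        return grid;
    }

    /** Sums every cell, for example to sanity-check against the number of matches. */
    static long total(int[][] grid) {
        long sum = 0;
        for (int[] row : grid) {
            for (int v : row) {
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        java.util.List<java.util.List<Integer>> raw =
                java.util.Arrays.asList(null, null, java.util.Arrays.asList(0, 4, 1));
        System.out.println(total(densify(raw, 4, 3)));
    }
}
```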

Solr Heatmap Resources

• Solr Ref guide: https://cwiki.apache.org/confluence/display/solr/Spatial+Search
• Jack Reed’s Tutorial: http://www.jack-reed.com/2015/06/29/visualizing-10-million-geonames-with-leaflet-solr-heatmap-facets.html
• Live Demo: http://worldwidegeoweb.com
• Open-source JavaScript Solr Heatmap Libraries:
  https://github.com/spacemansteve/SolrHeatmapLayer
  https://github.com/mejackreed/leaflet-solr-heatmap
  https://github.com/voyagersearch/leaflet-solr-heatmap

Geo3D: Shapes on the Surface of a Sphere

• … or an ellipsoid with configurable axes
• Not a general 3D space geometry lib
• Internally uses geocentric X, Y, Z coordinates (hence 3D) with 3D planar geometry mathematics
• Shapes: Point, Lat-Lon Rect, Circle, Polygons, Path (LineString) with optional buffer
• Distance computations: Arc (angular or surface), Linear (straight-line), Normal
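To make the geocentric X, Y, Z idea concrete, here is a small sketch of arc versus linear distance. It assumes a unit sphere rather than a configurable ellipsoid, and it does not use the Geo3D API; all names are hypothetical.

```java
// Illustrates geocentric coordinates on a unit sphere; not the Geo3D library.
class SphereDistanceSketch {

    /** Converts latitude/longitude in degrees to a unit vector {x, y, z}. */
    static double[] toUnitXyz(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    /** Arc (angular) distance in radians: the angle between the two unit vectors. */
    static double arcDistance(double[] a, double[] b) {
        double dot = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
        return Math.acos(Math.max(-1.0, Math.min(1.0, dot))); // clamp against rounding error
    }

    /** Linear (straight-line) distance: the chord through the sphere. */
    static double linearDistance(double[] a, double[] b) {
        double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    public static void main(String[] args) {
        double[] pole = toUnitXyz(90, 0);
        double[] equator = toUnitXyz(0, 0);
        System.out.println(arcDistance(pole, equator) + " vs " + linearDistance(pole, equator));
    }
}
```

Once points are unit vectors, comparisons such as "which side of a plane" reduce to dot products, which is one way to read the later claim about avoiding expensive trigonometry at query time.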

All 2D Maps of the Earth Distort Straight Lines

A straight bird-flies path from Anchorage to Miami doesn’t actually cross the ocean!
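The Anchorage to Miami claim can be checked with a little vector math. The sketch assumes a spherical Earth and approximate city coordinates (Anchorage about 61.22°N, 149.90°W; Miami about 25.76°N, 80.19°W); the class and method names are hypothetical.

```java
// Midpoint of the great-circle path between two points, assuming a spherical Earth.
class GreatCircleSketch {

    static double[] toUnitXyz(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        double cosLat = Math.cos(lat);
        return new double[] {cosLat * Math.cos(lon), cosLat * Math.sin(lon), Math.sin(lat)};
    }

    /** Returns {latDeg, lonDeg} of the midpoint; undefined for exactly antipodal points. */
    static double[] midpoint(double lat1, double lon1, double lat2, double lon2) {
        double[] a = toUnitXyz(lat1, lon1);
        double[] b = toUnitXyz(lat2, lon2);
        // The midpoint lies along the sum of the two unit vectors, pushed back onto the sphere.
        double x = a[0] + b[0], y = a[1] + b[1], z = a[2] + b[2];
        double norm = Math.sqrt(x * x + y * y + z * z);
        return new double[] {Math.toDegrees(Math.asin(z / norm)), Math.toDegrees(Math.atan2(y, x))};
    }

    public static void main(String[] args) {
        double[] mid = midpoint(61.22, -149.90, 25.76, -80.19);
        System.out.println(mid[0] + ", " + mid[1]);
    }
}
```

Under these assumptions the midpoint comes out near 48.5°N, 103°W, deep in the continental interior, whereas averaging the raw latitudes and longitudes gives a different point (about 43.5°N, 115°W).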

Geo3D, continued…

Benefits
• Inherently more accurate than 2D projected spatial, especially for big shapes or near poles
• Many computations are fast; no expensive trigonometry
• An alternative to JTS without the LGPL license (still)

Has its own Lucene module (spatial3d), thus its own jar file
• Maven groupId: org.apache.lucene, artifact: lucene-spatial3d

No Solr integration yet; pending more Spatial4j integration
• In progress!

Index & Search Geo3D Geometries

Spatial4j Geo3dShape wrapper with RPT
• In Lucene-spatial for now
• Index Geo3d shapes (limited to grid accuracy)
• Query by Geo3d shape
• Limited distance sort
• Heatmaps

Geo3DPointField & PointInGeo3DShapeQuery
• Based on a 3D BKD index
• In spatial3d module
• Index points-only
• Query by Geo3d shape
• No distance sort
• Leaner & faster than RPT?

v5.4 / v5.2

RPT/SpatialPrefixTrees and Accuracy

RecursivePrefixTree (RPT) uses Lucene’s index as a PrefixTree

Thus represents shapes as grid cells of varying precision by prefix

Example, a point shape: D, DR, DRT, DRT2, DRT2Y
• More accuracy scales

Example, a polygon shape: too many to list… 508 cells
• More accuracy does NOT scale
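The point example above (D, DR, DRT, …) is what a geohash-style prefix tree produces: each extra character narrows the cell, so a point costs one term per level. A minimal sketch using the standard public geohash algorithm rather than Lucene's SpatialPrefixTree classes; the class name is hypothetical, and it emits lowercase where the slide uses uppercase.

```java
// Standard geohash encoding plus prefix enumeration; independent of Lucene's classes.
class GeohashPrefixSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder sb = new StringBuilder();
        boolean lonTurn = true; // bits alternate, starting with longitude
        int bits = 0, ch = 0;
        while (sb.length() < length) {
            if (lonTurn) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; } else { ch <<= 1; lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; } else { ch <<= 1; latMax = mid; }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) { // five bits make one base-32 character
                sb.append(BASE32.charAt(ch));
                bits = 0;
                ch = 0;
            }
        }
        return sb.toString();
    }

    /** All prefixes of a cell, coarsest first: the terms a point would be indexed under. */
    static java.util.List<String> prefixes(String cell) {
        java.util.List<String> out = new java.util.ArrayList<>();
        for (int i = 1; i <= cell.length(); i++) {
            out.add(cell.substring(0, i));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(prefixes(encode(42.6, -5.6, 5)));
    }
}
```

A point needs only one cell per level, which is why extra precision is cheap; a polygon needs many cells along its boundary at every finer level, which is why it is not.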

Combining RPT with Serialized Geometry

• RPT (RecursivePrefixTreeStrategy) is the grid index (inaccurate)
• SDV (SerializedDVStrategy) stores serialized geometry (accurate)
• RPT + SDV → CompositeSpatialStrategy

• Accuracy & speed & smaller indexes
• Optimized intersects predicate avoids some geometry checks
• > 80% faster intersects queries, 75% smaller index

• Solr adapter: RptWithGeometrySpatialField
• Compatible with the Heatmaps feature
• Includes a shape cache (per-segment); configurable
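The idea of letting the grid answer the easy cases and consulting exact geometry only for the ambiguous ones can be sketched as follows. This is a deliberately simplified stand-in, not CompositeSpatialStrategy: it handles only points and a circular query on a flat plane, loops over documents instead of an index, and all names are hypothetical.

```java
// Two-phase "grid first, exact geometry second" matching for points in a circle.
// A simplified illustration only; not how Lucene's composite strategy is coded.
class TwoPhaseIntersectsSketch {

    /** Returns {matches, exactChecks} for points within radius r of (cx, cy). */
    static int[] search(double[][] points, double cellSize, double cx, double cy, double r) {
        int matches = 0, exactChecks = 0;
        double r2 = r * r;
        for (double[] p : points) {
            // Phase 1: reason about the grid cell only (the coarse, inaccurate index).
            double minX = Math.floor(p[0] / cellSize) * cellSize, maxX = minX + cellSize;
            double minY = Math.floor(p[1] / cellSize) * cellSize, maxY = minY + cellSize;
            double nearDx = Math.max(0, Math.max(minX - cx, cx - maxX));
            double nearDy = Math.max(0, Math.max(minY - cy, cy - maxY));
            if (nearDx * nearDx + nearDy * nearDy > r2) {
                continue; // cell is disjoint from the circle: reject without geometry
            }
            double farDx = Math.max(Math.abs(cx - minX), Math.abs(cx - maxX));
            double farDy = Math.max(Math.abs(cy - minY), Math.abs(cy - maxY));
            if (farDx * farDx + farDy * farDy <= r2) {
                matches++; // cell lies wholly inside the circle: accept without geometry
                continue;
            }
            // Phase 2: ambiguous boundary cell, so check the exact stored geometry.
            exactChecks++;
            double dx = p[0] - cx, dy = p[1] - cy;
            if (dx * dx + dy * dy <= r2) {
                matches++;
            }
        }
        return new int[] {matches, exactChecks};
    }

    public static void main(String[] args) {
        double[][] pts = {{0.5, 0.5}, {2.5, 0.5}, {9.5, 9.5}, {3.9, 0.1}};
        System.out.println(java.util.Arrays.toString(search(pts, 1.0, 0, 0, 3)));
    }
}
```

Only the points in boundary cells reach the exact check; everything else is decided from the grid alone, which is the kind of saving the optimized intersects predicate is after.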

v5.2

Topic: New Approaches

Lucene
• DimensionalValues (BKD Tree Indexes)
• GeoPointField

• New Lucene index type for numeric values
• Including multi-dimensional values!
• Old: IntField, FloatField etc.; trie indexing is now legacy
• New: DimensionalIntField, DimensionalFloatField, etc. with DimensionalRangeQuery, …

• Implemented using a BKD index
• Paper: https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
• Much faster and more compact than trie/prefix-tree based indexes

• Whither term auto-prefixing? LUCENE-5879 Defunct?
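To give a feel for why a tree of bounding boxes answers range queries quickly, here is a plain in-memory k-d tree with leaf blocks. It is a loose illustration only, not the BKD structure from the paper and not Lucene's implementation; the class name, the two-dimension limit, and the `leafSize` parameter (assumed to be at least 1) are all hypothetical.

```java
// In-memory 2D k-d tree with leaf blocks; a loose illustration, not Lucene's BKD index.
class KdBlockSketch {
    final double[] min = {Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY};
    final double[] max = {Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY};
    double[][] leafPoints; // non-null only for leaf blocks
    KdBlockSketch left, right;
    int size;

    /** Builds a subtree over pts[from, to); reorders that slice in place. leafSize must be >= 1. */
    static KdBlockSketch build(double[][] pts, int from, int to, int depth, int leafSize) {
        KdBlockSketch n = new KdBlockSketch();
        n.size = to - from;
        for (int i = from; i < to; i++) {
            for (int d = 0; d < 2; d++) {
                n.min[d] = Math.min(n.min[d], pts[i][d]);
                n.max[d] = Math.max(n.max[d], pts[i][d]);
            }
        }
        if (n.size <= leafSize) {
            n.leafPoints = java.util.Arrays.copyOfRange(pts, from, to);
            return n;
        }
        final int dim = depth % 2; // alternate the split dimension per level
        java.util.Arrays.sort(pts, from, to,
                java.util.Comparator.comparingDouble((double[] p) -> p[dim]));
        int mid = (from + to) >>> 1;
        n.left = build(pts, from, mid, depth + 1, leafSize);
        n.right = build(pts, mid, to, depth + 1, leafSize);
        return n;
    }

    /** Counts points inside the inclusive box [qMin, qMax]. */
    int count(double[] qMin, double[] qMax) {
        boolean inside = true;
        for (int d = 0; d < 2; d++) {
            if (max[d] < qMin[d] || min[d] > qMax[d]) return 0; // disjoint: prune subtree
            if (min[d] < qMin[d] || max[d] > qMax[d]) inside = false;
        }
        if (inside) return size; // box covers the whole subtree: no per-point work
        if (leafPoints != null) {
            int c = 0;
            for (double[] p : leafPoints) {
                if (p[0] >= qMin[0] && p[0] <= qMax[0] && p[1] >= qMin[1] && p[1] <= qMax[1]) c++;
            }
            return c;
        }
        return left.count(qMin, qMax) + right.count(qMin, qMax);
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {1, 1}, {2, 2}, {3, 3}, {4, 4}, {5, 5}, {6, 6}, {7, 7}};
        KdBlockSketch root = build(pts, 0, pts.length, 0, 2);
        System.out.println(root.count(new double[] {1.5, 1.5}, new double[] {5, 5}));
    }
}
```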

v6.0

DimensionalValues (BKD Index)

Multiple Fields/Queries using this:
• (1D) DimensionalIntField
• (2D) DimensionalLatLonField
• (3D) Geo3DPointField (previously described)
• And you can write your own

…continued

• Efficient range search on single/multi-valued numbers or terms
• Could be used for numbers, dates, IPv6 bytes, …
• Alternatives: LegacyIntField etc. (trie), DateRangeField (RPT)
• Would love to see a benchmark!

How-To:
• Dimensional___Field: Int, Long, Float, Double, Binary
• DimensionalRangeQuery (or DimensionalQuery?)

v5.3

DimensionalValues 1D

• Efficient 2D geospatial point index
• Alternative to RPT or GeoPointField
• In lucene-sandbox
• No Lucene-spatial module SpatialStrategy wrappers yet, thus no Spatial4j Shape integration nor Solr integration yet

How-To:
• Index: DimensionalLatLonField
• Query:
  DimensionalPointInBBoxQuery
  DimensionalPointInPolygonQuery
  point-radius (circle): in progress, LUCENE-6698

v5.3

DimensionalValues 2D: DimensionalLatLonField

Cool video: https://www.youtube.com/watch?v=x9WnzOvsGKs

GeoPointField

• 2D geospatial point field
• Indexed point-only data, single/multi-valued
• Spatial 2D Trie/PrefixTree terms index
• But not affiliated with Lucene-spatial SpatialPrefixTree/RPT
• Configurable 2^x grid size (defaults to 512)
• Compact bit-interleaved Z-order encoding
• Re-uses much of Lucene’s numeric precisionStep & MultiTermQuery logic
• 2-phase algorithm: grid/postings, then doc-values
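Bit-interleaved Z-order encoding packs two coordinates into one sortable value whose leading bits identify progressively smaller grid cells. The sketch below shows the general technique only; the 32-bit quantization, the choice of longitude for the even bits, and all names are assumptions rather than GeoPointField's actual encoding.

```java
// Generic Morton (Z-order) interleaving of two 32-bit values into a 64-bit long.
// Quantization and bit order are assumptions, not GeoPointField's real layout.
class ZOrderSketch {

    /** Spreads the low 32 bits of v so they occupy the even bit positions of a long. */
    static long spread(long v) {
        v &= 0xFFFFFFFFL;
        v = (v | (v << 16)) & 0x0000FFFF0000FFFFL;
        v = (v | (v << 8)) & 0x00FF00FF00FF00FFL;
        v = (v | (v << 4)) & 0x0F0F0F0F0F0F0F0FL;
        v = (v | (v << 2)) & 0x3333333333333333L;
        v = (v | (v << 1)) & 0x5555555555555555L;
        return v;
    }

    /** Interleaves two ints: 'even' fills bits 0, 2, 4, …; 'odd' fills bits 1, 3, 5, …. */
    static long interleave(int even, int odd) {
        return spread(even) | (spread(odd) << 1);
    }

    /** Scales a value in [min, max] to an unsigned 32-bit integer held in an int. */
    static int quantize(double value, double min, double max) {
        return (int) (long) ((value - min) / (max - min) * 4294967295.0);
    }

    static long encode(double lat, double lon) {
        return interleave(quantize(lon, -180, 180), quantize(lat, -90, 90));
    }

    public static void main(String[] args) {
        System.out.println(Long.toBinaryString(interleave(0b11, 0b00)));
        System.out.println(Long.toHexString(encode(40.7, -74.0)));
    }
}
```

Because the bits alternate between the two axes, truncating the encoded value to a shorter prefix yields the enclosing coarser cell, which is what lets a trie-style terms index be reused for 2D data.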

v5.3

…continued

• Has no affiliation with Spatial4j, RPT, JTS, or SpatialStrategy
• No Heatmaps, no custom Shape implementations
• No Solr support yet
• No dependencies
• Easy to use compared to RPT; simpler internally too

How-To:
• Index: doc.add(new GeoPointField(name, lon, lat, Store.YES));
• Query: GeoPointDistanceQuery (sphere only), GeoPointInBBoxQuery, GeoPointInPolygonQuery, or GeoPointDistanceRangeQuery

Cool video: https://www.youtube.com/watch?v=l2zB9TDUAL4

Topic: Some Pending Spatial TODOs

Spatial4j
• JTS-free polygon API (in progress)
• Geo3D adapter

Lucene
• FlexPrefixTree: LUCENE-4922
• Heatmap-optimized FlexPrefixTree (Breadth First Search layout)
• SpatialStrategy adapters for GeoPointField, DimensionalLatLonField, Geo3DPointField

Solr
• Better spatial Solr QParsers: SOLR-4242
• GeoJSON parsing
• More FieldType adapters for latest Lucene spatial
• Nearest-neighbor search
• DateRangeField faceting

That’s all for now; thanks for coming!

Need Lucene/Solr guidance or custom development?

Contact me!
• Email: dsmiley@apache.org
• LinkedIn: http://www.linkedin.com/in/davidwsmiley
• G+: +DavidSmiley
• Twitter: @DavidWSmiley