GeoMesa: Scalable Geospatial Analytics

GeoMesa: Scalable Geospatial Analytics Chris Eichelberger [email protected]

Page 1: GeoMesa:  Scalable Geospatial Analytics

GeoMesa: Scalable Geospatial Analytics

Chris Eichelberger [email protected]

Page 2: GeoMesa:  Scalable Geospatial Analytics

terms

• GeoMesa: an open-source project organized under LocationTech

• scalable: if you can continue to solve problems as N >> 1 with no more change than adding hardware and minor tweaks, you scale

• geospatial: data that contain a geographic reference, a date/time, and zero or more additional attributes

• analytics: formally, a logical decomposition via truth-preserving transformations; informally, any useful derivation (whether deductive or inductive)

Page 3: GeoMesa:  Scalable Geospatial Analytics

outline

• part 1: why? (3 minutes)

• part 2: how? (10 minutes)

• part 3: what? (10 minutes)

• part 4: who? (2 minutes)

Page 4: GeoMesa:  Scalable Geospatial Analytics

part 1: why?

Page 5: GeoMesa:  Scalable Geospatial Analytics

[why] which X (points) are close to location Y?

• hundreds: PostgreSQL and brute force

– full table scan

• hundreds of thousands: PostgreSQL and PostGIS

– GeoTools API

– GiST (think R-trees)

• hundreds of millions: a funny thing happens as you collect much more data...
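The brute-force case at the top of this list can be sketched in a few lines of Python. This is a minimal illustration, not anything from GeoMesa or PostGIS: the function names, the sample coordinates (decimal-degree approximations), and the 200 km radius are all assumptions chosen for the example. Every point is tested against the query location, so the cost grows linearly with the number of points.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def points_near(points, lat, lon, radius_km):
    """Full scan: compare every stored point with the query location."""
    return [name for name, (plat, plon) in points.items()
            if haversine_km(plat, plon, lat, lon) <= radius_km]


# Hypothetical sample data: name -> (latitude, longitude) in decimal degrees.
POINTS = {
    "Ottawa": (45.4208, -75.6900),
    "Montréal": (45.5000, -73.5667),
    "Charlottesville": (38.0300, -78.4789),
}

# Which points lie within 200 km of Ottawa?
nearby = points_near(POINTS, 45.4208, -75.6900, 200.0)
```

A scan like this is fine for hundreds of points; the indexes discussed next exist to avoid touching every row.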

Page 6: GeoMesa:  Scalable Geospatial Analytics

[why] dissolution of large-volume data

Page 7: GeoMesa:  Scalable Geospatial Analytics

[why] perhaps SQL is the bottleneck?

• NoSQL databases, such as Apache Accumulo

• trade ACID for distributed processing, storage

• but there’s no PostGIS for Accumulo, so how does the canonical diagram of an Accumulo (key, value) pair help us answer some simple questions...

Page 8: GeoMesa:  Scalable Geospatial Analytics

[why] questions that ought to be easy for an index to answer

• easy question: Which comes first, “Ontario” or “Quebec”?

Page 9: GeoMesa:  Scalable Geospatial Analytics

[why] questions that ought to be easy for an index to answer

• easy question: Which comes first, “Ontario” or “Quebec”?

• similar question: Which comes first, [image] or [image]?

Page 10: GeoMesa:  Scalable Geospatial Analytics

[why] questions that ought to be easy for an index to answer

• easy question: Which comes first, “Ontario” or “Quebec”?

• similar question: Which comes first, [image] or [image]?

• simplify, and think only of representative cities, and think of them strictly as points

Page 11: GeoMesa:  Scalable Geospatial Analytics

[why] geohashing

Page 12: GeoMesa:  Scalable Geospatial Analytics

[why] geohashing

Page 13: GeoMesa:  Scalable Geospatial Analytics

[why] geohashing

City                              Coordinates (courtesy Wikipedia)   Geohash
Ottawa                            45°25′15″N 75°41′24″W              f244m
Montréal                          45°30′N 73°34′W                    f25dv
Charlottesville (Virginia, USA)   38°1′48″N 78°28′44″W               dqb0q

● Two distinct orderings:

○ Order by name: Charlottesville, Montréal, Ottawa

○ Order by longitude, latitude, or geohash: Charlottesville, Ottawa, Montréal

● Lexicoding a location as a geohash provides a deterministic, repeatable ordering

○ with this, we can index, store, and query points by lexicographic ranges
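The geohashes in the table can be reproduced with the standard geohash encoding: alternately bisect the longitude and latitude ranges, record one bit per step, and emit one base-32 character per five bits. The sketch below is a plain-Python illustration, not GeoMesa's implementation; the function names and the sorted-list "index" are assumptions made for the example, and the coordinates are decimal-degree roundings of the values above.

```python
from bisect import bisect_left, bisect_right

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=5):
    """Encode a point as a geohash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, value = [], 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | bit
        bits += 1
        use_lon = not use_lon
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


CITIES = {
    "Ottawa": (45.4208, -75.6900),
    "Montréal": (45.5000, -73.5667),
    "Charlottesville": (38.0300, -78.4789),
}

# A sorted list of (geohash, name) pairs stands in for a sorted key-value store.
index = sorted((geohash_encode(lat, lon), name)
               for name, (lat, lon) in CITIES.items())


def prefix_scan(index, prefix):
    """Return names whose geohash starts with `prefix`, via one range lookup."""
    keys = [gh for gh, _ in index]
    lo = bisect_left(keys, prefix)
    hi = bisect_right(keys, prefix + "~")  # "~" sorts after every geohash char
    return [name for _, name in index[lo:hi]]
```

Because nearby points tend to share a geohash prefix, a query such as `prefix_scan(index, "f2")` touches one contiguous key range rather than the whole table. Points on either side of a cell boundary can still have different prefixes, so a real query usually needs several ranges.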

Page 14: GeoMesa:  Scalable Geospatial Analytics

[why] build-versus-buy remorse

• PostgreSQL+PostGIS has some nice functions

– geometric predicates

– secondary indexes

– standard GeoTools API

• some of our data are (multi) lines, (multi) polygons

• time is often more than a secondary consideration

• sometimes, analysis work needn’t be done on the same old client

– distributed across the tablet servers?

– using tools like Spark?

– streaming?

Page 15: GeoMesa:  Scalable Geospatial Analytics

[why] synthesis

Page 16: GeoMesa:  Scalable Geospatial Analytics

part 2: how?

Page 17: GeoMesa:  Scalable Geospatial Analytics

[how] GeoMesa features

• GeoTools API

• sharding distributes queries uniformly

• flexible SFC can incorporate time

• supports (multi) point, (multi) line, (multi) polygon geometries

• secondary indexes and a multi-stage query planner

• burgeoning raster support via WCS

• GeoServer as a plugin-based GUI

• WPS standards for computation (and function chaining)

Page 18: GeoMesa:  Scalable Geospatial Analytics

[how] GeoTools API

Page 19: GeoMesa:  Scalable Geospatial Analytics

[how] sharding
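One common way to shard a sorted key-value store is to prefix each row key with a small shard number, so that writes and scans spread across servers instead of piling onto one key range. The sketch below illustrates that general idea only; the shard count, the CRC-based assignment, and the key layout are assumptions for the example and are not taken from the slides or from GeoMesa's code.

```python
import zlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(feature_id, num_shards=NUM_SHARDS):
    """Deterministically map a feature id to a shard number."""
    return zlib.crc32(feature_id.encode("utf-8")) % num_shards


def sharded_key(feature_id, geohash, sep="~"):
    """Prefix the spatial key with the shard so rows spread across servers."""
    return sep.join([str(shard_for(feature_id)), geohash, feature_id])


def scan_prefixes(geohash_prefix, num_shards=NUM_SHARDS, sep="~"):
    """A spatial prefix query fans out to one key prefix per shard."""
    return [sep.join([str(shard), geohash_prefix])
            for shard in range(num_shards)]
```

The trade-off is visible in `scan_prefixes`: every query becomes `NUM_SHARDS` smaller scans, which can run in parallel on different servers.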

Page 20: GeoMesa:  Scalable Geospatial Analytics

[how] space-filling curve progression

%~#s%3#r%0,3#gh%yyyyMM#d::%~#s%3,2#gh::%~#s%5,2#gh%HHmm#d%id
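The slide does not explain the format string above. One plausible reading, offered here strictly as an assumption, is that `::` separates the row, the column family, and the column qualifier; `%~#s` sets `~` as the separator; `%3#r` is a shard number; `%0,3#gh` means "geohash characters starting at offset 0, length 3"; and `%yyyyMM#d` and `%HHmm#d` are date formats. Under that reading, a key could be assembled as follows (the function name and sample values are hypothetical):

```python
from datetime import datetime, timezone


def build_key(shard, geohash, when, feature_id, sep="~"):
    """Split one record across row, column family, and column qualifier.

    Assumed layout (a reading of the slide, not a documented API):
      row              = shard ~ geohash[0:3] ~ yyyyMM
      column family    = geohash[3:5]
      column qualifier = geohash[5:7] ~ HHmm ~ id
    """
    if len(geohash) < 7:
        raise ValueError("this layout consumes seven geohash characters")
    row = sep.join([str(shard), geohash[0:3], when.strftime("%Y%m")])
    column_family = geohash[3:5]
    column_qualifier = sep.join(
        [geohash[5:7], when.strftime("%H%M"), feature_id])
    return row, column_family, column_qualifier


# The last two geohash characters here are made up for illustration.
example = build_key(2, "f244mxx",
                    datetime(2014, 3, 15, 9, 30, tzinfo=timezone.utc),
                    "id1")
```

The design point this illustrates is the interleaving: coarse space, then coarse time, then finer space, then finer time, so that both spatial and temporal constraints narrow the key range that must be scanned.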

Page 21: GeoMesa:  Scalable Geospatial Analytics

[how] multi-step query planning

Page 22: GeoMesa:  Scalable Geospatial Analytics

[how] multi-step query planning

Page 23: GeoMesa:  Scalable Geospatial Analytics

[how] non-point geometries

Page 24: GeoMesa:  Scalable Geospatial Analytics

[how] rasters + GeoWave integration

Page 25: GeoMesa:  Scalable Geospatial Analytics

[how] supporting other frameworks

Page 26: GeoMesa:  Scalable Geospatial Analytics

[how] GeoServer as a plug-in GUI

Page 27: GeoMesa:  Scalable Geospatial Analytics

[how] Web Processing Service

• WPS is another OGC standard

• Think of it as an abstract function definition, mapping input types to output types, and defining the computation that occurs between the two.

• WPS processes can be chained.

• This provides for a natural extension mechanism to GeoMesa.
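The "abstract function" view of WPS, and what chaining means, can be illustrated with ordinary function composition. This is an analogy in plain Python, not the WPS protocol or any GeoServer API; the process names and sample points are hypothetical.

```python
from functools import reduce


def chain(*processes):
    """Compose processes left to right: each output feeds the next input."""
    def run(value):
        return reduce(lambda acc, process: process(acc), processes, value)
    return run


def within_bbox(min_lat, min_lon, max_lat, max_lon):
    """Build a process that maps a list of points to the points inside a box."""
    def process(points):
        return [(lat, lon) for lat, lon in points
                if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon]
    return process


def count(points):
    """A process that maps a list of points to an integer."""
    return len(points)


# Hypothetical input: (latitude, longitude) pairs.
POINTS = [(45.42, -75.69), (45.50, -73.57), (38.03, -78.48)]

# Chain "filter by bounding box" into "count".
pipeline = chain(within_bbox(45.0, -76.0, 46.0, -73.0), count)
result = pipeline(POINTS)
```

Chaining works only when the output type of one process matches the input type of the next, which is why a standard that declares those types matters.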

Page 28: GeoMesa:  Scalable Geospatial Analytics

[how] synthesis

Those are merely the highlights of some of GeoMesa’s current features…

… so what?

Page 29: GeoMesa:  Scalable Geospatial Analytics

part 3: what?

Page 30: GeoMesa:  Scalable Geospatial Analytics

[what] distributing computation

Page 31: GeoMesa:  Scalable Geospatial Analytics

[what] queries that interpolate both position and time

Page 32: GeoMesa:  Scalable Geospatial Analytics

[what] K-nearest neighbor

Page 33: GeoMesa:  Scalable Geospatial Analytics

[what] clustering (DBSCAN)

Page 34: GeoMesa:  Scalable Geospatial Analytics

[what] near-real-time streaming track analytics with web sockets

Page 35: GeoMesa:  Scalable Geospatial Analytics

[what] track viewer utility

Page 36: GeoMesa:  Scalable Geospatial Analytics

part 4: who?

Page 37: GeoMesa:  Scalable Geospatial Analytics

[who] LocationTech and the greater community

Page 38: GeoMesa:  Scalable Geospatial Analytics

[who] synthesis

Page 39: GeoMesa:  Scalable Geospatial Analytics

questions

For extended questions:

[email protected]

[email protected]

[email protected]

For additional reading:

geomesa.org

For code:

github.com/locationtech/geomesa