GeoMesa: Scalable Geospatial Analytics

Post on 10-Jul-2015

Chris Eichelberger

christopher.eichelberger@ccri.com

terms

• GeoMesa: an open-source project organized under LocationTech

• scalable: if you can continue to solve problems as N >> 1 with no more change than adding hardware and minor tweaks, you scale

• geospatial: data that contain a geographic reference, a date/time, and zero or more additional attributes

• analytics: formally, a logical decomposition via truth-preserving transformations; informally, any useful derivation (whether deductive or inductive)

outline

• part 1: why? ( 3 minutes)

• part 2: how? (10 minutes)

• part 3: what? (10 minutes)

• part 4: who? ( 2 minutes)

part 1: why?

[why] which X (points) are close to location Y?

• hundreds: PostgreSQL and brute force

– full table scan

• hundreds of thousands: PostgreSQL and PostGIS

– GeoTools API

– GiST (think R-trees)

• hundreds of millions: a funny thing happens as you collect much more data...
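For the smallest case above, the brute-force "full table scan" can be sketched in a few lines. This is only an illustration, not anything from PostgreSQL or GeoMesa: the function names, the 200 km radius, and the use of the haversine great-circle distance are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def points_near(points, lat, lon, radius_km):
    """Brute force: test every point, keep those within radius_km of (lat, lon)."""
    return [name for name, plat, plon in points
            if haversine_km(plat, plon, lat, lon) <= radius_km]

# Hypothetical sample data: (name, latitude, longitude) in decimal degrees.
CITIES = [
    ("Ottawa", 45.420833, -75.69),
    ("Montréal", 45.5, -73.566667),
    ("Charlottesville", 38.03, -78.478889),
]

nearby = points_near(CITIES, 45.420833, -75.69, 200.0)
```

Every query touches every row, so the cost grows linearly with N. That is fine for hundreds of points and is the reason an index becomes necessary well before hundreds of millions.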

[why] dissolution of large-volume data

[why] perhaps SQL is the bottleneck?

• NoSQL databases, such as Apache Accumulo

• trade ACID for distributed processing, storage

• but there’s no PostGIS for Accumulo, so how does the canonical diagram of an Accumulo (key, value) pair help us answer some simple questions...

[why] questions that ought to be easy for an index to answer

• easy question: Which comes first, “Ontario” or “Quebec”?

• similar question: Which comes first, [image] or [image]?

• simplify, and think only of representative cities, and think of them strictly as points

[why] geohashing

City                              Coordinates (courtesy Wikipedia)   Geohash
Ottawa                            45°25′15″N 75°41′24″W              f244m
Montréal                          45°30′N 73°34′W                    f25dv
Charlottesville (Virginia, USA)   38°1′48″N 78°28′44″W               dqb0q

• Two distinct orderings:

– Order by name: Charlottesville, Montréal, Ottawa

– Order by longitude or latitude or geohash: Charlottesville, Ottawa, Montréal

• Lexicoding a location into a geohash provides a deterministic, repeatable ordering

– with this, we can index, store, and query points by lexicographic ranges
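The idea can be sketched as follows: encode each point as a geohash, sort the resulting strings, and answer a spatial query with a lexicographic range scan over the sorted keys. The encoder below is the standard public geohash bit-interleaving scheme. The names `geohash` and `prefix_scan` and the in-memory sorted list are assumptions for illustration only, standing in for a sorted key-value store; this is not GeoMesa code.

```python
from bisect import bisect_left

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet

def geohash(lat, lon, length=5):
    """Encode (lat, lon) by alternately bisecting longitude and latitude."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, value, bits, use_lon = [], 0, 0, True
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = value * 2 + int(bit)
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bits = 0, 0
    return "".join(chars)

def prefix_scan(rows, prefix):
    """Return names whose geohash starts with prefix, via one range scan.

    rows must be a sorted list of (geohash, name) tuples. "~" sorts after
    every character in BASE32, so it serves as an exclusive upper bound.
    """
    lo = bisect_left(rows, (prefix,))
    hi = bisect_left(rows, (prefix + "~",))
    return [name for _, name in rows[lo:hi]]

# The three cities from the table, in decimal degrees.
cities = [
    ("Ottawa", 45.420833, -75.69),
    ("Montréal", 45.5, -73.566667),
    ("Charlottesville", 38.03, -78.478889),
]
rows = sorted((geohash(lat, lon), name) for name, lat, lon in cities)
in_f2_cell = prefix_scan(rows, "f2")
```

Sorting `rows` reproduces the geohash ordering from the slide, and the scan over the `f2` prefix returns Ottawa and Montréal without touching Charlottesville's key. A sorted store such as Accumulo can serve the same kind of contiguous key range.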

[why] build-versus-buy remorse

• PostgreSQL+PostGIS has some nice functions

– geometric predicates

– secondary indexes

– standard GeoTools API

• some of our data are (multi) lines, (multi) polygons

• time is often more than a secondary consideration

• sometimes, analysis work needn’t be done on the same old client

– distributed across the tablet servers?

– using tools like Spark?

– streaming?

[why] synthesis

part 2: how?

[how] GeoMesa features

• GeoTools API

• sharding distributes queries uniformly

• flexible SFC can incorporate time

• supports (multi) point, (multi) line, (multi) polygon geometries

• secondary indexes and a multi-stage query planner

• burgeoning raster support via WCS

• GeoServer as a plugin-based GUI

• WPS standards for computation (and function chaining)

[how] GeoTools API

[how] sharding

[how] space-filling curve progression

%~#s%3#r%0,3#gh%yyyyMM#d::%~#s%3,2#gh::%~#s%5,2#gh%HHmm#d%id
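One plausible reading of the format string above is that `::` separates the row, column family, and column qualifier of an Accumulo key, and that `~` is the separator within each part. On that reading, the row is a shard number followed by geohash characters 0-2 and a year-month date. The family is geohash characters 3-4. The qualifier is geohash characters 5-6 followed by an hour-minute time and the feature ID. The sketch below builds a key under exactly those assumptions. The function name, the CRC-based shard choice, the shard count of 4, and the 7-character geohash are all hypothetical, and none of this is GeoMesa's actual implementation.

```python
import zlib
from datetime import datetime

def sketch_key(feature_id, geohash7, when, shards=4):
    """Build (row, family, qualifier) following an ASSUMED reading of the schema.

    feature_id: string ID of the feature
    geohash7:   7-character geohash of the feature's location
    when:       datetime of the feature
    shards:     assumed number of shards; the ID's checksum picks one
    """
    shard = zlib.crc32(feature_id.encode("utf-8")) % shards
    row = "~".join([str(shard), geohash7[0:3], when.strftime("%Y%m")])
    family = geohash7[3:5]
    qualifier = "~".join([geohash7[5:7], when.strftime("%H%M"), feature_id])
    return row, family, qualifier

# "f244mkw" is a made-up 7-character geohash used only to show the slicing.
row, family, qualifier = sketch_key("feature-1", "f244mkw", datetime(2015, 7, 10, 14, 30))
```

Under this reading, the leading shard spreads writes and scans across tablet servers. The coarse geohash and coarse date come first in the row, while the finer geohash characters and the time of day are pushed into the family and qualifier.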

[how] multi-step query planning

[how] non-point geometries

[how] rasters + GeoWave integration

[how] supporting other frameworks

[how] GeoServer as a plug-in GUI

[how] Web Processing Service

• WPS is another OGC standard

• Think of it as an abstract function definition, mapping input types to output types, and defining the computation that occurs between the two.

• WPS processes can be chained.

• This provides for a natural extension mechanism to GeoMesa.
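The "function definition plus chaining" view can be illustrated with ordinary function composition. This is only an analogy in plain Python, not the WPS protocol or any GeoServer or GeoMesa API: `within_bbox`, `count`, and `chain` are hypothetical names invented for the sketch.

```python
from functools import reduce
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

def within_bbox(min_lon: float, min_lat: float,
                max_lon: float, max_lat: float) -> Callable[[Iterable[Point]], List[Point]]:
    """A 'process' mapping a collection of points to the subset inside a box."""
    def process(points: Iterable[Point]) -> List[Point]:
        return [(lon, lat) for lon, lat in points
                if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat]
    return process

def count(points: Iterable[Point]) -> int:
    """A 'process' mapping a collection of points to a number."""
    return sum(1 for _ in points)

def chain(*processes: Callable) -> Callable:
    """Feed the output of each process into the next, left to right."""
    return lambda value: reduce(lambda acc, process: process(acc), processes, value)

pipeline = chain(within_bbox(-80.0, 40.0, -70.0, 50.0), count)
result = pipeline([(-75.69, 45.42), (-73.57, 45.5), (-78.48, 38.03)])
```

Each step declares what it accepts and what it returns, so steps can be combined whenever one step's output type matches the next step's input type. That is the property that makes chained processes a natural extension point.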

[how] synthesis

Those are merely the highlights of some of GeoMesa’s current features…

… so what?

part 3: what?

[what] distributing computation

[what] queries that interpolate both position and time

[what] K-nearest neighbor

[what] clustering (DBSCAN)

[what] near-real-time streaming track analytics with web sockets

[what] track viewer utility

part 4: who?

[who] LocationTech and the greater community

[who] synthesis

questions

For extended questions:

geomesa-user@locationtech.org

geomesa@ccri.com

christopher.eichelberger@geomesa.org

For additional reading:

geomesa.org

For code:

github.com/locationtech/geomesa