Spatial functions in MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

Company Confidential. ©2010 Nokia Spatial functions in MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others Percona Live MySQL Conference and Expo 2013 Henrik Ingo Senior Performance Architect, Nokia (CC) 2013 Nokia. Please share and modify this presentation licensed with the Creative Commons Attribution license.

Transcript of Spatial functions in MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

Page 1: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others


Nokia Internal Use Only

Spatial functions in MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

Percona Live MySQL Conference and Expo 2013

Henrik Ingo, Senior Performance Architect, Nokia

(CC) 2013 Nokia. Please share and modify this presentation licensed with the Creative Commons Attribution license.

Page 2: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

(CC BY) 2013 Nokia 2

WHAT is GIS?

GIS is a lot of things.

The Open Geospatial Consortium defines lots of standards:

• http://www.opengeospatial.org/standards/sfs

The one we are talking about is:

OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 2: SQL option

Page 3: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

Is the world flat, or a sphere?

• Flat: GEOMETRY types
• Sphere: GEOGRAPHY types

Page 4: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others


It's neither!

But what about mountains and skyscrapers?

Page 5: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

Projections?

[Figure, shown twice: point A above point B, point C to the right of B]

Distance(A, B) = 0.0001 deg = 11 m
Distance(B, C) = 0.0001 deg = 8.5 m
(in Manhattan)

All the lines above are straight.
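The two distances differ because a degree of latitude has roughly the same length everywhere, while a degree of longitude shrinks with the cosine of the latitude. A minimal sketch of that arithmetic, assuming a spherical Earth with a mean radius of 6,371 km and a latitude of 40.72 for Manhattan; the function name is hypothetical:

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; spherical approximation


def step_in_metres(step_deg, lat_deg):
    """Return (north_south_m, east_west_m) for a step of step_deg degrees."""
    metres_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360.0
    north_south = step_deg * metres_per_degree
    # Meridians converge towards the poles, so east-west steps get shorter.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


ns, ew = step_in_metres(0.0001, 40.72)
print(round(ns, 1), round(ew, 1))
```

This gives roughly 11.1 m and 8.4 m, close to the 11 m and 8.5 m quoted on the slide.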

Page 6: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

Example SQL

POINT(0 0)
LINESTRING(0 0,1 1,1 2)
POLYGON((0 0,4 0,4 4,0 4,0 0),(1 1, 2 1, 2 2, 1 2,1 1))
...

INSERT INTO geotable ( the_geom, the_name )
  VALUES ( ST_GeomFromText('POINT(-126.4 45.32)', 312), 'A Place');

db=# SELECT road_id, ST_AsText(road_geom) AS geom, road_name FROM roads;

 road_id | geom                                    | road_name
---------+-----------------------------------------+------------
       1 | LINESTRING(191232 243118,191108 243242) | Jeff Rd
       2 | LINESTRING(189141 244158,189265 244817) | Geordie Rd
       3 | LINESTRING(192783 228138,192612 229814) | Paul St
       4 | LINESTRING(189412 252431,189631 259122) | Graeme Ave
       5 | LINESTRING(190131 224148,190871 228134) | Phil Tce
       6 | LINESTRING(198231 263418,198213 268322) | Dave Cres
       7 | LINESTRING(218421 284121,224123 241231) | Chris Way
(7 rows)

SELECT the_geom
FROM geom_table
WHERE ST_Distance(the_geom, ST_GeomFromText('POINT(100000 200000)')) < 100
  AND type = 'road';

See also: http://blog.mariadb.org/screencast-mariadb-gis-demo/
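The distance query on this slide keeps rows whose geometry lies within 100 units of a given point. For point geometries in planar coordinates that is plain Euclidean distance. A minimal Python sketch of the same filter, assuming WKT POINT values only; parse_point and within_distance are hypothetical helpers and not part of any database discussed here:

```python
import math
import re

# Matches strings such as "POINT(100000 200000)" or "POINT(-126.4 45.32)".
_POINT_RE = re.compile(
    r"\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*"
)


def parse_point(wkt):
    """Parse a WKT POINT string into an (x, y) tuple of floats."""
    match = _POINT_RE.fullmatch(wkt)
    if match is None:
        raise ValueError("not a WKT POINT: %r" % wkt)
    return float(match.group(1)), float(match.group(2))


def within_distance(rows, centre_wkt, max_distance):
    """Return names of rows closer than max_distance to the centre point.

    rows is an iterable of (name, wkt_point) pairs.
    """
    cx, cy = parse_point(centre_wkt)
    result = []
    for name, wkt in rows:
        x, y = parse_point(wkt)
        if math.hypot(x - cx, y - cy) < max_distance:
            result.append(name)
    return result


rows = [("near", "POINT(100050 200000)"), ("far", "POINT(100000 200200)")]
print(within_distance(rows, "POINT(100000 200000)", 100))
```

Unlike this loop, ST_Distance also handles lines and polygons.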

Page 7: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

Products that implement GIS

                             | PostgreSQL | MySQL & MariaDB | MongoDB | Solr | SQLite
-----------------------------+------------+-----------------+---------+------+-----------
Standard feature + Extension | PostGIS    | +               | +       | +    | Spatialite
Type: Point                  | +          | +               | +       | +    | +
Type: Geometry (x,y)         | +          | +               | *       | -    | +
Type: Geography (lat, lon)   | +          | -               | *       | -    | -
Type: 3D (ish)               | +          | -               | -       | -    | -
SRID projections             | +          | -               | *       | -    | +
Query by radius              | +          | ~               | +       | +    | ~
Precise decimal math         | -          | MariaDB         | -       | -    | -
Query by bounding box        | +          | +               | *       | -    | +

Notes:

• PostgreSQL: Most functions don't support Geography
• MySQL & MariaDB: MyISAM only
• MongoDB: WGS84 only
• Solr: Limited function set.
• SQLite: Indexes have to be explicitly JOINed

* Since MongoDB 2.4. This evaluation was done on v 2.0.

~ No, but you can query with bounding box (uses index) AND sort that result set by radius.
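The bounding-box workaround from the "~" footnote can be sketched in a few lines. This is an illustration in plain Python rather than SQL, assuming planar coordinates; query_by_radius is a hypothetical name. The bounding-box step stands in for the part a spatial index can answer, and the distance step runs on the small candidate set only:

```python
import math


def query_by_radius(points, x, y, radius):
    """Return ids of points within radius of (x, y), nearest first.

    points is an iterable of (point_id, px, py) tuples.
    """
    # Step 1: cheap bounding-box filter (what the spatial index would do).
    candidates = [
        p for p in points
        if abs(p[1] - x) <= radius and abs(p[2] - y) <= radius
    ]
    # Step 2: exact distance on the candidates, then sort by it.
    with_distance = [(math.hypot(p[1] - x, p[2] - y), p[0]) for p in candidates]
    return [pid for dist, pid in sorted(with_distance) if dist <= radius]


points = [("a", 0.9, 0.9), ("b", 0.1, 0.0), ("c", 3.0, 0.0), ("d", 0.0, -0.5)]
print(query_by_radius(points, 0.0, 0.0, 1.0))
```

Point "a" shows why the second step matters: it is inside the box but outside the circle.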

Page 8: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

Spatial use cases

-74.001417, 40.719811
Canal Street, New York, USA

• Geocoding (text search)
• Reverse Geocoding (GIS)
• Points-of-Interest

We are here

Page 9: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

Creating my data set

• Scan HERE.com with script: 40.48, -75.23 to 42.42, -73.38
  New York City + 4 neighbor states + Atlantic Ocean
• 0.0001 deg steps = 11 m vertically, 8.5 m horizontally
• 358M points, 9.6M unique locations
• 7 days
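The point count follows from the size of the scanned rectangle and the step size. A quick check, assuming one probe per grid cell and ignoring whether the end points are included (the scanning script itself is not shown in the slides); grid_points is a hypothetical helper:

```python
def grid_points(lat_min, lon_min, lat_max, lon_max, steps_per_degree=10000):
    """Number of probes in a regular grid; 10000 steps per degree = 0.0001 deg."""
    rows = round((lat_max - lat_min) * steps_per_degree)
    cols = round((lon_max - lon_min) * steps_per_degree)
    return rows * cols


total = grid_points(40.48, -75.23, 42.42, -73.38)
print(total)
```

That is 19,400 x 18,500 = 358.9 million probes, in line with the 358M points on the slide.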

Page 10: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

SELECT * FROM Location JOIN Point ON Location.id=Point.LocationId WHERE Location.id=1;

id           | 1
Label        | E Sawmill Rd, Haycock Twp, PA 18951, United States
Country      | USA
State        | PA
County       | Bucks
PostalCode   | 18951
City         | Haycock
District     |
Street       | E Sawmill Rd
HouseNumber  |
LocationType | street

Location to Point: 1:n

Page 11: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

Creating areas out of points

• GIS functions used: ST_Envelope(), ST_Union()
• Limitations in Geography type
• 12 days, bottlenecked by CPU
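The slides name ST_Envelope() and ST_Union() but do not show the procedure. As a rough illustration of the envelope idea only, here is what a bounding rectangle per location looks like in plain Python. The input layout (location id, x, y) and the function name are assumptions, and the real run was done inside the database:

```python
def envelopes(points):
    """Return {location_id: WKT polygon} with one bounding rectangle per location.

    points is an iterable of (location_id, x, y) tuples.
    """
    boxes = {}
    for loc, x, y in points:
        if loc in boxes:
            min_x, min_y, max_x, max_y = boxes[loc]
            boxes[loc] = (min(min_x, x), min(min_y, y),
                          max(max_x, x), max(max_y, y))
        else:
            boxes[loc] = (x, y, x, y)
    # Ring order: lower left, upper left, upper right, lower right, lower left.
    template = "POLYGON(({0} {1},{0} {3},{2} {3},{2} {1},{0} {1}))"
    return {loc: template.format(*box) for loc, box in boxes.items()}


areas = envelopes([(1, 0, 0), (1, 2, 1), (2, 5, 5)])
print(areas[1])
```

A location with a single point yields a degenerate rectangle in this sketch; it makes no attempt to merge neighbouring shapes the way ST_Union() would.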

Page 12: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others


My dataset!

Page 13: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others


Accuracy compared to source = 93% (...5m margin of error)

Page 14: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

Migrating from SQL to NoSQL

# cur: SQL cursor returning rows as dictionaries; db: MongoDB database handle.
sql = """SELECT id,
                ST_X(st_geomfromtext(st_astext("p"))) "x",
                ST_Y(st_geomfromtext(st_astext("p"))) "y"
         FROM "Point"
         WHERE "Point"."LocationId" = %s"""

cur.execute(sql, [id])
points = cur.fetchall()
for p in points:
    # Store the point as a list: x (lon) first, then y (lat).
    db.point.insert({"_id": p['id'],
                     "LocationId": id,
                     "p": [p['x'], p['y']]})

Page 15: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others


MongoDB requires points to be ordered as (lon, lat).

Python dictionaries are serialized in alphabetical order.
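The gotcha is about element order. A small illustration using only the standard library; json.dumps(..., sort_keys=True) is used here to imitate alphabetical serialization and is not a claim about what the MongoDB driver does internally. The coordinates are the Canal Street example from this talk:

```python
import json

lon, lat = -74.001417, 40.719811

# A list keeps its order, so [lon, lat] is unambiguous.
point_as_list = [lon, lat]

# A dictionary depends on key order. Sorting the keys imitates a serializer
# that writes keys alphabetically: "lat" then comes out before "lon".
point_as_dict = {"lon": lon, "lat": lat}
serialized = json.dumps(point_as_dict, sort_keys=True)

print(point_as_list)
print(serialized)
```

Since Python 3.7 a dict preserves insertion order, but a list remains the less surprising way to store an ordered pair.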

You are HERE

Page 16: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

Reverse geocoding HowTo

SQL with polygons

SELECT * FROM "GeomArea"
JOIN "Location" ON "GeomArea"."id" = "Location"."id"
WHERE ST_Within(ST_GeomFromEWKT('SRID=4326;POINT(<lon> <lat>)'), "p")

SQL with points

SELECT * FROM Point
JOIN Location ON Point.LocationId = Location.id
WHERE ST_Within(p, ST_GeomFromText('POLYGON((<lon>+1 <lat>+1, <lon>+1 <lat>-1, <lon>-1 <lat>-1, <lon>-1 <lat>+1, <lon>+1 <lat>+1))'))
ORDER BY ST_Distance(ST_GeomFromText('POINT(<lon> <lat>)'), p)

MongoDB with points

point = db.point.find( { "p": { "$near" : [ lon, lat ] } } ).limit(1)
id = point[0]["LocationId"]
location = db.location.find_one( {"_id": id} )
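In the points-based SQL query, the <lon>+1 and <lat>-1 placeholders stand for values computed by the client before the query is sent. A minimal sketch of that step, assuming a DB-API driver with %s placeholders; bbox_wkt, points_query_params and the delta parameter are hypothetical names, not taken from the talk:

```python
def bbox_wkt(lon, lat, delta):
    """WKT polygon for the square of half-width delta around (lon, lat)."""
    corners = [
        (lon + delta, lat + delta),
        (lon + delta, lat - delta),
        (lon - delta, lat - delta),
        (lon - delta, lat + delta),
        (lon + delta, lat + delta),  # repeat the first corner to close the ring
    ]
    return "POLYGON((%s))" % ", ".join("%s %s" % corner for corner in corners)


# Same shape as the points query on the slide, with bound parameters.
POINTS_QUERY = (
    "SELECT * FROM Point JOIN Location ON Point.LocationId = Location.id "
    "WHERE ST_Within(p, ST_GeomFromText(%s)) "
    "ORDER BY ST_Distance(ST_GeomFromText(%s), p)"
)


def points_query_params(lon, lat, delta):
    """Parameters for POINTS_QUERY, in placeholder order."""
    return (bbox_wkt(lon, lat, delta), "POINT(%s %s)" % (lon, lat))


print(bbox_wkt(10, 20, 1))
```

With such a driver the call would look like cur.execute(POINTS_QUERY, points_query_params(lon, lat, delta)). A smaller delta returns fewer candidate rows to sort, at the risk of finding none.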

Page 17: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

Versions

CentOS 6

8 CPUs, 32GB RAM, all tests with data set in RAM

PostGIS 2.0 (on PostgreSQL 9.1)

MySQL 5.6.9 RC

MariaDB 5.5.29

MongoDB 2.0.7

Page 18: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

Data size (note that my data set is not packed for optimal size)

                         | My data (GB) | World (GB)
-------------------------+--------------+-----------
PostGIS polygons         |           34 |    165 240
PostGIS points           |           70 |    340 200
MySQL & MariaDB polygons |          3.9 |     18 954
MySQL & MariaDB points   |           18 |     87 480
MongoDB                  |           71 |    345 060

Size for World is extrapolated by multiplier 4860.
This is based on 30% of the Earth surface being land.

Polygons could be smoothed to reduce data set size by a factor of 20-100.
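The World figures are the My data figures times 4860. One way to arrive at that multiplier is to treat the scanned area as 2 x 2 degrees; this is an assumption that happens to reproduce the number, since the slide only states the 30% land figure:

```python
EARTH_SQUARE_DEGREES = 360 * 180    # the whole lat/lon grid
LAND_FRACTION = 0.30                # from the slide
SCANNED_SQUARE_DEGREES = 2 * 2      # assumption: scanned area rounded to 2 x 2 deg

multiplier = round(EARTH_SQUARE_DEGREES * LAND_FRACTION / SCANNED_SQUARE_DEGREES)

my_data_gb = {
    "PostGIS polygons": 34,
    "PostGIS points": 70,
    "MySQL & MariaDB polygons": 3.9,
    "MySQL & MariaDB points": 18,
    "MongoDB": 71,
}
world_gb = {name: round(size * multiplier) for name, size in my_data_gb.items()}

print(multiplier)
for name, size in world_gb.items():
    print(name, size)
```

The resulting values match the World sizes given for each product.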

Page 19: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others


Benchmark Results (data set in memory, 8 CPUs)

Clients TPS Avg RT (msec) 50% RT 98% RT

PostGIS polygons

1 138 7 6 18

4 547 7 6 18

8 1072 7 6 19

PostGIS points

1 419 2 2

4 1613 2 3

8 3136 3 3

PostGIS points disk bound: 100 TPS. Didn't scale with threads.

Page 20: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others


Benchmark Results (data set in memory, 8 CPUs)

Clients TPS Avg RT (msec) 50% RT 98% RT

MySQL polygons

1 2866 0 0

4 10k 0 1

8 16.5k 0 1

MySQL points

1 1800 1 1

4 2110 2 3

8 1402 6 7

Using InnoDB for Location table (non-gis address data) was slightly faster for polygons.

Is MySQL faster because it doesn't support projections? -> Try PostGIS with SRID=0.

Points approach is stuck in "Creating sort index". (Should increase join buffers and tmp table.)

Page 21: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others


Benchmark Results (data set in memory, 8 CPUs)

Clients TPS Avg RT (msec) 50% RT 98% RT

MariaDB polygons

1 2340 0 1

4 9146 0 1

8 15k 1 1

MariaDB points

1 1650 1 1

4 2270 2 2

8 1647 5 6

MariaDB GIS functions are independent of MySQL, but data format and indexes are the same.

Performance within +/- 10% of MySQL.

Page 22: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others


Benchmark Results (data set in memory, 8 CPUs)

Clients TPS Avg RT (msec) 50% RT 98% RT

MongoDB points

1 411 2 2 2

4 454 9 3 20

8 525 14 7 25

PostGIS points

1 419 2 2

4 1613 2 3

8 3136 3 3

MySQL & MariaDB points

1 1650 1 1

4 2270 2 2

8 1647 5 6
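One way to read the benchmark tables is the ratio of throughput at 8 clients to throughput at 1 client; with 8 CPUs, a value near 8 means linear scaling. A small sketch using the TPS figures from the benchmark slides, with 16.5k and 15k taken as 16500 and 15000:

```python
# (TPS at 1 client, TPS at 8 clients), copied from the benchmark tables.
tps = {
    "PostGIS polygons": (138, 1072),
    "PostGIS points": (419, 3136),
    "MySQL polygons": (2866, 16500),
    "MySQL points": (1800, 1402),
    "MariaDB polygons": (2340, 15000),
    "MariaDB points": (1650, 1647),
    "MongoDB points": (411, 525),
}

# Throughput gain from 1 to 8 clients, rounded to one decimal.
speedup = {name: round(eight / one, 1) for name, (one, eight) in tps.items()}

for name, factor in speedup.items():
    print(name, factor)
```

A ratio below 1, as for MySQL points, means throughput dropped when clients were added.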

Page 23: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

PostGIS Summary

• Nice linear scalability, stable response times
• Most advanced, but "bolted on" user experience
• Wasteful in CPU and data size
• Decent on disk bound workload
• Polygon based performance a small disappointment
• Wishlist:
  • No more features needed.
  • Ease of use and performance please.
• Future: Real 3D

Page 24: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

MongoDB Summary

• Simple: Radius from point (Foursquare)
• Combinations possible: type=restaurant within 1 km
• Single thread performance ok, but didn't scale
  • Could be issue with benchmark framework
• Main gotcha: don't use python dictionary for (lon, lat)
• 2.4 brings lots of enhancements, not covered here.

Page 25: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others

MySQL & MariaDB Summary

• 5x better than anything else
  • For Within()
  • Contention on sorting by Distance()
• Delivered on the vision of polygon based model
• Different implementations, same performance
  • MySQL slightly faster, but within +/- 10%
  • MariaDB has precise math operations
• Wishlist:
  • Projections (SRID)
  • InnoDB support
  • Distance() using RTree index

Page 26: Spatial functions in  MySQL 5.6, MariaDB 5.5, PostGIS 2.0 and others


Thank you!

For more information:
http://www.openlife.cc/blog
[email protected]