Presto Summit SF 2019 - Starburst Data

Presto Summit SF 2019 Martin Traverso, Dain Sundstrom, David Phillips




Presto Software Foundation

“An independent, non-profit organization with the mission of supporting a community of passionate users and developers devoted to the advancement of the Presto distributed SQL query engine for big data.”

“It is dedicated to preserving the vision of high quality, performant, and dependable software.”

“Ensuring the project remains open, collaborative and independent for decades to come.”


Presto Community

• GitHub: https://github.com/prestosql

• Website: https://prestosql.io

• Blog: https://prestosql.io/blog

• Twitter: @prestosql

• Slack: prestosql.slack.com


Since the Launch…

• Launched on January 31, 2019

• 16 releases (~1 per week)

• 1300+ commits

• 200k lines changed

• 650+ pull requests closed

• 50+ contributors

• 170 weekly active members on Slack


Contributors

kokosing

raunaqmorarka

pgagnon

MiguelWeezardo

MarvinCai

Praveen2112

chancez

hustnn

kasiafi

sopel39

stagraqubole

yui-knk

Yaliang

dain

11xor6

Lewuathe

garvit-gupta

VicoWu

qqibrow

findepi

pettyjamesm

martint

electrum

vincentpoon

wyukawa

guyco33

bill-warshaw

vkorukanti

anusudarsan

dilipkasana

sshardool

linxingyuan1102

luohao

zhenxiao

rzeyde-varada

takezoe

kabunchi

ryanrupp

ilfrin

ChethanUK

ebyhr

xumingming


Recent Improvements (since the launch)


ORC Performance


ORC Performance


Semijoin


Performance

• S3 network bandwidth/latency for Parquet and ORC

• ZSTD and LZ4 for ORC/Parquet

• Skip redundant ORDER BY

• ORDER BY + LIMIT with OUTER JOIN

• IN (SELECT DISTINCT …)

• JOIN involving coercions and inline tables

• Spilling

• Coming soon: UNNEST improvements

• … and more
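As an illustration only, the query-shape bullets above correspond to patterns like the ones below. The tables are assumed to come from the TPC-H schema (`orders` appears elsewhere in the slides; `customer` is an assumption), and the comments describe the shape of each query, not the optimizer's exact behavior.

```sql
-- IN (SELECT DISTINCT ...): membership test against a deduplicated subquery.
SELECT orderkey
FROM orders
WHERE custkey IN (SELECT DISTINCT custkey FROM customer);

-- ORDER BY + LIMIT with OUTER JOIN.
SELECT o.orderkey, c.name
FROM orders o
LEFT JOIN customer c ON o.custkey = c.custkey
ORDER BY o.orderkey
LIMIT 10;

-- JOIN involving coercions and inline tables:
-- a bigint key joined to integer literals from an inline VALUES table.
SELECT o.orderkey, v.label
FROM orders o
JOIN (VALUES (1, 'first'), (2, 'second')) AS v(id, label)
  ON o.orderkey = v.id;
```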


ROW subscript operator

WITH t(r) AS (
    VALUES ROW(ROW(1, 'a')), ROW(ROW(2, 'b'))
)
SELECT r[1], r[2] FROM t

r :: row(? smallint, ? varchar(1))

Access field by ordinal
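A small complementary sketch: when the row type has named fields, the same fields can presumably be read either by name or by ordinal. The field names `id` and `label` are hypothetical and chosen only for illustration.

```sql
-- Hypothetical row with named fields, read by name and by 1-based ordinal.
SELECT r.id, r[1], r.label, r[2]
FROM (
    SELECT CAST(ROW(1, 'a') AS ROW(id integer, label varchar)) AS r
);
```

The subscript form is mainly useful for anonymous fields, as in the slide's example, where there is no name to dereference.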


Visualize plan structure

Clearer subplan schema

SELECT max(totalprice)
FROM (
    SELECT totalprice FROM orders ORDER BY orderkey
)

Warn on redundant ORDER BY
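To spell out why the inner ORDER BY in that query is redundant, here is a hedged sketch using the same `orders` table. The second query is a contrasting case added for illustration and is not from the slides.

```sql
-- max() does not depend on input order, so the slide's query is equivalent to:
SELECT max(totalprice) FROM orders;

-- With LIMIT, the inner ORDER BY decides which rows survive,
-- so it is not redundant here:
SELECT max(totalprice)
FROM (
    SELECT totalprice FROM orders ORDER BY orderkey LIMIT 10
);
```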


Pushdown

• Limit

• TableSample

• Filter (simple range predicates)

• Projection (column and ROW field dereference)

• Coming soon:

• Generalized projections and filters

• Aggregation

• Join

https://github.com/prestosql/presto/issues/18
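The queries below sketch the kind of operation each pushdown bullet refers to. Whether anything is actually pushed into the source depends on the connector, so treat these as query shapes only. The `events` table and its row-typed `payload` column are hypothetical.

```sql
-- Limit
SELECT * FROM orders LIMIT 10;

-- TableSample
SELECT * FROM orders TABLESAMPLE SYSTEM (10);

-- Filter (simple range predicate)
SELECT orderkey FROM orders WHERE totalprice BETWEEN 1000 AND 2000;

-- Projection: a plain column, then a ROW field dereference
SELECT orderkey FROM orders;
SELECT payload.user_id FROM events;  -- hypothetical table and column

-- EXPLAIN prints the plan, which can be inspected to see where each operation ended up.
EXPLAIN SELECT orderkey FROM orders WHERE totalprice > 1000 LIMIT 10;
```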


New Plugins

• Elasticsearch connector

• Apache Phoenix connector

• Apache Ranger

• https://cwiki.apache.org/confluence/display/RANGER/Presto+Plugin


Other Improvements

• Docker image

• Spill-to-disk improvements

• CLI output formats

• UUID type and functions

• format(), combinations() functions

• ORC bloom filters (non-legacy)

• Connector-provided view definitions

• More type mappings for various connectors

• … and more!

• FETCH FIRST … WITH TIES syntax

• OFFSET syntax

• COMMENT ON TABLE <table> IS …

• [LEFT/RIGHT/FULL] JOIN LATERAL (…) ON …

• Pass-through security (client provided credentials)

• Kerberos security improvements

• Role-based security

• Secure query results in client API

• Current user security mode for views
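A few of the syntax and function items above, sketched against the TPC-H `nation` and `orders` tables (an assumption; the slides do not name a schema for these features). Support for individual statements depends on the Presto version and on the connector. For example, COMMENT ON TABLE needs a connector that can store table comments.

```sql
-- FETCH FIRST ... WITH TIES: also returns rows that tie with the last row
-- on the ORDER BY key.
SELECT name, regionkey
FROM nation
ORDER BY regionkey
FETCH FIRST 1 ROW WITH TIES;

-- OFFSET: skip the first 5 rows, then take the next 3.
SELECT name
FROM nation
ORDER BY name
OFFSET 5 ROWS
FETCH NEXT 3 ROWS ONLY;

-- COMMENT ON TABLE (the comment text is illustrative)
COMMENT ON TABLE orders IS 'Customer orders';

-- LEFT JOIN LATERAL: the subquery can reference columns of the left side.
SELECT n.name, t.greeting
FROM nation n
LEFT JOIN LATERAL (SELECT 'Hello, ' || n.name AS greeting) t ON true;

-- UUID type and functions
SELECT uuid();
SELECT CAST('12151fd2-7586-11e9-8f9e-2a86e4085a59' AS uuid);

-- format() and combinations()
SELECT format('%s has %d regions', 'TPC-H', 5);
SELECT combinations(ARRAY['a', 'b', 'c'], 2);
```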


Roadmap


Roadmap

• Dynamic

• Real world priorities and requirements

• What volunteers work on

• Not a wish list

• https://github.com/prestosql/presto/labels/roadmap


Core Engine

• Case-sensitive identifiers

• Timestamp semantics

• Dynamic filtering

• Dynamically-resolved functions

• SQL-defined functions (CREATE FUNCTION)

• Operator fusion and late materialization


Connectors

• Iceberg (in progress)

• Kinesis (in progress)

• Druid

• Pinot

• ClickHouse


Infrastructure

• Coordinator High Availability

• Spot instances

• Kubernetes


Getting Involved

• Join Slack

• https://prestosql.io/community.html

• #troubleshooting channel

• File issues/bugs:

• https://github.com/prestosql/presto

• Write blog posts

• https://prestosql.io/blog