An introduction to Presto, an open source distributed Dipti ......25 Ahana • SQL analytics company...

Dipti Borkar, Co-Founder & CPO | Ahana: An introduction to Presto, an open source distributed SQL engine


Page 1:

Dipti Borkar, Co-Founder & CPO | Ahana

An introduction to Presto, an open source distributed SQL engine

Page 2:

Founder

Mom

Immigrant

Girl data geek (DB)

Engineer always

Product techie

Team builder

Open source believer

Mixologist

Page 3:

Agenda

• What is Presto?

• History of federation

• Introduction to Presto

• What made Presto different?

• Scalable architecture

• Flexible connectors

• Performance

• The life of a query

Page 4:

Technology Cycles Rhyme: Data Federation

• RDBMS / FDBMS challenges: FDBMS paper by McLeod/Heimbigner (1985); FDBMS paper by Sheth/Larson (1990)

• OLTP to DW wins: the data warehouse becomes the source of truth; the star schema becomes sacred

• Cloud & big data: Composite Software (founded 2001); Garlic paper by Laura Haas (2002) → DB2 Federated; Google File System paper (2003); MapReduce paper (2004); Spark paper (2010); too many data sources, no one uber-schema

• New cloud DWs with data lakes; SQL-based self-service platforms that enable self-service analytics

• SQL federation makes a comeback: Dremel paper (2010) → Drill paper (2012); SQL++ paper (2014) → Couchbase SQL++ engine (2018); Presto paper (2019), PartiQL (2019)

Timeline: 80's → 90's → 2000's → 2010's → 2020's

Page 5:

Presto: One of the Fastest Growing Open Source Projects in Data Analytics

Business Needs

• Data-driven decision making

• Businesses need more data to iterate over

Technology Trends

• Disaggregation of storage and compute

• The rise of data lakes

Page 6:

What is Presto?

• Distributed SQL query engine

• ANSI SQL on databases and data lakes

• Designed to be interactive

• Access to petabytes of data

• Open source, hosted on GitHub

• https://github.com/prestodb

Page 7:

Presto Overview

Page 8:

Common Questions

• Is Presto a database?

• How is it related to Hadoop?

• How is it different from a data warehouse?

Page 9:

Sample Presto Deployment Stack & Use Cases

• Ad hoc

• BI tools

• Dashboard

• A/B testing

• ETL/scheduled job

• Online service

Page 10:

What Made Presto Different?

• Scalable architecture

• Pluggable connectors

• Performance

Page 11:

Scalable Architecture

• Two roles: coordinator and worker

• Easy scale-up and scale-down

• Scale up to 1,000 workers

• Validated at web-scale companies
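The two-role design can be pictured with a toy Python model. This is a sketch under stated assumptions, not Presto's scheduler: the coordinator is assumed to break a scan into splits and hand them to workers round-robin, and the `Coordinator`, `Worker`, and `schedule` names are hypothetical. Scaling up or down only changes the pool the coordinator schedules over, which is the intuition behind the bullets above.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker: records the splits it has been asked to process."""
    name: str
    assigned: list = field(default_factory=list)


@dataclass
class Coordinator:
    """Hypothetical coordinator: owns the worker pool and schedules splits."""
    workers: list = field(default_factory=list)

    def add_worker(self, worker: Worker) -> None:
        # Scale up: a new worker simply joins the pool used for scheduling.
        self.workers.append(worker)

    def remove_worker(self, name: str) -> None:
        # Scale down: drop the worker so later queries no longer use it.
        self.workers = [w for w in self.workers if w.name != name]

    def schedule(self, splits: list) -> dict:
        # Round-robin assignment is a simplification; a real scheduler would
        # also weigh data locality and current load.
        if not self.workers:
            raise RuntimeError("no workers available")
        for w in self.workers:
            w.assigned.clear()  # start each query with empty assignments
        for i, split in enumerate(splits):
            self.workers[i % len(self.workers)].assigned.append(split)
        return {w.name: list(w.assigned) for w in self.workers}


coordinator = Coordinator()
for name in ("worker-1", "worker-2", "worker-3"):
    coordinator.add_worker(Worker(name))
print(coordinator.schedule([f"split-{i}" for i in range(7)]))
```

Because the coordinator is the only component that knows about every worker, this toy also hints at why a single coordinator appears later among Presto's limitations.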

Page 12:

Scalable Architecture

Page 13:

Pluggable Presto Connectors

Page 14:

Presto Connector Data Model

• Connector: Driver for a data source.

• Example: HDFS, AWS S3, Cassandra, MySQL, SQL Server, Kafka

• Catalog: Contains schemas from a data source specified by the connector.

• Schemas: Namespace to organize tables.

• Tables: Set of unordered rows organized into columns with types.
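The hierarchy above is what lets Presto address any table with a three-part name, `catalog.schema.table`. The Python sketch below models it; the class names, the `resolve` helper, and the sample `hive` and `mysql` catalogs and tables are hypothetical illustrations, not Presto's connector SPI.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    # A table: unordered rows organized into typed columns (only the types are modelled).
    columns: dict  # column name -> SQL type name, e.g. {"orderkey": "bigint"}


@dataclass
class Catalog:
    # A catalog exposes the schemas of one data source through one connector.
    connector: str                               # e.g. "hive" or "mysql"
    schemas: dict = field(default_factory=dict)  # schema name -> {table name -> Table}


def resolve(catalogs: dict, qualified_name: str) -> Table:
    """Look up a fully qualified catalog.schema.table name."""
    parts = qualified_name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {qualified_name!r}")
    catalog_name, schema_name, table_name = parts
    try:
        return catalogs[catalog_name].schemas[schema_name][table_name]
    except KeyError:
        raise LookupError(f"table {qualified_name} does not exist") from None


# Hypothetical catalogs: one backed by files via Hive, one by a MySQL database.
catalogs = {
    "hive": Catalog("hive", {"sales": {"orders": Table({"orderkey": "bigint", "totalprice": "double"})}}),
    "mysql": Catalog("mysql", {"crm": {"customers": Table({"custkey": "bigint", "name": "varchar"})}}),
}
print(resolve(catalogs, "hive.sales.orders").columns)
```

Since every table resolves through the same naming scheme regardless of connector, a single query can reference tables from different catalogs, which is the basis of Presto's federation across data sources.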

Page 15:

Presto Hive Connector for Object Stores & File Systems

Page 16:

Presto Hive Connector – Access Control

Page 17:

Presto Hive Connector – Data File Types

• Supported file types: ORC, Parquet, Avro, RCFile, SequenceFile, JSON, Text

• No data ingestion needed
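"No data ingestion needed" means the table definition is applied when files are read, not when they are loaded (schema-on-read). The sketch below illustrates the idea for a delimited text file only; the comma delimiter, the `ORDERS_SCHEMA` definition, and the `read_table` helper are assumptions for illustration (Hive's default text delimiter is the `\x01` control character), and real ORC or Parquet readers are far more involved.

```python
import csv
import io

# Hypothetical table definition as a metastore might record it: ordered
# (column name, Python type used to cast the text value) pairs.
ORDERS_SCHEMA = [("orderkey", int), ("status", str), ("totalprice", float)]


def read_table(raw_file, schema, delimiter=","):
    """Apply `schema` to a raw delimited text file while it is scanned.

    Nothing is loaded or converted ahead of time: each line is split and
    cast only when the query reads it. Missing, empty, or uncastable fields
    become None, a choice made for this sketch.
    """
    for fields in csv.reader(raw_file, delimiter=delimiter):
        row = {}
        for i, (name, cast) in enumerate(schema):
            text = fields[i] if i < len(fields) else ""
            try:
                row[name] = cast(text) if text != "" else None
            except ValueError:
                row[name] = None
        yield row


# The "file" stays in its original text form; only the read path knows the schema.
raw = io.StringIO("1,F,172799.49\n2,O,not-a-number\n3,O\n")
for row in read_table(raw, ORDERS_SCHEMA):
    print(row)
```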

Page 18:

Presto Druid Connector for Real-Time Analytics

Page 19:

Why Presto Is Fast

• In-memory processing

• Pull model

• Columnar storage and execution
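The pull model and columnar execution can be illustrated together with Python generators. In this sketch each operator asks its upstream operator for the next page, where a page is a small batch of rows stored column by column (a dict of lists); this is a simplified stand-in for the columnar pages Presto operators exchange, and the operator names, page size, and sample data are assumptions.

```python
from typing import Dict, Iterator, List

# A "page" here is a batch of rows stored column by column: name -> values.
Page = Dict[str, List]


def scan(columns: Page, page_size: int = 2) -> Iterator[Page]:
    """Source operator: yields the table as fixed-size columnar pages."""
    num_rows = len(next(iter(columns.values()), []))
    for start in range(0, num_rows, page_size):
        yield {name: values[start:start + page_size] for name, values in columns.items()}


def filter_eq(upstream: Iterator[Page], column: str, value) -> Iterator[Page]:
    """Pulls one page at a time and evaluates the predicate on a single column."""
    for page in upstream:
        keep = [i for i, v in enumerate(page[column]) if v == value]
        if keep:
            yield {name: [values[i] for i in keep] for name, values in page.items()}


def sum_column(upstream: Iterator[Page], column: str) -> float:
    """Sink operator: drives the whole pipeline by pulling pages until exhausted."""
    total = 0
    for page in upstream:
        total += sum(page[column])
    return total


lineitem = {"discount": [0, 0.1, 0, 0], "tax": [0.5, 2.0, 1.0, 3.0]}
print(sum_column(filter_eq(scan(lineitem), "discount", 0), "tax"))  # 4.5
```

Nothing is computed until the sink starts pulling, and each operator only holds the page it is working on; the predicate reads just the `discount` column of each page, which is the columnar part of the idea.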

Page 20:

The Life of a Query – Simple Scan

Page 21:

The Life of a Query – Join and Aggregation

    SELECT
      orders.orderkey, SUM(tax)
    FROM orders
    LEFT JOIN lineitem
      ON orders.orderkey = lineitem.orderkey
    WHERE discount = 0
    GROUP BY orders.orderkey

This example is from Presto: SQL on Everything

https://research.fb.com/publications/presto-sql-on-everything/
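To make the distributed shape of this query concrete, here is a single-process Python simulation. It assumes a broadcast hash join (the filtered `lineitem` rows form the build side shared by every worker), splits `orders` across a hypothetical number of workers, computes a partial `SUM(tax)` per worker, and then merges the partials as a final aggregation would after a shuffle on `orderkey`. The sample rows and helper names are invented for illustration, and the plan in the paper may use a different join distribution.

```python
from collections import defaultdict

# Hypothetical sample rows; column names follow the query, values are made up.
ORDERS = [{"orderkey": 1}, {"orderkey": 2}, {"orderkey": 3}]
LINEITEM = [
    {"orderkey": 1, "tax": 0.5, "discount": 0},
    {"orderkey": 1, "tax": 1.0, "discount": 0},
    {"orderkey": 2, "tax": 2.0, "discount": 0.1},
    {"orderkey": 2, "tax": 3.0, "discount": 0},
]


def build_hash_table(lineitem):
    # Build side: scan lineitem, apply discount = 0 early, index rows by join key.
    # Pushing the filter below the LEFT JOIN is safe here: WHERE on a right-side
    # column discards the NULL-extended rows, so the join behaves like an inner join.
    table = defaultdict(list)
    for row in lineitem:
        if row["discount"] == 0:
            table[row["orderkey"]].append(row)
    return table


def partial_aggregate(order_split, table):
    # Probe side on one worker: join its split of orders and pre-aggregate locally.
    partial = defaultdict(float)
    for order in order_split:
        for item in table.get(order["orderkey"], []):
            partial[order["orderkey"]] += item["tax"]
    return partial


def final_aggregate(partials):
    # Final stage: merge partial sums that share a group key.
    result = defaultdict(float)
    for partial in partials:
        for key, value in partial.items():
            result[key] += value
    return dict(result)


def run_query(orders, lineitem, num_workers=2):
    table = build_hash_table(lineitem)  # "broadcast" to every worker in this sketch
    splits = [orders[i::num_workers] for i in range(num_workers)]
    return final_aggregate(partial_aggregate(split, table) for split in splits)


print(run_query(ORDERS, LINEITEM))  # {1: 1.5, 2: 3.0}
```

Order 3 has no matching `lineitem` row, so its NULL-extended row is removed by `discount = 0`, and order 2's discounted item never reaches the join; partial aggregation means only one small sum per key and worker has to cross the network.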

Page 22:

Logical Plan – Do NOT Join Two Big Tables
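The advice not to join two big tables follows from Presto keeping the hash-join build side in memory. The back-of-the-envelope check below compares per-worker build-side memory for a broadcast join (every worker holds the whole build side) and a partitioned join (each worker holds roughly 1/N of it, assuming no key skew). The cluster size, table sizes, memory budget, and the `feasible_strategies` helper are all hypothetical; in Presto itself the choice between the two distributions is governed by the `join_distribution_type` session property.

```python
GIB = 1024 ** 3


def per_worker_build_bytes(build_bytes: int, num_workers: int, strategy: str) -> float:
    """Approximate hash-table memory each worker needs for the join's build side."""
    if strategy == "broadcast":
        return build_bytes                # a full copy on every worker
    if strategy == "partitioned":
        return build_bytes / num_workers  # hash-partitioned, assuming no key skew
    raise ValueError(f"unknown strategy: {strategy!r}")


def feasible_strategies(build_bytes: int, num_workers: int, memory_per_worker: int) -> list:
    """Return the join distributions whose build side fits the per-worker budget."""
    return [
        s for s in ("broadcast", "partitioned")
        if per_worker_build_bytes(build_bytes, num_workers, s) <= memory_per_worker
    ]


# Hypothetical cluster: 10 workers with 8 GiB of query memory each.
print(feasible_strategies(1 * GIB, 10, 8 * GIB))    # ['broadcast', 'partitioned']
print(feasible_strategies(40 * GIB, 10, 8 * GIB))   # ['partitioned']
print(feasible_strategies(500 * GIB, 10, 8 * GIB))  # []
```

When both inputs are large, even the partitioned plan exceeds the budget in this model; filtering or pre-aggregating one side first, or keeping the smaller table on the build side, is what keeps the hash table small.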

Page 23:

Limitations

• Memory limitation

• Fault tolerance

• Single coordinator

Page 24:

Get Started

Docker sandbox for Presto

https://hub.docker.com/r/ahanaio/prestodb-sandbox

AWS sandbox AMI for Presto

https://ahana.io/tutorials/aws-sandbox/

Page 25:

Ahana

• SQL analytics company based on Presto

• Team of experts in cloud, database, and Presto

• Investment from Google Ventures

• Named CRN Top 10 Big Data Startup of 2020

• Premier member of the Presto Foundation

“[Ahana founders] have been strong supporters of the Presto Foundation since its launch in September 2019”

“We are excited to welcome Ahana, as the first and only company focused on supporting Presto of the Presto Foundation”

Page 26:

https://events.linuxfoundation.org/prestocon/

PRESTO20WIBD

Free for WiBD Members

Page 27:

Join the Presto Community

• Request a new feature or file a bug: github.com/prestodb/presto

• Slack: prestodb.slack.com

• Twitter: @prestodb

Stay Up-to-Date with Ahana

• URL: ahana.io

• Twitter: @ahanaio

Page 28:

Q & A

And yes! We are hiring!

Page 29:

8/27/20

Page 30:

Presto Foundation: Community Driven

Page 31:

Data-Driven Companies Need Low Data Latency

Analysts and scientists need to answer questions.

“Data latency” = the time it takes from a user having a question to the time they can actually answer it

1. User wants to track or explore some new data

2. User meets with the data engineering team to make a plan

3. Data team acquires the data and checks access permissions

4. Build and test the ETLs and make tables available to the user

5. Notify the user so they can ask their questions

! Can be days or weeks of time