Presto meetup 2015-03-19 @Facebook


Transcript of Presto meetup 2015-03-19 @Facebook

Page 1: Presto meetup 2015-03-19 @Facebook

Copyright ©2015 Treasure Data. All Rights Reserved.

Presto as a Service
Tips for operation and monitoring

Dongmin Yu
Treasure Data, Inc.
min@treasure-data.com
JeroMQ / ZeroMQ committer & maintainer

Mar 19, 2015
Presto Meetup @ Facebook

Page 2: Presto meetup 2015-03-19 @Facebook


Topics

• Presto as a Service in Treasure Data
  – Error Recovery
  – Presto Deployment
• Tips for Monitoring Presto
  – JSON API
  – Presto + Fluentd
• Custom changes

Page 3: Presto meetup 2015-03-19 @Facebook

Treasure Data: Presto as a Service

[Chart annotation: Presto Public Release]

Page 4: Presto meetup 2015-03-19 @Facebook

[Diagram: Treasure Data query architecture]

• TD API / Web Console
  – Interactive query -> Presto
  – Batch query -> Hive
• Treasure Data
  – PlazmaDB: MessagePack Columnar Storage
  – td-presto connector

Page 5: Presto meetup 2015-03-19 @Facebook


Deployment

• Building Presto takes more than 20 minutes.
• Facebook frequently releases new versions
• Let CircleCI build Presto
  – Deploy jar files to private Maven repository
  – We sometimes use non-release versions
    • for fixing serious bugs
    • hot-fix patches
• Integration Test
  – td-presto connector
    • PlazmaDB, Multi-tenant query scheduler
    • Query optimizer
  – Run test queries on staging cluster
  – Presto Verifier

Page 6: Presto meetup 2015-03-19 @Facebook

Production: Blue-Green Deployment

• http://martinfowler.com/bliki/BlueGreenDeployment.html
• 2 Presto Coordinators (Blue/Green)
  – Route Presto queries to the active cluster
  – No downtime upon deployment
• Launch Presto worker instances with chef <- less than 5 min. in AWS
• The inactive cluster is used for pre-production testing and customer support
  – Investigation and tuning of customer query performance
  – Troubleshooting
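
The routing idea on this slide can be sketched as follows. This is only an illustration, not TD's actual router: `BlueGreenRouter`, its methods, and the coordinator URLs are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class BlueGreenRouter:
    """Routes queries to whichever coordinator is currently active."""
    blue: str             # URL of the blue coordinator (hypothetical)
    green: str            # URL of the green coordinator (hypothetical)
    active: str = "blue"  # which colour currently receives traffic

    def active_coordinator(self) -> str:
        return self.blue if self.active == "blue" else self.green

    def inactive_coordinator(self) -> str:
        # The inactive cluster stays available for pre-production testing.
        return self.green if self.active == "blue" else self.blue

    def switch(self) -> str:
        """Flip traffic to the other cluster and return its URL."""
        self.active = "green" if self.active == "blue" else "blue"
        return self.active_coordinator()


router = BlueGreenRouter(blue="http://presto-blue.example:8080",
                         green="http://presto-green.example:8080")
```

A deployment would upgrade the inactive cluster, verify it, and then call `router.switch()`, so queries never hit a cluster that is being restarted.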

Page 7: Presto meetup 2015-03-19 @Facebook

Error Recovery

• Presto has no fault tolerance
• Error types
  – User error
    • Syntax errors
      – SQL syntax, missing function
    • Semantic errors
      – missing tables/columns
  – Insufficient resource
    • Exceeded task memory size
  – Internal failure
    • I/O error
      – S3/Riak CS
    • worker failure
    • etc.

Worth A Retry!

Page 8: Presto meetup 2015-03-19 @Facebook

Failed Query Rate

Page 9: Presto meetup 2015-03-19 @Facebook

Page 10: Presto meetup 2015-03-19 @Facebook

Query Retry Patterns used in TD

• Error code + message pattern
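
One way to express an "error code + message pattern" rule is a small table of (code, regex) pairs. The sketch below is illustrative only: the error code names and message patterns are assumptions, not the rules TD actually uses.

```python
import re

# Hypothetical retry rules: (error code, pattern the message must match).
RETRY_PATTERNS = [
    ("INTERNAL_ERROR", re.compile(r"I/O error|connection reset", re.IGNORECASE)),
    ("NO_NODES_AVAILABLE", re.compile(r".*")),
]


def should_retry(error_code: str, message: str) -> bool:
    """Return True when both the code and the message match a retry rule."""
    return any(
        code == error_code and pattern.search(message) is not None
        for code, pattern in RETRY_PATTERNS
    )
```

Matching on the message as well as the code keeps user errors and deterministic internal failures from being retried.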

Page 11: Presto meetup 2015-03-19 @Facebook

Monitoring Presto with Fluentd

Page 12: Presto meetup 2015-03-19 @Facebook

Monitoring Presto

• REST API for monitoring Presto state
  – JSON format
• (presto server IP):8080/v1/query
  – List of recent queries (BasicQueryInfo class)
• (presto server IP):8080/v1/query/(query id)
  – Detailed query state information
  – Query plan, tasks and running worker IDs
  – Processed rows/data size
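
A minimal sketch of polling the query list endpoint, using only the Python standard library. The `/v1/query` path and port 8080 come from the slide; the `state` field name is an assumption about the JSON layout, and both helper functions are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_query_list(host: str, port: int = 8080) -> list:
    """GET /v1/query on the coordinator and return the decoded JSON list."""
    with urlopen(f"http://{host}:{port}/v1/query") as resp:
        return json.load(resp)


def count_by_state(queries: list) -> dict:
    """Count queries per state; assumes each entry carries a "state" field."""
    counts = {}
    for q in queries:
        state = q.get("state", "UNKNOWN")
        counts[state] = counts.get(state, 0) + 1
    return counts
```

For example, `count_by_state(fetch_query_list("10.0.0.1"))` would give a per-state summary that can be sent on to a metrics system.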

Page 13: Presto meetup 2015-03-19 @Facebook

Query List: /v1/query

Page 14: Presto meetup 2015-03-19 @Facebook

Detailed query info: /v1/query/(query id)

Page 15: Presto meetup 2015-03-19 @Facebook

/ui/query-execution/(query id)

Page 16: Presto meetup 2015-03-19 @Facebook

Complex Queries

Page 17: Presto meetup 2015-03-19 @Facebook

Page 18: Presto meetup 2015-03-19 @Facebook

Presto Coordinator

• Organizes query execution pipelines
  – Coordinates presto workers
• Retrieves table partition and split location from connectors
  – Creates distributed query plans
• Full GC
  – Stalls coordinator
• When memory is insufficient
  – Use memory-rich machine
  – GC Tuning
    • UseG1GC

Page 19: Presto meetup 2015-03-19 @Facebook

presto-metrics (Ruby)

• https://github.com/xerial/presto-metrics

Page 20: Presto meetup 2015-03-19 @Facebook

Page 21: Presto meetup 2015-03-19 @Facebook

Query Collection in TD

• SQL query logs
  – query, detailed query plan, elapsed time, processed rows, etc.
  – newSetBinder(binder, EventClient.class).addBinding().to(FluentEventClient.class)
• Presto is used for analyzing the query history

Page 22: Presto meetup 2015-03-19 @Facebook

Daily/Hourly Query Usage

Page 23: Presto meetup 2015-03-19 @Facebook

Query Running Time

• More than 90% of queries finish within 2 min. ≒ expected response time for interactive queries
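
A figure like the 90% above can be derived from logged elapsed times. A small sketch, assuming elapsed times are available in seconds; `fraction_within` is a hypothetical helper, and the 120-second default mirrors the 2-minute threshold on the slide.

```python
def fraction_within(elapsed_seconds, limit_seconds=120):
    """Fraction of queries whose elapsed time is at most limit_seconds."""
    if not elapsed_seconds:
        return 0.0
    within = sum(1 for t in elapsed_seconds if t <= limit_seconds)
    return within / len(elapsed_seconds)
```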

Page 24: Presto meetup 2015-03-19 @Facebook

Detecting Anomaly

• Started Query Rate (in 5min/15min)
  – If no query has started, cluster may be down (or not started properly)
• Processed rows in a query
  – Sum up the number of processed rows from all of the sub stages
  – Simple, but the most reliable measure
• Send an alert
  – Slack notification
  – PagerDuty call
    • JP/US team rotation
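
Both checks on this slide are simple to compute once query info is available as nested dictionaries. In the sketch below, the field names `processedRows` and `subStages` are assumptions about the JSON shape, and both functions are hypothetical helpers rather than part of Presto.

```python
def total_processed_rows(stage: dict) -> int:
    """Recursively sum processed rows over a stage and all its sub stages."""
    rows = stage.get("processedRows", 0)
    for sub in stage.get("subStages", []):
        rows += total_processed_rows(sub)
    return rows


def no_queries_started(start_times, now, window_seconds=300):
    """True if no query started in the last window (5 minutes by default)."""
    return not any(now - window_seconds <= t <= now for t in start_times)
```

An alerting job could run `no_queries_started` every few minutes and notify the on-call rotation when it returns `True`.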

Page 25: Presto meetup 2015-03-19 @Facebook

Benchmarking

• Query performance comparison
  – between two versions of Presto
• Benchmark
  – Run query set multiple times
  – Store the results to TD
  – Report the result with Presto
    • Aggregation query
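
The comparison step can be sketched as below. The slide reports results with a Presto aggregation query; this example swaps in plain Python for illustration. `compare_versions` and the choice of the median as the summary statistic are assumptions.

```python
from statistics import median


def compare_versions(timings_a: dict, timings_b: dict) -> dict:
    """Map each query to median_b / median_a; above 1.0 means B is slower.

    Both arguments map a query name to a list of elapsed times from
    repeated runs. Queries missing from either side are skipped.
    """
    return {
        query: median(timings_b[query]) / median(timings_a[query])
        for query in timings_a
        if query in timings_b
    }
```

Running each query several times and comparing medians reduces the effect of a single slow run on the reported ratio.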

Page 26: Presto meetup 2015-03-19 @Facebook

Presto Operation Tool

• Prestop
  – Our internal tool for managing multiple presto clusters
    • written in Scala
  – Query monitoring
  – Benchmarking
  – Workload simulation
    • stress testing
• Monitoring
  – Datadog
  – PagerDuty
  – ChartIO (query stats)

Page 27: Presto meetup 2015-03-19 @Facebook

Optimizing Scan Performance – Storage Manager

• Fully utilize the network bandwidth from S3
• TD Presto becomes CPU bottleneck

[Diagram: storage manager read pipeline; component labels follow]

• TableScanOperators
  – s3 file list
  – table schema
  – pull records
• Request Queue
  – priority queue
  – max connections limit
• Buffer
  – prepare buffer / release(Buffer)
  – Buffer size limit
  – Reuse allocated buffers
• HeaderReader
  – header request
  – callback to HeaderParser
• HeaderParser
  – parse MPC file header
  – column block offsets
  – column names
• ColumnBlockReader
  – Column block requests
• S3 / RiakCS
  – S3 read
  – Retry GET request on
    • 500 (internal error)
    • 503 (slow down)
    • 404 (not found; eventual consistency)
• MPC1 file
  – Header
  – Column Block 0 (column names)
  – Column Block 1 ... Column Block i ... Column Block m
• MessageUnpacker
  – decompression
  – msgpack-java v07
  – On-demand de-ser
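
The retry rule in the diagram (retry GET on 500, 503, and 404) can be sketched as below. The exponential backoff, the attempt limit, and the `fetch` callback signature are assumptions added for the example; the slide only names the status codes.

```python
import time

# Status codes the slide lists as worth retrying.
RETRYABLE_STATUS = {500, 503, 404}


def get_with_retry(fetch, key, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call fetch(key) -> (status, body), retrying on retryable statuses.

    Raises IOError on a non-retryable status or when attempts run out.
    """
    for attempt in range(max_attempts):
        status, body = fetch(key)
        if status == 200:
            return body
        if status not in RETRYABLE_STATUS or attempt == max_attempts - 1:
            raise IOError(f"GET {key} failed with status {status}")
        sleep(base_delay * 2 ** attempt)  # back off before the next attempt
```

Passing `sleep` as a parameter keeps the function testable without real delays. Retrying 404 is unusual for HTTP clients in general, but the slide ties it to eventual consistency, where an object may briefly appear to be missing.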

Page 28: Presto meetup 2015-03-19 @Facebook


Multi-tenancy: Resource Allocation

• Price-plan based resource allocation
• Parameters
  – The number of worker nodes to use (min-candidates)
  – The number of hash partitions (initial-hash-partitions)
  – The maximum number of running tasks per account
• If running queries exceed the allowed number of tasks, the next queries need to wait (queued)
• Presto: SqlQueryExecution class
  – Controls query execution state: planning -> running -> finished
    • No resource allocation policy
  – Extended TDSqlQueryExection class monitors running tasks and limits resource usage
    • Rewriting SqlQueryExecutionFactory at run-time by using ASM library
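
The per-account task limit with queueing can be illustrated with a small in-memory model. This is a sketch of the policy only, not of the ASM-based rewriting of Presto classes; `AccountTaskLimiter` and its methods are hypothetical.

```python
from collections import defaultdict, deque


class AccountTaskLimiter:
    """Caps running tasks per account and queues queries that do not fit."""

    def __init__(self, max_tasks_per_account: int):
        self.max_tasks = max_tasks_per_account
        self.running = defaultdict(int)    # account -> running task count
        self.waiting = defaultdict(deque)  # account -> queued (query_id, tasks)

    def submit(self, account: str, query_id: str, tasks: int) -> str:
        """Start the query if it fits the account's budget, else queue it."""
        fits = self.running[account] + tasks <= self.max_tasks
        if not self.waiting[account] and fits:
            self.running[account] += tasks
            return "running"
        self.waiting[account].append((query_id, tasks))
        return "queued"

    def finish(self, account: str, tasks: int) -> list:
        """Release tasks, then start queued queries that now fit (FIFO)."""
        self.running[account] -= tasks
        started = []
        queue = self.waiting[account]
        while queue and self.running[account] + queue[0][1] <= self.max_tasks:
            query_id, needed = queue.popleft()
            self.running[account] += needed
            started.append(query_id)
        return started
```

In this model a query that needs more tasks than the account's limit would wait indefinitely, so a real scheduler would need to reject or cap such queries.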

Page 29: Presto meetup 2015-03-19 @Facebook

WE ARE HIRING!


Check: www.treasuredata.com