Presto - swiss army SQL knife on Hadoop

http://allegro.tech/@AllegroTechBlog

Marek GawińskiDariusz Eliasz

Apache: Big Data North America 2017

Agenda

● About Allegro● Technical history of Allegro platform● Architecture of our data computation stack● Leaking component in desired architecture● What is Presto ?● Story of some Presto implementation● Swiss knife - 3 dimensions● Lessons learned● Q&A

About us

https://github.com/allegro

Technical history of Allegro platform

Monolyth Oriented Architecture

Service Oriented Architecture

Relational DB+

Exadata

Distributed DB Engines

+ Message Bus

+Hadoop

BI Team+

DWH Team

Everybody workswithdata

Architecture of our data computation stack

Leaking component in desired architecture

What if not Exadata ?

Leaking component in desired architecture

Wish list:

- Low entry threshold tool:- AnsiSQL- JDBC interface (DataGrip etc.)- Legible documentation- Predefined UDF- Users didn’t need special knowledge related with queries construction

- Project with solid community- Works with secured cluster

What is Presto?

Apache: Big Data North America 2017Source: https://prestodb.io/overview.html

“Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.”

“Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organizations like Facebook.”

Who’s using Presto?

“Facebook uses Presto for interactive queries against several internal data stores, including their 300PB data warehouse. Over 1,000 Facebook employees use Presto daily to run more than 30,000 queries that in total scan over a petabyte each per day.

Leading internet companies including Netflix, Airbnb and Dropbox are using Presto.”

Presto Architecture

Presto Connectors

● Accumulo● Black Hole● Cassandra● Hive● Hive Security● Memory● JMX● Kafka

● Local File● MongoDB● MySQL● PostgreSQL● Redis● SQL Server● System● TPCH

Presto Data Formats

● Text● Avro● SequenceFile● RCFile● ORC● Parquet

What is Presto?

Our trip to production

Secured CoordinatorLDAP + HTTPS

Swiss army SQL knife - 3 dimensions

Business dimension

Flexibility of solutionAsk fast - fail fast

Lessons learned

● Engaging users right from the start of the project● Be in touch with developers and community● Pay attention to data formats (prices as string)● Monitor all metrics● Test every format used - Avro, Parquet problems● Read source code to learn deeper how things works● Test tuning options● Teradata vs Facebook version● Lack of CBO (will be soon) - don’t trust benchmarks

Contact:dariusz.eliasz@allegrogroup.commarek.gawinski@allegrogroup.com

Allegro Tech blog - http://allegro.techMeetup - https://www.meetup.com/allegrotechFacebook allegro Tech - https://www.facebook.com/allegro.techTwitter Allegro Tech - @AllegroTechBlog

Presto - swiss army SQL knife on Hadoop · PDF fileArchitecture of our data computation stack...

Transcript of Presto - swiss army SQL knife on Hadoop · PDF fileArchitecture of our data computation stack...

Presto - swiss army SQL knife on Hadoop · PDF fileArchitecture of our data computation stack...

Documents

Transcript of Presto - swiss army SQL knife on Hadoop · PDF fileArchitecture of our data computation stack...

PRESTO software

Presto! PageManager

Presto at Hadoop Summit 2016

Presto - Hadoop Conference Japan 2014

Presto! PageManager Overviewscanners.fcpa.fujitsu.com/scanzen/presto_page_manager.pdf · Presto! PageManager Overview Presto! PageManager is a document management software that is

Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)

PRESTO Implementation Update - June 2019 · PRESTO Tickets and approved recommendations to introduce one-ride, two-ride, and day pass PRESTO Tickets, and allow for PRESTO Ticket expiry,

PRESTO - Counsellor

Advanced Database Technologies - · PDF fileArchitecture: Hadoop Cluster ... 2009 reintroduced on a meetup for “open source, ... Lecture: Advanced Database Technologies Summer 2011

Building Scalable Big Data Solutions - · PDF fileARCHITECTURE. Before. Data Processing in the Cloud. After. S3 EMR EMR EMR EMR. UNIQUE FEATURES & ADVANTAGES. ... Java,Hadoop, Pig,

Presto Caligrafix

Presto Testing Tools: Benchto & Tempto (Presto Boston Meetup 10062015)

Presto at Pinterest - Starburst Data · Scale at Pinterest Biz Scale 300M+ MAUs 200B+ Pins 4B+ Boards Data Scale 400+ PB @ S3 Peak of 80k Hadoop jobs per day 10,000+ Hive/Hadoop nodes

Presto 2.0 Introduction - What is Presto

Notice installation PRESTO V5 - cecoiawiki.ac-creteil.frcecoiawiki.ac-creteil.fr/wiki/telechargement/PRESTO/PRESTO_V5.0/... · Notice d’installation PRESTO V5.0 DSI CRETEIL Assistance

Manual presto

Consuming Data Lakelearnandbecurious.cloud/data/decks/Consuming_Data_Lake.pdf · BI & Data Visualization Kinesis Streams & Firehose Batch EMR Hadoop, Spark, Presto Redshift Data Warehouse

The Enterprise Presto Company STARBURST Presto: SQL-on ...biconsulting.hu/letoltes/2018budapestdata/wojciech_biela_presto_sql_on_anything.pdf · The Enterprise Presto Company STARBURST

Presto for the Enterprise @ Hadoop Meetup

presto Thermoplastic Tubing - kovaz.cz a koncovky... · presto Tubing is not approved for food applications. presto presto presto presto. 4 Catalogue 5210-UK Thermoplastic Tubing