Presto @ Treasure Data - Presto Meetup Boston 2015

Designing An Evolving Database Service with Presto Taro L. Saito [email protected] Oct 6th, 2015. Presto Meetup @ Boston

Transcript of Presto @ Treasure Data - Presto Meetup Boston 2015

Page 1: Presto @ Treasure Data - Presto Meetup Boston 2015

Designing An Evolving Database Service with Presto

Taro L. Saito [email protected]

Oct 6th, 2015. Presto Meetup @ Boston

Page 2: Presto @ Treasure Data - Presto Meetup Boston 2015

Presto Usage at Treasure Data

• 100~ customers are actively using Presto
• 30,000~ Presto queries every day
• Importing 1,000,000~ records / sec.

Import Export

Store Analyze with Presto/Hive

Page 3: Presto @ Treasure Data - Presto Meetup Boston 2015

Mobile and Web Sources

Mobile SDKs

JavaScript SDK (web access logs)

Page 4: Presto @ Treasure Data - Presto Meetup Boston 2015

Stream Sources

Streaming

• Apache logs
• nginx logs
• syslog
• JSON logs

Page 5: Presto @ Treasure Data - Presto Meetup Boston 2015

Existing Data Sources

Bulk Import

• Data files (CSV, TSV, etc.)
• MySQL
• PostgreSQL
• Oracle

Page 6: Presto @ Treasure Data - Presto Meetup Boston 2015

Embedded Devices

• Collect data from embedded Linux, serial devices, MQTT, XBee radio, etc.

Page 7: Presto @ Treasure Data - Presto Meetup Boston 2015

Import data, now.

Page 8: Presto @ Treasure Data - Presto Meetup Boston 2015

Treasure Data Architecture

[Architecture diagram; recoverable labels:]

• Log: many small log files
• log merge job
• Real-Time Storage
• Archive Storage
• 1-hour partitions: 2015-09-29 01:00:00, 2015-09-29 02:00:00, 2015-09-29 03:00:00
• time column-based partitioning
• Columnar Format
• S3 (AWS), Riak CS (IDCF)
• Hive
• Hadoop MapReduce
• Presto: Distributed SQL Query Engine
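The 1-hour, time column-based partitioning named on this slide can be sketched roughly as follows. This is only an illustration, not Treasure Data's implementation: the function names, the use of integer UNIX timestamps, and the UTC time zone are all assumptions made for the example.

```python
from datetime import datetime, timezone

HOUR = 3600  # partition width in seconds (the slide shows 1-hour partitions)


def partition_start(unix_time):
    """Round an integer UNIX timestamp down to the start of its 1-hour partition."""
    return unix_time - unix_time % HOUR


def partition_label(unix_time):
    """Format the partition start like the labels on the slide (UTC is an assumption)."""
    start = datetime.fromtimestamp(partition_start(unix_time), tz=timezone.utc)
    return start.strftime("%Y-%m-%d %H:%M:%S")


def partitions_for_range(start_time, end_time):
    """List the partition starts that a query over [start_time, end_time) must read."""
    if end_time <= start_time:
        return []
    first = partition_start(start_time)
    last = partition_start(end_time - 1)  # end_time is exclusive
    return list(range(first, last + HOUR, HOUR))
```

With partitions keyed this way, a query that filters on the time column only needs to touch the partitions returned by `partitions_for_range`, which is presumably the point of partitioning on time in the first place.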

Page 9: Presto @ Treasure Data - Presto Meetup Boston 2015

Extensible Columnar Store

• JSON data
  • {"time": 1412380700, "user": 1}
• Additional Column
  • {"time": 1412381000, "user": 2, "status": 200}
• Type Escalation (int -> string)
  • {"time": 1412390000, "user": "U01", "status": 200}
• MessagePack
  • A fast and compact JSON-like format
• Auto type conversion
  • Table schema <=> MessagePack types
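The schema behaviour on this slide (new columns appear as records arrive, and a column escalates from int to string when a string value shows up) can be sketched as below. This is a minimal illustration under stated assumptions: the helper names `type_name`, `evolve_schema`, and `read_row` are hypothetical, plain Python dicts stand in for MessagePack records, and the rule "any type mismatch escalates to string" generalizes the single int -> string case shown on the slide.

```python
# The three example records from the slide.
RECORDS = [
    {"time": 1412380700, "user": 1},
    {"time": 1412381000, "user": 2, "status": 200},
    {"time": 1412390000, "user": "U01", "status": 200},
]


def type_name(value):
    """Map a Python value to a simplified column type name."""
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    return "string"


def evolve_schema(schema, record):
    """Return a new schema that also covers the columns and types in `record`."""
    new_schema = dict(schema)
    for column, value in record.items():
        seen = type_name(value)
        known = new_schema.get(column)
        if known is None:
            new_schema[column] = seen      # additional column
        elif known != seen:
            new_schema[column] = "string"  # type escalation (assumed rule)
    return new_schema


def read_row(schema, record):
    """Read a stored record under the current schema, converting values as needed."""
    row = {}
    for column, column_type in schema.items():
        value = record.get(column)  # columns missing from old records read as None
        if value is not None and column_type == "string":
            value = str(value)
        row[column] = value
    return row
```

Folding `evolve_schema` over `RECORDS` ends with `user` typed as string and `status` present, and `read_row` then lets the oldest record be read under that newer schema without rewriting the stored data.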

Page 10: Presto @ Treasure Data - Presto Meetup Boston 2015

Use Cases

Page 11: Presto @ Treasure Data - Presto Meetup Boston 2015

E-COMMERCE

BEFORE

AFTER

Biggest Mobile Shopping

WISH.COM

• Reduced costs

• Scalability

• Single data warehouse

Page 12: Presto @ Treasure Data - Presto Meetup Boston 2015

GAMING

BEFORE

AFTER

Daily Upload Delay of 1-2 days

2500+ servers

Real-time

2500+ servers

1 Billion records/day

• Reduced TCO

• Real-time collection

• Real-time access to KPIs

Top 10 globally; 40M+ users

x 20

Page 13: Presto @ Treasure Data - Presto Meetup Boston 2015

AD TECH

Publishers’ Dashboard Advertisers’ Dashboard

• 800 B/month

• Live in 2 weeks with 1 engineer!

• 300% growth

Europe’s largest mobile ad-exchange

More than 50 billion impressions/month

Page 14: Presto @ Treasure Data - Presto Meetup Boston 2015

LOYALTY

Aggregation

E-Commerce

Marketing Campaigns; Promotions

• Customer Segmentation

• A/B Testing

Page 15: Presto @ Treasure Data - Presto Meetup Boston 2015

Challenges

• Handle Huge Query Result Output
  • SELECT * / CREATE TABLE AS / INSERT INTO
  • Parallel Result Upload to S3
  • Bypass JSON result generation at the coordinator
• td-presto connector
  • Accesses MessagePack based columnar store
  • Handle S3 access retry / pipelining
• Future:
  • Better query plan visualization
    • Quickly find the performance bottleneck and memory consuming tasks
  • Storing intermediate query results to disks
    • Process large joins, query resource limitation
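The "S3 access retry / pipelining" item can be illustrated with a generic retry-with-backoff wrapper plus a small thread pool that keeps several fetches in flight. This is a sketch, not the td-presto connector's code (which is not shown in the slides): `TransientError`, `with_retry`, `fetch_all`, and the backoff parameters are all hypothetical names and values, and `fetch` stands in for whatever call actually reads an object from storage.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a timed-out request)."""


def with_retry(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call `operation`, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** (attempt - 1))


def fetch_all(keys, fetch, max_workers=4):
    """Fetch several objects concurrently, retrying each one independently.

    Results come back in the same order as `keys`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda key: with_retry(lambda: fetch(key)), keys))
```

Retrying per object rather than per query keeps a single slow or failed request from forcing the whole scan to restart, and overlapping the requests hides some of the per-request latency.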

Page 16: Presto @ Treasure Data - Presto Meetup Boston 2015

Extensible Schema SQL via Hive, Presto

Unlimited Users, Queries

Enterprise Apps

Enterprise Apps Data Science Tools

REST API

Ingestion: Streaming, Bulk

BI Tools

treasuredata.com/request_demo