Tuning up with Apache Tez

Tuning Up With Apache Tez Gal Vinograd @ Crosswise - 2016/03/09

Page 3: Tuning up with Apache Tez

Agenda

The Pipeline

The Problem

Why we chose Tez

Lessons Learned

Demo

Page 4: Tuning up with Apache Tez

The Batch

Internet

Labels

Data

Page 5: Tuning up with Apache Tez

Internet

Labels

Data

~200 Scripts

Page 6: Tuning up with Apache Tez

250 c3.2xlarge x 30 hours

10TB per Batch
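As a rough illustration of what these figures add up to, the arithmetic can be sketched as follows. It assumes the slide means 250 c3.2xlarge instances running for 30 hours per batch, and decimal units (1 TB = 1000 GB); both readings are assumptions, not statements from the talk.

```python
# Back-of-the-envelope cost of one batch, using the figures from the slide.
# Assumption: "250 c3.2xlarge x 30 hours" means 250 instances running for 30 hours.
INSTANCES = 250
HOURS_PER_BATCH = 30
BATCH_SIZE_TB = 10

instance_hours = INSTANCES * HOURS_PER_BATCH

# Assumption: decimal units, 1 TB = 1000 GB.
gb_per_instance_hour = BATCH_SIZE_TB * 1000 / instance_hours

print(f"{instance_hours} instance-hours per batch")
print(f"{gb_per_instance_hour:.2f} GB processed per instance-hour")
```

Under these assumptions a batch costs 7,500 instance-hours, or a little over 1 GB of input per instance-hour.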

Page 8: Tuning up with Apache Tez

“Tez aims to be a general purpose execution runtime that enhances various scenarios that are not well served by classic Map-Reduce. In the short term the major focus is to support Hive and Pig ...”

Tez Design v1.1

Page 10: Tuning up with Apache Tez

Hortonworks

Page 11: Tuning up with Apache Tez

The Batch

Internet

Labels

Data

~200 Scripts

Page 12: Tuning up with Apache Tez

Tez Atomic Components

Tokenizer

Aggregator

Edge

Vertex

Vertex
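The components named on this slide (two vertices, Tokenizer and Aggregator, joined by an edge) can be pictured with a toy word count. The sketch below is plain Python and is not the Tez API: the `Vertex` and `DAG` classes and the `tokenize` and `aggregate` functions are hypothetical names made up for illustration.

```python
from collections import defaultdict


class Vertex:
    """A processing step: a name plus a function from input records to output records."""

    def __init__(self, name, processor):
        self.name = name
        self.processor = processor


class DAG:
    """Toy graph of vertices joined by directed edges (illustration only, not Tez)."""

    def __init__(self):
        self.vertices = {}
        self.edges = defaultdict(list)  # source vertex name -> destination names

    def add_vertex(self, vertex):
        self.vertices[vertex.name] = vertex
        return self

    def add_edge(self, src, dst):
        self.edges[src].append(dst)
        return self

    def run(self, source_name, records):
        """Push records through a linear chain of vertices starting at source_name."""
        name, data = source_name, records
        while True:
            data = self.vertices[name].processor(data)
            if not self.edges[name]:
                return data
            name = self.edges[name][0]  # this sketch follows only the first outgoing edge


def tokenize(lines):
    """Tokenizer vertex: emit a (word, 1) pair for every word."""
    return [(word, 1) for line in lines for word in line.split()]


def aggregate(pairs):
    """Aggregator vertex: sum the counts per word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


dag = (
    DAG()
    .add_vertex(Vertex("Tokenizer", tokenize))
    .add_vertex(Vertex("Aggregator", aggregate))
    .add_edge("Tokenizer", "Aggregator")
)

word_counts = dag.run("Tokenizer", ["tez pig hive", "tez pig", "tez"])
print(word_counts)
```

In a real engine the edge would also describe how data moves between the two vertices (for example, partitioned by word); here it only records the order of execution.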

Page 13: Tuning up with Apache Tez

Logical and Physical Graphs

Physical

Logical

Hortonworks

Page 14: Tuning up with Apache Tez

Optimizations

No “NOP” Map

[Diagram: the plan Project, Distinct, GroupBy shown side by side for Tez and MR; the MR version carries an extra NOP stage.]
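One way to see where the NOP stage comes from is a deliberately simplified planner. It assumes that Distinct and GroupBy each need a shuffle, and that an MR job has exactly one map phase and at most one reduce phase. The functions `plan_mapreduce` and `plan_tez` are hypothetical and do not reflect how Pig's compilers actually work.

```python
# Assumption for this sketch: these operators regroup data by key and need a shuffle.
SHUFFLE_OPS = {"Distinct", "GroupBy"}


def plan_mapreduce(ops):
    """Pack operators into MR jobs: one map phase and at most one reduce phase each."""
    jobs = []
    current = {"map": [], "reduce": []}
    for op in ops:
        if op in SHUFFLE_OPS:
            if current["reduce"]:
                # The current job already used its reduce phase, so a new job is
                # needed, and its map phase has nothing to do except pass data on.
                jobs.append(current)
                current = {"map": ["NOP"], "reduce": [op]}
            else:
                current["reduce"].append(op)
        elif current["reduce"]:
            # A non-shuffle operator after a reduce runs inside that reduce phase.
            current["reduce"].append(op)
        else:
            current["map"].append(op)
    jobs.append(current)
    return jobs


def plan_tez(ops):
    """Split operators into DAG vertices at shuffle boundaries; no filler stages."""
    vertices = [[]]
    for op in ops:
        if op in SHUFFLE_OPS and vertices[-1]:
            vertices.append([op])
        else:
            vertices[-1].append(op)
    return vertices


ops = ["Project", "Distinct", "GroupBy"]
mr_jobs = plan_mapreduce(ops)
tez_vertices = plan_tez(ops)

print("MR jobs:     ", mr_jobs)
print("Tez vertices:", tez_vertices)
```

In this model the MR plan needs two jobs, the second starting with a NOP map, while the single DAG chains the three operators directly.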

Page 15: Tuning up with Apache Tez

Optimizations

No Barrier Between Jobs

[Diagram: a plan built from Project, Distinct, and GroupBy stages, shown side by side for Tez and MR.]

Page 16: Tuning up with Apache Tez

Optimizations

No Redundant Resource Allocation

[Diagram: a plan built from Project, Distinct, and GroupBy stages, each version running under a Pig Process, shown side by side for Tez and MR.]

Page 17: Tuning up with Apache Tez

Optimizations

Sessions

Allocate

Submit 2

Submit 1

Cleanup

Client
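The session idea on this slide (allocate once, submit several times, clean up once) can be sketched with a toy model. Everything below is hypothetical: `ToyCluster` and `ToySession` are stand-ins invented for illustration, not Tez or YARN classes.

```python
class ToyCluster:
    """Counts resource allocations and cleanups (hypothetical stand-in for the cluster)."""

    def __init__(self):
        self.allocations = 0
        self.cleanups = 0


class ToySession:
    """Allocates resources on entry, accepts any number of submissions, cleans up on exit."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.submitted = []

    def __enter__(self):
        self.cluster.allocations += 1
        return self

    def submit(self, dag_name):
        self.submitted.append(dag_name)

    def __exit__(self, exc_type, exc, tb):
        self.cluster.cleanups += 1
        return False  # do not swallow exceptions


def run_without_session(cluster, dag_names):
    """Each submission pays for its own allocate and cleanup cycle."""
    for name in dag_names:
        with ToySession(cluster) as session:
            session.submit(name)


def run_with_session(cluster, dag_names):
    """All submissions share one allocate and cleanup cycle."""
    with ToySession(cluster) as session:
        for name in dag_names:
            session.submit(name)


separate = ToyCluster()
run_without_session(separate, ["Submit 1", "Submit 2"])

shared = ToyCluster()
run_with_session(shared, ["Submit 1", "Submit 2"])

print("without session:", separate.allocations, "allocations")
print("with session:   ", shared.allocations, "allocation")
```

The saving grows with the number of submissions, which matters for a pipeline of roughly 200 scripts.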

Page 18: Tuning up with Apache Tez

Lessons Learned

Some Pig Tasks Did Not Compile / Occasionally Froze

No DistributedCache Support For S3

Poor Amazon Support

No Pre-Built Releases

Additional Deployment for Tez UI

Page 19: Tuning up with Apache Tez

What is it good for?

Early Adopters

Pig / Hive Bound

Page 20: Tuning up with Apache Tez

Thanks for Listening!