TEZ-8 UI Walkthrough
Uploaded by t3rmin4t0r
Page 2 © Hortonworks Inc. 2014 FOR: BAY AREA HADOOP USER GROUP MEETUP #49
Tez (nomenclature)
• DAG
• Vertex
• Task
• Attempt
• Container
• Edge
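The terms nest in a fixed way. A DAG holds vertices joined by edges. Each vertex runs some number of tasks. Each task may take several attempts (retries). Each attempt runs inside a YARN container, and Tez can reuse a container for several attempts. The minimal Python sketch below models only that nesting. All class and field names are hypothetical illustrations and are not the Tez Java API.

```python
from dataclasses import dataclass, field

@dataclass
class Attempt:
    # One execution try of a task; a retry creates a new attempt.
    attempt_number: int
    container_id: str  # hypothetical id of the YARN container it ran in
    succeeded: bool

@dataclass
class Task:
    index: int
    attempts: list[Attempt] = field(default_factory=list)

    @property
    def succeeded(self) -> bool:
        # A task succeeds if any of its attempts succeeded.
        return any(a.succeeded for a in self.attempts)

@dataclass
class Vertex:
    name: str
    tasks: list[Task] = field(default_factory=list)

@dataclass
class Edge:
    # Data flows from the source vertex to the destination vertex.
    source: str
    destination: str

@dataclass
class DAG:
    name: str
    vertices: dict[str, Vertex] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def inputs_of(self, vertex_name: str) -> list[str]:
        """Names of the vertices that feed the given vertex."""
        return [e.source for e in self.edges if e.destination == vertex_name]

    def total_attempts(self) -> int:
        """Count attempts across every task of every vertex."""
        return sum(len(t.attempts) for v in self.vertices.values() for t in v.tasks)
```

In this model a task with one failed attempt and one successful attempt still counts as succeeded. That is why failed attempts can appear inside a DAG that finished successfully.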
How to view raw DAGs from logs
• Tez application logs contain .dot files in Graphviz format
• To generate images: dot -Tpng -o dag.png dag.dot
• Or use the JavaScript version: http://people.apache.org/~gopalv/dagviz/
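If a job's logs contain several .dot files, the render step can be scripted. The Python sketch below assumes the Graphviz dot binary is on PATH. It also assumes the .dot files are already in a local directory, and log_dir is a hypothetical path you supply. For each file it runs the same dot -Tpng -o dag.png dag.dot command shown on the slide.

```python
import shutil
import subprocess
from pathlib import Path

def dot_command(dot_file: Path, fmt: str = "png") -> list[str]:
    """Build the Graphviz command, e.g. dot -Tpng -o dag.png dag.dot."""
    out_file = dot_file.with_suffix("." + fmt)
    return ["dot", f"-T{fmt}", "-o", str(out_file), str(dot_file)]

def render_dags(log_dir: Path, fmt: str = "png") -> list[Path]:
    """Render every .dot file under log_dir and return the images written.

    Returns an empty list without doing anything if `dot` is not installed.
    """
    if shutil.which("dot") is None:
        return []
    written = []
    for dot_file in sorted(log_dir.rglob("*.dot")):
        subprocess.run(dot_command(dot_file, fmt), check=True)
        written.append(dot_file.with_suffix("." + fmt))
    return written
```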
TEZ-8 JIRA & branch
• Tez UI for progress tracking and history
• https://issues.apache.org/jira/browse/TEZ-8
• https://github.com/apache/tez/tree/TEZ-8
• UI-centric branch
Tez UI: Vertex -> Tasks view
Tez UI: Task logs
Task logs
Tez UI: Task counters
Task counters
Tez UI: Task counters
Search for counters
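Tez groups its counters by name. One example is the org.apache.tez.common.counters.TaskCounter group, which holds counters such as SPILLED_RECORDS and SHUFFLE_BYTES. The UI search box filters that list. The Python sketch below does the same filtering, assuming the counters were exported as a nested {group: {counter: value}} dict. That layout is a hypothetical choice, not the Tez REST or history format.

```python
def search_counters(
    counters: dict[str, dict[str, int]], query: str
) -> dict[str, dict[str, int]]:
    """Keep counters whose group name or counter name contains query (case-insensitive).

    If the group name matches, every counter in that group is kept.
    Groups left with no matching counters are dropped.
    """
    q = query.lower()
    result: dict[str, dict[str, int]] = {}
    for group, values in counters.items():
        group_hit = q in group.lower()
        matched = {name: v for name, v in values.items() if group_hit or q in name.lower()}
        if matched:
            result[group] = matched
    return result
```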
Tez UI: Per-edge shuffle counters
Map 3 to Map 1 only
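Per-edge counters limit the shuffle numbers to a single source-to-destination pair, such as Map 3 feeding Map 1. The Python sketch below assumes per-edge groups are named TaskCounter_<destination>_INPUT_<source>, with spaces in vertex names replaced by underscores (for example TaskCounter_Map_1_INPUT_Map_3). Treat that naming as an assumption and check it against your own counter dump. The input is a hypothetical nested {group: {counter: value}} dict.

```python
import re

# Assumed group-name pattern for per-edge (per-input) counters.
EDGE_GROUP = re.compile(r"^TaskCounter_(?P<dest>.+)_INPUT_(?P<src>.+)$")

def _norm(vertex_name: str) -> str:
    # Assumption: "Map 3" appears as "Map_3" inside group names.
    return vertex_name.replace(" ", "_")

def edge_counters(
    counters: dict[str, dict[str, int]], source: str, destination: str
) -> dict[str, int]:
    """Return the counters for the single edge source -> destination, or {} if absent."""
    for group, values in counters.items():
        m = EDGE_GROUP.match(group)
        if m and m.group("src") == _norm(source) and m.group("dest") == _norm(destination):
            return dict(values)
    return {}
```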
Tez UI: Failed DAGs (diagnostic)
Tez UI: Failed tasks indication
Failed tasks
Post-hoc/Ad-hoc analysis helpers
• tez/tez-tools ships with two helper tools
• swimlanes
• tez-tfile-parser
Swimlanes
• ./yarn-swimlanes.sh application_1415860665053_0098
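A swimlane chart draws each container as a horizontal bar from its start time to its end time. Bars are packed into lanes so that concurrent containers never overlap. The Python sketch below shows one way to do that packing, using the classic greedy interval-partitioning approach. It illustrates the idea only and is not how yarn-swimlanes.sh is implemented. The (container_id, start, end) tuples are hypothetical inputs, such as timestamps pulled from an application's logs.

```python
import heapq

def assign_lanes(containers: list[tuple[str, float, float]]) -> dict[str, int]:
    """Map each container id to a lane index so bars in one lane never overlap.

    containers: (container_id, start_time, end_time) tuples.
    Containers are processed in start-time order. Each one reuses the lane
    that frees up earliest if that lane is free by its start time; otherwise
    it opens a new lane. This greedy scheme uses the minimum number of lanes.
    """
    lanes: dict[str, int] = {}
    free_at: list[tuple[float, int]] = []  # min-heap of (lane end time, lane index)
    next_lane = 0
    for cid, start, end in sorted(containers, key=lambda c: (c[1], c[2])):
        if free_at and free_at[0][0] <= start:
            _, lane = heapq.heappop(free_at)
        else:
            lane = next_lane
            next_lane += 1
        lanes[cid] = lane
        heapq.heappush(free_at, (end, lane))
    return lanes
```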
TFile parser
• Tez logs (YARN aggregated logs, stored in TFile format) can be parsed with Pig
• This lets us treat our logs exactly as we treat the rest of our big data
• Processing uses "pig -x tez" plus UDFs [1]
rawLogs = load '/app-logs/root/logs/application_1409012059361_0539/*' using
org.apache.tez.tools.TFileLoader() as (machine:chararray, key:chararray, line:chararray);
[1] - https://github.com/rajeshbalamohan/tez_log_parser/blob/master/src/main/resources/pig/udf.groovy
TFile parser (contd)
• For instance, parsing the shuffle INFO log lines to get the time taken and the machine
Problematic machine
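Once the shuffle lines have been reduced to (machine, time taken) pairs, finding a problematic machine is a simple aggregation. The Python sketch below assumes that extraction has already happened. The pairs, the millisecond unit, and the 3x threshold are all hypothetical choices. The sketch compares each machine's median time with the median of all the per-machine medians.

```python
from collections import defaultdict
from statistics import median

def slow_machines(samples: list[tuple[str, float]], factor: float = 3.0) -> list[str]:
    """Return machines whose median time taken exceeds factor x the median of per-machine medians.

    samples: (machine, time_taken_ms) pairs, one per shuffle fetch.
    """
    by_machine: dict[str, list[float]] = defaultdict(list)
    for machine, ms in samples:
        by_machine[machine].append(ms)
    medians = {m: median(ts) for m, ts in by_machine.items()}
    baseline = median(medians.values())
    return sorted(m for m, med in medians.items() if med > factor * baseline)
```

Using medians rather than means keeps one unusually slow fetch from flagging an otherwise healthy machine.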
TFile parser (node/rack traffic at 350 nodes)
Problematic machine
Fetcher on node-100 is always slow
(irrespective of where it is pulling data from)
Other faulty nodes
Map output served from node-100 to node-120 is always slow, to any node
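The node-to-node view separates two failure modes. One is a node that is slow as a fetcher, slow whatever the source, like node-100. The other is a node that is slow as a source, slow to every fetcher. The minimal Python sketch below flags both. It assumes fetch times have already been aggregated into a {(fetcher, source): time} dict, and both that input shape and the 3x threshold are hypothetical. For each node, it compares the median of that node's row (as fetcher) and of its column (as source) against the median of all fetch times.

```python
from collections import defaultdict
from statistics import median

def classify_slow_nodes(
    times: dict[tuple[str, str], float], factor: float = 3.0
) -> dict[str, list[str]]:
    """Flag nodes that are slow as fetchers (rows) or as sources (columns).

    times: {(fetcher_node, source_node): fetch_time}.
    A node is flagged when the median of its row (or column) exceeds
    factor x the median of all fetch times.
    """
    overall = median(times.values())
    as_fetcher: dict[str, list[float]] = defaultdict(list)
    as_source: dict[str, list[float]] = defaultdict(list)
    for (fetcher, source), t in times.items():
        as_fetcher[fetcher].append(t)
        as_source[source].append(t)

    def flagged(groups: dict[str, list[float]]) -> list[str]:
        return sorted(n for n, ts in groups.items() if median(ts) > factor * overall)

    return {"slow_fetchers": flagged(as_fetcher), "slow_sources": flagged(as_source)}
```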