Monitoring and Debugging Dryad(LINQ) Applications with Daphne Vilas Jagannath, Zuoning Yin, Mihai Budiu University of Illinois, Microsoft Research SVC International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS) 2011


Monitoring and Debugging Dryad(LINQ) Applications

with Daphne

Vilas Jagannath, Zuoning Yin, Mihai Budiu
University of Illinois, Microsoft Research SVC

International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS) 2011


Programming Clusters: Marketing

Map-Reduce


Programming Clusters: Reality


Complexity Exposed

Correctness or performance bugs break the single-system abstraction


Outline

• Motivation
• Job structure
• The Job Object Model
• Tools for job understanding
• Conclusions


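Data-Parallel Computation

(Figure: software stacks for data-parallel computation, drawn in layers labeled Application, Language, Execution, and Storage. Execution engines: Map-Reduce, Dryad, Hadoop. Storage: GFS and BigTable; Cosmos, Azure, and HPC; HDFS and S3. Higher-level systems: Sawzall and FlumeJava; DryadLINQ and Scope; Pig and Hive. Languages: Sawzall, Java; LINQ, SQL; ≈SQL.)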


2-D Piping

• Unix pipes: 1-D
  grep | sed | sort | awk | perl

• Dryad: 2-D (the superscript is the number of parallel instances of each stage)
  grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
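A small Python sketch can make the notation concrete. The `^n` textual syntax and the helper names below are assumptions for illustration, not part of Dryad. The point is only that a 2-D pipe is a 1-D pipe whose stages are replicated, so each instance becomes a separate process.

```python
import re

# Hypothetical textual form of the slide's notation:
# "grep^1000" means 1000 parallel instances of the grep stage.
STAGE = re.compile(r"^\s*([A-Za-z_]\w*)(?:\^(\d+))?\s*$")


def parse_pipeline(spec: str) -> list[tuple[str, int]]:
    """Split a '|'-separated pipeline into (program, instances) pairs.

    A stage without '^n' runs as a single instance, as in a Unix pipe.
    """
    stages = []
    for part in spec.split("|"):
        m = STAGE.match(part)
        if m is None:
            raise ValueError(f"bad stage: {part!r}")
        stages.append((m.group(1), int(m.group(2) or 1)))
    return stages


def total_processes(spec: str) -> int:
    """Each instance of each stage is a separate process (a Dryad vertex)."""
    return sum(n for _, n in parse_pipeline(spec))


if __name__ == "__main__":
    print(total_processes("grep | sed | sort | awk | perl"))
    print(total_processes("grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50"))
```

The Unix pipeline describes 5 processes; the 2-D version describes 3,050.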


Dryad Job Structure

(Figure: a job is a graph of vertices (processes such as grep, sed, sort, awk, and perl) grouped into stages and connected by channels; vertices read the input files and write the output files.)
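The job structure above maps naturally onto a small graph data structure. The following Python sketch is an assumption for illustration, not Dryad's internal representation. It wires every vertex of one stage to every vertex of the next (complete bipartite). Dryad also supports pointwise connections between stages, and the sketch leaves out input and output files.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    program: str
    instances: int


@dataclass
class JobGraph:
    # Vertex names look like "sort[3]": program plus instance index (hypothetical).
    vertices: list[str] = field(default_factory=list)
    # Channels are directed (producer, consumer) edges between vertices.
    channels: list[tuple[str, str]] = field(default_factory=list)


def build_job(stages: list[Stage]) -> JobGraph:
    """Build a DAG where every vertex of a stage feeds every vertex of the next."""
    job = JobGraph()
    previous: list[str] = []
    for stage in stages:
        current = [f"{stage.program}[{i}]" for i in range(stage.instances)]
        job.vertices.extend(current)
        job.channels.extend((src, dst) for src in previous for dst in current)
        previous = current
    return job


if __name__ == "__main__":
    job = build_job([Stage("grep", 3), Stage("sort", 2), Stage("awk", 1)])
    print(len(job.vertices), len(job.channels))  # 6 vertices, 3*2 + 2*1 channels
```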


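Dryad System Architecture

(Figure: the job manager, together with the name server (NS) and scheduler (Sched), sends the job schedule over the control plane to execution daemons (Exec) on the cluster machines; the vertices (V) they run exchange data over the network on the data plane.)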
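The job manager's core loop can be sketched as a simulation in which a vertex becomes runnable once all of its inputs have completed and is then placed on a free execution daemon. This is a simplification under stated assumptions: every vertex takes one time unit, there is a fixed number of identical machines, and data locality, failures, and re-execution (all of which the real job manager handles) are ignored. The function name and input format are hypothetical.

```python
def schedule(inputs: dict[str, list[str]], machines: int) -> list[list[str]]:
    """Simulate a job manager running a DAG in rounds.

    `inputs` maps each vertex to the vertices whose output it reads.
    A vertex is ready once all its inputs have completed; each round,
    up to `machines` ready vertices run (one per Exec daemon) and finish.
    Returns the vertices run in each round.
    """
    done: set[str] = set()
    pending = set(inputs)
    rounds = []
    while pending:
        ready = sorted(v for v in pending if all(u in done for u in inputs[v]))
        if not ready:
            raise ValueError("cycle or missing input vertex")
        batch = ready[:machines]
        rounds.append(batch)
        done.update(batch)
        pending.difference_update(batch)
    return rounds


if __name__ == "__main__":
    job = {
        "grep[0]": [],
        "grep[1]": [],
        "sort[0]": ["grep[0]", "grep[1]"],
        "perl[0]": ["sort[0]"],
    }
    print(schedule(job, machines=2))
```

With two machines, both grep vertices run in the first round, then sort, then perl.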


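How does it work in detail?

(Figure: on the localhost, the application is written in the IDE, compiled, and submitted through the firewall to the cluster/cloud. There the cluster scheduler starts the Job Manager (JM), which runs vertices on execution daemons (Exec); each machine's local storage holds logs (L), resources (R), and input/output (IO). Legend: L: Logs, IO: Input/Output, R: Resources.)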


Logs: lots of them

• Job-related: plan (XML), status, resources

• Job manager: stdout.txt, stderr.txt, *.log

• Vertex: stdout.txt, *.log, *.xml, *.cmd
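Locating the right files is already tool work. The minimal Python sketch below selects, from a directory listing, the files that belong to a job manager or a vertex, using the file-name patterns listed above. The mapping and the helper name are illustrative assumptions. Real log directories may be laid out differently, and the job-related items (plan, status, resources) are not plain file patterns, so they are left out.

```python
from fnmatch import fnmatchcase

# File-name patterns per component, taken from the list above (illustrative).
LOG_PATTERNS = {
    "job-manager": ["stdout.txt", "stderr.txt", "*.log"],
    "vertex": ["stdout.txt", "*.log", "*.xml", "*.cmd"],
}


def log_files(component: str, files: list[str]) -> list[str]:
    """Return the files in `files` that match the patterns for `component`."""
    patterns = LOG_PATTERNS[component]
    return [f for f in files if any(fnmatchcase(f, p) for p in patterns)]


if __name__ == "__main__":
    listing = ["stdout.txt", "stderr.txt", "vertex.log", "plan.xml", "run.cmd", "data.bin"]
    print(log_files("job-manager", listing))
    print(log_files("vertex", listing))
```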


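Monitoring Tools Structure

(Figure: a layered tool architecture. The cluster platforms (Cosmos, Scope, HPC v2, HPC v3) sit beneath a cluster abstraction; the Job Object Model is built on that abstraction; the monitoring, profiling, and debugging tools use the Job Object Model; GUIs sit on top.)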
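The role of the cluster abstraction can be illustrated with an abstract interface: only the back-end classes know how a particular platform stores jobs and logs, and tools are written against the interface. The class and method names below are hypothetical, not the authors' API. The in-memory back end stands in for a real one such as Cosmos or HPC.

```python
from abc import ABC, abstractmethod


class Cluster(ABC):
    """Hypothetical cluster abstraction: the only layer that knows a back end."""

    @abstractmethod
    def list_jobs(self) -> list[str]: ...

    @abstractmethod
    def read_file(self, job: str, path: str) -> str: ...


class InMemoryCluster(Cluster):
    """Stand-in back end; a real one would implement the same two methods
    against its platform's own storage and job service."""

    def __init__(self, files: dict[tuple[str, str], str]):
        self._files = files  # (job, path) -> file contents

    def list_jobs(self) -> list[str]:
        return sorted({job for job, _ in self._files})

    def read_file(self, job: str, path: str) -> str:
        return self._files[(job, path)]


def job_stdout(cluster: Cluster, job: str) -> str:
    """A tool-level function written only against the abstraction."""
    return cluster.read_file(job, "stdout.txt")
```

Adding support for a new platform then means writing one new `Cluster` subclass, leaving the tools unchanged.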


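Job Object Model

(Figure: the Job Object Model (JOM) reads the raw logs and exposes views of the job, its vertices, and its plan, which the tools consume.)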
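A minimal sketch of the idea behind the JOM is to parse raw log text once into typed objects, then derive views from those objects. The one-line-per-event log format below is invented for illustration. The real logs are far messier, and the plan view is omitted here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Vertex:
    name: str
    stage: str
    state: str


@dataclass
class Job:
    name: str
    vertices: list[Vertex]

    def stages(self) -> dict[str, list[Vertex]]:
        """Stage view: vertices grouped by the stage they belong to."""
        groups: dict[str, list[Vertex]] = defaultdict(list)
        for v in self.vertices:
            groups[v.stage].append(v)
        return dict(groups)


def parse_job(name: str, log: str) -> Job:
    """Build a Job from log lines of the (hypothetical) form
    'vertex <name> <stage> <state>'. Other lines are ignored, and a later
    line for the same vertex overrides its earlier state."""
    latest: dict[str, Vertex] = {}
    for line in log.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "vertex":
            _, vname, stage, state = parts
            latest[vname] = Vertex(vname, stage, state)
    return Job(name, list(latest.values()))
```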


Outline

• Motivation
• Job structure
• The Job Object Model
• Tools for job understanding
• Conclusions


The Job Browser

(Screenshot: job, stage, and vertex views)


Job Schedule


Failure diagnosis


Diagnosis decision tree

• “Hand-made”
• Least portable tool
• Incomplete
• High coverage
• Bug types:
  – User-level
  – System-level
  – Cluster malfunction
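A hand-made decision tree is just a cascade of rules over facts extracted from the logs. The Python sketch below classifies failed vertices into the three bug types listed above. The rules, the threshold, and the `FailedVertex` fields are all hypothetical and are not the authors' tree. Rules like these also show why such a tool is incomplete and hard to port: each one encodes knowledge of a particular cluster's failure modes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FailedVertex:
    name: str
    machine: str
    exception: str      # exception type from the vertex log, or "" if none
    in_user_code: bool  # whether the stack trace points into user code


def diagnose(failures: list[FailedVertex], machine_threshold: int = 3) -> dict[str, str]:
    """Classify each failed vertex with a tiny hand-written decision tree.

    Hypothetical rules, checked in order:
    1. `machine_threshold` or more failures on one machine: cluster malfunction.
    2. An exception thrown in user code: user-level bug.
    3. Anything else: system-level bug.
    """
    per_machine = Counter(f.machine for f in failures)
    verdicts = {}
    for f in failures:
        if per_machine[f.machine] >= machine_threshold:
            verdicts[f.name] = "cluster malfunction"
        elif f.exception and f.in_user_code:
            verdicts[f.name] = "user-level"
        else:
            verdicts[f.name] = "system-level"
    return verdicts
```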


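PowerShell = Interactive Queries

    $cluster = get-cluster X
    # Load the most recent job on the cluster as a Dryad job object
    $job = $cluster | select-AllJobs | sort-object Date | select-object -last 1 | select-DryadJob
    # Keep the vertices of that job that failed
    $failed = $job.Vertices | where-object { $_.State -eq "Failed" }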
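The same query can be written over plain in-memory objects. The Python sketch below assumes hypothetical `Job` and `Vertex` classes that carry only the fields the PowerShell pipeline touches (`Date`, `Vertices`, `State`). It is meant to show the shape of the query, which is to pick the latest job and then filter its vertices by state, and it is not a port of the authors' cmdlets.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Vertex:
    name: str
    state: str


@dataclass
class Job:
    name: str
    date: datetime
    vertices: list[Vertex]


def failed_vertices_of_latest_job(jobs: list[Job]) -> list[Vertex]:
    """Take the most recent job by date and keep its vertices in the Failed state."""
    latest = max(jobs, key=lambda j: j.date)
    return [v for v in latest.vertices if v.state == "Failed"]
```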


Vertex Debugging on Client


Vertex Profiling on Client


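Debugging on Cluster

    // T: a record type with name and age members
    Collection<T> collection;
    var results = from c in collection
                  where c.name.Length > 10
                  orderby c.age
                  select c.name;

(Figure: the program's "where c.name.Length > 10" clause becomes part of the job; a breakpoint set on that clause in the program is hit in the job's vertices.)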
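One way to see why a single source-level breakpoint is hard to handle on a cluster is to simulate the `where` stage. Each input partition is filtered by its own vertex, so the breakpoint fires once per record in every vertex. The Python sketch below runs the vertices sequentially and uses a callback to stand in for the breakpoint. Both choices are simplifications, since on the cluster the vertices run in parallel on different machines, and the function name is hypothetical.

```python
from typing import Callable


def run_where_stage(partitions: list[list[str]],
                    predicate: Callable[[str], bool],
                    on_hit: Callable[[int, str], None]) -> list[list[str]]:
    """Run a filter stage with one vertex per input partition.

    `on_hit(vertex_id, record)` plays the role of a breakpoint on the
    predicate: it is called each time a vertex evaluates it on a record.
    """
    outputs = []
    for vertex_id, part in enumerate(partitions):
        kept = []
        for record in part:
            on_hit(vertex_id, record)
            if predicate(record):
                kept.append(record)
        outputs.append(kept)
    return outputs


if __name__ == "__main__":
    hits: list[tuple[int, str]] = []
    names = [["Alexandrina", "Bo"], ["Maximilianus"]]
    out = run_where_stage(names, lambda n: len(n) > 10, lambda v, r: hits.append((v, r)))
    print(out)   # records kept by each vertex
    print(hits)  # breakpoint hits, tagged with the vertex that hit them
```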


Remote debugging

(Figure: Visual Studio on the localhost submits the DryadLINQ application through the firewall to the cluster/cloud, where the cluster scheduler starts the Job Manager (JM) and the vertices (Vertex 1, Vertex 2) run on Exec nodes with local storage. When a vertex reports "Breakpoint hit…", Visual Studio attaches to it remotely. Legend: L: Logs, IO: Input/Output, R: Resources.)


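Notifications: Our Implementation

(Figure: the DryadLINQ application is submitted from Visual Studio on the localhost through the firewall to the cluster/cloud, where the cluster scheduler starts the Job Manager (JM) and the vertices (Vertex 1, Vertex 2) run on Exec nodes with local storage. Daphne, running on the localhost, receives the notifications from the vertices and attaches Visual Studio to them. Legend: L: Logs, IO: Input/Output, R: Resources.)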
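The client-side half of such a notification scheme can be sketched as a loop that drains events posted by vertices and attaches the debugger when a vertex reports a breakpoint hit. Everything in this Python sketch is an assumption for illustration: the in-process queue transport, the `"breakpoint-hit"` event name, and the `attach` callback. It is not Daphne's actual protocol.

```python
import queue
from typing import Callable


def relay_notifications(events: "queue.Queue[tuple[int, str]]",
                        attach: Callable[[int], None]) -> list[int]:
    """Drain (vertex_id, event) notifications and attach the debugger to each
    vertex that reported a breakpoint hit.

    Returns the vertex ids attached to, in notification order; each vertex
    is attached at most once.
    """
    attached: list[int] = []
    while True:
        try:
            vertex_id, event = events.get_nowait()
        except queue.Empty:
            return attached
        if event == "breakpoint-hit" and vertex_id not in attached:
            attach(vertex_id)
            attached.append(vertex_id)
```

Attaching at most once per vertex matters at scale, because a breakpoint inside a loop can fire many times in the same vertex.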


Remote debugging


Open Problems

• What happens when 100,000 processes hit a breakpoint?

• How to evaluate expressions in the debugger when state is distributed?

• How to do large-scale performance debugging?

• How to preserve the mapping between distributed state and the original program state?

• How much of the illusion of a single system can be preserved?


Conclusions

• Single-machine abstractions break down in the presence of (performance/correctness) bugs

• Job Object Model insulates tools from messy details

• Design the cluster runtime to make it easy to build a JOM

• Rich interactive tools are easily built on top of the JOM

• Much more work is needed for debugging at scale