HOMME Trace Analysis

Fabrice Mizero
Mentor: Dr. John Dennis
Collaborators: Prof. Malathi Veeraraghavan (University of Virginia), Prof. Robert D. Russell (University of New Hampshire), Qian Liu (University of New Hampshire)
Aug 1, 2014



Page 2: Roadmap

• Motivation
• Background
• Methodology
• Results
• Conclusion and Solutions
• Future Work

Page 3: Big Picture

• Understanding the causes of poor performance of CESM on Yellowstone: a 5-step approach
  - Experimental execution and data collection
  - HOMME trace analysis
  - IBMgtSim: routing study
  - Network simulation
  - Integrated simulation

Page 4: HOMME Trace Analysis

[Figure: panels labeled 2-hop, 4-hop, and 6-hop. Credit: Dr. John Dennis, Zhengyang Liu]

Page 5: Suspected Causes

"…OS noise, shape of the allocated partition, and interference from other jobs." (Abhinav Bhatele et al., SC13)

• Network congestion
  - Head-of-line blocking
  - Credit-based flow control
• OS jitter
  - Kernel interrupts
• Application interference
  - Self-interference
  - Interference with other jobs (neighborhood effect)

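Page 6: Congestion: Head-of-Line (HOL) Blocking

Worst-case scenario: congestion spreading due to HOL blocking. When the packet at the head of a switch input queue cannot advance because its output port has no free buffer space, every packet queued behind it also waits, even packets bound for uncongested ports. Those packets form a victim flow.

[Figure: hosts H1 to H7 connected through switches S1 and S2, with labels "Out of Buffer Space!!", "Stuck!!!", and "Victim Flow" marking where congestion spreads.]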
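The HOL effect can be illustrated with a toy model. This minimal Python sketch assumes a single input FIFO at a switch, at most one packet forwarded per tick, and a hypothetical output port `P1` that never has buffer space; the flow and port names are illustrative and do not come from the traces.

```python
from collections import deque

def simulate_fifo(packets, blocked_ports, ticks):
    """Toy model of one switch input port with a single FIFO queue.

    packets: list of (flow_name, output_port) in arrival order.
    blocked_ports: output ports with no free buffer space (hypothetical).
    Returns the number of packets delivered per flow.
    """
    queue = deque(packets)
    delivered = {flow: 0 for flow, _ in packets}
    for _ in range(ticks):
        if not queue:
            break
        flow, port = queue[0]
        if port in blocked_ports:
            # Head-of-line blocking: the head packet cannot advance,
            # so every packet behind it waits too, whatever its port.
            continue
        queue.popleft()
        delivered[flow] += 1
    return delivered

packets = [("congested", "P1"), ("victim", "P2"), ("victim", "P2")]
print(simulate_fifo(packets, blocked_ports={"P1"}, ticks=10))
# {'congested': 0, 'victim': 0}
```

Although port `P2` is free, the victim flow delivers nothing because it sits behind a packet for the blocked port.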

Page 7: OS Jitter

• Each compute node runs its own OS (RHEL)
• Interference caused by OS routines:
  - Timer interrupts
  - OS daemons
  - Hardware interrupts
• Competition for CPU resources. Example: the line printer daemon
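One common way to observe OS jitter is a fixed-work-quantum (FWQ) style benchmark: time the same small unit of work many times and treat any excess over the fastest run as interference. The Python sketch below illustrates the idea under that assumption. The quantum size is arbitrary, and interpreter overhead adds noise of its own, so a real measurement would use a compiled benchmark pinned to a core.

```python
import time

def fixed_work_quantum(iterations=1000, work=2000):
    """Time a fixed amount of busy work repeatedly.

    Returns per-iteration excess times (ns) relative to the fastest
    iteration; large excesses suggest interruptions such as timer
    interrupts or daemons preempting the process.
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        acc = 0
        for i in range(work):  # the fixed work quantum
            acc += i * i
        samples.append(time.perf_counter_ns() - start)
    fastest = min(samples)
    return [s - fastest for s in samples]

excess = fixed_work_quantum()
print("max excess (ns):", max(excess))
print("iterations > 10 us over min:", sum(e > 10_000 for e in excess))
```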

Page 8: 3 Questions

• How does congestion impact network latency?
• How important is OS jitter to network latency?
• Which has a bigger impact on message latency: OS jitter or congestion?

Page 9: Experimental Set-Up

• Congestion: 2 platforms
  - Jellystone: non-production machine
  - Yellowstone: production machine
  - Different message sizes and hop distances
• OS jitter:
  - Linux Transparent Huge Pages (THP)
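On Linux, the THP mode is exposed through sysfs at `/sys/kernel/mm/transparent_hugepage/enabled`, where the active setting appears in brackets (for example `always [madvise] never`). A small Python helper for recording which mode a run used might look like this; it is a sketch, not part of the original tooling.

```python
from pathlib import Path

# Standard sysfs location of the THP setting on Linux.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")

def parse_thp_mode(text):
    """Return the active (bracketed) mode, e.g. 'madvise'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active THP mode in {text!r}")

def current_thp_mode(path=THP_PATH):
    """Read this node's THP mode; return None if it cannot be determined."""
    try:
        return parse_thp_mode(path.read_text())
    except (OSError, ValueError):
        return None

print(parse_thp_mode("always [madvise] never"))  # madvise
print(current_thp_mode())  # e.g. 'always', or None off Linux
```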

Page 10: Methodology

[Figure: methodology pipeline with stages Extrae Trace Collection, Clock Skew Correction, delays grouped by (Hop, Size), and Wilcoxon Rank Sum Test.]

Page 11: Extrae

• Tracing tool developed at BSC (Barcelona Supercomputing Center)
• Chronological event, state, and communication records
• One-way communication delays, visualized with Paraver

[Figure: MPI_Isend on a timeline, with its start and end times.]
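Turning trace records into one-way delays means pairing each send with its receive. The sketch below assumes a hypothetical, already-parsed event list (not Extrae's or Paraver's actual file format) and relies on MPI's non-overtaking rule: messages from the same sender to the same receiver with the same tag (on one communicator) are matched in order. Sends and receives are sorted separately, each by its own host's clock, so the pairing still works when the two clocks are offset.

```python
from collections import defaultdict

def one_way_delays(events):
    """Pair send/receive events on each (src, dst, tag) channel in FIFO order.

    events: iterable of dicts with hypothetical keys
      kind ('send' or 'recv'), src, dst, tag, size, time.
    Returns a list of dicts with src, dst, size, and raw delay.
    """
    sends, recvs = defaultdict(list), defaultdict(list)
    for ev in events:
        key = (ev["src"], ev["dst"], ev["tag"])
        (sends if ev["kind"] == "send" else recvs)[key].append(ev)
    delays = []
    for key, send_list in sends.items():
        send_list.sort(key=lambda e: e["time"])  # sender's clock
        recv_list = sorted(recvs.get(key, []), key=lambda e: e["time"])  # receiver's clock
        # zip stops at the shorter list, so unmatched messages are dropped.
        for s, r in zip(send_list, recv_list):
            delays.append({"src": key[0], "dst": key[1],
                           "size": s["size"], "delay": r["time"] - s["time"]})
    return delays

events = [
    {"kind": "send", "src": 0, "dst": 1, "tag": 7, "size": 488, "time": 100},
    {"kind": "recv", "src": 0, "dst": 1, "tag": 7, "size": 488, "time": 130},
    {"kind": "send", "src": 0, "dst": 1, "tag": 7, "size": 488, "time": 200},
    {"kind": "recv", "src": 0, "dst": 1, "tag": 7, "size": 488, "time": 260},
]
print([d["delay"] for d in one_way_delays(events)])  # [30, 60]
```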

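Page 12: Clock Skew

Host A records the send time with its own clock, $C_A(t_1)$; host B records the receive time with its own clock, $C_B(t_2)$.

Ideally, the one-way delay is $C_{AB} = C_B(t_2) - C_A(t_1)$.

In reality, the clocks differ by a nonzero offset, $C_A(t) - C_B(t) \neq 0$, and drift apart with a nonzero skew, $C_A'(t) - C_B'(t) \neq 0$.

• For messages of the same size and the same hop count, at the host-pair level, the minimum delay is the best approximation of the offset. The corrected delay is

$$C_{AB}(t) - \min_t C_{AB}(t) + \min_{\text{pingpong}}$$

where $\min_{\text{pingpong}}$ is the minimum delay measured with a ping-pong test.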
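The correction on this slide can be sketched in Python. This assumes the raw delays $C_{AB}$ are already available per message and that `min_pingpong` is a separately measured input; the record layout and names are illustrative. Because it subtracts a single minimum per (host pair, size, hop count) group, it removes a constant offset but not a skew that changes over the course of the trace.

```python
from collections import defaultdict

def correct_offsets(records, min_pingpong):
    """Offset-correct one-way delays per (src, dst, size, hops) group.

    records: list of (src, dst, size, hops, raw_delay), where
      raw_delay = C_B(t2) - C_A(t1) in one time unit.
    min_pingpong: minimum delay from a ping-pong test (assumed input).
    Returns corrected delays in input order.
    """
    group_min = defaultdict(lambda: float("inf"))
    for src, dst, size, hops, d in records:
        key = (src, dst, size, hops)
        group_min[key] = min(group_min[key], d)
    # C_AB(t) - min C_AB(t) + min_pingpong
    return [
        d - group_min[(src, dst, size, hops)] + min_pingpong
        for src, dst, size, hops, d in records
    ]

# Host 1's clock runs about 500 units ahead of host 0's.
recs = [(0, 1, 488, 4, 530), (0, 1, 488, 4, 560), (0, 1, 488, 4, 510)]
print(correct_offsets(recs, min_pingpong=2))  # [22, 52, 2]
```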

Page 13: Statistical Methods

• Wilcoxon rank sum test: a non-parametric significance test that compares two independent populations, testing whether values from one tend to be larger than values from the other (it compares distributions, not means)
• Tests:
  - OS jitter? Jellystone: no THP vs. with THP
  - Congestion? Yellowstone: 0-hop delays vs. 4-hop delays; Jellystone with THP vs. Yellowstone with THP
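For reference, the Wilcoxon rank sum test (equivalently, the Mann-Whitney U test) can be written in a few lines. This Python sketch uses the large-sample normal approximation with a tie correction and no continuity correction, which is reasonable for sample sizes like those in the results. It returns three p-values (two-sided, "x tends to be smaller", "x tends to be larger"); the slides do not state which alternatives their p-value triples use. In practice, `scipy.stats.mannwhitneyu` provides the same test.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def rank_sum_test(x, y):
    """Wilcoxon rank sum test, normal approximation with tie correction.

    Returns (U, p_two_sided, p_less, p_greater); 'less' is the
    alternative that values in x tend to be smaller than values in y.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - mean) / math.sqrt(var)
    p_less = normal_cdf(z)
    p_greater = 1 - p_less
    return u1, min(1.0, 2 * min(p_less, p_greater)), p_less, p_greater

u, p_two, p_less, p_greater = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(u, p_two, p_less, p_greater)
```

With delay samples, a call such as `rank_sum_test(nothp_delays, thp_delays)` (hypothetical variable names) with a very small `p_less` would support reading "NoTHP is faster than THP".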

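Page 14: Perfquery and Credit-Based Flow Control

• Perfquery: InfiniBand (IB) performance counter query tool
• PortXmitWait: counter for monitoring port congestion; it counts time during which a port had data to send but could not transmit

[Figure: Host A and its top-of-rack (TOR) switch; at a "Credits?" decision, "Yes" lets the packet go and "No" makes the port wait, which PortXmitWait records.]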
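The credit mechanism and the wait counter can be mimicked with a toy tick-based model. In this Python sketch the sender starts with one credit per receiver buffer, spends one credit per packet, and gets a credit back whenever the receiver drains a buffer; the buffer count and drain rate are illustrative assumptions. The `xmit_wait` tally plays the role of PortXmitWait, counting ticks in which data was ready but no credit was available.

```python
def simulate_credits(packets, buffers, drain_every, ticks):
    """Toy credit-based flow control on a single link.

    packets: number of packets queued at the sender.
    buffers: receiver buffer slots (initial credits).
    drain_every: receiver frees one buffer every `drain_every` ticks.
    Returns (sent, xmit_wait).
    """
    credits, occupied = buffers, 0
    sent = xmit_wait = 0
    for tick in range(1, ticks + 1):
        # Receiver drains a buffer and returns the credit to the sender.
        if occupied and tick % drain_every == 0:
            occupied -= 1
            credits += 1
        if sent < packets:
            if credits > 0:
                credits -= 1
                occupied += 1
                sent += 1
            else:
                xmit_wait += 1  # data ready but no credit: a wait tick
    return sent, xmit_wait

print(simulate_credits(packets=10, buffers=2, drain_every=3, ticks=30))  # (10, 14)
print(simulate_credits(packets=10, buffers=2, drain_every=1, ticks=30))  # (10, 0)
```

A slow receiver (draining every third tick) forces the sender to wait for credits, while a fast one leaves the wait counter at zero.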

Page 15: Results

• How important is OS jitter to network latency? Jellystone::0-Hop::NoTHP vs. Jellystone::0-Hop::THP

Intranode communication is slower with THP enabled than without THP.

Msg size   Sample sizes    p-values            Interpretation
488 B      54624::45727    <0.001, <0.001, 1   NoTHP is faster than THP
1952 B     9503::7950      <0.001, <0.001, 1   NoTHP is faster than THP
2440 B     102120::85468   <0.001, <0.001, 1   NoTHP is faster than THP
2928 B     47504::39764    <0.001, <0.001, 1   NoTHP is faster than THP

Page 16: Results

• Which has a bigger impact on message latency: OS jitter or congestion? Comparing Yellowstone 0-hop delays with 4-hop delays.

For all message sizes considered, intranode (0-hop) communication delays tend to exceed internode (4-hop) delays.

Msg size   Sample sizes    p-values            Interpretation
488 B      54325::23621    <0.001, <0.001, 1   4-Hop is faster than 0-Hop
2440 B     101581::16529   <0.001, <0.001, 1   4-Hop is faster than 0-Hop
2928 B     47243::21259    <0.001, <0.001, 1   4-Hop is faster than 0-Hop
4880 B     49603::4720     <0.001, <0.001, 1   4-Hop is faster than 0-Hop

Page 17: Conclusion

• OS jitter can cause performance degradation or variability.
• Inter-job interference can lead to application performance variability.

Solutions
• Congestion: dynamic allocation of virtual lanes to redirect victim flows around congested ports.
• OS jitter: a Linux tickless kernel, and MPI-3 for better control over shared-memory communication.
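The effect that virtual lanes (VLs) aim for can be illustrated by giving each lane its own input queue, so that a blocked head stalls only its own lane. In this Python sketch every lane may advance in the same tick (a simplification of real link arbitration), and the `lane_of` mapping is a hypothetical stand-in for a dynamic VL assignment policy, not the algorithm studied in this work.

```python
from collections import deque

def simulate_lanes(packets, lane_of, blocked_ports, ticks):
    """Toy switch input port with one FIFO per virtual lane.

    packets: list of (flow, output_port) in arrival order.
    lane_of: dict mapping flow -> virtual lane (hypothetical policy).
    blocked_ports: output ports with no free buffer space.
    Returns packets delivered per flow.
    """
    lanes = {}
    for flow, port in packets:
        lanes.setdefault(lane_of[flow], deque()).append((flow, port))
    delivered = {flow: 0 for flow, _ in packets}
    for _ in range(ticks):
        for queue in lanes.values():
            # A blocked head packet only stalls its own lane.
            if queue and queue[0][1] not in blocked_ports:
                flow, _ = queue.popleft()
                delivered[flow] += 1
    return delivered

packets = [("congested", "P1"), ("victim", "P2"), ("victim", "P2")]
# Shared lane: the victim flow is stuck behind the congested flow.
print(simulate_lanes(packets, {"congested": 0, "victim": 0}, {"P1"}, 10))
# {'congested': 0, 'victim': 0}
# Victim moved to its own lane: it bypasses the blocked head.
print(simulate_lanes(packets, {"congested": 0, "victim": 1}, {"P1"}, 10))
# {'congested': 0, 'victim': 2}
```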

Page 18: Future Work

• Further study of the dynamic virtual lane assignment solution
• Plan and collect new HOMME traces with PortXmitWait monitored and LSF logs saved
• Study intra-job interference
• A more efficient algorithm for correcting clock skew

Page 19: Thank You

Fabrice Mizero