K T A U Kernel Tuning and Analysis Utilities Department of Computer and Information Science Performance Research Laboratory University of Oregon

Page 1:

KTAU: Kernel Tuning and Analysis Utilities

Department of Computer and Information Science

Performance Research Laboratory

University of Oregon

Page 2:

Agenda

• Motivations

• KTAU Overview

• ZeptoOS - KTAU - TAU on BG/L

• KTAU - TAU on Linux Cluster

Page 3:

What is a process doing inside the kernel?

Solution:

Context-of-execution-based profile/trace

We can analyze the execution path of a process and store the performance data local to that process.

Page 4:

What about other processes on the system?

Solution:

System-wide performance analysis

By aggregating the performance data of each process in the system (all processes or a selected subset), we can capture interactions among processes.
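As a rough illustration of that aggregation step, the sketch below sums per-process profiles into one system-wide view. The data layout (`pid -> {routine: time}`), the sample numbers, and the `system_wide` helper are assumptions made for this example, not KTAU's actual structures.

```python
from collections import Counter

# Hypothetical per-process kernel profiles: pid -> {routine: time_us}.
per_process = {
    101: {"sys_write": 70, "schedule": 10},
    102: {"sys_read": 15, "schedule": 25},
    103: {"sys_write": 5},
}

def system_wide(profiles, pids=None):
    """Sum per-process profiles over all processes, or over a chosen subset."""
    total = Counter()
    for pid, prof in profiles.items():
        if pids is None or pid in pids:
            total.update(prof)  # adds the times routine by routine
    return dict(total)

everything = system_wide(per_process)
selected = system_wide(per_process, pids={101, 102})
```

Passing `pids` corresponds to the "selectively" case on the slide; leaving it out aggregates every process.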

Page 5:

Profiling or Tracing?

Answer:

Why not do both?

• Profile

– A summarized view of performance data, with the advantage of compact data size.

• Trace

– A detailed view of the process execution timeline, with the disadvantage of large data size.
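The size trade-off can be seen in a small sketch: a trace keeps every event, while a profile collapses the events into one row per routine. The event layout and the `summarize` helper below are illustrative assumptions, not KTAU's data format.

```python
from collections import defaultdict

# Hypothetical trace: (timestamp_us, event_kind, routine) records.
trace = [
    (0, "enter", "sys_write"),
    (40, "exit", "sys_write"),
    (50, "enter", "sys_read"),
    (65, "exit", "sys_read"),
    (70, "enter", "sys_write"),
    (100, "exit", "sys_write"),
]

def summarize(events):
    """Collapse a trace into a flat profile: routine -> [calls, total_us]."""
    profile = defaultdict(lambda: [0, 0])
    started = {}  # routine -> timestamp of the pending "enter"
    for ts, kind, routine in events:
        if kind == "enter":
            started[routine] = ts
        else:
            profile[routine][0] += 1
            profile[routine][1] += ts - started.pop(routine)
    return dict(profile)

profile = summarize(trace)
```

The trace grows with every call, whereas the profile stays at one entry per routine, at the cost of losing the timeline.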

Page 6:

Why do we need another kernel profiling/tracing tool?

Answer:

Why not?

• LTT

• OProfile

• KernInst

Page 7:

KTAU Design Goals

• Fine-grained kernel-level performance measurement

– Parallel applications

– Support both profiling and tracing

• Both process-centric and system-wide view

• Merge user-space performance with kernel-space

• Detailed program-OS interaction data

• Analysis and visualization compatible with existing tools

Page 8:

KTAU Method

• Instruments Linux kernel source with the KTAU profiling API

• Maintains performance data for each kernel routine (per process)

• Performance data accessible via /proc filesystem

• Instrumented application maintains data in user-space

• Post-execution performance data analysis
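The idea of keeping performance data per kernel routine and per process can be modeled with a toy sketch. Everything here (the `KernelProfile` class, its `entry`/`exit`/`read` methods, the tick values) is hypothetical and only imitates the bookkeeping; it is not the KTAU API or its /proc format.

```python
from collections import defaultdict

class KernelProfile:
    """Toy model of per-process, per-routine measurement (not KTAU's real API)."""

    def __init__(self):
        # pid -> routine -> [call_count, inclusive_ticks]
        self.data = defaultdict(lambda: defaultdict(lambda: [0, 0]))
        self._open = {}  # (pid, routine) -> timestamp recorded at entry

    def entry(self, pid, routine, now):
        """Instrumentation hook placed at the start of a routine."""
        self._open[(pid, routine)] = now

    def exit(self, pid, routine, now):
        """Instrumentation hook placed at the end of a routine."""
        began = self._open.pop((pid, routine))
        slot = self.data[pid][routine]
        slot[0] += 1
        slot[1] += now - began

    def read(self, pid):
        """Return one process's data, as a reader of the exported data might see it."""
        return {routine: tuple(v) for routine, v in self.data[pid].items()}

prof = KernelProfile()
prof.entry(7, "sys_read", now=100)
prof.entry(8, "sys_read", now=110)
prof.exit(8, "sys_read", now=115)
prof.exit(7, "sys_read", now=130)
prof.entry(7, "sys_read", now=200)
prof.exit(7, "sys_read", now=210)
```

Because the table is keyed by process first, two processes executing the same routine at overlapping times do not disturb each other's counters.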

Page 9:

KTAU Framework

Page 10:

KTAU Architecture

Five modules:

- KTAU Instrumentation

- KTAU Profiling/Tracing Infrastructure

- KTAU Proc Interface

- KTAU User-API Library

- KTAU-D

Page 11:

Kernel Profiling Issues on BG/L

• I/O node kernel

– Linux kernel approach

• Compute node kernel

– No daemon processes

– Single address space (single performance database, single callstack across user/kernel)

– Keeps track of one process only (optimization)

– Instrumented compute node kernel

Page 12:

KTAU on BG/L I/O Node

[Figure: a BG/L IO-Node serving 32 compute nodes. The IO-Node runs the ZeptoOS IO-N kernel with KTAU; its user space and ZeptoOS RamDisk host IBM's CIOD and KTAU-D. Each BG/L Compute-Node runs the IBM Compute-N kernel; its user space hosts the compute job with TAU.]

Page 13:

KTAU on BG/L

• Current status

– IO Node ZeptoOS kernel profiling/tracing

– KTAU integrated into ZeptoOS build system

– Detailed IO Node kernel observation now possible

– KTAU-Daemon (KTAU-D) on IO Node: monitors system-wide and individual processes; more than what strace allows

– Visualization of trace/profile of ZeptoOS and CIOD: Vampir/Jumpshot (trace) and ParaProf (profile)

Page 14:

KTAU Usage Models for BG/L IO-Node

• Daemon-based monitoring (KTAU-D)

– Use KTAU-D to monitor (profile/trace) a single process (e.g., CIOD) or the entire IO-Node kernel

– No access to the source code of the user-space program is required

– CIOD kernel activity is available even though the CIOD source is N/A

• 'Self' monitoring

– A user-space program can be instrumented (e.g., with TAU) to access its OWN kernel-level trace/profile data

– ZIOD (ZeptoOS IO-D) source (when available) can be instrumented

– Can produce MERGED user-kernel trace/profile

Page 15:

More on KTAU-D

• A daemon running on BG/L IO-node that periodically accesses kernel profile/trace data and outputs to filesystem

• Configuration done through ZeptoOS configuration tool

• KTAU-D, configuration file, and necessary scripts are integrated into the ZeptoOS runtime environment.
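The periodic behavior described on this slide can be sketched as a simple polling loop. The function name `run_daemon`, its parameters, and the injected `sleep` are assumptions for illustration; the slides do not show how KTAU-D is actually implemented.

```python
import time

def run_daemon(read_kernel_data, write_output, period_s=1.0,
               iterations=3, sleep=time.sleep):
    """Poll a data source at a fixed period and hand each sample to a writer.

    read_kernel_data and write_output are caller-supplied callables; in a
    real daemon they would read kernel performance data and write a file.
    """
    for i in range(iterations):
        write_output(read_kernel_data())
        if i < iterations - 1:
            sleep(period_s)  # wait before the next access

# Demo with stand-ins so the sketch runs instantly and touches no files.
samples = iter(["sample-0", "sample-1", "sample-2"])
written = []
naps = []
run_daemon(lambda: next(samples), written.append,
           period_s=1.0, iterations=3, sleep=naps.append)
```

Injecting `sleep` keeps the loop testable; a deployed daemon would use the default `time.sleep` and loop until stopped.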

Page 16:

KTAU-D Configuration in ZeptoOS-1.2

Page 17:

KTAU-D Profile Data

• KTAU-D can be used to access profile data (system-wide and individual process) of the BG/L IO-Node

• Data is obtained at the start and stop of KTAU-D, and then the resulting profile is generated

• Currently, flat profiles with inclusive/exclusive times and function call counts are produced

– (Future work: call-graph profiles)

• Profile data is viewed using the ParaProf visualization tool
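One plausible reading of "data is obtained at the start and stop" is that the resulting profile is the difference between two snapshots of cumulative counters. The sketch below works under that assumption; the snapshot layout and the `profile_delta` helper are hypothetical.

```python
def profile_delta(start, stop):
    """Subtract a start snapshot from a stop snapshot, routine by routine.

    Each snapshot maps routine -> (calls, inclusive_us, exclusive_us).
    Routines absent from the start snapshot are treated as all zeros, and
    routines with no activity in the interval are left out of the result.
    """
    zero = (0, 0, 0)
    delta = {}
    for routine, after in stop.items():
        before = start.get(routine, zero)
        diff = tuple(a - b for a, b in zip(after, before))
        if any(diff):
            delta[routine] = diff
    return delta

# Hypothetical snapshots taken when the daemon starts and stops.
at_start = {"sys_read": (10, 500, 300)}
at_stop = {
    "sys_read": (14, 700, 420),
    "sys_write": (3, 90, 90),
    "schedule": (0, 0, 0),
}
flat_profile = profile_delta(at_start, at_stop)
```

Each row of the result carries the three quantities named on the slide: call count, inclusive time, and exclusive time.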

Page 18:

Example of Operating System Profile on I/O Nodes

Running Flash3 on 32 compute nodes

CIOD kernel profile

Page 19:

KTAU-D Trace

• KTAU-D can be used to access system-wide and individual process trace data of the BG/L IO-Node

• The trace from KTAU-D is converted into the TAU trace format, which can then be converted into other formats

– Vampir, Jumpshot

• The trace from KTAU-D can be used together (merged) with the trace from TAU to monitor both user-space and kernel-space activities

– (Work in progress)
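Merging a user-space trace with a kernel-space trace amounts to interleaving two time-ordered event streams. The sketch below assumes both streams are already sorted and share a common clock; the tuple layout and event names are invented for illustration and are not the TAU trace format.

```python
import heapq

# Hypothetical, already time-sorted event streams: (timestamp_us, source, event).
user_trace = [
    (10, "user", "enter write()"),
    (90, "user", "exit write()"),
]
kernel_trace = [
    (20, "kernel", "enter sys_write"),
    (80, "kernel", "exit sys_write"),
]

# heapq.merge lazily interleaves sorted inputs, ordering by the tuples
# (timestamp first), so neither trace has to be re-sorted as a whole.
merged = list(heapq.merge(user_trace, kernel_trace))
timeline = [event for _, _, event in merged]
```

In the merged timeline the kernel-level `sys_write` activity appears nested inside the user-level `write()` call, which is the kind of combined view the slide describes.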

Page 20:

Exp 1: Observe activities on the IO node

Set up:

– KTAU:

• Enable all instrumentation points

• Number of kernel trace entries per process = 10K

– KTAU-D:

• System-wide tracing

• Accessing the trace every 1 second and dumping the trace output to a file in the user's home directory through NFS

– IOTEST:

• An MPI-based benchmark (open/write/read/close)

• Running with default parameters (block-size = 16MB) on NFS

Page 21:

[Figure: IOTEST with TAU instrumentation. Labeled regions: Main, Write Time, Write Seek Time, Read Time, Read Seek Time.]

Page 22:

sys_write() / sys_read()

KTAU Trace of CIOD running 2, 4, 8, 16, 32 nodes

As the number of compute nodes increases, CIOD has to handle a larger number of forwarded system calls.

Page 23:

Zoomed View of CIOD Trace (8 compute nodes)

Page 24:

Can We Correlate CIOD Activity with RPC-IOD?

• Activity within a BG/L IO node: the system switches from “CIOD” to “rpciod” during a “sys_write” system call

• rpciod performs “socket_send” and interrupt handling before switching back

[Figure: trace timelines for rpciod and ciod.]

Page 25:

Exp 2: Correlating multiple traces from Compute-node and IO-node

• Set up:

– Running IOTEST with TAU instrumentation on 64 compute nodes

– Running ZeptoOS-1.2 with KTAU on 2 IO-nodes

– Reduced set of kernel instrumentation (no TCP stack and schedule())

– 10K entries of ring-trace buffer

– Using PVFS2

(Note: Trace of 64 compute nodes and 2 IO-nodes)
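A fixed-size ring buffer like the 10K-entry trace buffer in this setup keeps only the most recent events, so anything not collected in time is overwritten. The `RingTrace` class below is a hypothetical Python model of that behavior, not KTAU's kernel data structure.

```python
from collections import deque

class RingTrace:
    """Toy fixed-capacity trace buffer; the oldest entries are overwritten."""

    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)
        self.dropped = 0  # events lost because the buffer was full

    def record(self, event):
        if len(self.buf) == self.buf.maxlen:
            self.dropped += 1  # the append below evicts the oldest event
        self.buf.append(event)

    def drain(self):
        """Return the buffered events in order and empty the buffer."""
        events = list(self.buf)
        self.buf.clear()
        return events

ring = RingTrace(capacity=3)  # tiny capacity to make the wrap-around visible
for event_id in range(5):
    ring.record(event_id)
collected = ring.drain()
```

This is why the buffer size and the interval at which the trace is read have to be chosen together: a busy process can wrap the buffer between two reads.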

Page 26:

[Figure: TAU trace. Annotations: write() @ 3:283 sec, read() @ 12:678 sec.]

Page 27:

[Figure: traces of ciod and pvfs2-client on ionode23 and ionode47. First set of annotations: sys_open() @ 53:1, sys_write() @ 56:6, sys_read() @ 1:05:545. Second set of annotations: sys_open() @ 53:2, sys_write() @ 56:85, sys_read() @ 1:05:778.]

Page 28:

Exp 3: Analyze system-wide performance

• Set up:

– 2 runs of IOTEST with TAU instrumentation on 32 compute nodes

• NFS

• PVFS

– Running ZeptoOS-1.2 with KTAU on 1 IO-node

– Analyzing both profile and trace data

Page 29:

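[Figure: two trace panels. First panel annotations: write() @ 39:00, read() @ 47.804. Second panel annotations: write() @ 42:99, read() @ 54:61. Processes labeled: pvfs2-client and ciod; rpciod and ciod.]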