Profiling Tools

Introduction to Computer System, Fall 2015. (PPI, FDU)

VTune & Gprof

Post on 17-Jan-2016


Profiling

• In software engineering, profiling ("program profiling", "software profiling") is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization.

• Profiling is achieved by instrumenting either the program source code or its binary executable form using a tool called a profiler (or code profiler). Profilers may use a number of different techniques, such as event-based, statistical, instrumented, and simulation methods.
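To make the "instrumented" technique concrete, here is a minimal sketch of hand-written instrumentation in C: the call is bracketed by two clock() readings. The names busy_work and time_call_seconds are hypothetical and only serve this illustration; a real profiler inserts comparable probes automatically.

```c
#include <time.h>

/* Hypothetical workload: sums the integers 0 .. n-1. */
unsigned long busy_work(unsigned long n) {
    volatile unsigned long sum = 0; /* volatile keeps the loop from being optimized away */
    for (unsigned long i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}

/* Manual instrumentation: read the CPU clock before and after the call
 * and return the elapsed CPU time in seconds. */
double time_call_seconds(unsigned long n, unsigned long *result) {
    clock_t start = clock();
    *result = busy_work(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_call_seconds(n, &r) from main with a large n gives a rough CPU time for busy_work. The resolution of clock() is coarse, so short calls may report 0 seconds.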

Performance Tuning for Intel® Xeon Phi™ Coprocessors

Visualizing Performance Opportunities using Intel® VTune™ Amplifier

Copyright © 2014, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.

Introduction

Can profile host, offload or native coprocessor applications

Host-based profiling may be sufficient to identify vectorization/parallelism/offload candidates

Call stacks currently available for host only

Start with representative/reasonable workloads!

Use Intel® VTune™ Amplifier XE to gather hot spot data

Tells what functions account for most of the run time

Often, this is enough

But it does not tell you much about program structure

Move on to more detailed analyses


Intel® VTune™ Amplifier XE

Tune Applications for Scalable Multicore Performance

• Fast, Accurate Performance Profiles
– Hotspot (Statistical call tree)
– Hardware-Event Based Sampling

• Thread Profiling
– Visualize thread interactions on timeline
– Balance workloads

• Easy set-up
– Pre-defined performance profiles
– Use a normal production build

• Compatible
– Microsoft*, GCC*, Intel compilers
– C/C++, Fortran, Assembly, .NET*
– Latest Intel processors and compatible processors (1)

• Find Answers Fast
– Filter out extraneous data
– View results tied to source/assembly lines
– Event multiplexing

• Windows* or Linux*
– Visual Studio* Integration (Windows)
– Standalone user interface and command line
– 32 and 64-bit

(1) IA-32 and Intel® 64 architectures. Many features work with compatible processors. Event based sampling requires a genuine Intel Processor.

A Quick Tour Through Intel® VTune™ Amplifier

Setting up a project

Execution file, command line arguments, working directory

Search directories (standard binary libraries for Intel MPSS 3)

Quick tour of advanced setup dialog

Selecting a collector

Host versus native event collection

Launching a collection

Viewing results, source and assembly

VTune™ Amplifier XE visualizes performance

[Screenshot: VTune Amplifier XE main window and toolbar; the callouts label the toolbar buttons (new, open, compare, result, project, properties)]

[Screenshot: grid pane, including the grouping pull-down]

[Screenshot: source view with per-line localization and hot spot navigation controls]

• Small data files can also be copied onto the card, but they must be recopied after a reboot.

• Suggestion: create /tmp/usrname as the working directory.

[Screenshot: assembly view with hot spot navigation controls]

For event collection, the coprocessor is treated as a special HW architecture.

General Exploration runs a set of events to drive top-down analysis.

VTune™ Amplifier

Advantages

• Offers both a command line and a GUI; easy to use

• Multiple predefined analysis types

• Supports hardware events, for example for cache and memory access analysis

• Good support for multithreaded profiling

Limitations

• Expensive for enterprise use

• Event-based sampling requires a genuine Intel processor, although many other features also work on compatible processors.

GPROF

• Gprof is a performance analysis tool for Unix applications. It uses a hybrid of instrumentation and sampling and was created as an extended version of the older "prof" tool. Unlike prof, gprof is capable of limited call graph collection and printing.

Usage

• Instrumentation code is automatically inserted into the program during compilation (for example, by using the '-pg' option of the gcc compiler) to gather caller-function data. A call to the monitor function 'mcount' is inserted at the entry of each function compiled this way.

• gcc -Wall -g -pg -lc_p example.c -o example (-lc_p links a profiling build of libc; drop it if that library is not installed)
• ./example (running the program creates gmon.out)
• gprof -b example gmon.out
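As a small sketch of something worth profiling with these commands, the fragment below pairs a deliberately slow function with a fast equivalent. All function names here are hypothetical, and the file is assumed to be saved as example.c with a main added as described in the closing comment.

```c
/* Deliberately slow: counts the pairs (i, j) with i < j using nested loops, O(n^2). */
unsigned long count_pairs_slow(unsigned long n) {
    unsigned long count = 0;
    for (unsigned long i = 0; i < n; i++) {
        for (unsigned long j = i + 1; j < n; j++) {
            count++;
        }
    }
    return count;
}

/* Closed form for the same quantity: n * (n - 1) / 2, O(1). */
unsigned long count_pairs_fast(unsigned long n) {
    return n * (n - 1) / 2;
}

/* Runs both variants; returns the shared result, or 0 if they disagree. */
unsigned long run_workload(unsigned long n) {
    unsigned long slow = count_pairs_slow(n);
    unsigned long fast = count_pairs_fast(n);
    return slow == fast ? slow : 0;
}

/* To build a standalone program for gprof, add for example:
 *   int main(void) { return run_workload(20000) == 0; }
 */
```

Built without optimization, as in the gcc command above, count_pairs_slow would be expected to dominate the flat profile. At higher optimization levels the compiler may inline or simplify the loop, so the profile can look different.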

Result

• Gprof output consists of two parts: the flat profile and the call graph. The flat profile gives the total execution time spent in each function and its percentage of the total running time. Function call counts are also reported. Output is sorted by percentage, with hot spots at the top of the list.

Result

• % time: the percentage of the total running time of the program used by this function.

• cumulative seconds: a running sum of the number of seconds accounted for by this function and those listed above it.

• self seconds: the number of seconds accounted for by this function alone. This is the major sort for this listing.

• calls: the number of times this function was invoked, if this function is profiled, else blank.

• self ms/call: the average number of milliseconds spent in this function per call, if this function is profiled, else blank.

• total ms/call: the average number of milliseconds spent in this function and its descendants per call, if this function is profiled, else blank.

• name: the name of the function. This is the minor sort for this listing. The index shows the location of the function in the gprof listing. If the index is in parentheses, it shows where it would appear in the gprof listing if it were to be printed.
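The relationship between these columns can be sketched with a little arithmetic. The struct and function names below are hypothetical; the formulas follow the column definitions above (a share of total time, and an average per call).

```c
/* One row of a flat profile, reduced to the two raw measurements. */
struct flat_row {
    double self_seconds;  /* time spent in the function itself */
    unsigned long calls;  /* number of invocations */
};

/* "% time": share of the total running time used by this function. */
double percent_time(struct flat_row row, double total_seconds) {
    return total_seconds > 0.0 ? 100.0 * row.self_seconds / total_seconds : 0.0;
}

/* "self ms/call": average milliseconds spent in the function per call. */
double self_ms_per_call(struct flat_row row) {
    return row.calls > 0 ? 1000.0 * row.self_seconds / (double)row.calls : 0.0;
}
```

For example, a function with 0.5 self seconds over 100 calls in a 2-second run accounts for 25% of the time at 5 ms per call.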

Advantages

• Free software, distributed as part of GNU binutils
• Not tied to particular hardware

Limitations

• Gprof cannot measure time spent in kernel mode (system calls, waiting for the CPU, or waiting for I/O); only user-space code is profiled.

• Gprof profiles only the main thread of a multi-threaded application.

• Instrumentation code must be inserted at compile time.
• No hardware events.

More

• man gprof
• https://sourceware.org/binutils/docs/gprof/

Open topic

Introduction to Computer System, Fall 2015. (PPI, FDU)

Pwned

Attack: Stack Buffer Overflow

• A Typical Buffer Overflow Attack
– Inject malicious code into the buffer
– Overwrite the return address so that it points into the buffer
– Once the function returns, the malicious code runs

[Figure: stack layout, from higher to lower addresses: return addr, saved ebp (pointed to by ebp), buf]

void function(char *str) {
    char buf[16];
    strcpy(buf, str);
}
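For contrast, here is a minimal sketch of a bounded version of the copy above. The names copy_bounded and function_checked are hypothetical; snprintf is used because it never writes more than the given size and always terminates the buffer when the size is non-zero.

```c
#include <stdio.h>

/* Copies str into out without writing past out_size bytes.
 * Returns 1 if the whole input fit, 0 if it was truncated. */
int copy_bounded(const char *str, char *out, size_t out_size) {
    int needed = snprintf(out, out_size, "%s", str);
    return needed >= 0 && (size_t)needed < out_size;
}

/* Same shape as the vulnerable function, but the copy is bounded by sizeof buf. */
int function_checked(const char *str) {
    char buf[16];
    return copy_bounded(str, buf, sizeof buf);
}
```

An input longer than 15 characters is truncated rather than spilling over the saved ebp and return address, and the caller can detect the truncation from the return value.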

Defense: DEP (Data Execution Prevention)

• Execute Code, not Data
• Data areas marked non-executable
– Stack marked non-executable
• Hardware enforced (NX)
• You can load your shellcode in the stack … but you can't jump to it

How to pwn?

• Give other ways of pwning besides buffer overflow.

• Focus on how to divert the program from its normal execution path.

Debugging

How to Debug?

• The program gets wrong results
• Run the program in debug mode
• Execute the code line by line to find the cause

Can this always work well in a multi-threaded program?

If not, why? What is the difference between sequential bugs and parallel ones?

And how do you debug a tricky multi-threaded program?
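One way to see why stepping line by line can miss a parallel bug is to write out two interleavings of the same increment by hand. The sketch below is a deterministic, single-threaded simulation, not real threading; all names are hypothetical. It shows a lost update when both logical threads load the counter before either stores.

```c
/* Shared counter that two logical threads both try to increment. */
struct shared {
    int counter;
};

/* An increment is really a load, an add, and a store. */
int step_load(const struct shared *s) { return s->counter; }
void step_store(struct shared *s, int value) { s->counter = value; }

/* Interleaving A: thread 1 finishes its increment before thread 2 starts. */
int run_sequential_interleaving(void) {
    struct shared s = {0};
    int t1 = step_load(&s);
    step_store(&s, t1 + 1);
    int t2 = step_load(&s);
    step_store(&s, t2 + 1);
    return s.counter;
}

/* Interleaving B: both threads load before either stores, so one update is lost. */
int run_racy_interleaving(void) {
    struct shared s = {0};
    int t1 = step_load(&s);
    int t2 = step_load(&s);
    step_store(&s, t1 + 1);
    step_store(&s, t2 + 1);
    return s.counter;
}
```

Interleaving A yields 2 and interleaving B yields 1. Single-stepping in a debugger tends to force something like interleaving A, which is one reason a parallel bug can disappear under observation.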

Cache

Cache locality

• Cache locality is the key to achieving high levels of performance.

• We can improve cache locality by either optimizing our program or changing the cache strategy or the implementation.

• You can introduce some methods to improve cache locality from a particular perspective and present how they work.
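As one possible illustration of the program-side approach, the sketch below sums the same 2D array in two loop orders. The array size and function names are assumptions chosen for the example; C stores arrays in row-major order, so the first traversal touches adjacent addresses while the second jumps a whole row between accesses.

```c
#include <stddef.h>

#define ROWS 512
#define COLS 512

/* Row-major traversal: consecutive accesses are adjacent in memory. */
long sum_row_major(int grid[ROWS][COLS]) {
    long sum = 0;
    for (size_t r = 0; r < ROWS; r++) {
        for (size_t c = 0; c < COLS; c++) {
            sum += grid[r][c];
        }
    }
    return sum;
}

/* Column-major traversal: each access is COLS * sizeof(int) bytes from the last. */
long sum_col_major(int grid[ROWS][COLS]) {
    long sum = 0;
    for (size_t c = 0; c < COLS; c++) {
        for (size_t r = 0; r < ROWS; r++) {
            sum += grid[r][c];
        }
    }
    return sum;
}
```

Both functions return the same sum; only the access pattern differs. On typical hardware the row-major version is expected to run faster for arrays larger than the cache, which can be checked with either profiler discussed earlier.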

Requirement

• Each student picks one topic and gives a presentation with PPT slides.

• Any technique or method is acceptable, as long as you can finish the presentation within 6 minutes.

• 2015/10/30, 6-7; the classroom will be announced later.

• PPT slides should be emailed to your TA before 2015/10/29 23:59.

How to score high?

• Illustrate your ideas clearly; you may refer to the Internet or give your own solution.
• Remember that time is limited; try to be precise and concise.
• Your presentation has three parts: the PPT slides, your oral delivery, and your content. All of these are important in grading.