Profiling Tools Introduction to Computer System, Fall 2015. (PPI, FDU) Vtune & GProfile.

Profiling

• In software engineering, profiling ("program profiling", "software profiling") is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization.

• Profiling is achieved by instrumenting either the program source code or its binary executable form using a tool called a profiler (or code profiler). Profilers may use a number of different techniques, such as event-based, statistical, instrumented, and simulation methods.

Performance Tuning for Intel® Xeon Phi™ Coprocessors

Visualizing Performance Opportunities using Intel® VTune™ Amplifier

Introduction

Can profile host, offload or native coprocessor applications

Host-based profiling may be sufficient to identify vectorization/parallelism/offload candidates

Call stacks currently available for host only

Start with representative/reasonable workloads!

Use Intel® VTune™ Amplifier XE to gather hot spot data

Tells what functions account for most of the run time

Often, this is enough

But it does not tell you much about program structure

Move on to more detailed analyses

Hotspot (Statistical call tree)

Hardware-Event Based Sampling

Thread Profiling

Visualize thread interactions on timeline

Balance workloads

Easy set-up

Pre-defined performance profiles

Use a normal production build

Compatible

Microsoft*, GCC*, Intel compilers

C/C++, Fortran, Assembly, .NET*

Latest Intel processors and compatible processors¹

Find Answers Fast

Filter out extraneous data

View results tied to source/assembly lines

Event multiplexing

Windows* or Linux*

Visual Studio* Integration (Windows)

Standalone user interface and command line

32 and 64-bit

Intel® VTune™ Amplifier XE

Tune Applications for Scalable Multicore Performance

Fast, Accurate Performance Profiles

¹ IA-32 and Intel® 64 architectures. Many features work with compatible processors. Event based sampling requires a genuine Intel Processor.

A Quick Tour Through Intel® VTune™ Amplifier

Setting up a project

Executable file, command line arguments, working directory

Search directories (standard binary libraries for Intel MPSS 3)

Quick tour of advanced setup dialog

Selecting a collector

Host versus native event collection

Launching a collection

Viewing results, source and assembly

VTune™ Amplifier XE visualizes performance

[Screenshot: VTune™ Amplifier XE main window. Callouts: Instructions, Navigator toolbar (New, Open, Compare, Properties), Toolbar, Grid Pane, Grouping pull-down]

Intel Confidential

Optimization Notice

Source View /

Per line localization

View / Hot spot navigation controls

Small data files can also be copied onto the card, but they must be recopied after each reboot.

Suggestion: create /tmp/usrname as the working directory

Assembly View /

For event collection the coprocessor is treated as a special HW architecture

General Exploration runs a set of events to drive top-down analysis

VTune™ Amplifier

Advantages

• Offers both a command line and a GUI; easy to use

• Multiple predefined analysis suites

• Supports hardware events, such as cache and memory access analysis

• Multithreaded profiling is well supported

Limitations

• Expensive for enterprise use

• Hardware event-based sampling requires a genuine Intel processor; many other features also work on compatible processors.

GPROF

• Gprof is a performance analysis tool for Unix applications. It uses a hybrid of instrumentation and sampling and was created as an extended version of the older "prof" tool. Unlike prof, gprof is capable of limited call graph collecting and printing.

• Instrumentation code is automatically inserted into the program code during compilation (for example, by using the '-pg' option of the gcc compiler), to gather caller-function data. A call to the monitor function 'mcount' is inserted at the entry of each function compiled this way.

• gcc -Wall -g -pg -lc_p example.c -o example

• ./example will create gmon.out

• gprof -b example gmon.out

Result

• Gprof output consists of two parts: the flat profile and the call graph. The flat profile gives the total execution time spent in each function and its percentage of the total running time. Function call counts are also reported. Output is sorted by percentage, with hot spots at the top of the list.

Result

• % time: the percentage of the total running time of the program used by this function.

• cumulative seconds: a running sum of the number of seconds accounted for by this function and those listed above it.

• self seconds: the number of seconds accounted for by this function alone. This is the major sort for this listing.

• calls: the number of times this function was invoked, if this function is profiled, else blank.

• self ms/call: the average number of milliseconds spent in this function per call, if this function is profiled, else blank.

• total ms/call: the average number of milliseconds spent in this function and its descendants per call, if this function is profiled, else blank.

• name: the name of the function. This is the minor sort for this listing. The index shows the location of the function in the gprof listing. If the index is in parentheses, it shows where it would appear in the gprof listing if it were to be printed.

Advantages

• Free software from the GNU project (GNU's Not Unix)

• Not tied to particular hardware

Limitations

• Gprof cannot measure time spent in kernel mode (system calls, waiting for the CPU or for I/O); only user-space code is profiled.

• Gprof profiles only the main thread of a multi-threaded application.

• Requires instrumentation code to be inserted at compile time.

• No hardware events.

• man gprof

• https://sourceware.org/binutils/docs/gprof/

Open topic

Attack: Stack Buffer Overflow

• A Typical Buffer Overflow Attack

– Inject malicious code into the buffer

– Overwrite the return address so that it points to the buffer

– When the function returns, the malicious code runs

[Figure: stack layout showing the return address, the saved ebp, and the buffer holding the injected bytes]

void function(char *str) {
    char buf[16];
    strcpy(buf, str);
}

Defense: DEP (Data Execution Prevention)

• Execute Code, not Data

• Data areas marked non-executable

– Stack marked non-executable

• Hardware enforced (NX)

• You can load your shellcode in the stack … but you can’t jump to it

How to pwn?

• Present ways of pwning other than buffer overflow.

• Focus on how to divert the program from its normal execution path.

Debugging

How to Debug?

• The program gets wrong results

• Run the program in debug mode

• Execute the code line by line to find the cause

Can this always work well in a multi-threaded program?

If not, why? What’s the difference between sequential bugs and parallel ones?

And how to debug a tricky multi-threaded program?

Cache locality

• Cache locality is the key to achieving high levels of performance.

• We can improve cache locality by either optimizing our program or changing the cache strategy or the implementation.

• You can introduce some methods that improve cache locality from a particular perspective and present how they work.

Requirement

• Each student picks one topic and gives a presentation with PPT slides.

• Any techniques or methods are acceptable, as long as you can finish the presentation within 6 minutes.

• 2015/10/30, 6-7; the classroom will be announced later.

• PPT slides should be emailed to your TA before 23:59 on 2015/10/29.

How to score high?

• Illustrate your ideas clearly; you may refer to the Internet or present your own solution.

• Remember that time is limited; try to be precise and concise.

• Your presentation has three parts: the PPT slides, your oral delivery, and your content. All of these are important in grading.
