Bhanu Shankar, Ph.D.
Architect, 3D XPoint™ Performance Analysis
Intel Corporation
May 17, 2016
Copyright © 2016, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.
Optimization Notice
Legal Notices and Disclaimers
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer.
No computer system can be absolutely secure.
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.
Intel, Xeon, Xeon Phi, Core, VTune, Atom, Quark and the Intel logo are trademarks of Intel Corporation in the United States and other countries.
*Other names and brands may be claimed as the property of others.
© 2016 Intel Corporation.
Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Intel® VTune™ Amplifier: Get Faster Code Faster
In a single phrase: “VTune is the best oscilloscope for Intel® Platforms”
If there is something to measure on the platform, VTune can measure it
Learn a single tool
Use it on multiple operating systems
– Windows / Linux / FreeBSD / Android / VxWorks
Use it on multiple platforms
– Quark, Atom family, Core family, Xeon family, Xeon Phi family
Updated often with new analysis modes for better insight
Audience Knowledge
Familiarity with the basics of Intel® VTune™ Amplifier
Create Projects
Starting a profiling run
– Choose Target and Analysis Type
Types of Analyses available in VTune™ Amplifier
VTune Panes
– Role of the Grid
– Timeline Views
– Grouping Toolbar
Familiarity with the basics
Parallel programming using OpenMP
Intel x86 assembly language
Basics of compiler optimizations
Cache and Memory hierarchies
Synopsis of this webinar
I will start with an application and work through the process of analyzing its performance.
The goal is to let you, the user, find out whether your application is memory bound and, if so, whether the memory boundedness is caused by NUMA behavior.
The application is a modified version of the STREAM benchmark
Freely available at: http://www.cs.virginia.edu/stream
A simple, synthetic benchmark designed to measure sustainable memory bandwidth
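To make the benchmark concrete, here is a minimal sketch of a Triad-style kernel and of how a bandwidth figure can be derived from it. This is not the STREAM source or the modified version used in the webinar. The names `triad`, `triad_bytes` and `bandwidth_mb_per_s` are hypothetical, and the byte count (two arrays read, one written per pass) is an assumption that follows the usual STREAM accounting for Triad.

```c
#include <assert.h>
#include <stddef.h>

/* Triad-style kernel: a[i] = b[i] + scalar * c[i].
   The pragma is ignored when OpenMP is not enabled at compile time. */
void triad(double *a, const double *b, const double *c,
           double scalar, long n)
{
    assert(a != NULL && b != NULL && c != NULL && n >= 0);
#pragma omp parallel for
    for (long i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}

/* Bytes counted for one Triad pass: two arrays read, one written. */
double triad_bytes(long n)
{
    return 3.0 * sizeof(double) * (double)n;
}

/* Bandwidth in MB/s (1 MB = 1e6 bytes) for one pass that took `seconds`. */
double bandwidth_mb_per_s(long n, double seconds)
{
    assert(seconds > 0.0);
    return 1.0e-6 * triad_bytes(n) / seconds;
}
```

With 8-byte doubles, a pass over 1,000,000 elements counts 24 MB of traffic, so a pass that takes 0.024 s corresponds to about 1000 MB/s.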
General Methodology for using VTune
First: Run Advanced Hotspots Analysis
Identify the hotspots
Characterize the application behavior
Second: Run General Exploration Analysis
Identify areas to explore after the basic algorithm / hotspot
Last: Run a specialized analysis
For this example: Memory Access Analysis
– Memory Access Analysis without objects
– Memory Access Analysis with objects (Linux only)
Let’s get started
Step 1: Run Advanced Hotspots
Identify the hotspots
Characterize the application behavior
Application – Hotspot – Bottom Up Tab
Application Source/Object
Source code is simple
Object code is straightforward
Why the large CPI?
Not caused by the algorithm
Must be machine-specific
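As a reminder of what the metric measures, CPI (cycles per instruction retired) is the ratio below. The wording of numerator and denominator is generic; the exact hardware events behind the value on the slide are not stated in the source.

```latex
\mathrm{CPI} = \frac{\text{unhalted core clock cycles}}{\text{instructions retired}}
```

A simple loop with straightforward object code retires few instructions per iteration, so a large CPI means many cycles pass per retired instruction. That pattern suggests the core is stalled waiting on something outside the instruction stream, such as memory, which is why the cause is sought in the machine rather than in the algorithm.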
Step 2: Run General Exploration
Identify areas to explore after the basic algorithm / hotspot
Summary Page
General Exploration – Bottom Up Tab
The same loops show up as in the Advanced Hotspots run
Let’s Explore – Source level
Yes, indeed: we have a bottleneck in the memory hierarchy
Step 3: Find the memory bandwidth
Run Memory Access Analysis without objects: do we have a bandwidth problem?
How do I run Memory Access Analysis?
Make sure the box that enables memory object analysis is unchecked.
Memory Access - Summary
Looks like a problem accessing remote DRAM
Memory Access – Bottom-Up View
Imbalance in memory access across both sockets
Average latency is fairly large
Step 4: Identify the memory object(s)
Run Memory Access Analysis with objects (Linux only)
Identify the Memory Objects - Configuration
Make sure the box that enables memory object analysis is checked.
Set the minimal size of object to track.
Identify the Memory Objects
Location of the heap object
Average Latency is large
Dive into an object
Access to the object in a parallel region: good
Access to the object in a serial region: hmm, investigate
Serial Access to memory object
This is where memory is first touched.
Bingo! Linux places memory in the local memory of the socket that first touches it.
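The first-touch effect can be sketched as follows. This is an illustration, not the benchmark's actual code: `init_serial`, `init_parallel` and `sum_scaled` are hypothetical names. It assumes the default Linux behavior in which a page is normally placed on the NUMA node of the thread that first writes to it, so a serial initialization tends to put the whole array on one socket, while an initialization that uses the same parallel schedule as the compute loop tends to place each chunk near the thread that later uses it.

```c
#include <assert.h>
#include <stddef.h>

/* Serial first touch: one thread writes every page, so with a first-touch
   policy all pages tend to land on that thread's NUMA node. */
void init_serial(double *a, long n, double value)
{
    assert(a != NULL && n >= 0);
    for (long i = 0; i < n; i++)
        a[i] = value;
}

/* Parallel first touch: each thread writes the chunk it will later compute
   on (same static schedule), so pages tend to land near their users.
   Without OpenMP enabled the pragma is ignored and this runs serially. */
void init_parallel(double *a, long n, double value)
{
    assert(a != NULL && n >= 0);
#pragma omp parallel for schedule(static)
    for (long i = 0; i < n; i++)
        a[i] = value;
}

/* Compute loop that uses the same static schedule as init_parallel. */
double sum_scaled(const double *a, long n, double scalar)
{
    double total = 0.0;
    assert(a != NULL && n >= 0);
#pragma omp parallel for schedule(static) reduction(+ : total)
    for (long i = 0; i < n; i++)
        total += scalar * a[i];
    return total;
}
```

Both initializers produce identical array contents; only the thread that touches each page first differs. Swapping `init_serial` for `init_parallel` before the compute loop is one plausible form of the fix, stated here as an assumption because the slides do not show the changed code.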
Did it work? Analyze the fixed application
Run Memory Access on fixed code
Fixed Code: Summary Page
Effects of NUMA have completely disappeared
Remote DRAM accesses are minimal
Lab exercise: Try out what you just learned
The STREAM benchmark has 5 loops that are parallelized
Locate the loops tagged with “#pragma omp parallel for”
Remove the “#pragma omp parallel for” from one or more of the loops
Run Intel® VTune™ Amplifier
See the effects of memory placement and parallel execution
Try the compare results feature on your VTune runs, using the compare icon
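As a variation on the exercise, a loop can be switched between parallel and serial execution at run time with the OpenMP `if` clause instead of deleting the pragma by hand. This is a swapped-in technique, not what the lab prescribes, and `scale_kernel` with its `run_parallel` parameter is a hypothetical name for a Scale-style loop.

```c
#include <assert.h>
#include <stddef.h>

/* Scale-style kernel: b[i] = scalar * c[i].
   When run_parallel is 0, the OpenMP `if` clause makes the region run on a
   single thread, which mimics removing the pragma. Without OpenMP enabled
   the pragma is ignored and the loop is always serial. */
void scale_kernel(double *b, const double *c, double scalar,
                  long n, int run_parallel)
{
    assert(b != NULL && c != NULL && n >= 0);
#pragma omp parallel for if (run_parallel)
    for (long i = 0; i < n; i++)
        b[i] = scalar * c[i];
}
```

Profiling one run with `run_parallel` set to 0 and another with it set to 1 gives two results to compare. The same switch on the initialization loop changes which thread first touches the data, so it also changes memory placement.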
What other problems can I diagnose this way?
Memory and Inter-socket Bandwidth
Memory Latency
Memory Hierarchy
False Sharing
True Sharing
Effectiveness of Lockless Algorithms
Summary
Intel® VTune™ Amplifier continues to add tools to the toolbox to diagnose system performance problems
Memory Access Analysis is one such powerful tool
Stay tuned for more such tools in the future
Call to Action
Download and Evaluate Intel® VTune™ Amplifier
https://software.intel.com/en-us/intel-vtune-amplifier-xe
Intel® VTune™ Amplifier Support
https://software.intel.com/en-us/intel-vtune-amplifier-xe-support
Get Help: Ask the Community
https://software.intel.com/en-us/forums/intel-vtune-amplifier-xe
NUMA Architecture
https://software.intel.com/en-us/articles/a-brief-survey-of-numa-non-uniform-memory-architecture-literature
Stream Benchmark
http://www.cs.Virginia.edu/stream
or type “stream benchmark” into your favorite search engine
Questions?