Visual C++ 2005 New Optimizations
Ayman Shoukry
Program Manager, Visual C++
Microsoft Corporation
How can your application run faster?
► Maximize optimization for each file.
► Whole Program Optimization (WPO) goes beyond individual files.
► Profile Guided Optimization (PGO) specializes optimizations for your application.
► New Floating Point Model.
► OpenMP.
► 64bit Code Generation.
Maximum Optimization for Each File
► Compiler optimizes each source code file to get the best runtime performance
  The only type of optimization available in Visual C++ 6
► Visual C++ 2005 has better optimization algorithms
  Specialized support for newer processors such as Pentium 4
  Improved speed and better precision of floating point operations
  New optimization techniques like loop unrolling
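Loop unrolling is only named above. As a rough illustration, the sketch below unrolls a summation loop by hand the way an optimizer might; the function names are hypothetical, and the unroll factor and heuristics Visual C++ actually uses are not stated in the talk.

```cpp
#include <cassert>
#include <cstddef>

// Plain loop: one addition plus one compare-and-branch per element.
double sum_rolled(const double* v, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        s += v[i];
    }
    return s;
}

// Same loop unrolled by four: the compare-and-branch runs once per four
// elements. The additions keep their original order, so the result is
// identical to sum_rolled.
double sum_unrolled4(const double* v, std::size_t n) {
    double s = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += v[i];
        s += v[i + 1];
        s += v[i + 2];
        s += v[i + 3];
    }
    // Remainder loop for the last n % 4 elements.
    for (; i < n; ++i) {
        s += v[i];
    }
    return s;
}
```

Keeping the additions in source order matters: reordering them would be a floating-point reassociation, which a precise floating-point model does not allow.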
Whole Program Optimization
► Typically, Visual C++ optimizes programs by generating code for each object file separately
► Introducing whole program optimization
  First introduced with Visual C++ 2002 and improved since
  New compiler and linker options (/GL and /LTCG)
  Compiler has freedom to do additional optimizations
    ► Cross-module inlining
    ► Custom calling conventions
  Visual C++ 2005 supports this on all platforms
  Whole program optimization is widely used for Microsoft products.
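To make the /GL and /LTCG workflow concrete, here is a minimal sketch of a two-file program condensed into one listing so it compiles on its own. The file names util.cpp and app.cpp and the function names are hypothetical; the point is the build commands in the comment, which defer code generation to the linker so that a call in one file can be inlined from another.

```cpp
// Hypothetical two-file program condensed into one listing (main() omitted).
// In a real project the first part is util.cpp and the second is app.cpp:
//
//   cl /c /O2 /GL util.cpp app.cpp
//   link /LTCG util.obj app.obj
//
// Without /GL, app.cpp is compiled seeing only the declaration of
// scale_by_two(), so the call in the loop stays an out-of-line call.
// With /GL and /LTCG, code generation happens at link time with both
// bodies visible, so the call may be inlined across the file boundary.
#include <cassert>

// ---- util.cpp ----
int scale_by_two(int x) {
    return 2 * x;
}

// ---- app.cpp ----
int scale_by_two(int x);  // all that app.cpp sees: a declaration

int total_scaled(const int* v, int n) {
    int t = 0;
    for (int i = 0; i < n; ++i) {
        t += scale_by_two(v[i]);
    }
    return t;
}
```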
Profile Guided Optimization
► Static analysis leaves many optimization questions open for the compiler, leading to conservative optimizations
► Visual C++ programs can be tuned for expected user scenarios by collecting information from the running application
► Introducing profile guided optimization
  Optimizes code based on how customers actually use the program
  Runs optimizations at link time, like whole program optimization
  Available in Visual Studio 2005
  Widely adopted in Microsoft
if (p != NULL) {
  /* Perform action with p */
} else {
  /* Error code */
}
Is it common for p to be NULL?
If it is not common for p to be NULL, the error code should be collected with other infrequently used code.
PGO: Instrumentation
► We instrument with “probes” inserted into the code
► Two main types of probes
  Value probes
    ► Used to construct a histogram of values
  Count (simple/entry) probes
    ► Used to count the number of times a path is taken
► We try to insert the minimum number of probes that still gives full coverage
  Minimizes the cost of instrumentation
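The probe idea can be imitated by hand. The sketch below is a hypothetical stand-in, not the code the compiler emits: `CountProbe`, `ValueProbe` and `process` are invented names. It also shows why a minimal set of probes can still give full coverage: with an entry probe and a probe on one arm of the branch, the count for the other arm is derived rather than measured.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-ins for PGO probes; real probes are emitted by the compiler.

// Count probe: counts how often a point (function entry or a path) is reached.
struct CountProbe {
    unsigned long long hits = 0;
    void hit() { ++hits; }
};

// Value probe: builds a histogram of the values seen at one point.
struct ValueProbe {
    std::map<int, unsigned long long> histogram;
    void record(int value) { ++histogram[value]; }
};

CountProbe process_entry;  // entry probe
CountProbe non_null_path;  // simple probe on one arm of the branch
ValueProbe switch_value;   // value probe on the switch operand

int process(const int* p) {
    process_entry.hit();
    if (p != nullptr) {
        non_null_path.hit();
        switch_value.record(*p);
        switch (*p) {
            case 1: return 10;
            case 2: return 20;
            default: return 0;
        }
    }
    // No probe on the error path: its count is derived below.
    return -1;
}

// Null-path count = calls to process() minus times the non-null arm ran.
unsigned long long null_path_count() {
    return process_entry.hits - non_null_path.hits;
}
```

The histogram from the value probe is the kind of data that drives switch expansion, and the branch counts feed cold code separation.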
PGO Optimizations
► Switch expansion
► Better inlining decisions
► Cold code separation
► Virtual call speculation
► Partial inlining
Profile Guided Optimization
► Compile the source with /GL and optimizations on (e.g. /O2) to produce object files
► Link the object files with /LTCG:PGI to produce an instrumented image
► Run scenarios on the instrumented image, which outputs profile data
► Link the same object files with /LTCG:PGO, using the profile data, to produce the optimized image
PGO: Inlining Sample
► Profile guided optimization uses call graph path profiling.
[Figure: call graph over functions foo, bat, bar, baz and a, with call edges annotated with profile counts.]
► Inlining decisions are made at each call site.
PGO – Switch Expansion
Original code, where 90% of the time i = 10:
switch (i) {
  case 1: …
  case 2: …
  case 3: …
  default: …
}
After switch expansion (pseudocode: the test for 10 jumps straight to the default case):
if (i == 10)
  goto default;
switch (i) {
  case 1: …
  case 2: …
  case 3: …
  default: …
}
► Most frequent values are pulled out.
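The transformation can be written out by hand. This sketch assumes, as on the slide, that the profile shows i == 10 about 90% of the time; since C++ has no `goto default`, the hot test returns the default result directly. The function names and return values are illustrative only.

```cpp
#include <cassert>

// Original form: every call goes through the full switch dispatch.
int classify(int i) {
    switch (i) {
        case 1: return 100;
        case 2: return 200;
        case 3: return 300;
        default: return -1;  // i == 10 ends up here
    }
}

// Expanded form: the hot value is tested first with a single compare,
// and the switch is only entered for the remaining, rarer values.
int classify_expanded(int i) {
    if (i == 10) return -1;  // same result as the default case
    switch (i) {
        case 1: return 100;
        case 2: return 200;
        case 3: return 300;
        default: return -1;
    }
}
```

Both functions return the same result for every input; the expanded version only changes how cheaply the common value is handled.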
PGO – Code Separation
[Figure: basic blocks A, B, C and D with edge counts 100, 100, 10 and 10, shown in the default layout and in the optimized layout.]
Basic blocks are ordered so that the most frequent path falls through.
PGO – Virtual Call Speculation
class Base { … virtual void call(); };
class Foo : public Base { … void call(); };
class Bar : public Base { … void call(); };

Before PGO:
void Func(Base *A) {
  … while (true) { … A->call(); … }
}

After PGO (type(A) is pseudocode for the runtime type check):
void Func(Base *A) {
  … while (true) {
    …
    if (type(A) == Foo) {
      // inlined body of Foo::call()
    } else {
      A->call();
    }
    …
  }
}
The profiles showed that the object A in function Func was almost always of type Foo.
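A runnable approximation of virtual call speculation follows, assuming the profile says the object is almost always a `Foo`. A compiler would typically guard with a cheap comparison of the vtable pointer; `typeid` is used here as a portable stand-in, and the qualified call `Foo::call()` is a non-virtual call that can be inlined. The return values are made up for the example.

```cpp
#include <cassert>
#include <typeinfo>

struct Base {
    virtual ~Base() = default;
    virtual int call() const { return 0; }
};
struct Foo : Base {
    int call() const override { return 1; }
};
struct Bar : Base {
    int call() const override { return 2; }
};

// Original loop: one indirect (virtual) call per iteration.
int run(const Base* a, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += a->call();
    }
    return total;
}

// Speculated loop: test for the type the profile predicts. On a hit, the
// qualified call Foo::call() is non-virtual and can be inlined; any other
// type falls back to ordinary virtual dispatch, so behavior is unchanged.
int run_speculated(const Base* a, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        if (typeid(*a) == typeid(Foo)) {
            total += static_cast<const Foo*>(a)->Foo::call();
        } else {
            total += a->call();
        }
    }
    return total;
}
```

The fallback branch is what keeps the transformation safe: a wrong guess costs one extra compare, not a wrong result.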
PGO – Partial Inlining
[Figure: Basic Block 1 leads to a condition (Cond) that branches to Hot Code or Cold Code, followed by More Code.]
The hot path is inlined, but NOT the cold path.
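Partial inlining can be approximated manually by splitting a function into a small hot wrapper and an out-of-line cold part. This is a hand-written analog under the assumption that `code == 0` is the hot case; PGO makes this split itself from profile data, and the names here are hypothetical.

```cpp
#include <cassert>
#include <string>

// Cold Code: rarely executed, kept as a separate out-of-line function
// so it does not bloat every caller.
std::string describe_error(int code) {
    return "error " + std::to_string(code);
}

// Cond + Hot Code: small enough to inline at each call site.
// Only a call to the cold part is inlined, not its body.
inline std::string describe(int code) {
    if (code == 0) {
        return "ok";
    }
    return describe_error(code);
}
```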
New Floating Point Model
► /Op made your code run slowly
  No intermediate switch
► New Floating Point Model
  /fp:fast
  /fp:precise (default)
  /fp:strict
  /fp:except
/fp:precise
► The default floating point switch
► Performance and precision
► IEEE conformant
► Rounds to the appropriate precision
  At assignments, casts and function calls
/fp:fast
► When performance matters most
► You know your application does simple floating point operations
► What can /fp:fast do?
  Association
  Distribution
  Factoring inverse
  Scalar reduction
  Copy propagation
  And others…
/fp:except
► Reliable floating point exceptions
► Thrown when expected, and not thrown when not expected
  Faults and traps, when reliable, should occur at the line that causes the exception
  FWAITs on x86 might be added
► Cannot be used with /fp:fast or in managed code
/fp:strict
► The strictest FP option
  Turns off contractions
  Assumes the floating point control word can change or that the user will examine flags
► /fp:except is implied
► Low double-digit percentage slowdown versus /fp:fast
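Because /fp:strict assumes the program may examine floating-point status flags, code like the following sketch is meant to see flags that reflect the operations actually written. It uses the standard `<cfenv>` interface; `volatile` is used only so that the division is not folded away at compile time in a portable build, and the helper name is hypothetical.

```cpp
#include <cassert>
#include <cfenv>

// Divides num by den and reports whether the divide-by-zero flag was raised.
bool division_raised_divbyzero(double num, double den) {
    std::feclearexcept(FE_ALL_EXCEPT);  // start from a clean set of flags
    volatile double n = num;
    volatile double d = den;
    volatile double q = n / d;          // performed at run time
    (void)q;
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

Under /fp:fast the compiler is free to move or remove such operations, so flag checks like this one are only dependable under the stricter models.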
What is the output?
#include <stdio.h>
int main()
{
  double x, y, z;
  double sum;
  x = 1e20;
  y = -1e20;
  z = 10.0;
  sum = x + y + z;
  printf("sum=%f\n", sum);
}
/fp:fast /O2 = 0.000
/fp:strict /O2 = 10.0
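The two results come from the order of the additions. Near 1e20, adjacent doubles are 16384 apart, so adding 10.0 to -1e20 rounds straight back to -1e20. Evaluated left to right, x + y is exactly 0 and the sum is 10; if the additions are reassociated, which /fp:fast permits and which presumably explains the 0.000 result, y + z is computed first and the 10 is lost. The sketch below writes both groupings out explicitly, so its results do not depend on compiler flags; the function names are illustrative.

```cpp
#include <cassert>

// Source order: (x + y) + z, as /fp:precise and /fp:strict evaluate it.
double sum_left_to_right(double x, double y, double z) {
    return (x + y) + z;
}

// Reassociated: x + (y + z), one grouping /fp:fast is allowed to choose.
double sum_reassociated(double x, double y, double z) {
    return x + (y + z);
}
```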
OpenMP
A specification for writing multithreaded programs
It consists of a set of simple #pragmas and runtime routines
Makes it very easy to parallelize loop-based code
Helps with load balancing, synchronization, etc.
In Visual Studio, only available in C++
OpenMP Parallelization
► Can parallelize loops and straight-line code
► Includes synchronization constructs
With first = 1 and last = 1000 on four threads, the iterations are split as 1 ≤ i ≤ 250, 251 ≤ i ≤ 500, 501 ≤ i ≤ 750 and 751 ≤ i ≤ 1000.
void test(int first, int last) {
  #pragma omp parallel for
  for (int i = first; i <= last; ++i) {
    a[i] = b[i] + c[i];  // a, b, c are arrays declared elsewhere
  }
}
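A self-contained variant of the loop, using `std::vector` in place of the undeclared arrays. The `schedule(static)` clause is an addition not on the slide: it asks OpenMP to give each thread one contiguous block of roughly equal size, matching the four 250-iteration ranges pictured. Compiled without OpenMP support (/openmp on Visual C++), the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Element-wise add over the inclusive range [first, last].
// Each iteration writes a different a[i], so iterations are independent
// and can safely run on different threads.
void add_arrays(std::vector<double>& a, const std::vector<double>& b,
                const std::vector<double>& c, int first, int last) {
    #pragma omp parallel for schedule(static)
    for (int i = first; i <= last; ++i) {
        a[i] = b[i] + c[i];
    }
}
```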
64bit Compiler in VC2005
► 64bit Compiler Cross Tools
  Compiler is 32bit but the resulting image is 64bit
► 64bit Compiler Native Tools
  Compiler and resulting image are 64bit binaries.
► All previous optimizations apply for 64bit as well.
Resources
► Visual C++ Dev Center
  http://msdn.microsoft.com/visualc
  This is the place to go for all our news and whitepapers
  Also VC2005-specific forums at http://forums.microsoft.com
► Myself
  http://blogs.msdn.com/aymans