Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be...

58
® ® * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights reserved. Intel ® Compilers and Libraries for Itanium® Architecture Intel ® Compilers and Libraries for Itanium® Architecture Intel, Itanium, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries Heinz Bast [email protected] Intel Software Enabling Group EMEA

Transcript of Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be...

Page 1: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®®

* Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights reserved.

Intel® Compilers and Libraries for Itanium®Architecture

Intel® Compilers and Libraries for Itanium®Architecture

Intel, Itanium, and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries

Heinz [email protected] Software Enabling Group EMEA

Page 2: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 2Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Intel® Software Development Products

Page 3: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 3Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Libraries

Development Tools RoadmapDevelopment Tools RoadmapProducts Q4 ‘04 Q1 ‘05 Q2 ’05 2H '05 2006

Compilers

ThreadingTools Threading Tools 2.1 (Linux RDC)

Wabash(Native Linux)

Cluster Tools

Trace Analyzer & Trace Collector 4.0

Intel Cluster Toolkit 1.0

C++ & Ftn 8.1

C++ for WinCE.NET 1.2C++ Tool Suite 1.2

WinCE .NET 2.0

eIA MVCGE 8.1

eIA QNX 8.1eIA QNX/MV 9.0

C++ Tool Suite 2.0

C++ & Ftn 9.0

MKL 8.0

Cluster MKL 8.0

VTune™Performance

Analyzers

VTune™ 7.2

VTune™ Linux *2.0VTune™ Linux 3.0

Eclipse GUI

VTune™ Update Xscale™ (Windows*)

Intel Cluster ToolKit 1.0

EM64T Support

Eclipse* 2.1.3Additional

MKL 7.2

Trace Collector 5.0

Mobile Platform SDK 1.0

VTune™ 7.2

VTune Linux 3.1

VTune7.3

VTune Linux 4.0

PCA MVCEE ISV

C++ & Ftn “NR”C++ Bi-Endian

Threading Tools 2.1

OXL 1.0

MKL 7.2

IPP 4.1

Cluster MKL 7.2

VTune 8.0

Data Collector for CE5.0,next gen. Windows Mobile

Cluster MKL 7.2

C++ & Ftn 8.1

Eclipse* 3.0

ITT 2.2

IPP 5.0

OXL

MPI Library 1.0

Trace Analyzer 6.0 (Mosel)Trace Collector 6.0

Page 4: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 4Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

The Intel® Compiler FamilyEasiest Way to take advantage of Intel Architecture

The Intel® Compiler FamilyEasiest Way to take advantage of Intel Architecture

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries

iecWindows / EFI Byte Code (EBC) Virtual Machine

IA32 and Itanium®

C

Name / Name since 8.0 release

OS/PlatformArchitectureLanguage

ccxsccePlatform Builder for Win CE .NET*

ccxscceMicrosoft eMbedded Visual C++Xscale™

efc→ ifortLinux*

Efl → ifortWindows*Itanium®

Ifc → ifortLinux*

ifl → ifortWindows*IA32Fortran

eccLinux*

eclWindows*Itanium®

icc → eccLinux*

iclWindows*IA32C/C++

Page 5: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 5Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

EPIC Architecture PrinciplesEPIC Architecture Principles1. “The compiler should play the key role in designing

the plan of execution, and the architecture should provide the requisite support for it to do so successfully

2. The Architecture should provide features that assist the compiler in exploiting ILP

3. The Architecture should provide mechanisms to communicate the compiler’s plan of execution to the hardware.”

Schlansker, Michael S. and Rau, B. Ramakrishna (HP

1. “The compiler should play the key role in designing the plan of execution, and the architecture should provide the requisite support for it to do so successfully

2. The Architecture should provide features that assist the compiler in exploiting ILP

3. The Architecture should provide mechanisms to communicate the compiler’s plan of execution to the hardware.”

Schlansker, Michael S. and Rau, B. Ramakrishna (HP

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries

From: Schlansker et. al (HP Labs) “EPIC: Explicitly Parallel Instruction Computing” Computer, Vol 33 Issue: 2 , Feb. 2000 pp 37-45

Page 6: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 6Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Intel Compiler ArchitectureIntel Compiler Architecture

Profiler

C++Front End

Interprocedural analysis and optimizations: inlining, constant prop, whole program detect, mod/ref, points-to

Loop optimizations: data deps, prefetch, vectorizer, unroll/interchange/fusion/dist, auto-parallel/OpenMP

Global scalar optimizations: partial redundancy elim, dead store elim, strength reduction, dead code elim

Code generation: predication, software pipelining, global scheduling, register allocation, code generation

FORTRAN 95Front End

Disambiguation:types, array,

pointer, structure, directives

Page 7: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 7Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Generic FeaturesGeneric Features• Compatibility to standards ( ANSI C, C99,

Fortran95, Fortran2003)• Compatibility to leading open-source tools (

ICC vs. GCC, IDB vs GBD)• OpenMP 2.0 support• Automatic parallelization• Profile-guided optimization• Multi-file interprocedural optimization• Support of other Intel® tools

– E.g –tcheck for Threading Tools

• Compatibility to standards ( ANSI C, C99, Fortran95, Fortran2003)

• Compatibility to leading open-source tools ( ICC vs. GCC, IDB vs GBD)

• OpenMP 2.0 support• Automatic parallelization• Profile-guided optimization• Multi-file interprocedural optimization• Support of other Intel® tools

– E.g –tcheck for Threading Tools

Page 8: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 8Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Generic Features ( cont. )Generic Features ( cont. )• Support for gprof (-p switch) • Inlined math intrinsics• Code Coverage Tool• Test Prioritization Tool

• Support for gprof (-p switch) • Inlined math intrinsics• Code Coverage Tool• Test Prioritization Tool

Page 9: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 9Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Target Processor SelectionTarget Processor Selection

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries

• Itanium® and Itanium®-2 processor – tpp1 ( Linux ) for Itanium®– tpp2 (Linux) for Itanium®-2– Itanium-2 now default ( since 7.0 release )

• Current “Madison” Processor– No compiler changes– Same as for Itanium®-2

• Be prepared for new options for future Itanium processors ( Montecito, Tukwila)

• Itanium® and Itanium®-2 processor – tpp1 ( Linux ) for Itanium®– tpp2 (Linux) for Itanium®-2– Itanium-2 now default ( since 7.0 release )

• Current “Madison” Processor– No compiler changes– Same as for Itanium®-2

• Be prepared for new options for future Itanium processors ( Montecito, Tukwila)

Page 10: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 10Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Target Selection is relevant Target Selection is relevant

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries

int csum(double a[], double b[], double c[], double x, int max)for (i=0; i<max;i++)a[i]=b[i]+x*c[i];

Intel® Itanium™ processor optimized assembler:

.b1_2:{ .mmi(p16) ldfd f32=[r34],8(p16) ldfd f37=[r33],8

nop.I 0 ;;} { .mfb(p24) stfd [r32]=f46,8(p20) fma.d f42=f8,f36,f41

br.ctop.sptk .b1_2 ;;

Itanium™ -2 processor optimized assembler:

.b1_2:{ .mfi(p16) ldfd f32=[r34],8(p22) fma.d f46=f8,f38,f45

nop.i 0} { .mmb(p16) ldfd f39=[r33],8(p26) stfd [r32]=f50,8

br.ctop.sptk .b1_2 ;;

Itanium-2® processor can do 4 FP load/store operations/cycle, Itanium® only 2 !!( double performance for large max )

L2 LatencyItanium: 8

Itanium-2: 6

Page 11: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 11Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Some General SwitchesSome General Switches

-openmpOpenMP 2.0 support

-fno-fnaliasAssume no aliasing

-fastAggressive optimizations ( == -O3 –ipo –static)

-SGenerate assembly files

-parallelAutomatic parallelization for OpenMP threading

-gCreate symbols for debugging

-O3High-level optimizer, incl. prefetch, unroll, -FTZ

-O2Optimize for speed (default), includes SWP

-O1Optimize for speed (no code size increase), no SWP

-O0Disable optimization

Itanium and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries

Page 12: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 12Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

High-level Optimizer OverviewHigh-level Optimizer Overview

• Data dependence analysis– Array subscript analysis

• Eliminating memory operations– Scalar replacement, unroll-and-jam, register blocking

• Cache optimizations– Loop interchange, fusion, distribution, blocking, data

layout• Overlapping memory latency

– Data prefetch• Instruction-level parallelism

– Unrolling, interchange, distribution• Bandwidth optimization

– Load-pair, array contraction

• Data dependence analysis– Array subscript analysis

• Eliminating memory operations– Scalar replacement, unroll-and-jam, register blocking

• Cache optimizations– Loop interchange, fusion, distribution, blocking, data

layout• Overlapping memory latency

– Data prefetch• Instruction-level parallelism

– Unrolling, interchange, distribution• Bandwidth optimization

– Load-pair, array contraction

Page 13: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 13Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Interprocedural OptimizationExtends optimizations across file boundariesInterprocedural OptimizationExtends optimizations across file boundaries

Compile & OptimizeCompile & Optimize

Compile & OptimizeCompile & Optimize

Compile & OptimizeCompile & Optimize

Compile & OptimizeCompile & Optimize

file1.c

file2.c

file3.c

file4.c

Without IPOWithout IPO

Compile & OptimizeCompile & Optimize

file1.c

file4.c file2.c

file3.c

With IPOWith IPO

Modules of multiple files/whole application-QipoOnly between modules of one source file-Qip

Page 14: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 14Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Interprocedural OptimizationInterprocedural Optimization• 254.gap, integer.c, (from SPEC CPU2000)

for((k=SIZE(hdR)/4*sizeof(TypDigit));k!= 0;--k){

c = L * *r++ + (c>>16); *p++ = c; //line 1

c = L * *r++ + (c>>16); *p++ = c; //line 2

c = L * *r++ + (c>>16); *p++ = c; //line 3

c = L * *r++ + (c>>16); *p++ = c; //line 4

}• r passed in as formal parameter• p is dynamically allocated

• 254.gap, integer.c, (from SPEC CPU2000)for((k=SIZE(hdR)/4*sizeof(TypDigit));k!= 0;--k){

c = L * *r++ + (c>>16); *p++ = c; //line 1

c = L * *r++ + (c>>16); *p++ = c; //line 2

c = L * *r++ + (c>>16); *p++ = c; //line 3

c = L * *r++ + (c>>16); *p++ = c; //line 4

}• r passed in as formal parameter• p is dynamically allocated

IPO predicts that r and p are independent making SWP possibleIPO predicts that IPO predicts that rr and and pp are independent making are independent making SWP possibleSWP possible

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries

Page 15: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 15Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Profile-Guided Optimizations (PGO)Profile-Guided Optimizations (PGO)

• Use execution-time feedback to guide optimization

• Helps I-cache, paging, branch-prediction• Enabled optimizations:

– Basic block ordering– Better register allocation– Better decision of functions to inline– Function ordering– Switch-statement optimization

• Use execution-time feedback to guide optimization

• Helps I-cache, paging, branch-prediction• Enabled optimizations:

– Basic block ordering– Better register allocation– Better decision of functions to inline– Function ordering– Switch-statement optimization

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries

Page 16: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 16Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Instrumented Compilationecl -prof_gen prog.cInstrumented Compilationecl -prof_gen prog.c

Instrumented Executionprog.exe (on a typical dataset)Instrumented Executionprog.exe (on a typical dataset)

Feedback Compilationecl -prof_use prog.cFeedback Compilationecl -prof_use prog.c

DYN file containingdynamic info: .dyn

DYN file containingdynamic info: .dyn

Instrumented executable: prog.exe

Instrumented executable: prog.exe

Merged DYNsummary file: .dpiDelete old dyn files unless you want their info included

Merged DYNsummary file: .dpiDelete old dyn files unless you want their info included

Step 1Step 1

Step 2Step 2

Step 3Step 3

PGO Usage: Three Step ProcessPGO Usage: Three Step Process

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries

Page 17: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 17Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Programs That BenefitPrograms That Benefit

• Consistent hot paths• Many if statements or switches• Nested if statements or switches

• Consistent hot paths• Many if statements or switches• Nested if statements or switches

Significant Benefit

Little Benefit

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries

Page 18: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 18Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Affects many compiler optimizations– Dynamic Branch Prediction– Loop (Pipeline vs. unroll vs. nothing) – Cache Utilization– Predication– Speculation– Classical (block order, inline, reg alloc)– Function Splitting

Affects many compiler optimizations– Dynamic Branch Prediction– Loop (Pipeline vs. unroll vs. nothing) – Cache Utilization– Predication– Speculation– Classical (block order, inline, reg alloc)– Function Splitting

Profile-Guided OptimizerProfile-Guided Optimizer

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries

Page 19: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 19Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Predication – The Concept• Code Example: absolute difference of two numbers

cmpGE r2, r3jump_zero P2

P1: sub r4 = r2, r3jump end

P2: sub r4 = r3, r2end: ...

cmp.ge p1,p2 = r2,r3 ;;(p1) sub r4 = r2,r3(p2) sub r4 = r3, r2

Non-PredicatedPseudo Code

Predicated Assembly Code

C Codeif (r2 >= r3)

r4 = r2 - r3;else

r4 = r3 - r2;

Predication Removes Branches, Enables Parallel ExecutionPredication Removes Branches, Enables Parallel Execution

Page 20: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 20Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

PGO Helps Improve PredicationPGO Helps Improve Predication

10 10

Balanced paths -Good opportunityto predicateindependentof profile: We will double average path length but get rid of all mispredicts

Balanced paths -Good opportunityto predicateindependentof profile: We will double average path length but get rid of all mispredicts

Do we predicate?Do we predicate?

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries

Page 21: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 21Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Do we predicate?

102

95% 5%

Bad move!Main path length increasedfrom 2 to 12 –this is 6 times and branch mispredicts are minimal

Bad move!Main path length increasedfrom 2 to 12 –this is 6 times and branch mispredicts are minimal

Predication: PGO BenefitsPredication: PGO Benefits

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries

Page 22: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 22Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

PGO Example: Code Re-OrderPGO Example: Code Re-Order

for (i=0; i < NUM_BLOCKS; i++) {switch (check3(i)) {

case 3: /* 25% */x[i] = 3; break;

case 10: /* 75% */x[i] = i+10; break;

default: /* 0% */x[i] = 99; break; }

}• “Case 10” is moved to the beginning

– PGO can eliminate most tests&jumps for the common case

for (i=0; i < NUM_BLOCKS; i++) {switch (check3(i)) {

case 3: /* 25% */x[i] = 3; break;

case 10: /* 75% */x[i] = i+10; break;

default: /* 0% */x[i] = 99; break; }

}• “Case 10” is moved to the beginning

– PGO can eliminate most tests&jumps for the common case

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries

Page 23: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 23Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Report Generation OptionsReport Generation Options• -[Q]opt_report

– generate an optimization report to stderr (NB: or file)• -[Q]opt_report_file file

– specify the filename for the generated report• -[Q]opt_report_level level

– specify the level of report verbosity (min|med|max)• -[Q]opt_report_phase phase_name

– specify the phase that reports are generated against• -[Q]opt_report_routine name

– reports on routines containing the given name• -[Q]opt_report_help

– display the optimization phases available for reporting

• -[Q]opt_report– generate an optimization report to stderr (NB: or file)

• -[Q]opt_report_file file– specify the filename for the generated report

• -[Q]opt_report_level level– specify the level of report verbosity (min|med|max)

• -[Q]opt_report_phase phase_name– specify the phase that reports are generated against

• -[Q]opt_report_routine name– reports on routines containing the given name

• -[Q]opt_report_help– display the optimization phases available for reporting

Page 24: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 24Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Phase_names (from –opt_report_help)Phase_names (from –opt_report_help)• ipo• ipo_inl• ipo_cp• ipo_modref• ipo_lpt• ipo_subst• ipo_ratt• ipo_vaddr• ipo_pdce• ipo_dp• ipo_gprel• ipo_pmerge• ipo_dstat• ipo_fps• ipo_ppi• ipo_unref• ipo_wp• ipo_dl

• ipo• ipo_inl• ipo_cp• ipo_modref• ipo_lpt• ipo_subst• ipo_ratt• ipo_vaddr• ipo_pdce• ipo_dp• ipo_gprel• ipo_pmerge• ipo_dstat• ipo_fps• ipo_ppi• ipo_unref• ipo_wp• ipo_dl

• ilo• ilo_lowering• ilo_strength_reduction• ilo_reassociation• ilo_copy_propagation• ilo_convert_insertion• ilo_convert_removal• ecg

• ecg_gra• ecg_swp• ecg_predication• ecg_speculation• ecg_code• ecg_code_cycles• ecg_code_size• ecg_code_size_fsp• Pgo

• ilo• ilo_lowering• ilo_strength_reduction• ilo_reassociation• ilo_copy_propagation• ilo_convert_insertion• ilo_convert_removal• ecg

• ecg_gra• ecg_swp• ecg_predication• ecg_speculation• ecg_code• ecg_code_cycles• ecg_code_size• ecg_code_size_fsp• Pgo

• hlo

• hlo_fusion

• hlo_distribution

• hlo_scalar_replacement

• hlo_unroll

• hlo_prefetch

• hlo_loadpair

• hlo_linear_trans

• hlo_opt_pred

• hlo_data_trans

• hlo_reroll

• hlo_array_contraction

• hlo_scalar_expansion

• all

Page 25: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 25Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Sample SWP Report –Qopt_report –Qopt_report ecg_swpSample SWP Report –Qopt_report –Qopt_report ecg_swp

Swp report for loop at line 12 in multiply_d in file multiply_d.c

Resource II = 1Recurrence II = 4 >0 means loop carried dep. Rewrite loop if possibleMinimum II = 4Scheduled II = 4 Min. II = Sched. II => loop optimally scheduledPercent of Resource II needed by arithmetic ops = 100%Percent of Resource II needed by memory ops = 100%Percent of Resource II needed by floating point ops = 100%

Number of stages in the software pipeline = 3

Following are the loop-carried memory dependency edges:Store at line 12 --> Load at line 12Store at line 12 --> Load at line 12

Swp report for loop at line 12 in multiply_d in file multiply_d.c

Resource II = 1Recurrence II = 4 >0 means loop carried dep. Rewrite loop if possibleMinimum II = 4Scheduled II = 4 Min. II = Sched. II => loop optimally scheduledPercent of Resource II needed by arithmetic ops = 100%Percent of Resource II needed by memory ops = 100%Percent of Resource II needed by floating point ops = 100%

Number of stages in the software pipeline = 3

Following are the loop-carried memory dependency edges:Store at line 12 --> Load at line 12Store at line 12 --> Load at line 12

• What to look for:– Was loop software pipelined or “loop not pipelined” ?– Scheduled II – most important info

• What to look for:– Was loop software pipelined or “loop not pipelined” ?– Scheduled II – most important info

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries.

Page 26: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 26Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Software Pipelining TermsSoftware Pipelining Terms• Initiation Interval (II) is the number of cycles

between the start of successive iterations in the loop• If the II is n cycles, a new loop iteration will be completed

every n cycles at steady state• Scheduled II is the cycles per iteration of the

pipelined loop• Minimum II is the smallest ii that is feasible for

pipelined loop (according to the compiler’s coded uarch scheduling rules).

• Recurrence II is caused by loop-carried dependence edges (memory and register dependences) from instructions in one iteration to instructions in subsequent iterations

• If the recurrences are caused by memory-dependence edges, the report prints out details of such edges

• Initiation Interval (II) is the number of cycles between the start of successive iterations in the loop

• If the II is n cycles, a new loop iteration will be completed every n cycles at steady state

• Scheduled II is the cycles per iteration of the pipelined loop

• Minimum II is the smallest ii that is feasible for pipelined loop (according to the compiler’s coded uarch scheduling rules).

• Recurrence II is caused by loop-carried dependence edges (memory and register dependences) from instructions in one iteration to instructions in subsequent iterations

• If the recurrences are caused by memory-dependence edges, the report prints out details of such edges

Page 27: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 27Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Tips on how to read SWP reportTips on how to read SWP report

• If Recurrence II > 0 , means compiler detected loop carried dependencies

• If Minimum II = Scheduled II, means loop is optimally scheduled according to the compiler

• Percent of Resource II used by memory ops, floating point ops and integer ops shows the utilization of the corresponding execution units throughout the loop kernel

• If your floating point Resource II utilization is less than memory – not optimal situation for number crunching algorithm. Consider loop balancing.

• If Recurrence II > 0 , means compiler detected loop carried dependencies

• If Minimum II = Scheduled II, means loop is optimally scheduled according to the compiler

• Percent of Resource II used by memory ops, floating point ops and integer ops shows the utilization of the corresponding execution units throughout the loop kernel

• If your floating point Resource II utilization is less than memory – not optimal situation for number crunching algorithm. Consider loop balancing.

Page 28: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 28Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Phase Ordering inside HLOPhase Ordering inside HLO

Loop Recognition and Normalization Construct Dependence Graph Loop Distribution Linear Loop Transformations Loop Fusion Loadpair Detection

Affine Condition Unswitching Block Unroll and Jam Loop Fusion Scalar Replacement Data Prefetching Loadpair Insertion

Page 29: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 29Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Sample HLO Reporticc –O3 -opt_report –opt_report_phase hlo –opt_report_level max

Sample HLO Reporticc –O3 -opt_report –opt_report_phase hlo –opt_report_level max

…LOOP INTERCHANGE in loops at line: 7 8 9Loopnest permutation ( 1 2 3 ) --> ( 2 3 1 )LOOP INTERCHANGE in loops at line: 15 17Loopnest permutation ( 1 2 3 ) --> ( 3 2 1 )…

Loop at line 7 unrolled and jammed by 4Loop at line 8 unrolled and jammed by 4Loop at line 15 unrolled and jammed by 4Loop at line 16 unrolled and jammed by 4…

…LOOP INTERCHANGE in loops at line: 7 8 9Loopnest permutation ( 1 2 3 ) --> ( 2 3 1 )LOOP INTERCHANGE in loops at line: 15 17Loopnest permutation ( 1 2 3 ) --> ( 3 2 1 )…

Loop at line 7 unrolled and jammed by 4Loop at line 8 unrolled and jammed by 4Loop at line 15 unrolled and jammed by 4Loop at line 16 unrolled and jammed by 4…

Page 30: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 30Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Loop UnrollingLoop Unrolling• E.g. used to fully utilize execution units in each

iteration • Implicitly done by –O3• Can be controlled by –unrollx• Normally compiler uses a heuristic based on

complexity of loop body• Poor candidates: (similar issues for SWP or

vectorizer)– Low trip count loops – for (j=0; j < N; j++) : N=4 at runtime– Fat loops – loop body already has lots of computation

taking place– Loops containing procedure calls– Loops with branches

• E.g. used to fully utilize execution units in each iteration

• Implicitly done by –O3• Can be controlled by –unrollx• Normally compiler uses a heuristic based on

complexity of loop body• Poor candidates: (similar issues for SWP or

vectorizer)– Low trip count loops – for (j=0; j < N; j++) : N=4 at runtime– Fat loops – loop body already has lots of computation

taking place– Loops containing procedure calls– Loops with branches

Page 31: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 31Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Loop Unrolling: +/-Loop Unrolling: +/-

• Benefits– Perform more computations per loop iteration– Reduces the effect of loop overhead– Can increase floating point to memory access

ratio (F/M)– Can increase instruction level parallelism

• Costs– Register pressure– Code bloat ( I-cache pollution )– Maintainability

• Benefits– Perform more computations per loop iteration– Reduces the effect of loop overhead– Can increase floating point to memory access

ratio (F/M)– Can increase instruction level parallelism

• Costs– Register pressure– Code bloat ( I-cache pollution )– Maintainability

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries.

Page 32: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 32Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Loop Unrolling ExampleLoop Unrolling Example

N=1025M=5DO I=1,NDO J=1,MA(J,I) = B(J,I) + C(J,I) * D

ENDDOENDDO

Unroll outer Unroll outer loop by 4loop by 4

K = IMOD (N,4)

DO I = 1,N-K,4DO J=1,MA(J,I) = B(J,I) + C(J,I) * DA(J,I+1) = B(J,I+1) + C(J,I+1) * DA(J,I+2) = B(J,I+2) + C(J,I+2) * DA(J,I+3) = B(J,I+3) + C(J,I+3) * D

ENDDOENDDO

DO I = N-K+1, NDO J=1,MA(J,I) = B(J,I) + C(J,I) * D

ENDDOENDDO

PostPostconditioning conditioning

looploopIf loop size is known, you can eliminate post-conditioning loop by specifying the number of times to unroll

If loop size is known, you can If loop size is known, you can eliminate posteliminate post--conditioning conditioning loop by specifying the number loop by specifying the number of times to unrollof times to unroll

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries.

Page 33: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 33Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

#pragma unrollUnroll next loop; allows the compiler to determine the

unroll factor#pragma unroll(n)

Unroll next loop exactly n times

#pragma unrollUnroll next loop; allows the compiler to determine the

unroll factor#pragma unroll(n)

Unroll next loop exactly n times

#pragma unroll(4)for (i = 0; i < MAX; ++i) {

a[i] = b[i] * c[i] + d[i];}

#pragma unroll(4)for (i = 0; i < MAX; ++i) {

a[i] = b[i] * c[i] + d[i];}

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries.

Loop Unrolling DirectivesLoop Unrolling Directives

Page 34: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 34Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

HLO: Cache blockingHLO: Cache blocking

• Cache blocking ensures effective cache utilization by blocking computations within non-overlapping areas (where data references would not guarantee to reside in cache)

• Currently compiler is aware of L2 cache size, but not L3.

• Assume cache blocking for L3 needs to be done manually

• Cache blocking ensures effective cache utilization by blocking computations within non-overlapping areas (where data references would not guarantee to reside in cache)

• Currently compiler is aware of L2 cache size, but not L3.

• Assume cache blocking for L3 needs to be done manually

Page 35: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 35Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

HLO: Loop DistributionHLO: Loop Distribution• Defined as: extracting different statements from a

loop and distributing them into separate loops (usually with identical loop bounds)

• Used to:– create “tighter” code, minimize CPI– reduce empty cycles– Enable more effective usage of the CPU resources and

execution units in SWP kernel– Reducing memory references within single loop and lower

pressure on L2 OZQ– Reaching theoretical maximum per single loop by assisting

compiler to produce balanced number of FLOPs or INT ops per load/store

– Remove limitation on rotating register usage when SWP

• Defined as: extracting different statements from a loop and distributing them into separate loops (usually with identical loop bounds)

• Used to:– create “tighter” code, minimize CPI– reduce empty cycles– Enable more effective usage of the CPU resources and

execution units in SWP kernel– Reducing memory references within single loop and lower

pressure on L2 OZQ– Reaching theoretical maximum per single loop by assisting

compiler to produce balanced number of FLOPs or INT ops per load/store

– Remove limitation on rotating register usage when SWP

Page 36: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 36Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

HLO: Loop Distribution Triggered by DirectiveHLO: Loop Distribution Triggered by Directive#pragma distribute point• Placed outside a loop, it tells the compiler to attempt

to distribute the loop based on its internal heuristic• Placed within a loop, it tells the compiler to attempt

to distribute the loop at that point. All loop-carried dependencies will be ignored.

#pragma distribute point• Placed outside a loop, it tells the compiler to attempt

to distribute the loop based on its internal heuristic• Placed within a loop, it tells the compiler to attempt

to distribute the loop at that point. All loop-carried dependencies will be ignored.

for (i = 0; i < MAX; ++i) {a1[i] = b1[i] * c1[i] + d1[i];

#pragma distribute pointa2[i] = b2[i] * c2[i] + d2[i];

}

for (i = 0; i < MAX; ++i) {a1[i] = b1[i] * c1[i] + d1[i];

#pragma distribute pointa2[i] = b2[i] * c2[i] + d2[i];

}

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries.

Page 37: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 37Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Memory Reference DisambiguationOptions/Directives related to AliasingMemory Reference DisambiguationOptions/Directives related to Aliasing

• -alias_args[-]• -ansi_alias[-] • -fno-alias: No aliasing in whole program• -fno-fnalias: No aliasing within single units• -restrict (C99): -restrict, restrict attribute

– enables selective pointer disambiguation

• -safe_cray_ptr: No aliasing introduced by Cray-pointers

• -assume dummy_alias• Related: Switch –ipo and directive IFDEP

• -alias_args[-]• -ansi_alias[-] • -fno-alias: No aliasing in whole program• -fno-fnalias: No aliasing within single units• -restrict (C99): -restrict, restrict attribute

– enables selective pointer disambiguation

• -safe_cray_ptr: No aliasing introduced by Cray-pointers

• -assume dummy_alias• Related: Switch –ipo and directive IFDEP

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries.

Page 38: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 38Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Vector Addition ExampleVector Addition Example

.b1_7:{ .mmi

ld4 r9=[r32],4 ;; // cycle 0add r3=r34,r9 // cycle 1nop.i 0 ;; // cycle 1

}{ .mib

st4 [r33]=r3,4 // cycle 2nop.i 0 // cycle 2br.cloop.sptk .b1_7 ;; // cycle 2

}

.b1_7:{ .mmi

ld4 r9=[r32],4 ;; // cycle 0add r3=r34,r9 // cycle 1nop.i 0 ;; // cycle 1

}{ .mib

st4 [r33]=r3,4 // cycle 2nop.i 0 // cycle 2br.cloop.sptk .b1_7 ;; // cycle 2

}

C Source CodeC Source CodeC Source Codevoid add(int a[10],int b[10],int c){ int i;for(i=0;i<10;i++) {

b[i] = a[i] + c; }}

void add(int a[10],int b[10],int c){ int i;for(i=0;i<10;i++) {

b[i] = a[i] + c; }}

Intel Itanium®® Architecture Software Developer's Manual, vol. 1, sec. 5.3.2 Software Pipelining

ecc Compiler Assembly Output

ecc Compiler ecc Compiler Assembly Assembly OutputOutput

ecc -S -opt_report -opt_report_phase ecg_swp add.cecc ecc --S S --opt_report opt_report --opt_report_phase ecg_swp add.copt_report_phase ecg_swp add.c loop-carried memory dependence => loop not pipelined

looploop--carried carried memory memory dependence => dependence => loop not loop not pipelinedpipelined

Itanium and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries.

Page 39: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 39Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Enabling Compiler to SWPEnabling Compiler to SWP

ecc -S –restrict -opt_report -opt_report_phase ecg_swp add-swp.cppecc -S –restrict -opt_report -opt_report_phase ecg_swp add-swp.cpp

void add(int *a,int * restrict b,int c,int N){ for(int i=0;i<10;i++)

b[i] = a[i] + c; }

void add(int *a,int * restrict b,int c,int N){ for(int i=0;i<10;i++)

b[i] = a[i] + c; }

SWP report for loop at line 3 in _Z3addPiS_ii in file add-swp.cppResource II = 1Recurrence II = 0Minimum II = 1Scheduled II = 1

Percent of Resource II needed by arithmetic ops = 100%Percent of Resource II needed by memory ops = 100%Percent of Resource II needed by floating point ops = 0%

Number of stages in the software pipeline = 3

SWP report for loop at line 3 in _Z3addPiS_ii in file add-swp.cppResource II = 1Recurrence II = 0Minimum II = 1Scheduled II = 1

Percent of Resource II needed by arithmetic ops = 100%Percent of Resource II needed by memory ops = 100%Percent of Resource II needed by floating point ops = 0%

Number of stages in the software pipeline = 3

Restrict keyword disambiguates pointer, enables compiler to SWP

Restrict keyword disambiguates pointer, enables compiler to SWP

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries.

Page 40: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 40Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Vector Addition ExampleVector Addition ExampleLoop to be software pipelined: Software pipeline stages:Loop to be software pipelined: Software pipeline stages:

stage 1: ld4 r32= [r8],4stage 2: add r34 = r3,r33stage 3: st4 [r2] = r35,4

stage 1: ld4 r32= [r8],4stage 2: add r34 = r3,r33stage 3: st4 [r2] = r35,4

L1: ld4 r9 = [r32],4 ;; // cycle 0 add r3 = r34,r9 ;; // cycle 1st4 [r33] = r3,4 // cycle 2br.cloop L1 ;; // cycle 2

L1: ld4 r9 = [r32],4 ;; // cycle 0 add r3 = r34,r9 ;; // cycle 1st4 [r33] = r3,4 // cycle 2br.cloop L1 ;; // cycle 2

For 10 iterations:SWP = 12 cyclesNo SWP=30 cycles

For 10 iterations:SWP = 12 cyclesNo SWP=30 cycles

Look at five pipeline iterations1 2 3 4 5 Cycle Phase--- --- --- --- --- ------- ----- .ld4 X add ld4 X+1 Prolog st4 add ld4 X+2 .

st4 add ld4 X+3 st4 add ld4 X+4 Kernel .

st4 add X+5 st4 X+6 Epilog

1 2 3 4 5 Cycle Phase--- --- --- --- --- ------- ----- .ld4 X add ld4 X+1 Prolog st4 add ld4 X+2 .

st4 add ld4 X+3 st4 add ld4 X+4 Kernel .

st4 add X+5 st4 X+6 Epilog

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries.

Page 41: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 41Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Some Compiler DirectivesSome Compiler DirectivesIA-64

IA-64

IA-64

IA-64

#pragma Architecture Description

swp, noswp Place before a loop to override the compiler’s heuristics for deciding to software pipeline the loop or not.

loop count(n) IA-32, IA-64 Place before a loop to communicate the approximate number of iterations the loop will execute. Affects software pipelining, vectorization and other loop transformations.

distribute point Placed before a loop, the compiler will attempt to distribute the loop based on its internal heuristic. Placed within a loop, the compiler will attempt to distribute the loop at the point of the pragma. All loop-carried dependencies will be ignored.

unroll, unroll(n), nounroll

Place before an inner loop (ignored on non-inmost loops).#pragma unroll without a count allows the compiler to determine the unroll factor. #pragma unroll(n) tell the compiler to unroll the loop n times.#pragma nounroll is the same as #pragma unroll(0).

prefetch a,b,…noprefetch x,y,…

Place before a loop to control data prefetching. Supported when -O3 is on.#pragma prefetch a will cause the compiler to prefetch for future accesses to array a. The compiler determines the prefetch distance.#pragma noprefetch x will cause the compiler to not prefetch for accesses to array x.

ivdep IA-32, IA-64 Place before a loop to control vectorizaton/software pipelining The compiler is instructed to ignore “assumed” ( not proven ) dependencies preventing vectorization/software pipelining. For Itanium: Assume no BACKWARD dependencies, FORWARD loop-carried dependencies still can exist w/o preventing SWP. Use with –ivdep_parallel option to exclude loop-carried dependencies completely ( e.g. for indirect addressing)

Page 42: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 42Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

C++ Compiler for Linux*C++ Compiler for Linux*

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries

• Switch-compatible to GNU GCC for all basic options• Object file interoperability ICC and GCC

– “C and C” binary mix• No restrictions for whatever combination

– “C++ and C” binary mix• No restrictions for whatever combination

– “C++ and C++” binary mix• No restrictions since 8.0 • Support for GCC STL (libstdC++ library) since 8.0

• Support of most GCC C/C++ language extensions• Linux kernel build

– ICC 8.1 and kernel 2.6.x: No manual changes required

• Switch-compatible to GNU GCC for all basic options• Object file interoperability ICC and GCC

– “C and C” binary mix• No restrictions for whatever combination

– “C++ and C” binary mix• No restrictions for whatever combination

– “C++ and C++” binary mix• No restrictions since 8.0 • Support for GCC STL (libstdC++ library) since 8.0

• Support of most GCC C/C++ language extensions• Linux kernel build

– ICC 8.1 and kernel 2.6.x: No manual changes required

Page 43: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 43Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

C++ Compiler for Linux*Some new featuresC++ Compiler for Linux*Some new features

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries

• C99 support– Many features but not all in 8.1

• New pragmas– e.g. #pragma member alignment

• -Wcheck– Diagnostics for problems that could

manifest as runtime problems• Variable “x” is used before its value is set• Conversion from “int” to “short” may lose

significant bits

• C99 support– Many features but not all in 8.1

• New pragmas– e.g. #pragma member alignment

• -Wcheck– Diagnostics for problems that could

manifest as runtime problems• Variable “x” is used before its value is set• Conversion from “int” to “short” may lose

significant bits

Page 44: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 44Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Current Release: ICC 8.1Current Release: ICC 8.1• GCC 3.4 C/C++ Object Compatibility & Interoperability• Linux: GCC conformance as default

– GCC run-time libraries• Different compiler drivers for C and C++ code

– Like gcc and g++ , there is icc and icpc– Will reduce code size of pure C-programs– Not an option – you have to use icpc for LINKING

C++ application• KMP_SCHEDULE Environment Variable for OpenMP

Scheduling Control – Both for C/C++ and Fortran

*Third party brands and names are the property of their respective owners

Page 45: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 45Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

PCH – Pre-Compiled-HeadersPCH – Pre-Compiled-Headers– Motivation: Large Project typically

contains a common subset of includes– What: “Memory Dump” of the compiler

after header file processing.– Goal: Create the maximum subset of

common header files

*Third party brands and names are the property of their respective owners

Compile Time Improvement by PCH

Benchmark

29%Eon* (-O2)39%Povray* (-O2)

Page 46: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 46Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Intel® Compiler Release 9.0 Intel® Compiler Release 9.0 • Major focus: Stability and performance improvements• Enhanced usability for IPO – Inter-Procedural optimization• Improved security checking

– memory leak detection– bounds checking– Fortran routine argument checking

• Enhanced debugging of optimized code in the compiler and IDB• Intel® C/C++ Linux IA32 Compiler Integration with Eclipse* CDT

2.0• Support for additional GNU GCC option enhancing C++

conformance• Intel® Itanium® Architecture stack unwinding: Mosberger stack

unwinding routines will be default for compiler and debugger

Beta program started – Release planned for end of Q2/05

Page 47: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 47Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Fortran Compiler for LinuxFortran Compiler for Linux

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries

• Full compatibility to ANSI Fortran 77, Fortran90, Fortran95– Switches support language conformance

• Many extensions like– New options on ATTRIBUTES directives

• DECORATE, DEFAULT – REAL(16), COMPLEX(16)

• Many extensions of Fortran2003 already ( former Fortran2000 )– Standard not approved yet– Some existing extensions standardized in Fortran 2003 (eg.

VOLATILE)• 8.x compiler is merger of Intel 7.x compiler and

Compaq Visual Fortran 6.6

• Full compatibility to ANSI Fortran 77, Fortran90, Fortran95– Switches support language conformance

• Many extensions like– New options on ATTRIBUTES directives

• DECORATE, DEFAULT – REAL(16), COMPLEX(16)

• Many extensions of Fortran2003 already ( former Fortran2000 )– Standard not approved yet– Some existing extensions standardized in Fortran 2003 (eg.

VOLATILE)• 8.x compiler is merger of Intel 7.x compiler and

Compaq Visual Fortran 6.6

Page 48: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 48Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

How did we get here?How did we get here?

LanguageFeatures Libraries Code

Gen

Compaq Fortran 6.6 Intel Fortran 7.x

LanguageFeatures Libraries Code

Gen

Parallel

ArrayVis

LanguageFeatures Libraries Array

VisCodeGen

Intel Fortran 8.x

Parallel

Page 49: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 49Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Fortran CompilerSome switches to know aboutFortran CompilerSome switches to know about

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries

-warn declarations

Enforce strict typing ( similar to IMPLICIT NONE)

-module <path>Location of module files to be generated

-I<path>Location of module files to be used

-convertData conversion Little-endian/Big-endian ( see also run-time environment variables for more selective conversion )

-double_size-real_size-integer_size

Data type sizes

-f66 / -e90 / -e95 / -std90 / -std95

Language Standard Semantic/Restrictions/Warnings

Page 50: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 50Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Fortran CompilerNon-standard External RoutinesFortran CompilerNon-standard External Routines

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries

USE IFPOSIXPOSIX Routines like PXFWAIT ( wait for child process)

USE IFCOREMisc routines like FOR_GET_FPE and FOR_SET_FPE to get and set the floating point status registers

USE IFQWINUSE IFLOGMUSE IFCOMUSE IFAUTO…

Only Windows compiler: Routines to access Microsoft Windows* environment

USE IFPORTPortability routines like e.g GETENV, GETPID, SIGNAL, DTIME, ETIME …( has been -Vaxlib before IFORT 8.0 )

Page 51: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 51Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

“Black-Belt” User’s Guide“Black-Belt” User’s Guide

Page 52: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 52Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

“Black-Belt” User Guide “Black-Belt” User Guide

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries

• Intel memo documenting numerous compiler options not listed in official product documentation useful – For compiler debugging

• E.g. finding the application module where the compiler optimizer fails

– Benchmarking/tuning in special situation• Warning: These options are

– Not supported– Not part of the usual regression testing– Are not intended to become “supported” options– Can be non-functional in future releases or have a different

function in the future • The memo is no longer Intel-confidential information

but should not be made public

• Intel memo documenting numerous compiler options not listed in official product documentation useful – For compiler debugging

• E.g. finding the application module where the compiler optimizer fails

– Benchmarking/tuning in special situation• Warning: These options are

– Not supported– Not part of the usual regression testing– Are not intended to become “supported” options– Can be non-functional in future releases or have a different

function in the future • The memo is no longer Intel-confidential information

but should not be made public

Page 53: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 53Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Black Belt: Some samples Black Belt: Some samples

Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries

• mP2OPT_hlo_prefetch=TRUE/FALSE– Enable/disable data pre-fetching

• mP2OPT_hlo_distribution=TRUE/FALSE– Enable/disable loop distribution

• mP3OPT_ecg_mm_fp_ld_latency=<value>– Default=11; cache miss latency

• mP3OPT_ecg_non_ieee_div=TRUE/FALSE– Move to a faster, non-IEEE division

• mP2OPT_opt_threshold_enforce=TRUE/FALSE– Consider/ignore internal limits for optimizer; ( do not) give

up on optimization when certain threshold for resource usage is reached

• mP2OPT_hlo_prefetch=TRUE/FALSE– Enable/disable data pre-fetching

• mP2OPT_hlo_distribution=TRUE/FALSE– Enable/disable loop distribution

• mP3OPT_ecg_mm_fp_ld_latency=<value>– Default=11; cache miss latency

• mP3OPT_ecg_non_ieee_div=TRUE/FALSE– Move to a faster, non-IEEE division

• mP2OPT_opt_threshold_enforce=TRUE/FALSE– Consider/ignore internal limits for optimizer; ( do not) give

up on optimization when certain threshold for resource usage is reached

Page 54: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 54Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

SummarySummary

• For all 4 combinations of Linux/Windows and C++/FORTRAN, Intel® offers mature compilers to develop for Itanium® architecture

• Easy way to get best performance results for Itanium®

• Wide range of options allow sophisticated and individual tuning

• For all 4 combinations of Linux/Windows and C++/FORTRAN, Intel® offers mature compilers to develop for Itanium® architecture

• Easy way to get best performance results for Itanium®

• Wide range of options allow sophisticated and individual tuning

Page 55: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 55Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Intel Compiler ArchitectureIntel Compiler Architecture

Profiler

C++Front End

Interprocedural analysis and optimizations: inlining, constant prop, whole program detect, mod/ref, points-to

Loop optimizations: data deps, prefetch, vectorizer, unroll/interchange/fusion/dist, auto-parallel/OpenMP

Global scalar optimizations: partial redundancy elim, dead store elim, strength reduction, dead code elim

Code generation: predication, software pipelining, global scheduling, register allocation, code generation

FORTRAN 95Front End

Disambiguation:types, array,

pointer, structure, directives

Page 56: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 56Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Do we predicate?Do we predicate?

102

50% 50%

Yes: Average path length increased from 6 to 12; mispredictscould happen each time otherwise

Yes: Average path length increased from 6 to 12; mispredictscould happen each time otherwise

Predication: PGO BenefitsPredication: PGO Benefits

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries

Page 57: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 57Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

Do we predicate?Do we predicate?

102

5% 95%

Yes – optimal move: Average path length increased by 20% only but no mispredicts

Yes – optimal move: Average path length increased by 20% only but no mispredicts

Predication: PGO BenefitsPredication: PGO Benefits

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries

Page 58: Intel Compilers and Libraries for Itanium® Architecture file®3 * Other brands and names may be claimed as the property of others. Copyright © 2002 Intel Corporation. All rights

®® 58Copyright © 2002 Intel Corporation. All rights reserved.* Other brands and names may be claimed as the property of others.

102

Without profilewe won’tpredicate.

Without profilewe won’tpredicate.

Not any worsethan traditionalarchitectures.

Not any worsethan traditionalarchitectures.

Predication: PGO BenefitsPredication: PGO Benefits

The Intel logo is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States or other countries