and Function Multiversioning Architecture Specific Code ...
Transcript of and Function Multiversioning Architecture Specific Code ...
![Page 1: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/1.jpg)
Architecture Specific Code Generation and Function Multiversioning
Eric Christopher ([email protected])
![Page 2: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/2.jpg)
Talk Outline
Motivation
Current Status and Changes
Future Work
![Page 3: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/3.jpg)
Motivation
Where are we coming from?
Link Time Optimization and Architecture Interworking
Function Multiversioning
![Page 4: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/4.jpg)
Subtarget Architecture Support
X86: SSE3, SSSE3, SSE4.2, AVX
ARM: NEON, ARM, Thumb
Mips: Mips32, Mips16, Mips3d
PowerPC: VSX
![Page 5: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/5.jpg)
Subtarget Interworkingstatic inline __attribute__((mips16)) int i1 (void) { return 1; }static inline __attribute__((nomips16)) int i2 (void) { return 2; }static inline __attribute__((mips16)) int i3 (void) { return 3; }static inline __attribute__((nomips16)) int i4 (void) { return 4; }
int __attribute__((nomips16)) f1 (void) { return i1 (); }int __attribute__((mips16)) f2 (void) { return i2 (); }int __attribute__((mips16)) f3 (void) { return i3 (); }int __attribute__((nomips16)) f4 (void) { return i4 (); }
![Page 6: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/6.jpg)
Subtarget LTO
clang -g -c foo.c -emit-llvm -o foo.bc -mavx2
clang -g -c bar.c -emit-llvm -o bar.bc
clang -g -c baz.c -emit-llvm -o baz.bc -mavx2
llvm-link foo.bc bar.bc baz.bc -o lto.bc
clang lto.bc -o lto.x
![Page 7: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/7.jpg)
foo.c:int foo_avx(void *x, int a) { return _mm_aeskeygenassist_si128(x, a);}
bar.c:int foo_generic(void *x, int a) { // Lots of code }
baz.c:const unsigned AVXBits = (1 << 27) | (1 << 28); bool HasAVX = ((ECX & AVXBits) == AVXBits) && OSHasAVXSupport(); bool HasAVX2 = HasAVX && MaxLeaf >= 0x7 && !GetX86CpuIDAndInfoEx(0x7, 0x0, &EAX, &EBX, &ECX, &EDX) && (EBX & 0x20); GetX86CpuIDAndInfo(0x80000001, &EAX, &EBX, &ECX, &EDX); if (HasAVX) return foo_avx(x, a); else return foo_generic(x, a);
![Page 8: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/8.jpg)
Function Multiversioning
Avoid splitting code between files.
Avoid expensive runtime checks.
Performance and code size benefits of per-cpu features.
![Page 9: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/9.jpg)
__attribute__ ((target ("default")))int foo () { // The default version of foo. return 0;}__attribute__((target("sse4.2"))) int foo() { // foo version for SSE4.2 return 1;}__attribute__((target("arch=atom"))) int foo() { // foo version for the Intel ATOM processor return 2;}int main() { int (*p)() = &foo; assert((*p)() == foo()); return 0;}
![Page 10: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/10.jpg)
Function Multiversioning - Linux/IFUNC
Functions are specially mangled
All calls go through the PLT
Dispatch function is generated to determine CPU features
Special symbol type and relocation to help minimize the dispatch overhead
![Page 11: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/11.jpg)
Why not a function pointer?
Another function to do the dispatch one call through the PLTfor a shared library
Then the indirect call through the function table
With IFUNC the PLT resolves to the method that gets chosen by theIFUNC resolver
![Page 12: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/12.jpg)
define float @_Z3barv() #0 {entry: ret float 4.000000e+00}define float @_Z4testv() #1 {entry: ret float 1.000000e+00}define float @_Z3foov() #2 {entry: ret float 4.000000e+00}define float @_Z3bazv() #3 {entry: ret float 4.000000e+00}attributes #0 = { "target-cpu"="x86-64" "target-features"="+avx2" }attributes #1 = { "target-cpu"="x86-64" }attributes #2 = { "target-cpu"="corei7" "target-features"="+sse4.2" }attributes #3 = { "target-cpu"="x86-64" "target-features"="+avx2" }
![Page 13: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/13.jpg)
Talk Outline
Motivation
Current Status and Changes
Future Work
![Page 14: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/14.jpg)
Subtarget Specific Code GenerationTargetSubtargetInfo &ST = const_cast<TargetSubtargetInfo&>(TM.getSubtarget<TargetSubtargetInfo>());ST.resetSubtargetFeatures(MF);
Only works for instruction selection
Requires a global lock on the Subtarget - no parallel code generation!
![Page 15: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/15.jpg)
define float @_Z3barv() #0 {entry: ret float 4.000000e+00}define float @_Z4testv() #1 {entry: ret float 1.000000e+00}define float @_Z3foov() #2 {entry: ret float 4.000000e+00}define float @_Z3bazv() #3 {entry: ret float 4.000000e+00}attributes #0 = { "target-cpu"="x86-64" "target-features"="+avx2" }attributes #1 = { "target-cpu"="x86-64" }attributes #2 = { "target-cpu"="corei7" "target-features"="+sse4.2" }attributes #3 = { "target-cpu"="x86-64" "target-features"="+avx2" }
![Page 16: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/16.jpg)
TargetMachine
Target OptionsSubtargetData LayoutInstruction Selection InformationFrame LoweringSchedulingPass ManagerObject File Layout
![Page 17: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/17.jpg)
TargetMachine
Target OptionsSubtargetData LayoutInstruction Selection InformationFrame LoweringSchedulingPass ManagerObject File Layout
![Page 18: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/18.jpg)
TargetMachine
Target OptionsSubtargetData LayoutInstruction Selection InformationFrame LoweringSchedulingPass ManagerObject File Layout
![Page 19: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/19.jpg)
TargetMachine
Target OptionsSubtargetData LayoutInstruction Selection InformationFrame LoweringSchedulingPass ManagerObject File Layout
![Page 20: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/20.jpg)
TargetMachine
Target OptionsSubtargetData LayoutInstruction Selection InformationFrame LoweringSchedulingPass ManagerObject File Layout
![Page 21: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/21.jpg)
TargetMachine
Target OptionsData LayoutPass ManagerObject File Layout
Handles everything for the Target and Object File emission.
![Page 22: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/22.jpg)
Subtarget Cache
getSubtarget still exists
template <typename STC> const STC &getSubtarget(const Function *) const
mutable StringMap<std::unique_ptr<STC>> SubtargetMap
![Page 23: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/23.jpg)
const X86Subtarget *X86TargetMachine::getSubtargetImpl(const Function &F) const { AttributeSet FnAttrs = F.getAttributes(); Attribute CPUAttr = FnAttrs.getAttribute(AttributeSet::FunctionIndex, "target-cpu"); Attribute FSAttr = FnAttrs.getAttribute(AttributeSet::FunctionIndex, "target-features");
std::string CPU = !CPUAttr.hasAttribute(Attribute::None) ? CPUAttr.getValueAsString().str() : TargetCPU; std::string FS = !FSAttr.hasAttribute(Attribute::None) ? FSAttr.getValueAsString().str() : TargetFS;
auto &I = SubtargetMap[CPU + FS]; if (!I) { resetTargetOptions(F); I = llvm::make_unique<X86Subtarget>(TargetTriple, CPU, FS, *this, Options.StackAlignmentOverride); } return I.get();}
![Page 24: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/24.jpg)
const X86Subtarget *X86TargetMachine::getSubtargetImpl(const Function &F) const { AttributeSet FnAttrs = F.getAttributes(); Attribute CPUAttr = FnAttrs.getAttribute(AttributeSet::FunctionIndex, "target-cpu"); Attribute FSAttr = FnAttrs.getAttribute(AttributeSet::FunctionIndex, "target-features");
std::string CPU = !CPUAttr.hasAttribute(Attribute::None) ? CPUAttr.getValueAsString().str() : TargetCPU; std::string FS = !FSAttr.hasAttribute(Attribute::None) ? FSAttr.getValueAsString().str() : TargetFS;
auto &I = SubtargetMap[CPU + FS]; if (!I) { resetTargetOptions(F); I = llvm::make_unique<X86Subtarget>(TargetTriple, CPU, FS, *this, Options.StackAlignmentOverride); } return I.get();}
![Page 25: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/25.jpg)
Subtarget Cache
Implemented for X86, ARM, AArch64, Mips
Trivial to implement for other architectures
![Page 26: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/26.jpg)
TargetTransformInfo
Uses a lot of Subtarget specific information
Pass manager doesn’t support boundary crossing analysis passes
So we need a function specific TTI
![Page 27: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/27.jpg)
class FunctionTargetTransformInfo final : public FunctionPass {private: const Function *Fn; const TargetTransformInfo *TTI;public: void getUnrollingPreferences(Loop *L, TargetTransformInfo::UnrollingPreferences &UP) const { TTI->getUnrollingPreferences(Fn, L, UP); }};
![Page 28: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/28.jpg)
Talk Outline
Motivation
Current Status and Changes
Future Work
![Page 29: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/29.jpg)
IR Changes
Function attribute cpu and feature string
attributes #0 = { "target-cpu"="x86-64" "target-features"="+avx2" }
New call/invoke destination for IFUNC calls
![Page 30: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/30.jpg)
Optimization Directions
CFG Cloning
Auto-Autovectorization
Advanced Idiom Recognition
![Page 31: and Function Multiversioning Architecture Specific Code ...](https://reader031.fdocuments.us/reader031/viewer/2022012014/61591ce1b007e445a707e914/html5/thumbnails/31.jpg)
Questions?