Hyper-Threading Technology
Uploaded by: nayakslideshare
Category: Technology

Transcript of Hyper-Threading Technology
Hyper-Threading Technology
IQxplorer
Outline
What is Hyper-Threading Technology?
Hyper-Threading Technology in Intel microprocessors
Microarchitecture Choices & Tradeoffs
Performance Results
Conclusion
Hyper-Threading Technology
Simultaneous Multi-Threading: 2 logical processors (LPs) simultaneously share one physical processor's execution resources
Appears to software as 2 processors (a 2-way shared-memory multiprocessor)
– The operating system schedules software threads/processes to both logical processors
– Fully compatible with existing multiprocessor system software and hardware
Integral part of the Intel NetBurst microarchitecture
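Because the two logical processors look to software like an ordinary 2-way multiprocessor, no special code is needed to observe them. A minimal sketch using Python's standard library (it simply counts whatever logical CPUs the host OS reports; nothing here is Netburst-specific):

```python
import os

# os.cpu_count() reports *logical* processors, so a single physical
# core with Hyper-Threading enabled shows up as two CPUs to software.
logical_cpus = os.cpu_count()
print(f"Logical processors visible to the OS: {logical_cpus}")
```

The operating system then schedules threads onto these logical CPUs exactly as it would onto physically separate processors.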
Die Size Increase is Small
Total die area added is small
– A few small structures duplicated
– Some additional control logic and pointers
Complexity is Large
Challenged many basic assumptions
New microarchitecture algorithms
– To address new uop (micro-operation) prioritization issues
– To solve potential new livelock scenarios
High logic design complexity
Validation effort
– Explosion of validation space
Outline
What is Hyper-Threading Technology?
Hyper-Threading Technology in Intel microprocessors
Microarchitecture Choices & Tradeoffs
Performance Results
Conclusion
HT Technology in Intel microprocessors
Hyper-Threading is Intel's implementation of simultaneous multi-threading
Integral part of the Intel NetBurst microarchitecture
– e.g. Intel Xeon processors
Intel Processors with NetBurst Microarchitecture
– Intel Xeon MP Processor: 256 KB 2nd-level cache, 1 MB 3rd-level cache
– Intel Xeon Processor: 256 KB 2nd-level cache
– Intel Xeon Processor: 512 KB 2nd-level cache
What was added
Outline
What is Hyper-Threading Technology?
Hyper-Threading Technology in Intel microprocessors
Microarchitecture Choices & Tradeoffs
Performance Results
Conclusion
Managing Resources
Choices
– Partition: half of the resource dedicated to each logical processor
– Threshold: flexible resource sharing with a limit on maximum resource usage
– Full Sharing: flexible resource sharing with no limit on maximum resource usage
Considerations
– Throughput and fairness
– Die size and complexity
Partitioning
Half of the resource dedicated to each logical processor
– Simple, low complexity
Good for structures where
– Occupancy time can be high and unpredictable
– Average utilization is high
Major pipeline queues are a good example
– Provide buffering to avoid pipeline stalls
– Allow slip between logical processors
Execution Pipeline
Partition queues between major pipestages of the pipeline
Partitioned Queue Example
With full sharing, a slow thread can get an unfair share of resources and prevent a faster thread from making rapid progress; partitioning prevents this.
Partitioned Queue Example
Partitioning a resource ensures fairness and ensures progress for both logical processors!
Thresholds
Flexible resource sharing with a limit on maximum resource usage
Good for small structures where
– Occupancy time is low and predictable
– Average utilization is low, with occasional high peaks
Schedulers are a good example
– Throughput is high because of data speculation (get data regardless of cache hit)
– uOps pass through the scheduler very quickly
– Schedulers are kept small for speed
Schedulers, Queues
5 schedulers: MEM, ALU0, ALU1, FP Move, FP/MMX/SSE
Threshold prevents one logical processor from consuming all entries
(Round robin until the threshold is reached)
Variable partitioning allows a logical processor to use most resources when the other doesn't need them
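The threshold policy sits between partitioning and full sharing; a toy model of it (illustrative Python, with made-up sizes, not the real scheduler logic):

```python
class ThresholdQueue:
    """Flexible sharing with a per-LP cap below the full size: one
    logical processor may use most entries when the other is idle,
    but can never consume all of them - a simplified model."""

    def __init__(self, total_entries, threshold):
        assert threshold < total_entries  # cap must leave room for the other LP
        self.total = total_entries
        self.threshold = threshold        # per-LP maximum occupancy
        self.count = {0: 0, 1: 0}

    def try_insert(self, lp):
        in_use = self.count[0] + self.count[1]
        if in_use < self.total and self.count[lp] < self.threshold:
            self.count[lp] += 1
            return True
        return False

q = ThresholdQueue(total_entries=8, threshold=6)

busy = sum(q.try_insert(0) for _ in range(10))   # LP0 stops at its threshold
spare = sum(q.try_insert(1) for _ in range(10))  # LP1 still gets entries
print(busy, spare)  # 6 2
```

A busy LP0 takes 6 of the 8 entries (more than a 50/50 partition would allow), yet LP1 is still guaranteed the remainder.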
Full Sharing
Flexible resource sharing with no limit on maximum resource usage
Good for large structures where
– Working-set sizes are variable
– Sharing between logical processors is possible
– It is not possible for one logical processor to starve the other
Caches are a good example
– All caches are shared
– Better overall performance vs. partitioned caches
– Some applications share code and/or data
– High set associativity minimizes conflict misses
– Level 2 and 3 caches are 8-way set associative
On average, a shared cache has a 40% better hit rate and 12% better performance for these applications.
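A toy cache model hints at why sharing wins when the logical processors overlap in code or data (hypothetical Python sketch with arbitrary eviction and made-up sizes; not the 8-way set-associative design described above):

```python
def misses(cache_lines, accesses):
    """Count misses for an address stream in a tiny model cache."""
    cache, miss = set(), 0
    for addr in accesses:
        if addr not in cache:
            miss += 1
            if len(cache) >= cache_lines:
                cache.pop()      # evict an arbitrary line (toy policy)
            cache.add(addr)
    return miss

# Both logical processors read the same 8 addresses, interleaved,
# as when two threads share code or data.
lp0 = lp1 = list(range(8))
interleaved = [a for pair in zip(lp0, lp1) for a in pair]

shared = misses(16, interleaved)               # one shared 16-line cache
partitioned = misses(8, lp0) + misses(8, lp1)  # two private 8-line halves
print(shared, partitioned)  # 8 16
```

The shared cache holds one copy of the common working set, so the second logical processor hits on lines the first already fetched; a partitioned cache pays every miss twice.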
Outline
What is Hyper-Threading Technology?
Hyper-Threading Technology in Intel microprocessors
Microarchitecture Choices & Tradeoffs
Performance Results
Conclusion
Server Performance
Good performance benefit from a small die-area investment
Multi-tasking
Larger gains can be realized by running dissimilar applications, due to their different resource requirements
Outline
What is Hyper-Threading Technology?
Hyper-Threading Technology in Intel microprocessors
Microarchitecture Choices & Tradeoffs
Performance Results
Conclusion
Conclusions
Hyper-Threading Technology is an integral part of the NetBurst microarchitecture
– Very little additional die area needed
– Compelling performance
– Currently enabled for both server and desktop processors
Microarchitecture design choices
– Resource-sharing policy matched to traffic and performance requirements
A new and challenging microarchitecture direction
Any Questions?