Hyper-Threading Technology
Uploaded by: nayakslideshare
Category: Technology

Transcript of Hyper-Threading Technology
Hyper-Threading Technology
IQxplorer
Outline
What is Hyper-Threading Technology?
Hyper-Threading Technology in Intel microprocessors
Microarchitecture Choices & Tradeoffs
Performance Results
Conclusion
Hyper-Threading Technology
Simultaneous Multi-Threading: 2 logical processors (LPs) simultaneously share one physical processor's execution resources
Appears to software as 2 processors (a 2-way shared-memory multiprocessor)
– The operating system schedules software threads/processes to both logical processors
– Fully compatible with existing multiprocessor system software and hardware
Integral part of the Intel NetBurst microarchitecture
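Because the two logical processors look to software like an ordinary 2-way multiprocessor, no special code is needed to observe them. A minimal sketch using Python's standard library (it simply counts whatever logical CPUs the host OS reports; nothing here is Netburst-specific):

```python
import os

# os.cpu_count() reports *logical* processors, so a single physical
# core with Hyper-Threading enabled shows up as two CPUs to software.
logical_cpus = os.cpu_count()
print(f"Logical processors visible to the OS: {logical_cpus}")
```

The operating system then schedules threads onto these logical CPUs exactly as it would onto physically separate processors.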
Die Size Increase is Small
Total die area added is small
– A few small structures duplicated
– Some additional control logic and pointers
Complexity is Large
Challenged many basic assumptions
New microarchitecture algorithms
– To address new uop (micro-operation) prioritization issues
– To solve potential new livelock scenarios
High logic design complexity
Validation effort
– Explosion of validation space
Outline
What is Hyper-Threading Technology?
Hyper-Threading Technology in Intel microprocessors
Microarchitecture Choices & Tradeoffs
Performance Results
Conclusion
HT Technology in Intel microprocessors
Hyper-Threading is Intel's implementation of simultaneous multi-threading
Integral part of the Intel NetBurst microarchitecture
– e.g. Intel Xeon processors
Intel Processors with NetBurst Microarchitecture
– Intel Xeon MP Processor: 256 KB 2nd-level cache, 1 MB 3rd-level cache
– Intel Xeon Processor: 256 KB 2nd-level cache
– Intel Xeon Processor: 512 KB 2nd-level cache
What was added
Outline
What is Hyper-Threading Technology?
Hyper-Threading Technology in Intel microprocessors
Microarchitecture Choices & Tradeoffs
Performance Results
Conclusion
Managing Resources
Choices
– Partition: half of the resource dedicated to each logical processor
– Threshold: flexible resource sharing with a limit on maximum resource usage
– Full Sharing: flexible resource sharing with no limit on maximum resource usage
Considerations
– Throughput and fairness
– Die size and complexity
Partitioning
Half of the resource dedicated to each logical processor
– Simple, low complexity
Good for structures where
– Occupancy time can be high and unpredictable
– Average utilization is high
Major pipeline queues are a good example
– Provide buffering to avoid pipeline stalls
– Allow slip between logical processors
Execution Pipeline
Partition queues between major pipestages of the pipeline
Partitioned Queue Example
With full sharing, a slow thread can get an unfair share of resources and prevent a faster thread from making rapid progress; partitioning prevents this.
Partitioned Queue Example
Partitioning a resource ensures fairness and ensures progress for both logical processors!
Thresholds
Flexible resource sharing with a limit on maximum resource usage
Good for small structures where
– Occupancy time is low and predictable
– Average utilization is low, with occasional high peaks
Schedulers are a good example
– Throughput is high because of data speculation (get data regardless of cache hit)
– uOps pass through the scheduler very quickly
– Schedulers are kept small for speed
Schedulers, Queues
5 schedulers: MEM, ALU0, ALU1, FP Move, FP/MMX/SSE
Threshold prevents one logical processor from consuming all entries
(Round robin until the threshold is reached)
Variable partitioning allows a logical processor to use most resources when the other doesn't need them
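The threshold policy sits between partitioning and full sharing; a toy model of it (illustrative Python, with made-up sizes, not the real scheduler logic):

```python
class ThresholdQueue:
    """Flexible sharing with a per-LP cap below the full size: one
    logical processor may use most entries when the other is idle,
    but can never consume all of them - a simplified model."""

    def __init__(self, total_entries, threshold):
        assert threshold < total_entries  # cap must leave room for the other LP
        self.total = total_entries
        self.threshold = threshold        # per-LP maximum occupancy
        self.count = {0: 0, 1: 0}

    def try_insert(self, lp):
        in_use = self.count[0] + self.count[1]
        if in_use < self.total and self.count[lp] < self.threshold:
            self.count[lp] += 1
            return True
        return False

q = ThresholdQueue(total_entries=8, threshold=6)

busy = sum(q.try_insert(0) for _ in range(10))   # LP0 stops at its threshold
spare = sum(q.try_insert(1) for _ in range(10))  # LP1 still gets entries
print(busy, spare)  # 6 2
```

A busy LP0 takes 6 of the 8 entries (more than a 50/50 partition would allow), yet LP1 is still guaranteed the remainder.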
Full Sharing
Flexible resource sharing with no limit on maximum resource usage
Good for large structures where
– Working-set sizes are variable
– Sharing between logical processors is possible
– It is not possible for one logical processor to starve the other
Caches are a good example
– All caches are shared
– Better overall performance vs. partitioned caches
– Some applications share code and/or data
– High set associativity minimizes conflict misses
– Level 2 and 3 caches are 8-way set associative
On average, a shared cache has a 40% better hit rate and 12% better performance for these applications.
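A toy cache model hints at why sharing wins when the logical processors overlap in code or data (hypothetical Python sketch with arbitrary eviction and made-up sizes; not the 8-way set-associative design described above):

```python
def misses(cache_lines, accesses):
    """Count misses for an address stream in a tiny model cache."""
    cache, miss = set(), 0
    for addr in accesses:
        if addr not in cache:
            miss += 1
            if len(cache) >= cache_lines:
                cache.pop()      # evict an arbitrary line (toy policy)
            cache.add(addr)
    return miss

# Both logical processors read the same 8 addresses, interleaved,
# as when two threads share code or data.
lp0 = lp1 = list(range(8))
interleaved = [a for pair in zip(lp0, lp1) for a in pair]

shared = misses(16, interleaved)               # one shared 16-line cache
partitioned = misses(8, lp0) + misses(8, lp1)  # two private 8-line halves
print(shared, partitioned)  # 8 16
```

The shared cache holds one copy of the common working set, so the second logical processor hits on lines the first already fetched; a partitioned cache pays every miss twice.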
Outline
What is Hyper-Threading Technology?
Hyper-Threading Technology in Intel microprocessors
Microarchitecture Choices & Tradeoffs
Performance Results
Conclusion
Server Performance
Good performance benefit from a small die-area investment
Multi-tasking
Larger gains can be realized by running dissimilar applications, due to their different resource requirements
Outline
What is Hyper-Threading Technology?
Hyper-Threading Technology in Intel microprocessors
Microarchitecture Choices & Tradeoffs
Performance Results
Conclusion
Conclusions
Hyper-Threading Technology is an integral part of the NetBurst microarchitecture
– Very little additional die area needed
– Compelling performance
– Currently enabled for both server and desktop processors
Microarchitecture design choices
– Resource-sharing policy matched to traffic and performance requirements
A new and challenging microarchitecture direction
Any Questions?