Niagara: a 32-Way Multithreaded SPARC Processor

P. Kongetira, K. Aingaran, K. Olukotun

Sun Microsystems

Presented by Bogdan Romanescu

Goal

• Commercial server applications:
  – High thread-level parallelism (TLP)
    • Large numbers of parallel client requests
  – Low instruction-level parallelism (ILP)
    • High cache miss rates
    • Many unpredictable branches
    • Frequent load-load dependencies
• Power, cooling, and space are major concerns for data centers

Sun’s Solution

• UltraSPARC T1 processor
• “the highest-throughput and most eco-responsible processor ever created”®
• Multicore
• Fine-grain multithreading within core
• Simple pipelines
• Small L1 cache
• Shared L2
• Metric: Performance/Watt

Architecture

Sparc pipe

• UltraSPARC II style
• Single issue, 6 stages: F (fetch), S (thread select), D (decode), E (execute), M (memory), W (write back)
• Shared units:
  – L1 $
  – TLB
  – X units (execution units)
  – pipe registers
• Hazards:
  – Data
  – Structural
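As a rough illustration of the single-issue, six-stage pipe listed above, the following Python sketch shows which stage each instruction occupies in each cycle. It assumes an ideal pipe with no stalls, hazards, or thread switches; the name `pipeline_diagram` is hypothetical and does not come from the paper.

```python
# Stage letters as listed on the slide.
STAGES = ["F", "S", "D", "E", "M", "W"]


def pipeline_diagram(num_instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c.

    Assumption: one instruction enters the pipe per cycle and nothing stalls.
    A '.' means the instruction is not in the pipe during that cycle.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = []
        for cycle in range(total_cycles):
            stage_index = cycle - i  # instruction i enters F in cycle i
            if 0 <= stage_index < len(STAGES):
                row.append(STAGES[stage_index])
            else:
                row.append(".")
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(pipeline_diagram(4)):
        print(f"instr {i}: " + " ".join(row))
```

With single issue, each column of the diagram holds at most one instruction per stage, which is why structural hazards only arise on the shared units.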

Integer Register file

• One register file / thread
• SPARC window: in, out, local registers
• Highly integrated cell structure to support 4 threads:
  – 8 windows of 32 locations / thread
  – 3 read ports + 2 write ports
  – Read/write: single cycle latency
• 1 Active Window Cell (copy of the architectural set window)
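The in/out/local split can be sketched in a few lines of Python. This is a simplified model of one thread's SPARC register windows, not Niagara's actual cell design: it assumes 8 registers per group, ignores the global registers and window overflow/underflow traps, and the class name `RegisterWindows` is hypothetical. The point it shows is that a caller's "out" registers are the same storage as the callee's "in" registers.

```python
class RegisterWindows:
    """Simplified SPARC-style windowed register file for a single thread."""

    def __init__(self, num_windows=8, group_size=8):
        self.num_windows = num_windows
        self.cwp = 0  # current window pointer
        self.local_regs = [[0] * group_size for _ in range(num_windows)]
        # in_regs[w] holds the "in" registers of window w. The "out"
        # registers of window w are the "in" registers of window w + 1.
        self.in_regs = [[0] * group_size for _ in range(num_windows)]

    def _group(self, name):
        if name == "local":
            return self.local_regs[self.cwp]
        if name == "in":
            return self.in_regs[self.cwp]
        if name == "out":
            return self.in_regs[(self.cwp + 1) % self.num_windows]
        raise ValueError(f"unknown register group: {name}")

    def read(self, name, index):
        return self._group(name)[index]

    def write(self, name, index, value):
        self._group(name)[index] = value

    def save(self):
        """Enter a new window (procedure call). Overflow traps are not modeled."""
        self.cwp = (self.cwp + 1) % self.num_windows

    def restore(self):
        """Return to the previous window (procedure return)."""
        self.cwp = (self.cwp - 1) % self.num_windows


if __name__ == "__main__":
    rw = RegisterWindows()
    rw.write("out", 0, 42)   # caller places an argument in an out register
    rw.save()                # callee gets a fresh window
    print(rw.read("in", 0))  # callee sees the argument in its in register
```

Because only one window per thread is visible at a time, a small "active window" copy can serve the pipe while the full set of windows sits in denser storage, which is the idea behind the Active Window Cell bullet.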

Thread scheduling

• Thread selection based on:
  – Previous long latency instruction in pipe
  – Instruction type
  – LRU status
• Select & Fetch coupled

Memory

• 16 KB 4-way set assoc. I$ / core
• 8 KB 4-way set assoc. D$ / core
• 3 MB 12-way set assoc. L2 $ shared
  – 4 x 768 KB independent banks
  – 2 cycle throughput, 8 cycle latency
  – Direct link to DRAM & Jbus
  – Manages cache coherence for the 8 cores
  – CAM based directory
• Write through
  – allocate LD
  – no-allocate ST
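The write-through, allocate-on-load, no-allocate-on-store policy can be sketched as follows. This is a toy model under stated assumptions: the class `WriteThroughCache` is hypothetical, capacity, associativity, and line size are ignored, a plain dict stands in for the next level (presumably the shared L2), and updating the cached copy on a store hit is an assumption of this sketch rather than something the slides state.

```python
class WriteThroughCache:
    """Toy write-through cache: loads allocate, stores do not."""

    def __init__(self, backing):
        self.lines = {}         # address -> cached value
        self.backing = backing  # dict standing in for the next cache level

    def load(self, addr):
        if addr not in self.lines:
            # Load miss: fetch from the next level and allocate a line.
            self.lines[addr] = self.backing.get(addr, 0)
        return self.lines[addr]

    def store(self, addr, value):
        # Write-through: every store is sent to the next level.
        self.backing[addr] = value
        if addr in self.lines:
            # Store hit: keep the cached copy consistent (assumption).
            self.lines[addr] = value
        # Store miss: no line is allocated.


if __name__ == "__main__":
    l2 = {}
    l1 = WriteThroughCache(l2)
    l1.store(0x10, 5)
    print(0x10 in l1.lines)  # store miss did not allocate
    print(l1.load(0x10))     # load miss allocates the line
    print(0x10 in l1.lines)
```

Since the next level always holds the latest data, the small cache never has dirty lines to write back, which keeps coherence handling in the shared L2 simple.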

Performance

Test | Sun Fire T2000 | IBM p5-550 with 2 dual-core Power5 chips | Dell PowerEdge
SPECjbb2005 (Java server software), business operations/sec | 63,378 | 61,789 | 24,208 (SC1425 with dual single-core Xeon)
SPECweb2005 (Web server performance) | 14,001 | 7,881 | 4,850 (2850 with two dual-core Xeon processors)
NotesBench (Lotus Notes performance) | 16,061 | 14,740 | -
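To make the comparison easier to read, the scores above can be turned into ratios relative to the Sun Fire T2000. The sketch below only divides the numbers already listed on the slide; the helper name `relative_to` and the dictionary layout are hypothetical, and no power figures are included because the slides give none.

```python
# Scores copied from the slide; NotesBench has no Dell entry.
RESULTS = {
    "SPECjbb2005": {"Sun Fire T2000": 63378, "IBM p5-550": 61789, "Dell PowerEdge": 24208},
    "SPECweb2005": {"Sun Fire T2000": 14001, "IBM p5-550": 7881, "Dell PowerEdge": 4850},
    "NotesBench": {"Sun Fire T2000": 16061, "IBM p5-550": 14740},
}


def relative_to(results, baseline):
    """Return baseline_score / other_score for every other system, per test."""
    ratios = {}
    for test, scores in results.items():
        base = scores[baseline]
        ratios[test] = {
            system: base / score
            for system, score in scores.items()
            if system != baseline
        }
    return ratios


if __name__ == "__main__":
    for test, row in relative_to(RESULTS, "Sun Fire T2000").items():
        for system, ratio in row.items():
            print(f"{test}: T2000 scores {ratio:.2f}x the {system}")
```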

“Home run”?

• Relatively slow single-thread performance
• Poor floating-point performance
• Lack of software support (Sun Fire T2000 does not support Linux or Windows)
• Price
• Concurrency counterattack
  – no place as a general-purpose computer running databases
  – small low-end market segment?
• Niagara II & The “Rock”
  – multiprocessor & enhanced single thread support

References

• [1] P. Kongetira, et al., “Niagara: A 32-Way Multithreaded SPARC Processor,” IEEE Micro, vol. 25, pp. 21-29, Mar. 2005.

• [2] A. S. Leon, et al., “A Power-Efficient High-Throughput 32-Thread SPARC Processor,” ISSCC 2006, Session 5, Processors.

• [3] S. Chaudhry, S. Yip, P. Caprioli and M. Tremblay, “High Performance Throughput Computing,” IEEE Micro, vol. 25, issue 3, 2005.

• [4] http://opensparc.sunsource.net/nonav/opensparct1.html

• [5] http://www.sun.com/processors/UltraSPARC-T1/features.xml

• [6] http://www.sun.com/servers/coolthreads/t1000/benchmarks.jsp

• [7] http://news.com.com/Sun+begins+Sparc+phase+of+server+overhaul/2163-1010_3-5983365.html

• [8] http://h71028.www7.hp.com/ERC/cache/280124-0-0-0-121.html