Slide 2
June 1, 2015 Computation Products Group
Top Level Agenda
  The HPC Market & AMD
  AMD64: A Programmer's View
  AMD Opteron Processor: The HW
  Core Improvements
  Integrated Memory Controller
  HyperTransport Technology
  Clustering
  Performance
  System Solutions & Applications
  Development platforms
  Recent Events
  Summary
Slide 3
Computing System Evolution: Mainframes to desktops to clusters
Mainframes ~1965
  Tightly coupled processor, computer, OS and software from a single company
  Proprietary software
  >$1M
Departmental Minicomputers ~1970
  Significant proliferation of servers as machines leave glass houses
Slide 42
AMD Athlon 64 Processor
AMD64: Desktop Processor
  8 Byte memory controller supporting 200, 266, & 333 MHz DDR Memory
    CHIPKILL ECC with x4 DRAMs
    Drive up to 4 registered DIMMs (4 DIMMs at 333MHz)
    Future memory technology supported as it is defined
    Up to 4GB x4 DRAMs (4GB DIMMs)
  HyperTransport Technology I/O
  On chip L1 & L2 cache
    64KB L1 ICache, 64KB L1 DCache
    Up to 1M ECC protected L2 Cache
  740-pin PGA Package
Diagram labels: L2 Cache, L1 Instruction Cache, L1 Data Cache, AMD64 Processor Core, DDR Memory Controller (72), HyperTransport (16), "Replaces Address, Data and Control Bus"
Slide 43
1P AMD Athlon 64 Desktop Processor System
System Strengths
  Memory Latency, Bandwidth and memory reach: 2^40 physical (1 Terabyte), 2^48 virtual
  I/O Latency and Bandwidth: ~1600 MT/s, 6.4 GB/s
  64-bit CPU
  More Reliable
    Lower chip count
    Improved machine check
    Improved error handling
Diagram labels: AMD Athlon 64; 200-333MHz 72-bit registered DDR, 4GB DRAM; 16x16 HyperTransport @ 1600 MT/s; AMD-8151 AGP 8X (32 bits @ 533MHz); AMD-8111 I/O Hub (FLASH, SIO, LPC, PCI 33/32, NIC, USB 1.1/2.0, AC97, ACR 1.0, MII 10/100, EIDE)
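
As a quick check of the memory reach quoted on this slide, the address widths can be converted to sizes. This is a minimal sketch; the constant and function names are illustrative and do not come from the deck.

```python
# Address widths quoted on the slide: 40 physical bits, 48 virtual bits.
PHYSICAL_ADDRESS_BITS = 40
VIRTUAL_ADDRESS_BITS = 48

TERABYTE = 1 << 40  # binary terabyte (2^40 bytes)


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


physical_tb = addressable_bytes(PHYSICAL_ADDRESS_BITS) / TERABYTE
virtual_tb = addressable_bytes(VIRTUAL_ADDRESS_BITS) / TERABYTE

print(f"physical reach: {physical_tb:.0f} TB")
print(f"virtual reach:  {virtual_tb:.0f} TB")
```

The physical figure matches the "1 Terabyte" on the slide; the virtual space is 2^8 = 256 times larger.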
Slide 44
1P AMD Opteron 100 Series
AMD64: 1 way Value Server
  16 Byte memory controller supporting 200, 266, & 333 MHz DDR Memory
    CHIPKILL ECC with x4 DRAMs
    Drive up to 8 registered DIMMs (8 DIMMs at 333MHz)
    Future memory technology supported as it is defined
    Up to 4GB x4 DRAMs (4GB DIMMs)
    18 CAS lines for 32GB of memory
  Three 16-bit non-Coherent HyperTransport Technology Links
  On chip L1 & L2 cache
    64KB L1 ICache, 64KB L1 DCache
    Up to 1M ECC protected L2 Cache
  940-pin PGA Package
Diagram labels: L2 Cache, L1 Instruction Cache, L1 Data Cache, AMD64 Processor Core, DDR Memory Controller (72/144), HyperTransport (16), "Replaces Address, Data and Control Bus"
Slide 45
1P AMD Opteron 100 Desktop Processor System
System Strengths
  Ideal for cost-sensitive system designs where I/O is the critical commodity
  Storage servers
  Low end DCC workstations
Diagram labels: AMD Opteron; 8GB DRAM; 16x16 HyperTransport @ 1600 MT/s; AMD-8151 AGP 8X (32 bits @ 533MHz); two AMD-8131 PCI-X Tunnels (PCI-X); AMD-8111 I/O Hub (FLASH, SIO, LPC, PCI 33/32, NIC, USB 1.1/2.0, AC97, ACR 1.0, MII 10/100, EIDE)
Slide 46
2P - AMD Opteron 200 Series
AMD64: 2 Way Performance Server
  16 Byte memory controller supporting 200, 266, & 333 MHz DDR Memory
    CHIPKILL ECC with x4 DRAMs
    Drive up to 8 registered DIMMs (8 DIMMs at 333MHz)
    Future memory technology supported as it is defined
    Up to 4GB x4 DRAMs (4GB DIMMs)
    18 CAS lines for 32GB of memory
  One coherent and two 16-bit non-Coherent HyperTransport Technology Links
  On chip L1 & L2 cache
    64KB L1 ICache, 64KB L1 DCache
    Up to 1M ECC protected L2 Cache
  940-pin PGA Package
Diagram labels: L2 Cache, L1 Instruction Cache, L1 Data Cache, AMD64 Processor Core, DDR Memory Controller (72/144), HyperTransport (16), "Replaces Address, Data and Control Bus"
Slide 47
2P AMD Opteron 200 Server
System Strengths
  Ideal for systems where large flat memory is important (16GB of SMP memory)
  Data mining
  Relational database applications
Diagram labels: two AMD Opteron processors, 8GB DRAM each; three AMD-8131 PCI-X Tunnels (PCI-X); Bridge or SSL/IPSec; AMD-8111 I/O Hub (FLASH, SIO, LPC, PCI 33/32, NIC, USB 1.1/2.0, AC97, ACR 1.0, MII 10/100, EIDE)
Slide 48
4P - 8P AMD Opteron 800
AMD64: 4 - 8 Way Performance Server
  16 Byte memory controller supporting 200, 266, & 333 MHz DDR Memory
    CHIPKILL ECC with x4 DRAMs
    Drive up to 8 registered DIMMs (8 DIMMs at 333MHz)
    Future memory technology supported as it is defined
    Up to 4GB x4 DRAMs (4GB DIMMs)
  Three 16-bit Coherent HyperTransport Technology Links
  On chip L1 & L2 cache
    64KB L1 ICache, 64KB L1 DCache
    Up to 1M ECC protected L2 Cache
  940-pin PGA Package
Diagram labels: L2 Cache, L1 Instruction Cache, L1 Data Cache, AMD64 Processor Core, DDR Memory Controller (72/144), HyperTransport (16)
Slide 49
AMD Opteron 800 HPC Processing Node
HPC Strengths
  Flat SMP-like memory model: all four processors reside within the same 2^48 memory map
  Expandable to 8P NUMA
  Glue-less coherent multi-processing: low latency and high bandwidth, ~1600 MT/s (6.4 GB/s)
  32GB of high B/W external memory bus (>5.3GB/sec.)
  Native high B/W memory-mapped I/O (>25Gbits/sec.)
Slide 50
Model Number Implementation
AMD Opteron Processor Model _ _ _ (example: Model 146, 2.0GHz)
  First digit = scalability of AMD Opteron processor
  Second and third digits = relative performance among AMD Opteron processors
  Model number conveys directional improvement

AMD Opteron 800 Series (up to 8 way)
  Model  Clock
  846    2.0GHz
  844    1.8GHz
  842    1.6GHz
  840    1.4GHz

AMD Opteron 200 Series (up to 2 way)
  Model  Clock
  246    2.0GHz
  244    1.8GHz
  242    1.6GHz
  240    1.4GHz

AMD Opteron 100 Series (1 way)
  Model  Clock
  146    2.0GHz
  144    1.8GHz
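
The numbering scheme on this slide can be expressed as a small lookup. The sketch below is illustrative only: the table holds just the models listed on the slide, and the names `LISTED_CLOCKS_GHZ`, `MAX_WAY_BY_FIRST_DIGIT` and `decode_model` are hypothetical.

```python
# Models and clocks exactly as listed on the slide.
LISTED_CLOCKS_GHZ = {
    846: 2.0, 844: 1.8, 842: 1.6, 840: 1.4,  # 800 Series
    246: 2.0, 244: 1.8, 242: 1.6, 240: 1.4,  # 200 Series
    146: 2.0, 144: 1.8,                      # 100 Series
}

# First digit = scalability (maximum number of processors in the system).
MAX_WAY_BY_FIRST_DIGIT = {1: 1, 2: 2, 8: 8}


def decode_model(model: int) -> dict:
    """Split a three-digit model number into the fields the slide describes."""
    if model not in LISTED_CLOCKS_GHZ:
        raise KeyError(f"model {model} is not listed on the slide")
    first_digit, relative_performance = divmod(model, 100)
    return {
        "series": f"{first_digit}00 Series",
        "max_way": MAX_WAY_BY_FIRST_DIGIT[first_digit],
        "relative_performance": relative_performance,
        "clock_ghz": LISTED_CLOCKS_GHZ[model],
    }


print(decode_model(846))
print(decode_model(144))
```

The last two digits only rank parts against each other; the slide presents them as directional, so the sketch looks the clock up rather than deriving it.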
Slide 51
Price Performance Positioning
[Chart: Performance versus Price, positioning the 100, 200 and 800 series with 256K and 1M labels; the top position is marked "A solution unto itself"]
Slide 52
Opteron Processor Architecture
Slide 53
The Elements of the CPU
Diagram labels:
  Fetch, Branch Prediction, Scan/Align, Fastpath, Microcode Engine
  L1 Instruction Cache 64KB, L1 Data Cache 64KB, L2 Cache
  Instruction Control Unit (72 entries), OPs
  Int Decode & Rename, Res (x3), AGU and ALU (x3), MULT
  FP Decode & Rename, 36-entry FP scheduler, FADD, FMUL, FMISC
  44-entry Load/Store Queue
  Bus Unit, System Request Queue, Crossbar, Memory Controller, HyperTransport
Slide 54
Processor Throughput
  Supply 16 instruction bytes to the decoder per cycle
  Convert x86 instructions to fixed length OPs
  24-entry integer scheduler
  Can dispatch 3 OPs per cycle to integer/FP schedulers
  Instructions use one of two decoding pipelines
    Fastpath: instructions which are decoded into two or fewer µOPs are decoded by hardware and then packed into 3 dispatch positions
    Microcode: for x86 instructions which are decoded into more than two µOPs, calculate the microcode ROM entry point and fetch the sequence from the microcode ROM
  Compared to AMD Athlon XP, more instructions use the Fastpath
    E.g. packed SSE is microcoded in AMD Athlon XP and Fastpath in AMD Opteron processors
    AMD Opteron has 8% fewer microcoded instructions for SPECint2000
    AMD Opteron has 28% fewer microcoded instructions for SPECfp2000
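
The choice between the two decoding pipelines comes down to a threshold on the number of ops an instruction decodes into. A minimal sketch of that rule follows; the function name and the op counts passed in are hypothetical and are not measured decode data for any real instruction.

```python
FASTPATH_MAX_OPS = 2  # slide: "two or fewer" ops are decoded by hardware


def decode_path(op_count: int) -> str:
    """Pick the decoding pipeline for an instruction that expands to op_count ops."""
    if op_count < 1:
        raise ValueError("an instruction decodes to at least one op")
    return "fastpath" if op_count <= FASTPATH_MAX_OPS else "microcode"


for ops in (1, 2, 3, 7):
    print(ops, decode_path(ops))
```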
Slide 55
Floating Point & Integer Performance
FPU Throughput
  SSE2, x87
    Theoretical: (1 Mul + 1 Add)/cycle
    Realized: 1.9 FLOPs/cycle
  SSE, 3DNow!
    Theoretical: (2 Mul + 2 Add)/cycle
    Realized: 3.4+ FLOPs/cycle
32-bit Integer Throughput
  1 add / clock cycle
  1 multiply / clock cycle
  Multiply latency has shrunk from 5 cycles on AMD Athlon to 3 cycles on the AMD Opteron
64-bit Integer Throughput
  1 add / clock cycle
  1 multiply every other clock cycle
  Multiply latency is 4 cycles
Integer Instruction Scheduler
  Out Of Order (OOO) from a queue of 24* Integer Macro-Ops
  *Athlon Instruction Scheduler is 18 Macro-Ops deep
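
The per-cycle figures above translate into peak rates once a clock is chosen. The sketch below assumes a 2.0 GHz clock, which is the frequency used elsewhere in this deck for evaluation and is an assumption here; the function and variable names are illustrative.

```python
ASSUMED_CLOCK_GHZ = 2.0  # assumption: the 2 GHz part used in the deck's evaluations


def peak_gflops(clock_ghz: float, flops_per_cycle: float, processors: int = 1) -> float:
    """Peak rate = clock (GHz) x floating-point ops per cycle x processor count."""
    return clock_ghz * flops_per_cycle * processors


# SSE2 / x87: 1 multiply + 1 add per cycle = 2 FLOPs per cycle.
sse2_peak = peak_gflops(ASSUMED_CLOCK_GHZ, 2)
# SSE / 3DNow!: 2 multiplies + 2 adds per cycle = 4 FLOPs per cycle.
sse_peak = peak_gflops(ASSUMED_CLOCK_GHZ, 4)
# Four processors at the SSE2 / x87 rate.
four_way_peak = peak_gflops(ASSUMED_CLOCK_GHZ, 2, processors=4)

# Fraction of the theoretical rate that the slide reports as realized.
sse2_efficiency = 1.9 / 2
sse_efficiency = 3.4 / 4

print(sse2_peak, sse_peak, four_way_peak)
print(f"{sse2_efficiency:.0%} {sse_efficiency:.0%}")
```

Under this assumption a four-processor node peaks at 16 GFLOPS at the SSE2/x87 rate, which lines up with the "4P 16G-flop" blade figure quoted later in the deck.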
Slide 56
Internal Caching
L1 caches
  64K bytes each for instruction and data
  2-way set associative
  Data Cache is ECC protected
  Instruction Cache is parity protected
L2 cache
  Caches instruction and data streams
  16-way set associative, ECC protected
  >2X Athlon XP L2-L1 bandwidth
Improved Translation Look-aside Buffer for large multiprocessor workloads
  Twice the size and lower latencies than AMD Athlon XP
  L2 Translation Look-aside Buffer: 512 entry, 4-way associative
  L1 Translation Look-aside Buffer: 32 entry Instruction & Data, fully associative
Machine check architecture for reporting failures
Diagram labels: L1 Instruction Cache 64KB, 44-entry Load/Store Queue, L2 Cache, L1 Data Cache 64KB, Bus Unit
Slide 57
Reliability Features
L1 Cache
  Data cache is ECC protected via background scrubber
  Instruction cache is parity protected upon R/W
L2 cache
  Cache Tag arrays are ECC protected via background scrubber
  Instructions are parity protected, Data is ECC protected
  ECC bit reused for Branch Prediction and Instruction Decode (end bits)
DRAM is ECC protected with chipkill ECC support
  Each fetch is parity checked
  ECC via scrubber; period is user programmable from 40ns to 84usec.
Remaining arrays are parity protected
  Instruction cache, tags and TLBs
  Data tags and TLBs
  Generally read only data which can be recovered
Machine Check Architecture
  Report failures and predictive failure results
Diagram labels: ECC, Branch Predictor, ThermTrip, Memory scrubbers
Slide 58
Branch Prediction Improvements
Full L1 Cache Coverage
  Twice the selectors of AMD Athlon XP
4K Branch Target Addresses
  Backed up by Branch Address Calculator
  4 cycle correction for unconditional relative branches
16K Bimodal Counters
  Four times AMD Athlon XP
Full Pre-decode and Branch Identification in L2 Cache
  New and unique to AMD Opteron Family of Processors
  Reuses L2 ECC bits on clean/shared instruction lines and one extra bit
Diagram labels: Branch Prediction, Fetch, Scan/Align, Fastpath, Microcode Engine, OPs, Instruction Control Unit (72 entries)
Slide 59
Integrated Northbridge
Slide 60
Firmware View of Northbridge
Performs same functions found in Northbridge
  Memory Controller fully integrated
  Host-Bridge function as defined by the PCI spec
  PCI to PCI Bridge as defined by the PCI spec
  Graphics Address Resolution Table (GART)
  Multi-processor coherency
Controlled via PCI configuration registers
  Memory controller configuration
  HyperTransport technology routing
Configured by Firmware
  HyperTransport initialization via hardware: auto-size, coherent or non-coherent, legacy path to the ROM in Southbridge
  HyperTransport technology speed and routing via firmware
Everything else in firmware follows existing paradigms
  PCI enumeration
  Memory sizing and configuration
  I/O controller setup
Diagram labels: Crossbar, Memory Controller, HyperTransport, System Request Queue
Slide 61
Systems View of Northbridge (Assumes a 2GHz processor clock)
Slide 62
HyperTransport Technology
Screaming I/O for chip-to-chip communication
  High bandwidth
  Point-to-point links
  Split transaction and full duplex
  Differential signaling
  Tunneling capability
HyperTransport Links
  Three 16-bit links (3.2 GB/s per direction)
  Reduced pin count compared to typical bus-based systems
  Compatible with high-volume PC board infrastructure
  Each can be:
    cHT: coherent (Processor-to-Processor) link, or
    ncHT: non-coherent (Processor-to-I/O) link
Enables scalable 2-8 processor cache-coherent MP systems
  Glueless MP
For more info see: http://www.HyperTransport.org/
Slide 63
Performance
Slide 64
Multi-Processor Performance Evaluation
Simulation Parameters
  Microbenchmark
  Simulations: RTL based, cycle accurate
  DRAM page hit
System Parameters
  AMD Opteron 2 GHz CPU
  Memory clock = 333 MHz data rate
  Registered PC2700 DDR memory
  DRAM width = 128 bits interleaved
  CAS latency = 2.5 memory clocks
  HT frequency = 1600 MHz data rate (16 bits)
  DDR peak bandwidth = 5.4 GB/s
  HT peak bandwidth = 3.2 GB/s (each direction)
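
The two peak bandwidth figures follow from bus width times data rate. The sketch below reproduces that arithmetic; the function name is illustrative, and it treats "333 MHz data rate" as 333 million transfers per second, which is an assumption about how the slide rounds the PC2700 rate.

```python
def peak_gb_per_s(width_bits: int, mega_transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s = bytes per transfer x millions of transfers per second / 1000."""
    return width_bits / 8 * mega_transfers_per_s / 1000


# DDR: 128-bit interleaved DRAM interface at a 333 MT/s data rate.
ddr_peak = peak_gb_per_s(128, 333)
# HyperTransport: one 16-bit link at a 1600 MT/s data rate, per direction.
ht_peak_per_direction = peak_gb_per_s(16, 1600)
# Both directions of the same link together.
ht_peak_aggregate = 2 * ht_peak_per_direction

print(f"DDR peak:             {ddr_peak:.2f} GB/s")
print(f"HT peak per direction: {ht_peak_per_direction:.1f} GB/s")
print(f"HT peak aggregate:     {ht_peak_aggregate:.1f} GB/s")
```

The DDR result comes out at about 5.3 GB/s, which the deck quotes both as ">5.3GB/sec." and, rounded up, as 5.4 GB/s. The aggregate HyperTransport figure corresponds to the 6.4 GB/s quoted on the system slides.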
Slide 65
SPECint Performance
[Chart: SPECint 2000 score (400 to 1300) versus operating frequency (1000 to 3000 MHz); series: AMD Opteron processor estimates, Intel Xeon processor]
AMD Opteron estimates are based on 2GHz lab hardware using 32 bit binaries
Intel Xeon source: http://www.spec.org/osg/cpu2000/results/cpu2000.html
Slide 66
SPECfp Performance Comparison
[Chart: SPECfp 2000 score (400 to 1500) versus operating frequency (1000 to 5000 MHz); series: AMD Opteron processor estimates, Intel Xeon processor; points marked A and B, with gaps labeled ~400 MHz and ~1100 MHz]
AMD Opteron estimates are based on 2GHz lab hardware using 32 bit binaries
Intel Xeon source: http://www.spec.org/osg/cpu2000/results/cpu2000.html
Slide 67
SPECfp 2000 Base Competitive Summary (32-bit Windows, PC2700 CAS2.5)
[Chart: SPECfp 2000 scores (0 to 1400) versus CPU frequency (0 to 3 GHz); series: Base (IA32), Peak (IA32), AMD Opteron Processor (Estimated Performance); labels: AMD Opteron, P4 400FSB, P4 533FSB, PIII 133FSB, AMD Opteron Redesign effort]
Source: http://www.spec.org
Slide 68
AMD Opteron SPEC projections compared to Alpha EV7
AMD Opteron should be more cost-effective versus Alpha EV7
  Standards versus Proprietary
  Millions per month versus 100s
Slide 69
AMD Opteron SPEC projections compared to Itanium-2
AMD Opteron will be more cost-effective than Itanium-2
  Standards versus Proprietary
  Millions per month versus 1,000s
Slide 70
Integrated Memory Controller Latency (Local Memory Access, Registered Memory, CAS 2.5)
1.6GHz PC2700
  65ns (L1 cache miss, TLB hit)
  85-95ns (L1 cache miss, TLB miss)
[Chart: Time (ns) versus Block Size (bytes), one curve per Stride (bytes)] Stride >1M 32k< Stride
Slide 74
Sufficiently Uniform Memory Organization (SUMO)
Disadvantages
  3P and 4P nodes work better if the OS is aware of the memory map
  >4P may require a NUMA aware OS if the cache hit rate is low
Advantages
  Software view of memory is SMP
  Latency difference between local & remote memory is a function of the number of processors in the node
    1P and 2P look like an SMP machine
    3P and 4P are NUMA-like but can still be viewed as a ccUMA or asymmetric SMP node
    >4P can be viewed as ccUMA and, depending on cache hit rate, may or may not require a NUMA aware OS
  Physical address space is flat and can be viewed as fully coherent or not (MOESI state)
  DRAM can be contiguous or interleaved
  Additional processor nodes bring true increased memory bandwidth
  Designed for lower overall system chip count (glue-less interface)
Slide 75
Future NUMA Systems: Scaling beyond 8 Processors
Scaling beyond 8P is enabled by an external coherent HyperTransport switch
  Coherent interconnect
  Snoop filter
  Data caching
Up to 16 processors within the same 2^40 SMP memory space
[Diagram: 4P nodes attached to switches SW0 to SW3, joined by an Interconnect Fabric]
Slide 76
AMD Opteron Support ICs
Slide 77
AMD Opteron Support ICs
AMD is committed to delivering the highest quality systems solutions
  Providing a family of x86-64 processors is just the start
  AMD will promote and enable a broad range of HyperTransport support silicon from internal and external design efforts
  AMD, with the HyperTransport consortium, will grow the HyperTransport eco-system
Slide 78
HyperTransport Technology Consortium
Slide 79
AMD-8131 HyperTransport PCI-X Tunnel
Dual PCI-X Master
  Each PCI-X Bridge independently supports:
    66, 100, 133MHz PCI-X protocol
    33 and 66MHz PCI 2.2 protocol
    SHPC Controller
    64-bit data path
    IOAPIC
    Arbiter for up to 5 masters
    Hot-swap
HyperTransport Support
  16/16 up, 8/8 down, independent support for up to 1600MT/s up and down
  Full link auto sizing and speed selection
829 OBGA, 37.5mm body, 1.27mm pitch, full array, 6-layer motherboard breakout
Diagram labels: AMD Opteron or AMD Athlon64; 16x16 HyperTransport @ 1600MT/s; AMD-8131 HyperTransport Dual PCI-X; 8x8 HyperTransport @ 800MT/s; AMD-8111 I/O Hub (FLASH, SIO, LPC, 32 bits @ 33MHz, USB1.0,2.0, AC97, UDMA100, 10/100 Ethernet, 10/100 Phy, 100 BaseT)
Slide 80
AMD-8111 HyperTransport I/O Hub
I/O Hub
  Engineered from past successful AMD I/O hub development efforts
  8x8 wide 200 MHz DDR HyperTransport technology interface (800MB/s aggregate BW)
  Enhanced 10/100 Ethernet MAC
  USB1.1, USB2.0, EDMA, AC97
  LPC for BIOS ROM and Super I/O
  PCI version 2.2 - 33/32 Bridge (legacy)
    Supports arbitration of up to 8 external masters
  SMBus 1.0 and 2.0 controllers
  492 PBGA, 35x35mm body, 1.27mm pitch
Diagram labels: 8x8 HyperTransport @ 800MHz; AMD-8111 I/O Hub (FLASH, SIO, LPC, 32 bits @ 33MHz, NIC, 10/100 BaseT, USB 1.1/2.0, AC97, MII, EIDE)
Slide 81
AMD-8151 HyperTransport AGP Tunnel
8x AGP
  Fully AGP 3.0 compliant
  66, 133, 266, 533MHz operation
HyperTransport Support
  16/16 up, 8/8 down, independent support for up to 1600MT/s up, up to 800MT/s down
  Full link auto sizing and speed selection
564 OBGA, 31x31mm body, 1.27mm pitch, full array
Diagram labels: AMD Opteron or AMD Athlon64; AMD-8151 HyperTransport AGP; 8x AGP; Int Gfx; 8x8 HyperTransport @ 800MT/s; AMD-8111 I/O Hub (FLASH, SIO, LPC, 32 bits @ 33MHz, USB1.0,2.0, AC97, UDMA100, 10/100 Ethernet, 10/100 Phy, 100 BaseT)
Slide 82
Opteron & Athlon Server Chipset Roadmap
[Roadmap chart: timeline 2H02, 2003, 2004, 2005]
  7th Generation: AMD-760MP/MPX
  8th Generation: AMD-8111 HyperTransport I/O Hub; AMD-8151 HyperTransport AGP Tunnel; AMD-8131 HyperTransport PCI-X Tunnel (2 PCI-X Bridges)
  Later entries: HyperTransport Second Generation PCI Device; Second Generation HyperTransport I/O Hub
Slide 83
Desktop Infrastructure Roadmap
Athlon 64 Desktop Chipset Roadmap
Slide 84
A Growing ecosystem of HyperTransport enabled ICs
Available today:
  Dual MIPS processor - Broadcom BCM1250
  PCI 66/64 Bridge from Alliance Semi.
  NITROX Security Macro Processor from Cavium Networks
  FPGA from XILINX and Altera
Announced:
  RM9000 MIPS processor from PMC Sierra
  4 Port 8/8 HyperTransport switch swap support from Alliance Semi.
  SSL/TLS Record Processing Systems Broadcom BC5850
  Luminance Modular Array Technology - Lightspeed Semiconductor
Planned:
  InfiniBand Bridge
  Proprietary High Speed Interconnect
  4 Port 16/16 non-coherent switch
  4 port 16/16 coherent switch
  PCI-X Bridges
Slide 85
HyperTransport technology 4-way 16/16 Non-Coherent Switch
  Extends the fabric by re-mapping Unit_IDs at each port
  Tracks the path of packets that pass through it, guaranteeing the same return path
  Records the incoming Unit_ID so it can be restored in the response packet
  Follows same rules as Processor Host interface
  Peer-to-peer through the switch, freeing up the host
  Facilitates multiple Host fabrics
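
The re-mapping behaviour described on this slide can be pictured as a small translation table inside the switch. The following is a toy sketch under loud assumptions: the class, its method names and the way mapped IDs are allocated are all hypothetical and are not taken from the HyperTransport specification or from any real switch.

```python
class NonCoherentSwitchSketch:
    """Toy model: remap Unit_IDs on the way in, restore them on the way back."""

    def __init__(self) -> None:
        self._next_mapped_id = 1  # hypothetical allocation scheme
        # mapped Unit_ID -> (ingress port, original Unit_ID)
        self._in_flight: dict[int, tuple[int, int]] = {}

    def forward_request(self, ingress_port: int, unit_id: int) -> int:
        """Record where a request came from and return the Unit_ID used downstream."""
        mapped_id = self._next_mapped_id
        self._next_mapped_id += 1
        self._in_flight[mapped_id] = (ingress_port, unit_id)
        return mapped_id

    def route_response(self, mapped_id: int) -> tuple[int, int]:
        """Return (port, original Unit_ID) so the response retraces the request path."""
        return self._in_flight.pop(mapped_id)


switch = NonCoherentSwitchSketch()
# Two devices on different ports happen to use the same Unit_ID.
first = switch.forward_request(ingress_port=2, unit_id=5)
second = switch.forward_request(ingress_port=3, unit_id=5)
print(switch.route_response(second))  # port 3, original Unit_ID 5
print(switch.route_response(first))   # port 2, original Unit_ID 5
```

Because the table is keyed on the re-mapped ID, two devices behind different ports can reuse the same Unit_ID without their responses being confused, which is the property the slide relies on to extend the fabric.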
Slide 92
HyperTransport Technology on the Backplane
  Non-coherent interconnect
  Switches and 8111 on the backplane
  Hot swap connection
Diagram labels: 4P Blade; SI4041 Switch (x3)
Slide 93
Two - 8 Processor System Topologies (NUMA)
Slide 95
AMD Athlon 64 1P Blade Design
Ultra low cost blade design
  4GB 333MHz DRAM
  2GHz processor
  ~35 Watts
Luminance Device
  Boots the processor
  Provides HCA network interface
Diagram labels: AMD Athlon 64; 4GB DRAM; 16x16 HyperTransport @ 1,000MT/s; Luminance Modular Array ASIC Interface Device; HCA Interface; Boot ROM
Slide 96
AMD Opteron Processor DP 2P Graphics Workstation
Slide 97
2P AMD Opteron Processor Graphics Workstation (Cave)
Slide 98
High density SprayCooled Blade Configuration
4P 16G-flop Blade Design
  64GB of SMP DRAM
  ASIC boots the 4P unit
  PCI-X provides all I/O
  Vapor cooled in sealed enclosure
  External VRM
Slide 99
How ISR SprayCool Technology Works
  a. As the electronics are sprayed, the fluid vaporizes, cooling the electronics to a low, stable temperature.
  b. Vapor travels through the heat exchanger to be condensed.
  c. Fluid collects in reservoir.
  d. Fluid is purified by the filtration system.
  e. Fluid is pumped back into the electronics in a continuous cycle.
  f. Sealed enclosure protects electronics from dust, dirt, salt-air.
Slide 100
High Density HPC Cluster
SprayCool Technology from ISR
  16 cards
  16G-flops/card
  256G-flops peak throughput
  64GB of memory per card
  1 Terabyte of sys. memory
  240 cubic inches
  114M-flops/cubic inch
  4.27GB of memory storage per cubic inch
  ~6K watts
  ~3 watts/cubic inch
Diagram labels: 14, 10, 16
Slide 101
AMD Reference Design Kits
Slide 102
Four Hardware platforms
  Solo (AMD): 1P AMD Opteron motherboard for desktop applications
  Serenade (AMD): 2P AMD Opteron system board for HPC and server applications
  Quartet (AMD): 4U-4P AMD Opteron system board for HPC and server applications
  Khepri (Newisys): 1U-2P AMD Opteron server board
Slide 103
Solo Features
  Athlon64 Uni-processor
  Two Unbuffered PC2700, PC2100 DDR DIMMs
  AMD 8151 AGP8X HyperTransport Tunnel
  AMD 8111 I/O Hub
  Four PCI 32b 33MHz slots
  Two ATA-100 EIDE connectors
  Six USB 2.0 ports: 3 on back panel, 2 on front panel, and 1 on ACR
  AC 97 audio
  SMBus 1.0 and 2.0 support
  One ACR slot; 1 Fan with sense and 1 Fan without sense
  Floppy, serial, parallel, 2 PS/2 and 2 IEEE 1394a connectors
  LPC Super I/O with 2 fans with sense
  4-layer ATX form factor with ATX power supply
  PC2001, WHQL, Energy Star, WFM 2.0 compliant
Slide 104
Hammer Performance Desktop (Solo-RDK)
Slide 105
1U/2P Serenade
CPU/Memory Complex
  Opteron processor 200 Series (supports up to 2 processors)
  Four banks of 128bit registered DDR memory/CPU (DDR 200-333)
I/O
  Full size PCI-X slots: Two PCI-X 64/100 MHz or one PCI-X 64/133 (non hot-pluggable)
  One mini-PCI slot
  Dual Broadcom 10/100/1000 Ethernet onboard
  Dual LSI U320 SCSI (one channel to disk, one channel to rear expansion)
  Single USB1.1: to front
  SIO (Floppy, Serial, Keyboard, Mouse)
Management
  Single dedicated management LAN 10/100
  Optional BMC management controller, IPMI 1.5 compliant
Storage
  Dual drive bays: (standard) IDE or (standard or hot-swap) SCSI drives
  Slim-line IDE CD-ROM or slim-line floppy drive
Physicals
  1U rack-mount server form factor, tool-less access, full extension slide rails
  Single 500W power supply, rear accessible to line cord
  Removable blowers, cooling performed front-to-rear (passive CPU heatsinks)
  Front LED panel with activity and status: PWR, RESET, USB, PCI-Video
  Dimensions: (1U) x 19" W x 28" D
Slide 106
1U/2P Serenade Front View
Diagram labels:
  28" depth
  500W Power Supply
  CDROM or Floppy (slimline)
  Drive Carriers (x2) (SCSI hot swappable)
  10 Redundant Blowers (front to back cooling)
  AMD Opteron 200 Series (x2)
  32/33MHz PCI (half-height/half length) (Video option)
  8 DIMMs DDR 266-333 ECC (4 DIMMs/CPU)
  SCSI Disk Option (Mini-PCI)
  Full Size PCI-X Slots (x2) 64/100 MHz or single PCI-X 64/133 (riser w/sideband)
Slide 107
1U/2P Serenade Rear View
Diagram labels:
  Full Size PCI-X Slots (x2) 64/100 MHz or single PCI-X 64/133 module assembly (riser w/sideband)
  AMD Opteron 200 Series (x2) cooling ducts
  Dedicated 10/100 IPMI Management Port
  Dual 10/100/1000 ENET
  32/33MHz PCI (half-height/half length) (std. half-height video option)
  PS2 ports
  U320 SCSI Option (Mini-PCI)
  USB port
Slide 108
Quartet: 4U/4P SledgeHammer MP 940-pin Processor
Slide 109
Quartet System Features
  4U rack-mount server form factor (25" deep), EIA-Std
  4P Opteron (940-pin)
  Four banks of 128bit registered DDR memory per CPU (designed for DDR-333), 16 total
  Five full size PCI-X slots (AMD 8131):
    Two PCI-X 64/133 MHz (hot-pluggable)
    Three PCI-X 64/66 MHz
  Ethernet ports:
    Dual Broadcom 10/100/1000 Ethernet onboard
    Single 10/100 (AMD-8111)
  Dual LSI U320 SCSI (one channel to disk, one channel to rear expansion)
  System Management: Qlogic UL BMC, IPMI 1.5 via dedicated LAN/Modem
Slide 110
Quartet System Features (cont)
  Dual IDE: Slim-line CD-ROM, Slim Floppy
  Dual USB: one front, one rear
  SIO (Floppy, Serial, Keyboard, Mouse)
  Storage: Four 1" hot-swap Ultra320 SCSI drives
  Video: ATI 4 Meg (via card option PCI 32/33)
  Three 500W hot-swap power supplies (2+1 redundancy) for 4U; rear accessible to three line cords
  Hot-swap redundant fans (10)
  Front LED panel with activity and status: PWR, RESET, USB, PCI-Video
  Full extension slide rails
  Dimensions: 5.25" H x 19" W x 28" D (*5.25" is the main/processor section; an additional 1.75" is the power supply bay)
  Cooling front to rear (passive CPU heatsinks)
  Tool-less access
Slide 111
Khepri: Dual Processor Opteron System
  1U 2P Opteron
  16 GigaBytes RAM, max
  Fully Managed
  Linux 32 & 64 bit
  Windows 32 bit 2000 and .Net Server
  Windows 64 bit (when available)
Slide 112
Khepri Block Diagram
Slide 113
Khepri Alpha Internal View
Slide 114
Availability
Solo (AMD Athlon 64)
  Prototypes are available now
  Production planned in Sept. 2003
Serenade (AMD)
  Development platform RDK available now
  Production planned for June 2003
Quartet (AMD)
  RDK available June 2003
  Production planned for Aug. 2003
Khepri (Newisys)
  Development units are available now through AMD Beachhead Program
  Production now
Slide 115
Platform Enablement Program
  Over the past 24 months, AMD has provided technical design support to more than 50 companies.
  To date, Newisys has enabled over 17 vendors with their Khepri 2P platform reference design.
  By launch (April 2003) there will be 4+ announcements of 4P HPC servers based on AMD Opteron.
  By Nov. 2003 there will be many more vendors with 4P and up to four vendors with 8P SMP/NUMA AMD Opteron platforms.
  With the availability of a HyperTransport coherent switch, the NUMA server can grow to 32P and beyond.
Slide 116
2002-2003 AMD Server Roadmap
[Roadmap chart: quarters 4Q02, 1Q03, 2Q03, 3Q03, 4Q03 against segments Enterprise Scalable, Basic +, Basic, Value +, Value and Ultra-Value (DP/MP Systems). Entries are processor codes with clock in GHz: SH MP 1.4 to SH MP 2.6, SH DP 1.4 to SH DP 2.6 (some with a second number: 1.4/2600, 1.6/3000, 2.4/4200, 2.6/4500), THR 1.67/2000+ to THR 2.13/2600+, and BAR 2.2/2800+.]
Legend:
  SH DP = AMD Opteron processor SledgeHammer DP
  SH MP = AMD Opteron processor SledgeHammer MP
  BAR = AMD Athlon MP processor Barton (266MHz FSB)
  THR = AMD Athlon MP processor Thoroughbred (266MHz FSB)
Slide 117
Summary
Slide 118
AMD Opteron Processor
Optimized for high performance operation
  Chip infrastructure optimized for sub-micron process, impacting power distribution, clocking, circuit design and layout
  20-25% better performance per clock than AMD Athlon XP
    Smart low-latency memory controller
    Branch prediction, cache and TLB improvements
    Advanced clock distribution methods
  New operand/address sizes, rather than new instructions
Integrated DDR Memory System Controller
  Closing the gap between external memory access and CPU speed
  Lower latency than the current state of the art (AMD Athlon processor)
  Greater bandwidth than the current state of the art (AMD Athlon system)
Integrated Coherent HyperTransport I/O supporting
  High speed peripheral connections - >6.4GB/s throughput
  Coherent HyperTransport technology to support glueless MP interface
Slide 119
Slide 120
Trademark Attribution
Copyright 2002 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow Logo, AMD Athlon, AMD Opteron, 3DNow! and combinations thereof are trademarks of Advanced Micro Devices, Inc. HyperTransport is a licensed trademark of the HyperTransport Consortium. MMX is a trademark of Intel Corporation. Other product names used in this presentation are for identification purposes only and may be trademarks of their respective companies.