CSCE 212 Introduction to Computer Architecture
Instructor: Jason D. Bakos
What is Computer Architecture?
• The design of computer systems, to…
  – To improve “performance”
    • Run programs faster
    • Use less power, last longer on battery power
    • Generate less heat, or distribute it more uniformly
    • Improve video, 3D rendering, encoding, or decoding frame rate
    • Handle more secure encryption standards with reasonable latency
    • Achieve routing or network intrusion detection at higher line speeds
    • Be more scalable
    • Be less expensive (e.g., through higher integration)
  – Can be achieved via:
    • Software (better OS, more optimized application code)
    or
    • Hardware (processor)
• Designing any complex system requires abstraction
CSCE 212 2
Abstraction
• Abstraction is used to manage the complexity of a design
  – Hide details that are not important
Layer                 Examples
Application Software  Programs
Compiler
Operating Systems     Device Drivers
Architecture          Instructions, Registers
Micro-architecture    Datapaths, Controllers
Logic                 Adders, Memories
Digital circuits      AND gates, NOT gates
Analog circuits       Amplifiers, Filters
Devices               Transistors, Diodes
Physics               Electrons

Course numbers labeled alongside the layers: 145/146/240/245, 311, 212, 211, 211/611, ELCT 371, 330
Domains and Levels of Modeling
Figure: “Y-chart” from Gajski & Kuhn
Domains: Functional, Structural, Geometric
Axis: high level of abstraction to low level of abstraction
Domains and Levels of Modeling
Figure: “Y-chart” from Gajski & Kuhn (domains: Functional, Structural, Geometric)
Functional domain levels: Algorithm (behavioral), Register-Transfer Language, Boolean Equation, Differential Equation
Domains and Levels of Modeling
Figure: “Y-chart” from Gajski & Kuhn (domains: Functional, Structural, Geometric)
Structural domain levels: Processor-Memory-Switch, Register-Transfer, Gate, Transistor
Domains and Levels of Modeling
Figure: “Y-chart” from Gajski & Kuhn (domains: Functional, Structural, Geometric)
Geometric domain levels: Polygons, Sticks, Standard Cells, Floor Plan
Structure
MIPS Microarchitecture
RTL (datapath): fetch instruction
1. Address <= PC
2. MemRead
3. PC <= PC + 1
4. IR <= MemData

Control: fetch instruction
1. IorD = 0
2. MemRead = 1
3. PCEn = 1
   ALUSrcA = 0
   ALUSrcB = 01
   ALUOp = ADD
   PCSource = 01
4. IRWrite = 1
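The four RTL fetch steps can be traced with a minimal Python sketch. The `fetch` function, the `state` dictionary, and the memory contents are hypothetical names and values chosen for illustration; they are not part of the slide's design, and the control signals are not modeled.

```python
def fetch(state, memory):
    """Apply the four RTL fetch steps to a state dict holding PC and IR."""
    address = state["PC"]          # 1. Address <= PC
    mem_data = memory[address]     # 2. MemRead
    state["PC"] = state["PC"] + 1  # 3. PC <= PC + 1
    state["IR"] = mem_data         # 4. IR <= MemData
    return state

# Hypothetical instruction words, used only to have something to fetch.
memory = [0x8C010000, 0x00221820, 0xAC030004]
state = {"PC": 0, "IR": None}

fetch(state, memory)
print(state["PC"], hex(state["IR"]))
```

After one call, PC has advanced by one word and IR holds the word that was at the old PC, which is the behavior the RTL list describes.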
Structure
Logic Synthesis
• Behavior:
  – C = A + B
  – Assume A is 2 bits, B is 2 bits, C is 3 bits
A B C
00 (0) 00 (0) 000 (0)
00 (0) 01 (1) 001 (1)
00 (0) 10 (2) 010 (2)
00 (0) 11 (3) 011 (3)
01 (1) 00 (0) 001 (1)
01 (1) 01 (1) 010 (2)
01 (1) 10 (2) 011 (3)
01 (1) 11 (3) 100 (4)
10 (2) 00 (0) 010 (2)
10 (2) 01 (1) 011 (3)
10 (2) 10 (2) 100 (4)
10 (2) 11 (3) 101 (5)
11 (3) 00 (0) 011 (3)
11 (3) 01 (1) 100 (4)
11 (3) 10 (2) 101 (5)
11 (3) 11 (3) 110 (6)
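Sum of minterms for $C_2$ read from the truth table, then simplified:

\[
\begin{aligned}
C_2 &= \overline{A_1}A_0B_1B_0 + A_1\overline{A_0}B_1\overline{B_0} + A_1\overline{A_0}B_1B_0 + A_1A_0\overline{B_1}B_0 + A_1A_0B_1\overline{B_0} + A_1A_0B_1B_0 \\
    &= A_1B_1 + A_1A_0B_0 + A_0B_1B_0
\end{aligned}
\]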
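The truth table for the 2-bit adder can be regenerated and checked against per-bit Boolean expressions with a short Python sketch. The expressions used here (`C0 = A0 xor B0`, `C1 = A1 xor B1 xor (A0 and B0)`, and a sum-of-products form for `C2`) are assumptions derived from the table for this example, and the function names are hypothetical.

```python
def adder_row(a, b):
    """Return the 3-bit sum C = A + B for 2-bit inputs A and B."""
    return (a + b) & 0b111

def c2_sop(a1, a0, b1, b0):
    """Sum-of-products form for the top output bit C2 (assumed, derived from the table)."""
    return (a1 & b1) | (a1 & a0 & b0) | (a0 & b1 & b0)

mismatches = 0
for a in range(4):
    for b in range(4):
        c = adder_row(a, b)
        a1, a0 = a >> 1, a & 1
        b1, b0 = b >> 1, b & 1
        c0 = a0 ^ b0                   # low bit: XOR of the low input bits
        c1 = a1 ^ b1 ^ (a0 & b0)       # middle bit: XOR with the carry from bit 0
        c2 = c2_sop(a1, a0, b1, b0)    # top bit: carry out of bit 1
        if ((c2 << 2) | (c1 << 1) | c0) != c:
            mismatches += 1
        print(f"{a:02b} ({a})  {b:02b} ({b})  {c:03b} ({c})")

print("mismatches:", mismatches)
```

Checking all 16 input combinations exhaustively is practical here because the input space is tiny; a synthesis tool performs the same behavior-to-equation mapping automatically for larger designs.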
Logic Gates
Figure: inv, NAND2, NAND3, and NOR2 gates (inputs A, B; output Y)
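The logical behavior of the four gates named on this slide can be written as a minimal Python sketch. The function names mirror the gate labels; the third NAND3 input name `c` is an assumption, since only A, B, and Y labels appear in the figure.

```python
def inv(a):
    """Inverter: Y = NOT A."""
    return 1 - a

def nand2(a, b):
    """2-input NAND: Y = NOT (A AND B)."""
    return 1 - (a & b)

def nand3(a, b, c):
    """3-input NAND: Y = NOT (A AND B AND C)."""
    return 1 - (a & b & c)

def nor2(a, b):
    """2-input NOR: Y = NOT (A OR B)."""
    return 1 - (a | b)

# Print the NAND2 and NOR2 truth tables side by side.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand2(a, b), nor2(a, b))
```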
Latches
Positive edge-sensitive latch
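A behavioral sketch of a positive edge-sensitive storage element, assuming it samples its input D only when the clock goes from 0 to 1 and holds its output otherwise. The class and method names are hypothetical, and this models behavior only, not the circuit on the slide.

```python
class PosEdgeElement:
    """Stores D on a rising clock edge and holds Q at all other times."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def step(self, clk, d):
        # A rising edge is a 0 -> 1 transition of the clock input.
        if self._prev_clk == 0 and clk == 1:
            self.q = d
        self._prev_clk = clk
        return self.q

element = PosEdgeElement()
# (clk, d) pairs applied in sequence.
inputs = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
outputs = [element.step(clk, d) for clk, d in inputs]
print(outputs)
```

Changes on D while the clock stays high (third input pair) do not reach Q; only the next rising edge does.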
Elements
Semiconductors
• Silicon is a group IV element (4 valence electrons, shells: 2, 8, 18, 32…)
  – Forms covalent bonds with four neighbor atoms (3D cubic crystal lattice)
  – Si is a poor conductor, but its conduction characteristics may be altered
  – Add impurities/dopants (each replaces a silicon atom in the lattice):
    • Makes a better conductor
    • Group V element (phosphorus/arsenic) => 5 valence electrons
      – Leaves an electron free => n-type semiconductor (electrons, negative carriers)
    • Group III element (boron) => 3 valence electrons
      – Borrows an electron from neighbor => p-type semiconductor (holes, positive carriers)
Figure: P-N junction under forward bias and reverse bias
MOSFETs
Figure: NMOS/NFET and PMOS/PFET structures, with current flowing through the channel
• NMOS/NFET: body/bulk at GROUND; conducts with a positive gate voltage (Vdd)
• PMOS/PFET: body/bulk HIGH; conducts with a negative gate voltage (rel. to body) (GND)
• S/D to body is reverse-biased
• Channel: shorter length (dist. for electrons), faster transistor
• Metal-poly-Oxide-Semiconductor structures built onto substrate
  – Diffusion: Inject dopants into substrate
  – Oxidation: Form layer of SiO2 (glass)
  – Deposition and etching: Add aluminum/copper wires
IC Fabrication
• Chips are fabricated using a set of masks
  – Photolithography
• Basic steps
  – oxidize
  – apply photoresist
  – remove photoresist with mask
  – HF acid eats oxide but not photoresist
  – piranha acid eats photoresist
  – ion implantation (diffusion, wells)
  – vapor deposition (poly)
  – plasma etching (metal)
Layout
3-input NAND
Cell Library (Snap Together)
Layout
Layout
Synthesized and P&R’ed MIPS Architecture
IC Fabrication
8” Wafer
• 8 inch (200 mm) wafer containing Pentium 4 processors
  – 165 dies, die area = 250 mm², 55 million transistors, 0.18 µm
Another 8” Wafer
Feature Size
• Shrink minimum feature size…
  – Smaller L decreases carrier transit time and increases current
  – Therefore, W may also be reduced for fixed current
  – Cg, Cs, and Cd are reduced
  – Transistor switches faster (~linear relationship)
Minimum Feature Size
Year  Processor       Speed            Transistors    Process
1982  i286            6-25 MHz         ~134,000       1.5 µm
1986  i386            16-40 MHz        ~270,000       1 µm
1989  i486            16-133 MHz       ~1 million     0.8 µm
1993  Pentium         60-300 MHz       ~3 million     0.6 µm
1995  Pentium Pro     150-200 MHz      ~4 million     0.5 µm
1997  Pentium II      233-450 MHz      ~5 million     0.35 µm
1999  Pentium III     450-1400 MHz     ~10 million    0.25 µm
2000  Pentium 4       1.3-3.8 GHz      ~50 million    0.18 µm
2005  Pentium D       2 cores/package  ~200 million   0.09 µm
2006  Core 2          2 cores/die      ~300 million   0.065 µm
2008  Core i7         4 cores/die      ~800 million   0.040 µm
2010  “Sandy Bridge”  8 cores/die      ??             0.032 µm
Clock Speed
• Clock speed is affected by:
  – Fabrication technology
  – Architecture: how much work is performed in a single cycle
• Execution time =
  – instructions per program * cycles per instruction * seconds per cycle
• Now we must add to the product:
  – (number of program threads / number of processor cores)
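The execution-time product can be worked through with a small Python sketch. The function name, parameter names, and the example numbers are hypothetical; the threads/cores factor is applied literally as the slide states it, which is a simplification that ignores scheduling and shared-resource effects.

```python
def execution_time(instructions, cpi, clock_hz, threads=1, cores=1):
    """Seconds = instructions * cycles per instruction * seconds per cycle,
    scaled by (program threads / processor cores)."""
    single = instructions * cpi / clock_hz   # seconds per cycle = 1 / clock_hz
    return single * (threads / cores)

# Hypothetical workload: 1e9 instructions, CPI of 2, 1 GHz clock.
t_single = execution_time(1e9, 2, 1e9)
# Same per-thread workload with 4 threads on 2 cores.
t_multi = execution_time(1e9, 2, 1e9, threads=4, cores=2)
print(t_single, t_multi)
```

Lowering any one factor (fewer instructions, lower CPI, or a shorter cycle) lowers the product, which is why both fabrication technology and architecture affect performance.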
Integration Density
Core 2 Duo (2007) has ~300M transistors
Integration Density
Microprocessor Technology
• Advances in fabrication (lithography, photoresist, metal layers)
• …faster transistor switching (faster processor)
• …smaller transistors/wires
• …higher integration density
• …more “real estate”
• …architectural improvements!
Microarchitectural Parallelism
• Parallelism => perform multiple operations simultaneously
  – Instruction-level parallelism
    • Execute multiple instructions at the same time
    • Multiple issue
    • Out-of-order execution
    • Speculation
    • Branch prediction
  – Thread-level parallelism (hyper-threading)
    • Execute multiple threads at the same time on one CPU
    • Threads share memory space and pool of functional units
  – Chip multiprocessing
    • Execute multiple processes/threads at the same time on multiple CPUs
    • Cores are symmetrical and completely independent but share a common level-2 cache