CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20%...

32
CS425 Computer Systems Architecture Fall 2019 Introduction CS425 - Vassilis Papaefstathiou 1

Transcript of CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20%...

Page 1: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

CS425Computer Systems Architecture

Fall 2019

Introduction

CS425 - Vassilis Papaefstathiou 1

Page 2: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Outline

• Logistics

• CPU Evolution

• Course goal (what is Computer Architecture?)

CS425 - Vassilis Papaefstathiou 2

Page 3: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Course Information

• Elective course in Hardware and Computer Systems (E4)• 6 ECTS

• Prerequisite: CS225 Computer Organization

• Instructors:• Dr. Vassilis Papaefstathiou ([email protected])

• Prof. Manolis Katevenis ([email protected])

• Teaching Assistants:• Mr. Sotiris Totomis ([email protected])

• Mr. Iasonas Mastorakis ([email protected])

• Lectures:• Monday 16:15 – 18:00 (H.206)

• Wednesday 16:15 – 18:00 (H.206)

• Friday 16:15 – 18:00 (H.206) backup slot when needed

CS425 - Vassilis Papaefstathiou 3

Page 4: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Grading

• Homeworks & Programming Assignments: 35%• Mandatory

• Average Grade > 4.5

• Midterm Exam: 20% (mandatory)

• Final Exam: 45% (grade > 4.5)

CS425 - Vassilis Papaefstathiou 4

Page 5: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Course Material

• Website:• http://www.csd.uoc.gr/~hy425

• Mailing List: • [email protected] (subscribe with majordomo)

• Textbook:• Hennessy and Patterson, Computer Architecture,

A Quantitative Approach. 3rd Edition Available in Greek (Tziolas publishers, translation by D. Pnevmatikatos, D. Serpanos and G. Stamoulis). ISBN 97896041807693

CS425 - Vassilis Papaefstathiou 5

Page 6: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Tentative Schedule

• Fundamentals, metrics, pipelining (1.5 weeks)

• Instruction Level Parallelism (2 weeks)

• Branch prediction (1 week)

• Multiple issue, VLIW, vector, multithreading (2 weeks)

• Memory hierarchy, caches and optimizations (2.5 weeks)

• Multicore processors, cache coherence (2 weeks)

• Main memory technologies (1 week)

• Advanced topics (1 week)

CS425 - Vassilis Papaefstathiou 6

Page 7: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

History in Computer Devices

• EDSAC, University of Cambridge, UK, 1949-1958 (mercury-based memory, logic, punched tape, teleprinter, EDSAC2 1965)

CS425 - Vassilis Papaefstathiou 7

Page 8: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Computing Systems Today

• The world is a large parallel system• Microprocessors everywhere

• Vast infrastructure behind them

CS425 - Vassilis Papaefstathiou 8

’70: microproc. & supercomputers

‘80: compilers, OS, RISC, x86

‘90: Internet, WWW, PDA

‘00: mobile, cell phones,

embedded cpus

‘10: internet of things (IoT)

MEMS for

Sensor Nets

Internet

Connectivity

Clusters

Massive Cluster

Gigabit Ethernet

RobotsRouters

Cars

Sensor

Nets

Refr

igera

tors

Page 9: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Improvement in Computer

• Radical progress in computers due to:• Technological improvements (next few slides)

o steady

• Better computer architectures (course focus)o less consistent

CS425 - Vassilis Papaefstathiou 9

Page 10: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Source Drain

Technology: Transistor Revolution

CS425 - Vassilis Papaefstathiou 10

Intel 4004, 1971(Moore, Noyce Intel 1968)

4-bit2,300 transistors740KHz operation10μm (=10000nm) PMOS technology

Bell Labs, 1948First Transistor

Intel Core i7, 201164-bit 2,600,000,000 transistors3.4GHz32nm

Page 11: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Technology: Moore’s Law

• In 1965, Gordon Moore predicted that the number of transistors that can be integrated on a die would double every 18 months (i.e., grow exponentially with time)

• He made a prediction that semiconductor technology will double its effectiveness every 18 months

• In practice a new technology is introduced every ~two years, with feature sizes of circuit layout 70% of the previous technology

CS425 - Vassilis Papaefstathiou 11

Page 12: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Technology: Transistor Count

CS425 - Vassilis Papaefstathiou 12

4004:

2.3K transistors

10-core Xeon:

2.6B transistors

28-core Xeon Platinum: 8 B

Apple A13: 8.5 B

Nvidia Volta V100 GPU: 21 B

32-core AMD Epyc: 19 B

64-core AMD Epyc Rome: 32 B

Page 13: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Technology constantly on the move

• Number of transistors is not the limiting factor⎻ Currently ~5+ billion transistors/chip ⎻ Problems: power, heat, latency

• 3-dimensional chip technology?⎻ Sandwiches of silicon (Package on Package)⎻ “Through-silicon Vias” TSVs for communication ⎻ FinFET

• On-chip optical connections?⎻ Power savings for large packets

• Intel Core i7 (“Ivy Bridge”)⎻ 4 cores + GPU⎻ 22 nm, tri-gate (“3D”) transistors⎻ 1.4B Transistors ⎻ Shared L3 Cache - 8MB ⎻ L2 Cache - 1MB (256K x 4) , L1 – 64KB/core

CS425 - Vassilis Papaefstathiou 13

Page 14: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Technology vs microarchitecture: Intel CPU evolution

• Kaby Lake optimization step (14nm)

• Intel announced March 2016: Process-Architecture-Optimization

• CannonLake upcoming 10 nm successor (process step)

• Ice Lake 10 nm (architecture step)CS425 - Vassilis Papaefstathiou 14

tick tock tick tock tick(2006) tock tick (2008) tock tick(2010)

tock tick(2012) tock tick(2014) tock opt(2016)10nm

process(2017)

Page 15: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Transistor size trends and questions

• Feature sizes, higher performance?⎻ Transistor size went down from 10 micros to 14 nanometers

⎻ Quadratic increase in density, linear drop in feature size

⎻ Linear increase in transistor performance

• Where is the catch?⎻ Smaller voltage reduction to maintain safe operation

⎻ Higher resistance and capacitance per unit of length

⎻ Shorter wires but with higher resistance/capacitance

⎻ Wire delays improving poorly compared to transistors

CS425 - Vassilis Papaefstathiou 15

Page 16: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Limiting Force: Power Density

Microprocessor Size ≈ 2 cm2

CS425 - Vassilis Papaefstathiou 16

Page 17: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Crossroads: Uniprocessor Performance

CS425 - Vassilis Papaefstathiou 17

Constrained by power, instruction level parallelism, memory latency

Move to multicore

Page 18: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Trends – All in one

CS425 - Vassilis Papaefstathiou 18

Page 19: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Trends – All in one

CS425 - Vassilis Papaefstathiou 19

Page 20: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

The End of the Uniprocessor Era

• Power wall: power expensive, transistors free ⎻ can put more on chip than can afford to turn on

• ILP wall: law of diminishing returns on more HW for ILP

• Memory wall: Memory slow, multiplies fast ⎻ 200 clock cycles to DRAM memory vs. 4 clocks for multiply

• Power Wall + ILP Wall + Memory Wall = Brick Wall⎻ Uniprocessor performance now 2X every 5(?) years

Single biggest change in the history of computing systems

CS425 - Vassilis Papaefstathiou 20

Page 21: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Many Core Chips: The future is here

• “Many Core” refers to many processors/chip⎻ 64? 128? Hard to say exact

boundary

• How to program these?⎻ Use 2 CPUs for video/audio

⎻ Use 1 for word processor, 1 for browser

⎻ 76 for virus checking???

• Something new is clearly needed here…

CS425 - Vassilis Papaefstathiou 21

Intel 80-core multicore chip, 2007, 65nm – 100M transistors

Intel Many Integrated Core Architecture (MIC), 50-cores, 2012, 22nm, commercial

Intel Single-Chip Cloud Computer (SCC), 48-cores, 2010, 4 memory controllers, 24-router mesh

Page 22: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

What is Computer Architecture

• In its broadest definition, computer architecture is the design of theabstraction layers that allow us to implement information processing applications efficiently using available manufacturing technologies.

CS425 - Vassilis Papaefstathiou 22

Application

Physics

Gap too large to

bridge in one step

Page 23: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Abstraction Layers in Modern Systems

Algorithm

Gates/Register-Transfer Level (RTL)

Application

Instruction Set Architecture (ISA)

Operating System/Virtual Machine

Microarchitecture

Devices

Programming Language

Circuits

Physics

Original domain

of the computer

architect

(‘50s-’80s)

Domain of recent

computer

architecture (‘90s)

Reliability,

power, …

Parallel computing,

security, …

Reinvigoration of

computer architecture,

mid-2000s onward.

CS425 - Vassilis Papaefstathiou 23

Page 24: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Computer Architecture is an Integrated Approach

• What really matters is the functioning of the complete system ⎻ hardware, runtime system, compiler, operating system, and application

⎻ In networking, this is called the “End to End argument”

• Computer architecture is not just about transistors, individual instructions, or particular implementations⎻ E.g., Original RISC projects replaced complex instructions with a compiler +

simple instructions

• It is very important to think across all hardware/software boundaries⎻ New technology New Capabilities New Architectures New Tradeoffs

⎻ Delicate balance between backward compatibility and efficiency

CS425 - Vassilis Papaefstathiou 24

Page 25: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Defining Computer Architecture (ISA)

Instruction Set Architecture

• ISAs converged to a common RISC paradigm⎻ CISC ISAs implemented on RISC pipelines

• Load-store architectures, general-purpose registers

• Aligned memory addressing, simple addresing modes

• Byte, word, double-word, quad-word operands

• Arithmetic, logic, control operations

• Fixed-length encoding

CS425 - Vassilis Papaefstathiou 25

Page 26: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Example: MIPS R30000r0

r1°°°r31

PC

Programmable storage

232 x bytes

31 x 32-bit GPRs (R0=0)

32 x 32-bit FP regs (paired DP)

PC

Data types ?

Format ?

Addressing Modes?

Arithmetic logical Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU, AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUISLL, SRL, SRA, SLLV, SRLV, SRAVMUL, DIV

Memory AccessLB, LBU, LH, LHU, LW, LWL,LWRSB, SH, SW, SWL, SWR

ControlJ, JAL, JR, JALRBEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL

32-bit instructions on word boundary

CS425 - Vassilis Papaefstathiou 26

Page 27: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

ISA vs Computer Architecture

• Old definition of computer architecture == instruction set design ⎻ Other aspects of computer design called implementation

⎻ Suggests that implementation is uninteresting or less challenging

• Computer architecture >> ISA

• Architect’s job much more than instruction set design; technical hurdles today more challenging than those in instruction set design

CS425 - Vassilis Papaefstathiou 27

Page 28: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Defining Computer Architecture

Architecture = ISA (+prog. lang.) + Organization + Hardware

• Processor Architecture⎻ Pipelining, hazards, ILP, HW/SW interface

• Memory hierarchies

• Interconnects

• I/O systems

• Hardware technology used (e.g. component size)

• Computer architecture focuses on organization and quantitative principles of design

CS425 - Vassilis Papaefstathiou 28

Page 29: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Execution is not just about HW and ISA• The VAX fallacy

⎻ Produce one instruction for every high-level concept

⎻ Absurdity: Polynomial Multiply

o Single hardware instruction

o But Why? Is this really faster???

• RISC Philosophy⎻ Full System Design

⎻ Hardware mechanisms viewed in context of complete system

⎻ Cross-boundary optimization

• Modern programmer does not see assembly language⎻ Many do not even see “low-level”

languages like “C”.

CS425 - Vassilis Papaefstathiou 29

Hardware

Application Binary

Library Services

OS Services

Hypervisor

Linker

Program

Libraries

Source-to-SourceTransformations

Compiler

Page 30: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Computer Architecture Topics

Instruction Set Architecture

Pipelining, Hazard Resolution,Superscalar, Reordering, Prediction, Speculation,Vector, Dynamic Compilation

Addressing,Protection,Exception Handling

L1 Cache

L2 Cache

DRAM

Disks,Tape

Coherence,Bandwidth,Latency

Emerging TechnologiesInterleavingBus protocols

RAID

VLSI

Input/Output and Storage

Memory

Hierarchy

Pipelining and Instruction

Level Parallelism

Network

Communication

Oth

er

Pro

cessors

CS425 - Vassilis Papaefstathiou 30

Page 31: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Executive Summary

The processor

you built in

HY225

What you’ll

understand

after taking

HY425

Also, the

technology behind

multi-core

processors

CS425 - Vassilis Papaefstathiou 31

Page 32: CS425 Computer Systems Architecturehy425/2019f/lectures/HY425_L1_Intro.pdf · •Midterm Exam: 20% ... •Architect’s job much more than instruction set design; technical hurdles

Next Lecture: Major Design Challenges

• Power

• CPU time

• Memory latency/bandwidth

• Storage latency/bandwidth

• Transactions per second

• Intercommunication

• Dependability

CS425 - Vassilis Papaefstathiou 32

Everything Looks a Little Different