1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107...
-
Upload
mattie-blount -
Category
Documents
-
view
222 -
download
3
Transcript of 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107...
![Page 1: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/1.jpg)
1ECE369
ECE369A: Fundamentals of Computer Architecture
ECE 369A MWF 10:00 AM - 10:50 AM in ECE107
Instructor Teaching Assistant
Name: Ali Akoglu Shuai Chang
Office: ECE 356-B ECE232
Phone: (520) 626-5149
Email: [email protected] [email protected]
Office Hours:
Mondays, Wednesdays 13:00 – 14:00 or by appointment
![Page 2: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/2.jpg)
2ECE369
General Policies, Tips
• D. A. Patterson and J.L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Fifth Edition, Morgan Kaufmann Publishers.
• Prerequisite: ECE274 & Programming in C & Programming in Veriolg/VHDL
• Course involves class (3 credits) and laboratory (1 credit) activities.
• Class activities will involve 3 to 4 assignments, 4 examinations, and a semester long project.
• Laboratory activities will accompany the topics covered in the lectures with hands on programming based assignments. A coordinated sequence of lab assignments will help students design, program, debug, and evaluate specific components of a datapath using the Xilinx ISE and a FPGA board.
• Laboratory activities (programming and debugging) and assignments (15) will help students build the foundation for the semester long datapath-design based project.
• Final project will require students implement a synthesizable datapath based on a given instruction set architecture and evaluate its performance on a specific FPGA platform.
![Page 3: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/3.jpg)
3ECE369
General Policies, Tips
• NO LATE ASSIGNMENTS• Make-ups may be arranged prior to the scheduled activity. • Inquiries about graded material => within 3 days of receiving a
grade. • You are encouraged to discuss the assignment specifications
with your instructor, your teaching assistant, and your fellow students. However, anything you submit for grading must be unique and should NOT be a duplicate of another source.
• Read before the class• Participate and ask questions• Manage your time• Start working on assignments early
![Page 4: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/4.jpg)
4ECE369
Grading
Distribution of Components Grades Scale
Component Percentage Percentage Grade
Class Exercises 10 90-100% A
Laboratory 15 80-89% B
Exam-I 8 70-79% C
Exam-II 12 60-69% D
Exam-III 14
Exam-IV 16
Project 25
![Page 5: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/5.jpg)
5ECE369
Labs and Project
• Laboratory: – Labs1-17: 2000pts
• Project– Labs18-28 (Single cycle datapath): 2500pts– Competition: !!!!
![Page 6: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/6.jpg)
6ECE369
Announcements
• ECE232 LAB (Starts Jan 22)
– TA
• Assignments & Project
– Pairs only !!!
– Who is my partner? (Jan 27)
• Assignment-0 Due Jan 22!
• Lecture notes on the web
– http://ece.arizona.edu/~ece369/
![Page 7: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/7.jpg)
7ECE369
Objective
• Understand “how computer works”
• Appreciate the design tradeoffs
![Page 8: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/8.jpg)
8ECE369
Chapter 1
![Page 9: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/9.jpg)
9ECE369
Introduction
• This course is all about how computers work
• But what do we mean by a computer?
– Different types: desktop, servers, embedded devices
– Different uses: automobiles, graphics, finance, genomics…
– Different manufacturers: Intel, Apple, IBM, Microsoft, Sun…
– Different underlying technologies and different costs!
![Page 10: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/10.jpg)
10ECE369
First Microprocessor Intel 4004, 1971
• 4-bit accumulator architecture
• 8m pMOS
• 2,300 transistors
• 3 x 4 mm2
• 750kHz clock
• 8-16 cycles/inst.
![Page 11: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/11.jpg)
11ECE369 11[ Personal Computing Ad, 11/81]
Hardware•Team from IBM building PC prototypes in 1979•Motorola 68000 chosen initially, but 68000 was late•8088 is 8-bit bus version of 8086 => allows cheaper system•Estimated sales of 250,000•100,000,000s sold
![Page 12: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/12.jpg)
12ECE369
• Carried in two tractor trailers, 12 tons + 8 tons
• Built for US Army Signal Corps
DYSEAC, first mobile computer!
![Page 13: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/13.jpg)
13ECE369 13
Intel cancelled high performance uniprocessor,
joined IBM and Sun for multiple processors
Intel cancelled high performance uniprocessor,
joined IBM and Sun for multiple processorsHardware based
Hardware
and softw
are, ILP!
End of Uniprocessors
![Page 14: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/14.jpg)
14ECE369 14
Processor Technology Trends
• Shrinking of transistor sizes: 250nm (1997) 130nm (2002) 65nm (2007) 32nm (2010) 28nm(2011, AMD GPU, Xilinx FPGA) 22nm(2011, Intel Ivy Bridge, die shrink of the Sandy Bridge architecture)
• Transistor density increases by 35% per year and die size increases by 10-20% per year… more cores!
![Page 15: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/15.jpg)
15ECE369 15
Trends: Historical Perspective
![Page 16: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/16.jpg)
16ECE369 16
Power Consumption Trends
• Dyn power activity x capacitance x voltage2 x frequency
• Capacitance per transistor and voltage are decreasing, but number of transistors is increasing at a faster rate; hence clock frequency must be kept steady
• Leakage power is also rising
• Power consumption is already between 100-150W in high-performance processors today
3.3GHz Intel core i7: 130 watts
![Page 17: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/17.jpg)
17ECE369
Recent Microprocessor Trends
2004 2010
Source: Micron University Symp.
Transistors: 1.43x / year
Cores: 1.2 - 1.4x
Performance: 1.15x
Frequency: 1.05x
Power: 1.04x
![Page 18: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/18.jpg)
18ECE369
What you will learn
• How programs are translated into the machine language
– And how the hardware executes them
• Hardware and Software Interface
• Both Hardware and Software affect performance:
– Algorithm determines number of source-level statements
– Language/Compiler/Architecture determine machine instructions
– Processor/Memory determine how fast instructions are executed
• What is parallel processing
![Page 19: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/19.jpg)
19ECE369
Below Your Program
• Application software– Written in high-level language
• System software– Compiler: translates HLL code to machine code– Operating System: service code
• Handling input/output
• Managing memory and storage
• Scheduling tasks & sharing resources• Hardware
– Processor, memory, I/O controllers
![Page 20: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/20.jpg)
20ECE369
Abstraction
• Delving into the depths reveals more information
• An abstraction omits unneeded detail, helps us cope with complexity
![Page 21: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/21.jpg)
21ECE369
Instruction Set Architecture
• A very important abstraction
– interface between hardware and low-level software
– standardizes instructions, machine language bit patterns, etc.
– advantage: different implementations of the same architecture
– disadvantage: sometimes prevents using new innovations
• Modern instruction set architectures:
– IA-32, PowerPC, MIPS, SPARC, ARM, and others
![Page 22: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/22.jpg)
22ECE369
Sneak Peak (Lecture 18)
![Page 23: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/23.jpg)
23ECE369
• Measure, Report, and Summarize
• Make intelligent choices
• See through the marketing hype
• Key to understanding underlying organizational motivation
Why is some hardware better than others for different programs?
What factors of system performance are hardware related?(e.g., Do we need a new machine, or a new operating system?)
How does the machine's instruction set affect performance?
Performance
![Page 24: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/24.jpg)
24ECE369
Defining Performance
• Which airplane has the best performance?
0 100 200 300 400 500
DouglasDC-8-50
BAC/ SudConcorde
Boeing 747
Boeing 777
Passenger Capacity
0 2000 4000 6000 8000 10000
Douglas DC-8-50
BAC/ SudConcorde
Boeing 747
Boeing 777
Cruising Range (miles)
0 500 1000 1500
DouglasDC-8-50
BAC/ SudConcorde
Boeing 747
Boeing 777
Cruising Speed (mph)
0 100000 200000 300000 400000
Douglas DC-8-50
BAC/ SudConcorde
Boeing 747
Boeing 777
Passengers x mph
![Page 25: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/25.jpg)
25ECE369
• Response Time (latency)
— How long does it take for my job to run?
— How long does it take to execute a job?
— How long must I wait for the database query?
• Throughput
— How many jobs can the machine run at once?
— What is the average execution rate?
— How much work is getting done?
• If we upgrade a machine with a new processor what do we increase?
• If we add a new machine to the lab what do we increase?
Computer Performance: TIME, TIME, TIME
![Page 26: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/26.jpg)
26ECE369
• Elapsed Time
– counts everything (disk and memory accesses, I/O , etc.)
– a useful number, but often not good for comparison purposes
• CPU time
– doesn't count I/O or time spent running other programs
– can be broken up into system time, and user time
• Our focus: user CPU time
– time spent executing the lines of code that are "in" our program
Execution Time
![Page 27: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/27.jpg)
27ECE369
• For some program running on machine X,
PerformanceX = 1 / Execution timeX
• "X is n times faster than Y"
PerformanceX / PerformanceY = n
• Problem:
– machine A runs a program in 20 seconds
– machine B runs the same program in 25 seconds
Book's Definition of Performance
![Page 28: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/28.jpg)
28ECE369
Clock Cycles
• Instead of reporting execution time in seconds, we often use cycles
• clock rate (frequency) = cycles per second (1 Hz. = 1 cycle/sec)
seconds
program
cycles
program
seconds
cycle
Clock (cycles)
Data transferand computation
Update state
Clock period
![Page 29: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/29.jpg)
29ECE369
So, to improve performance (everything else being equal) you can either
(increase or decrease?)
________ the # of required cycles for a program, or
________ the clock cycle time or, said another way,
________ the clock rate.
How to Improve Performance
seconds
program
cycles
program
seconds
cycle
![Page 30: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/30.jpg)
30ECE369
• Could assume that number of cycles equals number of instructions
This assumption is incorrect,
different instructions take different amounts of time on different machines.
Why? hint: remember that these are machine instructions, not lines of C code
time
1st
inst
ruct
ion
2nd
inst
ruct
ion
3rd
inst
ruct
ion
4th
5th
6th ...
How many cycles are required for a program?
![Page 31: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/31.jpg)
31ECE369
• Multiplication takes more time than addition
• Floating point operations take longer than integer ones
• Accessing memory takes more time than accessing registers
• Important point: changing the cycle time often changes the number of cycles required for various instructions (more later)
time
Different numbers of cycles for different instructions
![Page 32: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/32.jpg)
32ECE369
• Our favorite program runs in 10 seconds on computer A, which has a 4 GHz. clock. We are trying to help a computer designer build a new machine B, that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?"
• Don't Panic, can easily work this out from basic principles
Example
![Page 33: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/33.jpg)
33ECE369
Example
• Our favorite program runs in 10 seconds on computer A, which has a 4 GHz. clock. We are trying to help a computer designer build a new machine B, that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?"
seconds
program
cycles
program
seconds
cycle
![Page 34: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/34.jpg)
34ECE369
• A given program will require
– some number of instructions (machine instructions)
– some number of cycles
– some number of seconds
• We have a vocabulary that relates these quantities:
– cycle time (seconds per cycle)
– clock rate (cycles per second)
– CPI (cycles per instruction)
a floating point intensive application might have a higher CPI
– MIPS (millions of instructions per second)
this would be higher for a program using simple instructions
Now we understand cycles
![Page 35: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/35.jpg)
35ECE369
Back to the Same Formula, CPI (Cycles/Instruction)
seconds
program
cycles
program
seconds
cycle
![Page 36: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/36.jpg)
36ECE369
CPI Example
• Suppose we have two implementations of the same instruction set architecture (ISA).
For some program,
Machine A has a clock cycle time of 250 ps and a CPI of 2.0 Machine B has a clock cycle time of 500 ps and a CPI of 1.2
What machine is faster for this program, and by how much?
seconds
program
cycles
program
seconds
cycle
![Page 37: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/37.jpg)
37ECE369
Let’s Complicate Things A Little bit…
A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively).
The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C
The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.
What is the CPI for each sequence?
Which sequence will be faster? How much?
cycle
seconds
program
cycles
program
seconds
![Page 38: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/38.jpg)
38ECE369
Scary Stuff
Op Frequency Cycle Count
ALU 43% 1Load 21% 1Store 12% 2Branch 24% 2
Let’s say we were able to reduce the cycle count for “Store” operations to 1 with a cost of slowing our clock by15%. Is this new design feasible?
n
i
ii
n
iii
original CountnInstructio
ICCPI
CountnInstructio
ICCPI
CPI1
1
__
![Page 39: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/39.jpg)
39ECE369
Example(Contd.)
Old CPI = 0.43 + 0.21 + 0.12x2 + 0.24x2 = 1.38
New CPI = 0.43 + 0.21 + 0.12 + 0.24x2 = 1.24
Speed up = old time/new time = (ICx oldCPI x T)/(IC x newCPI x 1.15T) =0.97
so, don't make this change.
![Page 40: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/40.jpg)
40ECE369
Component Analysis
![Page 41: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/41.jpg)
41ECE369
IR optimization example
compile
C source code
unoptimized IR
![Page 42: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/42.jpg)
42ECE369
Constant folding
cfold
![Page 43: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/43.jpg)
43ECE369
Constant propagation
constprop
![Page 44: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/44.jpg)
44ECE369
Dead code elimination
dce
![Page 45: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/45.jpg)
45ECE369
What is MIPS?
• Instruction execution rate => higher is better
• Issues:
– Can not compare processors with different instruction sets
– Varies between programs on the same processor
– Can vary inversely with the performance… ?
![Page 46: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/46.jpg)
46ECE369
MIPS Example
![Page 47: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/47.jpg)
47ECE369
• Performance best determined by running a real application
– Use programs typical of expected workload
– Or, typical of expected class of applicationse.g., compilers/editors, scientific applications, graphics,
etc.
• Small benchmarks
– nice for architects and designers
– easy to standardize
– can be abused
• SPEC (System Performance Evaluation Cooperative)
– companies have agreed on a set of real program and inputs
– valuable indicator of performance (and compiler technology)
– can still be abused
Benchmarks
![Page 48: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/48.jpg)
48ECE369
Benchmark Games
• An embarrassed Intel Corp. acknowledged Friday that a bug in a software program known as a compiler had led the company to overstate the speed of its microprocessor chips on an industry benchmark by 10 percent. However, industry analysts said the coding error…was a sad commentary on a common industry practice of “cheating” on standardized performance tests…The error was pointed out to Intel two days ago by a competitor, Motorola …came in a test known as SPECint92…Intel acknowledged that it had “optimized” its compiler to improve its test scores. The company had also said that it did not like the practice but felt to compelled to make the optimizations because its competitors were doing the same thing…At the heart of Intel’s problem is the practice of “tuning” compiler programs to recognize certain computing problems in the test and then substituting special handwritten pieces of code…
Saturday, January 6, 1996 New York Times
![Page 49: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/49.jpg)
49ECE369
SPEC CPU2000
![Page 50: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/50.jpg)
50ECE369
Problems with Benchmarking
• Hard to evaluate real benchmarks:
– Machine not built yet, simulators too slow
– Benchmarks not ported
– Compilers not ready
Benchmark performance is composition of hardware and software (program, input, compiler, OS) performance, which must all be specified
![Page 51: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/51.jpg)
51ECE369
Compiler and Performance
Application Set
0
50
100
150
200
250
300
350
400
450
500
Xeon
3GHz
512K
B (-O
0)
B1.6GHz/1
MB, -O3 -
xB
B1.6GHz/1
MB, -O3-x
W
Xeon
3GHz
512K
B(-xW
)
Xeon
3GHz
512K
B(op
timize
d)
![Page 52: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/52.jpg)
52ECE369
• Performance is specific to a particular program
– Total execution time is a consistent summary of performance
• For a given architecture performance increases come from:
– increases in clock rate (without adverse CPI affects)
– improvements in processor organization that lower CPI
– compiler enhancements that lower CPI and/or instruction count
– Algorithm/Language choices that affect instruction count
• Pitfall: expecting improvement in one aspect of a machine’s
performance to affect the total performance
Remember
![Page 53: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/53.jpg)
53ECE369
Performance Measurement Overview
ClockCycle
Seconds
nInstructio
sClcokCycle
ogram
nsInstructio
ogram
SecondsCPUtime
PrPr
TimeCycleClockpogramtheforcyclesCPUclockCPUtime ______
RateClock
pogramtheforcyclesCPUclockCPUtime
_
____
IC
pogramtheforcyclesCPUclockCPI
____
TimeCycleClockCPIICCPUtime __
RateClock
CPIICCPUtime
_
![Page 54: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/54.jpg)
54ECE369
Performance Measurement Overview
TimeCycleClockICCPICPUtimen
iii __
1
n
i
ii
n
iii
CountnInstructio
ICCPI
CountnInstructio
ICCPI
CPIoverall1
1
___
n
iiiprogramtheforcyclesclock ICCPICPU
1____
![Page 55: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/55.jpg)
55ECE369
Exercise
Suppose we have made the following measurements:
• Frequency of FP operations = 25%
• Average CPI of FP operations = 4.0
• Average CPI of other instructions = 1.33
• Frequency of FPSQR = 2%
• CPI of FPSQR = 20
Assume that the two design alternatives are:
a) to reduce the CPI of FPSQR to 2 or
b) to reduce the average CPI of all FP operations to 2.
Compare these alternatives.
![Page 56: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/56.jpg)
56ECE369
Solution
n
i
ii
n
iii
original CountnInstructio
ICCPI
CountnInstructio
ICCPI
CPI1
1
__
0.2%7533.1%254
36.0)220(%2)(%2__ newFPSQRoldFPSQRFPSQRonSaved CPICPICPI
64.136.02_____ FPSQRonSavedoriginalFPSQRnewforoverall CPICPICPI
5.10.2%2533.1%75___ FPnewforoverallCPI
33.15.1
00.2
new
original
new
original
new
original
CPI
CPI
CPIClockCycleIC
CPIClockCycleIC
CPUTime
CPUTimeSpeedupFP
![Page 57: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/57.jpg)
57ECE369
Amdahl's Law
• The performance enhancement of an improvement is limited by how The performance enhancement of an improvement is limited by how much the improved feature is used. In other words: Don’t expect an much the improved feature is used. In other words: Don’t expect an enhancement proportional to how much you enhanced something.enhancement proportional to how much you enhanced something.
• Example:
"Suppose a program runs in 100 seconds on a machine, with multiply operations responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?"
How about making it 5 times faster?
![Page 58: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/58.jpg)
58ECE369
Amdahl’s Law
2. Old execution time = 100
3. New execution time = 100/4 = 25
4. If 80 seconds is used by the affected part =>
5. Unaffected part = 100-80 = 20 sec
6. Execution time new = Execution time unaffected + Execution time affected / Improvement
7. 25= 20 + 80/Improvement
8. Improvement = 16
1. Speed up = 4
![Page 59: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/59.jpg)
59ECE369
Example: Speed up using parallel processors
Suppose an application is “almost all” parallel: 90%.What is the speedup using 10, 100, and 1000 processors?
new time = old time * 10% + ( old time * 90% ) / 10
Speed up (P=10) = old time / new time
Speedup (P=10) = 5.3
Speedup (P = 100 ) = 9.1 Speedup ( P = 1000 ) = 9.9
![Page 60: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/60.jpg)
60ECE369
Amdahl’s Law Overview
![Page 61: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/61.jpg)
61ECE369
Example
• Suppose we are considering an enhancement that runs 10times faster than the original machine but is only usable 40% of the time. What is the overall speedup gained by incorporating the enhancement?
56.16.0
10
4.01
Speedup
![Page 62: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/62.jpg)
62ECE369
Example
• Implementations of floating point square root vary significantly in performance. Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical benchmark on am machine, One proposal is to add FPSQR hardware that will speed up this operation by a factor of 10. The other alternative is just to try to make all FP instructions run faster; FP instructions are responsible for a total of 50% of the execution time. The design team believes that they can make all FP instructions run two times faster with the same effort as required for the fast square root. Compare those two design alternatives.
22.1)2.01(
10
2.01
FPSQRSpeedup
33.1)5.01(
25.0
1
SpeedupFP
![Page 63: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/63.jpg)
63ECE369
Multiprocessors
• Multicore microprocessors
– More than one processor per chip
• Requires explicitly parallel programming
– Compare with instruction level parallelism
• Hardware executes multiple instructions at once
• Hidden from the programmer– Hard to do
• Programming for performance
• Load balancing
• Optimizing communication and synchronization
![Page 64: 1 ECE369 ECE369A: Fundamentals of Computer Architecture ECE 369A MWF 10:00 AM - 10:50 AM in ECE107 InstructorTeaching Assistant Name:Ali AkogluShuai Chang.](https://reader034.fdocuments.us/reader034/viewer/2022042518/551a8a6a550346b52d8b5adf/html5/thumbnails/64.jpg)
64ECE369
Hardware/Software Tradeoffs
• Which operations are directly supported in “hardware” and which are synthesized in software?
• How much hardware should you employ to speed up certain functions?
Hardware Software
Advantages Speed, Flexibility,
Consistency easier/faster design
lower cost of errors
Disadvantages Cost
No flexibility Slowness