MODULE 1 - Universiti Teknologi Malaysia · 2015. 9. 10. · Computer Architecture Computer...
-
Reference: David A. Patterson & John L. Hennessy, Computer Organization and Design
Module 1a: Organization & Architecture:
Structure & Function
MODULE 1 (Overview & Computer Performance)
1
-
Comp. Architecture vs Comp. Organization 2
Computer Architecture
Refers to the set of attributes of a system as seen by the programmer. Examples:
The instruction set
The number of bits used to represent data types
I/O mechanisms
Memory addressing techniques
Computer Organization
Deals with all the physical components of a computer system and how they interact with each other to perform various functions. The lower level of computer organization, known as microarchitecture, is more detailed and concrete. Examples:
Control signals
Interfaces between the computer and peripherals
The memory technology being used
-
Computer Architecture Computer Organization
The difference between architecture and organization is best described by a non-computer example.
“Is the gear lever in a motorcycle part of its architecture or its organization?
The architecture of a motorcycle is simple: it transports you from A to B. The gear lever belongs to the motorcycle's organization because it implements the function of a motorcycle but is not part of that function.”
Comp. Architecture vs Comp. Organization
3
-
Comp. Architecture vs Comp. Organization
Computer Architecture: refers to those attributes visible to the programmer.
Example: Can we multiply 2 numbers? Yes, we can multiply.
Computer Organization: refers to how features are implemented.
Example: How to multiply.
4
-
The Computer Family
Many computer manufacturers offer a family of computer models, all with the same architecture but with differences in organization.
All members of the Intel x86 family share the same basic architecture
The IBM System/370 architecture first introduced in 1970 included a number of models that share the same basic architecture and has survived to this day as the architecture of IBM’s mainframe product line.
The newer models retained the same architecture so that the customer’s software investment was protected (code compatibility)
5
-
6
[Figure annotations: registers differ; growing fast]
-
[Figure annotations: same architecture, differences in organization]
7
-
Structure and Function
A computer is a complex system, organized as a hierarchy of interrelated subsystems at different levels.
At each level, the designer is concerned with structure and function:
Structure: The way in which the components are interrelated.
Function: The operation of each individual component as part of the structure.
8
-
Structure
4 main structural components:
Central processing unit (CPU): controls the operation of the computer and performs its data processing functions
Main memory: stores data
I/O: moves data between the computer and its external environment
System interconnection: mechanism for communication among CPU, main memory, and I/O
9
-
Structure: CPU
The CPU has 4 components:
Control unit: controls the operation of the CPU
Arithmetic and logic unit (ALU): performs the computer’s data processing functions
Registers: provide storage internal to the CPU
CPU interconnection: mechanism for communication among the control unit, ALU, and registers
10
-
THE COMPUTER: TOP-LEVEL STRUCTURE
[Figure: the computer, made up of the central processing unit, main memory, input/output, and systems interconnection, connected to peripherals and communication lines]
11
-
FUNCTION
There are only four functions:
• Data processing: process data in a variety of forms and requirements
• Data storage: short- and long-term data storage for retrieval and update
• Data movement: move data between the computer and the outside world
• Control: control the processing, movement, and storage of data using instructions
How are functions performed?
Through PROGRAMS
12
-
Program
A sequence of steps
For each step, a computer function is executed
For each operation, a different/new set of control signals is needed
For each operation a unique code (instruction) is provided e.g. ADD, MOVE
A hardware segment accepts the code and issues the control signals
13
-
Executing A Program
Approach 1: Hardwired program
connecting/combining various logic components to store data and perform arithmetic and logic operations
Hardwired systems are inflexible
14
-
Executing A Program
Approach 2: Software
General purpose hardware can do different tasks, given correct control signals
Instead of re-wiring, supply a new set of control signals through instruction codes
15
-
Reference: William Stallings, Computer Organization & Architecture (Ch: Pentium Evolution)
Computer Evolution
16
-
Von Neumann Machine
1945: the stored-program concept was first proposed by von Neumann for the EDVAC (Electronic Discrete Variable Automatic Computer).
Key concepts:
Data and instructions are stored in a single read-write memory.
The contents of this memory are addressable by location, without regard to the type of data contained there
Execution occurs in a sequential fashion from one instruction to the next
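The three key concepts above can be illustrated with a toy machine. This is only a sketch: the instruction format (an opcode paired with an address) and the opcode names LOAD, ADD, STORE, and HALT are assumptions made for illustration and are not taken from the slides or from the EDVAC design.

```python
# Toy stored-program machine (hypothetical instruction format, for illustration only).
# One read-write memory holds both instructions and data, addressed by location.

def run(memory):
    """Execute instructions sequentially from address 0 until HALT."""
    acc = 0  # accumulator register
    pc = 0   # program counter: address of the next instruction
    while True:
        opcode, operand = memory[pc]  # fetch from the same memory that holds the data
        pc += 1                       # sequential execution: go on to the next instruction
        if opcode == "LOAD":
            acc = memory[operand]
        elif opcode == "ADD":
            acc += memory[operand]
        elif opcode == "STORE":
            memory[operand] = acc
        elif opcode == "HALT":
            return memory
        else:
            raise ValueError(f"unknown opcode: {opcode}")

# Addresses 0-3 hold instructions; addresses 4-6 hold data.
memory = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", None),
    2,  # address 4
    3,  # address 5
    0,  # address 6: the result is stored here
]

result = run(memory)
print(result[6])  # 2 + 3 = 5
```

Changing the program means changing the contents of memory, not the machine itself.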
17
-
Structure of von Neumann machine 18
-
Microprocessors (µP) Intel
Microprocessor : all CPU components on a single chip
1971 - 4004
First microprocessor
4 bit
Followed in 1972 by 8008
8 bit
Both designed for specific applications
1974 - 8080
Intel’s first general purpose microprocessor
Designed to be the CPU of a general purpose microcomputer
19
-
Intel µP Evolution ..
8080: first general purpose microprocessor
8 bit data path
Used in the first personal computer, the Altair
8086: much more powerful
16 bit
Instruction cache that prefetches a few instructions
8088 (8 bit external bus): used in the first IBM PC
80286: 16 MB of memory addressable
80386: first 32 bit design
Support for multitasking: run multiple programs at the same time
20
-
.. Intel µP Evolution ..
80486: more sophisticated and powerful cache and instruction pipelining
Built-in maths co-processor
Pentium: superscalar technique, with multiple instructions executed in parallel
Pentium Pro: increased superscalar organization
Aggressive register renaming
Branch prediction
Data flow analysis
Speculative execution
21
-
.. Intel µP Evolution
Pentium II: MMX technology for graphics, video & audio processing
Pentium III: additional floating point instructions for 3D graphics
Pentium 4: further floating point and multimedia enhancements
Itanium: 64 bit
Core Duo: start of multi-core processors
22
-
Intel Evolution
23
-
Reference: David A. Patterson & John L. Hennessy, Computer Organization and Design
Module 1b: Understanding & Measuring
Performance
24
-
Introduction
Hardware performance is often key to the effectiveness of an entire system of hardware and software.
For different types of applications, different performance metrics may be appropriate, and different aspects of a computer system may be the most significant factor in determining overall performance.
Understanding how best to measure performance, and the limitations of performance, is important when selecting a computer system.
To understand the issues of assessing performance, we ask:
Why does a piece of software perform as it does?
Why can one instruction set be implemented to perform better than another?
How does a hardware feature affect performance?
25
-
Why measure performance?
Performance is important!
Identify HW/SW performance problems
Comparisons:
Which machine is faster?
Which ISA is better?
Which implementation (of an ISA) is faster?
Expose significant performance issues (enable us to ignore unimportant issues)
26
-
More than one way to measure performance
Performance is evaluated differently by different entities.
Better performance means faster processing speed (e.g. faster completion of a task/job)
Better performance means higher throughput (doing more jobs in a given time)
Better performance means doing more jobs at a smaller cost
27
-
Which plane has better performance?
If higher throughput (transporting more passengers) is better performance
If higher speed is better performance
If better performance means having a long range
28
-
Understanding terminology
Execution time (a.k.a. response time): the total time it takes from start to completion of a task
Throughput: the total amount of tasks completed in a given time interval
CPU execution time (a.k.a. CPU time): the actual time the CPU spends on a specific task
User CPU time: time the CPU spends on running the actual program
System CPU time: time the CPU spends on OS overhead on behalf of the program
Clock cycle (a.k.a. tick, cycle): discrete time intervals (the processor clock, which runs at a constant rate). Usually in nanoseconds (ns) or picoseconds (ps)
29
-
Understanding terminology
Clock period (a.k.a. clock cycle time): the duration of one clock cycle. Measured in seconds or sub-multiples (ms, ns, ps)
Clock rate (or frequency): the number of clock cycles per second, which sets the pace at which the microprocessor executes instructions. In MHz/GHz. Frequency = 1/clock period
1 MHz represents 1 million cycles per second (10^6)
1 GHz represents 1 thousand million cycles per second (10^9)
Clock cycles per instruction (CPI): the average number of clock cycles each instruction takes to execute
30
-
[Figures 1 and 2: clock signal diagrams; 1 cycle time = the length of one clock cycle]
31
-
Common performance metrics
MB/s, Mb/s: Megabytes, Megabits Per Second
MIPS: Millions of Instructions Per Second
CPI: Clock Cycles Per Instruction
IPC: Instructions Per Clock cycle
Hz: (processor clock frequency) cycles Per Second
LIPS: Logical Inferences Per Second
FLOPS: Floating-Point arithmetic Operations Per Second
32
-
Computer performance measures
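Performance is related to execution time:
$\text{Performance} = \frac{1}{\text{Execution time}}$
To maximize performance, we want to minimize the execution time.
If the performance of Computer A is 10 times better than that of Computer B, what is the relation between their execution times?
$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = 10$
This shows that CompB needs 10 times as much time as CompA to execute a given task.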
33
-
CPU Execution Time 34
Clock rate = frequency (Hz)
“the frequency at which a CPU is running. It is measured in Hz”
$\text{Clock period} = \frac{1}{\text{frequency}}$
If a processor has a frequency of 320 MHz:
$\text{Clock period} = \frac{1}{320\,000\,000\ \text{Hz}} = 3.125\ \text{ns}$
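As a quick check of the 320 MHz figure, the conversion can be sketched in a few lines. The function name `clock_period_seconds` is an illustrative choice, not something defined in the slides.

```python
def clock_period_seconds(frequency_hz):
    """Clock period is the reciprocal of the clock frequency."""
    return 1.0 / frequency_hz

# 320 MHz = 320 000 000 cycles per second
period = clock_period_seconds(320e6)
print(f"{period * 1e9:.3f} ns")  # 3.125 ns
```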
-
Example 1: Improving Performance
Our favorite program runs in 10 seconds on computer A, which has a 4 GHz clock. Computer B will run this program in 6 seconds, given that computer B requires 1.2 times as many clock cycles as computer A for this program. What is computer B’s clock rate? Answer: 8 GHz
What do we know?
Computer A
CPU Execution Time = 10 s
Clock rate (CR) = 4 GHz = 4 x 10^9 Hz
Computer B
CPU Execution Time = 6 s
Clock cycles (CC) = 1.2 x clock cycles of Computer A
35
-
Example 1: Improving Performance
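Solution:
$\text{Clock cycles}_A = \text{CPU execution time}_A \times \text{Clock rate}_A = 10\ \text{s} \times 4 \times 10^9\ \text{Hz} = 40 \times 10^9\ \text{cycles}$
$\text{Clock cycles}_B = 1.2 \times \text{Clock cycles}_A = 48 \times 10^9\ \text{cycles}$
$\text{Clock rate}_B = \frac{\text{Clock cycles}_B}{\text{CPU execution time}_B} = \frac{48 \times 10^9\ \text{cycles}}{6\ \text{s}} = 8 \times 10^9\ \text{Hz} = 8\ \text{GHz}$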
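The stated answer for Example 1 can be checked with a short script. This is a sketch that assumes the relation CPU execution time = clock cycles / clock rate; the function names are illustrative and do not come from the slides.

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """Clock cycles used by a program = CPU execution time x clock rate."""
    return cpu_time_s * clock_rate_hz

def required_clock_rate(cycles, target_time_s):
    """Clock rate needed to finish the given number of cycles in the target time."""
    return cycles / target_time_s

cycles_a = clock_cycles(10, 4e9)           # computer A: 10 s at 4 GHz
cycles_b = 1.2 * cycles_a                  # computer B needs 1.2 times as many cycles
rate_b = required_clock_rate(cycles_b, 6)  # computer B must finish in 6 s

print(f"{rate_b / 1e9:.1f} GHz")
```

The result agrees with the 8 GHz answer given in the problem statement.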
36
-
Clock Cycles per Instruction (CPI)
Previously, our calculations of Execution time did not include the number of instructions needed for the program.
Different instructions may take different amounts of time to execute, depending on what they do
Example: The MOV (Move) instruction – moving data from one place to another
37
-
The MOV instruction : Analogy
Analyze Conrad’s movement of putting the red balls into the container.
Balls from prime storage:
-walk
-fetch ball
-walk (halfway)
-walk
-put ball in container
Total = 5
Balls from sub storage:
-fetch ball
-walk
-put ball in container
Total = 3
To do 5 movements takes
longer to execute than 3
38
-
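CPU Execution Time
$\text{CPU clock cycles} = \text{Instructions for a program (a.k.a. instruction count)} \times \text{CPI}$
$\text{CPU execution time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$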
39
-
Example 2: Using Performance Equation
Suppose we have two implementations of the same instruction set architecture (ISA) and for the same program. Which computer is faster and by how much?
Computer A: clock cycle time=250 ps and CPI=2.0
Computer B: clock cycle time=500 ps and CPI=1.2
Note: because both computers run the same program and the instruction count is not given, we can treat it as a variable I
Remember the formula: CPU execution time = Instruction count x CPI x Clock cycle time
40
-
Example 2: Using Performance Equation
CPU time of A = I x 2.0 x 250 ps = 500 x I ps
CPU time of B = I x 1.2 x 500 ps = 600 x I ps
Remember: the lower the execution time, the better the performance.
Computer A is faster
How much faster is Computer A?
41
-
Example 2 (continued)…
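$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$
We can conclude that A is 1.2 times faster than B for this program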
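The comparison in Example 2 can be reproduced numerically. This sketch assumes the performance equation CPU time = instruction count x CPI x clock cycle time; because both computers run the same program, the instruction count cancels, so it is set to 1 here. The function name is illustrative.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU execution time in picoseconds."""
    return instruction_count * cpi * cycle_time_ps

I = 1  # same program on both computers, so the instruction count cancels out

time_a = cpu_time_ps(I, 2.0, 250)  # Computer A: CPI 2.0, 250 ps cycle
time_b = cpu_time_ps(I, 1.2, 500)  # Computer B: CPI 1.2, 500 ps cycle

speedup = time_b / time_a  # how many times faster A is than B
print(f"A: {time_a:.0f} ps per instruction, B: {time_b:.0f} ps per instruction")
print(f"A is {speedup:.1f} times faster than B")
```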
42
-
Measuring the CPI
Sometimes it is possible to compute the CPU clock cycles by looking at the different types of instructions and using their individual clock cycle counts
$\text{CPU clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$
where
$C_i$ = count of the number of instructions of class $i$ executed
$\text{CPI}_i$ = average number of cycles per instruction for that instruction class
$n$ = number of instruction classes
Remember that overall CPI for a program will depend on both the number of cycles for each instruction type and the frequency of each instruction type in the program execution
43
-
Sample: Calculate CPI
You are on the design team for a new processor. The clock of the processor runs at 200 MHz. The following table gives instruction frequencies for Benchmark B, as well as how many cycles the instructions take, for the different classes of instructions. For this problem, we assume that (unlike many of today's computers) the processor only executes one instruction at a time.
Instruction Type | Frequency | Cycles
Loads & Stores | 30% | 6 cycles
Arithmetic Instructions | 50% | 4 cycles
All Others | 20% | 3 cycles
If we say that there are 100 instructions, then:
30 of them will be loads and stores.
50 of them will be arithmetic instructions.
20 of them will be all others.
Formula: ((30 * 6) + (50 * 4) + (20 * 3)) cycles / 100 instructions = 440 cycles / 100 instructions = 4.4 cycles per instruction
44
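The Benchmark B calculation is a weighted average, which can be sketched as a small function. The function name `average_cpi` and the list-of-pairs layout are illustrative assumptions; the numbers are the ones from the table.

```python
def average_cpi(classes):
    """Overall CPI from (instruction_count, cycles_per_instruction) pairs."""
    total_cycles = sum(count * cycles for count, cycles in classes)
    total_instructions = sum(count for count, _ in classes)
    return total_cycles / total_instructions

# Benchmark B, taking 100 instructions: (count, cycles per instruction)
benchmark_b = [
    (30, 6),  # loads and stores
    (50, 4),  # arithmetic instructions
    (20, 3),  # all others
]

print(average_cpi(benchmark_b))  # 440 cycles / 100 instructions = 4.4
```

Because only the proportions matter, any instruction total with the same 30/50/20 split gives the same CPI.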
-
Factors Affecting the CPU Performance 45
-
46
-
Example 3 : Comparing Code Segments
A compiler designer is trying to decide between two code sequences for a particular computer. The hardware designers have supplied the following facts:
For a particular high-level-language statement, the compiler writer is considering two code sequences that require the following instruction counts:
a) Which code sequence executes the most instructions?
b) Which will be faster?
c) What is the CPI for each sequence?
47
Example: code segments
-
Example 3 : Part (a)
Sequence 1 executes 2 + 1+ 2 = 5 Instructions
Sequence 2 executes 4 + 1+ 1 = 6 Instructions
Seq 2 executes THE MOST instructions
48
-
Example 3 : Part (b)
Using this equation: CPU clock cycles = ∑ (CPIi x Ci)
Code SEQ1 takes 10 cycles to execute 5 instructions
Code SEQ2 takes 9 cycles to execute 6 instructions
Seq 2 is FASTER
49
-
Example 3 : Part (c)
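CPI = CPU clock cycles / instruction count
CPI of SEQ1 = 10 / 5 = 2.0
CPI of SEQ2 = 9 / 6 = 1.5
Code SEQ2 uses fewer clock cycles but executes more instructions, so it must have a lower CPI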
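All three parts of Example 3 can be reproduced in one short script. The instruction counts per class come from part (a). The per-class CPI values (A = 1, B = 2, C = 3) are an assumption: the table that supplied them did not survive in this transcript, and these values are taken from the matching textbook example in Patterson & Hennessy because they reproduce the 10 and 9 cycle totals quoted in part (b).

```python
# ASSUMED per-class CPI values (table missing from the transcript; see lead-in).
CPI = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class, from part (a).
seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

def summarize(counts, cpi):
    """Return (instruction count, clock cycles, average CPI) for a code sequence."""
    instructions = sum(counts.values())
    cycles = sum(cpi[cls] * n for cls, n in counts.items())
    return instructions, cycles, cycles / instructions

for name, seq in (("SEQ1", seq1), ("SEQ2", seq2)):
    instructions, cycles, avg_cpi = summarize(seq, CPI)
    print(f"{name}: {instructions} instructions, {cycles} cycles, CPI = {avg_cpi}")
```

SEQ2 executes more instructions yet needs fewer cycles, which is why instruction count alone is a poor predictor of speed.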
50
-
Example 4 : Comparing Code Segments
A processor has 3 classes of instructions:
Which code sequence is faster?
Instruction | CPI | Code SEQ1 | Code SEQ2 | Clock cycles SEQ1 | Clock cycles SEQ2
A | 1 | 5 | 3 | 5 | 3
B | 2 | 3 | 2 | 6 | 4
C | 5 | 1 | 2 | 5 | 10
Total | | 9 ins. | 7 ins. | 16 clock cycles | 17 clock cycles
Recall: clock cycles for each class = CPI x instruction count
Code SEQ1 takes 16 cycles to execute 9 instructions
Code SEQ2 takes 17 cycles to execute 7 instructions
Code SEQ1 is FASTER
51
-
Example 4a: Calculating with CPI
The ADD instruction takes 1 clock cycle to execute, while the MUL instruction takes 3 clock cycles. If a program consists of 20 ADD and 10 MUL instructions, what is the average CPI?
What do we know?
There are 2 instructions:
Instruction | Clock cycles | Instruction count
ADD | 1 | 20
MUL | 3 | 10
52
-
Example 4a: Calculating average CPI
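Instruction | Clock cycles | Instruction count
ADD | 1 | 20
MUL | 3 | 10
$\text{Average CPI} = \frac{(1 \times 20) + (3 \times 10)}{20 + 10} = \frac{50}{30} \approx 1.67$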
53
-
Homework (to make you cleverer)
CPU X runs a program/code sequence Y which consists of 100 instructions. Calculate and fill in the table below:
a) The CPI for each instruction class given below.
b) The execution time for each instruction class, given a clock cycle time of 0.25 milliseconds.
c) CPU X’s execution time
d) CPU X’s clock rate
Instruction | Instruction count | Clock cycles | a) CPI | b) Execution time
A | 20 | 3 | |
B | 25 | 1 | |
C | 10 | 2 | |
D | 30 | 2 | |
E | 10 | 3 | |
F | 5 | 4 | |
55
-
Did you get the same???
Instruction | Instruction count | Clock cycles | (a) CPI | (b) Execution time in ms (clock cycle time 0.25 ms)
A | 20 | 3 | 0.15 | 0.75
B | 25 | 1 | 0.04 | 0.25
C | 10 | 2 | 0.2 | 0.5
D | 30 | 2 | 0.07 | 0.5
E | 10 | 3 | 0.3 | 0.75
F | 5 | 4 | 0.8 | 1
Total | | 15 | | (c) 3.75
(d) Clock rate = ∑ clock cycles / ∑ execution time = 15 / 3.75 = 4 cycles per ms
56
-
Increasing the CPU Performance
Decreasing the clock cycle time
Datapath organization leading to lower CPI
Reduction in the number of executed instructions.
58
-
Example 5: Improve Performance
Our favourite program runs in 20 seconds on Computer P, which has an 8 GHz clock. We are trying to help a computer designer build Computer Q, which will run this program in 5 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this will affect the rest of the CPU design, causing computer Q to require 1.5 times as many clock cycles as computer P for this program. What clock rate should we tell the designer to target?
What do we know?
Computer P
CPU Execution Time = 20 s
Clock rate (CR) = 8 GHz = 8 x 10^9 Hz
Computer Q
CPU Execution Time = 5 s
Clock cycles (CC) = 1.5 x clock cycles of Computer P
59
-
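Solution:
$\text{Clock cycles}_P = \text{CPU execution time}_P \times \text{Clock rate}_P = 20\ \text{s} \times 8 \times 10^9\ \text{Hz} = 160 \times 10^9\ \text{cycles}$
$\text{Clock cycles}_Q = 1.5 \times \text{Clock cycles}_P = 240 \times 10^9\ \text{cycles}$
$\text{Clock rate}_Q = \frac{\text{Clock cycles}_Q}{\text{CPU execution time}_Q} = \frac{240 \times 10^9\ \text{cycles}}{5\ \text{s}} = 48 \times 10^9\ \text{Hz} = 48\ \text{GHz}$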
60
-
Mandatory Homework
Do Tutorial Module 1 (e-learning).. It is COMPULSORY for my class!
Submission date will be announced.
63
-
Understanding the Units
CPU execution time for a program = Seconds for the program (S/P)
Clock cycles = clock cycles per program (C/P)
Clock cycle time = seconds per clock cycle (S/C)
Clock rate = clock cycles per second (C/S)
Instruction count = instructions executed for the program (I/P)
Clock cycles per instruction (CPI) = average number of clock cycles per instruction (C/I)
64
-
Understanding the Units
The units cancel each other to give the unit of the result.
Example:
10 s = 20 cycles / clock rate
Clock rate = 20 / 10 cycles per second = 2 Hz
1 Hz is 1 cycle per second
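The same cancellation can be written out for the CPU time relations, using the abbreviations from the units list (S = seconds, P = program, C = clock cycles, I = instructions). This is a sketch of the unit bookkeeping only, not a reproduction of the equation image from the original slide.

```latex
\[
\underbrace{\frac{S}{P}}_{\text{CPU execution time}}
  = \underbrace{\frac{I}{P}}_{\text{instruction count}}
    \times \underbrace{\frac{C}{I}}_{\text{CPI}}
    \times \underbrace{\frac{S}{C}}_{\text{clock cycle time}}
\qquad\text{and}\qquad
\frac{S}{P} = \frac{C/P}{C/S} = \frac{\text{clock cycles}}{\text{clock rate}}
\]
```

In the first relation the I and C units cancel, and in the second the C units cancel, leaving seconds per program in both cases.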
65