Superscalar Architecture_AIUB

24
American International University-Bangladesh (AIUB) Presenter Nusrat Irin Chowdhury Mary Superscalar Architecture

description

Faster microprocessor design presentation in American International University-Bangladesh (AIUB). Presentation was taken under the subject "SELECTED TOPICS IN ELECTRICAL AND ELECTRONIC ENGINEERING (PROCESSOR AND DSP HARDWARE DESIGN WITH SYSTEM VERILOG, VHDL AND FPGAS) [MEEE]", as a final semester student of M.Sc at AIUB.

Transcript of Superscalar Architecture_AIUB

Page 1: Superscalar Architecture_AIUB

American International University-Bangladesh (AIUB)

Presenter

Nusrat Irin Chowdhury Mary

Superscalar Architecture

Page 2: Superscalar Architecture_AIUB

Superscalar Architecture (SSA) describes a microprocessor design that execute more than one instruction at a time during a single clock cycle .

In a SSA design, the processor or the instruction compiler is able to determine whether an instruction can be carried out independently of other sequential instructions, or whether it has a dependency on another instruction and must be executed sequentially.

The design is sometimes called “Second Generation RISC”. Another term used to describe superscalar processors is multiple instruction issue processors.

2

Superscalar Architecture

Page 3: Superscalar Architecture_AIUB

In a SSA, several scalar instructions can be initiated simultaneously and executed independently.

A long series of innovations aimed at producing ever-faster microprocessors.

Includes all features of pipelining but, in addition, there can be several instructions executing simultaneously in the same pipeline stage.

SSA introduces a new level of parallelism, called instruction-level parallelism.

3

Superscalar Architecture cont’d

Page 4: Superscalar Architecture_AIUB

In Superscalar CPU Architecture implementation of Instruction Level Parallelism (ILP) within a single processor allows faster CPU at a given clock rate.

A superscalar processor executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to functional units.

Each functional unit is not a separate CPU core but an execution resource within a single CPU such as an arithmetic logic unit, a bit shifter, or a multiplier. 4

Superscalar CPU Architecture

Page 5: Superscalar Architecture_AIUB

SimpleScalar is an open source computer architecture simulator which is written using ‘C’ programming language.

A set of tools that model a virtual computer system with CPU, Cache and Memory Hierarchy.

Using the tool, users can model applications that simulate programs running on a range of modern processors and systems.

The tool set includes sample simulators ranging from a fast functional simulator to a detailed. 5

SimpleScalar Architecture

Page 6: Superscalar Architecture_AIUB

The simplest processors are scalar processors. Each instruction executed by a scalar processor typically manipulates one or two data items at a time.

In a superscalar CPU the dispatcher reads instructions from memory and decides which one can be run in parallel.

Therefore a superscalar processor can be proposed having multiple parallel pipelines, each of which is processing instructions simultaneously from a single instruction thread. 6

Scalar to Superscalar

Page 7: Superscalar Architecture_AIUB

Pipelining is the process of breaking down task into substeps and executing them in different parts of processor.

In order to fully utilise a superscalar processor of degree m with pipelining, m instructions must be executable parallely. This situation may not be true in all clock cycles.

In a superscalar processor, the simple operation latency should require only one cycle, as in the base scalar processor.

7

Pipelining in Superscalar Architecture

Page 8: Superscalar Architecture_AIUB

A SSA processor fetches multiple instructions at a time, and attempts to find nearby instructions that are independent of each other and therefore can be executed in parallel.

Based on the dependency analysis, the processor may issue and execute instructions in an order that differs from that of the original machine code.

The processor may eliminate some unnecessary dependencies by the use of additional registers.

8

Implement Superscalar

Page 9: Superscalar Architecture_AIUB

I: Instructions from 1 to corresponding sequences

Fetch: fetches instructions from memory (ideally one per cycle for Scalar)Decode: reveals instruction operations to be performed and identifies the resources neededExecute: actual processing of operations as indicated by instructionStore (Write Back): writing results into the registers.

9

Superscalar with Scalar Instructions Flow

Page 10: Superscalar Architecture_AIUB

10

Effect of Dependencies

ADD r1, r2 (r1 := r1+r2;)

MOVE r3,r1 (r3 := r1;)

Can fetch and decode second instruction in parallel with first

Can NOT execute second instruction until fi rst is finished

Page 11: Superscalar Architecture_AIUB

11

General Superscalar Organization

Page 12: Superscalar Architecture_AIUB

12

Superscalar Operational Block Diagram

Page 13: Superscalar Architecture_AIUB

13

Instruction Flow in Superscalar Architecture

Page 14: Superscalar Architecture_AIUB

Superpipelining is based on dividing the stages into several sub-stages, and thus increasing the number of instructions which are handled by the pipeline at the same time.

For example, by dividing each stage into two sub-stages, a pipeline can perform at twice the speed in the ideal situation:Tasks that require less than half a clock cycle.No duplication of hardware is needed for these

stages.

14

Superpipelining

Figure: Duplication of

hardware is for Superscalar

Page 15: Superscalar Architecture_AIUB

Base machine: 4-stage pipeline• Instruction fetch• Operation decode• Operation execution• Result write backSuperpipeline of degree 2• A sub-stage often takes

half a clock cycle to finish.

Superscalar of degree 2• Two instructions are

executed.• Duplication of hardware

is required by definition.15

Superscalar vs. Superpipeline

Page 16: Superscalar Architecture_AIUB

16

Superpipelined Superscalar

Superpipeline of degree 3 and superscalar of degree 4:• 12 times speed-up over the

base machine.• 48 times speedup over

sequential execution.This is a new trend of architecture design: • Pentium Pro(P6): 3-degree

superscalar, 12-stage “superpipeline”.

• PowerPC 620: 4-degree superscalar, 4/6-stage pipeline.

Page 17: Superscalar Architecture_AIUB

A Pipeline architecture in

more detail

17

Instruction Flow

A Superscalar microarchitecture

Page 18: Superscalar Architecture_AIUB

18

Instruction Execution

Page 19: Superscalar Architecture_AIUB

Tasks can be divided into the following Parallel decoding Superscalar instruction issue Parallel instruction execution

preserving sequential consistency of exception processing

preserving sequential consistency of execution

19

Superscalar Issues to Consider

Page 20: Superscalar Architecture_AIUB

Parallel decoding – more complex task for scalar processors.

Superscalar instruction issue – A higher issue rate gives rise to higher processor performance, but amplifies the restrictive effects of control and data dependencies on the processor performance.

Parallel instruction execution task – While instructions are executed in parallel, instructions are usually completed out of order in respect to a sequential operating procedure. 20

Superscalar Issues to Consider

Page 21: Superscalar Architecture_AIUB

21

Two way Superscalar Execution

Figure: Two instruction parallel execution in one clock.

Page 22: Superscalar Architecture_AIUB

Instruction-fetch ineffi ciencies caused by both branch delays and instruction misalignment

not worthwhile to explore highly- concurrent execution hardware, rather, it is more appropriate to explore economical execution hardware

degree of intrinsic parallelism in the instruction stream (instructions requiring the same computational resources from the CPU)

complexity and time cost of the dispatcher and associated dependency checking logic

branch instruction processing.22

Limitations of Superscalar

Page 23: Superscalar Architecture_AIUB

[1] Zebo Peng. (2010). General format, “Superscalar Architecture – IDA”, Lecture 3 Introduction to Paral lel Architectures. [2] Wil l iam M. Johnson, “Super-Scalar Processor Design”, Stanford University Since 1989

[3] James E. Smith (1995), “The Microarchitecture of Superscalar Processors”, senior member IEEE [4] Subbarao Palacharla, “Complexity-Eff ective Superscalar Processors”, UNIVERSITY OF WISCONSIN—MADISON, Since 1998 [5] Alaa Alameldeen and Haitham Akkary, "The Microarchitecture of Superscalar Processors", Port land State University, s ince 2014

[5] http://www.techterms.com/defi nit ion/superscalar

[6] https://www.google.com.bd/search?q=superscalar+archit ecture&newwindow=1&tbm=isch&tbo=u&source=univ&sa=X&ei =Ll1oU9umJNWHuASgwoDACA&ved=0CDUQsAQ&biw=1048&bih=921

23

Citation

Page 24: Superscalar Architecture_AIUB