Precision Timed Embedded Systems Using TickPAD Memory

Matthew M Y Kuo*, Partha S Roop*, Sidharta Andalam†, Nitish Patel*
*University of Auckland, New Zealand
†TUM CREATE, Singapore


Introduction
- Hard real-time systems
  - Need to meet real-time deadlines
  - Catastrophic events may occur when missed
- Synchronous execution approach
  - Good for hard real-time systems
    - Deterministic
    - Reactive
  - Aids static timing analysis
    - Well-bounded programs
    - No unbounded loops or recursion

Synchronous Languages
- Execute in logical time
  - Ticks
  - Sample input -> computation -> emit output
- Synchronous hypothesis
  - Ticks are instantaneous
  - Assumes the system executes infinitely fast
  - System is faster than the environment response
- Worst case reaction time (WCRT)
  - Time between two logical ticks
- Languages
  - Esterel
  - Scade
  - PRET-C
    - Extension to C
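The sample, compute, emit cycle can be pictured with a small Python model. This is only an illustrative sketch: `run_ticks`, `accumulate` and `satisfies_synchronous_hypothesis` are hypothetical names and not part of Esterel, Scade or PRET-C, and the per-tick costs are made-up inputs.

```python
from typing import Callable, Iterable, List, Tuple


def run_ticks(
    inputs: Iterable[int],
    reaction: Callable[[int, int], Tuple[int, int]],
    state: int = 0,
) -> List[int]:
    """Run one logical tick per input sample: sample, compute, emit."""
    outputs = []
    for sample in inputs:                      # 1. sample the input
        state, out = reaction(state, sample)   # 2. compute the reaction
        outputs.append(out)                    # 3. emit the output
    return outputs


def accumulate(state: int, sample: int) -> Tuple[int, int]:
    """Example reaction: keep a running sum and emit it every tick."""
    new_state = state + sample
    return new_state, new_state


def satisfies_synchronous_hypothesis(tick_costs, min_input_period):
    """The longest tick (the WCRT) must not exceed the fastest input period."""
    return max(tick_costs) <= min_input_period


print(run_ticks([1, 2, 3], accumulate))
```

The last function states the practical reading of the synchronous hypothesis: the system counts as "infinitely fast" as long as its worst tick is shorter than the shortest interval between environment events.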

PRET-C
- Light-weight multithreading in C
- Provides thread-safe memory access
- C extension implemented as C macros

Statement                Meaning
ReactiveInput I          Declares I as a reactive environment input
ReactiveOutput O         Declares O as a reactive environment output
PAR(T1, ..., Tn)         Synchronously executes n threads in parallel, where thread Ti has a higher priority than Ti+1
EOT                      Marks the end of a tick
[weak] abort P when C    Preempts P when C is true
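The PAR and EOT semantics in the table can be mimicked with Python generators. This is a rough model under stated assumptions, not the PRET-C macro implementation: each `yield` stands in for EOT, the list order stands in for thread priority, and `par` and `make_thread` are hypothetical helper names.

```python
from typing import Callable, Generator, List

Thread = Callable[[List[str]], Generator[None, None, None]]


def par(threads: List[Thread], log: List[str]) -> int:
    """Run threads tick by tick; list order is priority order.

    Each `yield` inside a thread models EOT. Returns the number of ticks.
    """
    active = [thread(log) for thread in threads]
    ticks = 0
    while active:
        ticks += 1
        log.append(f"tick {ticks}")
        still_active = []
        for running in active:            # highest priority first
            try:
                next(running)             # run until the next EOT
                still_active.append(running)
            except StopIteration:         # thread terminated in this tick
                pass
        active = still_active
    return ticks


def make_thread(name: str, local_ticks: int) -> Thread:
    """Build a thread that computes and then reaches EOT, `local_ticks` times."""
    def body(log: List[str]):
        for i in range(local_ticks):
            log.append(f"{name}: compute {i}")
            yield                         # EOT
    return body


trace: List[str] = []
total = par([make_thread("t1", 2), make_thread("t2", 1)], trace)
print(total, trace)
```

In this model a thread terminates in the tick after its last EOT, so a thread with two EOT statements stays alive for three ticks.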

Introduction
- Practical systems require larger memory
  - Not all applications fit in on-chip memory
  - Require a memory hierarchy
- Processor-memory gap [1]

[1] Hennessy, John L., and David A. Patterson. Computer Architecture: A Quantitative Approach. San Francisco, CA: Morgan Kaufmann, 2011.

Introduction
- Traditional approaches
  - Caches
  - Scratchpads
- However, there is scant research on memory architectures tailored for synchronous execution and concurrency.

Caches
(Figure: CPU, cache, main memory)
- Traditionally, caches
  - Small, fast piece of memory
  - Temporal locality
  - Spatial locality
- Hardware controlled
  - Replacement policy

Caches
- Hard real-time systems
  - Need to model the architecture
  - Compute the WCRT
- Cache models
  - Trade-off between length of computation time and tightness
  - A very tight worst case estimate is not scalable

Scratchpad
(Figure: CPU, SPM, main memory)
- Scratchpad Memory (SPM)
  - Software controlled
  - Statically allocated
    - Statically or dynamically loaded
  - Requires an allocation algorithm
    - e.g. ILP, greedy

Scratchpad
- Hard real-time systems
  - Easy to compute a tight WCRT
  - Reduces the worst case performance
  - Balance between the number of reload points and overheads
  - May perform worse than a cache in the worst case

TickPAD
(Figure: CPU, main memory, SPM, cache)
- Cache
  - Good at overall performance
  - Hardware controlled
- SPM
  - Good at worst case performance
  - Easy for fast and tight static analysis

TickPAD
(Figure: CPU, TPM, main memory)
- TickPAD Memory (TPM)
  - TickPAD: Tick Precise Allocation Device
  - Memory controller
  - Hybrid between caches and scratchpads
    - Hardware controlled features
    - Static software allocation
  - Tailored for synchronous languages
  - Instruction memory

TickPAD Design flow

PRET-C

    int main() {
        init();
        PAR(t1, t2, t3);
        ...
    }

    void thread t1() {
        compute;
        EOT;
        compute;
        EOT;
    }

(Figure: control-flow graph of main, t1, t2 and t3, annotated step by step)
- Computation
- Spawn children threads
- End of tick: synchronization boundaries
- Child thread terminates
- Main thread resumes

PRET-C Execution

(Figure: execution timeline of main, t1, t2 and t3 against time. Labels: sample inputs, local tick of each thread, emit outputs, 1 tick (reaction time).)

Assumptions
(Figure: one cache line holding the instructions at 0x00, 0x04, 0x08 and 0x0C)

- 4 instructions = 1 cache line
  - Takes 1 burst transfer from main memory
  - A cache miss takes 38 clock cycles [2]
- Each instruction takes 2 cycles to execute
- Buffers are 1 cache line in size

[2] J. Whitham and N. Audsley. The Scratchpad Memory Management Unit for Microblaze: Implementation, Testing, and Case Study. Technical Report YCS-2009-439, University of York, 2009.

TickPAD - Overview
- Spatial memory pipeline
  - To accelerate linear code
- Associative loop memory
  - For predictable temporal locality
  - Statically allocated and dynamically loaded
- Tick address queue
  - Stores the resumption addresses of active threads
- Tick instruction buffer
  - Stores the instructions at the resumption of the next active thread
  - To reduce context switching overhead at state/tick boundaries
- Command table
  - Stores a set of commands to be executed by the TickPAD controller
- Command buffer
  - A buffer to store operands fetched from main memory
    - For commands requiring 2+ operands

Spatial Memory Pipeline
- Cache: on miss
  - Fetches from main memory into the cache
  - First instruction misses, subsequent instructions on that line hit
  - Requires the history of the cache for timing analysis
- Scratchpad: unallocated
  - Executes from main memory
    - Miss cost for all instructions
  - Simple timing analysis

Spatial Memory Pipeline
(Figure: CPU, spatial memory pipeline, main memory)
- Memory controller
  - Single line buffer
- Simple analysis
  - Analyse the previous instruction
- First instruction misses, subsequent instructions on that line hit
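The difference between the line-buffered case and the unallocated scratchpad case can be put into numbers with the slide's assumptions (4 instructions per line, 38 cycles per miss, 2 cycles per instruction). The sketch below is illustrative only: the function names are hypothetical, the 16-byte line size is inferred from the 0x00 to 0x0C addresses, and charging the full 38 cycles for every single-instruction fetch from main memory is an assumption made here, not a figure from the slides.

```python
LINE_INSTRUCTIONS = 4   # instructions per cache line (slide assumption)
MISS_CYCLES = 38        # one burst transfer from main memory (slide assumption)
EXEC_CYCLES = 2         # cycles to execute one instruction (slide assumption)
LINE_BYTES = 16         # assumed: 4 instructions x 4 bytes (0x00 .. 0x0C)


def line_base(address: int) -> int:
    """Address of the first instruction of the line containing `address`."""
    return address - (address % LINE_BYTES)


def line_cost_buffered() -> int:
    """Cache or single line buffer: the first instruction misses, the rest hit."""
    return MISS_CYCLES + LINE_INSTRUCTIONS * EXEC_CYCLES


def line_cost_unallocated_spm(access_cycles: int = MISS_CYCLES) -> int:
    """Unallocated scratchpad: every instruction pays a main-memory access."""
    return LINE_INSTRUCTIONS * (access_cycles + EXEC_CYCLES)


print(line_cost_buffered(), line_cost_unallocated_spm())
```

Neither function needs any execution history, which is the property the slide calls simple timing analysis; a cache model would additionally need to know which lines are still resident.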

Spatial Memory Pipeline
- Computation requires many lines of instructions
- Exploit spatial locality
  - Predictably prefetch the next line of instructions
  - Add another buffer

Spatial Memory Pipeline
- To preserve determinism
  - Prefetch is only active if there is no branch
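The prefetch rule can be checked on a short trace. The following is a simplified model, assuming 16-byte lines and a single prefetch slot; `classify_fetches` is a hypothetical name, and the addresses are borrowed from the slide figures (linear code at 0x310 to 0x330, then a branch to 0x3B0).

```python
LINE_BYTES = 16  # assumed line size: 4 instructions x 4 bytes


def classify_fetches(trace):
    """Label each executed line as 'miss' or 'prefetched'.

    `trace` is a list of (line_address, has_branch) pairs in execution order.
    """
    labels = []
    prefetched = None  # line address currently held by the fetch buffer
    for address, has_branch in trace:
        labels.append("prefetched" if address == prefetched else "miss")
        # Prefetch the next sequential line only if this line has no branch.
        prefetched = None if has_branch else address + LINE_BYTES
    return labels


example = [(0x310, False), (0x320, False), (0x330, True), (0x3B0, False)]
print(classify_fetches(example))
```

Because prefetching is switched off for any line that contains a branch, the outcome depends only on the line being executed, never on which way the branch went.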


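Spatial Memory Pipeline
- Timing analysis
  - Simple to analyse
    - Analyse the next instruction line
    - If the current line has a branch, the next target line will miss
      - e.g. 38 clock cycles
    - Else the next line will be prefetched
      - e.g. 38 - 8 = 30 clock cycles (the 8 cycles are the 4 instructions x 2 cycles of the current line, which overlap with the prefetch)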
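The two cases of the timing rule translate directly into a per-line cost function. This is a minimal sketch under the slide's numeric assumptions; `next_line_stall` and `path_cycles` are hypothetical names, and the model additionally assumes that all four instructions of every line are executed.

```python
MISS_CYCLES = 38        # slide assumption: cost of fetching one line
LINE_INSTRUCTIONS = 4   # slide assumption: instructions per line
EXEC_CYCLES = 2         # slide assumption: cycles per instruction


def next_line_stall(current_has_branch: bool) -> int:
    """Stall before the next line, decided by the current line alone."""
    if current_has_branch:
        return MISS_CYCLES                                # no prefetch: full miss
    return MISS_CYCLES - LINE_INSTRUCTIONS * EXEC_CYCLES  # prefetch overlaps execution


def path_cycles(has_branch_flags) -> int:
    """Total cycles along a path of lines; the first line always misses."""
    total = 0
    stall = MISS_CYCLES
    for has_branch in has_branch_flags:
        total += stall + LINE_INSTRUCTIONS * EXEC_CYCLES
        stall = next_line_stall(has_branch)
    return total


print(path_cycles([False, False, True, False]))
```

For the four-line path in the usage line, the per-line costs are 46, 38, 38 and 46 cycles.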

Tick Address Queue and Tick Instruction Buffer
- Reduce the cost of context switching
- Maintains a priority queue
  - Thread execution order
- Prefetches instructions from the next thread
- Makes context switching points appear as linear code
  - Paired with the Spatial Memory Pipeline


Context switching memory cost same as linear code


Timing analysis
- Same prefetch lines for allocated context switching points
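One way to picture the interplay of the two structures is a queue of resumption addresses whose head is always mirrored in the instruction buffer. The class below is a loose software analogy, not the hardware design: it assumes addresses are loaded in thread execution order (so a FIFO is enough to model the priority queue), and `TickAddressQueue`, `load` and `context_switch` are hypothetical names.

```python
from collections import deque


class TickAddressQueue:
    """Queue of thread resumption addresses plus a one-line instruction buffer."""

    def __init__(self):
        self._queue = deque()
        self.instruction_buffer = None  # line address held by the buffer

    def _fill(self):
        # Keep the first line of the next thread to run in the buffer.
        self.instruction_buffer = self._queue[0] if self._queue else None

    def load(self, resume_address: int) -> None:
        """Append a resumption address; insertion order is execution order."""
        self._queue.append(resume_address)
        self._fill()

    def context_switch(self):
        """Return (next PC, whether its first line was already buffered)."""
        address = self._queue.popleft()
        was_buffered = self.instruction_buffer == address
        self._fill()
        return address, was_buffered


taq = TickAddressQueue()
for resume in (0x2F0, 0x4F0, 0x980):
    taq.load(resume)
print(taq.context_switch())
```

Since the buffer always holds the line of the thread that runs next, a context switch finds its target line already fetched, which is why the slides treat it like linear code.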

Associative Loop Memory
- Statically allocated
  - Greedy
    - Allocates the innermost loop first
- Fetches the loop before executing it
  - Predictable: easy and tight to model
- Exploits temporal locality
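A greedy, innermost-first allocation could look like the sketch below. The slides do not give the allocator's details, so this rests on assumptions: loops are treated as independent regions measured in cache lines, capacity is a single line budget, and `Loop` and `allocate_loops` are hypothetical names.

```python
from typing import List, NamedTuple


class Loop(NamedTuple):
    name: str
    depth: int   # nesting depth; a larger value means more deeply nested
    lines: int   # loop size in cache lines


def allocate_loops(loops: List[Loop], capacity_lines: int) -> List[str]:
    """Greedy allocation: visit innermost loops first, keep each one that fits."""
    allocated = []
    free = capacity_lines
    for loop in sorted(loops, key=lambda candidate: -candidate.depth):
        if loop.lines <= free:
            allocated.append(loop.name)
            free -= loop.lines
    return allocated


candidates = [Loop("outer", 1, 6), Loop("inner", 2, 3), Loop("other", 1, 2)]
print(allocate_loops(candidates, capacity_lines=5))
```

Because the choice is made statically, the analysis knows for every loop whether its body is served from the loop memory or from main memory.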

Command Table
- Statically allocated
- A look-up table to dynamically load the
  - Tick Instruction Buffer
  - Tick Queue
  - Associative Loop Memory
- Commands are executed when the PC matches the address stored in the command
- Allows the TickPAD to function without modification to the source code
  - Libraries
  - Proprietary programs

Command Table
- Three fields
  - Address
    - The PC address at which to execute the command
  - Command
    - Discard Loop Associative Memory
    - Store Loop Associative Memory
    - Fill Tick Instruction Buffer
    - Load Tick Address Queue
  - Operand
    - Data used by the command
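The three fields and the PC-match rule map naturally onto a small look-up structure. The following is a sketch only: the field and command names follow the slide, while `Entry`, `CommandTable`, `on_fetch` and the example addresses are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Command(Enum):
    DISCARD_LOOP = "Discard Loop Associative Memory"
    STORE_LOOP = "Store Loop Associative Memory"
    FILL_TIB = "Fill Tick Instruction Buffer"
    LOAD_TAQ = "Load Tick Address Queue"


@dataclass(frozen=True)
class Entry:
    address: int      # PC value that triggers the command
    command: Command  # what the controller should do
    operand: int      # data used by the command, e.g. a target address


class CommandTable:
    """Look-up table from PC address to the commands stored for it."""

    def __init__(self, entries: List[Entry]):
        self._by_address: Dict[int, List[Entry]] = {}
        for entry in entries:
            self._by_address.setdefault(entry.address, []).append(entry)

    def on_fetch(self, pc: int) -> List[Entry]:
        """Return the commands to run when the PC reaches `pc` (may be empty)."""
        return self._by_address.get(pc, [])


table = CommandTable([
    Entry(0x2C0, Command.LOAD_TAQ, 0x2F0),
    Entry(0x2C0, Command.FILL_TIB, 0x2F0),
])
print([entry.command.value for entry in table.on_fetch(0x2C0)])
```

Keeping the commands in a side table keyed by address is what lets the program binary stay unmodified.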

Command Table Allocation

Node     Command                                                            Address
FORK     Load Tick Address Queue x N; Fill Tick Instruction Buffer          Address of FORK
EOT      Load Tick Address Queue; Fill Tick Instruction Buffer              Address of EOT
KILL     Fill Tick Instruction Buffer                                       Address of KILL
Loops    Discard Loop Associative Memory; Store Loop Associative Memory     Address at start of loop
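The allocation table reads as a rule set from node kind to commands, which the short sketch below encodes. It is an illustration, not the authors' tool: `commands_for_node`, the string node kinds and the example address are hypothetical, and N is taken to be the number of child threads of the FORK.

```python
from typing import List, Tuple

LOAD_TAQ = "Load Tick Address Queue"
FILL_TIB = "Fill Tick Instruction Buffer"
DISCARD_LOOP = "Discard Loop Associative Memory"
STORE_LOOP = "Store Loop Associative Memory"


def commands_for_node(kind: str, address: int, children: int = 0) -> List[Tuple[int, str]]:
    """Return (address, command) pairs for one node, following the table."""
    rules = {
        "FORK": [LOAD_TAQ] * children + [FILL_TIB],  # one queue load per child
        "EOT": [LOAD_TAQ, FILL_TIB],
        "KILL": [FILL_TIB],
        "LOOP": [DISCARD_LOOP, STORE_LOOP],          # address = start of loop
    }
    if kind not in rules:
        raise ValueError(f"unknown node kind: {kind}")
    return [(address, command) for command in rules[kind]]


print(commands_for_node("FORK", 0x2C0, children=3))
```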

Results
- WCRT reduction
  - 8.5% compared with locked SPMs
  - 12.3% compared with thread multiplexed SPM
  - 13.4% compared with direct mapped caches

Results - Synthesis

Conclusion
- Presented a new memory architecture
  - Tailored for synchronous programs
  - Has better worst case performance
  - Analysis time is scalable
    - Between scratchpad and abstract cache analysis
- The presented architecture is also suitable for other synchronous languages
- Future work
  - Data TickPAD
  - TickPAD on multicores

Thank You

Figure labels recovered from the slides

- TickPAD design flow: PRET-C Program, Graph Construction, TCCFG, Reachability Analysis, TickPAD Allocation Analysis, TickPAD Configuration File, Updated TCCFG, TickPAD Timing Analysis, Worst Case Reaction Time.
- TickPAD block diagram: Main Memory, Control Logic, Command Buffer, Tick FIFO, Associative Loop Memory, Spatial Memory Pipeline (SMP Buffer 1, SMP Buffer 2, Toggle), Branch Instruction Check (hasBranch), TAG, Demux, WriteEn, clk; signals Address[32], Instruction[32], ADDR[TAG], ADDR[Block Offset].
- Spatial Memory Pipeline example: TCCFG nodes 6 and 7 (lines 0x310, 0x320, 0x330 and 0x3B0); cases Linear Code and Branch; rows Execute Buffer, Fetch Buffer, Processor Execution; states Fetching, Stall, Disabled.
- Tick Address Queue and Tick Instruction Buffer example: steps I to VII over lines 0x2A0 to 0x310, 0x4F0 to 0x520 and 0x980 to 0x9A0; queue entries *2F0, *4F0, *980, *310; buffered lines i2F0, i4F0; rows Execute Buffer, Fetch Buffer, Tick Instruction Buffer, Processor Execution; states Fetching, Stall, Disabled, Empty, Invalid.
- Command table allocation example: TCCFG fragments with lines 0x2F0 to 0x3B0, 0x4F0 to 0x520 and 0x980 to 0x9B0, and addresses 0x4B4, 0x944 and 0xAE4.