CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... ·...
Transcript of CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... ·...
![Page 1: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/1.jpg)
CPU Modeling and Use for CPU Modeling and Use for Embedded SystemsEmbedded Systems
Lecturer: Trevor MeyerowitzLecturer: Trevor MeyerowitzEE249 Embedded Systems DesignEE249 Embedded Systems DesignProfessor: Professor: Alberto SangiovanniAlberto Sangiovanni--VincentelliVincentelli
October 21October 21stst, 2004, 2004etropolis
![Page 2: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/2.jpg)
2
OutlineOutline
IntroductionIntroductionMotivationMotivationComputer Architecture in 10 Minutes FlatComputer Architecture in 10 Minutes Flat
Processor ModelingProcessor Modeling
Use of Processor Modeling in Embedded SystemsUse of Processor Modeling in Embedded Systems
ConclusionsConclusions
![Page 3: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/3.jpg)
CS252/CullerLec 1.31/22/02
What is “Computer Architecture”?
I/O systemInstr. Set Proc.
Compiler
OperatingSystem
Application
Digital DesignCircuit Design
Instruction SetArchitecture
Firmware
• Coordination of many levels of abstraction• Under a rapidly changing set of forces• Design, Measurement, and Evaluation
Datapath & Control
Layout
![Page 4: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/4.jpg)
CS252/CullerLec 1.41/22/02
The Instruction Set: a Critical Interface
instruction set
software
hardware
![Page 5: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/5.jpg)
CS252/CullerLec 1.51/22/02
Levels of Representation (61C Review)
High Level Language Program
Assembly Language Program
Machine Language Program
Control Signal Specification
Compiler
Assembler
Machine Interpretation
temp = v[k];v[k] = v[k+1];v[k+1] = temp;
lw$15,0($2)lw$16,4($2)sw $16,0($2)sw $15,4($2)
0000 1001 1100 0110 1010 1111 0101 10001010 1111 0101 1000 0000 1001 1100 0110 1100 0110 1010 1111 0101 1000 0000 1001 0101 1000 0000 1001 1100 0110 1010 1111
°°
ALUOP[0:3] <= InstReg[9:11] & MASK
![Page 6: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/6.jpg)
CS252/CullerLec 1.61/22/02
Execution Cycle
InstructionFetch
InstructionDecode
OperandFetch
Execute
ResultStore
NextInstruction
Obtain instruction from program storage
Determine required actions and instruction size
Locate and obtain operand data
Compute result value or status
Deposit results in storage for later use
Determine successor instruction
![Page 7: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/7.jpg)
CS252/CullerLec 1.71/22/02
Fast, Pipelined Instruction Interpretation
Instruction Register
Operand Registers
Instruction Address
Result Registers
Next Instruction
Instruction Fetch
Decode &Operand Fetch
Execute
Store Results
NIIF
DE
W
NIIF
DE
W
NIIF
DE
W
NIIF
DE
W
NIIF
DE
W
Time
Registers or Mem
![Page 8: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/8.jpg)
CS252/CullerLec 1.81/22/02
5 Steps of MIPS DatapathFigure 3.4, Page 134 , CA:AQA 2e
MemoryAccess
WriteBack
InstructionFetch
Instr. DecodeReg. Fetch
ExecuteAddr. Calc
ALU
Mem
ory
RegFile
MU
XM
UX
Data
Mem
ory
MU
X
SignExtend
Zero?
IF/ID
ID/EX
MEM
/WB
EX/M
EM4
Adder
Next SEQ PC Next SEQ PC
RD RD RD WB
Dat
a
• Data stationary control– local decode for each instruction phase / pipeline stage
Next PC
Address
RS1
RS2
Imm
MU
X
![Page 9: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/9.jpg)
CS252/CullerLec 1.91/22/02
Relationship of Caching and Pipelining
ALU
Mem
ory
RegFile
MU
XM
UX
Data
Mem
ory
MU
X
SignExtend
Zero?
IF/ID
ID/EX
MEM
/WB
EX/M
EM4
Adder
Next SEQ PC Next SEQ PC
RD RD RD WB
Dat
a
•
Next PC
Address
RS1
RS2
Imm
MU
X
I-Cache
D-C
ache
![Page 10: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/10.jpg)
CS252/CullerLec 1.101/22/02
A Modern Memory Hierarchy• By taking advantage of the principle of locality:
– Present the user with as much memory as is available in the cheapest technology.
– Provide access at the speed offered by the fastest technology.• Requires servicing faults on the processor
Control
Datapath
SecondaryStorage(Disk)
Processor
Registers
MainMemory(DRAM)
SecondLevelCache
(SRAM)
On-C
hipC
ache
1s 10,000,000s (10s ms)
Speed (ns): 10s 100s100s
GsSize (bytes):
Ks Ms
TertiaryStorage
(Disk/Tape)
10,000,000,000s (10s sec)
Ts
![Page 11: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/11.jpg)
11
The Other 90% of ArchitectureThe Other 90% of Architecture
Longer PipelinesLonger PipelinesThe Prescott Pentium 4 CPU has a 31 stage pipelineThe Prescott Pentium 4 CPU has a 31 stage pipeline
Wider Pipelines and SpeculationWider Pipelines and SpeculationSuperscalar Superscalar –– MultiMulti--issueissueSpeculate with branch predictionSpeculate with branch predictionOut of Order ExecutionOut of Order Execution
Caches and BuffersCaches and BuffersUp to 3 levels of caches + specialized cachesUp to 3 levels of caches + specialized cachesMemory Buffers and Reservation StationsMemory Buffers and Reservation Stations
Multiple EverythingMultiple EverythingMultithreadingMultithreadingMultiprocessor System On ChipMultiprocessor System On Chip
![Page 12: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/12.jpg)
12
OutlineOutline
IntroductionIntroduction
Processor ModelingProcessor ModelingSimpleScalarSimpleScalarLiberty Simulation EnvironmentLiberty Simulation EnvironmentMetropolis Processor ModelingMetropolis Processor Modeling
Use of Processor Modeling in Embedded SystemsUse of Processor Modeling in Embedded Systems
ConclusionsConclusions
![Page 13: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/13.jpg)
13
SimpleScalarSimpleScalar OverviewOverview
The Standard for The Standard for MicroarchitecturalMicroarchitectural SimulationSimulationFirst Released in 1996First Released in 1996Developed by Todd Austin and Doug BurgerDeveloped by Todd Austin and Doug BurgerMultiple Levels of Models for AccuracyMultiple Levels of Models for AccuracyWritten in lowWritten in low--level highlevel high--performance sequential Cperformance sequential C--codecode
Supports a Variety of Instruction SetsSupports a Variety of Instruction SetsAlpha, ARM, PowerPC, (x86)Alpha, ARM, PowerPC, (x86)
Supports a Variety of Supports a Variety of MicroarchitecturalMicroarchitectural FeaturesFeatures
![Page 14: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/14.jpg)
14Taken From: http://www.simplescalar.com
![Page 15: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/15.jpg)
15
SimpleScalarSimpleScalar ToolsuiteToolsuite
Taken From: http://www.simplescalar.com
![Page 16: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/16.jpg)
16Taken From: http://www.simplescalar.com
![Page 17: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/17.jpg)
17Taken From: http://www.simplescalar.com
![Page 18: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/18.jpg)
18Taken From: http://www.simplescalar.com
![Page 19: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/19.jpg)
19Taken From: http://www.simplescalar.com
![Page 20: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/20.jpg)
20
SimpleScalarSimpleScalar ConclusionsConclusions
Solid Framework for Solid Framework for MicroarchitecturalMicroarchitectural ResearchResearchUsed for ~33% of all Computer Architecture PapersUsed for ~33% of all Computer Architecture PapersGood for examining new Good for examining new microarchitecturalmicroarchitectural featuresfeaturesFast and Fast and ReliabileReliabile
ButBut……Difficult to Retarget and ModifyDifficult to Retarget and ModifyMonolithic Monolithic –– hard to use with other toolshard to use with other tools……Core Execution Semantics hard to modifyCore Execution Semantics hard to modifyPurely Sequential MOC.Purely Sequential MOC.
![Page 21: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/21.jpg)
21
Liberty Simulation EnvironmentLiberty Simulation Environment
A NextA Next--Generation Generation MicroarchitecturalMicroarchitectural EnvironmentEnvironmentFrom Prof. David AugustFrom Prof. David August’’s Princeton Research Groups Princeton Research GroupCompiler and Simulator FrameworkCompiler and Simulator FrameworkAdvanced Language FeaturesAdvanced Language Features
Structural SpecificationStructural SpecificationExtended PolymorphismExtended Polymorphism
Custom Model of ComputationCustom Model of ComputationComposabilityComposabilityPotential for Optimized SimulationPotential for Optimized Simulation
![Page 22: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/22.jpg)
22Taken From: http://liberty.cs.princeton.edu
![Page 23: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/23.jpg)
23Taken From: http://liberty.cs.princeton.edu
![Page 24: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/24.jpg)
24Taken From: http://liberty.cs.princeton.edu
![Page 25: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/25.jpg)
25Taken From: http://liberty.cs.princeton.edu
![Page 26: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/26.jpg)
26Taken From: http://liberty.cs.princeton.edu
![Page 27: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/27.jpg)
27Taken From: http://liberty.cs.princeton.edu
![Page 28: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/28.jpg)
28Taken From: http://liberty.cs.princeton.edu
![Page 29: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/29.jpg)
29
Liberty ConclusionsLiberty Conclusions
This improves upon This improves upon SimpleScalarSimpleScalar in thatin that……Modular and StructuralModular and StructuralDomain Specific LanguageDomain Specific LanguageSimulator and Compiler GenerationSimulator and Compiler Generation
ButBut……Large learning curve Large learning curve
Arcane entry languagesArcane entry languagesComplex communication protocolComplex communication protocol
Retargeting capabilities are unclearRetargeting capabilities are unclearStill a monolithic environmentStill a monolithic environment
![Page 30: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/30.jpg)
30
Modeling Microprocessors in MetropolisModeling Microprocessors in Metropolis
Focus on Focus on MicroarchitecturalMicroarchitectural Design Space Exploration in Design Space Exploration in the context of a Systemthe context of a System--Level Design frameworkLevel Design framework
Intuitive MOC and Simplified Modeling MethodologyIntuitive MOC and Simplified Modeling MethodologyConnectivity to other toolsConnectivity to other toolsRetargetabilityRetargetability
OutlineOutlineModeling using Kahn Process NetworksModeling using Kahn Process NetworksARM Processor ModelingARM Processor ModelingInstruction Set RetargetingInstruction Set Retargeting
![Page 31: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/31.jpg)
31
Modeling with YAPI + KPNModeling with YAPI + KPNKahn Process NetworksKahn Process Networks
Processes communicating via Processes communicating via unbounded unbounded FIFOFIFO’’ssBlocking Reads / Unblocking Blocking Reads / Unblocking WritesWritesFully deterministicFully deterministicNo notion of timeNo notion of time
YAPIYAPIExtension of KPNExtension of KPNNonNon--deterministic selectdeterministic selectRefinement to bounded Refinement to bounded FIFOFIFO’’ss
Our Work Our Work CharacteristicsCharacteristics
Synchronous assumptionSynchronous assumptionKeeps FIFO lengths fixedKeeps FIFO lengths fixedSeparation of function and Separation of function and timingtiming
MicroarchitecturalMicroarchitectural ModelsModelsSingle Process ModelSingle Process ModelOut of Order Execution ModelOut of Order Execution Model2 Process ARM Models2 Process ARM Models
XScale + StrongarmAbstract Speculative OOE ModelAbstract Speculative OOE Model
![Page 32: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/32.jpg)
32
FRXW
IC
DC
RF
Single Process YAPI Model*Single Process YAPI Model*
• Single Process Execution Order1. Read operands 2. Execute3. Write to register file
• Synchronous Assumption
•Parameterize the pipeline depth•Add hazard detection and bubbleinsertion (stalls)
Haz
ard
•Add a branch predictor•Pass prediction and PC down pipeline (new channels) •Resolve branch when it commits
Bra
nch
Pred
ict
* Collaboration With: Sam Williams
![Page 33: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/33.jpg)
33
OutOut--ofof--Order ArchitecturesOrder ArchitecturesTomasuloTomasulo style register renamingstyle register renaming
Highly ParameterizableHighly Parameterizable#ports, # integer units, depth, etc. #ports, # integer units, depth, etc.
Broadcast nature handled with multiple copies of Broadcast nature handled with multiple copies of each channeleach channel
ReadWriteIssueReadWriteIssue must maintain knowledge of must maintain knowledge of which instructions can be issued to which which instructions can be issued to which functional unitsfunctional units
All execution units are derived from the Station All execution units are derived from the Station classclass
Fetch
IC
RWI
RF
Ren
ame
…
•• NN--way Super scalar processor was realized by way Super scalar processor was realized by changing the depth of the instruction channel changing the depth of the instruction channel (depth can be treated as width) from Fetch(depth can be treated as width) from Fetch
Arb
MU0RS
DC
IU0RS
IUnRS
* Collaboration With: Sam Williams
![Page 34: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/34.jpg)
34
ARM Modeling OverviewARM Modeling Overview
Separate between Separate between MicroarchitecturalMicroarchitectural Performance Performance Model and Program ExecutionModel and Program ExecutionWe Only Need to ModelWe Only Need to Model
Operand and Condition Code DependenciesOperand and Condition Code DependenciesBranch ResultsBranch ResultsExecution LatenciesExecution LatenciesForwarding LatenciesForwarding Latencies
Trace ContainsTrace ContainsEvery InstructionEvery Instruction
Program CounterProgram CounterRead + Write Operands (including Read + Write Operands (including condcond. codes). codes)Instruction TypeInstruction Type
(Optional) Data Addresses Accessed(Optional) Data Addresses AccessedAdvantagesAdvantages
Higher execution speedHigher execution speedSimplified, reusable Simplified, reusable microarchitecturalmicroarchitectural modelingmodeling
ARM ISS
MicroarchModel
Program Code
Performance Characterization
Cross GCC
Inst.Trace
Exec Statistics
![Page 35: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/35.jpg)
35
Double Process ModelDouble Process ModelNeeded For:Needed For:
Modeling ForwardingModeling ForwardingModeling Variable Instruction LatenciesModeling Variable Instruction Latencies
Leverages Leverages FIFOFIFO’’ss for modeling delaysfor modeling delaysPreexecutionPreexecution Delay Delay
Fetch, Decode, etc.Fetch, Decode, etc.Execution Delay Execution Delay
Multiple Latencies, ForwardingMultiple Latencies, ForwardingSynchronizationSynchronizationStallsStalls
Issue StallsIssue StallsBranch MispredictionICache Misses
Result StallsResult StallsOperand Dependencies
Fetch Process
Execute Process
…
![Page 36: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/36.jpg)
36
Double Process ModelDouble Process Model
Fetch Process
Execute Process
Needed For:Needed For:Modeling ForwardingModeling ForwardingModeling Variable Instruction LatenciesModeling Variable Instruction Latencies
Leverages Leverages FIFOFIFO’’ss for modeling delaysfor modeling delaysPreexecutionPreexecution DelayDelay
Fetch, Decode, etc.Fetch, Decode, etc.Execution DelayExecution Delay
Supports Multiple Latencies, Supports Multiple Latencies, ForwardingForwarding
SynchronizationSynchronization
Preload_fifo();While(true) {
stall = stall_in.read();check_mispredict(stall);if (!stall) {inst = fetch(inst_num);if (inst.type == branch)
branch_pred(inst);inst_out.write(inst);
} }
Preload_each_results_fifo();While(true) {read_results();if (stall == 0)
ReadInst = FetchedInst.read();stall = check_stall();compute_memory();DoStall.write(stall);write_results();cycle_count++;
} }
![Page 37: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/37.jpg)
37
Models with MemoryModels with MemoryFeaturesFeatures
Cache ModelsCache ModelsAssociativeAssociativePerfectPerfectStatisticalStatistical
TranslationTranslation LookasideLookaside BuffersBuffersData Data CacheCache WriteWrite BuffersBuffersSharedShared Bus Bus betweenbetween CachesCaches
CloseClose toto SimplescalarSimplescalar ModelsModels ARMARM15% 15% forfor XScaleXScale25% 25% forfor StrongarmStrongarm
StillStill MissingMissingInstructionInstruction BufferBuffer
Fetch Process
Execute Process
…
I-$
D-$
![Page 38: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/38.jpg)
38
ICacheICache UsageUsage
Fetch Process
Execute Process
…
I-$
D-$
1. Instruction Fetch:get next instruction from trace. IssueStall = instruction.issue_stall
2. Instruction Cache Check:Query Instruction CacheIssueStall += iCache.read(PC)
3. Issue Stalling:Write Bubbles To Fetch Queue for Issue Stall Cycles, then write instruction to Fetch Queue
![Page 39: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/39.jpg)
39
DCacheDCache UsageUsage
Fetch Process
Execute Process
…
I-$
D-$
1. Load/Store Instruction:foreach (inst.data_address)
dCache.checkHit(data_address);
2. L/S Dispatch:if (inCache(addresses))
dispatch to hitQueue;else Dispatch to missQueue;
3. L/S Commit:Upon Completion:
Update dCache state
![Page 40: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/40.jpg)
40
Currently under development in collaboration Currently under development in collaboration with Haibo Zeng and Qi Zhu with Haibo Zeng and Qi Zhu {{zenghbzenghb, , zhuqi}@eecs.berkeley.eduzhuqi}@eecs.berkeley.edu
Adding in Speculation + OOE as an afterthoughts Adding in Speculation + OOE as an afterthoughts can be difficultcan be difficult
Why not begin with it and then constrain to a real Why not begin with it and then constrain to a real implementation?implementation?
Assume Perfect Model (for a given fetch width)Assume Perfect Model (for a given fetch width)Branch predictionBranch predictionPerfect Memory and Register FilesPerfect Memory and Register FilesUnlimited Execution Resources and ForwardingUnlimited Execution Resources and Forwarding
Analyze Performance for Different ApplicationsAnalyze Performance for Different ApplicationsParallelismParallelismResource UsageResource UsageEtc.Etc.
Abstract Exec Unit
Perfect FetchIC
Restrict to an actual implementation by adding constraints
to the model.Perfect RWIRF
Ren
ame
An Abstract Speculative ModelAn Abstract Speculative Model
![Page 41: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/41.jpg)
41
ISA_ML OverviewISA_ML OverviewMain PartsMain Parts
A Visual Instruction Set Description LanguageA Visual Instruction Set Description LanguageCurrently one describes the encoding of instructionsCurrently one describes the encoding of instructionsWritten using GMEWritten using GME**, a UML, a UML--based environment for constructing domain specific modeling envibased environment for constructing domain specific modeling environmentsronments
Generates a C++Generates a C++--based based disassemblerdisassembler and traceand trace--interface code for the given model ISA interface code for the given model ISA description description
Key Features:Key Features:Two high level modelsTwo high level models
ISA StateISA State: Register Files, Memories, Program Counter, etc.: Register Files, Memories, Program Counter, etc.InstructionsInstructions: Encoding and operand fields of each instruction: Encoding and operand fields of each instruction
Intuitive Visual InterfaceIntuitive Visual InterfaceLeverages Hierarchy + Compact RepresentationLeverages Hierarchy + Compact RepresentationExtensive Error CheckingExtensive Error CheckingEasy to Retarget to Output Other Formats (e.g. Easy to Retarget to Output Other Formats (e.g. verilogverilog, , nMLnML, etc), etc)
*GME website: http://www.isis.vanderbilt.edu/projects/gme/
ResultsResults MIPS Integer Subset PowerPC Integer Subset ARM (approximate)
Base Instructions 10 12 6
Actual Instructions 55 80 26
Hours to Enter (appox.) 8 6 5
Header File (# lines) 1357 2134 759
To Appear in 2004 OOPSLA Workshop on Domain Specific Modeling, October 24th, 2004Title: A Visual Language for Describing Instruction Sets and Generating DecodersAuthors: T. Meyerowitz, J. Sprinkle, A. Sangiovanni-Vincentelli
![Page 42: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/42.jpg)
42
BitfieldBitfieldOperandsOperands
NumBitsNumBits
EncodingEncoding
SingleEncodingSingleEncoding
AnchorsAnchors AnchorPointAnchorPoint
ConnectionConnection Specifies the Specifies the ordering of ordering of bitfieldsbitfields
Begin Anchor End Anchor Custom Anchor
Instruction Memory Ref Immediate Constant Register Ref
Ordering Connection
ISA_ML Instruction Elements
State ElementsState Elements WordSizeWordSize
Address BitsAddress Bits
Program CounterProgram Counter Source RegisterSource Register
ISA_ML State Elements
![Page 43: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/43.jpg)
43
Sample Instructions: Base InstructionSample Instructions: Base Instruction
RdRm RnReg-Base InstEndConfigBase
InstBegin
InstructionInstruction 00……11 2...52...5 66……99 10..1310..13 14..1714..17 18..2118..21 22..2322..23 24..2724..27 28..3128..31
Arith BaseArith Base 1111 xxxxxxxx RmRm RnRn xxxxxxxx xxxxxxxx xxxx xxxxxxxx RdRd
AddAdd 1111 01110111 RmRm RnRn xxxxxxxx xxxxxxxx xxxx configconfig RdRd
SubtractSubtract 1111 00010001 RmRm RnRn xxxxxxxx xxxxxxxx 0000 configconfig RdRd
MACMAC 1111 00110011 RmRm RnRn RmacRmac xxxxxxxx xxxx xxxxxxxx RdRd
InsReg
InsCon
ArithmeticBase
![Page 44: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/44.jpg)
44
Addition
Sample Instructions: Other InstructionsSample Instructions: Other Instructions
InstructionInstruction 00……11 2...52...5 66……99 10..1310..13 14..1714..17 18..2118..21 22..2322..23 24..2724..27 28..3128..31
Arith BaseArith Base 1111 xxxxxxxx RmRm RnRn xxxxxxxx xxxxxxxx xxxx xxxxxxxx RdRd
AdditionAddition 1111 01110111 RmRm RnRn xxxxxxxx xxxxxxxx xxxx configconfig RdRd
SubtractractionSubtractraction 1111 00010001 RmRm RnRn xxxxxxxx xxxxxxxx 0000 configconfig RdRd
Multiply AccumulateMultiply Accumulate 1111 00110011 RmRm RnRn RmacRmac xxxxxxxx xxxx xxxxxxxx RdRd
ConfigBeginConstants
InsReg
ConIns
ArithmeticBase
Subtraction
InsReg
InsCon
ArithmeticBaseBeginConstants
Config
InsReg
InsCon
ArithmeticBaseBeginConstants Rmac
Multiply Accumulate
![Page 45: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/45.jpg)
45
Modeling Microprocessors in Metropolis: ConclusionsModeling Microprocessors in Metropolis: Conclusions
Improvements on Prior ApproachesImprovements on Prior ApproachesMore AbstractMore AbstractMore More RetargetableRetargetable and Modularand ModularMethodology for Refinement and DSEMethodology for Refinement and DSE
ButBut……Ongoing accuracy comparison with other toolsOngoing accuracy comparison with other toolsPerformance needs to improvePerformance needs to improveCurrently requires an external ISS to drive itCurrently requires an external ISS to drive it
![Page 46: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/46.jpg)
46
OutlineOutline
IntroductionIntroduction
Processor ModelingProcessor Modeling
Use of Processor Modeling in Embedded SystemsUse of Processor Modeling in Embedded SystemsDifferent Levels of ModelingDifferent Levels of ModelingCoCo--SimulationSimulationBackBack--AnnotationAnnotation
ConclusionsConclusions
![Page 47: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/47.jpg)
47
Accuracy Accuracy vsvs Performance Performance vsvs CostCost
Hardware Emulation
Cycle accurate model
Cycle counting ISS
Static spreadsheet
Dynamic estimation
Accuracy Speed $$$*
+++ -----+-
++ --++ + -+
-
++ ++
+++ +++*$$$ = NRE + per model + per design
![Page 48: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/48.jpg)
48
Traditional Traditional CosimulationCosimulation
AdvantagesAdvantagesAllows prototyping without actual Allows prototyping without actual hardwarehardwareConsistency between HW and SW Consistency between HW and SW modelsmodels
DisadvantagesDisadvantagesOverhead for having 2+ simulatorsOverhead for having 2+ simulatorsOften requires custom Often requires custom microprocessor modelsmicroprocessor modelsDoesnDoesn’’t scale well for t scale well for multiprocessor systemsmultiprocessor systems
Microprocessor Simulator
Hardware Simulator
SoftwarePrograms
Hardware DescriptionsInterfacing
Simulator
![Page 49: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/49.jpg)
49
CoCo--Simulation in MetropolisSimulation in Metropolis
Functional (ISS) Model
TimingModel
uArch ModelApplication
Process
Program Code, Sync Points, and Data Arguments
Performance Information
OtherApplication Processes
Application Mapping Architecture
Other Architecture Components
![Page 50: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/50.jpg)
50
Back Annotation RequirementsBack Annotation RequirementsUserUser--specified level of granularityspecified level of granularityFlexibility for handling nonFlexibility for handling non--trivial interactionstrivial interactions
RTOSRTOS’’ss, Interrupts, Pipelining, Intra, Interrupts, Pipelining, Intra--process variationprocess variationNatural + Flexible SyntaxNatural + Flexible SyntaxFunction with Function with MetamodelMetamodel and Native Codeand Native Code
Our Proposed SolutionOur Proposed SolutionTwo functions to annotate the model codeTwo functions to annotate the model code
CPU.CPU.BackAnnotateBackAnnotate(begin_label(begin_label, , end_labelend_label, atomicity, (arguments)), atomicity, (arguments))CPU.CPU.BB_BackAnnotateBB_BackAnnotate(begin_label(begin_label, , end_labelend_label, atomicity, (arguments)), atomicity, (arguments))
Handle complicated features at the systemHandle complicated features at the system--levellevel
Backwards AnnotationBackwards Annotation
![Page 51: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/51.jpg)
51
ISS Model
TimingModel
uArch ModelApplication
Process
Program Code, Sync Points, and Data Arguments
Mapping Process
Sync
Back Annotation: Overall PictureBack Annotation: Overall Picture
Annotated Mapping Process
Annotated Mapping Process
![Page 52: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/52.jpg)
52
Back Annotation: ExampleBack Annotation: Example
In_data = InPort.ReadInputs();
Out_data = do_processing(In_data);
OutPort.WriteOutPuts()
CPU.read(IN_DATA_SIZE);
CPU.execute(CPU.back_annotate( do_processing.begin, do_processing.end, true));
CPU.write(OUT_DATA_SIZE)
uArch Model and
Back Annotator
CPU.execute(EXEC_TIME);
Application Process Mapping Process
CPU
![Page 53: CPU Modeling and Use for Embedded Systemsembedded.eecs.berkeley.edu/research/hsc/class/ee249/... · 2004. 10. 25. · Processor Modeling ... Lec 1.4 1/22/02 The Instruction Set: a](https://reader033.fdocuments.us/reader033/viewer/2022052006/601a0b2ecbc2046dae270b06/html5/thumbnails/53.jpg)
53
Final WordsFinal WordsSoftware is a key component in Embedded SystemsSoftware is a key component in Embedded Systems
Fast and Accurate Modeling is KeyFast and Accurate Modeling is KeyTime isnTime isn’’t the only factor to considert the only factor to consider
Power, Memory Usage, Communication Usage, etc.Power, Memory Usage, Communication Usage, etc.
Traditional Traditional MicroarchitecturalMicroarchitectural Environments are UnsuitableEnvironments are UnsuitableMonolithic designsMonolithic designsRetargeting and integration issuesRetargeting and integration issues
WeWe’’re developing an integrated approach within Metropolisre developing an integrated approach within Metropolis
What we havenWhat we haven’’t coveredt coveredSoftware Performance Estimation (Coming SoonSoftware Performance Estimation (Coming Soon……))
Estimate based on application, computation, and communicationEstimate based on application, computation, and communicationArchitecture Description LanguagesArchitecture Description LanguagesCommercial OfferingsCommercial Offerings
Mentor Graphics Mentor Graphics -- SeamlessSeamlessCoWareCoWare -- ConvergenceSCConvergenceSC + + LISAtekLISAtekVaSTVaST SystemsSystemsEt al.Et al.