Internals of SimpleScalar Simulators
Tutorial Overview
Internals of SimpleScalar Simulators
CPEG323 Tutorial
Long Chen
November, 2005
Outline
- The SimpleScalar Instruction Set
- Internal structure of the SimpleScalar simulator
- Software architecture of the simulator
- Some important modules
- About the project
The SimpleScalar Instruction Set
- Clean and simple instruction set architecture: MIPS + more addressing modes
- Bi-endian instruction set definition
  - Facilitates portability, build to match host endian
- 64-bit inst encoding facilitates instruction set research
  - 16-bit space for hints, new insts, and annotations
  - Four operand instruction format, up to 256 registers
Simulation Architected State
Simulator Structure
Simulator Software Architecture
- Interface programming style
  - All .c files have an accompanying .h file with the same base name
  - .h files define public interfaces exported by the module
    - Mostly stable, documented with comments, study these files
  - .c files implement the exported interfaces
    - Not as stable, study these if you need to hack the functionality
- Simulator modules
  - sim-*.c files, each implements a complete simulator core
- Reusable S/W components facilitate rolling your own
  - System components
  - Simulation components
  - Additional really useful components
Brief Source Roadmap
Start point: main.c
Simulator cores: sim-fast, sim-safe
Loader: loader.[c,h]
Memory: memory.[c,h]
Register: regs.[c,h]
ISA Def: machine.def
ISA routines: machine.[c,h]
System Call: syscall.[c,h]
Cache: cache.[c,h]
Options parsing: options.[c,h]
Machine Definition File (machine.def)
- A single file describes all aspects of the architecture
  - Used to generate decoders, dependency analyzers, functional components, disassemblers, appendices, etc.
  - e.g., machine definition + ~30 line main = functional simulator
- Generates fast and reliable code with minimum effort
- Instruction definition example:

    #define OR_IMPL \
      { \
        SET_GPR(RD, GPR(RS) | GPR(RT)); \
      }
    DEFINST(OR, 0x50, "or", "d,s,t", IntALU, F_ICOMP,
            DGPR(RD), DNA, DGPR(RS), DGPR(RT), DNA)
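Annotations for the fields of the DEFINST example above:
- semantics: OR_IMPL
- opcode: 0x50
- disassembly template and operands: "or", "d,s,t"
- FU reqs: IntALU
- inst flags: F_ICOMP
- output deps: DGPR(RD), DNA
- input deps: DGPR(RS), DGPR(RT), DNA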
SimpleScalar ISA Module (machine.[hc])
- Macros to expedite the processing of instructions
- Constants needed across simulators, for example, the size of the register file
- Examples:

    /* returns the opcode field value of SimpleScalar instruction INST */
    #define MD_OPFIELD(INST)          (INST.a & 0xff)
    #define MD_SET_OPCODE(OP, INST)   ((OP) = ((INST).a & 0xff))

    /* inst -> enum md_opcode mapping, use this macro to decode insts */
    #define MD_OP_ENUM(MSK)           (md_mask2op[MSK])

    /* enum md_opcode -> description string */
    #define MD_OP_NAME(OP)            (md_op2name[OP])

    /* enum md_opcode -> opcode operand format, used by disassembler */
    #define MD_OP_FORMAT(OP)          (md_op2format[OP])

    /* enum md_opcode -> opcode flags, used by simulators */
    #define MD_OP_FLAGS(OP)           (md_op2flags[OP])

    /* disassemble an instruction */
    void md_print_insn(md_inst_t inst, md_addr_t pc, FILE *stream);
Instruction Field Accessors
Instruction Semantics Specification
Main Loop (sim-fast.c)
    /* set up initial default next PC */
    regs.regs_NPC = regs.regs_PC + sizeof(md_inst_t);

    while (TRUE)
      {
        /* maintain $r0 semantics */
        regs.regs_R[MD_REG_ZERO] = 0;

        /* keep an instruction count */
    #ifndef NO_INSN_COUNT
        sim_num_insn++;
    #endif /* !NO_INSN_COUNT */

        /* load instruction */
        MD_FETCH_INST(inst, mem, regs.regs_PC);

        /* decode the instruction */
        MD_SET_OPCODE(op, inst);

        /* execute the instruction */
        switch (op)
          {
    #define DEFINST(OP,MSK,NAME,OPFORM,RES,FLAGS,O1,O2,I1,I2,I3) \
          case OP: \
            SYMCAT(OP,_IMPL); \
            break;
    #include "machine.def"
          default:
            panic("attempted to execute a bogus opcode");
          }

        /* execute next instruction */
        regs.regs_PC = regs.regs_NPC;
        regs.regs_NPC += sizeof(md_inst_t);
      }
Each instruction is executed in one shot here (fetch, decode, and execute in a single loop iteration); compare this with the pipelined approach in the project requirement
Memory Module (memory.[hc])
Functions for reading from, writing to, initializing and dumping
the contents of the main memory
Register Module (regs.[hc])
- Functions to initialize the register files and dump their contents
- Access non-speculative registers directly
  - e.g., regs_R[5] = 12;
  - e.g., regs_F.f[4] = 23.5;
- Floating point register file supports three views
  - integer word
  - single-precision, double-precision

    /* floating point register file format */
    union regs_FP_t {
      md_gpr_t l[MD_NUM_FREGS];            /* integer word view */
      md_SS_FLOAT_TYPE f[SS_NUM_REGS];     /* single-precision FP view */
      SS_DOUBLE_TYPE d[SS_NUM_REGS/2];     /* double-precision FP view */
    };

    /* floating point register file */
    extern union md_regs_FP_t regs_F;

    /* (signed) hi register, holds mult/div results */
    extern SS_WORD_TYPE regs_HI;

    /* (signed) lo register, holds mult/div results */
    extern SS_WORD_TYPE regs_LO;

    /* program counter */
    extern SS_ADDR_TYPE regs_PC;
Loader Module (loader.[hc])
Other Modules
- cache.[hc]: general functions to support multiple cache types (pay attention to this part for the coming project)
- misc.[hc]: numerous useful support functions, such as warn(), info(), elapsed_time()
- options.[hc]: process command-line arguments
- sim.h: a few extern variable declarations and function prototypes
Hints for Phase 2
- sim-safe is the simplest simulator
- In the main loop of the original code, there are three steps: fetch inst, decode inst, and execute inst. Although this is not a pipeline, you can start with it
- In this phase, you only need to take care of the data dependencies among instructions; there is no need to consider the execution latency
- Therefore, every instruction takes 5 cycles to complete in the 5-stage pipeline (each stage takes 1 cycle) if there is no need to stall the pipeline because of dependencies
- But keep in mind that this is not true in the real world. For example, a load may need 30 cycles to complete in a real machine, while an add may need 5 cycles. You are going to deal with the timing problem in the coming project
What Is Expected?
- In your project report
  - How did you formulate the problem?
  - What is your detailed design?
  - How did you distribute the workload among members?
  - Were there any problems with the design during the implementation? How did you improve it?
  - What is the result? Is there anything that could be done better?
- A copy of the source code you added/modified, with proper comments (printing the entire file is a waste of trees and time)
- Email me the source code you added/modified, with a short description
Have fun with the simulator