Date posted: 20-Dec-2015
HAsim Status Update

Joel Emer
Michael Adler
Angshuman Parashar
Michael Pellauer
Murali Vijayaraghavan
Nikhil Patil
Abhishek Bhattacharjee

VSSAD, Intel
CSG Group, CSAIL, MIT
UT Austin
Princeton University
2
Recap: Virtual Platform
• Set of Abstractions
  – Provide common set of functionalities across multiple physical platforms
    • XUP Board
    • PCI-express Board
    • Intel FSB Socket
    • Bluesim/Vsim
    • BEE3
  – Leverage Asim Plug N Play
    • Minimize module replacements/recoding while moving across platforms
3
Virtual Platform Infrastructure
[Figure: layered block diagram split into a Hardware side and a Software side. Layer labels: Virtual Platform, Communication Layers, RRR Layers, Platform Interface. Module labels: FPGA Modules (Fetch, Decode, Exe, Memory, Front Panel) and Software Modules (FuncModel, Control, Decode, Front Panel, Memory).]
4
RRR Specification Language

    // ----------------------------------------
    // create a new service called ISA_EMULATOR
    // ----------------------------------------
    service ISA_EMULATOR
    {
        // --------------------------------
        // declare services provided by CPU
        // --------------------------------
        server CPU <- FPGA;
        {
            method UpdateRegister(in REG_INDEX i, in REG_VALUE v);
            method Emulate(in INST_INFO i, out INST_ADDR a);
        };

        // ---------------------------------
        // declare services provided by FPGA
        // ---------------------------------
        server FPGA <- CPU;
        {
            method SyncRegister(in REG_INDEX i, in REG_VALUE v);
        };
    };
5
Remote Request/Response

FPGA (User Code, calls the Client Stub):

    ClientStub_ISA_EMULATOR cpu;
    ...
    cpu.UpdateRegister_MakeRequest(REG_R27, regFile[REG_R27]);
    ...
    cpu.Emulate_MakeRequest(inst);
    ...
    targetPC <- cpu.Emulate_GetResponse();

CPU (User Code, called by the Server Stub):

    ISA_EMULATOR::UpdateRegister(REG_INDEX i, REG_VALUE v)
    {
        regFile[i] = v;
    }

    ISA_EMULATOR::Emulate(INST_INFO inst)
    {
        // emulate the instruction
        return target_PC;
    }

[Figure labels: User Code (both sides), Client Stub, Server Stub, Communication Layers (Runtime System), RRR specification files.]
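The MakeRequest/GetResponse split on the slide above can be sketched in plain C++. This is a minimal illustration, not the generated RRR stub code: the type widths, the class names `IsaEmulatorServer` and `ClientStubIsaEmulator`, the direct function call standing in for the communication layers, and the toy meaning of `Emulate` are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for the slide's types; the real widths are not given.
using REG_INDEX = std::uint32_t;
using REG_VALUE = std::uint64_t;
using INST_INFO = std::uint32_t;
using INST_ADDR = std::uint64_t;

// Software-side server for the ISA_EMULATOR service.
class IsaEmulatorServer {
public:
    // One-way request: nothing is sent back to the client.
    void UpdateRegister(REG_INDEX i, REG_VALUE v) { regFile_[i] = v; }

    // Request with a response: returns the next PC.
    // Toy semantics (an assumption): the instruction is a jump to the
    // address held in the register it names.
    INST_ADDR Emulate(INST_INFO inst) { return regFile_[inst]; }

private:
    std::map<REG_INDEX, REG_VALUE> regFile_;
};

// Client-side stub with separate request and response calls.
// The "channel" here is a direct call; a real stub would marshal arguments.
class ClientStubIsaEmulator {
public:
    explicit ClientStubIsaEmulator(IsaEmulatorServer& server) : server_(server) {}

    void UpdateRegister_MakeRequest(REG_INDEX i, REG_VALUE v) {
        server_.UpdateRegister(i, v);
    }
    void Emulate_MakeRequest(INST_INFO inst) { pending_ = server_.Emulate(inst); }
    INST_ADDR Emulate_GetResponse() const { return pending_; }

private:
    IsaEmulatorServer& server_;
    INST_ADDR pending_ = 0;  // holds the response until the client asks for it
};
```

Usage follows the FPGA-side sequence: sync a register with `UpdateRegister_MakeRequest`, issue `Emulate_MakeRequest`, then collect the target PC with `Emulate_GetResponse`.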
6
Virtual Platform/RRR Status Update
• Software + Hardware, Client + Server Stubs
• Multiple Arguments for method calls
• Auto-generation of Soft Connections through Platform Interface, and Remote Stubs
• PCI-Express Physical Platform
  – Physical Channel implementation using CSRs
  – Soft Reset
• Several services in HAsim
  – Very positive feedback from developers
7
HAsim: MIPS → Alpha
• Motivation
  – Couldn’t find any Full System MIPS simulator with multi-processor + large memory support
• HAsim-Alpha
  – M5 “running” in software
    • Target Memory Image
    • Syscall Emulation
    • Other instructions not implemented on FPGA (e.g. FP currently)
  – Functional + Timing model on FPGA
8
HAsim-Alpha Highlights
• Implemented Alpha Functional Model
  – Primary changes
    • ISA spec
      – Instruction Format + Queries
    • Datapath
      – Execution Semantics
  – Unchanged
    • Dependency logic
    • Register File
    • Memory Subsystem (incl. Store Buffer)
• Multiple timing models
  – Unpipelined
  – 5 Stage
  – In order with caches
  – OoO
• Running long Alpha programs (e.g. SPEC2k)
9
Old Instruction Emulation with Cache Flush
[Figure: timeline diagram with an FPGA side and a Software side, joined by the RRR Layer. Components: Execute, Functional Cache, Memory Server, Emulation Server, Instruction Simulator. Messages: Sync Registers, Write Line, Flush Done, Emulate Instruction, Emulation Done.]
10
Hybrid Instruction Emulation

[Figure: timeline diagram with an FPGA side and a Software side, joined by the RRR Layer. Components: Execute, Functional Cache, Memory Server, Emulation Server, Instruction Simulator. Messages: Sync Registers, Emulate Instruction, Write Line, Write Back or Invalidate, Ack, Done, Emulation Done.]
11
RRR ISA Emulation Specification

    service ISA_EMULATOR
    {
        server sw (cpp, method) <- hw (bsv, connection)
        {
            method sync(in RNAME[RNAME_BITS] rname,
                        in RVAL[RVAL_BITS] rval);

            method emulate(in INST[INST_BITS] inst,
                           in ISA_ADDRESS[FUNCP_ISA_V_ADDR_SIZE] pc,
                           out ISA_ADDRESS[FUNCP_ISA_V_ADDR_SIZE] newPc);
        };

        server hw (bsv, connection) <- sw (cpp, method)
        {
            method sync(in RNAME[RNAME_BITS] rname,
                        in RVAL[RVAL_BITS] rval);
        };
    };
12
Dynamic Simulator Configuration

[Figure: timeline diagram with an FPGA side and a Software side, joined by the RRR Layer. Components: Dynamic Param Controller (one on each side) and four Param Nodes. Messages: Set Parameters, Set Value, Done. Annotations: "Done?", "Enable Functional Cache?".]
13
RRR Dynamic Parameter Specification
    service PARAMS
    {
        //
        // Send one dynamic parameter ID and value to the hardware.
        // An ACK is returned to guarantee that the parameter has
        // been received.
        //
        server hw (bsv, connection) <- sw (cpp, method)
        {
            method sendParam(in UINT32[32] pname,
                             in UINT64[64] pval,
                             out UINT8[8] ack);
        };
    };
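To make the `sendParam` exchange concrete, here is a small C++ sketch of the argument packing and the ACK. It is illustrative only: the little-endian byte layout, the ACK value of 1, and the names `ParamMessage`, `packParam`, and `ParamReceiver` are assumptions, not the RRR wire format or the generated stubs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// 32-bit parameter ID + 64-bit value = 96 bits = 12 bytes.
using ParamMessage = std::array<std::uint8_t, 12>;

// Marshal the two "in" arguments (little-endian is an assumption).
ParamMessage packParam(std::uint32_t pname, std::uint64_t pval) {
    ParamMessage msg{};
    for (std::size_t b = 0; b < 4; ++b)
        msg[b] = static_cast<std::uint8_t>(pname >> (8 * b));
    for (std::size_t b = 0; b < 8; ++b)
        msg[4 + b] = static_cast<std::uint8_t>(pval >> (8 * b));
    return msg;
}

// Software stand-in for the hardware side that receives parameters.
class ParamReceiver {
public:
    // Unpack the message, store the value, and return an ACK.
    std::uint8_t sendParam(const ParamMessage& msg) {
        std::uint32_t pname = 0;
        std::uint64_t pval = 0;
        for (std::size_t b = 0; b < 4; ++b)
            pname |= static_cast<std::uint32_t>(msg[b]) << (8 * b);
        for (std::size_t b = 0; b < 8; ++b)
            pval |= static_cast<std::uint64_t>(msg[4 + b]) << (8 * b);
        params_[pname] = pval;
        return 1;  // ACK value is an assumption
    }

    std::uint64_t value(std::uint32_t pname) const { return params_.at(pname); }

private:
    std::unordered_map<std::uint32_t, std::uint64_t> params_;
};
```

The caller waits for the returned ACK before sending the next parameter, which is what guarantees the value has been received.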
14
Other Uses of RRR
• Stats
• Events
• Assertions
• Control Messages
• Streams
15
Modeling Back-Pressure using A-Ports

[Figure: Producer and Consumer connected by a Data A-Port and a Credits A-Port; the same pair is also drawn as a single Credit Port.]

No buffering present within the Ports

Producer Interface:
    Bool canSend()             Do we have enough credits?
    Action enq(Maybe#(t) x)    Send data or invalid.
    Action pass()              Indicate end of cycle

Consumer Interface:
    Bool canReceive()          Is data available?
    AV#(Data) pop()            Receive data
    Action done(cred)          Indicate end of cycle, and send back credits

Producer usage:
    if (canSend) enq(x)
    else pass()

Consumer usage:
    if (canReceive) x <- pop()
    done(x)
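The credit protocol above can be sketched as a software model. This is a simplified C++ illustration under stated assumptions: both directions have zero latency, the payload is an `int`, the consumer retires one entry every second cycle, and the names `CreditPort`, `SimResult`, and `simulate` are hypothetical. The port holds no queue of its own; the buffer belongs to the consumer.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <queue>

class CreditPort {
public:
    explicit CreditPort(int initialCredits) : credits_(initialCredits) {}

    // Producer side.
    bool canSend() const { return credits_ > 0; }
    void enq(int x) { assert(canSend()); --credits_; inFlight_ = x; }
    void pass() { inFlight_.reset(); }  // end of cycle, nothing sent

    // Consumer side.
    bool canReceive() const { return inFlight_.has_value(); }
    int pop() { int x = *inFlight_; inFlight_.reset(); return x; }
    void done(int cred) { credits_ += cred; }  // end of cycle, return credits

private:
    int credits_;
    std::optional<int> inFlight_;  // the single message in transit this cycle
};

struct SimResult { int sent; int maxOccupancy; };

// Runs `cycles` model cycles with a consumer buffer of `capacity` entries.
SimResult simulate(int cycles, int capacity) {
    CreditPort port(capacity);
    std::queue<int> buffer;  // owned by the consumer, not by the port
    SimResult r{0, 0};
    for (int cycle = 0; cycle < cycles; ++cycle) {
        // Producer: send if credits allow, otherwise pass.
        if (port.canSend()) { port.enq(cycle); ++r.sent; }
        else port.pass();

        // Consumer: move arriving data into its own buffer.
        if (port.canReceive()) buffer.push(port.pop());
        r.maxOccupancy = std::max(r.maxOccupancy, static_cast<int>(buffer.size()));

        // Retire one entry on odd cycles and return one credit per freed slot.
        int freed = 0;
        if (cycle % 2 == 1 && !buffer.empty()) { buffer.pop(); freed = 1; }
        port.done(freed);
    }
    return r;
}
```

Because the producer only sends while it holds credits, the consumer's buffer never exceeds its capacity, and back-pressure shows up as cycles in which the producer calls `pass()`.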
16
Structures using Credit Ports
Model FIFOs using Credit Ports

[Figure: Producer and Consumer connected by Data (A1) and Credits (A1)]
“Stall ports”: A stall down the pipeline doesn’t get combinationally propagated

[Figure: Producer and Consumer connected by Data (A1) and Credits (A0)]
“Pipeline ports”: The pipeline registers in traditional pipelines
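The difference between the two configurations comes down to port latency. The sketch below assumes that A1 and A0 denote A-Ports with one and zero model cycles of latency; that reading, and the names `APort` and `arrivalCycle`, are assumptions for illustration. A port of latency n is modeled as a queue pre-filled with n empty messages, with one send and one receive per cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>

template <typename T>
class APort {
public:
    // Pre-fill with `latency` empty messages so that a message sent in
    // cycle t is received in cycle t + latency.
    explicit APort(std::size_t latency) : q_(latency, std::nullopt) {}

    void send(std::optional<T> msg) { q_.push_back(msg); }

    std::optional<T> receive() {
        assert(!q_.empty());
        std::optional<T> m = q_.front();
        q_.pop_front();
        return m;
    }

private:
    std::deque<std::optional<T>> q_;
};

// Returns the model cycle in which a message sent in cycle 0 is received.
int arrivalCycle(std::size_t latency) {
    APort<int> port(latency);
    for (int cycle = 0;; ++cycle) {
        std::optional<int> msg;      // empty unless this is cycle 0
        if (cycle == 0) msg = 42;
        port.send(msg);              // the sender acts first in each cycle
        if (port.receive().has_value()) return cycle;
    }
}
```

Under this reading, credits on a latency-1 port reach the producer a cycle late, so a stall is not propagated combinationally; credits on a latency-0 port are visible in the same cycle, as with the enable of an ordinary pipeline register.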
17
Caches
• Functional Partition
  – Functional Cache
    • Target memory image data from M5
  – Functional TLB
    • Target V → P translations
• Timing Partition
  – I and D Cache models
  – Attempting to unify interface for all caches
18
Timing Partition Cache Interface

[Figure: the MEMORY stage sends a Request to the L1 Cache, which sits in front of MAIN MEMORY; the cache returns an Immediate Response and a Delayed Response.]

Cache Req Interface:
    LOAD
    STORE
    PREFETCH
    INVALIDATE LINE
    INVALIDATE ALL
    KILL ALL
    FLUSH LINE
    FLUSH ALL

Cache Response:
    Immediate Response:
        HIT
        MISS SERVICING
        MISS RETRY
    Delayed Response:
        MISS RESPONSE
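A toy C++ model can show how the immediate and delayed responses might fit together for loads. The interpretation is an assumption: MISS SERVICING is taken to mean the miss was accepted, MISS RETRY that no miss slot was free, and the delayed MISS RESPONSE arrives when the line is filled. The class `ToyCache`, its single outstanding miss, and its fully associative storage are hypothetical simplifications.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_set>

enum class ImmediateResp { HIT, MISS_SERVICING, MISS_RETRY };

class ToyCache {
public:
    // Immediate response to a LOAD of the given line address.
    ImmediateResp load(std::uint64_t line) {
        if (present_.count(line)) return ImmediateResp::HIT;
        if (outstanding_.has_value()) return ImmediateResp::MISS_RETRY;
        outstanding_ = line;  // accept the miss and start servicing it
        return ImmediateResp::MISS_SERVICING;
    }

    // Delayed response: called when main memory returns data. Yields the
    // line whose MISS RESPONSE is delivered, or nothing if no miss is pending.
    std::optional<std::uint64_t> fill() {
        if (!outstanding_) return std::nullopt;
        std::uint64_t line = *outstanding_;
        present_.insert(line);
        outstanding_.reset();
        return line;
    }

    // INVALIDATE ALL: drop every cached line.
    void invalidateAll() { present_.clear(); }

private:
    std::unordered_set<std::uint64_t> present_;  // lines currently cached
    std::optional<std::uint64_t> outstanding_;   // at most one miss in flight
};
```

A first load of a line reports MISS SERVICING, a load of a different line while that miss is pending reports MISS RETRY, and after `fill()` the original line hits.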
19
Ongoing/Future Work
• Virtual Platform Infrastructure
  – More Sophisticated Type System
  – Virtual Memory for FPGA
    • Share page tables with software application
    • Cache V → P translations in a TLB
      – FPGA requests user software for translations
      – Software kernel must shoot down FPGA TLB when mapping changes
      – Note: distinct from HAsim Functional TLB
• Functional Model
  – Multiple Contexts
  – Ultimate goal: Run a full system
• Timing Model
  – Multiple Contexts
  – Realistic Microarchitecture
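The planned FPGA TLB behavior (cache V → P translations, ask software on a miss, shoot down on a mapping change) can be sketched in C++ as follows. This illustrates the idea only and is not the planned design: the class `FpgaTlb`, the callback standing in for the request to user software, and the unbounded map are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

class FpgaTlb {
public:
    // Callback that stands in for asking user software for a translation.
    using Translator = std::function<std::uint64_t(std::uint64_t)>;

    explicit FpgaTlb(Translator askSoftware) : askSoftware_(std::move(askSoftware)) {}

    // Return the cached translation, or fetch and cache it on a miss.
    std::uint64_t translate(std::uint64_t vpage) {
        auto it = entries_.find(vpage);
        if (it != entries_.end()) return it->second;
        ++misses_;
        std::uint64_t ppage = askSoftware_(vpage);
        entries_[vpage] = ppage;
        return ppage;
    }

    // Called by software when the mapping for `vpage` changes.
    void shootdown(std::uint64_t vpage) { entries_.erase(vpage); }

    int misses() const { return misses_; }

private:
    Translator askSoftware_;
    std::unordered_map<std::uint64_t, std::uint64_t> entries_;
    int misses_ = 0;
};
```

Without the shootdown call, a cached entry keeps returning the old physical page after the mapping changes, which is why the software kernel has to invalidate it.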
Backup
21
“Connection”-style Stubs

User Code (hand-written):

    typedef struct {...} REG_INFO deriving (Bits, Eq);

    Connection_Send#(REG_INFO) link <- mkConnection_Send("ISA_EMULATOR_UpdateRegister");

    link.send(reg_info);

Soft connections

Platform Interface (auto-generated):

    Connection_Receive#(REG_INFO) link <- mkConnection_Receive("ISA_EMULATOR_UpdateRegister");
    ClientStub_ISA_EMULATOR <- mkClient...

    let a = link.receive();
    stub.makeRequest_UpdateRegister(a);

Stub (auto-generated):

    interface ClientStub_ISA_EMULATOR;
        method Action makeRequest_UpdateRegister(REG_INFO reg_info);
    endinterface

RRR Stack

Open questions:
    Connections: Per-method or Per-service?
    How does Platform Interface get the RRR types?
22
User Code (hand-written):

    typedef struct {...} REG_INFO deriving (Bits, Eq);

    `include "remote_client_stub_ISA_EMULATOR.bsh"

    ClientStub_ISA_EMULATOR stub <- mkClientStub_ISA_EM...

    stub.makeRequest_UpdateRegister(reg_info);

Remote Stub (auto-generated):

    Connection_Send#(Bit#(70)) link <- mkConnection_Send("ISA_EMULATOR_UpdateRegister");

    method Action makeRequest_UpdateRegister(REG_INFO reg_info);
        link.send(pack(reg_info));
    endmethod

Soft connections

Platform Interface (auto-generated):

    Connection_Receive#(Bit#(70)) link <- mkConnection_Receive("ISA_EMULATOR_UpdateRegister");
    ClientStub_ISA_EMULATOR stub <- mkClientStub_ISA_EM...

    let a = link.receive();
    stub.makeRequest_UpdateRegister(a);

Stub (auto-generated):

    interface ClientStub_ISA_EMULATOR;
        method Action makeRequest_UpdateRegister(Bit#(70) reg_info);
    endinterface

RRR Stack
23
Hello, World!

hello.bsv

    module mkSystem#(LowLevelPlatformInterface llpi)();

        Streams streams <- mkStreams(llpi);
        Reg#(Bool) done <- mkReg(False);

        rule hello (!done);
            streams.makeRequest(`STREAMS_MESSAGE_HELLO);
            done <= True;
        endrule

    endmodule

hello.dict

    def STREAMS.MESSAGE.HELLO "Hello, World!\n";
24
RRR Memory Interface Specification
    service FUNCP_MEMORY
    {
        server sw (cpp, method) <- hw (bsv, connection)
        {
            method Load(in MEM_ADDRESS_RRR[64] addr,
                        out MEM_VALUE[FUNCP_ISA_INT_REG_SIZE] data);

            method LoadCacheLine(in MEM_ADDRESS_RRR[64] addr,
                                 out MEM_CACHELINE[FUNCP_CACHELINE_BITS] data);

            method Store(in MEM_STORE_INFO_RRR[MEMORY_STORE_INFO_SIZE] info);

            method StoreCacheLine(in MEM_STORE_CACHELINE_INFO_RRR[MEMORY_STORE_CACHELINE_INFO_SIZE] info);

            // Store cache line with ACK
            method StoreCacheLine_Sync(in MEM_STORE_CACHELINE_INFO_RRR[MEMORY_STORE_CACHELINE_INFO_SIZE] info,
                                       out UINT32[32] ack);

            method VtoP(in MEM_VALUE[FUNCP_ISA_INT_REG_SIZE] va,
                        out MEM_ADDRESS_RRR[64] pa);
        };

        server hw (bsv, connection) <- sw (cpp, method)
        {
            method Invalidate(in MEM_INVAL_CACHELINE_INFO_RRR[96] info,
                              out UINT32[32] ack);

            method InvalidateAll(in UINT32[32] req,
                                 out UINT32[32] ack);
        };
    };
25
Timing Partition Cache Interface

[Figure: the MEMORY stage sends a Request to the L1 Cache, which sits in front of MAIN MEMORY; the cache returns an Immediate Response and a Delayed Response.]

Cache Req Interface:
    LOAD
    STORE
    PREFETCH
    INVALIDATE LINE
    INVALIDATE ALL
    KILL ALL
    FLUSH LINE
    FLUSH ALL

Cache Response:
    Immediate Response:
        HIT
        HIT SERVICING
        MISS SERVICING
        MISS RETRY
    Delayed Response:
        MISS RESPONSE
        HIT RESPONSE
26
Credit Ports

[Figure: Producer and Consumer connected by Data and Credits; the same pair is also drawn with a Data A-Port and a Credits A-Port.]

No buffering present in the Ports

Producer Interface:
    Bool canSend()             Do we have enough credits?
    Action enq(Maybe#(t) x)    Send data or invalid.
    Action pass()              Indicate end of cycle

Consumer Interface:
    Bool canReceive()          Is data available?
    AV#(Data) pop()            Receive data
    Action done(cred)          Indicate end of cycle, and send back credits

Producer usage:
    if (canSend) enq(x)
    else pass()

Consumer usage:
    if (canReceive) x <- pop()
    done(x)
27
Structures using Credit Ports

[Figure: Producer connected by Data and Credits to a Consumer that contains a Completion Buffer.]
• Since buffering is not modeled in credit ports using FIFOs, any sort of buffer can sit on the consumer side
• Reduced the code size of timing models drastically