Embedded Conference Scandinavia 2012-10-03
Embedded System Design for
Multi-core FPGA systems
Francesco Robino, Johnny Öberg,
Seyed Hosein Attarzadeh Niaki, Ingo Sander
Electronic System dept
School of Information and Communication Technology
ITRS Roadmap trend
• From Single Core to Many Cores
– It is a fact today
– Tilera (64), Adapteva Epiphany IP, …
• ITRS 2007 Roadmap
                         2010   2020
   Consumer Stationary     25    300
   Networking              10    500
   Consumer Portable       64    900
Number of Cores on Chip
Sea-of-Cores The new “SoC” paradigm
The ‘core’ is the logic gate of the 21st century
[Figure: grid of many identical tiles, each labeled "pm" and "s"]
Adapteva Epiphany IP
• Multi processor architecture, using high-bandwidth on-chip network.
Multi/Many cores challenges
• Multi-core programming is a headache (FrontEnd)
– High level programming model (standard)
– Keep legacy code and IP blocks
• Fast prototyping (BackEnd)
– Generate arbitrarily large systems in an easy fashion
– Customize and modify the generated system in a simple and intuitive way
• Avoiding Technology lock-in
[Figure: layers labeled "Platform architecture", "Drivers and low-level applications", "High-level programming model"]
ForSyDe Design Tools
• ForSyDe FrontEnd
– High-Level Description
• Flexible BackEnd
– Pure VHDL
– NVIDIA GPGPU
– NoC Generator -> Fast-prototyping on FPGAs
[Figure: numbered nodes (0 to 5); layers labeled "Platform architecture" and "Drivers and low-level applications"]
ForSyDe: High-Level design entry
• ForSyDe methodology implemented as SystemC library
• ForSyDe also provides an XML + C entry
• ForSyDe permits simulation of the system independently of the platform
Models of Computation (MoCs)
• A model of computation (MoC) is a mathematical description that specifies the semantics of computation and concurrency
– How are functionalities executed?
– When are functionalities executed?
Why Synchronous MoCs as design entry?
• The programming model is independent of the number of cores
– Processes can be mapped to any processor in any order.
– The system can be implemented without an RTOS, and the buffer size between processes can be predicted.
– Clear separation between communication and computation through process constructors.
ForSyDe Process Constructor
ForSyDe Container specifies
• The MoC
• How to connect the communication channels
SystemC code: auto p5 = make_comb("p5", functionality_5, sig_from2, sig_to4);
The System Description
System Description XML file:
- HW platform description
- Processes are assigned to the processor they should be executed on
Application (ForSyDe SystemC files):
The System is viewed as a set of concurrent communicating process constructors.
The NoC system Generator
• A fast prototyping BackEnd
– From XML and C to heterogeneous MPSoC implementation on FPGA
Process Main (.c)
void p0_main(void) {
    int input_reg_value = NOC_RNI_CHK_MSG(NOC_RNI_BASE, p0_recv_channel_from_p15);
    if (input_reg_value > 0) { // Something for me?
        int value = IORD(p0_recv_value, 0); // Map input parameters
        // Do something interesting
        value = value + 1;
        IOWR(p0_send_value, 0, value);
        while (NOC_RNI_STATUS(NOC_RNI_BASE) != 0); // WAIT for CTS
        NOC_RNI_SEND(NOC_RNI_BASE, p0_priority, p0_pid, p1_pid,
                     p0_send_channel_to_p1, p0_msg_len);
        NOC_RNI_CLEAR(NOC_RNI_BASE, p0_recv_channel_from_p15);
    } else {
        // If this branch is executed, the global clock sync period is too narrow (inputs have not arrived)
        IOWR(PIO_0_BASE, 0, 0xFF);
    }
}
! Automatically parsed from ForSyDe SystemC !
XML System Description
• Consists of four parts
– Software configuration
– Target description
– Interconnection description
– Hardware configuration
Software Configuration
<software>
<parameter name="Repository" value="D:/NoC/Examples/HotPotato" />
<process name="p0"
moc="Synchronous"
node="0"
sources="{p15}"
targets="{p1}"
files="{process_0.c}" />
• The Designer specifies
– the process’ source files,
– its model of computation,
– which processor it should be executed on,
– which processes it communicates with (this will be derived automatically by ForSyDe in the near future)
Target System Description
<system name="NoC_2x2x2_experiment">
<parameter name="targetDirectory" value="D:/FPGA_Designs/Quartus_3D_NoC_2x2x2" />
<parameter name="targetManufacturer" value="Altera" />
<parameter name="targetManufacturerVersion" value="10.1" />
<parameter name="boardType" value="DE3" />
<parameter name="boardFrequency" value="50 MHz" />
# Syntax : {port name,pin}
<parameter name="Clock" value="{sys_clk,Q23}" />
<parameter name="Reset" value="{reset,Q24}" />
Interconnect Description
<hardware>
<parameter name="nocType" value="Mesh" />
<parameter name="nocKind" value="3DNoC" />
<parameter name="rni_version" value="v2.0"/>
<parameter name="HDLrootDirectory" value="D:/NoC" />
<parameter name="nrofCols" value="2" />
<parameter name="nrofRows" value="2" />
<parameter name="nrofLayers" value="2" />
<parameter name="framesize" value="64" />
<parameter name="GlobalSync" value="16 kHz" />
<parameter name="LayoutMethod" value="floating" />
NoC Types and NoC Kinds
The Mesh NoC Type
• 1D NoC – 1D Layout
• 2D Mesh – 2D Layout
• 3D Mesh – 3D Layout
Hardware Configuration
<node nr="0"
mem_size="8192"
jtag="no"
perf_counter="no"
pio="{o,1}"
noc_irq="no"
cpu="{nios,tiny,fpu}"/>
• A node may have multiple CPUs, PIOs and Memories
• Calc2HW
– Resources as HW accelerators instead of soft-cores
ForSyDe/NoC gen Design Flow
Experiments and Demos
• Ring message passing
• Neural Network 800 neurons
NOC Architecture                  | Total logic elements | Total comb. functions | Dedicated logic registers | Total memory bits
2D NoC 2x4 – 40KB mem per core    | 27870 (24%)          | 22992 (20%)           | 17625 (15%)               | 2772992 (70%)
3D NoC 2x2x2 – 40KB mem per core  | 30058 (26%)          | 25180 (22%)           | 18019 (16%)               | 2772992 (70%)
Conclusions
• Novel design flow for embedded systems targeting multi-core FPGA platforms
– Flexible and customizable platform generator
• The programming model based on MoCs is independent of the number of cores
– The network handles the communication.
– The system can be implemented without an RTOS, and the buffer size can be predicted.
• MoC theory enables optimization methods
– Design exploration
– Mapping and scheduling algorithms
Thanks for your attention!
Do you have any Questions?
https://forsyde.ict.kth.se/
Francesco Robino
frobino@kth.se
Seyed Hosein Attarzadeh Niaki
shan2@kth.se
Johnny Öberg
johnnyob@kth.se
Ingo Sander
ingo@kth.se