Andrea Michelotti - Atmel Toolchain Overview & Demo
1/20
hArtes 2010 Final Review
Toolchain Overview & Demo
Andrea Michelotti, Atmel
WP2 Leader
2/20
Agenda
• Key Achievements
• Developing an Application for a Heterogeneous Platform
– Initialization
– Sharing resources
– Calling a DSP function
– Targeting Execution
– Expressing parallelism
• Conclusions
3/20
Key Achievements (1)
The hArtes toolchain…
• dramatically shortens the learning curve
• hides heterogeneous complexity: no expert knowledge of the platform, tools or new programming languages is required
– use ‘C’ or tools that produce ‘C’ (Scilab/Nutech)
• automatically speeds up the application (with respect to GPP-only execution) by exploiting Processor Element (PE) capabilities
4/20
Key Achievements (2)
• Easily retarget new platforms:
– basic OMAP porting took ~1 month
– other popular platforms like GPUs can be targeted as well
– the platform must conform to the Molen machine
• Quickly evaluate how an application behaves on different architectures
– important for time to market
5/20
Targeted Platforms/Architectures
Heterogeneous Platforms:
• Atmel DEB (Diopsis Evaluation Board): ARM9 GPP + mAgicV DSP
• UniFE’s hArtes HW Platform: ARM9 GPP + mAgicV DSP + Xilinx FPGA
• Scaleo’s hArtes Emulation Platform: ARM9 GPP + mAgicV DSP + Altera FPGA
• TI Experimenter OMAPL138: ARM9 GPP + C67 TI DSP
6/20
Developing an Application for a Heterogeneous Platform in a nutshell
Before hArtes, toolchains for heterogeneous hardware (Diopsis and OMAP) consisted of separate subchains: one for the GPP and one for the DSP.
MAIN PROBLEMS:
• Steep learning curve: target tools, architecture, APIs…
• Without knowledge of the underlying software/hardware it is difficult to use the PEs and to benefit from them;
• Code maintainability: two distinct projects must be kept aligned;
• Code portability: GPP code usually contains specific APIs to load, execute and access PE resources, so identical C code cannot produce correct results across all platforms;
• Debugging: no unified image, no unified debugger.
7/20
Writing an application for a heterogeneous platform in a nutshell
Suppose we want to port our legacy code to a powerful but heterogeneous architecture like OMAP or DIOPSIS…
    int main(void){
        …
        unsigned *shared_array = malloc(SIZE);
        …
        my_fft(param, shared_array, …);
        another_kernel(…);
        …
    }
C code running on my host PC
Seems easy but, after a while, it becomes a nightmare!
8/20
Initialization: OMAP toolchain case (through dsplink)
GPP code:
    int main(int argc, char **argv){
        …
        if (DSP_SUCCEEDED (status)) {
            status = PROC_setup (NULL) ;
        }
        if (DSP_SUCCEEDED (status)) {
            status = PROC_attach (processorId, NULL) ;
            if (DSP_FAILED (status)) {
                RDWR_1Print ("PROC_attach failed. Status: [0x%x]\n", status) ;
            }
        }
        else {
            RDWR_1Print ("PROC_setup failed. Status: [0x%x]\n", status) ;
        }
        if (DSP_SUCCEEDED (status)) {
            args [0] = strNumIterations ;
            status = PROC_load (processorId, dspExecutable_myfft, NUM_ARGS, args) ;
            if (DSP_FAILED (status)) {
                RDWR_1Print ("PROC_load failed. Status: [0x%x]\n", status) ;
            }
        }
        if (DSP_SUCCEEDED (status)) {
            status = PROC_start (processorId) ;
            if (DSP_FAILED (status)) {
                RDWR_1Print ("PROC_start failed. Status: [0x%x]\n", status) ;
            }
        }
        …
    }
DSP code:
    void myfft();
    int main(int argc, char **argv){
        myfft();
    }
• Use of a specific API to access the DSP.
• The DSP image is loaded as a file.
• The DSP has its own main.
9/20
Initialization: Diopsis toolchain case
GPP code:
    int main (int argc, char *argv[]){
        ....
        ret = mAgicV_load_PM("myfft.bin", _m_fd_extm);
        if (ret != 0) return ret;
        ret = mAgicV_load_DM("myfft_datamem.bin");
        if (ret != 0) return ret;
        ret = mAgicV_load_XM("myfft_extmem.bin", (0x365890));
        if (ret != 0) return ret;
        ret = mAgicV_init_PMU();
        if (ret != 0) return ret;
        mAgicV_start();
        ..
        mAgicV_wait();
        ....
    }
DSP code:
    void myfft();
    int main(int argc, char **argv){
        …
        myfft();
        …
    }
• Very low-level API to access the DSP.
• The DSP image is a binary loaded as a file.
• The DSP has its own main.
10/20
Initialization: hArtes toolchain case
GPP/Application code:
    int main (int argc, char *argv[]){
        ....
    }
The hArtes Toolchain and hArtes Runtime take care of hiding all the initialization details.
JUST one code.
11/20
Sharing resources: OMAP toolchain case
• Use of a specific API to create a shared area, translate the address for the DSP and then pass the translated address to the DSP via messages.
• The memory layout must be configured by recompiling drivers!!
GPP code:
    int main(int argc, char **argv){
        SMAPOOL_Attrs poolAttrs ;
        poolAttrs.bufSizes      = (Uint32 *) &size ;
        poolAttrs.numBuffers    = (Uint32 *) &numBufs ;
        poolAttrs.numBufPools   = NUM_BUF_SIZES ;
        poolAttrs.exactMatchReq = TRUE ;
        volatile unsigned* my_shared_array;
        status = POOL_open (POOL_makePoolId(processorId, SAMPLE_POOL_ID), &poolAttrs) ;
        if (DSP_FAILED (status)) {
            MPCSXFER_1Print ("POOL_open () failed. Status = [0x%x]\n", status) ;
        }
        if (DSP_SUCCEEDED (status)) {
            status = POOL_alloc (POOL_makePoolId(processorId, SAMPLE_POOL_ID),
                                 &my_shared_array,
                                 SIZE,
                                 DSPLINK_BUF_ALIGN) ;
            /* Get the translated DSP address to be sent to the DSP. */
            if (DSP_SUCCEEDED (status)) {
                status = POOL_translateAddr (
                             POOL_makePoolId(processorId, SAMPLE_POOL_ID),
                             &dspCtrlBuf,
                             AddrType_Dsp,
                             (Void *) &my_shared_array_from_dsp,
                             AddrType_Usr) ;
            }
        }
        …
    }
DSP code:
    int main(int argc, char **argv){
        DSPlink_init();
        ….
    }
12/20
Sharing resources: Diopsis toolchain case
• Very raw access: addresses are shared directly and mapped manually, using a local copy and write back.
• Use of specific APIs and specific compiler directives.
• NO LINKER used: many problems with source alignment and debugging.
GPP code:
    #define MY_SHARED_ARRAY_ADDR 2
    int main(int argc, char **argv){
        ....
        unsigned local_my_shared_array[SIZE];
        mAgicV_read_buff(local_my_shared_array, MY_SHARED_ARRAY_ADDR, sizeof(local_my_shared_array));
        ....
        // modify local copy, write back
        mAgicV_write_buff(local_my_shared_array, MY_SHARED_ARRAY_ADDR, sizeof(local_my_shared_array));
        …
DSP code:
    #define MY_SHARED_ARRAY_ADDR 2
    volatile long chess_storage(DATA:MY_SHARED_ARRAY_ADDR) my_shared_array;
    volatile long chess_storage(DATA:MY_SHARED_ARRAY_ADDR+SIZEOF(my_shared_array)) my_other_variable;
    int main(int argc, char **argv){
        // access the variable
    }
13/20
Sharing resources: hArtes toolchain case
Very natural access.
GPP code:
    int main(int argc, char **argv){
        ....
        unsigned *my_shared_array;
        my_shared_array = malloc(MYSIZE);
        #pragma map call_hw dsp0
        dsp_func(my_shared_array);
Intuitive and portable.
The DSE automatically turns malloc into hmalloc (hArtes API), which allocates and tracks memory in a shared physical space of the target platform.
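The malloc-to-hmalloc substitution can be pictured with a small host-side sketch. Everything in it is hypothetical: `shared_alloc`, `shared_owns` and the static arena are illustrative stand-ins, not the hArtes `hmalloc` API, whose signature the slides do not show. The only idea it carries is that allocations come from one region visible to both processors and that the runtime keeps a record of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a physically shared memory region. */
#define ARENA_BYTES 4096u
#define MAX_BLOCKS  16u

static _Alignas(max_align_t) unsigned char arena[ARENA_BYTES];
static size_t arena_used = 0;

/* Allocation table: lets a runtime check whether a pointer is shared. */
static struct { void *ptr; size_t size; } blocks[MAX_BLOCKS];
static size_t block_count = 0;

/* Bump-allocate from the arena and record the block; NULL when full. */
void *shared_alloc(size_t size)
{
    const size_t align = _Alignof(max_align_t);
    size_t rounded;
    void *p;

    assert(size > 0);
    if (size > ARENA_BYTES)
        return NULL;
    rounded = (size + align - 1) / align * align;
    if (rounded > ARENA_BYTES - arena_used || block_count == MAX_BLOCKS)
        return NULL;

    p = &arena[arena_used];
    arena_used += rounded;
    blocks[block_count].ptr = p;
    blocks[block_count].size = size;
    block_count++;
    return p;
}

/* Returns 1 if p points inside a recorded shared block, 0 otherwise. */
int shared_owns(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    size_t i;

    for (i = 0; i < block_count; i++) {
        uintptr_t start = (uintptr_t)blocks[i].ptr;
        if (addr >= start && addr - start < blocks[i].size)
            return 1;
    }
    return 0;
}
```

Under these assumptions, a source-to-source step would rewrite `malloc(MYSIZE)` as `shared_alloc(MYSIZE)`, and the runtime could call `shared_owns` on each argument of an offloaded function to confirm the DSP can reach it.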
14/20
Calling a DSP routine: OMAP/Diopsis toolchain cases
GPP code (OMAP):
    int main(int argc, char **argv){
        … // initialization, see main
        if (DSP_SUCCEEDED (status)) {
            status = PROC_start (processorId) ;
            if (DSP_FAILED (status)) {
                RDWR_1Print ("PROC_start failed. Status: [0x%x]\n", status) ;
            }
        }
        …
GPP code (Diopsis):
    int main(int argc, char **argv){
        … // initialization, see main
        mAgicV_start()…
DSP code:
    void my_fft(int pp, float*…);
    int main(int argc, char **argv){
        … // initialization, see main
        my_fft();
        …
    }
There is no concept of a DSP call from the GPP: the GPP can only start a DSP process that executes the desired function.
The call can be inefficient, or may not execute correctly on the DSP at all. The programmer must know the underlying architecture!
For example, on the DSP in the DIOPSIS architecture the type int is 16 bits wide, whereas it is typically 32 bits.
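The width mismatch is one reason code that is correct on the GPP can silently misbehave on the DSP. A minimal sketch of a common defence, using only standard C11 headers: exchange data through fixed-width types from `<stdint.h>` and let the compiler reject a target where the assumption fails. The `fft_params` structure and the `fits_in_int16` helper are hypothetical names introduced only for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>

/* Hypothetical parameter block exchanged between GPP and DSP code. */
struct fft_params {
    int32_t  length;   /* exactly 32 bits wherever int32_t exists */
    uint16_t stages;
};

/* Compilation fails on a target where the width assumption does not hold. */
static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t must be 32 bits wide");

/* Returns 1 if value would survive storage in a 16-bit int, 0 otherwise. */
int fits_in_int16(int32_t value)
{
    return value >= INT16_MIN && value <= INT16_MAX;
}
```

A value such as 100000 fits in `fft_params.length` on any target that provides `int32_t`, while `fits_in_int16(100000)` reports that it would be truncated by a plain 16-bit `int`.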
15/20
Calling a DSP routine: hArtes toolchain case
GPP code:
    void my_fft(…){…}
    int main(int argc, char **argv){
        ..
        #pragma call_hw dsp 0
        my_fft();
        ..
    }
Intuitive and portable.
The DSE checks whether the my_fft function can be executed on the target DSP (it checks the parameters and the stack memory used). It also estimates the cost of the call to decide whether it is worthwhile to move the execution to the DSP.
To call the DSP function, the DSE adds a pragma to the function call and generates a C source file that can be compiled by the DSP toolchain.
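The slides do not describe the cost model the DSE uses, so the following is only a sketch of the simplest plausible rule: offload when the estimated DSP time plus the data-transfer time is lower than the estimated GPP time. The `call_estimate` structure and `should_offload` function are hypothetical and are not part of the hArtes toolchain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-call estimates, all in microseconds. */
struct call_estimate {
    double gpp_time_us;       /* running the kernel on the GPP */
    double dsp_time_us;       /* running the kernel on the DSP */
    double transfer_time_us;  /* moving arguments and results  */
};

/* Returns 1 when offloading is estimated to be faster, 0 otherwise. */
int should_offload(const struct call_estimate *e)
{
    assert(e != NULL);
    return e->dsp_time_us + e->transfer_time_us < e->gpp_time_us;
}
```

With this rule a large FFT whose DSP speed-up outweighs the copy cost is moved, while a tiny kernel dominated by transfer time stays on the GPP.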
16/20
Expressing Parallelism: OMAP/Diopsis toolchain case
NOT KNOWN/NOT IMPLEMENTED
17/20
Expressing Parallelism: hArtes toolchain case
    int main(void){
        …
        #pragma omp parallel sections
        {
            #pragma omp section
            {
                #pragma call_hw dsp 0
                my_fft();
            }
            #pragma omp section
            {
                another_kernel(…);
            }
        }
    }
Intuitive and portable.
hArtes supports some OpenMP constructs to express parallelism.
In some cases the DSE automatically detects kernels that can run in parallel and adds OpenMP annotations to the C source.
Parallelism can also be made explicit via POSIX threads:
    #pragma call_hw dsp 0
    void my_fft(…);
    int main(int argc, char **argv){
        … // initialization, see
        hthread_create(my_fft()…);
        another_kernel();
        hthread_join();
    }
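The `hthread_create` and `hthread_join` calls appear on the slide without their signatures, so the thread-based variant is sketched here with standard POSIX threads on a host machine instead. The stub kernels and the `run_both` helper are hypothetical; they only show one kernel running in a worker thread while the other runs in the calling thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for my_fft: writes 1 to mark completion. */
static void *my_fft_stub(void *arg)
{
    int *out = arg;
    *out = 1;
    return NULL;
}

/* Hypothetical stand-in for another_kernel. */
static int another_kernel_stub(void)
{
    return 2;
}

/* Runs both kernels concurrently; returns 0 on success, -1 on a thread error. */
int run_both(int *fft_result, int *other_result)
{
    pthread_t worker;

    assert(fft_result != NULL && other_result != NULL);
    if (pthread_create(&worker, NULL, my_fft_stub, fft_result) != 0)
        return -1;
    *other_result = another_kernel_stub();  /* overlaps with the worker */
    if (pthread_join(worker, NULL) != 0)
        return -1;
    return 0;
}
```

On most systems this builds with `-pthread`. After `run_both` returns 0, both results are safe to read because the join orders the worker's write before the caller's read.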
18/20
Target Execution (under Linux): OMAP/Diopsis toolchain case
Two separate binaries, no common symbols, no unified debugger; I/O messages (printf) often rely on a JTAG connection.
RUN and HOPE!
bash$ ./my_fft_arm.elf <my_fft_dsp.bin>
19/20
Target Execution (under Linux): hArtes toolchain case
Single ELF binary, common symbols, unified debugger, I/O messages (printf) on the target.
RUN!
bash$ ./my_fft.elf
20/20
Conclusions
• Although the original “Brain to Bit” (B2B) objective was very ambitious, the hArtes toolchain fulfilled its original promise: to support software development for heterogeneous hardware without expert knowledge of the target platform, allowing developers to build high-performance applications through fully automated solutions that abstract low-level hardware details.
• Areas for improvement mainly concern data flow analysis, automatic parallelization, AET integration in Eclipse, and debugging capabilities.