Page 1: Porting P4 to Digital Signal Processing Platforms

EuroPVM/MPI 2003. Venice, September 29 – October 2

Juan Antonio Rico Gallego
Juan Carlos Díaz Martín
José Manuel Rodríguez García
Jesús María Álvarez Llorente
Juan Luis García Zapata

Departamento de Informática
Universidad de Extremadura
Spain

Page 2: Index

I. Introduction and goals

II. IDSP: A Distributed Framework for DSPs

III. Implementing the P4 functionality upon IDSP

IV. Measuring the P4 Overhead

V. Conclusions

VI. Current and Future Work

Page 3: Introduction and goals

DSP processors have specialized architectures for running real-time digital signal processing

Fields of application:
• Communications
• Voice and Data Compression
• Mobile Telephony
• Speech Processing
• Image and Video Processing
• Medical
• more ...

Page 4: Introduction and goals

Target machines

Networks of DSP multi-computers such as those from Sundance™, Motorola™ or Hunt Engineering™.

Figure: Sundance SMT310Q PCI carrier board with four TI C6201 DSPs

Page 5: Introduction and goals

Target machines

The Texas Instruments C6000 family of DSPs:
• 150 MHz clock, capable of delivering 900 MFLOPS
• 16 or 32 MBytes of 100 MHz SDRAM
• 64 KBytes of cache / internal RAM
• 128 KBytes of flash programmable and erasable ROM
• No MMU for virtual memory management

• Very limited resources
• Targeted to embedded systems

Page 6: Introduction and goals

High computational complexity and real-time requirements

Most applications can be decoupled and distributed among two or more processors

Current DSP software poses the portability problem:
• Platform specific
• Provides only low-level communication libraries
• Poor support for building portable parallel applications

A distributed programming standard like MPI is needed

Page 7: IDSP: A Distributed Framework for DSPs

DSP/BIOS. Texas Instruments kernel for the C6000 family of DSP processors (21 Kb)

• Thread Management: TSK_create, TSK_delete
• Thread Synchronization: SEM_pend, SEM_post
• Timing services: CLK_gethtime
• Tracing and Analysis

Page 8: IDSP: A Distributed Framework for DSPs

IDSP. Our own development. It extends DSP/BIOS with distributed facilities (30 Kb)

• Thread P2P Communication: COMM_send, COMM_recv, COMM_asend, COMM_arecv, COMM_wait, COMM_test, ...
• Thread Management: OPER_create, OPER_destroy, GROUP_create, GROUP_destroy

IDSP runs on:
• DSK (1 x C6000)
• Sundance Multicomputer SMT310Q (4 x C6000)

Figure: a distributed DSP application on top of IDSP, which spans several C6000 processors, each running DSP/BIOS

Page 9: IDSP: A Distributed Framework for DSPs

An IDSP application is a group of operators communicating by message passing

An operator is a thread that runs an algorithm: FFT, etc.

IDSP address:
• Machine
• Group
• Operator
• Port

Figure: operators 1 to 5 connected by message passing, with input stream 1, input stream 2 and one output stream

Page 10: IDSP: A Distributed Framework for DSPs

IDSP has a microkernel architecture:
• A message passing kernel (software bus, COMM_ interface)
• System server operators, reached by RPC: I/O Server (CIO_), Group Server (GROUP_), Operator Server (OPER_)

Figure: algorithm operators, the P4 address mapper and the system servers attached to the kernel's software bus

Page 11: Implementing the P4 functionality upon IDSP

• MPICH is a portable implementation of MPI
• It has a three-layer design:
1. MPI macros
2. Abstract Device Interface (ADI)
3. Channel Interface, P4 being a well-known example
• We have put P4 on top of IDSP

Figure: software stack, top to bottom: MPI, ADI, P4, IDSP, and DSP/BIOS on each C6000

Page 12: Implementing the P4 functionality upon IDSP

The P4 re-entrancy problem

• P4 is process based
• IDSP is thread based

A thread-safe version of P4 has been built by:
• Putting P4 global variables in the private zone of IDSP threads
• Using mutual exclusion mechanisms

Figure: processes, each with its own P4 library, on top of the operating system; threads sharing one modified P4 library on top of IDSP
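The two techniques named on this slide can be illustrated with a minimal C sketch. It is not the authors' code: `p4_state_t`, `p4_get_state`, `p4_set_rank` and `p4_next_msg_id` are hypothetical names, C11 `_Thread_local` stands in for the IDSP thread private zone, and an `atomic_flag` spin lock stands in for a DSP/BIOS semaphore (SEM_pend / SEM_post).

```c
#include <assert.h>
#include <stdatomic.h>

/* Former process-wide globals, gathered into one per-thread record.
   Field names are illustrative, not taken from the P4 sources. */
typedef struct {
    int my_rank;    /* rank of the calling thread, -1 until set */
    int msgs_sent;  /* per-thread statistics */
} p4_state_t;

/* Stand-in for the IDSP thread private zone: one copy per thread. */
static _Thread_local p4_state_t p4_state = { -1, 0 };

/* State that really is shared still needs mutual exclusion. */
static atomic_flag id_lock = ATOMIC_FLAG_INIT;
static int next_msg_id = 0;

/* Accessor used by library code instead of touching a global. */
p4_state_t *p4_get_state(void)
{
    return &p4_state;
}

/* Record the rank of the calling thread in its private state. */
void p4_set_rank(int rank)
{
    assert(rank >= 0);
    p4_get_state()->my_rank = rank;
}

/* Hand out a unique message id. The lock plays the role of
   SEM_pend / SEM_post around the critical section. */
int p4_next_msg_id(void)
{
    int id;
    while (atomic_flag_test_and_set(&id_lock)) {
        /* busy-wait until the lock is free */
    }
    id = next_msg_id++;
    atomic_flag_clear(&id_lock);
    p4_get_state()->msgs_sent++;  /* private data, no lock needed */
    return id;
}
```

The design point is that per-thread data needs no locking at all once it leaves global scope, so mutual exclusion is only paid for on the state that is genuinely shared.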

Page 13: Implementing the P4 functionality upon IDSP

Communication network

• P4 is based upon TCP/IP Berkeley sockets, but
• IDSP provides its own addressing scheme

We have built IDSP/Sockets, a thin and efficient implementation of Berkeley sockets atop IDSP

Figure: P4 uses IP addresses through IDSP/Sockets, which uses IDSP addresses over IDSP, with DSP/BIOS on each C6000

Page 14: Implementing the P4 functionality upon IDSP

The IP/IDSP mapping

Every user operator keeps a cache of addresses

Figure: two user operators (sender and receiver), each with a cache of (Idsp_addr, Ip_addr) pairs, and an Address Mapping Server holding the same kind of table. Numbered steps:
1. Register(idsp_addr, ip_addr)
2. Get(ip_addr)
3. The server replies with Idsp_addr
4. p4_send(rank, ...) becomes send(IP_address, ...), which becomes COMM_send(IDSP_address, ...)
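The register, get and cache steps of the IP/IDSP mapping can be sketched in C as follows. This is an illustration under assumptions, not the IDSP/Sockets source: the type and function names, the field widths of the IDSP address and the fixed table size are all hypothetical, and the server is reached here by a direct function call where the real system would exchange messages.

```c
#include <assert.h>
#include <stdint.h>

/* IDSP address fields as listed in the slides; the widths are assumed. */
typedef struct {
    uint8_t machine, group, oper, port;
} idsp_addr_t;

#define MAP_SIZE 16

/* One table type serves both the server and the per-operator cache. */
typedef struct {
    uint32_t    ip[MAP_SIZE];
    idsp_addr_t idsp[MAP_SIZE];
    int         count;
} addr_map_t;

/* Register an (idsp_addr, ip_addr) pair. Returns 0, or -1 if full. */
int map_register(addr_map_t *m, idsp_addr_t idsp, uint32_t ip)
{
    assert(m != 0);
    if (m->count == MAP_SIZE)
        return -1;
    m->ip[m->count] = ip;
    m->idsp[m->count] = idsp;
    m->count++;
    return 0;
}

/* Look an IP address up. Returns 0 and fills *out on a hit, else -1. */
int map_get(const addr_map_t *m, uint32_t ip, idsp_addr_t *out)
{
    int i;
    for (i = 0; i < m->count; i++) {
        if (m->ip[i] == ip) {
            *out = m->idsp[i];
            return 0;
        }
    }
    return -1;
}

/* Resolve through the operator's cache first; on a miss ask the
   server table and remember the answer for the next send. */
int resolve(addr_map_t *cache, const addr_map_t *server,
            uint32_t ip, idsp_addr_t *out)
{
    if (map_get(cache, ip, out) == 0)
        return 0;                 /* cache hit */
    if (map_get(server, ip, out) != 0)
        return -1;                /* unknown address */
    (void)map_register(cache, *out, ip);
    return 0;
}
```

With a cache in every operator, only the first send to a given peer pays for the round trip to the mapping server.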

Page 15: Implementing the P4 functionality upon IDSP

Signals

• P4 uses UNIX signals for time-outs and process management, but ...
• DSP/BIOS does not provide signals!

Threads in DSP applications, however, interact quite frequently with the kernel for data I/O

IDSP takes advantage of this principle to support the UNIX signal mechanism:
1. A special message is sent to the target thread
2. The target thread receives this message on its next socket read
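The idea of delivering a signal as a special message on the next socket read can be sketched in C. Everything here is an assumption made for illustration: the message layout, the names `emu_signal`, `emu_read` and `record_signal`, and the choice to hand the incoming message to the read function directly instead of pulling it from a real IDSP port.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MSG_DATA   0   /* ordinary payload */
#define MSG_SIGNAL 1   /* special message carrying a signal number */

typedef struct {
    int  type;
    int  signo;        /* valid when type == MSG_SIGNAL */
    char payload[32];  /* NUL-terminated, valid when type == MSG_DATA */
} msg_t;

typedef void (*sig_handler_t)(int signo);

#define MAX_SIG 32
static sig_handler_t handlers[MAX_SIG];

/* Install a handler, in the spirit of signal(2). */
void emu_signal(int signo, sig_handler_t h)
{
    assert(signo >= 0 && signo < MAX_SIG);
    handlers[signo] = h;
}

/* Called on every socket read. Signal messages are consumed here and
   their handlers run; data messages are copied out to the caller.
   Returns the number of payload bytes delivered (0 for a signal). */
size_t emu_read(const msg_t *in, char *buf, size_t len)
{
    size_t n;
    if (in->type == MSG_SIGNAL) {
        if (in->signo >= 0 && in->signo < MAX_SIG && handlers[in->signo])
            handlers[in->signo](in->signo);
        return 0;
    }
    n = strlen(in->payload);
    if (n > len)
        n = len;
    memcpy(buf, in->payload, n);
    return n;
}

/* Example handler: remembers the last signal number it saw. */
int last_signal = -1;
void record_signal(int signo)
{
    last_signal = signo;
}
```

Delivery latency in such a scheme is bounded by how often the target thread reads from its socket, which is why the frequent kernel interaction of DSP threads matters.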

Page 16: Implementing the P4 functionality upon IDSP

The startup process

P4 uses a text file specifying program files and machines:
    Local 0
    Sun2 1 /home/user/P4pgms/sun/prog1
    Sun3 2 /home/user/P4pgms/sun/prog2
    rs6000 1 /home/user/P4pgms/rs6000/prog1

But embedded systems don't use disks!

The IDSP approach is as follows:
1. Every operator has a well-known integer identifier
2. A limited number of operators is linked
3. GROUP_create takes an array of operator identifiers
4. Currently, it assigns each operator to the least loaded machine
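The least-loaded placement rule of the IDSP startup approach can be sketched in C. This is not the GROUP_create implementation: `place_operators` is a hypothetical helper, load is assumed to mean the number of operators already placed on a machine, and ties are assumed to go to the lowest machine index.

```c
#include <assert.h>

#define MAX_MACHINES 8

/* Assign each operator in oper_ids[] to the machine that currently
   runs the fewest operators; ties go to the lowest machine index.
   load[] is updated in place and placement[i] receives the machine
   chosen for oper_ids[i]. */
void place_operators(const int *oper_ids, int n_opers,
                     int *load, int n_machines, int *placement)
{
    int i, m;
    assert(n_machines > 0 && n_machines <= MAX_MACHINES);
    (void)oper_ids;  /* identifiers select the code to run, not the machine */
    for (i = 0; i < n_opers; i++) {
        int best = 0;
        for (m = 1; m < n_machines; m++) {
            if (load[m] < load[best])
                best = m;
        }
        placement[i] = best;
        load[best]++;
    }
}
```

For example, with initial loads {2, 0, 1, 0} on four machines, five operators are placed on machines 1, 3, 1, 2, 3 in that order, leaving every machine with a load of 2.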

Page 17: Measuring the P4 Overhead

Overhead of the socket interface on IDSP

Figure: time to send short messages between two operators. X axis: size (bytes), 100 to 1000. Y axis: time (ms), ticks from 0.03 to 0.07. Series: BSD (send) and IDSP (COMM_send).

Page 18: Measuring the P4 Overhead

Overhead of P4 interface on IDSP

Figure: time to send short messages between two operators. X axis: size (bytes), 100 to 1000. Y axis: time (ms), ticks from 0 to 0.5. Series: P4 (P4_send) and IDSP (COMM_send).

Page 19: Conclusions

• IDSP, a message passing interface for DSPs, has been defined and implemented
• IDSP performance on the TI C6000 DSP architecture is currently reasonably good (50 µs for short messages)
• We have been able to support P4 on top of the small IDSP interface
• P4 performance upon IDSP is good, but not good enough for high performance distributed digital signal processing
• A more finely tuned channel interface layer is needed for DSPs

Page 20: Current and Future Work

• IDSP is currently being augmented with MPI-like p2p primitives such as COMM_waitany, etc.
• A DSP-specific channel interface layer will be developed.
• The ADI and MPI will be supported by this layer.
• The 64-bit C6400 family will be targeted soon.

Page 21: Thank you very much!

Page 23: Implementing the P4 functionality upon IDSP

Groups
• MPI implements the concept of group
• IDSP has a different concept of group

How is this managed?

The groups and processes of an MPI application run in the context of an IDSP group

Figure: an IDSP group enclosing an MPI application that contains several MPI groups

Page 24: Implementing the P4 functionality upon IDSP

Listener process
• P4 uses an auxiliary process for doing background work
• IDSP does not have an auxiliary thread

How does IDSP do this work?

We use an asynchronous communicator for:
• Doing this background work
• Sending the initial information that threads need to run (threads take no parameters at startup)

Figure: an operator with a communication port (SEND, RECEIVE) and an additional port (CONNECTION_REQ, DIE, INITIAL_INFO)
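How an operator might react to traffic on the additional port can be sketched in C. Only the three message names come from the slide; the message layout, the `oper_state_t` fields, the meaning of `value` and the function `handle_control` are assumptions made for illustration.

```c
#include <assert.h>

/* Message kinds named on the slide for the additional port. */
typedef enum { CONNECTION_REQ, DIE, INITIAL_INFO } ctl_kind_t;

typedef struct {
    ctl_kind_t kind;
    int        value;  /* peer id or startup argument; meaning is assumed */
} ctl_msg_t;

typedef struct {
    int running;        /* cleared by DIE */
    int n_connections;  /* incremented by CONNECTION_REQ */
    int initial_arg;    /* set by INITIAL_INFO */
} oper_state_t;

/* Handle one control message.
   Returns 1 while the operator should keep running, 0 after DIE. */
int handle_control(oper_state_t *st, const ctl_msg_t *msg)
{
    assert(st != 0 && msg != 0);
    switch (msg->kind) {
    case CONNECTION_REQ:
        st->n_connections++;
        break;
    case INITIAL_INFO:
        st->initial_arg = msg->value;
        break;
    case DIE:
        st->running = 0;
        break;
    }
    return st->running;
}
```

Keeping control traffic on its own port means such messages never interleave with application data on the communication port.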

Page 25

- An IDSP thread runs an algorithm in a different sense than MPI/P4 processes, which all run the same program -

Page 26: Implementing the P4 functionality upon IDSP

The IP/IDSP mapping

• P4 maps process ranks into IP addresses
• IDSP/Sockets maps IP addresses into IDSP addresses:
1. Register(idsp_addr, ip_addr)
2. Get(ip_addr)
3. The server replies with Idsp_addr

Every user operator keeps a cache of addresses

Figure: two user operators (sender and receiver), each with a cache of (Idsp_addr, Ip_addr) pairs, and an Address Mapping Server holding the same kind of table