Parallel accelerator project

Final presentation, Summer 2008. Student: Vitaly Zakharenko. Supervisor: Inna Rivkin. Duration: one semester.


Page 1: Parallel accelerator project

Final presentation, Summer 2008

Student: Vitaly Zakharenko
Supervisor: Inna Rivkin
Duration: one semester

Page 2: Parallel accelerator project

System functionality: the big picture

◦ Multiple signal sources share the same medium.
◦ Each source produces a periodic pulse sequence in the medium.
◦ An observer of the medium senses the superposed pulse sequences with added noise.
◦ A preprocessor detects pulses in the signal and stores each pulse as its TOA (time of arrival).
◦ The TOA array produced by the preprocessor is conveyed to the system.
◦ The system separates the pulses into the original signals (i.e. into periodic pulse sequences).
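The setup described above can be illustrated with a small simulation. This is only a sketch, not the project's simulator: the function name `simulate_toas`, the `(period, phase)` description of a source and the `drop_prob` parameter (which models missed pulses) are assumptions introduced here for illustration.

```python
import random

def simulate_toas(sources, duration, drop_prob=0.0, seed=0):
    """Merge several periodic pulse trains into one sorted TOA list.

    sources   -- list of (period, phase) pairs, all in the same time unit
    duration  -- pulses are generated for times in [0, duration)
    drop_prob -- probability that an individual pulse is missed (assumption)
    """
    rng = random.Random(seed)
    toas = []
    for period, phase in sources:
        t = phase
        while t < duration:
            # Keep the pulse unless it is randomly dropped.
            if rng.random() >= drop_prob:
                toas.append(t)
            t += period
    # The observer sees all pulses on one time axis, in order of arrival.
    return sorted(toas)

# Three sources with the same period but different phases.
toas = simulate_toas([(100, 0), (100, 20), (100, 55)], duration=500)
print(toas)
```

With no dropped pulses, consecutive TOAs in this example are separated by the repeating intervals 20, 35 and 45, and nothing in the array itself says which source a pulse came from.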

Page 3: Parallel accelerator project

[Figure: signal produced by source #1; signal produced by source #2; signal as seen by the observer, with missing-pulse effects marked; data structure for signal representation (array of TOA1, TOA2, ...); system output: pulses separated by source.]

Page 4: Parallel accelerator project

System components

Simulator: runs on a PC and constructs datagrams.

Datagram switch: runs on the FPGA and manages the flow of datagrams between the simulator and the processing units.

Data processing units: run on the FPGA; each unit processes datagrams.

Page 5: Parallel accelerator project

Main system components

[Diagram: the simulator on the PC is connected to the switch on the FPGA; the switch is connected to the processing units (six are shown), also on the FPGA.]

Page 6: Parallel accelerator project

Data processing units

Each unit contains a Nios II processor and C2H-generated hardware accelerators.

[Diagram: Nios II embedded processor, sequence search C2H-generated accelerator and histogram builder C2H-generated accelerator, connected through the Avalon switch fabric.]

Page 7: Parallel accelerator project

Data processing algorithm

for {level} := 1 up to {maximum level} do
    1. Build histogram of differences (SDIF) of level := {level}.
    2. Add SDIF to cumulative histogram (CDIF).
    3. Find lowest-periodicity column of CDIF above threshold.
    4. if {column found} = TRUE then
        4.1. Detect all pulse sequences of the periodicity.
        4.2. Mark pulses as associated.
       end if
    5. Check whether to break the loop.
end for
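Steps 1 to 3 of the loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: the function names are made up here, the histogram uses exact integer differences as bins, the sketch stops at the first candidate instead of continuing with steps 4 and 5, and the threshold rule (a fraction of the number of pulses a source with that period could emit during the observation) is a hypothetical stand-in, since the slides do not define the threshold function.

```python
from collections import Counter

def sdif(toas, level):
    """Histogram of differences between pulses `level` positions apart."""
    return Counter(toas[i + level] - toas[i] for i in range(len(toas) - level))

def find_candidate(toas, max_level, threshold):
    """Return (level, interval) for the first CDIF column above threshold, or None."""
    cdif = Counter()
    for level in range(1, max_level + 1):
        cdif.update(sdif(toas, level))              # steps 1 and 2
        above = [d for d, count in cdif.items() if count >= threshold(d)]
        if above:                                   # step 3
            return level, min(above)                # lowest periodicity wins
    return None

# Three interleaved sources of period 100 (intervals a=20, b=35, c=45).
toas = [0, 20, 55, 100, 120, 155, 200, 220, 255,
        300, 320, 355, 400, 420, 455]
span = toas[-1] - toas[0]

def threshold(d):
    # Hypothetical rule: 80% of the pulses expected for period d.
    return 0.8 * span / d

result = find_candidate(toas, max_level=5, threshold=threshold)
print(result)
```

For this input, no column passes the threshold at levels 1 and 2, and the column at 100 (that is, a+b+c) passes at level 3.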

Page 8: Parallel accelerator project

Data processing example

[Figure: source 1 signal, source 2 signal, source 3 signal, and the observed signal, in which consecutive pulses are separated by the repeating intervals a, b, c.]

Page 9: Parallel accelerator project

Data processing example: cumulative histogram (CDIF) update

[Figure: observed signal with repeating intervals a, b, c. SDIF (level = 1) has columns at a, b and c; after the update the CDIF has the same three columns.]

Page 10: Parallel accelerator project

Data processing example: threshold crossing check

[Figure: CDIF columns a, b and c compared against the threshold function.]

No periodicity candidate, so no sequence search.

Page 11: Parallel accelerator project

Data processing example: cumulative histogram (CDIF) update

[Figure: observed signal with repeating intervals a, b, c. SDIF (level = 2) has columns at a+b, b+c and c+a; after the update the CDIF has columns at a, b, c, a+b, b+c and c+a.]

Page 12: Parallel accelerator project

Data processing example: threshold crossing check

[Figure: CDIF columns a, b, c, a+b, b+c and c+a compared against the threshold function.]

No periodicity candidate, so no sequence search.

Page 13: Parallel accelerator project

Data processing example: cumulative histogram (CDIF) update

[Figure: observed signal with repeating intervals a, b, c. SDIF (level = 3) has a single column at a+b+c; after the update the CDIF has columns at a, b, c, a+b, b+c, c+a and a+b+c.]

Page 14: Parallel accelerator project

Data processing example: threshold crossing check

[Figure: CDIF columns a, b, c, a+b, b+c, c+a and a+b+c compared against the threshold function.]

Threshold satisfied by periodicity (a+b+c).

Search for all sequences of periodicity (a+b+c).

Page 15: Parallel accelerator project

Data processing example: sequence search results (final results)

[Figure: detected sequence #1, detected sequence #2, detected sequence #3.]
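The sequence search for a given periodicity can be sketched as a greedy chaining of pulses. This is an illustration only and makes no claim about how the project's FindSeqs function works: the name `find_sequences`, the `tolerance` and `min_len` parameters and the greedy strategy are assumptions, and this simple version ends a chain at the first missing pulse.

```python
def find_sequences(toas, pri, tolerance=0, min_len=3):
    """Group sorted TOAs into chains of pulses spaced `pri` apart.

    Returns a list of chains; each chain is a list of indices into `toas`.
    """
    used = [False] * len(toas)          # pulses already associated
    sequences = []
    for start in range(len(toas)):
        if used[start]:
            continue
        chain = [start]
        expected = toas[start] + pri    # where the next pulse should be
        for j in range(start + 1, len(toas)):
            if used[j]:
                continue
            if toas[j] > expected + tolerance:
                break                   # expected pulse is missing: stop
            if abs(toas[j] - expected) <= tolerance:
                chain.append(j)
                expected = toas[j] + pri
        if len(chain) >= min_len:
            for k in chain:
                used[k] = True          # mark pulses as associated
            sequences.append(chain)
    return sequences

toas = [0, 20, 55, 100, 120, 155, 200, 220, 255,
        300, 320, 355, 400, 420, 455]
seqs = find_sequences(toas, pri=100)
for n, chain in enumerate(seqs, start=1):
    print("sequence", n, [toas[k] for k in chain])
```

On this input the sketch recovers three sequences of five pulses each, one per source.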

Page 16: Parallel accelerator project

Input datagram format

[Diagram: datagram drawn 64 bits wide. Header fields: ID, Control Bits, Len. Payload: TOA 1, TOA 2, ... TOA N.]
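A datagram of this shape could be packed and parsed as sketched below. The slide names the fields but does not give their widths, so the layout here is purely an assumption for illustration: one 64-bit header word holding a 16-bit ID, 16 control bits and a 32-bit Len (taken to mean the number of TOAs), followed by one 64-bit word per TOA. Little-endian byte order is used because the Nios II is little-endian. The function names are hypothetical.

```python
import struct

# Hypothetical header layout: 16-bit ID, 16-bit control bits, 32-bit Len.
HEADER = struct.Struct("<HHI")

def pack_datagram(dgram_id, control, toas):
    """Serialize a header word followed by one 64-bit word per TOA."""
    header = HEADER.pack(dgram_id, control, len(toas))
    payload = struct.pack("<%dQ" % len(toas), *toas)
    return header + payload

def unpack_datagram(data):
    """Inverse of pack_datagram: return (id, control, list of TOAs)."""
    dgram_id, control, n = HEADER.unpack_from(data, 0)
    toas = list(struct.unpack_from("<%dQ" % n, data, HEADER.size))
    return dgram_id, control, toas

raw = pack_datagram(7, 0, [0, 20, 55])
print(len(raw), unpack_datagram(raw))
```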

Page 17: Parallel accelerator project

Output datagram format

Field names, in the order listed on the slide:

◦ Control fields set
◦ Length
◦ ID
◦ Total pulses associated
◦ Total sequences detected
◦ Association of pulse 1, Association of pulse 2, … Association of pulse N
◦ Total pulses associated with sequence 1
◦ PRI of sequence 1
◦ Jitter of sequence 1
◦ Confidence level 1 of sequence 1
◦ Confidence level 3 of sequence 1
◦ PRI of sequence 2, …

Size (bytes) column, as listed on the slide: 2, 2, 4, 4, 4, 2, 4, 2, 1, 1, … 1, 4, 4, 4, …

Page 18: Parallel accelerator project

Implementation for Nios II: testing and profiling

• In Visual Studio (VS), floating-point calculations were replaced by fixed-point calculations.

• The C code of the algorithm was ported from VS to the Nios IDE.

• The algorithm was profiled on the Nios II.
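The replacement of floating-point by fixed-point arithmetic can be illustrated as follows. The slides do not say which fixed-point format was used; the Q16.16 format (16 fractional bits) and the helper names below are assumptions chosen for illustration. The sketch is in Python for brevity; in C, the intermediate product in the multiplication would need a 64-bit type to avoid overflow.

```python
FRAC_BITS = 16                 # assumed Q16.16 format
ONE = 1 << FRAC_BITS           # fixed-point representation of 1.0

def to_fixed(x):
    """Convert a float to fixed point (rounded to nearest)."""
    return int(round(x * ONE))

def to_float(q):
    """Convert a fixed-point value back to a float."""
    return q / ONE

def fx_mul(a, b):
    """Multiply two fixed-point values; rescale the double-width product."""
    return (a * b) >> FRAC_BITS

def fx_div(a, b):
    """Divide two fixed-point values; pre-scale the dividend."""
    return (a << FRAC_BITS) // b

x = to_fixed(1.5)
y = to_fixed(2.0)
print(to_float(fx_mul(x, y)), to_float(fx_div(fx_mul(x, y), y)))
```

Addition and subtraction need no helper, because two values in the same format can be added or subtracted as plain integers.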

Page 19: Parallel accelerator project

SoPC system generation

The hardware design was generated in the Altera SoPC Builder environment.

Page 20: Parallel accelerator project

SoPC system generation

Different SoPC system configurations were compared.

The SoPC system was optimized:
◦ multiple clock domains were provided for
◦ interconnect was minimized
◦ different processor types were compared

Page 21: Parallel accelerator project

C2H acceleration

C2H hardware accelerators were generated for two blocks of the algorithm:
◦ Sequence search function (FindSeqs)
◦ Histogram builder function (BuildHist)

Page 22: Parallel accelerator project

C2H accelerators: performance optimization

Sequence search (FindSeqs) function acceleration
◦ Accelerator results were unsatisfactory
◦ Consumes a large amount of FPGA logic
◦ Low acceleration gain (4x at most)
◦ Discarded after considerable optimization effort

Page 23: Parallel accelerator project

C2H accelerators: performance optimization

Histogram builder (BuildHist) function acceleration
◦ Good acceleration results
◦ 50x acceleration gain
◦ Moderate FPGA logic consumption

Page 24: Parallel accelerator project

Design performance: FPGA resources

6% logic consumption
5% memory consumption

Page 25: Parallel accelerator project

Design performance: timing

Processing time of 1 to 7 ms
3 Nios systems significantly outperform a Pentium 4 processor