ARCHAEOLOGICAL LAND USE CHARACTERIZATION USING MULTISPECTRAL REMOTE SENSING DATA

School of Engineering, Autonomous University of Yucatan, Merida, Mexico.

2011/07/25

Presenter: Dr. Alejandro Castillo Atoche

Aggregation of Parallel Computing and Hardware/Software Co-Design Techniques for High-Performance Remote Sensing Applications

IGARSS’11

Outline

Introduction
Previous Work
HW/SW Co-design Methodology
Case Study: DEDR-related RSF/RASF Algorithms
Systolic Architectures (SAs) as Co-processors
Integration in a Co-design Scheme
New Design Perspective: Super-Systolic Arrays and VLSI Architectures
Hardware Implementation Results
Performance Analysis

Conclusions

Introduction: Radar Imagery, Facts

The advanced high resolution operations of remote sensing (RS) are computationally complex.

Recently developed remote sensing (RS) image reconstruction/enhancement techniques are, in their current form, unsuitable for (near) real-time implementation.

In previous work, the algorithms were implemented as conventional simulations on personal computers (normally in MATLAB), on digital signal processing (DSP) platforms, or on clusters of PCs.

Introduction: HW/SW co-design, Facts

Why Hardware/software (HW/SW) co-design?

HW/SW co-design is a hybrid method aimed at increasing the flexibility of implementations and improving the overall design process.

Why Systolic Arrays?

Extremely fast.

Easily scalable architecture.

Why Parallel Techniques?

They optimize the loops that generally take most of the execution time in RS algorithms.

MOTIVATION

First, novel RS imaging applications now require a (near) real-time response in areas such as target detection for military purposes, wildfire tracking, and oil spill monitoring.

Also, virtual remote sensing laboratories were developed in previous work. Now we intend to design efficient HW architectures that pursue real-time operation.

CONTRIBUTIONS:

First, applying parallel computing techniques based on loop optimization transformations generates efficient super-systolic array (SSA)-based co-processor units for the selected reconstructive signal processing (SP) subtasks.

Second, the addressed HW/SW co-design methodology is aimed at an efficient HW implementation of the enhancement/reconstruction regularization methods using the proposed SSA-based co-processor architectures.

Algorithmic ref. Implementation

SNR [dB]    RSF Method IOSNR [dB]    RASF Method IOSNR [dB]
5           4.36                     7.94
10          6.92                     9.75
15          7.67                     11.36
20          9.48                     12.72

Partitioning Phase

Aggregation of parallel computing techniques

We consider a number of different parallel optimization techniques used in high performance computing (HPC) in order to exploit the maximum possible parallelism in the design:

Loop unrolling, nested loop optimization, loop interchange, and tiling.
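As a rough illustration of two of these transformations, the following C++ sketch applies loop interchange and 4-way loop unrolling to a plain matrix-vector product. The function names, the `Matrix`/`Vector` aliases, and the unroll factor of 4 are assumptions chosen for this example and do not come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long>>;
using Vector = std::vector<long>;

// Baseline: row-by-row dot products (i outer, j inner).
Vector matvec_baseline(const Matrix& a, const Vector& v) {
    Vector u(a.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        assert(a[i].size() == v.size());
        for (std::size_t j = 0; j < v.size(); ++j)
            u[i] += a[i][j] * v[j];
    }
    return u;
}

// Loop interchange: j outer, i inner. Each u[i] is updated once per column.
Vector matvec_interchanged(const Matrix& a, const Vector& v) {
    Vector u(a.size(), 0);
    for (std::size_t j = 0; j < v.size(); ++j)
        for (std::size_t i = 0; i < a.size(); ++i)
            u[i] += a[i][j] * v[j];
    return u;
}

// Loop unrolling by 4 on the inner loop, with a cleanup loop for the remainder.
Vector matvec_unrolled4(const Matrix& a, const Vector& v) {
    const std::size_t n = v.size();
    Vector u(a.size(), 0);
    for (std::size_t i = 0; i < a.size(); ++i) {
        long acc = 0;
        std::size_t j = 0;
        for (; j + 4 <= n; j += 4) {
            acc += a[i][j] * v[j] + a[i][j + 1] * v[j + 1]
                 + a[i][j + 2] * v[j + 2] + a[i][j + 3] * v[j + 3];
        }
        for (; j < n; ++j) acc += a[i][j] * v[j];  // leftover iterations
        u[i] = acc;
    }
    return u;
}
```

All three functions compute the same result; only the order and grouping of the multiply-accumulate operations change, which is what makes these transformations safe to apply before mapping the loops to hardware.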

CASE STUDY: Matrix Vector Multiplication

The matrix-vector multiplication operation is described by the following sum:

\[ u_i = \sum_{j=0}^{n-1} a_{ij} v_j \]

where

a is the input matrix of dimensions m x n,

v is the input vector of dimensions n x 1,

u is the result vector of dimensions m x 1,

i is the row index, with range 0 <= i < m.

CASE STUDY: Matrix Vector Multiplication

The matrix vector multiplication is usually implemented in sequential programming languages such as C++ as:

    for (i = 0; i < m; i++) {
        u[i] = 0;
        for (j = 0; j < n; j++) {
            u[i] = u[i] + a[i][j] * v[j];
        }
    }

To find out whether we can speed up this algorithm, we first need to rewrite it so that all of its data dependencies are visible. For this purpose, we use single-assignment notation.

\[ u_i = \sum_{j=0}^{n-1} a_{ij} v_j \]

Inputs:
    a[i,j] = A[i,j] : 0 <= i < m, 0 <= j < n
    v[j] = V[j] : 0 <= j < n
Outputs:
    U[i] = u[i] : 0 <= i < m

CASE STUDY: Matrix Vector Multiplication

First, we assign each operation in the matrix-vector multiplication algorithm a location in a space called the Index Space, like the one shown on the right of the slide. We also rewrite the algorithm so that each operation can be assigned a coordinate in this Index Space.

This operation is called Index Matching.

    for (i = 0; i < m; i++) {
        u[i][0] = 0;
        for (j = 0; j < n; j++) {
            S(i,j): u[i][0] = u[i][0] + a[i][j] * v[0][j];
        }
    }

NOTE: The algorithm has not been changed in any way; adding the coordinate [0] has no effect with respect to the previous form of the algorithm.

[Figure: Index Space with axes i and j.]

Inputs:
    a[i,j] = A[i,j] : 0 <= i < m, 0 <= j < n
    v[0][j] = V[j] : 0 <= j < n
Outputs:
    U[i] = u[i][0] : 0 <= i < m

CASE STUDY: Matrix Vector Multiplication -> Single Assignment Stage

Now that each operation is assigned to a single point in the Index Space, we can rewrite the algorithm so that each variable is assigned only once at each coordinate in the Index Space.

    for (i = 0; i < m; i++) {
        u[i][0] = 0;
        for (j = 0; j < n; j++) {
            u[i][j+1] = u[i][j] + a[i][j] * v[0][j];
        }
    }

In this version of the algorithm, one variable assignment is made for each point (processing element, PE) in the index space. Note that the input vector must be visible to all the PEs for the algorithm to operate correctly.

Inputs:
    a[i,j] = A[i,j] : 0 <= i < m, 0 <= j < n
    v[0][j] = V[j] : 0 <= j < n
Outputs:
    U[i] = u[i][n] : 0 <= i < m

[Figure: Index Space (axes i, j) for a 5 x 5 example, with inputs v[0][0] to v[0][4], matrix elements a[i][j], and an initial partial sum of 0 for each row.]
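The single-assignment form can be checked in software. The sketch below is a minimal C++ model, assuming integer data and hypothetical names (`matvec_single_assignment`, `Matrix`, `Vector`); it stores every partial sum u[i][j] separately and reads the result from the last column, u[i][n].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long>>;
using Vector = std::vector<long>;

// Single-assignment matrix-vector product: every element u[i][j+1] is written
// exactly once, so u[i][j] holds the partial sum of row i after j terms.
Vector matvec_single_assignment(const Matrix& a, const Vector& v_in) {
    const std::size_t m = a.size();
    const std::size_t n = v_in.size();
    Matrix v(1, v_in);               // v[0][j] = V[j], as in the slide notation
    Matrix u(m, Vector(n + 1, 0));   // u[i][0] = 0 for every row
    for (std::size_t i = 0; i < m; ++i) {
        assert(a[i].size() == n);
        for (std::size_t j = 0; j < n; ++j)
            u[i][j + 1] = u[i][j] + a[i][j] * v[0][j];
    }
    Vector result(m);
    for (std::size_t i = 0; i < m; ++i)
        result[i] = u[i][n];         // U[i] is the last partial sum of row i
    return result;
}
```

The extra storage is not meant for a software implementation; it only makes every data dependency explicit, which is what the later mapping steps rely on.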

CASE STUDY: Matrix Vector Multiplication -> Broadcast Removal

Broadcasting a signal requires large routing resources and big drivers, which can translate into a large number of buffers being inserted in the final circuit. To avoid this, we remove the broadcast by passing the variable from one PE to the next.

    for (i = 0; i < m; i++) {
        u[i][0] = 0;
        for (j = 0; j < n; j++) {
            u[i][j+1] = u[i][j] + a[i][j] * v[i][j];
            v[i+1][j] = v[i][j];
        }
    }

This form of the algorithm not only complies with the single-assignment requirement but also has locality; that is, each operation depends only on data from its neighbors. The resulting graph is also called a Dependency Graph (DG).

Inputs:
    a[i,j] = A[i,j] : 0 <= i < m, 0 <= j < n
    v[0][j] = V[j] : 0 <= j < n
Outputs:
    U[i] = u[i][n] : 0 <= i < m

[Figure: Index Space (axes i, j) for a 5 x 5 example, with inputs v[0][0] to v[0][4], matrix elements a[i][j], an initial partial sum of 0 for each row, and outputs U[0] to U[4].]
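A minimal C++ model of the localized form is sketched below, again assuming integer data and hypothetical names. Each row i reads its own copy v[i][j] and forwards it to row i+1, so no value is read by more than one neighbor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long>>;
using Vector = std::vector<long>;

// Matrix-vector product without broadcast: v is handed from row i to row i+1.
Vector matvec_localized(const Matrix& a, const Vector& v_in) {
    const std::size_t m = a.size();
    const std::size_t n = v_in.size();
    Matrix u(m, Vector(n + 1, 0));   // u[i][0] = 0
    Matrix v(m + 1, Vector(n, 0));   // v[i] is the copy of V seen by row i
    v[0] = v_in;
    for (std::size_t i = 0; i < m; ++i) {
        assert(a[i].size() == n);
        for (std::size_t j = 0; j < n; ++j) {
            u[i][j + 1] = u[i][j] + a[i][j] * v[i][j];  // local accumulate
            v[i + 1][j] = v[i][j];                      // pass v to the next row
        }
    }
    Vector result(m);
    for (std::size_t i = 0; i < m; ++i)
        result[i] = u[i][n];
    return result;
}
```

Point (i, j) now depends only on point (i, j-1) for its partial sum and on point (i-1, j) for its vector element, which is the neighbor-to-neighbor locality a systolic array needs.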

CASE STUDY: Matrix Vector Multiplication -> Scheduling

Now let's see how the algorithm works in time; the original slide shows this as an animation.

[Animation: Index Space (axes i, j) for the 5 x 5 example. The operations on a[i][j] are activated in successive wavefronts, starting at a[0][0] and ending at a[4][4], and the outputs U[0] to U[4] leave the array as each row completes.]

In this processor array, the entire matrix-vector multiplication takes only 9 time cycles, and at most 5 processors are in use during any time cycle.

If we never use more than 5 processors at once, why should we build an array of 25?
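The cycle and processor counts quoted above can be reproduced with a short C++ sketch. It assumes the wavefront schedule t = i + j (one time step per anti-diagonal of the index space); `ScheduleStats` and `analyze_schedule` are hypothetical names used only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct ScheduleStats {
    std::size_t cycles;      // number of distinct time steps
    std::size_t max_active;  // largest number of PEs active in one step
};

// Counts how many index points (i, j) fire at each time step t = i + j.
ScheduleStats analyze_schedule(std::size_t m, std::size_t n) {
    assert(m > 0 && n > 0);
    std::vector<std::size_t> active(m + n - 1, 0);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t j = 0; j < n; ++j)
            ++active[i + j];
    return {active.size(), *std::max_element(active.begin(), active.end())};
}
```

For m = n = 5 this gives m + n - 1 = 9 cycles and a peak of 5 simultaneously active points, which is why a linear array of 5 processors is enough.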

CASE STUDY: Matrix Vector Multiplication -> Allocation

The circuit could operate with 5 processors.

[Figure: Matrix-vector algorithm with projection [1 0]. The 5 x 5 Index Space is mapped onto a linear array of 5 processors, P0 to P4, which produce the outputs U[0] to U[4].]

CASE STUDY: Matrix Vector Multiplication -> Space-Time mapping

The table on the original slide (not reproduced in this transcript) shows which processors are in use at each instant t.

If we plot the information in the table on [t, p] axes, we can see that the polytope defined by this selection table is bounded by the inequalities p >= 0, p >= t-(n-1), p <= t and p <= m-1, which give the following bounds for the zero-based indices used in the loops:

lower bound of p: p >= max(0, t-(n-1))

upper bound of p: p <= min(m-1, t)

for all t.
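The bounds can be turned directly into a selection function. The sketch below assumes zero-based indices 0 <= i < m and 0 <= j < n, as in the loop code, so the bounds become max(0, t-(n-1)) <= p <= min(m-1, t); `active_processors` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the processors p that are active at time t, using
// max(0, t-(n-1)) <= p <= min(m-1, t). Signed ints keep t-(n-1) well defined.
std::vector<int> active_processors(int t, int m, int n) {
    assert(t >= 0 && m > 0 && n > 0);
    std::vector<int> ps;
    const int lo = std::max(0, t - (n - 1));
    const int hi = std::min(m - 1, t);
    for (int p = lo; p <= hi; ++p)
        ps.push_back(p);
    return ps;  // empty once t exceeds the last time step, m + n - 2
}
```

For m = n = 5, the function returns only P0 at t = 0, all five processors at t = 4, and only P4 at t = 8, which traces out the diamond-shaped polytope in the [t, p] plane.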

CASE STUDY: Matrix Vector Multiplication -> Space-Time mapping

If we analyze the transformations we applied to our index space and describe the scheduling for the new function, we have:

\[ p = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} i \\ j \end{bmatrix} = i, \qquad t = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} i \\ j \end{bmatrix} = i + j \]

where
• p is the position of the processing element in the transformed algorithm.
• t is the time at which a processor at a given coordinate is activated in the transformed algorithm.

From here, i = p and j = t - p.

• We can rewrite the algorithm by making the proper substitutions. The localized form

    for (i = 0; i < m; i++) {
        u[i][0] = 0;
        for (j = 0; j < n; j++) {
            u[i][j+1] = u[i][j] + a[i][j] * v[i][j];
            v[i+1][j] = v[i][j];
        }
    }

becomes

    for (t = 0; t < (m+n)-1; t++) {
        forALL (p = max(0, t-(n-1)); p <= min(m-1, t); p++) {
            u[p,t-p+1] = u[p,t-p] + a[p,t-p] * v[p,t-p];
            v[p+1,t-p] = v[p,t-p];
        }
    }
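The space-time mapped loop can be simulated sequentially to confirm that it still computes the matrix-vector product. The C++ sketch below assumes integer data and hypothetical names; the forALL loop is modeled as an ordinary loop over p, which is valid because the iterations within one time step do not depend on each other.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long>>;
using Vector = std::vector<long>;

// Sequential simulation of the space-time mapped algorithm with p = i, t = i + j.
Vector matvec_space_time(const Matrix& a, const Vector& v_in) {
    const int m = static_cast<int>(a.size());
    const int n = static_cast<int>(v_in.size());
    Matrix u(a.size(), Vector(v_in.size() + 1, 0));   // u[p][0] = 0
    Matrix v(a.size() + 1, Vector(v_in.size(), 0));   // v[0][j] = V[j]
    v[0] = v_in;
    for (int t = 0; t < m + n - 1; ++t) {
        const int lo = std::max(0, t - (n - 1));
        const int hi = std::min(m - 1, t);
        // forALL: every p in [lo, hi] could run in parallel at time t.
        for (int p = lo; p <= hi; ++p) {
            const int j = t - p;                      // recover the column index
            u[p][j + 1] = u[p][j] + a[p][j] * v[p][j];
            v[p + 1][j] = v[p][j];
        }
    }
    Vector result(a.size());
    for (int p = 0; p < m; ++p)
        result[p] = u[p][n];
    return result;
}
```

At time t, processor p reads u[p][j] and v[p][j], both of which were produced at time t-1 (by processor p and processor p-1 respectively), so the schedule respects every dependency in the DG.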

CASE STUDY: Matrix Vector Multiplication -> Tiling + Strip Mining

Let's say we build an array of 4 processors and we want to multiply a 10x10 matrix by a 10x1 vector. How can we solve a problem like this if we only have 4 processors?
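The transcript does not include the slide's answer, so the following C++ sketch shows only one possible approach, stated as an assumption: strip-mine the row loop into blocks of at most `num_pe` rows and run each block as a separate pass over the processor array. `TiledResult` and `matvec_strip_mined` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long>>;
using Vector = std::vector<long>;

struct TiledResult {
    Vector u;            // result vector
    std::size_t passes;  // number of passes over the processor array
};

// Strip mining: rows are processed in strips of at most num_pe rows,
// and within a strip, PE p handles row base + p.
TiledResult matvec_strip_mined(const Matrix& a, const Vector& v, std::size_t num_pe) {
    assert(num_pe > 0);
    const std::size_t m = a.size();
    TiledResult out{Vector(m, 0), 0};
    for (std::size_t base = 0; base < m; base += num_pe) {   // one strip per pass
        const std::size_t rows = std::min(num_pe, m - base); // last strip may be short
        for (std::size_t p = 0; p < rows; ++p)
            for (std::size_t j = 0; j < v.size(); ++j)
                out.u[base + p] += a[base + p][j] * v[j];
        ++out.passes;
    }
    return out;
}
```

With 10 rows and 4 processors this takes 3 passes (4 + 4 + 2 rows), and the last pass leaves two processors idle.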

Integration in a HW/SW Co-design scheme

New Perspective: Super-Systolic Arrays

A Super-Systolic Array is a network of systolic cells in which each cell is itself conceptualized as another systolic array at the bit level.

The bit-level super-systolic architecture is a high-speed, highly pipelined structure that can be implemented as a co-processor unit or even as a stand-alone VLSI ASIC architecture.
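To give a feel for what "bit level" means here, the following C++ sketch models a multiply-accumulate operation using only AND gates and full adders. It is a plain software model under stated assumptions (unsigned operands of at most 16 bits, a ripple-carry row per multiplier bit); it does not reproduce the authors' pipelined cell design, and `full_adder` and `bit_level_mac` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// One bit-level cell: a full adder producing sum and carry-out.
struct AdderBits {
    bool sum;
    bool carry;
};

AdderBits full_adder(bool a, bool b, bool ci) {
    return {(a != b) != ci, (a && b) || (ci && (a != b))};
}

// Computes acc + x * y (mod 2^32) for unsigned operands of `bits` bits,
// using one row of AND gates and full adders per bit of y.
std::uint32_t bit_level_mac(std::uint32_t acc, std::uint32_t x, std::uint32_t y, int bits) {
    assert(bits > 0 && bits <= 16);
    for (int k = 0; k < bits; ++k) {
        const bool yk = (y >> k) & 1u;
        bool carry = false;
        std::uint32_t next = acc & ((1u << k) - 1u);  // bits below k are unchanged
        for (int b = k; b < 32; ++b) {                // ripple through the row
            const bool xb = (b - k < bits) && ((x >> (b - k)) & 1u);
            const bool pp = xb && yk;                 // AND gate: partial product bit
            const AdderBits r = full_adder((acc >> b) & 1u, pp, carry);
            if (r.sum) next |= (1u << b);
            carry = r.carry;
        }
        acc = next;
    }
    return acc;
}
```

In a word-level systolic array each PE performs one such multiply-accumulate; in a super-systolic array the rows of bit-level cells inside the PE are themselves pipelined.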

FPGA-based Super-Systolic Architecture

[Figure: Matrix-vector DG with hyperplanes for n = m = 4, mapped with the transformation Π = [1 1], d = [1 0]^T onto a data-skewed matrix-vector processor array (PA) with processors P0 to P3.]

[Figure: Bit-level multiply-accumulate DG for m = 4, mapped with the transformation Π = [1 2], d = [1 0] onto a bit-level array of PEs for processor P0.]

Bit-level SSA design on a high-speed VLSI architecture

[Figure: Bit-level cell schematics. Each cell combines a full adder (inputs A, B, Ci; outputs ∑, Co) with D flip-flops (D, Q); a block labeled "Bit-Level F" is drawn with signals a, b, ci, so and co.]

The chip was designed using a Standard Cell library in a 0.6µm CMOS process. The resulting integrated circuit core has dimensions of 7.4 mm x 3.5 mm. The total gate count is about 32K using approximately 185K transistors. The 72-pin chip will be packaged in an 80 LD CQFP package.

Performance Analysis: VLSI

Function     Complexity             For m = 32
AND          m x m                  1024
Adder        (m + 1) x m            1056
Mux          m                      32
Flip-Flop    [(4m + 2) x m] + m     4160
Demux        m                      32

Performance Analysis: FPGA

Conclusions

The principal result of this study is the aggregation of parallel computing with regularized RS techniques in Super-Systolic Array (SSA) architectures, which are integrated via the HW/SW co-design paradigm in FPGA or VLSI platforms for the real-time implementation of RS algorithms.

The authors consider that the bit-level implementation of specialized SSAs of processors, in combination with VLSI-FPGA platforms, represents an emerging research field for real-time RS data processing in newer geospatial applications.

Recent Selected Journal Papers

A. Castillo Atoche, D. Torres, Yuriy V. Shkvarko, “Towards Real Time Implementation of Reconstructive Signal Processing Algorithms Using Systolic Arrays Coprocessors”, JOURNAL OF SYSTEMS ARCHITECTURE (JSA), Edit. ELSEVIER, Volume 56, Issue 8, August 2010, Pages 327-339, ISSN: 1383-7621, doi:10.1016/j.sysarc.2010.05.004. JCR.

A. Castillo Atoche, D. Torres, Yuriy V. Shkvarko, “Descriptive Regularization-Based Hardware/Software Co-Design for Real-Time Enhanced Imaging in Uncertain Remote Sensing Environment”, EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING (JASP), Edit. HINDAWI, Volume 2010, 31 pages, 2010. ISSN: 1687-6172, e-ISSN: 1687-6180, doi:10.1155/ASP. JCR.

Yuriy V. Shkvarko, A. Castillo Atoche, D. Torres, “Near Real Time Enhancement of Geospatial Imagery via Systolic Implementation of Neural Network-Adapted Convex Regularization Techniques”, JOURNAL OF PATTERN RECOGNITION LETTERS, Edit. ELSEVIER, 2011. JCR. In Press

Thanks for your attention.

Dr. Alejandro Castillo Atoche

Email: [email protected]