Platform-independent description of imaging · Computer Science X - System Simulation Group Harald...

32
Computer Science X - System Simulation Group Harald Köstler ([email protected]) Platform-independent description of imaging algorithms H. Köstler Salt Lake City, 17.3.2015 1

Transcript of Platform-independent description of imaging · Computer Science X - System Simulation Group Harald...

Page 1: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Platform-independent

description of imaging

algorithms

H. Köstler

Salt Lake City, 17.3.2015

1

Page 3: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Problems in High Performance Computing

Hardware: Modern HPC platforms are massively

parallel

Intra-core, intra-node, and inter-node

Software: Imaging applications become more complex

with increasing computational power

More complex models

Code development in interdisciplinary teams

Algorithm: Class of different algorithms grows, many of

them are just a general idea (like multigrid)

Components and parameters depend on image, type of problem, …

Page 4: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

High Performance Computing: Applications

real-time imaging e.g. medicine

Large-scalesimulation

e.g. multi-physics

Page 5: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Variational Imaging

General Problem: Find mapping , such that

the energy functional

is minimized.

5

Page 6: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

State of the Art: Application-driven Projects

6

Mathematician

Hardware specialist

Software specialist

User from application field Description of application

Solution method

Parallel implementation and framework

Efficient implementation on specific hardware

Page 7: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Proposed: Domain-driven Projects

7

Mathematician

Hardware specialist

Software specialist

Users from different

application fieldsDescription of application in domain specific language

Automatic selection of algorithmic components

Code generation for specific application

Automatic tuning on specific hardware

Domain knowledgeFeature Model

Domain expert

PDE

Operators::Laplacian(Data::solution) = Data::rhs

Page 8: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Aspects

Code generation for imaging applications

High dynamic range compression

Image registration / Optical Flow

Image segmentation

Image denoising

Domain-specific language design

Domain-specific knowledge representation and optimization

Efficient Algorithms

Parallelization

Performance Tuning

8

Page 9: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

ExaStencils – Overview

• DSL as intuitive

interface to the user

• Automatic deduction of

configuration if desired

• Prediction and

Optimization of the

configuration’s

performance using

SPL and LFA

• Code generation in

Scala

• Automatic hardware-

specific optimizations

11

Page 10: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

ExaStencils – Layers

1210

Page 11: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

DSL Scope

11

dimension 2D 3DDomain UnitSquare UnitCubeSolution Scalar VectorOperator Stencil weak form nonlinearBoundary Dirichlet Neumann periodic

Architecture CPU GPULanguages C++ Scala CUDA OpenCLParallelization MPI OpenMP

discrete domain regularGrid nodebased cellbasedoperator discretization FD FE FVdata types float double complexdiscretization accuracy 2 4

Page 12: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

DSL EXAMPLE

High Dynamic Range compression

12

Page 13: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Application: High Dynamic Range Compression

Data: Siemens AG. Healthcare Sector

Sequences of 2D x-ray images

ComputationalSteering

Visualization withOpenGL

Köstler. et al. Performance engineering to achieve real-time high dynamic range imaging.J Real-Time Image Processing. 2013.

Page 14: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Optimized HDR Compression (size 2048x2048)

0

20

40

60

80

100

120

140

GTX 295/2 GTX 480 GTX 480(wavefront)

fps

half of an NVIDIA GTX 295

112 GB/s peak bandwidth

compute capability 1.3

NVIDIA GTX 480

177 GB/s peak bandwidth

compute capability 2.0 (Fermi)

14

Page 15: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Selection of Multigrid Components: LFA Toolbox

15

Goal: minimize time T to reach prescribed accuracy κ

Asymptotic convergence rates are estimated via LFA

Page 16: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Hardware description (DSL level Hardware)

16

Hardware cpu

bandwidth = 40

peak = 30

cores = 4

Node n

sockets = 2

Page 17: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

HDR Compression

Idea: Modify magnitude of image gradient by position-dependent attenuating function

Energy functional

Solve by multigrid the Euler-Lagrange equation

RR 2:

ICFattal/Lischinski/Werman. Gradient Domain High Dynamic Range Compression. SIGGRAPH. 2002

xdCxuuEu

2

)(min)(

onu

infu

0

17

Page 18: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

What is Multigrid?

Goal: Solve partial differential equation

After discretization one requires an efficient iterative solver for sparse systems

multigrid solver has complexity O(N) in number of unknowns N

18

hh fAu

onu

infu

∂Ω

Page 19: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Problem Description (continuous)

19

Domain omega = UnitSquare

f : omega -> R^1

u : omega -> R^1

Laplacian : ( omega -> R^1 ) -> ( omega -> R^1 )

Laplacian = dx^2 + dy^2

pde : Laplacian [ u ] = f in omega

bc : u = 0 in partial_omega

Page 20: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Generation of discrete problem

Domain-specific knowledge

Discretization methods FD, FV, FE

Types of operators supported result in sparse matrices

Domain-specific optimization chooses type of discretization and e.g. concrete data types

Description is parsed, an abstract syntax tree is constructed and then transformed into a discrete representation of the problem

Code generation framework is implemented in Scala language

20

Page 21: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Problem Description (discrete)

21

Fragments f1 = Regular_Square

Discrete_Domain omega levels 8

xsize [0] = 1024 // from Image

ysize [0] = 1024 // from Image

xsize [l+1] = xsize [l] / 2

ysize [l+1] = ysize [l] / 2

Field<Double,1>@nodes f

Field<Double,1>@nodes u

StencilMatrix<Double,1,1, FD, 2>@nodes Laplacian

Page 22: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Multigrid Algorithm

22

Page 23: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Algorithmic Parameters and Components

23

mgcomponents

solver = multigrid

cycletype=Vcycle

mgparameter

nprae = 2

npost = 1

iters = 10

Page 24: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Layer 4 Functions – Example

24

function VCycle@coarsest ( ) : Unit /* coarse grid solver */

function VCycle@((coarsest + 1) to finest) ( ) : Unit

repeat up 1

Smoother@current ( )

UpResidual@current ( )

Restriction@current ( )

SetSolution@coarser ( 0 )

VCycle@coarser ( )

Correction@current ( )

repeat up 2

Smoother@current ( )

Page 25: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Example: Jacobi Smoother

25

function Smoother@((coarsest + 1) to finest) ( ) : Unit

communicate Solution[0]@(current)

loop over inner on Solution@(current)

Solution[1]@current =

Solution[0]@current

+ ( ( ( 1.0 / diag ( Lapl@(current) ) ) * 0.8 )

* ( RHS@current - Lapl@(current) * Solution[0]@current ) )

Page 26: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Jacobi Smoother – Resulting Code (w/o basic Opt)

26

#include "MultiGrid/MultiGrid.h"

void Smoother_4()

exchsolutionData_4(0);

#pragma omp parallel for schedule(static) num_threads(8)

for (int fragmentIdx = 0; fragmentIdx < 8; ++fragmentIdx)

if (isValidForSubdomain[fragmentIdx][0])

for (int y = iterationOffsetBegin[fragmentIdx][0][1];

y < (iterationOffsetEnd[fragmentIdx][0][1]+17); y +=1)

for (int x = iterationOffsetBegin[fragmentIdx][0][0];

x < (iterationOffsetEnd[fragmentIdx][0][0]+17); x +=1)

slottedFieldData_Solution[1][fragmentIdx][4][(((y*19)+19)+(x+1))] =

(slottedFieldData_Solution[0][fragmentIdx][4][(((y*19)+19)+(x+1))]

+(((1.0e+00/fieldData_LaplCoeff[fragmentIdx][4][((y*17)+x)])*8.0e-01)

*(fieldData_RHS[fragmentIdx][4][((y*17)+x)]

-(((((fieldData_LaplCoeff[fragmentIdx][4][((y*17)+x)]

*slottedFieldData_Solution[0][fragmentIdx][4][(((y*19)+19)+(x+1))])

+(fieldData_LaplCoeff[fragmentIdx][4][(((y*17)+289)+x)]

*slottedFieldData_Solution[0][fragmentIdx][4][(((y*19)+19)+(x+2))]))

+(fieldData_LaplCoeff[fragmentIdx][4][(((y*17)+578)+x)]

*slottedFieldData_Solution[0][fragmentIdx][4][(((y*19)+19)+x)]))

+(fieldData_LaplCoeff[fragmentIdx][4][(((y*17)+867)+x)]

*slottedFieldData_Solution[0][fragmentIdx][4][(((y*19)+38)+(x+1))]))

+(fieldData_LaplCoeff[fragmentIdx][4][(((y*17)+1156)+x)]

*slottedFieldData_Solution[0][fragmentIdx][4][((y*19)+(x+1))])))));

Page 27: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Jacobi Smoother – Resulting Code (w/ basic Opt)

27

#include "MultiGrid/MultiGrid.h"

void Smoother_4()

exchsolutionData_4(0);

#pragma omp parallel for schedule(static) num_threads(8)

for (int fragmentIdx = 0; fragmentIdx < 8; ++fragmentIdx)

if (isValidForSubdomain[fragmentIdx][0])

for (int c0 = iterationOffsetBegin[fragmentIdx][0][1];

(c0<=(iterationOffsetEnd[fragmentIdx][0][1]+16)); c0 = (c0+1))

double* slottedFieldData_Solution_1_fragmentIdx_4_p1 =

&(slottedFieldData_Solution[1][fragmentIdx][4][(19*c0)]);

double* fieldData_RHS_fragmentIdx_4_p1 = &(fieldData_RHS[fragmentIdx][4][(17*c0)]);

double* slottedFieldData_Solution_0_fragmentIdx_4_p1 = …

double* fieldData_LaplCoeff_fragmentIdx_4_p1 = …

for (int c1 = iterationOffsetBegin[fragmentIdx][0][0];

(c1<=(iterationOffsetEnd[fragmentIdx][0][0]+16)); c1 = (c1+1))

slottedFieldData_Solution_1_fragmentIdx_4_p1[(c1+20)] =

(slottedFieldData_Solution_0_fragmentIdx_4_p1[(c1+20)]

+(((1.0e+00/fieldData_LaplCoeff_fragmentIdx_4_p1[c1])*8.0e-01)*(fieldData_RHS_fragmentIdx_4_p1[c1]

-(((((fieldData_LaplCoeff_fragmentIdx_4_p1[c1]*slottedFieldData_Solution_0_fragmentIdx_4_p1[(c1+20)])

+(fieldData_LaplCoeff_fragmentIdx_4_p1[(c1+289)]*slottedFieldData_Solution_0_fragmentIdx_4_p1[(c1+21)]))

+(fieldData_LaplCoeff_fragmentIdx_4_p1[(c1+578)]*slottedFieldData_Solution_0_fragmentIdx_4_p1[(c1+19)]))

+(fieldData_LaplCoeff_fragmentIdx_4_p1[(c1+867)]*slottedFieldData_Solution_0_fragmentIdx_4_p1[(c1+39)]))

+(fieldData_LaplCoeff_fragmentIdx_4_p1[(c1+1156)]*slottedFieldData_Solution_0_fragmentIdx_4_p1[(c1+1)])))));

Page 28: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Denoising by Diffusion

Idea:

Use nonlinear anisotropic diffusion process to denoise the image

u0 in the domain Ω ,i.e. solve the time-dependent PDE

28

inxuxu

Tonnug

Tint

uugdiv

)()0,(

0,

)(

0

Page 29: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Runtime Results for Different Problems

29

cpu gpu

1 8 256

Double Float Double Float Double Float

Lap

lace

2D Jacobi 546 1.17 3.50 3.50 6.42 9.10

GaussSeidel 453 1.16 2.42 3.24 3.41 4.72

3D Jacobi 608 1.08 2.60 3.00 6.83 10.31

GaussSeidel 608 1.08 2.60 3.01 4.22 5.96

Co

mp

lex

D

iffu

sio

n 2D Jacobi 9235 1.02 2.81 2.87 23.99 33.58

GaussSeidel 7504 1.11 2.18 2.29 11.62 16.42

3D Jacobi 8799 1.15 2.77 2.91 15.94 39.28

GaussSeidel 9048 1.80 2.60 2.87 10.28 24.19

ms speedup speedup speedup speedup speedup

Image Sizes 4096x4096

resp. 256x256x256

Page 30: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Future Work

Imaging applications (tested)

High dynamic range compression

Image denoising

Optical Flow

Imaging applications (next)

Image registration

Image segmentation

30

Page 31: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

Acknowledgements

• Funded by

• Bundesministerium für Bildung und Forschung

• KONWIHR. Bavarian project

• DFG SPP 1648/1 – Software for Exascale computing

http://www.exastencils.org/

Page 32: Platform-independent description of imaging · Computer Science X - System Simulation Group Harald Köstler (harald.koestler@fau.de) Platform-independent description of imaging algorithms

Computer Science X - System Simulation Group

Harald Köstler ([email protected])

THANK YOU!

32