
FAST GPU BASED ADAPTIVE FILTERING OF 4D

    ECHOCARDIOGRAPHY

    M.TECH FIRST SEMESTER SEMINAR REPORT

    SIGNAL PROCESSING

Submitted in partial fulfillment of the requirements for the award of M. Tech Degree in

Electronics and Communication Engineering (Signal Processing)

    of the University of Kerala

    Submitted by:

    JERRIN THOMAS PANACHAKEL

    DEPARTMENT OF ELECTRONICS AND COMMUNICATION

    COLLEGE OF ENGINEERING TRIVANDRUM

    2012


    DEPARTMENT OF ELECTRONICS AND COMMUNICATION

    COLLEGE OF ENGINEERING TRIVANDRUM

    Certificate

This is to certify that this report entitled FAST GPU BASED ADAPTIVE FILTERING OF 4D ECHOCARDIOGRAPHY is a bonafide record of the seminar presented by JERRIN THOMAS PANACHAKEL, under our guidance, towards partial fulfillment of the requirements for the award of the Master of Technology Degree in Electronics and Communication Engineering (Signal Processing) of the University of Kerala during the year 2012.

Prof. Jeena R.S.
Asst. Prof., Dept. of ECE
College of Engineering, Trivandrum
(Seminar Guide)

Prof. Prajith C.A.
Asso. Prof., Dept. of ECE
College of Engineering, Trivandrum
(Seminar Coordinator)

Dr. Jiji C.V.
Professor, Dept. of ECE
College of Engineering, Trivandrum
(P. G. Coordinator)

Dr. J. David
Professor, Dept. of ECE
College of Engineering, Trivandrum
(Head of Department)


    ACKNOWLEDGEMENTS

I am thankful to Dr. J. David, Head of the Department, and Dr. Jiji C.V., P.G. Coordinator of the Department of Electronics and Communication, for their help and support.

I extend my hearty gratitude to Prof. Prajith C.A., Prof. James T.G., and Prof. Susan R.J., seminar coordinators, Department of Electronics and Communication, for providing necessary facilities and for their sincere cooperation.

I would like to express my sincere gratitude and heartfelt indebtedness to my seminar guide, Prof. Jeena R.S., Assistant Professor, Department of Electronics and Communication Engineering, for her valuable guidance and encouragement in pursuing this seminar.

I also acknowledge the other members of the faculty in the Department of Electronics and Communication Engineering and all my friends and family for their wholehearted cooperation and encouragement.

Above all, I am thankful to God Almighty for his love and blessings.

    JERRIN THOMAS PANACHAKEL


    Abstract

Time-resolved three-dimensional echocardiography generates four-dimensional data sets that bring new possibilities to clinical practice. The image quality of four-dimensional (4D) echocardiography is, however, regarded as poorer than that of conventional echocardiography, where time-resolved 2D imaging is used. Advanced image processing filtering methods can be used to achieve image improvements, but at the cost of heavy data processing. The recent development of graphics processing units (GPUs) enables highly parallel general purpose computations, which considerably reduces the computational time of advanced image filtering methods. In this study, multidimensional adaptive filtering of 4D echocardiography was performed using GPUs. Filtering was done using multiple kernels implemented in OpenCL (Open Computing Language) working on multiple subsets of the data. The results show a substantial speed increase of up to 74 times, resulting in a total filtering time of less than 30 s on a common desktop. This implies that advanced adaptive image processing can be accomplished in conjunction with a clinical examination.


    Contents

1 INTRODUCTION

2 ANISOTROPIC ADAPTIVE FILTERING

3 ECHOCARDIOGRAPHY
    3.1 Purpose
    3.2 Transthoracic Echocardiogram
    3.3 Transesophageal Echocardiogram
        3.3.1 Advantages
        3.3.2 Disadvantages
    3.4 M-mode Echocardiography
    3.5 Two-Dimensional Echocardiography
    3.6 Three-dimensional echocardiography

4 HARDWARE AND SOFTWARE
    4.1 Echocardiographic Image Acquisition
    4.2 Hardware
    4.3 OpenCL

5 GPU IMPLEMENTATION

6 RESULTS
    6.1 Timing Comparisons
    6.2 Filtering Efficiency

7 CONCLUSION

Bibliography


    List of Figures

2.1 Visualization of a quadrature filter

3.1 Normal heart (TTE view)

3.2 3D echocardiogram of a heart viewed from the apex

5.1 Illustration of kernel invocations and data flow between CPU/GPU and GPU/GPU

6.1 Timing comparison for different kernel sizes

6.2 Timing comparisons for 3D and 4D filtering

6.3 Computational time for filtering of the aortic valve data set

6.4 Comparison of filtering efficiency

6.5 Intensity plot along a central horizontal line from the original, 3D, and 4D filtered aortic valve view

6.6 Intensity plot along a central horizontal line from the original, 3D, and 4D filtered four chamber view


    Chapter 1

    INTRODUCTION

CARDIAC ultrasound began with single-crystal transducer displays of the amplitude (A-mode) of reflected ultrasound versus depth on an oscilloscope screen, creating images that were difficult to interpret. When the time dimension (M-mode) was introduced, the images became somewhat easier for the clinician to interpret, and interpretation became easier still with the introduction of 2D images. Among the disadvantages of 2D imaging is that structures cannot be seen in all three spatial dimensions or viewed from different orientations, which can be of interest, for instance when localizing a prolapse or vegetation of the mitral valve. The latter can be achieved using 3D/4D echocardiography, which also gives the opportunity to study structures from the surgical view and makes it possible to travel through the heart, which is of special interest in patients with complex congenital heart diseases.

The 3D echocardiographic data set can be used to generate multiple 2D image planes, which is useful for instance when calculating the left ventricular stroke volume. Real-time 3D transesophageal echocardiography may be used during complex interventional procedures like percutaneous edge-to-edge repair of the mitral valve. New areas of use for 3D echocardiography include, for instance, assessing myocardial perfusion during adenosine stress. However, the clinical use of 3D/4D echocardiography is heavily reliant on image quality, even more so than standard 2D, and over the years the use of 3D/4D has been limited by complex and sometimes time-consuming requirements for postprocessing, often on a separate workstation after the examination.

The trend today towards 3D and 4D imaging modalities of ever higher resolutions poses an increasing computational challenge for the processing and filtering of these data sets. At the same time, the trend of the last few decades towards processors of increasing clock frequencies has been hampered by basic physical limitations of electronics and has instead been replaced by a focus on parallelization of hardware. This shift in focus implies that basic image filtering algorithms no longer automatically scale up to larger data sets with later generation processors, but instead require an often complete redesign of algorithms and programming tools. As such, there is an increasing need for efficient algorithms that can exploit new computational hardware such as multicore processors and, lately, general purpose computations on graphics processors (GPUs).

The possibility to substantially speed up processing time using GPUs has been utilized in different applications of ultrasound imaging. Real-time tracking, image registration, and segmentation are examples where GPUs have facilitated computationally demanding algorithms. It has also been shown how GPU computations can be used for image denoising of ultrasound images. In a recent publication it is shown how the power of full dimensionality (4D) improves image denoising of 4D cardiac CT data sets. There is, however, no prior work on image denoising of 4D echocardiography data where the full dimensionality of the data is taken into account in the denoising process.

In this work, an efficient data parallel version of an off-the-shelf adaptive filtering method for image denoising is discussed. The algorithm was applied to 3D and 4D echocardiography data, and it is concluded that analysis of 3D and 4D echocardiography can be done in more practical time frames with the method presented here.

For the method to be of practical clinical use, the filtering must be possible to perform on a normal desktop computer within a time frame allowing for a normal patient throughput. Due to the size of the data to be filtered and the computational cost of each sample, this imposes hard requirements on an efficient implementation of the computations. For reference, the data sets under consideration here range roughly from 10^7 to 10^8 samples, and the computation requires on the order of 10^4 to 10^5 convolution steps per sample to measure the local orientations.

To satisfy this computational requirement, an algorithm was developed that performs this filtering on the latest graphics processors (GPUs), which promise to deliver a floating point performance that is sufficient for the filtering of the 4D data. To deliver this performance, the basic filtering steps have been rewritten and optimized into a form suitable for GPUs.


    Chapter 2

ANISOTROPIC ADAPTIVE FILTERING

Adaptive filters are commonly used in image processing to enhance or restore data by removing noise without significantly blurring the structures in the image. The adaptive filtering literature is vast and cannot adequately be summarized in a short chapter. A large part of the literature, however, concerns one-dimensional (1D) signals. Such methods are not directly applicable to image processing, and there are no straightforward ways to extend 1D techniques to higher dimensions, primarily because there is no unique ordering of data points in dimensions higher than one. Since higher-dimensional medical image data are not uncommon (2D images, 3D volumes, 4D time-volumes), the focus of this chapter is on adaptive filtering techniques that can be generalized to multidimensional signals.

On the basis of the characteristics of the human visual system, local anisotropy is an important property in images, and an anisotropic component was introduced into Abramatic and Silverman's model:

H_{γ,α} = H + (1 − γ)(α + (1 − α) cos²(θ − φ))(1 − H)    (2.1)

where the parameter γ controls the level of anisotropy, θ defines the angular direction of the filter coordinates, and φ is the orientation of the local image structure. The specific choice of weighting function cos²(θ − φ) was imposed by its ideal interpolation properties; the directed anisotropy function could be implemented as a steerable filter from three fixed filters. The local orientation and the degree of anisotropy are estimated with three oriented Hilbert transform pairs, so-called quadrature filters, with the same angular profiles as the three basis functions describing the steerable weighting function. Figure 2.1 shows one of these Hilbert transform pairs. In areas


of the image lacking a dominant orientation, γ is set to 1 and Eq. 2.1 reverts to the isotropic Abramatic and Silverman solution. The more dominant the local orientation, the smaller the γ value and the more anisotropic the filter. This method can intuitively be seen as a linear combination of low pass and high pass filters, where the combination of the filters is spatially variant and relies on the orientation of the local structures surrounding each data sample. These orientation estimates adjust the filter to preserve the edges of surfaces while the low pass components remove the noise. The full theory of this concept can be found in the literature; only a concise introduction to the method is given here.

First, the local orientation is determined using a set of quadrature filters q_k. The number of quadrature filters in the set depends on the dimensionality of the data: six quadrature filters are used in 3D, while twelve are used in 4D. Each quadrature filter has a specific orientation and consists of a kernel pair with even and odd convolution kernels that are sensitive to lines and edges, respectively. The output q_k of filter k is a complex number whose magnitude |q_k| is an estimate of the certainty of an identified signal change corresponding to a line or an edge.

Based on the responses from the quadrature filters, the local orientation tensor T is given by

T = Σ_k |q_k| (α n_k n_k^T − β I)    (2.2)

where n_k is the direction of the quadrature filter q_k, I is the identity tensor, and α, β are constants with values α = 5/4, β = 1/4 in 3D and α = 1, β = 1/6 in 4D.

By calculating the eigenvalues and eigenvectors of T, the local orientation can be interpreted. If all eigenvalues are approximately equal, then T describes an isotropic neighborhood with no dominating orientation, while in other cases, when there is a differentiation in magnitude among the eigenvalues, neighborhoods of, for example, planes and lines are described.
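As a concrete illustration, the following sketch (a hypothetical C helper, not taken from the original implementation) builds the 3×3 orientation tensor of Eq. 2.2 at one sample from the six 3D quadrature filter magnitudes, with the unit directions n_k assumed given:

    /* Sketch: build the 3x3 local orientation tensor T of Eq. 2.2 from the
     * magnitudes |q_k| of the six 3D quadrature filters, using the stated
     * 3D constants alpha = 5/4 and beta = 1/4. */
    void orientation_tensor_3d(const double qmag[6],  /* |q_k| at one sample */
                               const double n[6][3],  /* unit directions n_k */
                               double T[3][3])
    {
        const double alpha = 5.0 / 4.0, beta = 1.0 / 4.0;
        for (int r = 0; r < 3; ++r)
            for (int c = 0; c < 3; ++c)
                T[r][c] = 0.0;
        for (int k = 0; k < 6; ++k)
            for (int r = 0; r < 3; ++r)
                for (int c = 0; c < 3; ++c) {
                    double outer = n[k][r] * n[k][c];   /* (n_k n_k^T)_rc */
                    double id = (r == c) ? 1.0 : 0.0;   /* I_rc */
                    T[r][c] += qmag[k] * (alpha * outer - beta * id);
                }
    }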

Based on the intrinsic information in T regarding the orientation of the local neighborhood, an adaptive filter synthesis is performed where the resulting adaptive filter s_ap is given by a weighted sum of fixed filters

s_ap = s_lp + a_hp Σ_k c_k s_k    (2.3)

where s_lp is the result from a low pass filter, a_hp is the high-pass amplification factor, which gives a trade-off between filtering quality and the risk of introducing high-pass filtering artifacts exaggerating edges and lines, s_k is the output from a high-pass filter with the same direction as the quadrature filter, and c_k is the weighting coefficient defined below.


Figure 2.1: Visualization of a quadrature filter (Hilbert transform pair) used in the estimation of local anisotropy. (Top) The plots show the filter in the spatial domain: the real part (left) and the imaginary part (right). It can be appreciated that the real part can be viewed as a line filter and the imaginary part as an edge filter. The color coding is: green, positive real; red, negative real; blue, positive imaginary; orange, negative imaginary. (Bottom) The left plot shows the magnitude of the filter with the phase of the filter color coded. The right plot shows the quadrature filter in the Fourier domain. Here the filter is real and zero in one half of the Fourier domain.


The weighting coefficients are given by

c_k = C · (α n_k n_k^T − β I)    (2.4)

where C is the control tensor and · symbolizes the scalar product between tensors. The control tensor is used to control the degree of anisotropy in the adaptive filter. When determining C, a low-pass filtered version of the local orientation tensor, T_lp, is used. Calculating the weighted outer product of the eigenvectors of T_lp gives the control tensor

C = 1/(λ_1² + α²) Σ_{k=1}^{N} λ_k e_k e_k^T    (2.5)

where λ_k is the k-th eigenvalue of T_lp, with λ_i ≥ λ_{i+1} for all i, α is a resolution parameter ranging from zero to one, and e_k is the eigenvector of T_lp corresponding to λ_k.
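To make the synthesis step concrete, the sketch below (hypothetical code, following Eqs. 2.3 and 2.4 as reconstructed above, with the 3D constants of Eq. 2.2) combines the low pass output and the six high pass outputs into the adaptive result for one sample:

    /* Sketch of the adaptive filter synthesis (Eqs. 2.3-2.4) for one data
     * sample in 3D: c_k is the tensor scalar product between the control
     * tensor C and (alpha n_k n_k^T - beta I); s_ap is the weighted sum of
     * the low pass and high pass filter outputs. */
    double adaptive_sample_3d(double s_lp,          /* low pass output */
                              const double s_hp[6], /* high pass outputs s_k */
                              const double C[3][3], /* control tensor */
                              const double n[6][3], /* filter directions n_k */
                              double a_hp)          /* high-pass amplification */
    {
        const double alpha = 5.0 / 4.0, beta = 1.0 / 4.0;
        double acc = 0.0;
        for (int k = 0; k < 6; ++k) {
            double c_k = 0.0;
            for (int r = 0; r < 3; ++r)
                for (int c = 0; c < 3; ++c) {
                    double m = alpha * n[k][r] * n[k][c]
                             - beta * ((r == c) ? 1.0 : 0.0);
                    c_k += C[r][c] * m;   /* scalar product C . M_k */
                }
            acc += c_k * s_hp[k];
        }
        return s_lp + a_hp * acc;         /* Eq. 2.3 */
    }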


    Chapter 3

    ECHOCARDIOGRAPHY

An echocardiogram, often referred to in the medical community as a cardiac ECHO or simply an ECHO, is a sonogram of the heart, also known as a cardiac ultrasound. It uses standard ultrasound techniques to image two-dimensional slices of the heart; the latest ultrasound systems employ 3D real-time imaging. In addition to creating two-dimensional pictures of the cardiovascular system, an echocardiogram can also produce accurate assessments of the velocity of blood and cardiac tissue at any arbitrary point using pulsed or continuous wave Doppler ultrasound. This allows assessment of cardiac valve areas and function, any abnormal communications between the left and right side of the heart, any leaking of blood through the valves (valvular regurgitation), and calculation of the cardiac output as well as the ejection fraction. Other parameters measured include cardiac dimensions (luminal diameters and septal thicknesses) and the E/A ratio.

Echocardiography was an early medical application of ultrasound. It was also the first application of intravenous contrast-enhanced ultrasound, a technique in which gas-filled microbubbles are injected into the venous system to improve tissue and blood delineation. Contrast is also currently being evaluated for its effectiveness in evaluating myocardial perfusion, and it can be used with Doppler ultrasound to improve flow-related measurements. Echocardiography is performed by cardiac sonographers, cardiac physiologists, or doctors trained in cardiology. The purpose of echocardiography in general and the various types of echocardiograms are discussed below.


    3.1 Purpose

Echocardiography is used to diagnose cardiovascular diseases. In fact, it is one of the most widely used diagnostic tests for heart disease. It can provide a wealth of helpful information, including the size and shape of the heart, its pumping capacity, and the location and extent of any damage to its tissues. It is especially useful for assessing diseases of the heart valves. It not only allows doctors to evaluate the heart valves, but it can detect abnormalities in the pattern of blood flow, such as the backward flow of blood through partly closed heart valves, known as regurgitation. By assessing the motion of the heart wall, echocardiography can help detect the presence and assess the severity of any wall ischemia that may be associated with coronary artery disease. Echocardiography also helps determine whether any chest pain or associated symptoms are related to heart disease. It can also help detect cardiomyopathies, such as hypertrophic cardiomyopathy. The biggest advantage of echocardiography is that it is noninvasive (it does not involve breaking the skin or entering body cavities) and has no known risks or side effects.

    3.2 Transthoracic Echocardiogram

A standard echocardiogram is also known as a transthoracic echocardiogram (TTE) or cardiac ultrasound. In this case, the echocardiography transducer (or probe) is placed on the chest wall (or thorax) of the subject, and images are taken through the chest wall. This is a non-invasive, highly accurate, and quick assessment of the overall health of the heart. A cardiologist can quickly assess a patient's heart valves and the degree of heart muscle contraction (an indicator of the ejection fraction). The images are displayed on a monitor and are recorded either on videotape (analog) or by digital techniques.

An echocardiogram can be used to evaluate all four chambers of the heart. It can determine the strength of the heart, the condition of the heart valves, the lining of the heart (the pericardium), and the aorta. It can be used to detect a heart attack, enlargement or hypertrophy of the heart, and infiltration of the heart with an abnormal substance. Weakness of the heart, cardiac tumors, and a variety of other findings can be diagnosed with an echocardiogram. With advanced measurements of the movement of the tissue over time (tissue Doppler), it can measure diastolic function, fluid status, and dyssynchrony.

Figure 3.1: Normal heart (TTE view)

The TTE is highly accurate for identifying vegetations (masses consisting of a mixture of bacteria and blood clots), but the accuracy can be reduced in up to 20% of adults because of obesity, chronic obstructive pulmonary disease, chest-wall deformities, or otherwise technically difficult patients. TTE in adults is also of limited use for the structures at the back of the heart, such as the left atrial appendage. Transesophageal echocardiography may be more accurate than TTE because it excludes the variables previously mentioned and allows closer visualization of common sites for vegetations and other abnormalities. Transesophageal echocardiography also affords better visualization of prosthetic heart valves.

Bubble contrast TTE involves the injection of agitated saline into a vein, followed by an echocardiographic study. The bubbles are initially detected in the right atrium and right ventricle. If bubbles appear in the left heart, this may indicate a shunt, such as a patent foramen ovale, an atrial septal defect, a ventricular septal defect, or arteriovenous malformations in the lungs.

    3.3 Transesophageal Echocardiogram

A transesophageal echocardiogram, or TEE, is an alternative way to perform an echocardiogram. A specialized probe containing an ultrasound transducer at its tip is passed into the patient's esophagus. This allows image and Doppler evaluation which can be recorded. It has several advantages and some disadvantages compared to a transthoracic echocardiogram (TTE).


    3.3.1 Advantages

The advantage of TEE over TTE is usually clearer images, especially of structures that are difficult to view transthoracically (through the chest wall). The explanation for this is that the heart rests directly upon the esophagus, leaving only millimeters for the ultrasound beam to travel. This reduces the attenuation (weakening) of the ultrasound signal, generating a stronger return signal and ultimately enhancing image and Doppler quality. Comparatively, transthoracic ultrasound must first traverse skin, fat, ribs, and lungs before reflecting off the heart and back to the probe before an image can be created. All these structures, along with the increased distance the beam must travel, weaken the ultrasound signal, thus degrading the image and Doppler quality.

In adults, several structures can be evaluated and imaged better with TEE, including the aorta, pulmonary artery, valves of the heart, both atria, the atrial septum, the left atrial appendage, and the coronary arteries. TEE has a very high sensitivity for locating a blood clot inside the left atrium.

    3.3.2 Disadvantages

- TEE requires a fasting patient (the patient must follow the ASA NPO guidelines, i.e., usually not eat or drink anything for eight hours prior to the procedure)

- Requires a team of medical personnel

- Takes longer to perform

- May be uncomfortable for the patient

- May require sedation or general anesthesia

- Has some risks associated with the procedure (esophageal perforation, about 1 in 10,000, and adverse reactions to the medication)

3.4 M-mode Echocardiography

The M-mode echocardiogram yields a one-dimensional ("ice-pick") view of the cardiac structures moving over time. The echoes from various tissue interfaces along the axis of the beam are moving during the cardiac cycle and are swept across time,


providing the dimension of time. The lines on the recordings correspond to the positions of the imaged structures in relation to the transducer and other cardiac structures at any instant in time. More accurate placement of the M-mode cursor within the heart is performed by using the two-dimensional (2-D) real-time image as a guide. The M-mode echocardiogram uses a high sampling rate and can yield cleaner images of cardiac borders, allowing the echocardiographer to obtain more accurate measurements of cardiac dimensions and to more critically evaluate cardiac motion. Careful placement of the M-mode beam at the appropriate locations within the heart and obtaining clean echoes of endocardial surfaces are critical to obtaining accurate measurements and to making the calculations performed from these measurements meaningful. Standard M-mode views are obtained from the right parasternal position. The M-mode cursor should be positioned within the heart using the right parasternal short axis view, to avoid inclusion of a papillary muscle within the left ventricular free wall thickness. The standard M-mode views utilized in veterinary medicine include the left ventricle (at the level of the chordae tendineae), the mitral valve, and the aortic root (aorta/left atrial appendage) view.

    3.5 Two-Dimensional Echocardiography

Two-dimensional echocardiography allows a plane of tissue (both depth and width) to be imaged in real time. Thus, the anatomic relationships between various structures are easier to appreciate than with M-mode echocardiographic images. An infinite number of imaging planes through the heart are possible; however, standard views are used to evaluate the intra- and extracardiac structures. The standard views are obtained from the right parasternal window in all species, and from the left parasternal window in adult large animals or in other species when imaging the heart from the left side is desirable. Occasionally, images are obtained from subxiphoid (subcostal) or thoracic inlet (suprasternal) positions. These views are usually only feasible to obtain in small animals or young large animals.

The standard views include the right parasternal long axis views of the four chambers (four chamber view), the left ventricular outflow tract, and the right ventricular outflow tract, and the short axis views perpendicular to this plane (left ventricle at the chordal level, mitral valve, and aorta/left atrial appendage). In large animals, left parasternal long axis views of the mitral valve, aorta, and pulmonary artery are also obtained when indicated.


    Figure 3.2: 3D echocardiogram of a heart viewed from the apex

    3.6 Three-dimensional echocardiography

3D echocardiography (also known as 4D echocardiography when the picture is moving) is now possible, using a matrix array ultrasound probe and an appropriate processing system. This enables detailed anatomical assessment of cardiac pathology, particularly valvular defects and cardiomyopathies. The ability to slice the virtual heart in infinite planes in an anatomically appropriate manner and to reconstruct three-dimensional images of anatomic structures makes 3D echocardiography unique for the understanding of the congenitally malformed heart. Real-time 3D echocardiography can be used to guide the location of bioptomes during right ventricular endomyocardial biopsies, the placement of catheter-delivered valvular devices, and many other intraoperative assessments.


    Chapter 4

    HARDWARE AND SOFTWARE

    4.1 Echocardiographic Image Acquisition

For the purpose of validating the filtering algorithms presented here, volumetric data sets collected from a healthy volunteer using a GE VingMed Vivid E9 cardiovascular ultrasound system were used. The heart of the volunteer was sampled from multiple projections, most importantly a parasternal basal short-axis view at the level of the aortic valve and an apical four-chamber view. Volume data was acquired over four consecutive cardiac cycles (multibeat). These recordings were exported as envelope data sets from the proprietary system to DICOM format, which was read using a custom program running on standard laptops and desktops.

    4.2 Hardware

The implementation of the above algorithm was evaluated on a modern desktop PC and an off-the-shelf laptop. The former consists of a 3.33 GHz AMD Phenom II six-core processor, 4 GB DDR3 RAM, and an AMD 6950 graphics card. The latter is an Asus G73JH laptop containing an Intel Core i7 720QM (4 hyper-threaded CPU cores at 1.6 GHz), 8 GB DDR3 RAM, and an AMD 5870 Mobility Radeon graphics card. For reference, the graphics cards above contain 1408 and 800 individual stream processors operating at 800 and 700 MHz, respectively. As seen in the results, these numbers are reflected almost linearly in the performance of the algorithm.

Parsing of the volumetric data files, of 40–700 MB each, was performed on the CPU following the standard DICOM specification with the VolDICOM extension as


    specified by the manufacturer of the ultrasound machines.

    4.3 OpenCL

Open Computing Language (OpenCL) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), and other processors. OpenCL includes a language (based on C99) for writing kernels (functions that execute on OpenCL devices), plus application programming interfaces (APIs) that are used to define and then control the platforms. OpenCL provides parallel computing using task-based and data-based parallelism. OpenCL is an open standard maintained by the non-profit technology consortium Khronos Group. It has been adopted by Intel, Advanced Micro Devices, Nvidia, and ARM Holdings.

For example, OpenCL can be used to give an application access to a graphics processing unit for non-graphical computing (see general-purpose computing on graphics processing units). Academic researchers have investigated automatically compiling OpenCL programs into application-specific processors running on FPGAs, and commercial FPGA vendors are developing tools to translate OpenCL to run on their FPGA devices.

The simplest way of implementing the algorithm is to let each sample of the data correspond to one work item, which is possible only if there are no inter-data dependencies. As we will see below, this can be improved upon to give better performance by allowing dependencies and working around the constraints posed by the limited memory available for storing temporary data.
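As a minimal illustration of this one-work-item-per-sample model (a generic OpenCL sketch with hypothetical kernel and function names, not the report's actual code), the following applies a trivial per-sample operation to a volume, enqueueing one work item per sample; error handling and resource releases are omitted:

    #include <CL/cl.h>

    /* Device code: one work item per data sample. */
    static const char *src =
        "__kernel void scale(__global float *data, float gain) {\n"
        "    size_t i = get_global_id(0);\n"
        "    data[i] *= gain;\n"
        "}\n";

    void scale_volume(float *host, size_t nsamples, float gain)
    {
        cl_platform_id plat;  cl_device_id dev;
        clGetPlatformIDs(1, &plat, NULL);
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);
        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "scale", NULL);
        cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                    nsamples * sizeof(float), host, NULL);
        clSetKernelArg(k, 0, sizeof(buf), &buf);
        clSetKernelArg(k, 1, sizeof(gain), &gain);
        /* One work item per sample of the volume. */
        clEnqueueNDRangeKernel(q, k, 1, NULL, &nsamples, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, buf, CL_TRUE, 0, nsamples * sizeof(float),
                            host, 0, NULL, NULL);
    }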


    Chapter 5

    GPU IMPLEMENTATION

As a method to perform adaptive filtering of typical 4D echocardiography data sets within reasonable time frames, the adaptive filtering algorithm was implemented onboard a graphics processor (GPU). By structuring the problem and modifying the algorithms accordingly, a significant speedup compared to a traditional implementation of the adaptive filtering algorithm could be obtained.

To analyze the trade-off between execution time and quality of filtering, both 3D and 4D based filtering of the data sets were implemented. In the first case, the filtering method was applied to one volume of ultrasound data for each time frame of the data set. In the latter case, the whole data set was considered as a four-dimensional array of data and filtering was performed also along the fourth dimension.

In the standard way of describing the adaptive filtering method, one typically constructs the orientation tensor for each data sample and applies a low pass filter componentwise to the tensors. We note that performing the low pass filtering before the tensor construction gives an equivalent result, but with fewer operations, since the constructed orientation tensors are linear in the quadrature filter responses:

T ∗ f_lp = (Σ_k q_k e_k e_k^T) ∗ f_lp    (5.1)
         = Σ_k (q_k ∗ f_lp) e_k e_k^T    (5.2)

Thus, for the 3D and 4D filtering cases we require only 6 or 12 low pass filtering operations, since q_k here contains only the magnitude of the complex quadrature filter responses.


The convolution kernels for line and edge detection are precomputed for a given radius r_c, together with a Gaussian low pass kernel of radius r_g, both measured in data samples. The filtered q_k could be computed by a combined kernel of size, e.g., (2r_g + 1 + 2r_c + 1)^3 for the 3D filtering. However, we observe that we obtain the same result by performing convolutions with the high pass filters of size (2r_c + 1)^3 followed by three (respectively four) consecutive convolutions with a 1D Gaussian filter of size (2r_g + 1). By performing this sequence of convolutions with smaller kernels, fewer convolution operations are needed.
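The separable low pass step can be expressed as a single 1D convolution kernel applied once per dimension. The OpenCL C sketch below (illustrative only, with hypothetical names; borders are simply clamped here, whereas the report treats borders specially) shows the idea for the x axis of a 3D volume:

    /* Illustrative OpenCL C kernel: 1D Gaussian convolution along the x axis
     * of a 3D volume. The same pattern is reused for the y and z axes by
     * changing the stride. 'gauss' holds the 2*rg + 1 filter taps. */
    __kernel void gauss1d_x(__global const float *in,
                            __global float *out,
                            __constant float *gauss,
                            int rg, int nx, int ny, int nz)
    {
        int x = get_global_id(0);
        int y = get_global_id(1);
        int z = get_global_id(2);
        if (x >= nx || y >= ny || z >= nz) return;

        float acc = 0.0f;
        for (int t = -rg; t <= rg; ++t) {
            int xs = clamp(x + t, 0, nx - 1);   /* clamp at the volume border */
            acc += gauss[t + rg] * in[(z * ny + y) * nx + xs];
        }
        out[(z * ny + y) * nx + x] = acc;
    }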

As can be seen, the full filtering algorithm could be implemented in a naively data parallel fashion where each work item would perform convolutions with the larger kernels followed by the construction of control tensors and the final filtering. Such an approach would require no intermediate storage of values and thus no communication between different work items. However, since the convolutions with the smaller quadrature filters and 1D low pass filters give a significant reduction in the number of computations, the computation was instead split into a series of kernels. The intermediate values from the different kernel computations are stored onboard the GPU.

At a high level, the adaptive filtering algorithm can be described by the steps below.

Step 1: Compute the quadrature filter response q_k for each data point as a combination of the line and edge detection filter kernels.

Step 2: Let s_k = |q_k| for each data point.

Step 3: Perform a convolution of q_k by applying three 1D low pass Gaussian filters oriented along each of the first three dimensions consecutively. Apply the same convolutions to form the low pass filtered data s_lp.

Step 4: Form the orientation tensor T (Eq. 2.2) using the low pass filtered q_k and the corresponding directions e_k.

Step 5: Compute the eigenvalues λ_i of the tensor T using the characteristic equation det(T − λ_i I) = 0 and find the corresponding eigenvectors e_i by Gauss-Jordan elimination.

Step 6: Form the control tensor C (Eq. 2.5) and the weighting coefficients c_k (Eq. 2.4) for each high pass filter.


Step 7: Compute the final output s_ap (Eq. 2.3) from the weighting coefficients, the low pass filtered data, and the high pass filtered data.

Data dependencies between different data points occur only in Step 3 above; the algorithm can thus be implemented by the following three data parallel computation kernels:

- the quadrature convolutions kernel, which performs steps 1–2 of the algorithm;

- the low pass filtering kernel, which performs filtering with a 1D Gaussian kernel;

- the adaptive filtering kernel, which performs steps 4–7 of the algorithm;

where the low pass filtering kernel is invoked once per dimension (a host-side sketch of this sequence is given below).
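One possible host-side sequencing for a single subvolume (hypothetical wrapper names around clEnqueueNDRangeKernel calls; the report does not list its actual host code) could look as follows:

    #include <CL/cl.h>

    typedef struct subvol subvol_t;   /* opaque subvolume descriptor */

    /* Hypothetical wrappers, each enqueueing one OpenCL kernel. */
    void enqueue_quadrature_convolutions(cl_command_queue q, subvol_t *sv);
    void enqueue_lowpass_1d(cl_command_queue q, subvol_t *sv, int dim);
    void enqueue_adaptive_filtering(cl_command_queue q, subvol_t *sv);

    /* The five kernel executions performed per subvolume: steps 1-2, then
     * the 1D low pass once per spatial dimension (step 3), then steps 4-7. */
    void filter_subvolume(cl_command_queue q, subvol_t *sv)
    {
        enqueue_quadrature_convolutions(q, sv);   /* quadrature convolutions */
        for (int dim = 0; dim < 3; ++dim)
            enqueue_lowpass_1d(q, sv, dim);       /* separable low pass */
        enqueue_adaptive_filtering(q, sv);        /* adaptive filtering */
    }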

A straightforward data parallel implementation of these kernels, running on one or more processor cores of a desktop machine and processing the data frame by frame, can easily be written. To store the intermediate results for q_k we require two floating point values per filter and data sample, in at least two copies during the low pass convolutions. For the considered data sets this consumes at least 1536 MiB of data for the 3D filtering and (2r_g + 1) times that for the 4D filtering case, since multiple frames need to be stored. Since the GPUs of today cannot handle computations with such large temporary data, the computational task was split up into the consecutive filtering of a number of subvolumes, each responsible for computing N^3 data points of the filtered data on each frame. For steps 4–7 of the algorithm above, we require the corresponding N^3 values of T, s_lp, and s_k to compute s_ap. For the multiple executions of step 3, however, we require between N^3 and (N + 2r_g)^3 values of q_k to correctly handle the low pass filtering of the overlap between subvolumes. This gives a computational cost ((N + 2r_g)/N)^3 times higher than if all the intermediate calculations could be saved.

For an illustration of the split between kernel executions and the intermediate data sets that are stored, see Figure 5.1. With the above value of N, the five kernel executions per subvolume are performed 64 times per frame of the data set. Low pass filtering was not performed along the time dimension for 4D filtering. This is due to the requirement of storing a four dimensional q_k, requiring (2r_g + 1) times as much memory for intermediate calculations, and of either storing the computed q_k between the invocations for different frames (requiring too much onboard memory, or swapping to the CPU's RAM) or paying the corresponding (2r_g + 1) times multiplied cost of recomputing the values for each new frame. However, since the input data sets are of considerably smaller size, the individual values of q_k for an N^3 subset can be computed along the fourth dimension for reasonable sizes of r_c.
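As a worked illustration of this overlap penalty (assuming, for the sake of example, N = 64 and a low pass radius r_g = 4; the exact r_g is not restated in this section):

((N + 2 r_g)/N)^3 = (72/64)^3 = 1.125^3 ≈ 1.42

i.e., roughly 40% more convolution work than if all intermediate results could be kept.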


Figure 5.1: Illustration of kernel invocations and data flow between CPU/GPU and GPU/GPU. Data on the CPU side is stored with one byte per volumetric sample; temporary data on the GPU side is stored as floating point vectors/tensors per sample. R is the radius of the convolution kernels and N = 64 the size of each subvolume for which filtering is performed.

    18


1) Computational Performance: The computationally most expensive steps in the algorithm above are the convolutions with the quadrature filters and with the low pass filters.

For the latter, we have already seen how they can be changed from 3D or 4D convolution kernels to a sequence of 1D kernels, giving a speedup of (2r_g + 1)^d / (d (2r_g + 1)); for r_g = 4 this gives respectively 27 and 182 times fewer computations. For the former, we cannot in an easy way reduce the problem to lower dimensions. For this purpose, optimization of the basic convolution algorithm itself for use on GPU hardware is of interest.

Due to the relatively small size of the convolution kernels and the large number of convolutions to perform, spatial domain convolutions were used on the GPU, and the algorithm was optimized to fit the number of simultaneous convolution operations, the specific kernel sizes, and the dimensionality of the problem.

If we let s be the size of each frame in the data set, f the number of frames, d the dimensionality of the problem (3D or 4D), and r_c the convolution radius, we see that the number of basic convolution operations (one multiplication and one addition) required is (2r_c + 1)^d · s · f for each of the 6 or 12 complex convolutions. For 36 frames of size 256^3 and r_c = 3 we thus require 5 × 10^12 operations for the 3D filtering and 6.9 × 10^13 operations for the 4D filtering.
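To within rounding, these figures can be recovered by counting the multiply and the add separately over both the even and odd kernel of each complex convolution (an accounting assumption made here to make the arithmetic reproducible):

3D: 7^3 · 256^3 · 36 ≈ 2.07 × 10^11, and 2.07 × 10^11 · 6 · 2 · 2 ≈ 5 × 10^12
4D: 7^4 · 256^3 · 36 ≈ 1.45 × 10^12, and 1.45 × 10^12 · 12 · 2 · 2 ≈ 7 × 10^13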

Compare these numbers with the typical floating point performance of modern multi-core CPUs, theoretically capable of 10–100 × 10^9 floating point operations per second, with practically measured LINPACK benchmarks of 1–3 × 10^9 operations per second. Although the exact details of the floating point performance of CPUs depend on the form of the computations and other circumstances (see Pratx and Xing for an overview of the relative performance differences between CPUs and GPUs), we see that a CPU requires at least 690 s to perform the 4D filtering with the most optimistic of the alternatives above.

By using GPUs for these computations, with theoretical speeds up to 3 × 10^12 operations per second, we instead have a lower bound on the required time of 19.5 s. Note that this is a lower bound that we will not reach in practice, since to attain this speed no other bottlenecks, e.g., memory bandwidth, may slow down the computational speed. These numbers cannot be improved without changing the computational task significantly.

2) Handling Border Data: Due to the nature of the sampled data, there is an artificial cutoff of the data outside the pyramid that can be sampled by the ultrasound probe. This change between inside and outside data requires a special treatment of the quadrature filter convolution kernels to avoid creating an artificial edge.

3) Numerical Precision: The only step of the algorithm that has proven sensitive …


    Chapter 6

    RESULTS

    6.1 Timing Comparisons

Timing results are provided both for filtering based on a 3D orientation estimate and for filtering based on a 4D orientation estimate, and it is investigated how the timing scales on GPU and/or CPU cores. The measurements for CPU cores have been performed using an identical OpenCL implementation, exploiting vectorization and with manual fine tuning of the work group sizes to optimize speed. As such, the relative numbers between GPU and CPU performance should reflect the true difference in computational speed between these types of devices. Figure 6.1 shows the actual time spent in the convolution operations when performing 3D and 4D filtering on one data set for different values of k_0, for kernel sizes ranging from 5–17 and 5–15, respectively. From this we see that the optimal values for k_0 vary significantly depending on the kernel size (favoring larger values for k_0) and the dimension of the problem. Since higher values for k_0 and the higher dimension require more intermediate results, the number of work items that can be executed in parallel decreases. From this we can conclude that the optimal values for k_0 differ depending on the filter parameters, but note that they are invariant to the data set used. Furthermore, Figure 6.2 shows the total time and the time spent in convolutions when no optimization was performed (k_0 = 1) and when the most optimal value of k_0 was selected. It is concluded that the total time spent in the algorithm is a constant time plus a cubic (respectively quartic) expression dependent only on the filter diameter. Furthermore, we see that the result of the optimization provides a consistent speedup for the different kernel sizes. Table 6.1 presents the total fractions of computational time spent in the different computational kernels for a filter size of 7.


Figure 6.1: Timing comparison for different kernel sizes and values of the optimization parameter k_0 when performing 3D (left) and 4D (right) filtering of the aortic valve data set (116 × 200 × 117 × 36 samples) for different filter diameters r_g.

Figure 6.2: Timing comparisons for 3D (left) and 4D (right) filtering of the aortic valve data set (116 × 200 × 117 × 36 samples), showing the time for convolutions (center, solid line) and the total time (upper, dashed line) with no optimization (k_0 = 1). The time for convolutions with the best value of k_0 is shown as the lowermost dashed line. The horizontal axis gives the total number of elements in the convolution kernels, proportional to the cube (respectively fourth power) of the kernel radius (i.e., (2r_g + 1)^d).


Table 6.1: Fractions of the total computational time spent in the different kernels for the aortic valve data set (116 × 200 × 117 × 36 samples) and kernel size 7.

                            3D/CPU    3D/GPU    4D/CPU    4D/GPU
    Quadrature convolution   61.6%     23.0%     96.2%     69.9%
    Lowpass filtering        12.3%     20.6%      1.0%     12.5%
    Adaptive filtering       18.0%     34.5%      2.4%      7.7%
    Other                     7.9%     21.9%      0.4%      9.9%

Figure 6.3: Computational time in seconds for filtering of the aortic valve data set consisting of 116 × 200 × 117 × 36 samples and kernel size 7.

As we can see, the CPU based computations are dominated by the cost of performing convolutions. Although the convolutions are among the more expensive operations also on the GPU, we see that for the smaller 3D convolution problem they are of the same magnitude as the other kernels, while for the 4D case they scale up and take a larger share of the computational cost. For filtering with larger kernel radii, the total computational cost increases with the cube (respectively fourth power) of the kernel radius, as supported by Figure 6.2.

Figure 6.3 compares the time for filtering onboard different GPUs and CPUs. Due to the deterministic nature of the algorithm, timing measurements give a standard deviation less than the clock resolution of 1 ms, implying significant differences between the computational times for each of the presented categories in Figure 6.3. A quadrature filter diameter of 7 was chosen here as the smallest filter size that gives good visual results. The timing results thus demonstrate a large gain from the application of GPUs compared to CPUs for filtering of the data. As we can see, both the 3D and 4D filtering can be performed onboard GPUs, for the given data set size, within a time span suitable for analysis immediately after and in conjunction with the


    physical examination.

    6.2 Filtering Efficiency

To illustrate the effect of 3D and 4D adaptive filtering on the considered echocardiography data sets, a 2D slice at one time frame of each data set is presented before filtering and after 3D and 4D filtering, respectively (Figure 6.4). The top row shows the original (a), 3D filtered (b), and 4D filtered (c) data taken from one frame of the aortic valve view data set. The second row shows the original (d), 3D filtered (e), and 4D filtered (f) data taken from one frame of the four chamber view data set. Visual assessment of the adaptive filtering indicated, according to a clinician with over 15 years of experience in echocardiography and cardiology (KE), an improvement of image quality in both the 3D and 4D filtered data sets. This improvement is illustrated in Figures 6.5 and 6.6, where intensity lines demonstrate the effect of the filtering. The 3D and 4D filtering decrease the noise in isotropic signal areas while preserving the high frequency content of edges and lines.

When comparing the 4D and 3D filtered images, further improvements, according to the same clinician, are noticed in the 4D filtered images, where for instance the atrioventricular valves are more distinctly visualized, which makes the interpretation of the image even easier.


Figure 6.4: On the top row, cross sections of the aortic valve in basal short axis view using the original (a), 3D filtered (b), and 4D filtered (c) signal. On the bottom row, cross sections of the four chamber view data set using the original (d), 3D filtered (e), and 4D filtered (f) signal.


    Figure 6.5: Intensity plot along a central horizontal line from the original, 3D, and4D filtered aortic valve view.

Figure 6.6: Intensity plot along a central horizontal line from the original, 3D, and 4D filtered four chamber view.


    Chapter 7

    CONCLUSION

A general method for fast local orientation estimation and filtering of 4D echocardiographic data sets using GPU hardware was presented. This specific combination of 3D and 4D filtering shows promising results, and further studies are required to determine its suitability in echocardiographic examinations. Such a clinical evaluation would preferably be performed as a double blind study involving several data sets and clinicians. This specific application of filtering of ultrasound data should be seen as an example of the applications that can be implemented on large 4D data sets using the GPU based quadrature filtering approach. The method holds promise also for any other technique requiring local orientation estimates on large data sets. Looking at the recent development of computational hardware, an exponential growth of the number of parallel processing elements matching Moore's law is seen on the GPU side. With the advent of accelerated processing units (APUs), where GPU and CPU processors are combined on the same chip, this exponential growth in the number of processing elements can be expected to continue in the near future. Given the performance of the algorithms on modern hardware, it is not unreasonable to assume that real-time interactive filtering can be performed during clinical examinations within a few years.

It is known that multibeat acquisition may introduce stitching artifacts that degrade the quality of the image. Since such artifacts are caused by misregistration of volumetric data from different heart beats, a simple low-level filtering along the boundary of the artifact would not solve the underlying problem of the possible differences in the stitched images. Since the presented filtering algorithm contains no image registration step, or other measures for dealing with the underlying problem of stitching artifacts, only data sets that contain no apparent stitching artifacts were used. This may pose a limitation when applying the method to certain patient cases. In conclusion, GPUs facilitate the use of


demanding adaptive image filtering techniques that enhance 4D echocardiographic data sets. This may open up for improvements in diagnosis and in pre- and even peri-surgical examinations using 4D echocardiograms. This general methodology of implementing parallelism is also applicable to other medical multidimensional data sets, such as MRI and CT, that would benefit from fast adaptive image processing.


    Bibliography

[1] J. S. Gottdiener, J. Bednarz, R. Devereux, J. Gardin, A. Klein, W. J. Manning, A. Morehead, D. Kitzman, J. Oh, M. Quinones, N. B. Schiller, J. H. Stein, and N. J. Weissman, "A report from the American Society of Echocardiography's Guidelines and Standards Committee and the Task Force on Echocardiography in Clinical Trials," Journal of the American Society of Echocardiography, vol. 17, no. 10, pp. 1021-1122, Oct. 2004.

[2] C. Otto, "Principles of echocardiographic image acquisition and Doppler analysis," in Textbook of Clinical Echocardiography. Philadelphia, PA: Saunders, 2004, pp. 1-29.

[3] C. Otto, "Other echocardiographic modalities," in Textbook of Clinical Echocardiography. Philadelphia, PA: Saunders, 2004, pp. 100-104.

[4] M. Broxvall, K. Emilsson, and P. Thunberg, "Fast GPU based adaptive filtering of 4D echocardiography," IEEE Transactions on Medical Imaging, vol. 31, no. 6, pp. 1165-1172, June 2012.

[5] R. H. Bamberger and M. J. T. Smith, "A filter bank for the directional decomposition of images: Theory and design," IEEE Transactions on Signal Processing, vol. 40, no. 4, pp. 882-893, Apr. 1992.