
Design Space Exploration for High-Accuracy 1-Dimensional Edge Detection

José C. Alves*† and Paulo J. Pereira*
*INESC TEC, †Faculdade de Engenharia, Universidade do Porto
R. Dr. Roberto Frias, 4200-465 Porto, Portugal
[email protected], [email protected]

Pedro C. Diniz
INESC-ID
R. Alves Redol, 9, 1000-029 Lisboa, Portugal
[email protected]

Abstract: Structural defect detection is an important application area of digital image processing techniques for streamlining and classifying possible manufacturing issues in areas such as metallurgy, ceramics and even food quality evaluation. In selected scenarios there is a need for cost-effective and very high-resolution image analysis for which traditional edge-detection techniques are ineffective. In this paper we describe an approach using a linear regression scheme that can reach high accuracy for sub-pixel edge detection in 1-D images. We report experimental results of our approach on an FPGA-based hardware implementation, exploring a set of hardware designs for a range of numeric accuracy alternatives and high-level synthesis optimizations. The resulting designs meet the required overall detection latency of 1 µs with a low-cost line-scan CMOS camera.

I. INTRODUCTION AND MOTIVATION

Non-invasive, and thus non-destructive, analysis of structural defects such as cracks or holes is of great economic and social importance. Large standing structures such as bridges or buildings are constantly subject to weathering and erosion, which alter their intrinsic properties over long periods of time. Many of these structures were built long ago and cannot easily (if at all) be retrofitted with sensor devices. In some settings, including sensors in the structure would alter or even mask the properties to be observed. A common technique relies on observing the mechanical elasticity of the materials under stress. By applying very small amplitude vibrations to the structure and observing how different points of the structure move relative to each other in a non-coherent fashion, it is possible to identify hidden heterogeneities and thus develop models of a possible defect of the materials [1].

A possible approach to this problem makes use of digital image processing techniques to capture and process images at fairly high rates, identifying the relative position of salient points in the structure. Using the high resolution (dynamic range) and high frame rates required to capture and eliminate induced phenomena such as harmonic vibration frequencies is impractical due to the correspondingly high costs. To address the various practical constraints of this setting, many researchers have proposed the use of high-speed line-scan cameras that capture images of the target structure upon which a reference mark has been implanted. Reference marks commonly consist of a sequence of high-contrast stripes (black and white) that serve as markers for the points to be tracked during the structural vibration. In this setting the implementation must very accurately identify and track the boundary at the transition between stripes, often requiring precision below the native resolution of the camera, in what is called sub-pixel edge estimation.

The work presented here addresses the problem of fast and accurate sub-pixel edge estimation in 1-D images obtained from line-scan cameras, by exploring alternative implementations using linear regression with optimized numeric data representations. Given stringent performance and accuracy requirements (e.g. harmonic elimination during vibration), a feasible algorithmic solution requires dedicated hardware using, for flexibility and cost, a Field-Programmable Gate Array (FPGA) device streaming the data from a low-cost CMOS line-scan camera. We implemented various algorithmic solutions on an FPGA using a commercially available high-level synthesis tool, aiming at reducing the FPGA resources used while maintaining a specific pixel estimation latency.

The work presented here makes the following contributions:

• Describes a simple linear regression method and a practical implementation, using fixed-point numeric representations, of the sub-pixel estimation problem suitable for use with a low-cost line-scan camera;

• Explores the trade-off between the numeric precision in the algorithm and the observed sub-pixel estimation error for the linear regression method used;

• Presents real implementation results exploiting value-based optimizations and loop transformations for an FPGA-based system. The designs exhibit very low latency, making them suitable for real-time applications.

The design-space exploration described in our work is fairly common in a wider context where designers aim at developing custom (and often reprogrammable) designs using FPGAs that meet specific latency (throughput) requirements while being able to sacrifice some numeric accuracy without detriment to the usefulness of the final computation or the quality of its solutions. To this effect we describe how this exploration can be expressed and exploited with a domain-specific language (DSL) [2]. The use of this DSL allows designers to quickly specify numeric-related and mapping-related program transformations from a high-level C source code for its translation to HDL. We see a great need for tools that can automate this search, in a robust and predictable fashion, for many competing designs. Our experience with the test cases described here is a clear indication that this automation is extremely desirable.

The remainder of this paper is structured as follows. In the next section we describe some background work in the context of non-invasive edge detection using image processing techniques, with particular emphasis on sub-pixel estimation. We then describe our algorithmic approach along with its practical implementation considerations in section III. In section IV we present experimental results of the implementation of the described approach using an FPGA-based hardware design, and we conclude in section V.

II. RELATED WORK

Given the economic impact of anomaly detection in various manufacturing and production industries, it is not surprising that there has been a large body of research and implementation on automated quality control and inspection systems. Many of these systems have features very similar to our work, with varying degrees of performance and cost.

Various edge-detection numeric estimation approaches have been proposed and analyzed in the literature (see for instance the survey by Naidu and Fisher [3]). As an example, Li et al. [4] use a discrete Chebyshev polynomial estimation algorithm rather than a simple linear regression method, whereas Ya-ceng et al. [5] use a curve-fitting gradient computation to determine the sub-pixel location. In neither of these works do the authors present a real practical implementation of their method. In a similar work with a specific hardware implementation, a precision of 1/5 of a pixel is attained [6].

The work by Hussmann [7] is very similar in terms of performance and overall approach. Its major and significant difference is that it relies on a high-performance (and therefore expensive) camera. As a result it does not handle noise well, rendering it inappropriate for the mechanical vibration setting we aim at. Their approach is geared towards high-quality but stable analysis of edges and is naturally used in the context of quality control. In a similar application domain, other researchers have used high-speed line cameras and spectral methods such as Fast Fourier Transforms (FFT) to determine seismic structural displacement [8].

Many other authors have focused almost exclusively on the hardware micro-architecture, aiming at improving the data availability for the core computation of their algorithm of choice. This research (see e.g. [9]) thus focuses on sophisticated data storage and streaming solutions for selected edge-detection algorithms (say the Sobel algorithm) and for a single numeric precision implementation.

The work described here explores a very specific combination of factors. First, we use an inexpensive commercial off-the-shelf (COTS) line-scan CMOS sensor capable of fairly high frame rates. This is key for the reaction times of the excitations of the materials in which defects are being analyzed. Second, to cope with high frame rates we cannot make use of sophisticated interpolation methods. Instead we explore the use of fixed-point numeric representations combined with a linear regression procedure to provide high-accuracy sub-pixel calculation within tight timing constraints.

III. ALGORITHMIC APPROACH, SIMULATION AND SETUP

We now describe our approach to the sub-pixel edge detection problem using linear regression, which we explore in our implementation. We begin with a description of the algorithm, followed by a description of the use of simulation to fine-tune specific parameters such as the numeric precision and the number of selected pixels the algorithm uses. This is a key step for the hardware implementations described in section IV.

A. Sub-pixel Edge Detection using Linear Regression

The sub-pixel edge estimation algorithm operates on 1-D images with one (or more) clear stripes placed over a dark background. Each frame consists of a vector of pixels that represent the light intensity captured by the image sensor: near black (values close to zero) for the background and white (a high value) for the white stripes. The color transitions (black-to-white and white-to-black) represent the image edges, whose position in the image has to be calculated to within a fraction of a pixel.

The calculation of the sub-pixel edge position consists of two major steps. First, we derive the linear function that best approximates a selected set of points that lie in the linear transition region of the pixels. By fixing the number of pixels (points) and offsetting the base reference x-value to the numeric zero, we can make several important numeric simplifications, as many values can be precomputed as constants. The second step is a simple calculation to determine the intersection of the line found in the first step with a specific threshold value (midway between the black and white levels), thus yielding the sub-pixel axis value.

We have determined experimentally that good results are obtained with a small set of pixels along the black-white transition region. If the image is sharply focused, the edges are very steep and on average only 2 to 3 pixels lie in the linear region of the transition. Such a low number of pixels has proven to be too few for the linear regression method. At the other extreme, if the image is very unfocused, the image edges exhibit an “S” shape that does not correlate well with a linear function, thus requiring a large number of pixels for an accurate linear regression. One effective trade-off is to slightly defocus the image, in order to (i) obtain a close-to-linear transition region while (ii) keeping the number of pixels in this region as small as possible. Using MATLAB™ simulations we found empirically that a number of pixels between 5 and 8 leads to good results with images such as the one shown in figure 2.

The simultaneous equations for finding the linear function y = m·x + b that best approximates, using the least squares method, a set of N points (x_i, y_i) are:

\[
\begin{cases}
m \sum_{i=1}^{N} x_i + b \sum_{i=1}^{N} 1 = \sum_{i=1}^{N} y_i \\
m \sum_{i=1}^{N} x_i^2 + b \sum_{i=1}^{N} x_i = \sum_{i=1}^{N} x_i y_i
\end{cases}
\tag{1}
\]


If N and the x values are known, the coefficient matrix can be pre-computed as a set of constants and only the RHS vector needs to be calculated for each new set of pixels. With N = 6 and the x values (pixel positions) referred to zero (i.e. x = (0, 1, 2, 3, 4, 5)), equation 1 becomes:

\[
\begin{cases}
15m + 6b = \sum_{i=1}^{N} y_i \\
55m + 15b = \sum_{i=1}^{N} x_i y_i
\end{cases}
\tag{2}
\]

The solution is easily found with the inverse of the coefficient matrix, yielding:

\[
\begin{cases}
m = \frac{6}{105} b_1 - \frac{15}{105} b_0 \\
b = -\frac{15}{105} b_1 + \frac{55}{105} b_0
\end{cases}
\tag{3}
\]

where $b_0 = \sum_{i=1}^{N} y_i$ and $b_1 = \sum_{i=1}^{N} x_i y_i$. With this linear approximation for the set of pixels, the sub-pixel edge position is computed as (threshold − b)/m, where threshold is the same pixel intensity level used to select the pixels for the linear regression.
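As a quick sanity check of equation 3, consider a hypothetical set of six pixel intensities that lie exactly on a line. These values are purely illustrative and are not taken from the measurements; the threshold of 150 is the one used for the test images in this paper.

\[
\begin{aligned}
y &= (20, 60, 100, 140, 180, 220), \qquad b_0 = \sum_{i=1}^{6} y_i = 720, \qquad b_1 = \sum_{i=1}^{6} x_i y_i = 2500, \\
m &= \frac{6 \cdot 2500 - 15 \cdot 720}{105} = 40, \qquad b = \frac{-15 \cdot 2500 + 55 \cdot 720}{105} = 20, \\
x_{\text{edge}} &= \frac{150 - b}{m} = \frac{150 - 20}{40} = 3.25 .
\end{aligned}
\]

The fit recovers the generating line y = 40x + 20, and the edge falls a quarter of a pixel past the fourth sample of the window.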

B. Experimental Setup

Our experimental setup consists of a low-cost line-scan camera using a CMOS sensor (Hamamatsu S10077), capable of scanning 1024 8-bit pixels at a maximum frame rate of 976 fps, and fitted with a commercially available 50 mm lens.

The line-scan camera was positioned in front of a target made of two stripes of thick white paper, approximately 5 mm wide, on top of black cardboard. The target is attached to a linear positioning table that can be moved with a resolution of 0.1 µm, as depicted in Fig. 1.

Fig. 1. Experimental setup for the sub-pixel edge detection system.

With this setup we captured a set of image frames while moving the target horizontally across 1 mm of displacement. An initial displacement of 200 µm is performed in 20 steps of 10 µm each, capturing a total of 16 images for each position. Then, 16 further positions were captured in 50 µm steps, moving the target an additional 0.8 mm to the right. Figure 2 shows one sample captured image. The horizontal axis is the pixel number and the vertical axis represents the pixel intensity.

Fig. 2. Example of a partial image captured by the line-scan camera. The circled dots (red) are the pixels selected in each edge for the linear regression.

The linear regression procedure is applied to the four sets of 6 pixels, one set per edge, as shown in figure 2. The sub-pixel estimation of the target position is computed as the intersection between a horizontal line at intensity 150 and the four straight lines calculated via linear regression.
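The per-edge computation described so far can be sketched in plain C as follows. This is a minimal double-precision reference, not the fixed-point hardware code: the function name `subpixel_edge` and the `first_index` parameter (the image index of the first pixel in the window) are assumptions introduced for illustration.

```c
#include <assert.h>

#define N_PIXELS 6 /* pixels per edge, as chosen in the text */

/* Fit y = m*x + b to six samples taken at x = 0..5 using the closed form
 * of equation 3, then return the absolute image position at which the
 * fitted line crosses `threshold`.
 * `first_index` (hypothetical parameter) is the image index of y[0]. */
double subpixel_edge(const unsigned char y[N_PIXELS], double threshold,
                     int first_index)
{
    double b0 = 0.0; /* sum of y_i       */
    double b1 = 0.0; /* sum of x_i * y_i */
    for (int i = 0; i < N_PIXELS; i++) {
        b0 += y[i];
        b1 += (double)i * y[i];
    }
    double m = (6.0 * b1 - 15.0 * b0) / 105.0;   /* slope      */
    double b = (-15.0 * b1 + 55.0 * b0) / 105.0; /* y-crossing */
    assert(m != 0.0); /* a flat window has no threshold crossing */
    return first_index + (threshold - b) / m;
}
```

The same function handles rising and falling edges, since only the sign of the slope changes. Adding `first_index` back translates the result from the zero-based window to the absolute position along the line.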

C. Simulation Results

The algorithmic approach described includes a handful of parameters that need to be defined ahead of a hardware implementation to meet specific performance metrics while exploring the trade-off between speed and accuracy. To evaluate this trade-off we first used a full-precision MATLAB™ model to evaluate the precision results with different numbers of pixels in the transition regions, given the level of image focus and the number of edges. We then developed a fixed-point numeric version of the algorithm in C and synthesized it with the CatapultC high-level synthesis tool [10]. The resulting hardware design was then simulated with ModelSim™ and the results compared against the MATLAB™ model results. We defined as quality metrics the absolute difference between the real position (as defined by the linear positioning table) and the estimated position, as well as the maximum position error observed for the selected test images.

Regarding the estimation, we use two approaches, both using the raw sensor images. The first approach, labelled single-edge, approximates the target position with only one edge, thus requiring a single regression. The second approach, labelled multi-edge, estimates the target position by averaging the positions calculated using four edges. As expected, this latter method significantly improves the accuracy of the estimated positions, reducing the maximum position error from 16.6 µm for the single-edge approach to only 6.6 µm for the multi-edge approach (see figures 3 and 4). In this setup each pixel maps to approximately 180 µm, so the maximum position error of 6.6 µm corresponds to approximately 0.033 pixels.
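The multi-edge estimate and the conversion between pixels and micrometres can be sketched as below. The function names are hypothetical, and the scale factor is passed in as a parameter because the text only gives it approximately (about 180 µm per pixel).

```c
#include <assert.h>

#define N_EDGES 4 /* two stripes give four edges */

/* Multi-edge estimate: arithmetic mean of the four sub-pixel edge
 * positions (all expressed in pixels along the line). */
double multi_edge_position(const double edge_pos[N_EDGES])
{
    double sum = 0.0;
    for (int k = 0; k < N_EDGES; k++)
        sum += edge_pos[k];
    return sum / N_EDGES;
}

/* Convert a displacement from pixels to micrometres.
 * `um_per_pixel` is an assumed calibration constant of the optical setup. */
double pixels_to_um(double pixels, double um_per_pixel)
{
    assert(um_per_pixel > 0.0);
    return pixels * um_per_pixel;
}
```

If the noise on the four edges were independent, averaging would be expected to reduce its standard deviation by roughly a factor of two. That is a general property of averaging, not a result reported here.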

Fig. 3. Positions calculated from 0 µm to 200 µm (320 frames, blue) using one edge (left) and averaging the results for the 4 edges (right). The red continuous line represents the real target position.

Fig. 4. Absolute difference between the real position of the target and the position estimated by edge detection for the single-edge implementation (left) and the multi-edge approach (right), for all the 592 frames.

To further improve the estimation accuracy, we have implemented an 8-tap averaging filter along the pixels of each image and a sliding-window averaging filter along time, using the last 4 acquired frames. Both filters are implemented using the internal FPGA block RAMs as data buffers and require very little additional logic, as described in section IV.

The results obtained with this approach are depicted in figure 5. This enhanced solution exhibits a maximum position error of only 4 µm (or 0.022 pixels). Longer averaging filters further improve the position accuracy but require more FPGA resources (mainly block RAMs) and exhibit a lower bandwidth response due to the low-pass behavior of the averaging filter.

Fig. 5. Positions calculated (blue) after applying the averaging filters to the image (left) and the position errors after applying the averaging filters (right).
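The two averaging filters can be sketched in software as follows. Several details are assumptions, since the text does not specify them: the spatial filter only produces outputs for complete 8-pixel windows, the temporal filter averages pixel by pixel over four stored frames, and integer division truncates. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS   8 /* spatial taps along the line     */
#define FRAMES 4 /* temporal window, in frames      */

/* Spatial moving average: out[i] is the mean of in[i..i+TAPS-1], so
 * `out` receives n - TAPS + 1 values. A running sum avoids re-adding
 * the whole window for every output. */
void spatial_average(const unsigned char *in, size_t n, unsigned char *out)
{
    assert(n >= TAPS);
    unsigned sum = 0;
    for (size_t i = 0; i < TAPS; i++)
        sum += in[i];
    out[0] = (unsigned char)(sum / TAPS);
    for (size_t i = TAPS; i < n; i++) {
        sum += in[i];        /* pixel entering the window */
        sum -= in[i - TAPS]; /* pixel leaving the window  */
        out[i - TAPS + 1] = (unsigned char)(sum / TAPS);
    }
}

/* Temporal average: mean of the same pixel over the last FRAMES frames. */
void temporal_average(const unsigned char *frames[FRAMES], size_t n,
                      unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        unsigned sum = 0;
        for (int f = 0; f < FRAMES; f++)
            sum += frames[f][i];
        out[i] = (unsigned char)(sum / FRAMES);
    }
}
```

Both window lengths are powers of two, so in hardware the division reduces to dropping the low-order bits of the sum.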

IV. HARDWARE SYNTHESIS AND RESULTS

We now describe the overall hardware architecture that implements the sub-pixel edge-detection algorithm, along with its use in the setup described in section III. Next we present the synthesis methodology and the results obtained, followed by a discussion of the opportunities for using a domain-specific language for design-space exploration (DSE).

A. Hardware Architecture

Figure 6 depicts the hardware organization of the digital system implemented on an FPGA. The interface with the line-scan sensor sets the camera integration time, deserializes the serial data from the camera and provides a stream of 8-bit pixels with an end-of-frame signal to the pixel extraction module.

Fig. 6. Hardware system architecture for sub-pixel edge-detection.

The pixel extraction block compares the pixels received against a pixel value threshold and stores (in a first dual-port memory) a set of 6 pixels in each edge that lie around that threshold value. These are the pixels used for the linear regression approximation. The position of the first pixel of each set is also registered (the pixel index, or its x coordinate). As the linear regression considers the x values of the pixels in each edge translated to zero, these values are necessary to re-translate the sub-pixel result to its absolute x position along the image. Next, the sub-pixel edge calculator computes the linear regressions for each of the four 6-pixel sets and calculates the intersection of the linear functions with the pixel threshold value (for the set of test images used, similar to figure 2, a fixed threshold value of 150 has been defined). The output of this block's computation is a set of four 13-bit unsigned values (with 10 bits for the fractional part), which are stored in a second dual-port RAM.
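A software sketch of the pixel extraction step is shown below. The text does not say how the 6-pixel window is placed relative to the threshold crossing, so centring it (three pixels on either side) is an assumption, as are the `edge_window` structure and the function name.

```c
#include <assert.h>
#include <stddef.h>

#define WIN       6 /* pixels kept per edge   */
#define MAX_EDGES 4 /* edges stored per frame */

typedef struct {
    int first_index;        /* image index of pix[0]               */
    unsigned char pix[WIN]; /* window around the threshold crossing */
} edge_window;

/* Scan one line and store a 6-pixel window around each threshold
 * crossing (rising or falling). Returns the number of edges found,
 * at most MAX_EDGES. Window centring is an assumption of this sketch. */
int extract_edges(const unsigned char *line, size_t n, unsigned char th,
                  edge_window out[MAX_EDGES])
{
    assert(line != NULL && out != NULL);
    int found = 0;
    size_t i = WIN / 2;
    while (i + WIN / 2 <= n && found < MAX_EDGES) {
        int rising  = line[i - 1] <  th && line[i] >= th;
        int falling = line[i - 1] >= th && line[i] <  th;
        if (rising || falling) {
            size_t start = i - WIN / 2;
            out[found].first_index = (int)start;
            for (size_t j = 0; j < WIN; j++)
                out[found].pix[j] = line[start + j];
            found++;
            i += WIN / 2; /* skip the remainder of this window */
        } else {
            i++;
        }
    }
    return found;
}
```

Each stored `first_index` is the offset that must be added back to the regression result to obtain the absolute edge position along the image.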

B. Implementation via High-Level Synthesis

The hardware block responsible for the linear regression and threshold calculation was synthesized from the C code specification depicted in figure 7 using the CatapultC high-level synthesis tool (University version) [10]. The input data vector y[32] is synthesized as an interface to a synchronous RAM, which in turn is connected to the dual-port RAM block holding the pixels extracted from the image. The th input is an 8-bit unsigned value defining the threshold level used to compute the edge intersection. The result of the computation is a vector of 4 values, x[4], that is also translated to a RAM interface.

The fixed-point type ac_fixed<> defines the total word width and the number of integer bits (the remaining bits form the fractional part), whether it is signed or unsigned, and optionally the rounding mode to be implemented (similarly, the data type ac_int<> defines an integer).

```cpp
#include <ac_fixed.h>
#include <ac_int.h>

void edgeDetection(ac_fixed<8,8,FALSE> y[32],
                   ac_fixed<8,8,FALSE> th,
                   ac_fixed<13,3,FALSE,AC_RND> x[4])
{
  ac_fixed<11,11,FALSE> b0;
  ac_fixed<12,12,FALSE> b1;
  ac_fixed<13,1,TRUE,AC_RND> as[3];
  ac_fixed<28,16,TRUE,AC_RND> a1, a2;
  ac_fixed<26,16,TRUE,AC_RND> m, b;
  ac_int<5,FALSE> i, ix0;
  ac_int<5,FALSE> k;
  ac_int<5,FALSE> k1;

  // the inverse matrix coefficients:
  as[0] = 6 / 105.0;
  as[1] = -15 / 105.0;
  as[2] = 55 / 105.0;

  for (k = 0; k < 4; k++) {   // for each edge:
    b0 = 0; b1 = 0; k1 = k << 3;
    for (i = 0; i < 6; i++) { // compute the RHS terms:
      ix0 = k1 | i;
      b0 = b0 + y[ix0];
      b1 = b1 + y[ix0] * i;
    }
    // slope: m
    a1 = as[0] * b1; a2 = as[1] * b0;
    m = a1 + a2;
    // Y-crossing: b
    a1 = as[1] * b1; a2 = as[2] * b0;
    b = a1 + a2;
    // intersection with the threshold:
    x[k] = (th - b) / m;
  }
}
```

Fig. 7. C code for Sub-Pixel Edge Detection.

As the bit widths chosen for a fixed-point custom hardware design may significantly impact the hardware complexity and performance, we explored this effect by varying the number of bits of the fractional parts of the numeric fixed-point representation in our designs. In the code in figure 7, the number of bits for the integer parts has been constrained by the maximum values that may be assigned to them, in order to prevent overflow. The number of bits of the fractional parts was defined by progressively reducing the data width and comparing the absolute positions with those calculated using the MATLAB™ model. The bit widths shown here are the smallest that guarantee a maximum deviation of the edge-position error of less than a factor of 10 from the MATLAB™ model.
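The kind of precision sweep described above can be emulated in software with a sketch like the following. It is a simplification under stated assumptions: a single fractional width `frac_bits` is applied to the coefficients and to m and b, whereas the actual design uses a different width for each variable, and round-to-nearest stands in for the AC_RND mode. Both function names are hypothetical.

```c
#include <assert.h>

/* Round v to the nearest multiple of 2^-frac_bits (ties away from zero).
 * Emulates keeping only `frac_bits` fractional bits. */
double quantize(double v, int frac_bits)
{
    assert(frac_bits >= 0 && frac_bits < 31);
    double scale = (double)(1L << frac_bits);
    double s = v * scale;
    long long r = (long long)(s >= 0.0 ? s + 0.5 : s - 0.5);
    return (double)r / scale;
}

/* Edge position (relative to the window start) computed from the sums
 * b0 and b1 with the coefficients of equation 3, and with m and b
 * quantized to `frac_bits` fractional bits. */
double edge_quantized(double b0, double b1, double th, int frac_bits)
{
    double c0 = quantize(6.0 / 105.0, frac_bits);
    double c1 = quantize(-15.0 / 105.0, frac_bits);
    double c2 = quantize(55.0 / 105.0, frac_bits);
    double m = quantize(c0 * b1 + c1 * b0, frac_bits);
    double b = quantize(c1 * b1 + c2 * b0, frac_bits);
    assert(m != 0.0);
    return (th - b) / m;
}
```

Calling `edge_quantized` for decreasing values of `frac_bits` and comparing each result with a double-precision reference gives the deviation as a function of word length, which is the quantity the exploration bounds.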

C. Hardware Synthesis Results

We now present hardware synthesis results for a set of hardware designs that implement the sub-pixel edge-detection algorithm (and its variants) described in section III. We use this design to program an FPGA device fitted on a board that interfaces with a PC via a serial interface, used to communicate the sub-pixel calculations for further processing.

We use as our baseline hardware design the C source code depicted in figure 7. We use the CatapultC high-level synthesis tool [10] and the Precision Synthesis tool [11] from Mentor Graphics Inc., targeting the Xilinx Spartan-3 FPGA [12]. We synthesized the design with increasing target clock frequencies of 10, 20, 50, 100 and 150 MHz. Surprisingly, the lowest latency was obtained with the target clock of 20 MHz. For space considerations we present here only synthesis results for the best cases, with target clock rates of 20 MHz and 50 MHz, setting FPGA area and design latency as synthesis goals. For all these base designs we used the fixed-point representation indicated in the code in figure 7. We also explored the use of constant values in some of the multiplications but found this “optimization” not profitable.

The synthesis results are shown in table I. For each design, identified by the target clock rate and synthesis goal, we present its maximum clock frequency estimated after RTL synthesis, the number of LUTs and flip-flops (FFs) and the number of block multipliers. Lastly, we report latency in terms of clock cycles as well as wall-clock time, considering the maximum frequency allowed for each design.

| Design | Number of LUTs / FFs | Block mults. | Clock (MHz) | Latency (cycles) | Latency (µs) |
|---|---|---|---|---|---|
| 20 MHz, area | 1535 / 158 | 3 | 77 | 54 | 0.701 |
| 20 MHz, latency | 1553 / 145 | 4 | 79 | 50 | 0.633 |
| 50 MHz, area | 1536 / 209 | 3 | 99 | 82 | 0.828 |
| 50 MHz, latency | 1536 / 175 | 4 | 86 | 78 | 0.907 |

TABLE I. Synthesis results for Xilinx Spartan-3 FPGA for base numeric representation and without constant multipliers.
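The wall-clock latency column follows directly from the cycle count and the estimated maximum clock frequency. A minimal helper (the function name is hypothetical) makes the relation explicit:

```c
#include <assert.h>

/* Latency in microseconds of a design that needs `cycles` clock cycles
 * when clocked at `clock_mhz` MHz. One MHz is one cycle per microsecond,
 * so the conversion is a plain division. */
double latency_us(unsigned cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return cycles / clock_mhz;
}
```

For example, the "20 MHz, latency" design takes 50 cycles at 79 MHz, which gives 50 / 79 ≈ 0.633 µs, matching the table.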

To try to increase the performance even further we activated the “loop unrolling” and “loop pipelining” optimizations in CatapultC for the two core loops in this algorithm. Table II presents the synthesis results of the designs resulting from these loop-based transformations. For space considerations we only include design results for a target clock rate of 20 MHz, without any constant multipliers and with the lowest design latency as synthesis goal.

| Design | Number of LUTs / FFs | Block mults. | Clock (MHz) | Latency (cycles) | Latency (µs) |
|---|---|---|---|---|---|
| pipe inner-loop, Init. Interval = 1 | 1553 / 145 | 4 | 79 | 50 | 0.633 |
| pipe outer-loop, Init. Interval = 1 | 6099 / 362 | 5 | 51 | 31 | 0.608 |
| pipe outer-loop, Init. Interval = 2 | 3026 / 228 | 5 | 51 | 54 | 1.059 |
| pipe outer-loop, Init. Interval = 3 | 3021 / 224 | 5 | 51 | 77 | 1.510 |
| pipe outer-loop, Init. Interval = 4 | 1540 / 137 | 5 | 53 | 100 | 1.887 |
| unroll outer-loop, unroll inner-loop | 5992 / 623 | 5 | 25 | 41 | 1.640 |

TABLE II. Synthesis results for Xilinx Spartan-3 FPGA for designs with target clock rate of 20 MHz, no constant multipliers and with latency synthesis goal, when exploring loop unrolling and loop pipelining.

As can be observed, pipelining the outer loop leads to a slight reduction in latency but at the hefty expense of a three-fold increase in FPGA resources. Surprisingly, the best-performing design is still the smallest one, with pipelining of the innermost loop and no loop unrolling. Pipelining the outer loop leads to a substantial increase in FPGA resources, and only with an initiation interval of 1 can the resulting design achieve an overall latency of less than 1 µs. This latency requirement is key to completing the whole computation within the time between two adjacent frames.


```
aspectdef fixed_point_precision
  select
    function{name=="edgeDetection"}.var
  end
  apply
    $var.def type={fixed:[28,16,true,"AC_RND"]};
  end
  condition
    $var.name=="a1" || $var.name=="a2"
  end
end

aspectdef unroll_loops
  select
    function{name=="edgeDetection"}.loop{type=="for"}
  end
  apply $loop.optimize(kind:'loopunroll'); end
  condition
    $loop.numIterIsConstant &&
    $loop.num_iter <= 32 &&
    $loop.is_innermost
  end
end
```

Fig. 8. Numeric Precision and Source Code Transformations Specifications.

We also ran the C code in figure 7 on a MicroBlaze soft processor implemented on a Spartan-3E FPGA and executed at 50 MHz. A floating-point software version (single precision, without FPU) requires 3.2 ms to complete, whereas a code version that uses fixed-point arithmetic with 16 fractional bits executes in 100 µs. Still, the synthesized hardware implementation using our approach exhibits a 150× speedup over the fastest fixed-point software version.

D. Design Space Exploration

As the results presented above clearly indicate, the best feasible design can result from very surprising combinations of transformations. Exploring the entire set of possible design solutions by hand is tedious and extremely error-prone, which makes an exhaustive exploration impractical. To address this issue we are exploring an automated approach using a domain-specific language [2] that allows designers to convey domain-specific knowledge and code transformations, along with their corresponding parameters, to transformation tools. The toolchain then uses these specifications to automatically transform high-level C code, such as the code presented in figure 7, and uses the resulting code for synthesis to generate a complete hardware solution.

The two examples depicted in figure 8 illustrate specifications of the numeric precision for a selected set of variables in the code in figure 7 and of the application of loop unrolling transformations.
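As an illustration of what the loop unrolling specification asks the tools to do, the C sketch below contrasts an innermost for-loop that has a constant iteration count with its fully unrolled equivalent. The functions `dot_rolled` and `dot_unrolled` and the window size `N` are hypothetical and are not taken from figure 7.

```c
#define N 4   /* constant trip count, well below the 32-iteration limit */

/* Original form: an innermost for-loop with a compile-time constant bound,
 * i.e. a loop that satisfies the condition of the unroll_loops aspect. */
int dot_rolled(const int x[N], const int w[N])
{
    int acc = 0;
    for (int i = 0; i < N; i++)
        acc += x[i] * w[i];
    return acc;
}

/* Fully unrolled form: the loop control disappears and all N products
 * are exposed at once, which a synthesis tool may schedule in parallel
 * at the cost of additional hardware. */
int dot_unrolled(const int x[N], const int w[N])
{
    return x[0] * w[0] + x[1] * w[1] + x[2] * w[2] + x[3] * w[3];
}
```

Both functions compute the same result; only the structure visible to the synthesis tool changes.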

Such an interface to design space exploration would, we believe, tremendously improve design productivity and make it possible to automate the search for specific parameter values and transformation options for sophisticated designs such as the one described here.
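The selection step of such an automated search can be sketched in a few lines of C. The example below is only an illustration under our own assumptions: it takes the first figure of the resource column of Table II as the area metric (its unit is not restated here), uses the reported latency in µs, and picks the smallest design that meets a latency budget. The names `design_point`, `table2` and `smallest_feasible` are hypothetical.

```c
#include <stddef.h>

typedef struct {
    const char *name;    /* transformation applied */
    unsigned area;       /* first figure of the resource column (assumed metric) */
    double latency_us;   /* reported latency in microseconds */
} design_point;

/* Rows transcribed from Table II. */
static const design_point table2[] = {
    {"pipe inner-loop, II=1",                1553, 0.633},
    {"pipe outer-loop, II=1",                6099, 0.608},
    {"pipe outer-loop, II=2",                3026, 1.059},
    {"pipe outer-loop, II=3",                3021, 1.510},
    {"pipe outer-loop, II=4",                1540, 1.887},
    {"unroll outer-loop, unroll inner-loop", 5992, 1.640},
};

/* Index of the smallest design whose latency is strictly below
 * budget_us, or -1 when no design is feasible. */
int smallest_feasible(const design_point *d, size_t n, double budget_us)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (d[i].latency_us >= budget_us)
            continue;                       /* misses the deadline */
        if (best < 0 || d[i].area < d[best].area)
            best = (int)i;
    }
    return best;
}
```

With a 1 µs budget this selects the inner-loop pipelined design, which agrees with the observation made for Table II. A real exploration would in addition have to generate and synthesize each candidate, which is the costly part that the automated flow targets.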

V. CONCLUSIONS

In this paper we described the application of a linear regression approach to the problem of sub-pixel edge estimation. The target application domain for our solution, finding structural defects, requires high precision and real-time performance in addition to a low-cost solution. We have developed and evaluated the results of a design space exploration for FPGA-based implementations using a high-level synthesis tool. Our solution exhibits a latency of 1 µs for edge detection and is accurate within 0.022 of the corresponding pixel-to-axis value. These results also reveal that, while linear regression is a simple algorithmic approach, its implementation leads to potentially very large design spaces, for which an automated tool such as the one we are pursuing is highly desirable.

ACKNOWLEDGMENT

This work was partially supported by the European Community’s Framework Programme 7 (FP7) under contract No. 248976. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the European Community. The first and second authors have also been partially funded by the Portuguese Science and Technology Foundation (FCT - Fundação para a Ciência e a Tecnologia) under grant number PTDC/EEA-AUT/108180/2008, and INESC-ID/INESC-TEC under its multi-annual funding through the PIDDAC Program funds.

REFERENCES

[1] H. Alonso, P. Ribeiro, and P. Rocha, “An Automatic System for On-line Change Detection with Application to Structural Health Monitoring,” in Proc. of the 2009 IEEE Annual Conf. on Industrial Electronics (IECON), Nov. 2009, pp. 3309–3314.

[2] J. Cardoso, J. Teixeira, J. Alves, R. Nobre, and P. Diniz, “Specifying Compiler Strategies for FPGA-based Systems,” in Proc. of the 2012 Intl. Symp. on FPGAs for Custom Computing Machines (FCCM), Apr. 2012.

[3] D. Naidu and R. Fisher, “A Comparative Analysis of Algorithms for Determining the Peak Position of a Stripe to Sub-pixel Accuracy,” in Proc. of the 1991 British Machine Vision Conf. (BMVC), 1991.

[4] Y.-S. Li, T. Young, and J. Magerl, “Subpixel Edge Detection and Estimation with a Microprocessor-Controlled Line Scan Camera,” IEEE Trans. on Industrial Electronics, vol. 35, no. 1, Feb. 1988.

[5] S. Ya-ceng, C. Jing, and T. Jun-wei, “The Study of Sub-Pixel Edge Detection Algorithm Based on the Function Curve Fitting,” in Proc. of the 2nd Intl. Conf. on Information Engineering and Computer Science (ICIECS), Dec. 2010, pp. 1–4.

[6] L. Chang-Ming and X. Guo-Sheng, “Sub-pixel Edge Detection Based on Polynomial Fitting for Line-matrix CCD Image,” in Proc. Second Intl. Conf. on Information and Computing Science, 2009.

[7] S. Hussmann and T. Ho, “A High-speed Subpixel Edge Detector Implementation inside a FPGA,” Real-Time Imaging, vol. 9, no. 5, pp. 361–368, 2003.

[8] M. Nayyerloo, X.-Q. Chen, J. Chase, A. Malherbe, and G. MacRae, “Seismic Structural Displacement Measurement using a High-speed Line-scan Camera: Experimental Validation,” in Proc. of the 2010 New Zealand Society for Earthquake Engineering Conference, 2010.

[9] C. Moore, H. Devos, and D. Stroobandt, “Optimizing the FPGA Memory Design for a Sobel Edge Detector,” in Proc. of the Engineering of Reconfigurable Syst. and Algorithms (ERSA), 2009, pp. 496–499.

[10] T. Bollaert, “Making the case for High-level Synthesis,” Mentor Graphics, Inc., Tech. Rep., Aug. 2010.

[11] Mentor Graphics Corp. (2010) The Precision Synthesis Tool. [Online]. Available: http://www.mentor.com/products/fpga/synthesis/precision rtl plus

[12] Xilinx Corp. (2007) Spartan-3™ Platform FPGAs: Complete Data Sheet. [Online]. Available: http://www.xilinx.com