The Radon transform on a dynamically switched transputer network

CONCURRENCY: PRACTICE AND EXPERIENCE, VOL. 3(4), 315-323 (AUGUST 1991)

G. HALL AND T.J. TERRELL
Department of Computing and Electronics, Lancashire Polytechnic, Preston PR1 2TQ, UK

L.M. MURPHY
Space Department, Royal Aerospace Establishment, Farnborough GU14 6TD, UK

SUMMARY Previously reported work resulted in a transputer tree network using thirty transputers to perform the Radon transform. The Radon transform was used to enhance linear features in noisy synthetic aperture radar images. The work presented here describes modification of the transputer network by the addition of INMOS C004 link switches and a controlling T222 transputer. The tree configuration is replaced such that link switches now dynamically switch between the image data source transputer (farmer) and the transputers which perform the calculations (workers). It is shown that this arrangement is more cost-effective since the worker transputers are all relieved of their data-routing role. The paper includes a comparison of the previous tree network with the new switched network, and presents results for both techniques.

1. INTRODUCTION

This paper briefly describes the Radon transform and its application to the enhancement of linear features in noisy images. Previous work carried out to perform the Radon transform on a network of thirty transputers is briefly described, together with results which illustrate some of the shortcomings of the system. The main emphasis of this paper is on a modified version of the transputer system which uses link switches to implement a more effective allocation of work through the transputer network.

2. THE RADON TRANSFORM

The Radon transform of an image intensity function $f(x,y)$ defined on two-dimensional Euclidean space is given by:

$$R(\rho,\theta) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,\delta(\rho - x\cos\theta - y\sin\theta)\,dx\,dy$$

where $\delta(\cdot)$ is the Dirac delta function. The term $\delta(\rho - x\cos\theta - y\sin\theta)$ forces the integration of $f(x,y)$ along the line $\rho - x\cos\theta - y\sin\theta = 0$, and consequently the value $R(\rho,\theta)$ for any line $(\rho,\theta)$ is the sum of the values of $f(x,y)$ along this line. Each line integral in image space $f(x,y)$ produces a point function in feature space $(\rho,\theta)$.
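The line-sum interpretation of the definition can be illustrated with a minimal discrete sketch. This is not the authors' occam implementation: the function name `radon_direct`, the choice of origin at the image centre and the rounding of the line offset to the nearest integer bin are all assumptions made for illustration.

```python
import numpy as np

def radon_direct(image, thetas):
    """Discrete line-sum sketch: each pixel is added to the bin of the line
    (rho, theta) it lies on, with rho = x cos(theta) + y sin(theta)."""
    rows, cols = image.shape
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0   # assumed origin: image centre
    half = int(np.ceil(np.hypot(cy, cx)))         # largest |rho| that can occur
    rhos = np.arange(-half, half + 1)
    y, x = np.mgrid[0:rows, 0:cols]
    x = x - cx
    y = y - cy
    sinogram = np.zeros((rhos.size, len(thetas)))
    for j, theta in enumerate(thetas):
        rho = x * np.cos(theta) + y * np.sin(theta)
        bins = np.rint(rho).astype(int) + half    # nearest integer rho bin
        sinogram[:, j] = np.bincount(bins.ravel(),
                                     weights=image.ravel(),
                                     minlength=rhos.size)
    return sinogram, rhos

# A straight line in image space collapses to a single peak in (rho, theta).
img = np.zeros((9, 9))
img[:, 6] = 1.0                                   # vertical line, 2 pixels right of centre
thetas = np.linspace(0.0, np.pi, 8, endpoint=False)
sino, rhos = radon_direct(img, thetas)
i, j = np.unravel_index(np.argmax(sino), sino.shape)
print(rhos[i], thetas[j], sino[i, j])
```

In this toy case all nine pixels of the line fall into one bin at theta = 0, while at other angles they are spread over several bins, which is the accentuation effect the text describes.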

The integration process causes noise contributions along lines in an image to tend to cancel (leaving an average (DC) image intensity contribution), whereas contributions derived from a linear feature tend to be accentuated. The signal-to-noise ratio of the point in Radon feature space is therefore higher than that of the linear feature in image space which produced that point. The Radon feature space representation can be used directly to detect the positions of linear features in image space, or a computer simulation can display a corresponding synthesized feature space line diagram. Alternatively, non-linear enhancement may be applied in Radon feature space, and an inverse Radon transform applied, to give an enhanced image space representation. Both alternatives have been used to advantage for synthetic aperture radar (SAR) images, which are inherently prone to speckle noise[1,2,3]. Figure 1 shows a typical noisy 256 x 256 pixel SAR image containing a linear feature. Figure 2 shows the Radon transform of this image, and Figure 3 shows the Radon transform after square law contrast enhancement. The improvement in signal-to-noise ratio of the point in Radon space, compared to the linear feature in image space, is evident from inspection of these figures. Figure 4 shows a synthesized image space representation produced by back-projecting the data shown in Figure 3[1]. Figure 5 shows an image space representation obtained by applying an inverse (modified) Radon transform to the data shown in Figure 3[2].

1040-3108/91/040315-09$05.00 © Controller HMSO London 1991

Received 4 December 1990
Revised 13 May 1991

A reduction in the computational requirements of the Radon transform can be achieved by the use of the frequency domain representation of the image function, $f(x,y)$. The Fourier Slice Theorem[4] lies at the heart of the method, as it relates the one-dimensional Fourier transform of a projection of a function $f(x,y)$ to the two-dimensional Fourier transform of $f(x,y)$. If $F(u,v)$ is the Fourier transform of the two-dimensional image function $f(x,y)$, with $P_\theta(\rho)$ as a projection at angle $\theta$ across $f(x,y)$, and if $S_\theta(w)$ is the Fourier transform of $P_\theta(\rho)$, then

$$S_\theta(w) = F(w\cos\theta,\; w\sin\theta)$$

In order to obtain the Radon transform $R(\rho,\theta)$ via the Fourier slice method, the following steps are implemented:

(a) Compute the two-dimensional Fourier transform of the image $f(x,y)$ with the origin at the centre, to give $F(u,v)$.

(b) Take slices through the centre of $F(u,v)$ at various angles $\theta$. Compute the inverse one-dimensional Fourier transforms of these slices to give the projections at the various angles $\theta$. The Radon transform is then a plot of the projections against their corresponding angles.
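Steps (a) and (b) can be sketched as follows. This is an illustrative NumPy version, not the occam2 program described in the paper; the function name `radon_fourier_slice`, the restriction to a square image and the treatment of out-of-range samples as zero are assumptions. Nearest-neighbour sampling of the slices is used, as the paper reports for its own slice access.

```python
import numpy as np

def radon_fourier_slice(image, n_angles):
    """Fourier-slice sketch for a square n x n image (assumption)."""
    n = image.shape[0]
    centre = n // 2
    # (a) two-dimensional FFT with the origin moved to the centre
    F = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image)))
    w = np.arange(n) - centre                      # signed frequency index along a slice
    thetas = np.pi * np.arange(n_angles) / n_angles
    sinogram = np.empty((n, n_angles))
    for j, theta in enumerate(thetas):
        # (b) nearest-neighbour samples of F along a line through the centre
        u = np.rint(w * np.cos(theta)).astype(int) + centre
        v = np.rint(w * np.sin(theta)).astype(int) + centre
        inside = (u >= 0) & (u < n) & (v >= 0) & (v < n)
        samples = np.zeros(n, dtype=complex)       # samples outside the array stay zero
        samples[inside] = F[v[inside], u[inside]]
        # inverse one-dimensional FFT of the slice gives the projection
        projection = np.fft.fftshift(np.fft.ifft(np.fft.ifftshift(samples)))
        sinogram[:, j] = projection.real
    return sinogram, thetas

rng = np.random.default_rng(0)
img = rng.random((16, 16))
sino, thetas = radon_fourier_slice(img, 4)
```

At 0 and 90 degrees the slice coincides with a row or column of F(u,v), so the projections reduce to the plain column sums and row sums of the image; oblique angles are only approximated by the nearest-neighbour sampling.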

The SAR images are discretized, comprising 256 x 256 (or 512 x 512) 16-bit pixels. The discrete Fourier transform (DFT)[5] is therefore used in implementing the Radon transform. In practice the software used to implement the Radon transform takes advantage of the fast Fourier transform (FFT)[5], which is a computationally efficient way of computing a DFT. The program is based on the Cooley-Tukey successive doubling algorithm, and is written in occam2. The two-dimensional FFT can be achieved by performing a one-dimensional FFT on each row of image data, followed by a one-dimensional FFT on each column of results. The Radon transform is implemented by taking inverse one-dimensional FFTs of slices at various angles through $F(u,v)$. The inverse one-dimensional FFT can be obtained using the computation for the forward FFT, with the following modifications[5]:

(i) take the complex conjugate of the function whose inverse FFT is required;

(ii) perform the forward FFT and multiply the results by N, the number of samples (typically 256 or 512 for SAR images);

(iii) take the complex conjugate of the results.

Figure 1. Typical SAR image

Figure 2. Radon transform of Figure 1

Figure 3. Effect of contrast enhancement of Figure 2

Figure 4. Synthesized image space representation of Figure 3

Figure 5. Inverse Radon transform of Figure 3

The Radon transform therefore involves repeated use of a one-dimensional FFT computation.
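The three modifications can be checked with a short sketch. Multiplying by N is the right correction when the forward transform carries the 1/N factor; that convention is assumed here, and `forward_fft` is a hypothetical stand-in for the paper's forward routine, built on NumPy's unscaled FFT.

```python
import numpy as np

def forward_fft(x):
    """Forward FFT with an assumed 1/N scaling on the forward transform."""
    x = np.asarray(x, dtype=complex)
    return np.fft.fft(x) / x.size

def inverse_fft_via_forward(X):
    """Inverse FFT built only from the forward routine."""
    X = np.asarray(X, dtype=complex)
    n = X.size
    conjugated = np.conj(X)                 # (i) conjugate the input
    scaled = forward_fft(conjugated) * n    # (ii) forward FFT, multiply by N
    return np.conj(scaled)                  # (iii) conjugate the result

rng = np.random.default_rng(1)
x = rng.random(256) + 1j * rng.random(256)
roundtrip = inverse_fft_via_forward(forward_fft(x))
print(np.allclose(roundtrip, x))
```

With a library whose forward transform is unscaled (NumPy's default), the same trick needs a division by N instead, so the scaling step depends on the convention in use.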

3. TRANSPUTER TREE NETWORK

A network of twenty-seven T800-20 transputers programmed in occam has been built to perform the Radon transform within a few seconds. As explained in references [1], [2] and [3], this compares with several minutes’ execution time using a Prime 750 computer. The aim of the transputer network was to reduce the execution time to approximately one second. The proposal to use a tree network of twenty-seven T800-20 transputers was based on initial transputer work using a single T414-15 transputer, which achieved an execution time of four minutes. Initial benchmarks performed by INMOS prior to the launch of the T800-20 suggested a factor of ten increase in performance of the T800-20 compared to the T414-15 for floating-point computations, i.e. about twenty-four seconds on a single T800-20. This suggested that twenty-four T800-20s would be required to approach one second. A tree network was proposed since this would minimize the lengths of communications paths between the image source and the worker transputers. The symmetry of a tree comprising twenty-seven transputers meant that a high degree of software standardization within the network could be achieved. The network is shown in Figure 6.

The ‘image transputer’ at the top of the tree holds the image whose Radon transform is required, and has 4 Mbytes of dynamic RAM. The ‘system transputers’ perform forward or inverse one-dimensional FFTs on sections of image data allocated to them on a farming basis. They are mounted on boards designed in-house (‘TX4’ boards), each containing four transputers, each with 256 Kbytes of static RAM. These boards have links 2 and 3 joined to other transputers on the same board, and links 0 and 1 connected to the edge connector. Each of the groups of three ‘B’ transputers and the associated ‘A’ transputer represents one TX4 board. The two ‘A’ transputers below the ‘image transputer’ are mounted on a modified TX4 board which contains only two transputers, but with all four links brought out to the edge connector.

The Radon transform is implemented by the ‘image transputer’ farming out to the tree network sections of its stored data corresponding to rows, columns, and angular slices. It has been found that the majority of the computation is performed by the transputers labelled ‘B’ in Figure 6. The transputers labelled ‘A’ run parallel computation and farming processes, and the ratio of time spent on computation to time spent farming is lowest for transputers located furthest up the tree. In fact the system performance was found to be improved if all but the ‘B’ transputers were relieved of computation, and were allowed to spend all their time allocating work to the transputers lower in the tree and sending results up the tree. Therefore, in this network comprising twenty-six ‘system transputers’ (workers), only eighteen were actually performing calculations. There are several reasons why system performance was improved with effectively only eighteen worker transputers:

Figure 6. Transputer tree network (diagram labels: graphics board, SCSI transputer, 26 system transputers each with 256 kbytes RAM)

(i) The limiting factor affecting the performance of the network was found to be the speed at which the ‘image transputer’ could fetch data from its two-dimensional arrays. The row data was fetched using occam segments, and this method did in fact provide sufficiently fast data access to make full use of the available ‘system transputers’. Unfortunately, the use of occam segments is limited to data which is contiguous in the X dimension of a multi-dimensional array, so it could only be used for row data access. The column data access was implemented using the T800 Block Move instruction, the data having first been retyped to bytes. The fetching of data along slices of the two-dimensional array could only be achieved by using a WHILE loop, the X and Y co-ordinates being calculated for each element using SINE and COSINE functions, and rounded to integer to give nearest-neighbour interpolation. It was found that the time taken to compute the SINE and COSINE was negligible compared to the time taken to fetch the data from the arrays.

(ii) When the ‘A’ transputers have a parallel computation process, the communication section uses an ALT construct to farm work either to its own computation, or to transputers below in the tree depending on availability of results. With computations limited to the ‘B’ transputers, the need for the ALT is removed, and a more straightforward sequential allocation of work to the three transputers below in the tree can be used.

(iii) All communications are bidirectional so that data down the tree and results up the tree communicate in parallel. The Radon transform comprises three sections which must be performed sequentially, i.e. row FFTs, then column FFTs, then slice inverse FFTs. There is a time at the beginning and end of each section (while the tree network is filling and being emptied) when some transputers are idle, and when dummy data is being communicated in one direction of the bidirectional communications links. When the computations are limited to eighteen transputers, the time taken to fill and empty the network is less, therefore the time during which transputers are idle is less. The amount of dummy data which is generated and communicated is also less.
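The three access patterns described in item (i) can be contrasted in a short sketch. It is written in Python rather than occam, so it only mirrors the structure of the accesses and not their cost on a T800; the function name `fetch_slice` and the choice of the array centre as the pivot are assumptions.

```python
import math
import numpy as np

def fetch_slice(array, theta):
    """Element-by-element fetch along a line through the array centre,
    using nearest-neighbour rounding of the computed co-ordinates."""
    n = array.shape[0]
    centre = n // 2
    # Computed once per slice here for brevity; a per-element evaluation
    # would give the same co-ordinates.
    c, s = math.cos(theta), math.sin(theta)
    out = []
    k = 0
    while k < n:                          # mirrors the WHILE loop in the text
        w = k - centre
        x = int(round(w * c)) + centre    # nearest-neighbour X co-ordinate
        y = int(round(w * s)) + centre    # nearest-neighbour Y co-ordinate
        inside = 0 <= x < n and 0 <= y < n
        out.append(array[y, x] if inside else 0)
        k += 1
    return out

a = np.arange(16).reshape(4, 4)
row = a[2, :]       # contiguous in memory: a single block transfer suffices
column = a[:, 2]    # strided: one element per row of the array
print(row.flags["C_CONTIGUOUS"], column.flags["C_CONTIGUOUS"])
print(fetch_slice(a, 0.0), fetch_slice(a, math.pi / 2))
```

Only the row is a single contiguous block; the column is regularly strided, and the slice needs a co-ordinate calculation per element, which matches the ordering of fetch costs reported in the text.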

In detail, the performance of the tree network for a Radon transform on 256 x 256 image data (using 256 x 256 FFT and 256 angular slices) was as shown in Table 1. The row FFT times could be improved slightly by giving computation to all twenty-six workers, but this improvement was found to be less than the improvements made to column and slice times by limiting calculations to the ‘B’ transputers.

Table 1. Tree network calculation times with eighteen workers

Stage                 Time (seconds)
Row FFTs              0.368
Column FFTs           0.545
Angular slice FFTs    1.067

It should be noted that the link communications speed was 20 Mbits per second on the critical links out of the ‘image transputer’. It should also be noted that all data arrays stored by the ‘image transputer’ were in 32-bit floating-point format, and that two arrays were used, corresponding to real and imaginary data.

4. DYNAMIC WORK ALLOCATION USING LINK SWITCHES

It was obvious from the practical results obtained from the tree network that some T800 transputers were not being used effectively. Eight of the twenty-six T800 ‘system transputers’ were being used for simple data routing. The system has been modified to utilize INMOS C004 link switches to perform dynamic allocation of work from the ‘image transputer’ to the ‘system transputers’. The C004 link switch[6] is a 32-way crossbar switch with INMOS link standard inputs and outputs. The link switch was introduced by INMOS to allow static configurations to be set up by program control, i.e. without the need to change interconnection hardware. The application described herein changes the configuration of the transputer network continuously during program execution to give direct connection between the ‘image transputer’ and each worker transputer. All twenty-six ‘system transputers’ can now be employed wholly for computation, the switching being achieved by two C004 link switches controlled by a T222 16-bit transputer. Figure 7 shows a block diagram of the new arrangement using link switches. Two INMOS C004 link switches are used to connect two links of the ‘image transputer’ to two ‘system transputers’ for allocation of FFT data and return of FFT results. The third available link of the ‘image transputer’ is used to control the T222 transputer to change connection to the next two ‘system transputers’ once data transfer is complete. The software executes as follows:

(i) The controlling T222 transputer first connects all the transputers in a pipeline, and then the Radon transform transputer code is downloaded from the development system into the transputers.

(ii) The ‘image transputer’ then repeats the following steps:

(a) Communication to the T222 controlling transputer requests connection of two links to two ‘system transputers’.

(b) Handshake communication is received to confirm that the links have been successfully set up.

(c) Line-identifying words are sent to the two ‘system transputers’.

(d) Two lines of data are fetched from two-dimensional arrays and sent to the two ‘system transputers’ in parallel with the return of results and line identifier from the last lines transmitted.

This process is repeated, cycling through the ‘system transputers’ two at a time.

Figure 7. Dynamic work allocation using link switches (diagram labels: IBM AT-based transputer development system (TDS); 26 T800 transputers)
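The cycle that the ‘image transputer’ repeats can be modelled as a simple schedule. This sketch only tracks which lines go to which workers at each switch setting; it does not model the T222 messages or link timing. The function name `farm_schedule` is hypothetical, and it assumes that the results returned at each setting are those for the lines the same two workers received on their previous visit, and that the number of workers is even.

```python
def farm_schedule(n_lines, n_workers):
    """Return a list of (workers, lines_sent, lines_returned), one entry per
    switch setting, cycling through the workers two at a time."""
    pending = {}          # worker index -> line it is currently processing
    next_line = 0
    pair_start = 0
    schedule = []
    while next_line < n_lines or pending:
        workers = (pair_start, pair_start + 1)    # pair connected by the switches
        sent, returned = [], []
        for worker in workers:
            if worker in pending:                 # result from the previous visit
                returned.append(pending.pop(worker))
            if next_line < n_lines:               # new line, sent in parallel
                pending[worker] = next_line
                sent.append(next_line)
                next_line += 1
        schedule.append((workers, sent, returned))
        pair_start = (pair_start + 2) % n_workers
    return schedule

for step in farm_schedule(8, 4):
    print(step)
```

Changing `n_workers` is the only adjustment needed to model a different number of workers, which mirrors the loop-limit change the paper describes for the switched network.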

The results for the performance of this system for a Radon transform on 256 x 256 image data, using a 256 x 256 FFT and employing 256 angular slices, showed a slight deterioration in performance compared to the tree network used previously. This can be explained by two factors:

(i) The C004 link switch introduces an average 1.75 bit time delay on link transmission.

(ii) There is a software overhead in link switch control. The ‘image transputer’ has to generate control messages to the T222 transputer to indicate when the link switches need to connect to the next two ‘system transputers’, and to receive messages back from the T222 to confirm that connection is complete. It has been found that the time to set up a link via the link switches is 58 microseconds.
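As a rough scale for the two factors above, the quoted figures can be combined with the 20 Mbits per second link speed and the 256 lines per stage. The count of set-ups per stage is an assumption: it takes one 58 microsecond set-up for each pair of lines, which the paper does not state explicitly.

```latex
\begin{align*}
t_{\text{bit}}   &= \frac{1}{20 \times 10^{6}\ \text{bit/s}} = 50\ \text{ns} \\
t_{\text{delay}} &= 1.75 \times 50\ \text{ns} = 87.5\ \text{ns} \\
n_{\text{setup}} &= 256 / 2 = 128 \quad \text{(assumed: one set-up per pair of lines)} \\
t_{\text{setup}} &= 128 \times 58\ \mu\text{s} \approx 7.4\ \text{ms per stage}
\end{align*}
```

This estimate covers only the link set-up time; it leaves out the time the ‘image transputer’ spends generating and receiving the control messages themselves.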


The use of this arrangement means that the number of transputers used as workers (i.e. the ‘system transputers’) can be changed by a minor software change to the ‘image transputer’ (modification of a loop variable limit). This enables results to be obtained easily for networks using various numbers of transputers, and a table of results for the Radon transform is included as Table 2. In order to produce a similar table of results for tree networks, the code for several transputers would need to be modified to obtain each result, together with modification of the placement section. Furthermore, for many transputer numbers it would be impossible to produce a symmetrical tree network, resulting in inefficient use of transputers (‘hot spots’), and further programming complexity.

Table 2. Performance in seconds for various numbers of workers (note: link communication speed is 20 Mbits/second)

Workers               26     22     12     10     8      6      4
Row FFTs              0.44   0.44   0.44   0.52   0.65   0.86   1.28
Column FFTs           0.62   0.62   0.62   0.62   0.67   0.88   1.29
Angular slice FFTs    1.08   1.08   1.08   1.08   1.08   1.08   1.36

Table 2 shows that the Radon transform can be implemented using twelve worker transputers with the same speed as with twenty-six worker transputers. In practice, the addition of a second ‘image transputer’ (with 4 Mbytes of RAM) would enable the system to work at optimum speed performing two Radon transforms concurrently.
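The plateau in Table 2 can be made explicit by summing the three stages for each worker count. The figures below are copied from Table 2; the variable names and the use of a simple sum as the overall time are assumptions for illustration.

```python
workers = [26, 22, 12, 10, 8, 6, 4]
row     = [0.44, 0.44, 0.44, 0.52, 0.65, 0.86, 1.28]
column  = [0.62, 0.62, 0.62, 0.62, 0.67, 0.88, 1.29]
slices  = [1.08, 1.08, 1.08, 1.08, 1.08, 1.08, 1.36]

# Total time per worker count, taken as the sum of the three sequential stages.
totals = {w: round(r + c + s, 2)
          for w, r, c, s in zip(workers, row, column, slices)}

best = min(totals.values())
fewest_at_best = min(w for w, t in totals.items() if t == best)

for w in workers:
    print(f"{w:2d} workers: {totals[w]:.2f} s")
print("fewest workers at the best total:", fewest_at_best)
```

The totals stay at 2.14 seconds from twenty-six down to twelve workers and rise only below that, which is the basis for the statement about twelve workers.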

5. CONCLUDING REMARKS

The work has shown that for applications where data is stored at one point and has to be communicated to worker transputers for processing, the use of link switches to switch dynamically between transputers forms an effective solution. Many image processing applications fall into this category, since the image data is often produced at one source. In the example described in this paper, the system performance was limited by the speed at which the farmer could access data. It has been shown that in this example, the advantage of dynamic link switching, as opposed to the previous static tree network, resulted in fewer worker transputers being required. In systems where the worker computation time is the factor limiting overall performance rather than farmer data access time, it would be expected that the use of dynamic link switching would improve overall performance, since direct communications paths exist between farmer and workers. Preliminary data from INMOS would suggest that this advantage would be available to users of the forthcoming H1 transputer using the ‘virtual channel’ rather than dynamic switching.

A decision has to be made as to how large to make the sections of data to be processed. The smaller the sections the more switching (and therefore the more switching time), but the less memory each worker transputer requires. In practice the switching time of 50-60 microseconds is short enough to enable small data packets to be used in many applications. The use of small data packets means that the worker transputers only need a relatively small amount of RAM, and this can then be fast static RAM.


ACKNOWLEDGEMENTS

This work has been carried out with the support of the Procurement Executive, Ministry of Defence; in particular the authors wish to thank the Space Department at the Royal Aerospace Establishment, Farnborough, for their support.

REFERENCES

1. G. Hall, T.J. Terrell, L.M. Murphy and J.M. Senior, ‘Transputer implementation of the Radon transform for image enhancement’, Proceedings IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 3, pp. 1548-1551, Glasgow, UK, May 1989, IEEE, USA.

2. G. Hall, T.J. Terrell, L.M. Murphy and J.M. Senior, ‘A new fast discrete Radon transform for enhancing linear features in noisy images’, Proceedings IEE Third International Conference on Image Processing and its Applications, pp. 187-191, Warwick, UK, July 1989, IEE, UK.

3. G. Hall, T.J. Terrell, L.M. Murphy and J.M. Senior, ‘A modified Radon transform for linear feature enhancement in SAR data’, Proceedings IEEE International Conference on Image Processing, Vol. 2, pp. 676-680, Singapore, September 1989, IEE, UK.

4. A. Rosenfeld and A.C. Kak, Digital Picture Processing, 2nd edn, Vol. 1, Academic Press, London, 1982.

5. R.C. Gonzalez and P. Wintz, Digital Image Processing, Addison-Wesley, Reading, MA, 1977.

6. INMOS, The Transputer Databook, 2nd edn, INMOS, 1989.
