A note concerning the closest point pair algorithm

3
Information Processing Letters 82 (2002) 193–195 A note concerning the closest point pair algorithm Martin Richards University Computer Laboratory, New Museums Site, Pembroke St., Cambridge CB2 3QG, UK Received 31 January 2001; received in revised form 1 June 2001 Abstract An algorithm, described by Sedgewick, finds the distance between the closest pair of n given points in a plane using a variant of mergesort. This takes O(n log n) time. To prove this it is necessary to show that, in the merge phase of the algorithm, no more than a constant number of distances need to be checked for each point considered. Cormen, Leiserson and Rivest show that checking seven distances is sufficient while Sedgewick suggests that this should be four. This paper shows that checking three distances is sufficient and that a slight modification of the algorithm reduces the number to two. 2001 Published by Elsevier Science B.V. Keywords: Closest point pair algorithm; Mergesort; Analysis of algorithms 1. The algorithm To find the minimum separation of any pair of n given points in a plane, a list of the points is formed ordered by their x -coordinates and a global variable min, to hold the minimum separation distance, is ini- tialised to infinity. The algorithm then calls a routine, SORT, that is a modified version of mergesort, to return the list of points ordered by their y -coordinates. Dur- ing the merge phase, min is updated whenever a closer pair of points is discovered. If SORT is given a list containing fewer than two points, it returns the list unchanged, otherwise it applies SORT recursively to the first and second halves of the given list in turn. At this stage, the minimum separation of any pair of points in either half will be min. If a closer pair exists, its points must be on either side of an imaginary line separating the two halves with each point no further than min from it. An E-mail address: [email protected] (M. Richards). efficient check for such pairs can be made as the two lists are merged to form the sorted result. If P , the next point to be merged, lies within a distance min of the dividing line, it is compared with a few other similar recently processed points. Cormen et al. [1] show that it is sufficient to check P with no more than seven other points. Manber [2] suggests the number is five while Sedgewick [3] suggests four. This paper shows that checking three distances is sufficient. 2. Proof Let min 0 be the value of min at the start of the current merge phase and let the next point to be merged that is within min 0 of the dividing line be called P . P could potentially be one of a closer pair and is therefore pushed onto a stack of such points. Suppose the next four points on the stack are A, B , C, and Q. They clearly satisfy P y A y B y 0020-0190/01/$ – see front matter 2001 Published by Elsevier Science B.V. PII:S0020-0190(01)00268-X

Transcript of A note concerning the closest point pair algorithm

Page 1: A note concerning the closest point pair algorithm

Information Processing Letters 82 (2002) 193–195

A note concerning the closest point pair algorithm

Martin RichardsUniversity Computer Laboratory, New Museums Site, Pembroke St., Cambridge CB2 3QG, UK

Received 31 January 2001; received in revised form 1 June 2001

Abstract

An algorithm, described by Sedgewick, finds the distance between the closest pair ofn given points in a plane using a variantof mergesort. This takes O(n logn) time. To prove this it is necessary to show that, in the merge phase of the algorithm, no morethan a constant number of distances need to be checked for each point considered. Cormen, Leiserson and Rivest show thatchecking seven distances is sufficient while Sedgewick suggests that this should be four. This paper shows that checking threedistances is sufficient and that a slight modification of the algorithm reduces the number to two. 2001 Published by ElsevierScience B.V.

Keywords: Closest point pair algorithm; Mergesort; Analysis of algorithms

1. The algorithm

To find the minimum separation of any pair ofn

given points in a plane, a list of the points is formedordered by theirx-coordinates and a global variablemin, to hold the minimum separation distance, is ini-tialised to infinity. The algorithm then calls a routine,SORT, that is a modified version of mergesort, to returnthe list of points ordered by theiry-coordinates. Dur-ing the merge phase,min is updated whenever a closerpair of points is discovered.

If SORT is given a list containing fewer than twopoints, it returns the list unchanged, otherwise itappliesSORTrecursively to the first and second halvesof the given list in turn. At this stage, the minimumseparation of any pair of points in either half willbe min. If a closer pair exists, its points must be oneither side of an imaginary line separating the twohalves with each point no further thanmin from it. An

E-mail address: [email protected] (M. Richards).

efficient check for such pairs can be made as the twolists are merged to form the sorted result. IfP , the nextpoint to be merged, lies within a distancemin of thedividing line, it is compared with a few other similarrecently processed points. Cormen et al. [1] show thatit is sufficient to checkP with no more than sevenother points. Manber [2] suggests the number is fivewhile Sedgewick [3] suggests four. This paper showsthat checking three distances is sufficient.

2. Proof

Let min0 be the value ofmin at the start of thecurrent merge phase and let the next point to bemerged that is withinmin0 of the dividing line becalledP . P could potentially be one of a closer pairand is therefore pushed onto a stack of such points.Suppose the next four points on the stack areA, B,C, and Q. They clearly satisfyPy � Ay � By �

0020-0190/01/$ – see front matter 2001 Published by Elsevier Science B.V.PII: S0020-0190(01)00268-X

Page 2: A note concerning the closest point pair algorithm

194 M. Richards / Information Processing Letters 82 (2002) 193–195

Fig. 1.

Cy � Qy . 1 The algorithm checks whether any of thedistances2 PA, PB or PC are less thanmin. It does nothave to checkPQ since, as will be shown, ifPQ < minthenP will be at least as close to one ofA, B, or C.

Consider the 2min0×min0 rectangle centered aboutthe dividing line and havingP on its top edge. Therectangle is composed of two adjacentmin0 × min0squares on either side of the dividing line. Without lossof generality, assumeP is on the top edge of the lefthand square. IfPQ < min, Q must be in the right handsquare andA, B andC must all be in the rectangle. Wehave to consider four cases depending on how many ofA, B andC are in the right hand square.(1) If A, B, C andQ are all in the right hand square

they will have a minimum separation ofmin0 andso can only be at its corners.P will be closest tothe point at the top left corner which cannot beQ.

(2) The next case is when just one ofA, B or C isin the left hand square. Let that point be renamedZ and the other two renamedX and Y . Thearrangement might be as in Fig. 1. It followsfrom Xy � Qy and PQ < PX that Xx � Qx .We can similarly deduceYx � Qx . Without lossof generality, assume thatXx � Yx . SinceQ, X

andY are on the same side they must be at leastmin0 apart, and since no two points in their squarecan be more than

√2min0 apart, we can deduce

angle � QXY � 90◦. This combined withXx �Qx , Xy � Qy andXx � Yx impliesXy � Yy .If the arrangement satisfies the constraints givenabove, then another valid arrangement can beobtained by movingZ to the point on the left edge

1 Px and Py denote thex- and y-coordinates of pointP , andsimilarly for other points.

2 PA denotes the distance between pointsP andA, and similarlyfor other point pairs.

Fig. 2.

at the same level asQ, moving Q horizontallyto the middle edge, movingY horizontally to theright edge and then down to the same level asQ,and, finally, movingX onto the top edge and tothe right untilXY = min0.Clearly this does not increasePQ, andQY remains� min0. SinceX is on or above the circle radiusmin0 centered at the new position ofQ, it is abovethe line joiningP with the new position ofY .Moving X to the top edge of the rectangle, ina direction orthogonal to this line, will thereforeincrease bothPX and PY. So if PX is now <

PQ, this would have been true of the originalarrangement.If we now add a pointR to the top edge such thatRQ is parallel toXY, we have Fig. 2.Note thatRXYQ is a rhombus with sides of lengthmin0. SincePQ < min(� min0), P must be to theright of R forcing thePX � PQ. This shows thatthe distancePQ does not need to be checked whenthere are two other points aboveQ in the rightsquare and one other in the left.

(3) The next case when two ofA, B andC share theleft hand square withP . By swappingP andQ

and inverting the figure this arrangement can beseen to be equivalent to the previous case and sothe same result holds.

(4) Finally, if A, B andC are in the left square withP ,they must be at its vertices.Q must then be on thebottom edge which violated the constraintPQ <

min.We can thus deduce, for all cases, that ifPQ < minthenP is at least as close toA, B or C. In practice,the algorithm only pushesP onto the stack if it is lessthanmin (not min0) from the dividing line. This canonly reduce the number of points in the stack and sodoes not violate the result.

Page 3: A note concerning the closest point pair algorithm

M. Richards / Information Processing Letters 82 (2002) 193–195 195

Fig. 3.

By considering an arrangement as shown in Fig. 3,it is clear that checkingP with only the top two pointsin the stack is insufficient.

3. Final comments

The number of checks can be further reduced to twoby the observation that only two ofPA, PB and PCneed be checked, since, either at least one ofA, B

or C will be on the same side asP , or they will allbe opposite toP , in which casePC will be no smallerthanPA or PB. To implement this, two stacks are used,one for the left side and one for the right.P is pushedonto the stack for its side and then the distances arechecked betweenP and the top two points on the otherstack.

Reducing the number of distance checks from fourto three or even two makes no noticeable speedimprovement in practice since, for typical data, havingmore than one point in the rectangle is extremely rare.

References

[1] T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction toAlgorithms, MIT Press, Cambridge, MA, 1990.

[2] U. Manber, Introduction to Algorithms, A Creative Approach,Addison-Wesley, Reading, MA, 1990.

[3] R. Sedgewick, Algorithms in C, Addison-Wesley, Reading,MA, 1990.