Reconfigurable omnidirectional camera array calibration with
a linear moving object
Tarak Gandhi *, Mohan M. Trivedi
Computer Vision and Robotics Research Laboratory, University of California at San Diego, 9500 Gilman Drive 0434, La Jolla, CA 92093, USA
Received 1 August 2004; received in revised form 10 October 2005; accepted 8 February 2006
Abstract
Reconfigurable omnidirectional camera arrays are useful for applications where multiple cameras working together must be deployed at short notice. This paper proposes a multi-camera calibration approach for such arrays using a 1D object, such as a person, moving parallel to itself. The moving object is separated from the background, and correspondences between multiple cameras are obtained using the location of the object in all the cameras in multiple frames. The approximate orientation of each camera is individually determined using vanishing points. The non-linear 3D problem of multi-camera calibration can then be approximated by a 2D problem in plan view. An initial solution for the relative positions and orientations of the cameras is obtained using the factorization approach. A non-linear optimization stage is then used to correct the approximations and to minimize the geometric error between the observed and the projected omni pixel coordinates. Numerous experiments are performed with simulated data sets as well as real image sequences to evaluate the performance of this approach.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Calibration; Panoramic vision; Stereo vision; Surveillance
1. Introduction and motivation
Omnidirectional cameras, which give a 360° field of view of the surroundings, have been of great interest to vision researchers [1–3]. A network of multiple omni-based camera arrays can offer the advantages of wide area coverage, redundancy, and abstraction at multiple levels such as detection, tracking, and recognition. Arrays of omnidirectional as well as rectilinear cameras may be integrated with the infrastructure for applications such as surveillance, people tracking, face detection, traffic analysis, and infrastructure monitoring. In addition to the fixed cameras, there may also be arrays of cameras deployed as needed to work as a part of a larger system. Such arrays would give the advantages of reconfigurability, flexibility, and ad hoc deployment, and would be of value in the new generation of sensor networks capable of monitoring dynamic events in remote environments. In applications such as structural health monitoring of civil infrastructures,
reconfigurable video arrays can provide an important collaborative visual record.
In order to ensure that multiple cameras work in an integrated manner, it is necessary to calibrate the cameras with respect to each other. However, measuring the locations and orientations of the cameras is a laborious procedure. Hence, if one wants to install such a network at short notice, one should have a calibration procedure for the cameras that is fast and preferably automatic.
For example, an array of four fixed omni cameras is used in [3,4] to perform 3D tracking of persons in an indoor setting. Fig. 1(a) shows the omni images acquired from the four cameras. This system detects objects using background subtraction, as shown in Fig. 1(b). Using the known locations of the cameras, it obtains the locations and heights of the targets by triangulation, as shown in Fig. 1(c). We intend to use this system with a reconfigurable array of omni cameras such as the one shown in Fig. 1(d), for which a fast calibration procedure is necessary.
1.1. Related work
Calibration can be performed by registering corresponding features between a number of cameras. However, when the cameras are widely separated, finding corresponding features is often difficult. For this reason, some researchers have proposed the use of moving features to register the images [5].
Fig. 1. (a) Images from the omni cameras with two moving persons. (b) Panoramic views showing object detection. (c) Estimated locations and heights of the
detected persons. (d) Proposed reconfigurable array of omni cameras.
In [6], the calibration parameters of a single rectilinear camera were determined by observing the motion of a person in the scene.
Factorization methods have been widely used for the structure from motion problem, where images from a single moving camera are used to determine the scene structure.
Table 1
Classification of calibration methods according to dimensionality (based on [18])

3D [19]. Calibration object: a non-planar solid or wireframe structure with known geometry. Approach: use world and camera coordinates of object features to estimate the camera parameters. Advantages: need to estimate a small number of parameters; efficient; least sensitive to ill-conditioning. Limitations: expensive to make and use 3D objects; impractical for a large working area.

2D [20]. Calibration object: planar object with a pattern such as a checkerboard. Approach: move around the object and estimate calibration parameters assuming unknown motion of the calibration pattern. Advantages: easier to construct a 2D pattern than a 3D object; standard calibration tools available. Limitations: more parameters to be estimated than in 3D; need to deal with multiple images; impractical for a large working area.

1D [18]. Calibration object: a linear object with a small number of collinear feature points. Approach: move the object so that one end is fixed; solve for calibration using at least six sets of observations using a closed-form linear method followed by iterative non-linear bundle adjustment. Advantages: convenient for multi-camera calibration of a large area where a 2D/3D object may not be seen properly in all cameras; less prone to ill-conditioning than using a point object. Limitations: the configuration or motion of the linear object needs to be constrained in order to determine calibration unambiguously; less convenient than point objects.

0D [11]. Calibration object: point object such as a laser pointer covered with a plastic sheet, or a bright bulb. Approach: wave the object in the working area; solve the structure and motion problem to estimate relative camera calibrations. Advantages: most convenient to use; can cover large areas with least effort. Limitations: large number of parameters to be estimated; possible ill-conditioning if the working area is not covered properly; can estimate calibration only up to a scale factor.

This work (1D) [21]. Calibration object: a vertically oriented linear object such as a moving person. Approach: obtain the object location in all cameras and use factorization and non-linear least squares to obtain the calibration as well as the object positions. Advantages: useful in large areas including outdoors; can be integrated with person detection and tracking. Limitations: accuracy is limited in scale factor due to errors in localization of the person.
Tomasi and Kanade [7] proposed the factorization method for orthographic cameras, which was extended to paraperspective cameras [8] as well as perspective cameras [9,10]. Svoboda et al. [11] have developed multi-camera calibration software that uses a modified laser pointer to get
corresponding points in multiple cameras. In this implementation, self-calibration is performed using the factorization algorithm of Martinec and Pajdla [12], which accounts for outliers and occlusions.
Chen et al. [13] as well as Barreto and Daniilidis [14] use the path of a moving point to calibrate a wide area network of cameras, not all of which can see the point at the same time. To introduce robustness into the calibration procedure, a planar target pattern is used by Pedersini et al. [15]. The target is moved around to get multiple views in the area of interest, which gives enough constraints to solve the calibration problem robustly. Antone and Teller [16] propose an algorithm to perform fast and accurate extrinsic calibration of large camera networks in which every pair of cameras has an overlapping field of view. The paper extends classical registration methods to large scale and proposes a probabilistic framework for modeling projective uncertainty. In [17], an accurate method for calibrating a single omnidirectional camera on a moving platform is described.
As described in [18], these and other calibration approaches can be broadly divided into the categories shown in Table 1. Zhang [18] notes that there has not been much research on using 1D calibration objects. He proves that a freely moving 1D object cannot be used for calibration, and describes an approach using a linear target moving with one end fixed. The approach described here and in [21] also uses a 1D object, such as a moving person, to calibrate multiple stationary omnidirectional cameras. However, instead of using the constraint of one end being fixed as in [18], it assumes that the object is oriented vertically and moves parallel to itself. Using this constraint, the non-linear 3D structure from motion problem is projected to a 2D problem in the ground plane perpendicular to the person's axis. This problem is solved using the factorization method to give an initial approximate solution. The solution is then used in a non-linear optimization framework to account for non-vertical camera axes and to minimize a geometric error measure between the observed and the projected omni pixel coordinates.
This paper is an expansion of [21], with details on the internal calibration of the omni camera, an approach for factorization with missing points adapted from [22,23], more extensive experimentation with simulated and real data, and a discussion comparing calibration methods.
Fig. 2. (a) Geometry of a hyperbolic omni camera. The rays towards the first focus of the mirror are reflected towards the second focus, and imaged by a normal camera. (b) Averaged image from the omni camera set for calibration. An ellipse is fitted to the FOV boundary, and a number of points having known world coordinates are marked. (c) Plot of the radial distance d in the image against cos θ, where θ is the angle between the Z-axis and the object. The curve is drawn using the estimated parameters.
2. Omni camera geometry
The omni cameras consist of a hyperbolic mirror and a camera placed on its axis [2]. These cameras have a single viewpoint, which permits the image to be suitably transformed to obtain perspective views. The geometry of the omni camera is shown in Fig. 2(a). Each camera covers a 360° field of view (FOV) around its center. However, the images are distorted, with straight lines transformed into curves. This distortion should be taken into account when performing calibration. The conversion between the camera coordinates p = (x, y, z)^T and the omni coordinates q = (u, v)^T is given by

\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = C \begin{pmatrix} u' \\ v' \\ 1 \end{pmatrix} = \lambda C \begin{pmatrix} x \\ y \\ -c_1 z + c_2 \lVert p \rVert \end{pmatrix} \quad (1)
where c_1 and c_2 are determined by the omni mirror geometry as well as the focal length of the lens, whereas C is the normalized calibration matrix of the CCD, with \alpha the aspect ratio, \gamma the skew, and (u_0, v_0) the camera center:

C = \begin{pmatrix} \alpha & \gamma & u_0 \\ 0 & 1 & v_0 \\ 0 & 0 & 1 \end{pmatrix} \quad (2)
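To make the mapping concrete, the following is a minimal Python/NumPy sketch of the projection in Eqs. (1) and (2). It is not the authors' implementation, and all parameter values (alpha, gamma, u0, v0, c1, c2) are made-up placeholders.

```python
import numpy as np

def omni_project(p, c1, c2, C):
    """Project a camera-frame point p = (x, y, z) to omni pixel
    coordinates (u, v) following Eq. (1)."""
    x, y, z = p
    w = -c1 * z + c2 * np.linalg.norm(p)   # third component in Eq. (1)
    # Scale so the homogeneous coordinate is 1, then apply C of Eq. (2).
    uv1 = C @ np.array([x / w, y / w, 1.0])
    return uv1[:2]

# Hypothetical internal parameters, for illustration only.
alpha, gamma, u0, v0 = 1.05, 0.0, 320.0, 240.0
C = np.array([[alpha, gamma, u0],
              [0.0,   1.0,   v0],
              [0.0,   0.0,   1.0]])
q = omni_project(np.array([1.0, 0.5, -2.0]), c1=0.9, c2=1.1, C=C)
```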
Since the internal parameters of the omni camera need to be measured only once, a specialized set-up was used. The omni camera was set on a tripod and leveled to have a vertical camera axis. A number of images of a stationary scene were taken and averaged to reduce noise. The averaged image is shown in Fig. 2(b). An ellipse was fitted to the boundary of the FOV in order to extract the parameters of the C matrix. A number of features with known coordinates were marked on the ground and on a vertical pole to cover the FOV of the omni camera. Their image coordinates were manually extracted and used for estimating the mirror parameters.
We assume that the FOV of the omni camera is circularly symmetric, bounded by

u'^2 + v'^2 = r^2 \quad (3)

This maps into the ellipse seen in Fig. 2(b), whose equation in pixel coordinates is

\alpha^{-2} [(u - u_0) - \gamma (v - v_0)]^2 + (v - v_0)^2 = r^2 \quad (4)

which can be written in the standard conic form

a u^2 + 2 h u v + b v^2 + 2 g u + 2 f v + c = 0 \quad (5)

The parameters of the C matrix can then be determined by fitting an ellipse to the set of points on the boundary of the FOV [24]. The equation is normalized to set a = 1. Using this, we get

a = 1, \; h = \gamma, \; b = \alpha^2 + \gamma^2, \; g = \gamma v_0 - u_0, \; f = -\alpha^2 v_0 - \gamma g, \; c = g^2 + \alpha^2 (v_0^2 - r^2) \quad (6)

These equations can be readily solved to give

\gamma = h, \; \alpha = \sqrt{b - \gamma^2}, \; v_0 = -(\gamma g + f)/\alpha^2, \; u_0 = \gamma v_0 - g, \; r = \sqrt{g^2 - c + \alpha^2 v_0^2}/\alpha \quad (7)

Note that for most cameras the skew factor \gamma is negligible. In such a case, the ellipse center corresponds to (u_0, v_0), and the aspect ratio \alpha equals the ratio of the lengths of the axes of the ellipse.
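Assuming the ellipse fit has produced the normalized conic coefficients of Eq. (5), the closed-form recovery of Eq. (7) can be sketched as a direct transcription (the coefficient values passed in would come from the fit):

```python
import numpy as np

def conic_to_internal(h, b, g, f, c):
    """Recover (gamma, alpha, u0, v0, r) from the normalized conic
    coefficients of Eq. (5), following Eq. (7)."""
    gamma = h
    alpha = np.sqrt(b - gamma**2)
    v0 = -(gamma * g + f) / alpha**2
    u0 = gamma * v0 - g
    r = np.sqrt(g**2 - c + alpha**2 * v0**2) / alpha
    return gamma, alpha, u0, v0, r
```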
Using these parameters in the matrix C, the image coordinates (u, v) can be normalized to give (u', v'). Assuming radial symmetry around the image center, we have

d = \sqrt{u'^2 + v'^2} = \frac{\sqrt{x^2 + y^2}}{-c_1 z + c_2 \lVert p \rVert} = \frac{\sin\theta}{-c_1 \cos\theta + c_2} \quad (8)

where \theta is the angle between the Z-axis and the object. Using the world and image coordinates of these points, linear equations in c_1 and c_2 are formed and solved using least squares:

-(d \cos\theta) c_1 + d c_2 = \sin\theta \quad (9)

Fig. 2(c) shows the plot of the radial distance d in the image domain against \cos\theta for the sample points, along with the curve fitted using the estimated parameters. It is seen that the curve models the omni mapping quite faithfully. Non-linear least squares can then be used to improve the accuracy.
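A sketch of this least-squares step follows, assuming NumPy arrays d and theta holding the measured radial distances and angles of the marked points; the sign convention follows Eq. (9) as reconstructed above.

```python
import numpy as np

def fit_mirror_params(d, theta):
    """Solve the stacked linear equations -(d cos t) c1 + d c2 = sin t
    of Eq. (9) for the mirror parameters (c1, c2)."""
    A = np.column_stack([-d * np.cos(theta), d])
    (c1, c2), *_ = np.linalg.lstsq(A, np.sin(theta), rcond=None)
    return c1, c2
```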
3. Three-stage calibration algorithm for reconfigurable omni arrays
The flow chart of the calibration method is shown in Fig. 3. Initially, it is assumed that the camera axes as well as the person are nearly vertical. Using this approximation, the non-linear 3D problem of structure from motion is reduced to a 2D problem in plan view coordinates. Using the plan view coordinates of the person's position in all cameras, the world coordinates as well as the camera orientations and locations are obtained using the factorization method. The solution obtained is then used in a non-linear least squares optimization framework, minimizing a geometric error measure between the observed and projected pixels in the omni camera domain, and also incorporating the possibility that the camera axes are not exactly vertical.
Fig. 3. (a) Flow chart of the three-stage omni-array calibration algorithm: (1) reduction to plan view estimation by assuming vertical symmetry and a constant height vector; (2) factorization to obtain the camera parameters and the person's world coordinates in plan view; (3) non-linear optimization to minimize the geometric error and account for deviations from vertical symmetry. (b) Projection of the 3D configuration to plan view.
Let R_i, D_i be the rotation and translation parameters of cameras i = 1, …, M, and let P_j be the world coordinates of a reference point on the object in image frame j. Let the k = 1, …, K points on the reference object be at heights H_1, H_2, …, H_K from the reference point in the world coordinate system. For example, if the object is a person of height \Delta H, one can have K = 2 points with H_1 = (0, 0, 0) and H_2 = (0, 0, \Delta H), representing the bottom and top of the person. If the height is unknown, one can take \Delta H = 1 and determine the calibration up to a scale factor.
The world coordinates of these points are P_{jk} = P_j + H_k, and the coordinates p_{ijk} in the camera i system are then given by

p_{ijk} = R_i^T (P_j + H_k - D_i) \quad (10)
Let the omni image coordinates of the respective points, obtained using Eq. (1), be denoted by q_{ijk} = (u_{ijk}, v_{ijk})^T.
3.1. Reduction to plan view
Since this step is applied separately to each camera i and image j, the subscripts i, j denoting the camera and image are temporarily dropped for brevity. The omni image coordinates q_k of each object point k are converted to unit-norm ray directions \hat p_k, with \lVert \hat p_k \rVert = 1, by solving Eq. (1):

\hat p = \frac{1}{1 + c_1^2 (u'^2 + v'^2)} \begin{pmatrix} u' (c_1 g + c_2) \\ v' (c_1 g + c_2) \\ c_1 c_2 (u'^2 + v'^2) - g \end{pmatrix} \quad (11)

where

(u' \; v' \; 1)^T = C^{-1} (u \; v \; 1)^T, \qquad g = \sqrt{1 + (c_1^2 - c_2^2)(u'^2 + v'^2)} \quad (12)
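The lifting of Eqs. (11)–(12) can be sketched as follows (again a transcription, not the authors' code; C, c1, c2 come from the internal calibration of Section 2):

```python
import numpy as np

def omni_backproject(q, C, c1, c2):
    """Lift an omni pixel q = (u, v) to a unit-norm viewing ray in the
    camera frame, following Eqs. (11)-(12)."""
    u_, v_, _ = np.linalg.solve(C, np.array([q[0], q[1], 1.0]))  # Eq. (12)
    s = u_**2 + v_**2
    g = np.sqrt(1.0 + (c1**2 - c2**2) * s)
    return np.array([u_ * (c1 * g + c2),
                     v_ * (c1 * g + c2),
                     c1 * c2 * s - g]) / (1.0 + c1**2 * s)       # Eq. (11)
```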
The coordinates of the points in the camera system are then given by

p_k = \lambda_k \hat p_k = R^T (P + H_k - D) \quad (13)

where \lambda_k is the unknown depth along the ray. Let the rotation matrix R be decomposed as R = R_z R_{xy}, with R_z being the rotation about the Z-axis and R_{xy} containing the rotations about the X- and Y-axes. Let h_k = R^T H_k = R_{xy}^T R_z^T H_k. If the target is a linear object perpendicular to the X–Y plane, then the rotation around the Z-axis does not affect its coordinates, hence R_z^T H_k = H_k and

h_k = R_{xy}^T H_k \quad (14)

If the camera axes are nearly vertical, one can use R_{xy} = I. However, if that is not the case, one can estimate R_{xy} using vanishing points as in [6]. If K = 2, the line l^T p = 0 joining \hat p_1 and \hat p_2 is given by l = \hat p_1 \times \hat p_2. In the case of K > 2, the line can be estimated using singular value decomposition (SVD) as the singular vector corresponding to the smallest singular value of the matrix

(\hat p_1 \; \hat p_2 \; \cdots \; \hat p_K) \quad (15)

Note that if the elements of \hat p_k are imbalanced, Hartley normalization [25] should be used. If the data is prone to outliers, a robust method such as RANSAC [9] could be used to fit the line. To obtain a more accurate estimate, SVD can then be applied to the points determined to be inliers.
Once the lines l_{ij} corresponding to the object are found for all images j = 1, …, N of a particular camera i, the vanishing point v_i of these lines for camera i can be similarly obtained using the SVD of

(l_{i1} \; l_{i2} \; \cdots \; l_{iN}) \quad (16)

Again, RANSAC could be used for robust fitting, followed by SVD on the inlier points.
Since the vanishing point is the image of the point at infinity in the direction (0, 0, 1)^T, it corresponds to the third column of the matrix R^T (or R_{xy}^T). The other columns can be specified using orthonormalization, up to the ambiguity of a rotation around the Z-axis. Using the matrix R_{xy}, the camera Z-axis is virtually aligned with the world axis.
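Both SVD estimates above reduce to extracting the left singular vector associated with the smallest singular value; a minimal sketch, omitting the RANSAC and Hartley-normalization refinements just mentioned:

```python
import numpy as np

def min_left_singular_vector(M):
    """Unit vector x minimizing sum_j (x . M[:, j])^2 for a 3 x n matrix M,
    i.e. the left singular vector for the smallest singular value."""
    U, _, _ = np.linalg.svd(M)
    return U[:, -1]

# line through the K rays of one sighting, Eq. (15):
#   line = min_left_singular_vector(rays)            # rays: 3 x K array
# vanishing point of the lines over N frames, Eq. (16):
#   v = min_left_singular_vector(lines)              # lines: 3 x N array
```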
If K = 2, as when using the top and bottom points of a moving person, one can subtract Eq. (13) for k = 1 from that for k = 2 to get

\lambda_2 \hat p_2 - \lambda_1 \hat p_1 = R^T (H_2 - H_1) = h_2 - h_1 \quad (17)

This gives three scalar equations in two variables, which can be solved using least squares to obtain \lambda_1 and \lambda_2.
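This small least-squares problem can be sketched directly; p1 and p2 stand for the unit rays \hat p_1, \hat p_2 from Eq. (11), and dh for h_2 − h_1:

```python
import numpy as np

def solve_depths(p1, p2, dh):
    """Least-squares solution of lambda2 * p2 - lambda1 * p1 = dh, Eq. (17)."""
    A = np.column_stack([-p1, p2])
    (lam1, lam2), *_ = np.linalg.lstsq(A, dh, rcond=None)
    return lam1, lam2
```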
In the case of K > 2, the mean of Eq. (13) is taken over k = 1, …, K:

\frac{1}{K} \sum_k \lambda_k \hat p_k = R^T \Big( P + \frac{1}{K} \sum_k H_k - D \Big) \quad (18)

Subtracting this from Eq. (13), for each k we get

\lambda_k \hat p_k - \frac{1}{K} \sum_k \lambda_k \hat p_k = R^T \Big( H_k - \frac{1}{K} \sum_k H_k \Big) = h_k - \frac{1}{K} \sum_k h_k \quad (19)

Writing these equations in terms of the unknown parameters \lambda_k gives

G \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_K \end{pmatrix} = \begin{pmatrix} h_1 \\ h_2 \\ \vdots \\ h_K \end{pmatrix} - \frac{1}{K} \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \otimes \sum_k h_k \quad (20)

where

G = \begin{pmatrix} \hat p_1 & 0 & \cdots & 0 \\ 0 & \hat p_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat p_K \end{pmatrix} - \frac{1}{K} \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \otimes (\hat p_1 \; \hat p_2 \; \cdots \; \hat p_K) \quad (21)

is a 3K \times K matrix with 3(K-1) independent equations and K variables.
The depths \lambda_k are obtained using least squares. Using these values, the coordinates p_k in the camera system are obtained. Since the line joining the p_k's is assumed vertical, the plan view coordinates w_{ij} of the person are obtained by computing the average projection of p_{ijk} onto the X–Y plane, and the height d_i of camera i above the ground is obtained by averaging the Z-coordinates over all j and k:

\begin{pmatrix} x_{ij} \\ y_{ij} \\ z_{ij} \end{pmatrix} = \frac{1}{K} (R_{xy})_i \sum_k [p_{ijk} - H_k], \qquad w_{ij} = \begin{pmatrix} x_{ij} \\ y_{ij} \end{pmatrix}, \qquad d_i = \frac{1}{N} \sum_j z_{ij} \quad (22)
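A sketch of Eq. (22) for a single camera and frame, assuming K × 3 arrays of recovered camera-frame points and height offsets:

```python
import numpy as np

def plan_view_point(points_cam, H, R_xy):
    """points_cam, H: K x 3 arrays; R_xy: tilt-compensating rotation.
    Returns the plan-view point w and the height reading z of Eq. (22)."""
    xyz = R_xy @ (points_cam - H).mean(axis=0)
    return xyz[:2], xyz[2]   # z is averaged over frames to give d_i
```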
3.2. Factorization in plan view coordinates
Using the approximately known plan view coordinates of the person in the camera systems of all cameras, the camera positions and rotations, as well as the world coordinates of the points, are obtained using a factorization method similar to [7]. Let the relation between the plan view coordinates in the world and camera systems be given by w_{ij} = A_i W_j + b_i, where A_i, b_i represent the affine transformation due to camera i. Assuming the world origin to be at the centroid of the W_j, we have

b_i = \frac{1}{N} \sum_j w_{ij}, \qquad \tilde w_{ij} = w_{ij} - b_i = A_i W_j \quad (23)

If the \tilde w_{ij}, A_i, and W_j are stacked as matrices, with i along rows and j along columns, we get the matrix equation

U_{2M \times N} = A_{2M \times 2} W_{2 \times N} \quad (24)

Hence, U is a rank 2 matrix that can be factorized into the camera parameter component A and the world coordinate component W. If all elements of U are known, the factorization can be performed using the SVD U = \bar U S V^T [7] as

\tilde A = \bar U, \qquad \tilde W = S_W V^T \quad (25)

where S_W is obtained by setting all but the largest two singular values of S to zero.
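For the complete-data case, the factorization of Eqs. (23)–(25) is a rank-2 SVD truncation; a minimal sketch with w a 2M × N NumPy array of stacked plan-view measurements (the missing-entry case is treated next in the text):

```python
import numpy as np

def factorize_plan_view(w):
    """Return (A_tilde, W_tilde, b) with w - b ~= A_tilde @ W_tilde,
    following Eqs. (23)-(25)."""
    b = w.mean(axis=1, keepdims=True)        # Eq. (23): per-camera centroid
    P, s, Vt = np.linalg.svd(w - b, full_matrices=False)
    A_tilde = P[:, :2]                       # camera component
    W_tilde = np.diag(s[:2]) @ Vt[:2, :]     # world component, Eq. (25)
    return A_tilde, W_tilde, b
```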
Since all feature points may not be detected in all images, some of the elements of U may be unknown. However, since the columns of U are dependent, the missing elements can be filled using the approach proposed by Jacobs [22], with the implementation by Svoboda et al. [23] used with appropriate modifications. In this case, a matrix of rank r = 2 is to be fitted to U with some unknown elements. The algorithm is summarized as follows:
For a number of trials t = 1, …, T:

(1) Choose r columns of U at random. Remove rows with unknown elements. If the resulting matrix is full rank, accept it; otherwise repeat the step.
(2) Take these r columns and replace each unknown element by 0. Add unit vectors corresponding to each row having unknown elements. For example (with × denoting an unknown element), replace

\begin{pmatrix} 2 \\ 3 \\ \times \\ 5 \\ \times \\ 4 \end{pmatrix} \rightarrow \begin{pmatrix} 2 & 0 & 0 \\ 3 & 0 & 0 \\ 0 & 1 & 0 \\ 5 & 0 & 0 \\ 0 & 0 & 1 \\ 4 & 0 & 0 \end{pmatrix} \quad (26)

(3) Find the null space F_t of the resulting matrix B_t by taking its SVD. The left singular vectors corresponding to very small singular values of the matrix give the null space.
(4) The space spanned by the matrix U lies in the null space of the matrix collecting all trials t, given by

F = (F_1 \; F_2 \; \cdots \; F_T) \quad (27)

The null space of this matrix is again obtained using SVD, as in the previous step. If the rank of the null space is equal to r, then it corresponds to the matrix \bar U, which approximates U and fills the missing elements. If so, terminate; otherwise loop back. Taking the SVD of the completed matrix, \bar U = P S V^T, then gives the required factors \tilde A = P and \tilde W = S_W V^T, with S_W corresponding to the first r singular values. (A compact sketch of this procedure is given after the list.)
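The sketch below is a rough, unoptimized rendering of this procedure (after Jacobs [22]); the rank-verification loop is abbreviated to a fixed number of trials and a singular-value tolerance, and the final fill projects each column onto the recovered column-space basis using its known entries.

```python
import numpy as np

def fit_rank_r_missing(U, mask, r=2, trials=100, tol=1e-9, seed=0):
    """Fit a rank-r matrix to U, whose known entries are marked by mask."""
    rng = np.random.default_rng(seed)
    m, n = U.shape
    F_blocks = []
    for _ in range(trials):
        cols = rng.choice(n, size=r, replace=False)            # step (1)
        B = np.where(mask[:, cols], U[:, cols], 0.0)           # step (2)
        bad = np.flatnonzero(~mask[:, cols].all(axis=1))
        if bad.size:
            B = np.column_stack([B, np.eye(m)[:, bad]])
        Ub, s, _ = np.linalg.svd(B)                            # step (3)
        rank = np.sum(s > tol)
        F_blocks.append(Ub[:, rank:])                          # left null space
    F = np.column_stack(F_blocks)                              # Eq. (27)
    Uf, _, _ = np.linalg.svd(F)
    basis = Uf[:, -r:]            # step (4): column-space basis of U_bar
    filled = np.array(U, dtype=float)
    for j in range(n):            # fill each column from its known entries
        k = mask[:, j]
        coef, *_ = np.linalg.lstsq(basis[k], U[k, j], rcond=None)
        filled[:, j] = basis @ coef
    return filled
```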
The components \tilde A and \tilde W are obtained modulo an affine transformation Q, so that any A = \tilde A Q and W = Q^{-1} \tilde W would satisfy the factorization. To obtain A such that its 2 \times 2 submatrices A_i are orthonormal, one can write, as in [8,9],

A_i^a \, C \, (A_i^b)^T = \delta_{ab} \quad (28)

where C = Q Q^T, A_i^a is the a-th row of the submatrix A_i, and \delta is the Kronecker delta. These equations are linear in the elements of the symmetric matrix C and are solved using least squares. The matrix Q is obtained using the eigenvalue decomposition of C. The signs of the columns of Q are adjusted to ensure that the resulting matrices A_i = \tilde A_i Q have positive determinant.
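Eq. (28) is linear in the three distinct entries of the symmetric matrix C = QQ^T, so the upgrade can be sketched as follows (the per-camera sign adjustment is simplified here to a single global determinant check):

```python
import numpy as np

def metric_upgrade(A_tilde):
    """A_tilde: 2M x 2 stacked affine blocks from the factorization.
    Returns Q such that the blocks of A_tilde @ Q are near-orthonormal."""
    rows, rhs = [], []
    for i in range(A_tilde.shape[0] // 2):
        Ai = A_tilde[2 * i:2 * i + 2]
        for a in range(2):
            for b in range(a, 2):
                x, y = Ai[a], Ai[b]
                # coefficients of (C11, C12, C22) in x C y^T = delta_ab
                rows.append([x[0] * y[0], x[0] * y[1] + x[1] * y[0], x[1] * y[1]])
                rhs.append(float(a == b))
    c11, c12, c22 = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    w, V = np.linalg.eigh(np.array([[c11, c12], [c12, c22]]))
    Q = V @ np.diag(np.sqrt(np.maximum(w, 0.0)))   # C = Q Q^T
    if np.linalg.det(Q) < 0:                        # sign adjustment
        Q[:, 0] = -Q[:, 0]
    return Q
```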
3.3. Non-linear optimization
The factorization stage gives an approximate solution to the calibration. However, it does not minimize a geometrically significant measure of error. Hence, a non-linear least squares procedure is applied, converting the 2D solution to 3D and using it as the initial guess:

R_i = \begin{pmatrix} A_i^T & 0 \\ 0 & 1 \end{pmatrix} R_{xy}, \qquad D_i = -R_i \begin{pmatrix} b_i \\ d_i \end{pmatrix}, \qquad P_j = \begin{pmatrix} W_j \\ 0 \end{pmatrix} \quad (29)
The rotation matrix R is parameterized in angle-axis form using the Rodrigues formula. The objective function to be minimized is the sum of squares of the errors between the actual and the projected omni coordinates:

\min_{D_i, \theta_i, P_j} \sum_{i,j,k} (q_{ijk} - \hat q_{ijk})^T \Lambda_{ijk}^{-1} (q_{ijk} - \hat q_{ijk}) \quad (30)

where \hat q_{ijk} are the projections obtained using the current estimates of D_i, \theta_i, P_j, and \Lambda_{ijk} is the covariance matrix of the noise in each point q_{ijk}. If the pixel noise is isotropic, then \Lambda_{ijk} = I_{2 \times 2}. However, in real situations, the pixel noise along the major axis of the person is often larger than that along the minor axis. For such a case, we use
\Lambda_{ijk} = U_{ijk} S_{ijk} U_{ijk}^T = V_{ijk} V_{ijk}^T \quad (31)

where the columns of U_{ijk} contain unit vectors along the major and minor axes of the line passing through the q_{ijk}, k = 1, …, K, S_{ijk} is the diagonal matrix containing the variances assigned to each axis, and V_{ijk} = U_{ijk} S_{ijk}^{1/2}. Using this expression, the minimization simplifies to
\min_{D_i, \theta_i, P_j} \sum_{i,j,k} \lVert V_{ijk}^{-1} (q_{ijk} - \hat q_{ijk}) \rVert^2 \quad (32)

Fig. 4. Simulated camera configurations showing the camera locations, orientations, and track points. Ground truth (red, blue) as well as estimated values (green, orange) from simulations are shown. Covariance ellipses (3σ) are also plotted. Statistics of the camera parameter estimates are shown against each of the configurations. Note that camera 1 is fixed at the world origin, and camera 2 along the X-axis.
To have a unique solution, the first camera is fixed at the world origin and the second camera along the positive world X-axis, by constraining the respective parameters to zero. The function lsqnonlin in MATLAB is used to perform the non-linear minimization.
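A condensed sketch of this bundle-adjustment stage, using scipy.optimize.least_squares in place of MATLAB's lsqnonlin; the gauge constraints on the first two cameras are omitted for brevity, project() stands for the omni projection of Eq. (1), and the observation containers are hypothetical:

```python
import numpy as np
from scipy.optimize import least_squares

def rodrigues(theta):
    """Angle-axis vector to rotation matrix (Rodrigues formula)."""
    ang = np.linalg.norm(theta)
    if ang < 1e-12:
        return np.eye(3)
    k = theta / ang
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(ang) * K + (1.0 - np.cos(ang)) * (K @ K)

def make_residuals(obs, H, Vinv, project, M, N):
    """obs: iterable of (i, j, k, q) observed omni pixels; H[k]: height
    offsets; Vinv[(i, j, k)]: whitening matrices from Eq. (31)."""
    def fun(x):
        thetas = x[:3 * M].reshape(M, 3)
        Ds = x[3 * M:6 * M].reshape(M, 3)
        Ps = x[6 * M:].reshape(N, 3)
        res = []
        for i, j, k, q in obs:
            p_cam = rodrigues(thetas[i]).T @ (Ps[j] + H[k] - Ds[i])  # Eq. (10)
            res.append(Vinv[(i, j, k)] @ (q - project(p_cam)))       # Eq. (32)
        return np.concatenate(res)
    return fun

# result = least_squares(make_residuals(obs, H, Vinv, omni_project_fn, M, N), x0)
# with x0 packed from the factorization solution of Eq. (29)
```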
4. Experimental evaluation and results
4.1. Simulated experiments
In order to evaluate the accuracy as well as the limits of performance, we need to work with situations where the results can be compared with unambiguous and known solutions.
Fig. 5. Simulations with random noise standard deviation in pixels of (a) 5, (b) 7, (c) 9, (d) 11. Ground truth (red, blue) as well as estimated values (green, orange) from simulations are shown. Covariance ellipses (3σ) are also plotted. (e) Table showing RMS errors in each parameter for all cameras. It is seen that the errors in some parameters increase sharply around σ_pix = 9.
Simulation allows us to do that, as well as to add perturbations to the data in a very controlled manner to assess the limits of performance. For these reasons, we have designed a new series of experiments with controlled data sets. The following protocol was used to generate the data sets:
(1) Manually select a 2D configuration by clicking on i = 1, …, M camera positions and j = 1, …, N points around the cameras. Normalize the configuration by translating the first camera to the origin and rotating about it so that the second camera is along the positive X-axis. Let the camera locations be (D_i^x, D_i^y) and the 2D points be (X_j, Y_j).
(2) Randomly assign orientations \phi_i^z, uniformly distributed between -\pi and \pi, to the cameras in the 2D plane.
(3) Replace each 2D point (X_j, Y_j) with points (X_j, Y_j, Z_k) for pre-specified values of Z_k, k = 1, …, K, signifying moving points on the walking person.
(4) Generate camera heights D_i^z, normally distributed with mean \mu_{D^z} and variance \sigma_{D^z}^2, to form the translation vectors D_i = (D_i^x, D_i^y, D_i^z).
(5) Generate the 3D camera orientations by rotating each camera using a random rotation vector (\phi_i^x, \phi_i^y, 0) having a uniform distribution in a circle of radius L_tilt. The final rotation vector \theta_i of the camera is formed by composing these two rotations.
(6) Project all points (X_j, Y_j, Z_k) into camera coordinates using the camera parameters D_i, \theta_i. Convert to the noise-free omni pixel coordinates (u_{ijk}^\circ, v_{ijk}^\circ) using the intrinsic camera parameters.
(7) Add Gaussian noise with variance \sigma_{pix}^2 to the pixel coordinates.
(8) Estimate the camera parameters as well as the world point locations using the proposed algorithm.
(9) Compare and plot the estimated parameters against the ground truth parameters. Also plot the covariance ellipses (3σ) of the parameters.
Fig. 6. Simulations with randomly generated tilt within a circle of radius (a) 5°, (b) 10°, (c) 20°, (d) 30°. Ground truth (red, blue) as well as estimated values (green, orange) from simulations are shown. Covariance ellipses are different for each instance of tilt, but are quite close to each other. (e) Table showing RMS errors in each parameter for all cameras.
In the first experiment, a number of different geometrical configurations of M = 4, …, 6 omni cameras were manually generated, along with a number of point pairs (N = 20, …, 40) corresponding to a walking person. Fig. 4 shows the plan views of a number of different geometrical configurations. A number of simulations (N_sim = 100) were carried out for each configuration by adding Gaussian random noise with a standard deviation of σ_pix = 3 pixels to the computed pixel coordinates. The camera positions and orientations (green), and the locations of the points (orange), were estimated from the noisy pixel coordinates and plotted for all simulations. It is seen that the estimated parameters are quite close to the ground truth (red, blue) in all the simulations and mostly fit within the 3σ covariance ellipses. It can also be observed that some configurations, such as (c) with collinear cameras, are intrinsically more sensitive to errors than others.
In the second experiment, the configuration was fixed but the pixel noise standard deviation was varied (σ_pix = 5, …, 11 pixels), and the behavior of the algorithm was observed with N_sim = 300 simulations in each case. Fig. 5 shows the results of the simulations. It is seen that up to about σ_pix = 7 pixels, the
estimated locations converge near the true locations (within the covariance ellipses), but for larger σ_pix there are cases where the estimated location is far from the true location, showing the breakdown of the algorithm.
In the third experiment, a particular 2D configuration was used, and N_tilt sets of tilt angles (\phi_i^x, \phi_i^y) were generated by sampling from a uniform distribution on a circle with radius L_tilt = 5°, …, 30°. The parameters estimated from the noisy pixels were compared with the ground truth in each case. Fig. 6 shows the result of the comparisons. Note that the error
covariances show small differences between configurations due to the different inclinations of the camera axes. However, it is seen that the algorithm converges to the correct solution even with tilts as large as L_tilt = 30°.
Fig. 7. (a) Images from a reconfigurable array of four omni cameras assembled in a hallway. A person moving around is detected in all four cameras by background image subtraction. The positions of the top and bottom of the person in this frame are extracted. (b) Result of the estimation algorithm (blue ×) and ground truth (red ○) using the person's position in this and other frames. Ellipses denote the estimated covariances.
4.2. Experiments with real data
In order to use the calibration algorithm in conjunction with person tracking and event analysis applications [3], the calibration approach was tested with video sequences from
multiple omni cameras in various conditions, as described below.
In the first experiment, a reconfigurable array of four omni cameras was placed in a hallway as shown in Fig. 1. Fig. 7(a) shows a typical snapshot of the moving person in all four cameras. Background subtraction was used to detect the moving person. The frames in which the coordinates of the bottom and top of the person were correct were manually selected and given to the estimation algorithm. The estimated positions and orientations of the cameras, as well as the positions of the tracked points, are shown in Fig. 7(b). Ground truth camera positions are also shown for comparison.
The second experiment was performed in an outdoor setting. However, due to poor lighting conditions, the positions of the person were marked manually in the images. Fig. 8 shows the results of this experiment. The error was considerably greater in this case. However, it was observed that most of the error was in the scale of the estimation, and the shapes of the estimated and ground truth positions matched better if scaled by a uniform factor. Possible reasons for this discrepancy could be a greater sensitivity in the estimation of scale than of orientation, a bias in the marking of the person's height, or inaccuracies in the internal parameters.
Fig. 8. (a) A network of four omni cameras assembled in an outdoor set-up with a different camera geometry. The positions of the person were manually marked. (b) Result of the estimation algorithm (blue ×) as well as the ground truth (red ○). (c) Result after scaling the estimated positions to align cameras 1 and 2. Alignment for cameras 3 and 4 is much better now.
The third set-up was in a garage setting. Fig. 9 shows the results of this experiment. Again, most of the error was observed in the scale factor, and after compensating for it, the errors were reduced to a large extent.
Fig. 9. (a) A network of four omni cameras assembled in a garage set-up. The positions of the person in a number of images are shown. (b) Result of the estimation algorithm (blue) as well as the ground truth (red). (c) Result after scaling the estimated positions to align cameras 1 and 2. Alignment for cameras 3 and 4 is much better now.
The fourth set-up used four omnidirectional cameras mounted on a table for real-time person tracking as described in [3]. However, in this case the walking person was not fully visible, as seen in Fig. 10(a); therefore, the localization in the longitudinal direction was very inaccurate. Hence, the anisotropic noise model described in Eq. (31) was used for the bundle adjustment. Fig. 10(b) shows the computed and ground truth camera positions along with the computed positions of the person. Due to the longitudinal error, there is a very large difference between the computed and the ground truth camera positions. However, here too it was observed that the difference
was mainly in the scale factor. After compensating for the scale factor, the ground truth positions and the computed positions are much closer, almost within the covariance ellipses, in spite of the large longitudinal errors in pixel locations.
5. Concluding remarks
This paper described a method for calibrating reconfigurable arrays of omnidirectional cameras using the image positions of a 1D object, such as a moving person, in all the cameras. Matrix factorization in the presence of missing elements, followed by non-linear optimization, was used to determine the camera parameters as well as the positions of the moving person in multiple frames.
Experimental results using simulated and real data show that the method is effective in obtaining a calibration of reasonable accuracy, at least up to a scale factor. The estimation of the scale factor is, however, less accurate. Possible explanations for this inaccuracy are: (1) longitudinal uncertainty in determining the image coordinates of the person, (2) errors in the internal calibration, (3) the presence of outliers, and (4) bias due to the solution of non-linear least squares with large observation errors. The performance of the algorithm is good for low to moderate noise but degrades as the noise increases. The use of vanishing point estimation to compensate for pan and tilt enables the method to be used with significantly non-vertical camera axes. However, according to the geometry of the approach, the person needs to be in a vertical pose and the ground needs to be horizontal.
Self-calibration using a point object, such as a bulb waved around the cameras, can yield good accuracy, as illustrated by [11]. However, that approach is more suited to controlled environments with low or moderate lighting. In many applications with bright lighting and background clutter, it may be difficult to identify a point object in the images. In such cases, a linear object such as a walking person may be useful for obtaining reasonably accurate calibration. In addition, it would be interesting to integrate this calibration procedure with tracking in order to find the trajectories of the walking person.
Fig. 10. (a) A network of four omni cameras mounted on a table top. The positions of the person in a number of images are shown. Note that the person is not fully visible in all images, and therefore the longitudinal localization error is high. (b) Due to the longitudinal error, there is a large difference between the estimated (blue) and ground truth positions (red). (c) Result after scaling the estimated positions to align cameras 1 and 2. Alignment for cameras 3 and 4 is much better now, in spite of the large longitudinal pixel errors.
If greater accuracy is required, one can use surveying poles standing vertically with distinctive marks at K different heights. The approach can also be used for the integrated calibration of rectilinear cameras along with omni cameras. For rectilinear cameras, the camera axis is likely to deviate from vertical, and the vanishing point method described here and in [6] to determine the axis orientation would be especially useful. For handling outliers, the approaches used by Martinec and Pajdla [26] would be useful in the factorization. Currently, the internal calibration of the omni camera is predetermined using features with known world coordinates. However, we would like to study whether it is feasible to estimate the internal camera parameters simultaneously with sufficient accuracy. Finally, we plan to integrate the calibration with the person tracking and event analysis systems shown in Fig. 1 for real world deployment of sensor networks containing omni as well as rectilinear cameras.
Acknowledgements
We thank the reviewers for their careful reading of the earlier version and for insightful comments, which have allowed us to improve the paper. We are thankful for the grant awarded by the Technical Support Working Group (TSWG) of the US Department of Defense, which provided the primary sponsorship of the reported research. We also thank our colleagues from the UCSD Computer Vision and Robotics Research Laboratory for their contributions and support.
References
[1] S. Baker, S.K. Nayar, A theory of catadioptric image formation, in:
ICCV98, 1998, pp. 35–42.
[2] R. Benosman, S.B. Kang, Panoramic Vision: Sensors, Theory, and
Applications, Springer, Berlin, 2001.
[3] K.C. Huang, M.M. Trivedi, Video arrays for real-time tracking of persons,
head and face in an intelligent room, Machine Vision and Applications 14
(2) (2003) 103–111.
[4] T. Sogo, H. Ishiguro, M. Trivedi, N-ocular stereo for real-time human
tracking, in: R. Benosman, S. Kang (Eds.), Panoramic Vision: Sensors,
Theory, and Applications, Springer, Berlin, 2001, pp. 359–376.
[5] L. Lee, B. Romano, G. Stein, Monitoring activities from multiple video
streams: establishing a common coordinate frame, PAMI 22 (8) (2000)
758–767.
[6] F. Lv, T. Zhao, R. Nevatia, Self-calibration of a camera from video of a
walking human, in: Int. Conf. on Pattern Recognition 2002, pp. I:
562–567.
[7] C. Tomasi, T. Kanade, Shape and motion from image streams under
orthography: A factorization method, International Journal of Computer
Vision 9 (2) (1992) 137–154.
[8] C. Poelman, T. Kanade, A paraperspective factorization method for shape
and motion recovery, PAMI 19 (3) (1997) 206–218.
[9] D. Forsyth, J. Ponce, Computer Vision: A Modern Approach, Prentice-
Hall, Upper Saddle River, NJ, 2003.
[10] P. Sturm, B. Triggs, A factorization-based algorithm for multi-image projective structure and motion, in: Proceedings of the European Conference on Computer Vision, 1996, pp. 709–720.
[11] T. Svoboda, D. Martinec, T. Pajdla, A convenient multi-camera self-calibration for virtual environments, Presence: Teleoperators and Virtual Environments 14 (4) (2005) 407–422.
[12] D. Martinec, T. Pajdla, Structure from many perspective images with
occlusions, in: Proceedings of the European Conference on Computer
Vision (ECCV), Springer, Berlin, 2002, pp. 355–369.
[13] X. Chen, J. Davis, P. Slusallek, Wide area camera calibration using virtual
calibration objects, in: IEEE Conference on Computer Vision and Pattern
Recognition, 2000, pp. 2520–2527.
[14] J. Barreto, K. Daniilidis, Wide area multiple camera calibration and
estimation of radial distortion, in: Proceedings of the fifth Workshop on
Omnidirectional Vision, Camera Networks and Non-classical cameras,
2004.
[15] F. Pedersini, A. Sarti, S. Tubaro, Accurate and simple geometric
calibration of multi-camera systems, Signal Processing 77 (1999)
309–334.
[16] M. Antone, S. Teller, Scalable extrinsic calibration of omnidirectional
image networks, International Journal on Computer Vision 49 (2–3)
(2002) 143–174.
[17] D. Aliaga, Accurate catadioptric calibration for real-time pose estimation
in room-size environments, in: IEEE International Conference on
Computer Vision, vol. 1, 2001, pp. 127–134.
[18] Z. Zhang, Camera calibration with one-dimensional objects, in:
A. Heyden et al. (Ed.), European Conference on Computer Vision,
LNCS 2353, Springer, Berlin, Heidelberg, 2002, pp. 161–174.
[19] O. Faugeras, Three-Dimensional Computer Vision: A Geometric View-
point, The MIT Press, Cambridge, MA, 1993.
[20] Z. Zhang, A flexible new technique for camera calibration, Pattern
Analysis and Machine Intelligence 22 (11) (2000) 1330–1334.
[21] T. Gandhi, M.M. Trivedi, Calibration of a reconfigurable array of
omnidirectional cameras using a moving person, in: Second ACM
International Workshop on Video Surveillance and Sensor Networks,
New York, 2004, pp. 12–19.
[22] D. Jacobs, Linear fitting with missing data: applications to structure from
motion and to characterizing intensity images, in: IEEE Conference on
Computer Vision and Pattern Recognition, 1997, pp. 206–212.
[23] T. Svoboda, D. Martinec, T. Pajdla, J.-Y. Bouguet, T. Werner, O. Chum, Multi-camera self-calibration, Czech Technical University, Prague, Czech Republic. Available from: http://cmp.felk.cvut.cz/~svoboda/SelfCal/
[24] A.W. Fitzgibbon, M. Pilu, R.B. Fisher, Direct least-squares fitting of
ellipses, IEEE Transactions on Pattern Analysis and Machine Intelligence
21 (5) (1999) 476–480.
[25] R.I. Hartley, In defence of the eight-point algorithm, IEEE Transactions
on Pattern Analysis and Machine Intelligence 19 (6) (1997) 580–593.
[26] D. Martinec, T. Pajdla, Automatic factorization-based reconstruction
from perspective images with occlusions and outliers, in: Proceedings of
the eighth Computer Vision Winter Workshop (CVWW), Czech Pattern
Recognition Society, Prague, 2003, pp. 147–152.