Reconfigurable omnidirectional camera array calibration with
a linear moving object
Tarak Gandhi *, Mohan M. Trivedi
Computer Vision and Robotics Research Laboratory, University of California at San Diego, 9500 Gilman Drive 0434, La Jolla, CA 92093, USA
Received 1 August 2004; received in revised form 10 October 2005; accepted 8 February 2006
Abstract
Reconfigurable omnidirectional camera arrays are useful for applications where multiple cameras working together must be deployed at short notice. This paper proposes a multi-camera calibration approach for such arrays using a 1D object, such as a person, moving parallel to itself. The moving object is separated from the background, and correspondences between multiple cameras are obtained using the location of the object in all the cameras in multiple frames. The approximate orientation of each camera is individually determined using vanishing points. The non-linear 3D problem of multi-camera calibration can then be approximated by a 2D problem in plan view. An initial solution for the relative positions and orientations of the cameras is obtained using the factorization approach. A non-linear optimization stage is then used to correct the approximations and to minimize the geometric error between the observed and the projected omni pixel coordinates. Numerous experiments are performed with simulated data sets as well as real image sequences to evaluate the performance of this approach.
© 2006 Elsevier B.V. All rights reserved.
Keywords: Calibration; Panoramic vision; Stereo vision; Surveillance
1. Introduction and motivation
Omnidirectional cameras, which give a 360° field of view of the surroundings, have been of great interest to vision researchers [1–3]. A network of multiple omni-based camera arrays can offer the advantages of wide area coverage, redundancy, and abstraction at multiple levels such as detection, tracking, and recognition. Arrays of omnidirectional as well as rectilinear cameras may be integrated with the infrastructure for applications such as surveillance, people tracking, face detection, traffic analysis, and infrastructure monitoring. In addition to the fixed cameras, there may also be arrays of cameras deployed as needed to work as a part of a larger system. Such arrays would give the advantages of reconfigurability, flexibility, and ad hoc deployment, and would be of value in the new generation of sensor networks capable of monitoring dynamic events in remote environments. In applications such as structural health monitoring of civil infrastructures,
reconfigurable video arrays can provide an important collaborative visual record.
In order to ensure that multiple cameras work in an integrated manner, it is necessary to calibrate the cameras with respect to each other. However, measuring the locations and orientations of the cameras is a laborious procedure. Hence, if one wants to install such a network at short notice, one should have a calibration procedure for the cameras that is fast and preferably automatic.
For example, an array of four fixed omni cameras is used in [3,4] to perform 3D tracking of persons in an indoor setting. Fig. 1(a) shows the omni images acquired from the four cameras. This system detects objects using background subtraction, as shown in Fig. 1(b). Using the known locations of the cameras, it obtains the locations and heights of the targets by triangulation, as shown in Fig. 1(c). We intend to use this system with a reconfigurable array of omni cameras such as the one shown in Fig. 1(d), for which a fast calibration procedure is necessary.
1.1. Related work
Calibration can be performed by registering corresponding features between a number of cameras. However, when the cameras are widely separated, finding corresponding features is often difficult. For this reason, some researchers have proposed the use of moving features to register the images [5].
Fig. 1. (a) Images from the omni cameras with two moving persons. (b) Panoramic views showing object detection. (c) Estimated locations and heights of the
detected persons. (d) Proposed reconfigurable array of omni cameras.
In [6], the calibration parameters of a single rectilinear camera were determined by observing the motion of a person in the scene.
Factorization methods have been widely used for the structure from motion problem, where images from a single moving camera are used to determine the scene structure.
Table 1
Classification of calibration methods according to dimensionality (based on [18])

3D [19]. Calibration object: a non-planar solid or wireframe structure with known geometry. Approach: use world and camera coordinates of object features to estimate the camera parameters. Advantages: need to estimate a small number of parameters; efficient; least sensitive to ill-conditioning. Limitations: expensive to make and use 3D objects; impractical for a large working area.

2D [20]. Calibration object: planar object with a pattern such as a checkerboard. Approach: move around the object and estimate calibration parameters assuming unknown motion of the calibration pattern. Advantages: easier to construct a 2D pattern than a 3D object; standard calibration tools available. Limitations: more parameters to be estimated than in 3D; need to deal with multiple images; impractical for a large working area.

1D [18]. Calibration object: a linear object with a small number of collinear feature points. Approach: move the object so that one end is fixed; solve for calibration using at least six sets of observations using a closed-form linear method followed by iterative non-linear bundle adjustment. Advantages: convenient for multi-camera calibration of a large area where a 2D/3D object may not be seen properly in all cameras; less prone to ill-conditioning than using a point object. Limitations: the configuration or motion of the linear object needs to be constrained in order to determine calibration unambiguously; less convenient than point objects.

0D [11]. Calibration object: point object such as a laser pointer covered with a plastic sheet, or a bright bulb. Approach: wave the object in the working area; solve the structure and motion problem to estimate relative camera calibrations. Advantages: most convenient to use; can cover large areas with least effort. Limitations: large number of parameters to be estimated; possible ill-conditioning if the working area is not covered properly; can estimate calibration only up to a scale factor.

This work (1D) [21]. Calibration object: a vertically oriented linear object such as a moving person. Approach: obtain the object location in all cameras and use factorization and non-linear least squares to obtain the calibration as well as the object positions. Advantages: useful in large areas including outdoors; can be integrated with person detection and tracking. Limitations: accuracy is limited in scale factor due to errors in localization of the person.
Tomasi and Kanade [7] proposed the factorization method for orthographic cameras, which was extended to paraperspective cameras [8] as well as perspective cameras [9,10]. Svoboda et al. [11] have developed multi-camera calibration software that uses a modified laser pointer to get
corresponding points in multiple cameras. In this implementation, self-calibration is performed using the factorization algorithm of Martinec and Pajdla [12], which accounts for outliers and occlusions.
Chen et al. [13] as well as Barreto and Daniilidis [14] use the path of a moving point to calibrate a wide area network of cameras, not all of which can see the point at the same time. To introduce robustness into the calibration procedure, a planar target pattern is used by Pedersini et al. [15]. The target is moved around to get multiple views in the area of interest, which gives enough constraints to solve the calibration problem robustly. Antone and Teller [16] propose an algorithm to perform fast and accurate extrinsic calibration of large camera networks in which every pair of cameras has an overlapping field of view. The paper extends classical registration methods to large scale and proposes a probabilistic framework for modeling projective uncertainty. In [17], an accurate method for calibrating a single omnidirectional camera on a moving platform is described.
As described in [18], these and other calibration approaches can be broadly divided into the categories shown in Table 1. Zhang [18] notes that there has not been much research on using 1D calibration objects. He proves that a freely moving 1D object cannot be used for calibration, and describes an approach using a linear target moving with one end fixed. The approach described here and in [21] also uses a 1D object, such as a moving person, to calibrate multiple stationary omnidirectional cameras. However, instead of using the constraint of one end being fixed as in [18], it assumes that the object is oriented vertically and moves parallel to itself. Using this constraint, the non-linear 3D structure from motion problem is projected to a 2D problem in the ground plane perpendicular to the person's axis. This problem is solved using the factorization method to give an initial approximate solution. The solution is then used in a non-linear optimization framework to account for non-vertical camera axes and to minimize a geometric error measure between the observed and the projected omni pixel coordinates.
This paper is an expansion of [21], with details on the internal calibration of the omni camera, an approach for factorization with missing points adapted from [22,23], more extensive experimentation with simulated and real data, and a discussion comparing calibration methods.
Fig. 2. (a) Geometry of a hyperbolic omni camera. The rays towards the first focus of the mirror are reflected towards the second focus, and imaged by a normal camera. (b) Averaged image from the omni camera set for calibration. An ellipse is fitted to the FOV boundary, and a number of points having known world coordinates are marked. (c) Plot of the radial distance d in the image against cos θ, where θ is the angle between the Z-axis and the object. The curve is drawn using the estimated parameters.
2. Omni camera geometry
The omni cameras consist of a hyperbolic mirror and a camera placed on its axis [2]. These cameras have a single viewpoint, which permits the image to be suitably transformed to obtain perspective views. The geometry of the omni camera is shown in Fig. 2(a). Each camera covers a 360° field of view (FOV) around its center. However, the images are distorted, with straight lines transformed into curves. This distortion should be taken into account when performing calibration. The conversion between the camera coordinates p = (x, y, z)^T and the omni coordinates q = (u, v)^T is given by

\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = C \begin{pmatrix} u' \\ v' \\ 1 \end{pmatrix} = \lambda C \begin{pmatrix} x \\ y \\ -c_1 z + c_2 \lVert p \rVert \end{pmatrix} \quad (1)
where c_1 and c_2 are determined by the omni mirror geometry as well as the focal length of the lens, whereas C is the normalized calibration matrix of the CCD, with \alpha the aspect ratio, \gamma the skew, and (u_0, v_0) the camera center:

C = \begin{pmatrix} \alpha & \gamma & u_0 \\ 0 & 1 & v_0 \\ 0 & 0 & 1 \end{pmatrix} \quad (2)
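To make the mapping concrete, the following is a minimal Python/NumPy sketch of the projection in Eqs. (1) and (2). It is not the authors' implementation, and all parameter values (alpha, gamma, u0, v0, c1, c2) are made-up placeholders.

```python
import numpy as np

def omni_project(p, c1, c2, C):
    """Project a camera-frame point p = (x, y, z) to omni pixel
    coordinates (u, v) following Eq. (1)."""
    x, y, z = p
    w = -c1 * z + c2 * np.linalg.norm(p)   # third component in Eq. (1)
    # Scale so the homogeneous coordinate is 1, then apply C of Eq. (2).
    uv1 = C @ np.array([x / w, y / w, 1.0])
    return uv1[:2]

# Hypothetical internal parameters, for illustration only.
alpha, gamma, u0, v0 = 1.05, 0.0, 320.0, 240.0
C = np.array([[alpha, gamma, u0],
              [0.0,   1.0,   v0],
              [0.0,   0.0,   1.0]])
q = omni_project(np.array([1.0, 0.5, -2.0]), c1=0.9, c2=1.1, C=C)
```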
Since the internal parameters of the omni camera need to be measured only once, a specialized set-up was used. The omni camera was set on a tripod and leveled to have a vertical camera axis. A number of images of a stationary scene were taken and averaged to reduce noise. The averaged image is shown in Fig. 2(b). An ellipse was fitted to the boundary of the FOV in order to extract the parameters of the C matrix. A number of features with known coordinates were marked on the ground and on a vertical pole to cover the FOV of the omni camera. Their image coordinates were manually extracted and used for estimating the mirror parameters.
We assume that the FOV of the omni camera is circularly symmetric, bounded by

u'^2 + v'^2 = r^2 \quad (3)

This maps into the ellipse seen in Fig. 2(b), whose equation in pixel coordinates is

\alpha^{-2} [(u - u_0) - \gamma (v - v_0)]^2 + (v - v_0)^2 = r^2 \quad (4)

which can be written in the standard conic form

a u^2 + 2 h u v + b v^2 + 2 g u + 2 f v + c = 0 \quad (5)

The parameters of the C matrix can then be determined by fitting an ellipse to the set of points on the boundary of the FOV [24]. The equation is normalized to set a = 1. Using this, we get

a = 1, \; h = \gamma, \; b = \alpha^2 + \gamma^2, \; g = \gamma v_0 - u_0, \; f = -\alpha^2 v_0 - \gamma g, \; c = g^2 + \alpha^2 (v_0^2 - r^2) \quad (6)

These equations can be readily solved to give

\gamma = h, \; \alpha = \sqrt{b - \gamma^2}, \; v_0 = -(\gamma g + f)/\alpha^2, \; u_0 = \gamma v_0 - g, \; r = \sqrt{g^2 - c + \alpha^2 v_0^2}/\alpha \quad (7)

Note that for most cameras the skew factor \gamma is negligible. In such a case, the ellipse center corresponds to (u_0, v_0), and the aspect ratio \alpha equals the ratio of the lengths of the axes of the ellipse.
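Assuming the ellipse fit has produced the normalized conic coefficients of Eq. (5), the closed-form recovery of Eq. (7) can be sketched as a direct transcription (the coefficient values passed in would come from the fit):

```python
import numpy as np

def conic_to_internal(h, b, g, f, c):
    """Recover (gamma, alpha, u0, v0, r) from the normalized conic
    coefficients of Eq. (5), following Eq. (7)."""
    gamma = h
    alpha = np.sqrt(b - gamma**2)
    v0 = -(gamma * g + f) / alpha**2
    u0 = gamma * v0 - g
    r = np.sqrt(g**2 - c + alpha**2 * v0**2) / alpha
    return gamma, alpha, u0, v0, r
```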
Using these parameters in the matrix C, the image coordinates (u, v) can be normalized to give (u', v'). Assuming radial symmetry around the image center, we have

d = \sqrt{u'^2 + v'^2} = \frac{\sqrt{x^2 + y^2}}{-c_1 z + c_2 \lVert p \rVert} = \frac{\sin\theta}{-c_1 \cos\theta + c_2} \quad (8)

where \theta is the angle between the Z-axis and the object. Using the world and image coordinates of these points, linear equations in c_1 and c_2 are formed and solved using least squares:

-(d \cos\theta) c_1 + d c_2 = \sin\theta \quad (9)

Fig. 2(c) shows the plot of the radial distance d in the image domain against \cos\theta for the sample points, along with the curve fitted using the estimated parameters. It is seen that the curve models the omni mapping quite faithfully. Non-linear least squares can then be used to improve the accuracy.
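A sketch of this least-squares step follows, assuming NumPy arrays d and theta holding the measured radial distances and angles of the marked points; the sign convention follows Eq. (9) as reconstructed above.

```python
import numpy as np

def fit_mirror_params(d, theta):
    """Solve the stacked linear equations -(d cos t) c1 + d c2 = sin t
    of Eq. (9) for the mirror parameters (c1, c2)."""
    A = np.column_stack([-d * np.cos(theta), d])
    (c1, c2), *_ = np.linalg.lstsq(A, np.sin(theta), rcond=None)
    return c1, c2
```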
3. Three-stage calibration algorithm for reconfigurable omni arrays
The flow chart of the calibration method is shown in Fig. 3. Initially, it is assumed that the camera axes as well as the person are nearly vertical. Using this approximation, the non-linear 3D problem of structure from motion is reduced to a 2D problem in plan view coordinates. Using the plan view coordinates of the person's position in all cameras, the world coordinates as well as the camera orientations and locations are obtained using the factorization method. The solution obtained is then used in a non-linear least squares optimization framework, minimizing a geometric error measure between the observed and projected pixels in the omni camera domain, and also incorporating the possibility that the camera axes are not exactly vertical.
Fig. 3. (a) Flow chart of the three-stage omni-array calibration algorithm: (1) reduction to plan view estimation by assuming vertical symmetry and a constant height vector; (2) factorization to obtain the camera parameters and the person's world coordinates in plan view; (3) non-linear optimization to minimize the geometric error and account for deviations from vertical symmetry. (b) Projection of the 3D configuration to plan view.
Let R_i, D_i be the rotation and translation parameters of cameras i = 1, …, M, and let P_j be the world coordinates of a reference point on the object in image frame j. Let the k = 1, …, K points on the reference object be at heights H_1, H_2, …, H_K from the reference point in the world coordinate system. For example, if the object is a person of height \Delta H, one can have K = 2 points with H_1 = (0, 0, 0) and H_2 = (0, 0, \Delta H), representing the bottom and top of the person. If the height is unknown, one can take \Delta H = 1 and determine the calibration up to a scale factor.
The world coordinates of these points are P_{jk} = P_j + H_k, and the coordinates p_{ijk} in the camera i system are then given by

p_{ijk} = R_i^T (P_j + H_k - D_i) \quad (10)
Let the omni image coordinates of the respective points, obtained using Eq. (1), be denoted by q_{ijk} = (u_{ijk}, v_{ijk})^T.
3.1. Reduction to plan view
Since this step is applied separately to each camera i and image j, the subscripts i, j denoting the camera and image are temporarily dropped for brevity. The omni image coordinates q_k of each object point k are converted to unit-norm ray directions \hat p_k, with \lVert \hat p_k \rVert = 1, by solving Eq. (1):

\hat p = \frac{1}{1 + c_1^2 (u'^2 + v'^2)} \begin{pmatrix} u' (c_1 g + c_2) \\ v' (c_1 g + c_2) \\ c_1 c_2 (u'^2 + v'^2) - g \end{pmatrix} \quad (11)

where

(u' \; v' \; 1)^T = C^{-1} (u \; v \; 1)^T, \qquad g = \sqrt{1 + (c_1^2 - c_2^2)(u'^2 + v'^2)} \quad (12)
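The lifting of Eqs. (11)–(12) can be sketched as follows (again a transcription, not the authors' code; C, c1, c2 come from the internal calibration of Section 2):

```python
import numpy as np

def omni_backproject(q, C, c1, c2):
    """Lift an omni pixel q = (u, v) to a unit-norm viewing ray in the
    camera frame, following Eqs. (11)-(12)."""
    u_, v_, _ = np.linalg.solve(C, np.array([q[0], q[1], 1.0]))  # Eq. (12)
    s = u_**2 + v_**2
    g = np.sqrt(1.0 + (c1**2 - c2**2) * s)
    return np.array([u_ * (c1 * g + c2),
                     v_ * (c1 * g + c2),
                     c1 * c2 * s - g]) / (1.0 + c1**2 * s)       # Eq. (11)
```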
The coordinates of the points in the camera system are then given by

p_k = \lambda_k \hat p_k = R^T (P + H_k - D) \quad (13)

where \lambda_k is the unknown depth along the ray. Let the rotation matrix R be decomposed as R = R_z R_{xy}, with R_z being the rotation about the Z-axis and R_{xy} containing the rotations about the X- and Y-axes. Let h_k = R^T H_k = R_{xy}^T R_z^T H_k. If the target is a linear object perpendicular to the X–Y plane, then the rotation around the Z-axis does not affect its coordinates, hence R_z^T H_k = H_k and

h_k = R_{xy}^T H_k \quad (14)

If the camera axes are nearly vertical, one can use R_{xy} = I. However, if that is not the case, one can estimate R_{xy} using vanishing points as in [6]. If K = 2, the line l^T p = 0 joining \hat p_1 and \hat p_2 is given by l = \hat p_1 \times \hat p_2. In the case of K > 2, the line can be estimated using singular value decomposition (SVD) as the singular vector corresponding to the smallest singular value of the matrix

(\hat p_1 \; \hat p_2 \; \cdots \; \hat p_K) \quad (15)

Note that if the elements of \hat p_k are imbalanced, Hartley normalization [25] should be used. If the data is prone to outliers, a robust method such as RANSAC [9] could be used to fit the line. To obtain a more accurate estimate, SVD can then be applied to the points determined to be inliers.
Once the lines l_{ij} corresponding to the object are found for all images j = 1, …, N of a particular camera i, the vanishing point v_i of these lines for camera i can be similarly obtained using the SVD of

(l_{i1} \; l_{i2} \; \cdots \; l_{iN}) \quad (16)

Again, RANSAC could be used for robust fitting, followed by SVD on the inlier points.
Since the vanishing point is the image of the point at infinity in the direction (0, 0, 1)^T, it corresponds to the third column of the matrix R^T (or R_{xy}^T). The other columns can be specified using orthonormalization, up to the ambiguity of a rotation around the Z-axis. Using the matrix R_{xy}, the camera Z-axis is virtually aligned with the world axis.
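Both SVD estimates above reduce to extracting the left singular vector associated with the smallest singular value; a minimal sketch, omitting the RANSAC and Hartley-normalization refinements just mentioned:

```python
import numpy as np

def min_left_singular_vector(M):
    """Unit vector x minimizing sum_j (x . M[:, j])^2 for a 3 x n matrix M,
    i.e. the left singular vector for the smallest singular value."""
    U, _, _ = np.linalg.svd(M)
    return U[:, -1]

# line through the K rays of one sighting, Eq. (15):
#   line = min_left_singular_vector(rays)            # rays: 3 x K array
# vanishing point of the lines over N frames, Eq. (16):
#   v = min_left_singular_vector(lines)              # lines: 3 x N array
```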
If K = 2, as when using the top and bottom points of a moving person, one can subtract Eq. (13) for k = 1 from that for k = 2 to get

\lambda_2 \hat p_2 - \lambda_1 \hat p_1 = R^T (H_2 - H_1) = h_2 - h_1 \quad (17)

This gives three scalar equations in two variables, which can be solved using least squares to obtain \lambda_1 and \lambda_2.
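This small least-squares problem can be sketched directly; p1 and p2 stand for the unit rays \hat p_1, \hat p_2 from Eq. (11), and dh for h_2 − h_1:

```python
import numpy as np

def solve_depths(p1, p2, dh):
    """Least-squares solution of lambda2 * p2 - lambda1 * p1 = dh, Eq. (17)."""
    A = np.column_stack([-p1, p2])
    (lam1, lam2), *_ = np.linalg.lstsq(A, dh, rcond=None)
    return lam1, lam2
```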
In the case of K > 2, the mean of Eq. (13) is taken over k = 1, …, K:

\frac{1}{K} \sum_k \lambda_k \hat p_k = R^T \Big( P + \frac{1}{K} \sum_k H_k - D \Big) \quad (18)

Subtracting this from Eq. (13), for each k we get

\lambda_k \hat p_k - \frac{1}{K} \sum_k \lambda_k \hat p_k = R^T \Big( H_k - \frac{1}{K} \sum_k H_k \Big) = h_k - \frac{1}{K} \sum_k h_k \quad (19)

Writing these equations in terms of the unknown parameters \lambda_k gives

G \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_K \end{pmatrix} = \begin{pmatrix} h_1 \\ h_2 \\ \vdots \\ h_K \end{pmatrix} - \frac{1}{K} \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \otimes \sum_k h_k \quad (20)

where

G = \begin{pmatrix} \hat p_1 & 0 & \cdots & 0 \\ 0 & \hat p_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat p_K \end{pmatrix} - \frac{1}{K} \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \otimes (\hat p_1 \; \hat p_2 \; \cdots \; \hat p_K) \quad (21)

is a 3K \times K matrix with 3(K-1) independent equations and K variables.
The depths \lambda_k are obtained using least squares. Using these values, the coordinates p_k in the camera system are obtained. Since the line joining the p_k's is assumed vertical, the plan view coordinates w_{ij} of the person are obtained by computing the average projection of p_{ijk} onto the X–Y plane, and the height d_i of camera i above the ground is obtained by averaging the Z-coordinates over all j and k:

\begin{pmatrix} x_{ij} \\ y_{ij} \\ z_{ij} \end{pmatrix} = \frac{1}{K} (R_{xy})_i \sum_k [p_{ijk} - H_k], \qquad w_{ij} = \begin{pmatrix} x_{ij} \\ y_{ij} \end{pmatrix}, \qquad d_i = \frac{1}{N} \sum_j z_{ij} \quad (22)
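A sketch of Eq. (22) for a single camera and frame, assuming K × 3 arrays of recovered camera-frame points and height offsets:

```python
import numpy as np

def plan_view_point(points_cam, H, R_xy):
    """points_cam, H: K x 3 arrays; R_xy: tilt-compensating rotation.
    Returns the plan-view point w and the height reading z of Eq. (22)."""
    xyz = R_xy @ (points_cam - H).mean(axis=0)
    return xyz[:2], xyz[2]   # z is averaged over frames to give d_i
```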
3.2. Factorization in plan view coordinates
Using the approximately known plan view coordinates of the person in the camera systems of all cameras, the camera positions and rotations, as well as the world coordinates of the points, are obtained using a factorization method similar to [7]. Let the relation between the plan view coordinates in the world and camera systems be given by w_{ij} = A_i W_j + b_i, where A_i, b_i represent the affine transformation due to camera i. Assuming the world origin to be at the centroid of the W_j, we have

b_i = \frac{1}{N} \sum_j w_{ij}, \qquad \tilde w_{ij} = w_{ij} - b_i = A_i W_j \quad (23)

If the \tilde w_{ij}, A_i, and W_j are stacked as matrices, with i along rows and j along columns, we get the matrix equation

U_{2M \times N} = A_{2M \times 2} W_{2 \times N} \quad (24)

Hence, U is a rank 2 matrix that can be factorized into the camera parameter component A and the world coordinate component W. If all elements of U are known, the factorization can be performed using the SVD U = \bar U S V^T [7] as

\tilde A = \bar U, \qquad \tilde W = S_W V^T \quad (25)

where S_W is obtained by setting all but the largest two singular values of S to zero.
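For the complete-data case, the factorization of Eqs. (23)–(25) is a rank-2 SVD truncation; a minimal sketch with w a 2M × N NumPy array of stacked plan-view measurements (the missing-entry case is treated next in the text):

```python
import numpy as np

def factorize_plan_view(w):
    """Return (A_tilde, W_tilde, b) with w - b ~= A_tilde @ W_tilde,
    following Eqs. (23)-(25)."""
    b = w.mean(axis=1, keepdims=True)        # Eq. (23): per-camera centroid
    P, s, Vt = np.linalg.svd(w - b, full_matrices=False)
    A_tilde = P[:, :2]                       # camera component
    W_tilde = np.diag(s[:2]) @ Vt[:2, :]     # world component, Eq. (25)
    return A_tilde, W_tilde, b
```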
Since all feature points may not be detected in all images, some of the elements of U may be unknown. However, since the columns of U are dependent, the missing elements can be filled using the approach proposed by Jacobs [22], with the implementation by Svoboda et al. [23] used with appropriate modifications. In this case, a matrix of rank r = 2 is to be fitted to U with some unknown elements. The algorithm is summarized as follows:
For a number of trials t = 1, …, T:

(1) Choose r columns of U at random. Remove rows with unknown elements. If the resulting matrix is full rank, accept it; otherwise repeat the step.
(2) Take these r columns and replace each unknown element by 0. Add unit vectors corresponding to each row having unknown elements. For example (with × denoting an unknown element), replace

\begin{pmatrix} 2 \\ 3 \\ \times \\ 5 \\ \times \\ 4 \end{pmatrix} \rightarrow \begin{pmatrix} 2 & 0 & 0 \\ 3 & 0 & 0 \\ 0 & 1 & 0 \\ 5 & 0 & 0 \\ 0 & 0 & 1 \\ 4 & 0 & 0 \end{pmatrix} \quad (26)

(3) Find the null space F_t of the resulting matrix B_t by taking its SVD. The left singular vectors corresponding to very small singular values of the matrix give the null space.
(4) The space spanned by the matrix U lies in the null space of the matrix collecting all trials t, given by

F = (F_1 \; F_2 \; \cdots \; F_T) \quad (27)

The null space of this matrix is again obtained using SVD, as in the previous step. If the rank of the null space is equal to r, then it corresponds to the matrix \bar U, which approximates U and fills the missing elements. If so, terminate; otherwise loop back. Taking the SVD of the completed matrix, \bar U = P S V^T, then gives the required factors \tilde A = P and \tilde W = S_W V^T, with S_W corresponding to the first r singular values. (A compact sketch of this procedure is given after the list.)
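The sketch below is a rough, unoptimized rendering of this procedure (after Jacobs [22]); the rank-verification loop is abbreviated to a fixed number of trials and a singular-value tolerance, and the final fill projects each column onto the recovered column-space basis using its known entries.

```python
import numpy as np

def fit_rank_r_missing(U, mask, r=2, trials=100, tol=1e-9, seed=0):
    """Fit a rank-r matrix to U, whose known entries are marked by mask."""
    rng = np.random.default_rng(seed)
    m, n = U.shape
    F_blocks = []
    for _ in range(trials):
        cols = rng.choice(n, size=r, replace=False)            # step (1)
        B = np.where(mask[:, cols], U[:, cols], 0.0)           # step (2)
        bad = np.flatnonzero(~mask[:, cols].all(axis=1))
        if bad.size:
            B = np.column_stack([B, np.eye(m)[:, bad]])
        Ub, s, _ = np.linalg.svd(B)                            # step (3)
        rank = np.sum(s > tol)
        F_blocks.append(Ub[:, rank:])                          # left null space
    F = np.column_stack(F_blocks)                              # Eq. (27)
    Uf, _, _ = np.linalg.svd(F)
    basis = Uf[:, -r:]            # step (4): column-space basis of U_bar
    filled = np.array(U, dtype=float)
    for j in range(n):            # fill each column from its known entries
        k = mask[:, j]
        coef, *_ = np.linalg.lstsq(basis[k], U[k, j], rcond=None)
        filled[:, j] = basis @ coef
    return filled
```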
The components \tilde A and \tilde W are obtained modulo an affine transformation Q, so that any A = \tilde A Q and W = Q^{-1} \tilde W would satisfy the factorization. To obtain A such that its 2 \times 2 submatrices A_i are orthonormal, one can write, as in [8,9],

A_i^a \, C \, (A_i^b)^T = \delta_{ab} \quad (28)

where C = Q Q^T, A_i^a is the a-th row of the submatrix A_i, and \delta is the Kronecker delta. These equations are linear in the elements of the symmetric matrix C and are solved using least squares. The matrix Q is obtained using the eigenvalue decomposition of C. The signs of the columns of Q are adjusted to ensure that the resulting matrices A_i = \tilde A_i Q have positive determinant.
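Eq. (28) is linear in the three distinct entries of the symmetric matrix C = QQ^T, so the upgrade can be sketched as follows (the per-camera sign adjustment is simplified here to a single global determinant check):

```python
import numpy as np

def metric_upgrade(A_tilde):
    """A_tilde: 2M x 2 stacked affine blocks from the factorization.
    Returns Q such that the blocks of A_tilde @ Q are near-orthonormal."""
    rows, rhs = [], []
    for i in range(A_tilde.shape[0] // 2):
        Ai = A_tilde[2 * i:2 * i + 2]
        for a in range(2):
            for b in range(a, 2):
                x, y = Ai[a], Ai[b]
                # coefficients of (C11, C12, C22) in x C y^T = delta_ab
                rows.append([x[0] * y[0], x[0] * y[1] + x[1] * y[0], x[1] * y[1]])
                rhs.append(float(a == b))
    c11, c12, c22 = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    w, V = np.linalg.eigh(np.array([[c11, c12], [c12, c22]]))
    Q = V @ np.diag(np.sqrt(np.maximum(w, 0.0)))   # C = Q Q^T
    if np.linalg.det(Q) < 0:                        # sign adjustment
        Q[:, 0] = -Q[:, 0]
    return Q
```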
3.3. Non-linear optimization
The factorization stage gives an approximate solution to the calibration. However, it does not minimize a geometrically significant measure of error. Hence, a non-linear least squares procedure is applied, converting the 2D solution to 3D and using it as the initial guess:

R_i = \begin{pmatrix} A_i^T & 0 \\ 0 & 1 \end{pmatrix} R_{xy}, \qquad D_i = -R_i \begin{pmatrix} b_i \\ d_i \end{pmatrix}, \qquad P_j = \begin{pmatrix} W_j \\ 0 \end{pmatrix} \quad (29)
The rotation matrix R is parameterized in angle-axis form using the Rodrigues formula. The objective function to be minimized is the sum of squares of the errors between the actual and the projected omni coordinates:

\min_{D_i, \theta_i, P_j} \sum_{i,j,k} (q_{ijk} - \hat q_{ijk})^T \Lambda_{ijk}^{-1} (q_{ijk} - \hat q_{ijk}) \quad (30)

where \hat q_{ijk} are the projections obtained using the current estimates of D_i, \theta_i, P_j, and \Lambda_{ijk} is the covariance matrix of the noise in each point q_{ijk}. If the pixel noise is isotropic, then \Lambda_{ijk} = I_{2 \times 2}. However, in real situations, the pixel noise along the major axis of the person is often larger than that along the minor axis. For such a case, we use
\Lambda_{ijk} = U_{ijk} S_{ijk} U_{ijk}^T = V_{ijk} V_{ijk}^T \quad (31)

where the columns of U_{ijk} contain unit vectors along the major and minor axes of the line passing through the q_{ijk}, k = 1, …, K, S_{ijk} is the diagonal matrix containing the variances assigned to each axis, and V_{ijk} = U_{ijk} S_{ijk}^{1/2}. Using this expression, the minimization simplifies to
\min_{D_i, \theta_i, P_j} \sum_{i,j,k} \lVert V_{ijk}^{-1} (q_{ijk} - \hat q_{ijk}) \rVert^2 \quad (32)

Fig. 4. Simulated camera configurations showing the camera locations, orientations, and track points. Ground truth (red, blue) as well as estimated values (green, orange) from simulations are shown. Covariance ellipses (3σ) are also plotted. Statistics of the camera parameter estimates are shown against each of the configurations. Note that camera 1 is fixed at the world origin, and camera 2 along the X-axis.
To have a unique solution, the first camera is fixed at the world origin and the second camera along the positive world X-axis, by constraining the respective parameters to zero. The function lsqnonlin in MATLAB is used to perform the non-linear minimization.
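A condensed sketch of this bundle-adjustment stage, using scipy.optimize.least_squares in place of MATLAB's lsqnonlin; the gauge constraints on the first two cameras are omitted for brevity, project() stands for the omni projection of Eq. (1), and the observation containers are hypothetical:

```python
import numpy as np
from scipy.optimize import least_squares

def rodrigues(theta):
    """Angle-axis vector to rotation matrix (Rodrigues formula)."""
    ang = np.linalg.norm(theta)
    if ang < 1e-12:
        return np.eye(3)
    k = theta / ang
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(ang) * K + (1.0 - np.cos(ang)) * (K @ K)

def make_residuals(obs, H, Vinv, project, M, N):
    """obs: iterable of (i, j, k, q) observed omni pixels; H[k]: height
    offsets; Vinv[(i, j, k)]: whitening matrices from Eq. (31)."""
    def fun(x):
        thetas = x[:3 * M].reshape(M, 3)
        Ds = x[3 * M:6 * M].reshape(M, 3)
        Ps = x[6 * M:].reshape(N, 3)
        res = []
        for i, j, k, q in obs:
            p_cam = rodrigues(thetas[i]).T @ (Ps[j] + H[k] - Ds[i])  # Eq. (10)
            res.append(Vinv[(i, j, k)] @ (q - project(p_cam)))       # Eq. (32)
        return np.concatenate(res)
    return fun

# result = least_squares(make_residuals(obs, H, Vinv, omni_project_fn, M, N), x0)
# with x0 packed from the factorization solution of Eq. (29)
```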
4. Experimental evaluation and results
4.1. Simulated experiments
In order to evaluate the accuracy as well as the limits of performance, we need to work with situations where the results can be compared with unambiguous and known solutions.
Fig. 5. Simulations with random noise standard deviation in pixels of (a) 5, (b) 7, (c) 9, (d) 11. Ground truth (red, blue) as well as estimated values (green, orange) from simulations are shown. Covariance ellipses (3σ) are also plotted. (e) Table showing RMS errors in each parameter for all cameras. It is seen that the errors in some parameters increase sharply around σ_pix = 9.
Simulation allows us to do that, as well as to add perturbations to the data in a very controlled manner to assess the limits of performance. For these reasons, we have designed a new series of experiments with controlled data sets. The following protocol was used to generate the data sets:
(1) Manually select a 2D configuration by clicking on i = 1, …, M camera positions and j = 1, …, N points around the cameras. Normalize the configuration by translating the first camera to the origin and rotating about it so that the second camera is along the positive X-axis. Let the camera locations be (D_i^x, D_i^y) and the 2D points be (X_j, Y_j).
(2) Randomly assign orientations \phi_i^z, uniformly distributed between -\pi and \pi, to the cameras in the 2D plane.
(3) Replace each 2D point (X_j, Y_j) with points (X_j, Y_j, Z_k) for pre-specified values of Z_k, k = 1, …, K, signifying moving points on the walking person.
(4) Generate camera heights D_i^z, normally distributed with mean \mu_{D^z} and variance \sigma_{D^z}^2, to form the translation vectors D_i = (D_i^x, D_i^y, D_i^z).
(5) Generate the 3D camera orientations by rotating each camera using a random rotation vector (\phi_i^x, \phi_i^y, 0) having a uniform distribution in a circle of radius L_tilt. The final rotation vector \theta_i of the camera is formed by composing these two rotations.
(6) Project all points (X_j, Y_j, Z_k) into camera coordinates using the camera parameters D_i, \theta_i. Convert to the noise-free omni pixel coordinates (u_{ijk}^\circ, v_{ijk}^\circ) using the intrinsic camera parameters.
(7) Add Gaussian noise with variance \sigma_{pix}^2 to the pixel coordinates.
(8) Estimate the camera parameters as well as the world point locations using the proposed algorithm.
(9) Compare and plot the estimated parameters against the ground truth parameters. Also plot the covariance ellipses (3σ) of the parameters.
Fig. 6. Simulations with randomly generated tilt within a circle of radius (a) 5°, (b) 10°, (c) 20°, (d) 30°. Ground truth (red, blue) as well as estimated values (green, orange) from simulations are shown. Covariance ellipses are different for each instance of tilt, but are quite close to each other. (e) Table showing RMS errors in each parameter for all cameras.
In the first experiment, a number of different geometrical configurations of M = 4, …, 6 omni cameras were manually generated, along with a number of point pairs (N = 20, …, 40) corresponding to a walking person. Fig. 4 shows the plan views of a number of different geometrical configurations. A number of simulations (N_sim = 100) were carried out for each configuration by adding Gaussian random noise with a standard deviation of σ_pix = 3 pixels to the computed pixel coordinates. The camera positions and orientations (green), and the locations of the points (orange), were estimated from the noisy pixel coordinates and plotted for all simulations. It is seen that the estimated parameters are quite close to the ground truth (red, blue) in all the simulations and mostly fit within the 3σ covariance ellipses. It can also be observed that some configurations, such as (c) with collinear cameras, are intrinsically more sensitive to errors than others.
In the second experiment, the configuration was fixed but the pixel noise standard deviation was varied (σ_pix = 5, …, 11 pixels), and the behavior of the algorithm was observed with N_sim = 300 simulations in each case. Fig. 5 shows the results of the simulations. It is seen that up to about σ_pix = 7 pixels, the
estimated locations converge near the true locations (within the covariance ellipses), but for larger σ_pix there are cases where the estimated location is far from the true location, showing the breakdown of the algorithm.
In the third experiment, a particular 2D configuration was used, and N_tilt sets of tilt angles (\phi_i^x, \phi_i^y) were generated by sampling from a uniform distribution on a circle with radius L_tilt = 5°, …, 30°. The parameters estimated from the noisy pixels were compared with the ground truth in each case. Fig. 6 shows the result of the comparisons. Note that the error
covariances show small differences between configurations due to the different inclinations of the camera axes. However, it is seen that the algorithm converges to the correct solution even with tilts as large as L_tilt = 30°.
Fig. 7. (a) Images from a reconfigurable array of four omni cameras assembled in a hallway. A person moving around is detected in all four cameras by background image subtraction. The positions of the top and bottom of the person in this frame are extracted. (b) Result of the estimation algorithm (blue ×) and ground truth (red ○) using the person's position in this and other frames. Ellipses denote the estimated covariances.
4.2. Experiments with real data
In order to use the calibration algorithm in conjunction with person tracking and event analysis applications [3], the calibration approach was tested with video sequences from
multiple omni cameras in various conditions, as described below.
In the first experiment, a reconfigurable array of four omni cameras was placed in a hallway as shown in Fig. 1. Fig. 7(a) shows a typical snapshot of the moving person in all four cameras. Background subtraction was used to detect the moving person. The frames in which the coordinates of the bottom and top of the person were correct were manually selected and given to the estimation algorithm. The estimated positions and orientations of the cameras, as well as the positions of the tracked points, are shown in Fig. 7(b). Ground truth camera positions are also shown for comparison.
The second experiment was performed in an outdoor setting. However, due to poor lighting conditions, the positions of the person were marked manually in the images. Fig. 8 shows the results of this experiment. The error was considerably greater in this case. However, it was observed that most of the error was in the scale of the estimation, and the shapes of the estimated and ground truth positions matched better if scaled by a uniform factor. Possible reasons for this discrepancy could be a greater sensitivity in the estimation of scale than of orientation, a bias in the marking of the person's height, or inaccuracies in the internal parameters.
Fig. 8. (a) A network of four omni cameras assembled in an outdoor set-up with a different camera geometry. The positions of the person were manually marked. (b) Result of the estimation algorithm (blue ×) as well as the ground truth (red ○). (c) Result after scaling the estimated positions to align cameras 1 and 2. Alignment for cameras 3 and 4 is much better now.
The third set-up was in a garage setting. Fig. 9 shows the results of this experiment. Again, most of the error was observed in the scale factor, and after compensating for it, the errors were reduced to a large extent.
Fig. 9. (a) A network of four omni cameras assembled in a garage set-up. The positions of the person in a number of images are shown. (b) Result of the estimation algorithm (blue) as well as the ground truth (red). (c) Result after scaling the estimated positions to align cameras 1 and 2. Alignment for cameras 3 and 4 is much better now.
The fourth set-up used four omnidirectional cameras mounted on a table for real-time person tracking as described in [3]. However, in this case the walking person was not fully visible, as seen in Fig. 10(a); therefore, the localization in the longitudinal direction was very inaccurate. Hence, the anisotropic noise model described in Eq. (31) was used for the bundle adjustment. Fig. 10(b) shows the computed and ground truth camera positions along with the computed positions of the person. Due to the longitudinal error, there is a very large difference between the computed and the ground truth camera positions. However, here too it was observed that the difference
was mainly in the scale factor. After compensating for the scale factor, the ground truth positions and the computed positions are much closer, almost within the covariance ellipses, in spite of the large longitudinal errors in pixel locations.
5. Concluding remarks
This paper described a method for calibrating reconfigurable arrays of omnidirectional cameras using the image positions of a 1D object, such as a moving person, in all the cameras. Matrix factorization in the presence of missing elements, followed by non-linear optimization, was used to determine the camera parameters as well as the positions of the moving person in multiple frames.
Experimental results using simulated and real data show that the method is effective in obtaining a calibration of reasonable accuracy, at least up to a scale factor. The estimation of the scale factor is, however, less accurate. Possible explanations for this inaccuracy are: (1) longitudinal uncertainty in determining the image coordinates of the person, (2) errors in the internal calibration, (3) the presence of outliers, and (4) bias due to the solution of non-linear least squares with large observation errors. The performance of the algorithm is good for low to moderate noise but degrades as the noise increases. The use of vanishing point estimation to compensate for pan and tilt enables the method to be used with significantly non-vertical camera axes. However, according to the geometry of the approach, the person needs to be in a vertical pose and the ground needs to be horizontal.
Self-calibration using a point object, such as a bulb waved around the cameras, can yield good accuracy, as illustrated by [11]. However, that approach is more suited to controlled environments with low or moderate lighting. In many applications with bright lighting and background clutter, it may be difficult to identify a point object in the images. In such cases, a linear object such as a walking person may be useful for obtaining reasonably accurate calibration. In addition, it would be interesting to integrate this calibration procedure with tracking in order to find the trajectories of the walking person.
Fig. 10. (a) A network of four omni cameras mounted on a table top. The positions of the person in a number of images are shown. Note that the person is not fully visible in all images, and therefore the longitudinal localization error is high. (b) Due to the longitudinal error, there is a large difference between the estimated (blue) and ground truth positions (red). (c) Result after scaling the estimated positions to align cameras 1 and 2. Alignment for cameras 3 and 4 is much better now, in spite of the large longitudinal pixel errors.
If greater accuracy is required, one can use surveying poles standing vertically with distinctive marks at K different heights. The approach can also be used for the integrated calibration of rectilinear cameras along with omni cameras. For rectilinear cameras, the camera axis is likely to deviate from vertical, and the vanishing point method described here and in [6] to determine the axis orientation would be especially useful. For handling outliers, the approaches used by Martinec and Pajdla [26] would be useful in the factorization. Currently, the internal calibration of the omni camera is predetermined using features with known world coordinates. However, we would like to study whether it is feasible to estimate the internal camera parameters simultaneously with sufficient accuracy. Finally, we plan to integrate the calibration with the person tracking and event analysis systems shown in Fig. 1 for real world deployment of sensor networks containing omni as well as rectilinear cameras.
Acknowledgements
We thank the reviewers for their careful reading of the earlier version and for insightful comments, which have allowed us to improve the paper. We are thankful for the grant awarded by the Technical Support Working Group (TSWG) of the US Department of Defense, which provided the primary sponsorship of the reported research. We also thank our colleagues from the UCSD Computer Vision and Robotics Research Laboratory for their contributions and support.
References
[1] S. Baker, S.K. Nayar, A theory of catadioptric image formation, in:
ICCV98, 1998, pp. 35–42.
[2] R. Benosman, S.B. Kang, Panoramic Vision: Sensors, Theory, and
Applications, Springer, Berlin, 2001.
[3] K.C. Huang, M.M. Trivedi, Video arrays for real-time tracking of persons,
head and face in an intelligent room, Machine Vision and Applications 14
(2) (2003) 103–111.
[4] T. Sogo, H. Ishiguro, M. Trivedi, N-ocular stereo for real-time human
tracking, in: R. Benosman, S. Kang (Eds.), Panoramic Vision: Sensors,
Theory, and Applications, Springer, Berlin, 2001, pp. 359–376.
[5] L. Lee, B. Romano, G. Stein, Monitoring activities from multiple video
streams: establishing a common coordinate frame, PAMI 22 (8) (2000)
758–767.
[6] F. Lv, T. Zhao, R. Nevatia, Self-calibration of a camera from video of a
walking human, in: Int. Conf. on Pattern Recognition 2002, pp. I:
562–567.
[7] C. Tomasi, T. Kanade, Shape and motion from image streams under
orthography: A factorization method, International Journal of Computer
Vision 9 (2) (1992) 137–154.
[8] C. Poelman, T. Kanade, A paraperspective factorization method for shape
and motion recovery, PAMI 19 (3) (1997) 206–218.
[9] D. Forsyth, J. Ponce, Computer Vision: A Modern Approach, Prentice-
Hall, Upper Saddle River, NJ, 2003.
[10] P. Sturm, B. Triggs, A factorization-based algorithm for multi-image projective structure and motion, in: Proceedings of the European Conference on Computer Vision, 1996, pp. 709–720.
[11] T. Svoboda, D. Martinec, T. Pajdla, A convenient multi-camera self-calibration for virtual environments, Presence: Teleoperators and Virtual Environments 14 (4) (2005) 407–422.
[12] D. Martinec, T. Pajdla, Structure from many perspective images with
occlusions, in: Proceedings of the European Conference on Computer
Vision (ECCV), Springer, Berlin, 2002, pp. 355–369.
[13] X. Chen, J. Davis, P. Slusallek, Wide area camera calibration using virtual
calibration objects, in: IEEE Conference on Computer Vision and Pattern
Recognition, 2000, pp. 2520–2527.
[14] J. Barreto, K. Daniilidis, Wide area multiple camera calibration and
estimation of radial distortion, in: Proceedings of the fifth Workshop on
Omnidirectional Vision, Camera Networks and Non-classical cameras,
2004.
[15] F. Pedersini, A. Sarti, S. Tubaro, Accurate and simple geometric
calibration of multi-camera systems, Signal Processing 77 (1999)
309–334.
[16] M. Antone, S. Teller, Scalable extrinsic calibration of omnidirectional
image networks, International Journal on Computer Vision 49 (2–3)
(2002) 143–174.
[17] D. Aliaga, Accurate catadioptric calibration for real-time pose estimation
in room-size environments, in: IEEE International Conference on
Computer Vision, vol. 1, 2001, pp. 127–134.
[18] Z. Zhang, Camera calibration with one-dimensional objects, in:
A. Heyden et al. (Ed.), European Conference on Computer Vision,
LNCS 2353, Springer, Berlin, Heidelberg, 2002, pp. 161–174.
[19] O. Faugeras, Three-Dimensional Computer Vision: A Geometric View-
point, The MIT Press, Cambridge, MA, 1993.
[20] Z. Zhang, A flexible new technique for camera calibration, Pattern
Analysis and Machine Intelligence 22 (11) (2000) 1330–1334.
[21] T. Gandhi, M.M. Trivedi, Calibration of a reconfigurable array of
omnidirectional cameras using a moving person, in: Second ACM
International Workshop on Video Surveillance and Sensor Networks,
New York, 2004, pp. 12–19.
[22] D. Jacobs, Linear fitting with missing data: applications to structure from
motion and to characterizing intensity images, in: IEEE Conference on
Computer Vision and Pattern Recognition, 1997, pp. 206–212.
[23] T. Svoboda, D. Martinec, T. Pajdla, J.-Y. Bouguet, T. Werner, O. Chum, Multi-camera self-calibration, Czech Technical University, Prague, Czech Republic. Available from: http://cmp.felk.cvut.cz/~svoboda/SelfCal/
[24] A.W. Fitzgibbon, M. Pilu, R.B. Fisher, Direct least-squares fitting of
ellipses, IEEE Transactions on Pattern Analysis and Machine Intelligence
21 (5) (1999) 476–480.
[25] R.I. Hartley, In defence of the eight-point algorithm, IEEE Transactions
on Pattern Analysis and Machine Intelligence 19 (6) (1997) 580–593.
[26] D. Martinec, T. Pajdla, Automatic factorization-based reconstruction
from perspective images with occlusions and outliers, in: Proceedings of
the eighth Computer Vision Winter Workshop (CVWW), Czech Pattern
Recognition Society, Prague, 2003, pp. 147–152.