Pose Optimization in Edge Distance Field for Textureless 3D Object Tracking
Bin Wang
School of Computer Science and Technology, Shandong University

Fan Zhong
School of Computer Science and Technology, Shandong University

Xueying Qin
School of Computer Science and Technology, Shandong University
ABSTRACT
This paper presents a monocular model-based 3D tracking approach for textureless objects. Instead of explicitly searching for 3D-2D correspondences as previous methods do, which unavoidably generates individual outlier matches, we aim to minimize the holistic distance between the predicted object contour and the query image edges. We propose a method that directly solves for the 3D pose parameters in an unsegmented edge distance field. We derive the differentials of the edge matching distance with respect to the pose parameters, and search for the optimal 3D pose parameters using standard gradient-based non-linear optimization techniques. To avoid being trapped in local minima and to deal with potential large inter-frame motions, a particle filtering process with a first-order autoregressive state dynamics is exploited. Occlusions are handled by a robust estimator. The effectiveness of our approach is demonstrated by comparative experiments on real image sequences with occlusions, large motions and cluttered backgrounds.
CCS CONCEPTS
• Computing methodologies → Mixed / augmented reality; Tracking;
KEYWORDS
3D tracking, Pose optimization, Distance field, Particle filter
ACM Reference format:
Bin Wang, Fan Zhong, and Xueying Qin. 2017. Pose Optimization in Edge Distance Field for Textureless 3D Object Tracking. In Proceedings of CGI '17, Yokohama, Japan, June 27-30, 2017, 6 pages.
https://doi.org/10.1145/3095140.3095172
1 INTRODUCTION
3D object tracking is a fundamental computer vision task with a variety of applications in augmented reality and robotics.
3D tracking systems are expected to estimate the six degrees of freedom (6DoF) pose parameters of an object relative to the camera in unknown and dynamic environments.
Thanks to robust keypoint extractors and descriptors [1, 14, 19], keypoint-based 3D tracking methods [13, 17, 23] have been proposed over the last decades. Although these methods achieve impressive performance for textured objects, they are not applicable to textureless objects due to the lack of reliable feature matches.
For textureless objects, edges or contours are the vital visual cue that can be detected in most situations, and they are therefore exploited by edge-based 3D tracking methods. RAPID [7] is the first edge-based 3D tracker; it projects sampled 3D model edge points to the 2D image and aligns the projected edge points with image edge points. A 1D search for the image edge point is performed at each projected edge point along the direction perpendicular to the projected model edge, and the 2D pixel position with maximum gradient is taken as the correspondence of the sampled 3D model point. Several improvements [5, 15, 20, 22, 24, 25] have since been proposed for better 3D-2D correspondences. These methods are shown to be effective in some situations. However, to obtain 3D-2D correspondences, all of them perform a 1D local search perpendicular to the object contour within a limited extent. Since individual contour points are usually indistinctive, incorrect 3D-2D correspondences are unavoidable, especially in the more complicated cases of occlusions, large inter-frame motions and background clutter. Moreover, the search extent is difficult to determine: a large extent may produce more incorrect correspondences, while a small extent makes tracking sensitive to large inter-frame motions.
In this paper, we propose an edge-based 3D tracking approach without explicit 3D-2D correspondences. We formulate 3D object tracking as a contour matching problem by fitting the 3D object contour to the image edge distance field. The distance between the predicted object contour and the query image edges in the distance field is minimized by direct optimization of the 3D pose parameters. The differentials of this energy with respect to the pose parameters are derived, and the pose parameters are optimized iteratively with the Levenberg-Marquardt (L-M) algorithm. The image edges are extracted by an edge detector, with no need for object segmentation or edge filtering. Cluttered backgrounds can be handled thanks to the holistic matching energy function. For better tracking performance, a particle filtering process
with a first-order autoregressive state dynamics is exploited to deal with potential large inter-frame motions, and a robust estimator is adopted to handle occlusions. Comparative experiments demonstrate that the proposed method is effective on real image sequences with occlusions, large motions and cluttered backgrounds.
2 RELATED WORK
The literature on 3D object tracking is vast. Given a 3D model of the target, 3D-2D correspondences between 3D features of the model and 2D measurements in the image are exploited for 3D tracking. According to the type of 2D measurements, correspondence-based methods are classified as fiducial-based [10], keypoint-based [13, 17, 23] and edge-based [5, 7, 11, 15, 25]. We refer the reader to [12, 16] for more details. Here we restrict ourselves to monocular edge-based methods with a 3D model of the target available.
As the vital visual cue of textureless objects, edges or contours are employed by edge-based trackers. To construct 3D-2D correspondences, [5, 15] adopted a precomputed convolution kernel tuned to the contour orientation, matching only image edge points whose orientation is similar to the projected contour orientation rather than the point of maximum gradient along the scanline. [25] proposed multiple edge hypotheses, treating all local extrema of the gradient along the scanline as potential correspondences. Multiple hypotheses prevent a wrong gradient maximum from being assigned as a correspondence, but increase the computation cost. [20, 24] exploited local foreground and background region knowledge and the affinity of adjacent image edge points to search for the optimal correspondences. These improvements are impressive; however, robustness decreases when ambiguities between different edges occur in the scene. All of these methods assume that edge correspondences can be determined by a local search within a limited extent around a prior pose. If the prior pose is sufficiently incorrect, tracking is likely to fail, especially when the object moves fast.
To ensure a good prior pose, multiple pose hypotheses have been propagated using particle filters. Since Isard et al. [9] applied particle filters to 2D edge tracking, various edge-based 3D tracking methods have been implemented within a particle filter framework. [11] tracked complex 3D objects by utilizing the GPU to calculate edges and evaluate pose likelihoods. [4] employed keypoint correspondences for particle initialization, and then refined the estimated pose by explicitly aligning the projected model edges with the image edges using 3D-2D correspondences. Our approach is similar to [3] in that both use the edge distance field. [3] is a tracking-by-detection framework that uses the distance field for chamfer matching between offline 2D edge templates and the scene image, and the coarse pose of the matched template is used to initialize particles; it then employs a standard edge-based tracker to establish the correspondences and predict the final pose, whereas our approach directly optimizes the pose in the distance field. These methods achieve impressive results, especially for large inter-frame motions.
Figure 1: 3D contour matching. (a) A color image $I$ of the target (a CAT). (b) Canny edge map of $I$. (c) Projected 3D contour points are evolved under Equation 4 from a prior pose (in red) to the optimal pose (in green) in the edge distance field of $I$. (d) The optimal pose is visualized by the green wireframe overlaid on $I$.
3 POSE PARAMETERIZATION
3D tracking aims to estimate the 6DoF pose of an object relative to the camera, given the camera intrinsic matrix $K \in \mathbb{R}^{3\times 3}$, the image $I$ and the 3D model $M$. A 3D model point $X \in \mathbb{R}^3$ is projected to an image pixel $x \in \mathbb{R}^2$ using the standard pinhole camera model by

$$\tilde{x} = K \cdot [R(r) \,|\, t] \cdot \tilde{X} \quad (1)$$

where $t$ and $R(r)$ are respectively the translation vector and the rotation matrix parameterized by the Rodrigues rotation vector $r$, and $\tilde{x}$ and $\tilde{X}$ are respectively the homogeneous representations of $x$ and $X$. The 6DoF pose is parameterized by $p = (r, t)$ in this paper.
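For illustration, a minimal C++ sketch of this projection; OpenCV is an assumed dependency (the paper only specifies the pinhole model and the Rodrigues parameterization):

```cpp
// Sketch of the projection in Equation (1). cv::Rodrigues produces
// R(r) from the rotation vector r; the final division implements the
// mapping from homogeneous to non-homogeneous coordinates.
#include <opencv2/opencv.hpp>

cv::Point2d projectPoint(const cv::Vec3d& r, const cv::Vec3d& t,
                         const cv::Matx33d& K, const cv::Vec3d& X) {
    cv::Matx33d R;
    cv::Rodrigues(r, R);                    // rotation matrix R(r)
    cv::Vec3d xh = K * (R * X + t);         // homogeneous x~ = K [R|t] X~
    return {xh[0] / xh[2], xh[1] / xh[2]};  // non-homogeneous pixel x
}
```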
4 3D TRACKING IN EDGE DISTANCE FIELD
This section describes our 3D tracking approach in the edge distance field in detail. We begin with the energy function that defines contour matching in the distance field. Then, in order to overcome local minima in the 3D pose optimization, we introduce a particle filtering process with a first-order autoregressive state dynamics. Finally, the optimization is made more robust to occlusions by employing a robust estimator.
4.1 Pose optimization as 3D contour matching
We formulate the pose optimization as a contour matching process that fits the 3D contour points $\Phi$ to the edge distance field $D$ of the image $I$.
To generate $D$, we use the Canny edge detector [2] to extract the edge map, and then apply a fast distance transform [6] to it. For each image pixel $x$, $D(x)$ gives the distance to its nearest image edge point. Figure 1(a), 1(b), 1(c) illustrate the procedure of generating the edge distance field.
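As a concrete example, the distance field can be built in a few lines with OpenCV; the library choice and the Canny thresholds below are assumptions for illustration, not details given in the paper:

```cpp
// Sketch: building the edge distance field D from an image I.
// The paper cites the Canny detector [2] and the distance
// transform of [6]; OpenCV provides comparable implementations.
#include <opencv2/opencv.hpp>

cv::Mat buildEdgeDistanceField(const cv::Mat& imageBGR) {
    cv::Mat gray, edges, nonEdges, dist;
    cv::cvtColor(imageBGR, gray, cv::COLOR_BGR2GRAY);
    cv::Canny(gray, edges, 50.0, 150.0);   // hypothetical thresholds
    // distanceTransform measures distance to the nearest ZERO pixel,
    // so invert the edge map first: edge pixels become zero.
    cv::bitwise_not(edges, nonEdges);
    cv::distanceTransform(nonEdges, dist, cv::DIST_L2, 3);
    return dist;  // D(x): per-pixel distance to the nearest image edge
}
```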
Given a prior pose $p_c$ and the 3D model $M$, we can render the depth map and extract the 2D contour points on it. Then $\Phi$ can be easily obtained by back-projecting the 2D contour points to the 3D model $M$.
For a 3D contour point $X_i \in \Phi$, the matching cost $d_i$ is defined as follows:

$$d_i = D\left(\pi\left(K \cdot [R(r) \,|\, t] \cdot \tilde{X}_i\right)\right) \quad (2)$$
Figure 2: Particle filtering between two consecutive frames of the target (a CAT). We take $N = 10$ as an example. (a) $I_{t-1}$ with pose $p_{t-1}$. (b) $I_t$ with a relatively large motion from $I_{t-1}$. (c) The optimal pose estimated by our method, visualized by the green wireframe overlaid on the target. (d) The pose estimated with $p_{t-1}$ as the single prior pose; it clearly converged to a local minimum. (e) The prior particles $\mathcal{S}^-_t$ sampled from $\mathcal{S}^+_{t-1}$, visualized by red projected contours. (f) The particles $\mathcal{S}^+_t$ updated by Equation 4, visualized by green projected contours. (g) The projected 3D contour of the best particle is evolved by our method from a prior pose (in red) to the optimal pose (in green) in the distance field of $I_t$. (h) The projected 3D contour evolved from the single prior pose $p_{t-1}$ (in red) toward the optimal pose (in green); it got stuck in a local minimum.
where $\pi$ transforms homogeneous coordinates into their non-homogeneous representation. The whole matching cost $E$ between $\Phi$ and $D$ is therefore defined by the following objective energy function:

$$E(r, t) = \sum_{X_i \in \Phi} d_i^2. \quad (3)$$

Starting from the prior pose $p_c$, the optimal pose $p^*$ is calculated by iteratively minimizing Equation 3 with the L-M algorithm:

$$p^* = \arg\min_{r,t} E(r, t). \quad (4)$$
Figure 1(c) shows the evolution of the projected 3D contour points from $p_c$ to $p^*$ under Equation 4. Figure 1(d) shows the optimal pose $p^*$ by overlaying a green wireframe on the target.
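To make the energy concrete, a sketch of Equations 2 and 3 under the same assumed OpenCV setup; rounding to the nearest pixel when reading $D$ is our simplification, since the paper does not specify an interpolation scheme:

```cpp
// Sketch of Equations (2)-(3): project the 3D contour points Phi
// under pose (r, t) and accumulate squared distances read from D.
// cv::projectPoints implements the projection of Equation (1).
#include <opencv2/opencv.hpp>
#include <vector>

double matchingEnergy(const std::vector<cv::Point3f>& Phi,
                      const cv::Vec3d& r, const cv::Vec3d& t,
                      const cv::Matx33d& K, const cv::Mat& D) {
    std::vector<cv::Point2f> x;
    cv::projectPoints(Phi, r, t, K, cv::noArray(), x);
    double E = 0.0;
    for (const auto& xi : x) {
        int u = cvRound(xi.x), v = cvRound(xi.y);
        if (u < 0 || v < 0 || u >= D.cols || v >= D.rows) continue;
        double di = D.at<float>(v, u);  // d_i = D(pi(K [R|t] X~_i))
        E += di * di;                   // E(r, t) = sum of d_i^2
    }
    return E;
}
```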
We can differentiate Equation 3 with respect to the pose parameters $p$ to get the Jacobian required by L-M:

$$J = \sum_i \frac{\partial d_i}{\partial x_i} \cdot \frac{\partial x_i}{\partial p} \quad (5)$$

where $x_i$ is the projection of $X_i$. The differential $\frac{\partial d_i}{\partial x_i} \in \mathbb{R}^{1\times 2}$ can be computed using centered finite differences in $D$, and $\frac{\partial x_i}{\partial p} \in \mathbb{R}^{2\times 6}$ can be derived analytically from Equation 1.
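A sketch of the centered finite difference for $\partial d_i / \partial x_i$, assuming the distance field is stored as a 32-bit float OpenCV matrix:

```cpp
// Centered finite difference of D at a projected point (u, v),
// giving the 1x2 differential used in Equation (5).
#include <algorithm>
#include <opencv2/opencv.hpp>

cv::Vec2f distanceFieldGradient(const cv::Mat& D, int u, int v) {
    // Clamp the stencil at the image border.
    int u0 = std::max(u - 1, 0), u1 = std::min(u + 1, D.cols - 1);
    int v0 = std::max(v - 1, 0), v1 = std::min(v + 1, D.rows - 1);
    float du = 0.5f * (D.at<float>(v, u1) - D.at<float>(v, u0));
    float dv = 0.5f * (D.at<float>(v1, u) - D.at<float>(v0, u));
    return {du, dv};  // (dD/du, dD/dv) in pixels
}
```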
4.2 Particle filtering
Generally, 3D tracking starts from a prior pose at frame $t$. Many edge-based tracking methods [5, 15, 20, 24] initialize it with the estimated pose $p_{t-1}$ of frame $t-1$ under a small inter-frame motion assumption. If large inter-frame motions occur, these single-hypothesis methods inevitably fail because the initial pose is not close to the global minimum. In this paper we exploit a particle filtering framework with a first-order autoregressive dynamical model to deal with large inter-frame motions. Figure 2(a), 2(b) show two consecutive frames with a relatively large inter-frame motion. Figure 2(c) shows the pose optimized with our particle filtering method, in contrast to the pose estimated using only the single prior pose from the previous frame in Figure 2(d).
In our particle filtering framework, the posterior distribution (denoted $+$) at frame $t-1$ is represented as a set of $N$ particles $\mathcal{S}^+_{t-1}$ with associated normalized weights $\Pi^+_{t-1}$:

$$\mathcal{S}^+_{t-1} = \{p^{(0)}_{t-1}, \ldots, p^{(N-1)}_{t-1}\}, \quad \Pi^+_{t-1} = \{\pi^{(0)}_{t-1}, \ldots, \pi^{(N-1)}_{t-1}\} \quad (6)$$

where the particle $p^{(i)}_{t-1}$ is the $i$th sample in the 6DoF pose space and $\pi^{(i)}_{t-1}$ is its associated weight. For the next frame $t$, particles $\mathcal{S}^-_t$ are resampled according to the weights $\Pi^+_{t-1}$ and transited by a motion model to form the prior distribution (denoted $-$) of frame $t$:

$$\mathcal{S}^-_t = \{p^{(0)}_t, \ldots, p^{(N-1)}_t\}, \quad \Pi^-_t = \{1/N, \ldots, 1/N\} \quad (7)$$

where $\Pi^-_t$ assigns each particle a uniform weight. $\mathcal{S}^-_t$ is updated to $\mathcal{S}^+_t$ by the 3D contour matching described in Section 4.1, and $\Pi^+_t$ is evaluated according to the contour matching cost.
For each particle $p^{(i)}_t \in \mathcal{S}^-_t$ at frame $t$, the transition is processed as

$$p^{(i)}_t = p^{(i)}_{t-1} + \lambda_v v^{(i)}_{t-1} + \lambda_n n^{(i)}_t, \quad v^{(i)}_{t-1} = p^{(i)}_{t-1} - p^{(i)}_{t-2} \quad (8)$$

where $v^{(i)}_{t-1}$ denotes the velocity of the $i$th particle between $p^{(i)}_{t-1} \in \mathcal{S}^+_{t-1}$ and $p^{(i)}_{t-2} \in \mathcal{S}^+_{t-2}$, and $n^{(i)}_t \in \mathbb{R}^6$ is Gaussian noise drawn from $\mathcal{N}(0, \Sigma)$ with zero mean and covariance $\Sigma \in \mathbb{R}^{6\times 6}$. $\lambda_v$ and $\lambda_n$ are weights balancing the autoregressive motion and the random motion, respectively.
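A minimal sketch of this transition; the pose particle is treated as a plain 6-vector, and for simplicity the diagonal entries of $\Sigma$ are used as per-dimension standard deviations (an assumption for illustration):

```cpp
// Sketch of the first-order autoregressive transition in Eq. (8).
// A pose particle is the 6-vector p = (r, t) of Section 3.
#include <array>
#include <random>

using Pose = std::array<double, 6>;

Pose transitParticle(const Pose& prev, const Pose& prevPrev,
                     double lambdaV, double lambdaN,
                     const std::array<double, 6>& sigmaDiag,
                     std::mt19937& rng) {
    Pose next;
    for (int k = 0; k < 6; ++k) {
        std::normal_distribution<double> noise(0.0, sigmaDiag[k]);
        const double velocity = prev[k] - prevPrev[k];  // v_{t-1}
        next[k] = prev[k] + lambdaV * velocity + lambdaN * noise(rng);
    }
    return next;
}
```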
Once each particle has been transited, it is used as the initial pose in Equation 3. The update from $p^{(i)}_t \in \mathcal{S}^-_t$ to $p^{(i)}_t \in \mathcal{S}^+_t$ is accomplished by optimizing Equation 3. Figure 2(e), 2(f) illustrate the particles updated from $\mathcal{S}^-_t$ to $\mathcal{S}^+_t$, and Figure 2(g) shows the evolution of the best particle, in contrast to the evolution using a single prior pose given in Figure 2(h). We denote by $d^{(i)}_t$ the matching cost of $p^{(i)}_t$, i.e. the residual after optimization; the corresponding weight $\pi^{(i)}_t$ of $p^{(i)}_t \in \mathcal{S}^+_t$ is then evaluated from the residual $d^{(i)}_t$ as

$$\pi^{(i)}_t = \exp\left(-\frac{d^{(i)}_t}{\sigma_d}\right) \quad (9)$$
where the positive $\sigma_d$ is a control parameter for scaling the residual. After all the particles have been updated, the weight $\pi^{(i)}_t \in \Pi^+_t$ of each particle $p^{(i)}_t \in \mathcal{S}^+_t$ is normalized by

$$\pi^{(i)}_t = \frac{\pi^{(i)}_t}{\sum_{j=1}^{N} \pi^{(j)}_t}. \quad (10)$$

We take the particle $p^{(i)}_t \in \mathcal{S}^+_t$ with the highest weight as the optimal pose at frame $t$.
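Equations 9 and 10 amount to a softmax-like weighting of the residuals, sketched below with $\sigma_d$ passed in as a parameter (Section 4.4 sets it to $|\Phi|$):

```cpp
// Sketch of Eqs. (9)-(10): map each residual d_t^(i) to a weight,
// then normalize so the weights sum to one.
#include <cmath>
#include <vector>

std::vector<double> evaluateWeights(const std::vector<double>& residuals,
                                    double sigmaD) {
    std::vector<double> w(residuals.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) {
        w[i] = std::exp(-residuals[i] / sigmaD);  // Eq. (9)
        sum += w[i];
    }
    for (double& wi : w) wi /= sum;               // Eq. (10)
    return w;
}
```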
CGI β17, June 27-30, 2017, Yokohama, Japan B. Wang et al.
When updating is done, we obtain the posterior distribution at frame $t$, which is used to generate the prior distribution at the next frame $t+1$ by importance resampling: each particle $p^{(i)}_{t+1}$ in the prior particle set $\mathcal{S}^-_{t+1}$ is randomly drawn from $\mathcal{S}^+_t$ according to the weights $\Pi^+_t$. After resampling, the next particle filtering step can start.
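Importance resampling can be sketched with std::discrete_distribution, which draws index $i$ with probability proportional to $\pi^{(i)}_t$ (a minimal sketch, reusing the Pose alias from the transition sketch):

```cpp
// Sketch: resampling S_t^+ into S_{t+1}^- using the normalized
// weights of Pi_t^+.
#include <array>
#include <random>
#include <vector>

using Pose = std::array<double, 6>;

std::vector<Pose> resampleParticles(const std::vector<Pose>& posterior,
                                    const std::vector<double>& weights,
                                    std::mt19937& rng) {
    std::discrete_distribution<int> pick(weights.begin(), weights.end());
    std::vector<Pose> prior;
    prior.reserve(posterior.size());
    for (std::size_t i = 0; i < posterior.size(); ++i)
        prior.push_back(posterior[pick(rng)]);  // draw with replacement
    return prior;
}
```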
4.3 Occlusion handling
We assume that occluded 3D contour points tend to have a large distance to the nearest image edge. The simple quadratic error in Equation 3 is sensitive to such occluded contour points. Therefore, a weight function $w$ can be incorporated by generalizing Equation 3:

$$E(r, t) = \sum_i w(d_i) \, d_i^2. \quad (11)$$
We can still apply the L-M algorithm to solve the resulting iteratively re-weighted least-squares (IRLS) problem. In order to strongly suppress the occluded 3D contour points by assigning them zero weights, in this paper we choose the Tukey estimator:

$$w(d) = \begin{cases} \left[1 - (d/\tau)^2\right]^2 & \text{if } |d| \le \tau \\ 0 & \text{otherwise} \end{cases} \quad (12)$$

where $\tau$ is the maximum valid distance from a projected 3D contour point to the nearest image edge.
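The Tukey weight of Equation 12 takes only a few lines; $\tau$ is the cutoff set to 20 pixels in Section 4.4:

```cpp
// Sketch of the Tukey weight in Equation (12).
#include <cmath>

double tukeyWeight(double d, double tau) {
    if (std::abs(d) > tau) return 0.0;  // likely occluded: zero weight
    const double s = 1.0 - (d / tau) * (d / tau);
    return s * s;
}
```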
4.4 Implementation details
This section elaborates the complete framework of our approach and the details of the parameter settings. Given an image sequence $\mathcal{I}$, the camera intrinsic matrix $K$ and an initial pose $p_{init}$, our 3D tracking approach estimates the pose $p_t$ at each frame $t$ with the 3D model $M$. The framework of our approach is summarized in Algorithm 1.
Algorithm 1: 3D Tracking in Edge Distance Field
Input: $\mathcal{I} = \{I_0, I_1, \cdots, I_{n-1}\}$, $M$, $K$, $p_{init}$
Param: $N$, $\tau$, $\Sigma$, $\lambda_v$, $\lambda_n$, $\sigma_d$
Output: $\mathcal{P} = \{p_0, p_1, \cdots, p_{n-1}\}$
1  for $t \leftarrow 0$ to $n-1$ do
2    $E^{canny}_t \leftarrow$ Canny($I_t$)
3    $D_t \leftarrow$ DistanceTransform($E^{canny}_t$)
4    if $t = 0$ then
5      $\mathcal{S}^-_t \leftarrow$ InitializeParticles($N$, $p_{init}$)
6      $\Pi^-_t \leftarrow$ InitializeWeightsUniform($N$)
7    $\mathcal{S}^-_t \leftarrow$ TransitParticles($\mathcal{S}^-_t$, $\lambda_v$, $\lambda_n$, $\Sigma$)
8    for $i \leftarrow 0$ to $N-1$ do
9      $\Phi_i \leftarrow$ Extract3DContour($K$, $p^{(i)}_t \in \mathcal{S}^-_t$, $M$)
10     ($p^{(i)}_t \in \mathcal{S}^+_t$, $d^{(i)}_t$) $\leftarrow$ IRLS($D_t$, $\Phi_i$, $\tau$, $\Sigma$)
11     $\pi^{(i)}_t \in \Pi^+_t \leftarrow$ EvaluateWeights($d^{(i)}_t$, $\sigma_d$)
12   $\Pi^+_t \leftarrow$ NormalizeWeights($\Pi^+_t$)
13   $p_t \in \mathcal{P} \leftarrow$ SelectBestParticle($\mathcal{S}^+_t$, $\Pi^+_t$)
14   $\mathcal{S}^-_{t+1} \leftarrow$ ResampleParticles($\mathcal{S}^+_t$, $\Pi^+_t$, $N$)
15 return $\mathcal{P}$
Figure 3: Tracking results showing the effectiveness of particle filtering. The rows show the tracking results with 1, 10 and 100 particles, respectively. Our method with 100 particles works well; the others fail.
To minimize Equation 11, the L-M algorithm is employed and terminated after a maximum number of iterations (100). The number of particles $N$ is set to a value from $\{1, 10, 100\}$ so as to evaluate the efficiency and effectiveness of the particle filtering. The covariance matrix $\Sigma \in \mathbb{R}^{6\times 6}$ in Equation 8 is diagonal, with $\mathrm{diag}(\Sigma) = (0.1, 0.1, 0.1, 0.1, 0.1, 0.1)$. In Equation 8, $\lambda_v$ is 0.1 and $\lambda_n$ is 1. $\sigma_d$ in Equation 9 is set to the number of 3D contour points (i.e., $|\Phi|$) to normalize the matching residual. The threshold $\tau$ of Equation 12 is set to 20.
5 EXPERIMENTS
We validate our method with several comparative experiments. First, we demonstrate the effectiveness of particle filtering and occlusion handling. Then we compare our method with the pixel-wise tracker PWP3D [18] and GOS [24], a state-of-the-art tracker with a single pose hypothesis. Finally, we adopt the marker-based tracker [10] as the baseline and compare our method against it for quantitative evaluation.
Our system is implemented in C++ and runs on an Intel i5 CPU with 8 GB RAM. The test sequences are captured by a camera at 640×480 resolution, and both the object and the camera are movable.
Many edge-based methods assume that the motion of the camera or object is small and smooth, so that the prior pose is close to the global minimum; if large inter-frame motions occur, tracking fails and the estimated pose converges to a local minimum. To deal with large inter-frame motions, we track the object under fast camera or object motion with 1, 10 and 100 particles, respectively. Figure 3 gives the comparison of 1, 10 and 100 particles in the case of fast camera motion; with 100 particles we track the BUNNY even with a bit of motion blur.
Most edge-based methods construct 3D-2D edge correspondences explicitly. When the target object is occluded, they find wrong edge correspondences, or cannot find edge correspondences at all. As described in Section 4.3, we employ the Tukey estimator to suppress the importance of occluded contour points, so our method can work if the occlusion is not very severe.
Figure 4: Tracking results showing the effectivenessof occlusion handling. The DUCK is tracked using 10particles when it is occluded by a card or a hand.
Figure 5: Comparison of our method ($N = 100$) with PWP3D and GOS. The first row shows the tracking results of our method ($N = 100$). The second row shows the results of PWP3D, whose tracking drifted due to fast camera motion and the white background. The third row shows the results of GOS, where the edge correspondences of the white BUNNY are disturbed by white edges from the background.
Figure 4 shows that the DUCK is tracked successfully using 10 particles with occlusion handling when it is occluded by a card or a hand.
We compare our method with the pixel-wise tracker PWP3D [18]. PWP3D proposed a probabilistic framework for simultaneous 2D image segmentation and 3D object tracking without explicitly building 3D-2D correspondences. It employs the statistical color information of foreground and background based on a prior pose, so tracking drifts when the target object has color statistics similar to the environment. Figure 5 compares the tracking results of PWP3D (second row) with our method ($N = 100$, first row); PWP3D drifted due to the fast camera motion and the white background.
As a representative edge-based tracker, GOS [24] constructs 3D-2D edge correspondences explicitly. The image edge correspondences are determined by a 1D local search with a limited extent based on a prior pose. Although GOS exploits the region knowledge around each edge point and the affinity of adjacent image edge points, it still suffers from erroneous correspondences caused by similar edges in the background. Figure 5 compares the tracking results of GOS (third row) and our method ($N = 100$, first row); for the GOS tracker, the edge correspondences of the white BUNNY are disturbed by white edges from the background.
Table 1: Performance evaluation on 4 sequences. R: rotation error, T: translation error, AD: average distance.

| Seq (#frames) | Method | Time (ms) | R (°) | T (cm) | AD (cm) |
|---|---|---|---|---|---|
| BUNNY (1657) | PWP3D | 79.8 | 201.8 | 10.2 | 11.9 |
| | GOS | 148.6 | 5.7 | 1.0 | 0.9 |
| | Our method ($N=1$) | 16.0 | 8.0 | 1.9 | 2.0 |
| | Our method ($N=10$) | 90.7 | 2.9 | 1.6 | 1.5 |
| | Our method ($N=100$) | 934.5 | 2.4 | 1.6 | 1.3 |
| CAT (1427) | PWP3D | 90.0 | 208.8 | 6.0 | 7.9 |
| | GOS | 154.7 | 1.8 | 1.3 | 1.3 |
| | Our method ($N=1$) | 16.7 | 3.6 | 1.9 | 1.9 |
| | Our method ($N=10$) | 110.6 | 1.5 | 2.0 | 2.0 |
| | Our method ($N=100$) | 1015.8 | 1.3 | 1.9 | 1.9 |
| DUCK (1677) | PWP3D | 73.5 | 189.2 | 7.8 | 9.1 |
| | GOS | 143.8 | 75.8 | 3.2 | 3.5 |
| | Our method ($N=1$) | 15.9 | 6.5 | 1.9 | 2.0 |
| | Our method ($N=10$) | 104.5 | 6.5 | 1.8 | 1.9 |
| | Our method ($N=100$) | 960.9 | 6.2 | 1.8 | 2.0 |
| LEGO (1466) | PWP3D | 52.3 | 235.8 | 16.9 | 22.6 |
| | GOS | 147.6 | 16.2 | 3.5 | 3.3 |
| | Our method ($N=1$) | 15.9 | 9.3 | 1.5 | 1.9 |
| | Our method ($N=10$) | 87.4 | 4.6 | 1.5 | 1.7 |
| | Our method ($N=100$) | 830.7 | 4.4 | 1.3 | 1.7 |
| AVG (1556) | PWP3D | 73.9 | 208.9 | 10.2 | 12.9 |
| | GOS | 148.7 | 24.9 | 2.3 | 2.3 |
| | Our method ($N=1$) | 16.1 | 6.9 | 1.8 | 2.0 |
| | Our method ($N=10$) | 98.3 | 3.9 | 1.7 | 1.8 |
| | Our method ($N=100$) | 935.5 | 3.6 | 1.7 | 1.7 |
To evaluate the tracking accuracy and time performance of our method, we adopt the marker-based tracking method [10] as the baseline. The coordinate system of the object is predefined and fixed relative to the marker, so the ground-truth pose of the object can be transformed from the marker. We captured 4 sequences, for the BUNNY, CAT, DUCK and LEGO respectively, with a hand-held camera, as Figure 6 shows. Table 1 gives the accuracy and time performance of our method; we use two accuracy criteria: rotation and translation error [21], and average distance [8]. The proposed method runs on average at 62 fps with 1 particle, 10 fps with 10 particles and 1 fps with 100 particles. With 100 particles, it achieves on average 3.6° rotation error, 1.7 cm translation error, and 1.7 cm average distance between all vertices of the 3D model under the estimated pose and the ground-truth pose.
6 LIMITATIONS AND FUTURE WORK
Our method directly optimizes the pose parameters in the edge distance field, so it depends on the edge map of the query image. When the color of the background is similar to the object, or severe motion blur occurs in the scene, we cannot get adequate contours of the target, and our method fails because it exploits only the contour information. In future work, we will consider the inner pixel information of the object to enhance the tracking robustness.
For symmetric objects, multiple poses may result in the same contour, which makes it ambiguous for our method to estimate the pose correctly. Moreover, the computation cost is a critical problem for fast object tracking with massive numbers of particles.
Figure 6: Tracking results by our method ($N = 100$) for BUNNY, CAT, DUCK and LEGO.
We will speed up the particle filtering using GPU techniques.
7 CONCLUSIONS
This paper proposes a monocular model-based 3D tracking approach for textureless objects. We minimize the holistic distance between the predicted object contour and the query image edges in the distance field via direct optimization of the 3D pose parameters. We derive the differentials of this energy with respect to the pose parameters and search for the optimal pose parameters using the L-M algorithm. We employ a particle filtering framework to avoid being trapped in local minima, and occlusions are handled by a robust estimator. We demonstrated the effectiveness of our method using comparative experiments on real image sequences with occlusions, large motions and cluttered backgrounds.
ACKNOWLEDGMENTS
The authors gratefully acknowledge the anonymous reviewers for their comments, which helped us improve the paper. This work is supported by the National Key Research and Development Program of China (No. 2016YFB1001501).
REFERENCES
[1] Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua. 2010. BRIEF: Binary Robust Independent Elementary Features. In European Conference on Computer Vision. 778–792.
[2] John Canny. 1986. A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 8, 6 (1986), 679–698.
[3] Changhyun Choi and Henrik I. Christensen. 2012. 3D Textureless Object Detection and Tracking: An Edge-Based Approach. In International Conference on Intelligent Robots and Systems. 3877–3884.
[4] Changhyun Choi and Henrik I. Christensen. 2012. Robust 3D Visual Tracking using Particle Filtering on the Special Euclidean Group: A Combined Approach of Keypoint and Edge Features. International Journal of Robotics Research 31, 4 (2012), 498–519.
[5] Andrew I. Comport, Eric Marchand, and Francois Chaumette. 2003. A Real-Time Tracker for Markerless Augmented Reality. In IEEE International Symposium on Mixed and Augmented Reality. 36–45.
[6] Pedro F. Felzenszwalb and Daniel P. Huttenlocher. 2004. Distance Transforms of Sampled Functions. Theory of Computing 8, 19 (2004), 415–428.
[7] Chris Harris and Carl Stennett. 1990. RAPiD: A Video-Rate Object Tracker. In British Machine Vision Conference. 73–77.
[8] Stefan Hinterstoisser, Vincent Lepetit, Slobodan Ilic, Stefan Holzer, Gary Bradski, Kurt Konolige, and Nassir Navab. 2012. Model Based Training, Detection and Pose Estimation of Texture-less 3D Objects in Heavily Cluttered Scenes. In Asian Conference on Computer Vision. 548–562.
[9] Michael Isard and Andrew Blake. 1998. CONDENSATION: Conditional Density Propagation for Visual Tracking. International Journal of Computer Vision 29, 1 (1998), 5–28.
[10] Hirokazu Kato and Mark Billinghurst. 1999. Marker Tracking and HMD Calibration for a Video-Based Augmented Reality Conferencing System. In IEEE and ACM International Workshop on Augmented Reality. 85–94.
[11] Georg Klein and David W. Murray. 2006. Full-3D Edge Tracking with a Particle Filter. In British Machine Vision Conference. 1119–1128.
[12] Vincent Lepetit and Pascal Fua. 2005. Monocular Model-Based 3D Tracking of Rigid Objects: A Survey. Foundations and Trends in Computer Graphics and Vision 1, 1 (2005), 1–89.
[13] Manolis Lourakis and Xenophon Zabulis. 2013. Model-Based Pose Estimation for Rigid Objects. In International Conference on Computer Vision Systems. 83–92.
[14] David G. Lowe. 2004. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision 60, 2 (2004), 91–110.
[15] Eric Marchand, Patrick Bouthemy, and Francois Chaumette. 2001. A 2D–3D Model-Based Approach to Real-time Visual Tracking. Image and Vision Computing 19, 13 (2001), 941–955.
[16] Eric Marchand, Hideaki Uchiyama, and Fabien Spindler. 2015. Pose Estimation for Augmented Reality: A Hands-On Survey. IEEE Transactions on Visualization and Computer Graphics 22, 12 (2015), 2633–2651.
[17] Youngmin Park, Vincent Lepetit, and Woontack Woo. 2008. Multiple 3D Object Tracking for Augmented Reality. In IEEE International Symposium on Mixed and Augmented Reality. 117–120.
[18] Victor A. Prisacariu and Ian D. Reid. 2012. PWP3D: Real-Time Segmentation and Tracking of 3D Objects. International Journal of Computer Vision 98, 3 (2012), 335–354.
[19] Ethan Rublee, Vincent Rabaud, Kurt Konolige, and Gary Bradski. 2011. ORB: An Efficient Alternative to SIFT or SURF. In IEEE International Conference on Computer Vision. 2564–2571.
[20] Byung Kuk Seo, Hanhoon Park, Jong Il Park, Stefan Hinterstoisser, and Slobodan Ilic. 2014. Optimal Local Searching for Fast and Robust Textureless 3D Object Tracking in Highly Cluttered Backgrounds. IEEE Transactions on Visualization and Computer Graphics 20, 1 (2014), 99–110.
[21] Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon. 2013. Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images. In IEEE Conference on Computer Vision and Pattern Recognition. 2930–2937.
[22] Luca Vacchetti, Vincent Lepetit, and Pascal Fua. 2004. Combining Edge and Texture Information for Real-Time Accurate 3D Camera Tracking. In IEEE International Symposium on Mixed and Augmented Reality. 48–56.
[23] Luca Vacchetti, Vincent Lepetit, and Pascal Fua. 2004. Stable Real-Time 3D Tracking using Online and Offline Information. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 10 (2004), 1385–1391.
[24] Guofeng Wang, Bin Wang, Fan Zhong, Xueying Qin, and Baoquan Chen. 2015. Global Optimal Searching for Textureless 3D Object Tracking. The Visual Computer 31, 6 (2015), 979–988.
[25] Harald Wuest, Florent Vial, and Didier Stricker. 2005. Adaptive Line Tracking with Multiple Hypotheses for Augmented Reality. In IEEE International Symposium on Mixed and Augmented Reality. 62–69.