
3D Reconstruction Of Occluded Objects From Multiple Views

Cong Qiaoben, Stanford University

Dai Shen, Stanford University

Kaidi Yan, Stanford University

Chenye Zhu, Stanford University

Abstract

In this paper we present three different algorithms to approximately reconstruct parts of an object that can be seen in only one view due to occlusion. Single view 3D reconstruction is generally hard to carry out because the problem is under-determined. Nevertheless, we make certain observations about the general shape of the target object and make use of other parts of the object that we can accurately triangulate. We test our algorithms on different objects and conduct both qualitative and quantitative analysis of the results.

1 Introduction

Taking a video camera and walking around an object to automatically construct a 3D model has been a long-standing dream in computer vision. What seemed unreachable 20 years ago is now within our abilities thanks to the many discoveries in 3D geometry and image matching over the years. The various 3D reconstruction techniques have also been able to take advantage of developments in digital cameras, the increasing resolution and quality of the images they produce, and the large collections of imagery that have been established on the Internet. A famous application is the Photo Tourism project, such as the reconstruction of the Coliseum in Rome shown in Figure 1.

Figure 1: Reconstructed 3D model of the Coliseum

However, what currently hinders the use of more 3D information in other domains seems to be the specialization of 3D reconstruction methods to specific settings. State-of-the-art multi-view stereo methods are able to produce accurate and dense 3D reconstructions. Unfortunately, many of them are tuned to settings like the Middlebury benchmark (Kazhdan et al., 2006), where the camera poses are known, silhouette information can easily be retrieved, and the number of views is adequately large (usually more than 20). In this paper, we aim to remove these requirements for context information in standard multi-view methods through three approximation methods: patch, symmetry, and ratio approximation.

The rest of the paper is organized as follows. Section 2 reviews the inadequacies of some of the existing approaches to the problem and our solutions. Section 3 gives a detailed explanation of our methodology as well as the underlying principles of our approximation methods. Section 4 shows the results of our methods on various objects subject to various camera settings and illuminations. Section 5 concludes our work.

2 Related Work

2.1 Existing Literature

Apart from the above-mentioned works and the references within the Middlebury benchmark, there are a couple of works that are closely related to the approaches we present in this paper.

Yezzi (Yezzi et al., 2003) proposes a method where the cameras are refined during the dense reconstruction process. However, their silhouette-based approach is not well suited for general image sequences. It works best for textureless scenes where the background can be clearly separated from the object.

Furukawa (Furukawa et al., 2008) describes an iterative camera calibration and reconstruction approach. A large set of patches is used to refine the camera poses; vice versa, the refined camera poses are used to improve the reconstruction. To compute the dense surface they use the method from Kazhdan (Kazhdan et al., 2006). The resulting surface is fitted to the patches by solving a Poisson problem. As the spatial Poisson problem does not take photoconsistency into account, results suffer in areas not covered by sufficiently many patches.

2.2 Our Contribution

We aim to address the issues associated with the approaches mentioned above. In particular, the approximations we propose enable us to obtain an accurate object structure without the need for a background of clear contrast or for a large number of photos providing sufficient coverage of each possible patch. In addition, our symmetry approximation method provides a simple but effective way of reconstructing symmetric objects. This method, in fact, goes beyond the problem of reconstruction and is potentially also applicable to image recognition and even graphical reasoning.

3 Approach

3.1 General Methodology

In general triangulation, we solve the following simultaneous equations to find the 3D location of the unknown point P:

M1 P = p1
M2 P = p2

where M1, M2 ∈ R^{3×4} and the unknown point P ∈ R^{4×1} is in homogeneous coordinates. However, in our scenario only one of the two observations p1 and p2 is available. Thus we have 4 degrees of freedom and only 3 scalar equations, resulting in an under-determined linear system.

Therefore, the remaining part of the paper explores different heuristics we can use in order to provide one more constraint (one more scalar equation) with respect to P.

As a preliminary step, we first find matching points in the different images and apply triangulation to find their 3D locations. This gives us some basic information about certain parts of the object which can be used in our methods.
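To make the system above concrete, the following is a minimal linear-triangulation (DLT) sketch in Python; it is our own illustration, not the project's code, and the function and variable names are hypothetical. Each view contributes two independent rows, so two views determine P up to scale, while a single view leaves a one-parameter family of solutions along the back-projection ray, which is exactly the ambiguity our heuristics must resolve.

import numpy as np

def triangulate_dlt(Ms, ps):
    """Linear (DLT) triangulation.

    Ms: list of 3x4 camera matrices; ps: list of 2D observations (x, y).
    Each view contributes the two rows x*M[2] - M[0] and y*M[2] - M[1].
    With two or more views the homogeneous system has a unique solution
    up to scale; with a single view it is under-determined.
    """
    A = []
    for M, (x, y) in zip(Ms, ps):
        A.append(x * M[2] - M[0])
        A.append(y * M[2] - M[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    P = Vt[-1]              # right singular vector of the smallest singular value
    return P[:3] / P[3]     # dehomogenize to a 3D point

# Hypothetical usage with camera matrices M1, M2 and observations p1, p2:
# X = triangulate_dlt([M1, M2], [p1, p2])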

3.2 Technical Details

For this part we are going to elaborate on the main heuristics we adopt in approximating the 3D locations of points that can only be seen in one view.

3.2.1 Patch Approximation

In this section we employ the technique of tessellation from computer graphics to help us recover the shape of the occluded object. We subdivide the surface of the object into planar polygons, called tessellators or patches, and simulate the smoothness of the surface with a non-trivial number of such patches; see Figure 2. The idea is as follows: we first assign each point on the object to a planar polygon, then we calculate the position of the polygon in 3D space, and afterwards we recover the exact location of each point by finding the corresponding 3D points in the polygons.

Figure 2: Tessellation of a Smooth Object

Before we proceed with our discussion we need to define some terminology that we will use in this section and the remaining part of the paper. A polygon, or a patch, is a 3D planar tessellator. Its projection into a 2D image is a region. Furthermore, if two points lie on the same polygon, their projections lie within the same region. The full view image refers to the image in which we can see the overall shape of the object, as in Figure 6(b), whereas the occluded image refers to the image in which the object is partially occluded by obstacles, as in Figure 6(a).

1. Edge detection

Consider the intersection of two patches. Since the points belong to two different regions and lie on two disparate planar polygons, we expect surface orientation discontinuities near the intersection, one of the main sources of image brightness discontinuities in the perceived images, i.e. edges. Therefore we should be able to detect some very thin edges along the corresponding region intersections.

Notice that these edges are delicate, as the object surface is rather smooth and the changes of gradients are relatively mild. Thus we use the Canny edge detector with drastically lowered thresholds to capture such minute parts. We apply Canny edge detection to the full view image near the object region and obtain a preliminary edge map. For an example of how it looks, see Figure 11 in the Experimental Results section.

2. Edge completion

Nonetheless, some of the edges are discontinuous, as there is a non-trivial amount of noise in the images. Post-processing is warranted to connect the dots and complete the separation of planes. We employ two methods to achieve this: (1) a methodology similar to the non-max suppression and edge continuation in the Canny edge detector but with more leniency, in that we allow slightly more deviation from the path suggested by the gradient directions, and (2) a window-based geometric methodology that scans through each patch of the image and connects two dots if both have been labeled edges and one has a gradient direction perpendicular to the vector connecting the two dots. For instance, in Figure 3 the two blue blocks were designated as edge blocks by the Canny detector, and the one on the bottom left has a gradient direction perpendicular to the line segment connecting the centers of the two blue blocks. Thus we mark all blocks in between as potential edge blocks.

Figure 3: One possible scenario in which two dots will be connected

Finally we combine the partial edge maps from Canny edge detection (Figure 11) and from edge completion (Figure 12), and produce the full view object edge map in Figure 13 by restricting the total region count in the window.

3. Region assignment

With the edge map in hand, we proceed to assign each image pixel to a region. We know for sure that pixels enclosed in the same region must belong to the same polygon. But we are not guaranteed that two different regions correspond to disparate polygons. Notice that discontinuities of surface orientation are not the only cause of edges in the images. Other artifacts such as changes in material properties (color, reflectance, etc.) and changes in scene illumination may also result in edges in the perceived images. We need to account for these artifacts. 1

4. Reconstructing polygons

We import the results from the preliminary triangulation. In particular, to account for the inaccuracies incurred by floating point calculations and projection errors, we argue that for each triangulated point, close-by neighbors 2 should all correspond to the same 3D point. We claim that a region is properly defined if and only if we can find three or more such points on the corresponding patch. For each of these proper regions, we find the plane equation of its corresponding 3D patch by interpolating the 3D locations of the points sitting in the region.

5. Recover points within a polygon

This is the final step of our reconstruction process. For any pixel x in the full view image, if it belongs to a region that is properly defined, i.e. we know the plane equation of the corresponding polygon D, we estimate the 3D location of x, denoted X, by solving the following minimization problem (see the sketch below):

minimize_X    distance(X, D)
subject to    PX = x

where distance(·, ·) is the distance function from a point to a plane, and P is the projection matrix associated with the full view image.

1 Within the scope of this project, we decided to manually combine regions that should have been grouped together. Given more time, we could have explored recovering this information through material properties, shadowing, etc. (Bell et al., 2015)

2 Here we impose a 5-pixel-by-5-pixel window to determine nearby neighbors.
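A minimal sketch of this last step, assuming the patch plane D is stored as n · X = d (with n and d interpolated from the triangulated points in the region) and a finite projective camera P: back-project the pixel to a ray and intersect the ray with the plane, which coincides with the constrained minimization above whenever the ray actually meets the plane. The names are ours, not from the project's code.

import numpy as np

def recover_point_on_patch(P, x_pix, n, d):
    """Estimate the 3D point that projects to pixel x_pix and lies on the
    patch plane n . X = d.

    P: 3x4 projection matrix (finite camera, left 3x3 block invertible).
    x_pix: (u, v) pixel in the full view image.
    n, d: plane normal (3,) and offset interpolated from triangulated points.
    """
    M, p4 = P[:, :3], P[:, 3]
    C = -np.linalg.solve(M, p4)            # camera center
    x_h = np.array([x_pix[0], x_pix[1], 1.0])
    ray = np.linalg.solve(M, x_h)          # back-projection ray direction
    # Points on the ray: X(lam) = C + lam * ray. Impose n . X = d.
    lam = (d - n @ C) / (n @ ray)          # assumes the ray is not parallel to D
    return C + lam * ray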

3.2.2 Symmetry Approximation

In this section we make the assumption that the target object is nearly symmetric. We also assume that the full view image is taken with the target object facing the camera directly, and that the camera projection is roughly affine.

The high level idea of this approach is to find the symmetry plane, and then estimate the distance to the symmetry plane based on the distance we retrieve from the full view image. When the symmetry plane is perpendicular to the image view, the symmetry plane is projected onto a symmetry axis. Therefore, in this case the problem of finding the symmetry plane is equivalent to finding the symmetry axis in the image.

Our symmetry detection algorithms consider two cases. The first algorithm resolves a simplified version of symmetry detection: it assumes the image is taken parallel to the plane on which the object is placed, so the symmetry axis is a vertical line. The second algorithm is more generic: it finds the symmetry axis of any arbitrary near-symmetric image, given that the symmetry plane is perpendicular to the image view.

Both algorithms first use the Canny edge detector to find edges of the image. For convenience, the resulting mask is denoted by ME. Canny edge detection tracks edges by suppressing all other edges that are weak and not connected to strong edges, using two thresholds (upper and lower). In our case, the lower threshold is particularly interesting because it suppresses weak edges. Here we set the lower threshold to be sufficiently large to filter away as many weak edges as possible; this also reduces the amount of computation for the symmetry detection algorithm, which is explained next.
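As a rough illustration (the threshold values are placeholders rather than the settings used in our experiments, and the filename is hypothetical), the edge mask ME can be produced with OpenCV's Canny detector, where raising the lower threshold discards weak edges:

import cv2

img = cv2.imread("full_view.jpg", cv2.IMREAD_GRAYSCALE)
# (lower, upper) thresholds: a relatively high lower threshold keeps ME sparse.
M_E = cv2.Canny(img, 120, 200)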

1. Symmetry Detection algorithm for vertical symmetry axis

This algorithm takes each row of ME and applies a Gaussian filter, which generates the vector ρ′_i(ME). It then finds the value x0 that maximizes the sum, over all rows, of the correlation between each filtered row of ME and its "reflection" across x = x0, denoted ρ′_{i,x0}(ME):

x = argmax_{x0} Σ_i ρ′_i(ME) ∗ ρ′_{i,x0}(ME)

The time complexity of this algorithm is O(σ · size(ME)), where σ is the standard deviation of the Gaussian filter. A higher σ means a higher error tolerance in symmetry matching. (A sketch of this procedure is given after this list.)

2. Symmetry Detection algorithm for arbitrary symmetry axis

This algorithm uses the Hough transform to find the symmetry axis. The symmetry axis can be written as

x cos θ + y sin θ = ρ

We find candidate symmetry axes by taking any pair of points (x1, y1), (x2, y2) and assuming they are symmetric. This gives us:

tan θ = (y1 − y2) / (x1 − x2)

ρ = ((x1 + x2)/2) cos θ + ((y1 + y2)/2) sin θ

We use 1000 bins for ρ and 20 bins for θ. The bin with the highest frequency gives us the best estimate.
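The following is a minimal sketch of the first (vertical-axis) algorithm; it is our own brute-force O(W²H) version of the search rather than the O(σ · size(ME)) scheme described above, and the names are hypothetical. It smooths each row of the edge mask with a Gaussian filter and scores every candidate column x0 by correlating rows with their reflections about x = x0.

import numpy as np
from scipy.ndimage import gaussian_filter1d

def vertical_symmetry_axis(M_E, sigma=3.0):
    """Return the column x0 maximizing the summed correlation between each
    Gaussian-smoothed row of the edge mask and its reflection about x = x0."""
    rows = gaussian_filter1d(M_E.astype(float), sigma, axis=1)
    H, W = rows.shape
    scores = np.zeros(W)
    j = np.arange(W)
    for x0 in range(W):
        j_ref = 2 * x0 - j                      # reflect column index about x0
        valid = (j_ref >= 0) & (j_ref < W)      # keep indices inside the image
        scores[x0] = np.sum(rows[:, j[valid]] * rows[:, j_ref[valid]])
    return int(np.argmax(scores))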

After finding the symmetry axis, we can use the triangulation method to find the symmetry plane S by picking SIFT keypoints close to the symmetry axis and interpolating these points. Also using SIFT, we can get correspondences between the points whose 3D locations we want to estimate and their symmetric counterparts. This is done by flipping the full view image horizontally and then performing SIFT matching between the original full view image and the flipped one (a sketch of this matching step is given at the end of this subsection).

For each symmetric pair (x, y), the distances to the symmetry plane should be the same for x and y in 3D space. Therefore, if we know that the 3D location of y is Y, we can calculate the distance from Y to S. The 3D location of x, denoted X, can thus be estimated by solving the following minimization problem:

minimize_X    |distance(X, S) − distance(Y, S)|
subject to    PX = x

where P denotes the projection matrix of the full view image.
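A sketch of how the symmetric correspondences could be obtained by flipping the full view image and matching SIFT descriptors with OpenCV; a keypoint at column u in the flipped image corresponds to column W − 1 − u in the original. This is our own illustration, and any filtering of bad matches is omitted.

import cv2

def symmetric_correspondences(img):
    """Match SIFT keypoints between an image and its horizontal mirror.
    Returns pairs ((x, y), (x_sym, y_sym)) in original image coordinates."""
    flipped = cv2.flip(img, 1)                  # flip around the vertical axis
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img, None)
    kp2, des2 = sift.detectAndCompute(flipped, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
    W = img.shape[1]
    pairs = []
    for m in matches:
        x, y = kp1[m.queryIdx].pt
        xf, yf = kp2[m.trainIdx].pt
        pairs.append(((x, y), (W - 1 - xf, yf)))  # map back to original coords
    return pairs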

3.2.3 Ratio Approximation

The last heuristic is based on the fact that affine projections preserve the ratio of lengths of parallel line segments. Given an affine transformation A(p) = Mp + b, the ratio of lengths between parallel line segments is invariant. This holds because two parallel line segments l1 and l2 satisfy l1 = k l2, so after the affine transformation the ratio of lengths is

‖M l1‖ / ‖M l2‖ = ‖k M l2‖ / ‖M l2‖ = k = ‖l1‖ / ‖l2‖

In reality the camera projection is not affine; however, for the sake of approximation we assume the camera is close to affine. If we can find two parallel line segments in the image containing the entire object, then we can compute the ratio of their lengths. Assuming an affine transformation, the ratio of lengths of these two line segments should remain the same after we project them back into 3D world space. If we happen to know the length of one line segment in the 3D world space, then we can calculate the length of the other, thus potentially giving us a new constraint on the unknown point location.


Figure 4: Example of a pot. Figure 4(a) shows only part of the pot while Figure 4(b) shows the entire pot. Our goal is to reconstruct the part that is occluded by the book.

Here is an illustration using the example of a pot. Suppose we want to estimate the 3D location of point x, which is indicated as a red spot in Figure 4(b). Clearly the pot is asymmetric, and thus we cannot apply symmetry approximation to this object; patch approximation also faces obstacles, as we would need to know at least 3 more points on the pot handle, which is impossible given the occlusion in Figure 4(a).

Figure 5: Graphic illustration of applying ratio approximation

In order to apply ratio approximation, we first find a vertical line l in Figure 4(b) in the same way we find the vertical symmetry axis in symmetry approximation. Notice that in this case we use the same method, but the line we find no longer carries the meaning of a symmetry plane; it only serves as a reference line. After that, we choose a SIFT point y that is located on the opposite side of l from x. From here we can calculate the distances from x to l and from y to l. Denote them l1 and l2 respectively.

The rest is similar to what we do in symmetry approximation. Again, we assume that we can find at least 3 SIFT points on line l for which we already know the 3D locations; thus, when l is projected back into 3D space, we can approximate this plane S by interpolating these 3 SIFT points. We also find the 3D location of point y, and hence we can calculate the distance from Y to plane S in 3D space. Let it be l′2. If the distance from the 3D location of point x to plane S is l′1, then we have the following equality of length ratios:

l′1 / l′2 = l1 / l2

l1, l2 and l′2 are all known, so we can compute l′1. Now we know the distance from the 3D location of x to plane S, and therefore we have one more constraint, which allows us to solve for the actual X.
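A minimal sketch of this computation under our own (hypothetical) naming: l1 and l2 are 2D point-to-line distances in the full view image, l′2 is the 3D distance from Y to the plane S, and the recovered l′1 then serves as the extra scalar constraint on X (for example, fed into the same ray-based solve sketched for the patch method).

import numpy as np

def point_line_distance_2d(pt, a, b, c):
    """Distance from the 2D point pt = (x, y) to the line a*x + b*y + c = 0."""
    return abs(a * pt[0] + b * pt[1] + c) / np.hypot(a, b)

def estimate_l1_prime(x_pix, y_pix, line, Y, n, d):
    """Ratio approximation: l1'/l2' = l1/l2.

    line: (a, b, c) coefficients of the reference line l in the full view image.
    Y: known 3D location of point y; (n, d): plane S as n . X = d with |n| = 1.
    """
    l1 = point_line_distance_2d(x_pix, *line)
    l2 = point_line_distance_2d(y_pix, *line)
    l2_prime = abs(n @ Y - d)        # 3D distance from Y to the plane S
    return l2_prime * l1 / l2        # estimated distance from the unknown X to S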

4 Experiments

4.1 Experimental Setup

For the experimental setup, we need to take two images of the same target object. The full view image contains the entire front face of the target object unoccluded, while the other image contains the target object occluded by other objects. Our goal is to use these two images to reconstruct the part that is occluded.


Figure 6: Experiment 1. The target object is the Chinese doll.


Figure 7: Experiment 2. The target object is the chair.


Figure 8: Experiment 3. The target object is the pot sitting on the table.

In order to evaluate our reconstruction results, for each experiment we also take a third image, which also contains the entire front face of the object.

By including the third image, we are able to use triangulation to accurately find the 3D locations of all points that we see on the front face of the target object. This is the baseline result for our experiments: when implementing our heuristics, we only use the images shown in Figures 6, 7 and 8. For the points we estimate, we compare their estimated locations with the actual locations found from triangulation.

Figure 9: SIFT matching points we find for experiment 1. The red lines are the epipolar lines at each point.

Figure 10: Triangulation result for SIFT matching points in experiment 1

4.2 Experiment Results

4.2.1 Triangulation Result

The first part of the experiment is to find the 3D locations of points that appear in both images. This provides some basic information about certain parts of the object and is needed for each of the heuristics we propose.

We calculate the intrinsic camera matrices using single view metrology. After that, we use SIFT to find matching points in the two images. For these matching SIFT points, we apply projective triangulation as in structure from motion to estimate the rotation and translation matrices R and T of the cameras, and thereby find the 3D locations of those matching SIFT points (for more details, refer to CS231A HW2 Question 4). Figures 9 and 10 show the resulting 3D points we find for experiment 1.
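The following is a rough OpenCV sketch of such a pipeline assuming known intrinsics K1 and K2 (e.g. from single view metrology); it is our own calibrated-SfM illustration, not the course code referenced above, and it omits outlier filtering.

import cv2
import numpy as np

def sparse_reconstruction(img1, img2, K1, K2):
    """Match SIFT features, estimate the relative pose, triangulate 3D points."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des1, des2)
    pts1 = np.float64([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float64([kp2[m.trainIdx].pt for m in matches])

    # Essential matrix and relative pose (assumes the two intrinsics are similar).
    E, _ = cv2.findEssentialMat(pts1, pts2, K1, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K1)

    P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K2 @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)   # 4xN homogeneous
    return (X_h[:3] / X_h[3]).T                           # Nx3 points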

Figure 11: Pure Canny edge detection result with Canny bounds of 10^-4 and 0.05

Figure 12: Edge Completion Result

4.2.2 Patch Approximation Result

Figure 13 shows the resulting full edge map we get for experiment 1, using the edge detection and edge completion techniques mentioned in the Technical Details section.

Using Figure 13, we are able to assign regions for the target object. Figure 14 shows the region map we get for experiment 1.

With the region map, we are able to approximate the plane equation of some of the regions if we already have three or more SIFT points with known 3D locations sitting on the region. Thus we are able to recover the entire polygon by estimating the 3D locations of each pixel in that region. The reconstruction results are shown in Figures 15 and 16.

Figure 13: Combined edge detection and edge completion results

Figure 14: Region assignment for experiment 1

Figure 15: Reconstruction result with patch approximation on experiment 1

Note that certain parts of the object are missing, since the regions at those parts do not have a sufficient number of SIFT points for us to approximate the plane equations. We also try patch approximation for experiments 2 and 3; however, the results are mostly unsatisfactory because the number of patches on the target object which we can approximate is limited (fewer than 3 in experiments 2 and 3), and the occluded parts are mostly not sitting on those patches. Therefore, patch approximation cannot achieve much in such scenarios.

Figure 16: Reconstruction result with patch approximation on experiment 1, projected onto the occluded image

Figure 17: Symmetric line generated for experiment 1

Figure 18: Symmetric line generated for experiment 2

4.2.3 Symmetry Approximation Result

We first use the Canny edge detector to generate the vertical symmetric line. Figures 17 and 18 show the resulting symmetric lines we get for experiments 1 and 2.

With the symmetric line, we are able to calculate the equation of the symmetric plane. The next part is to find symmetric point correspondences. As described in the Technical Details section, we flip the image horizontally and then use SIFT matching again to find matches between the original image and the flipped image. The resulting symmetric point correspondences we find for experiment 1 are shown in Figure 19.

After we find the symmetric plane S as well as all these symmetric point correspondences, we apply symmetry approximation to each pair of correspondences we find. Figures 20 and 21 show the estimated 3D locations of the SIFT points that can only be seen in the full view image for experiments 1 and 2.

Figure 19: Symmetric point correspondences found by flipping images and SIFT matching for experiment 1

Figure 20: Reconstruction result for experiment 1. Blue points indicate SIFT points where we already know the 3D locations from triangulation; red points are the ones we estimate from symmetry.

Figure 21: Reconstruction result for experiment 2. Blue points indicate SIFT points where we already know the 3D locations from triangulation; red points are the ones we estimate from symmetry.

Figure 22: Reconstruction result for experiment 3. Blue points indicate SIFT points where we already know the 3D locations from triangulation; red points are the handle part, which we estimate using ratio approximation.

4.2.4 Ratio Approximation Result

Since ratio approximation is essentially an extension of symmetry approximation, we expect similar results for experiments 1 and 2. Thus, our main focus here is the result for experiment 3. We pick 10 SIFT points on the handle part of the pot from the full view image and try to estimate their 3D locations. Figure 22 shows the reconstruction result for the handle of the pot.

4.3 Result Evaluation

As mentioned previously, we also have a 'hidden' image for each experiment so that we can perform triangulation on the points we estimate during the experiments, in order to obtain a baseline result for their 3D locations.

For patch approximation, we are able to recover all pixels sitting in the same region if we can calculate the plane equation of the region. The obstacle is that it is extremely difficult for us to triangulate all these points and compare the results; thus we choose to verify the result visually, and based on Figure 15, the reconstruction result is satisfactory for experiment 1. As mentioned before, this approximation only works well for experiment 1, since we are unable to locate many useful patches for experiments 2 and 3.

For symmetry approximation and ratio approximation, we only reconstruct those SIFT points that can be seen in just one view; therefore we are able to use the hidden image to actually perform triangulation and find their 3D locations (subject to projective ambiguity).

Figure 23: Re-projection result for experiment 1. White points are SIFT points that are re-projected back; yellow points are SIFT points for symmetric correspondences.

Figure 24: Re-projection result for experiment 2. White points are SIFT points that are re-projected back; yellow points are SIFT points for symmetric correspondences.

In order to compare our estimation with the triangulated locations, we project both our estimated 3D points and the triangulated 3D points back onto another image, in which the target object is occluded by another object. In the ideal situation, these points should sit in regions occupied by the occlusion when projected back onto the image. We can verify this visually to determine whether our estimations are good. Furthermore, we can compute the average Euclidean distance between the 2D points projected back from our estimations and the ones projected back from triangulation. We use this as our reconstruction error.
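A minimal sketch of this error metric under our own naming: project both point sets with the occluded image's 3x4 projection matrix and average the 2D Euclidean distances.

import numpy as np

def reprojection_error(P, X_est, X_tri):
    """Average 2D distance between estimated and triangulated 3D points after
    projection with the 3x4 matrix P of the occluded image.

    X_est, X_tri: Nx3 arrays of corresponding 3D points."""
    def project(X):
        Xh = np.hstack([X, np.ones((X.shape[0], 1))])   # to homogeneous coords
        x = (P @ Xh.T).T
        return x[:, :2] / x[:, 2:3]
    d = np.linalg.norm(project(X_est) - project(X_tri), axis=1)
    return float(np.mean(d))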

Figures 23, 24 and 25 show the re-projection results when we project our estimated points back onto the image with occlusion for experiments 1, 2 and 3. Table 1 shows the reconstruction error for each experiment and the heuristic we adopt.

Figure 25: Re-projection result for experiment 3. White points are SIFT points that are re-projected back. There are no yellow points in this figure, as we use ratio instead of symmetry approximation.

Experiment    Heuristic    Reconstruction Error
1             Symmetry     1.11
2             Symmetry     32.40
3             Ratio        5.70

Table 1: Reconstruction error for different experiments

We also have estimation error during triangulation. The average triangulation error when using the non-linear estimate (refer to CS231A HW2 Question 4 for more details) for each of the 3 experiments is listed in Table 2.

Experiment    Triangulation Error
1             0.97
2             1.50
3             2.00

Table 2: Triangulation error for different experiments

Comparing the results in Tables 1 and 2, we observe that the reconstruction error is close to the triangulation error when we use symmetry approximation on the Chinese doll. This is also visually verified by Figure 23, where the white points sit roughly at the expected locations.

Meanwhile, the reconstruction error on the chair when using symmetry approximation is about 20 times as large as the triangulation error. We can verify this by observing that most white points in Figure 24 are located on the ground, implying that many of the 3D estimates deviate significantly from the actual 3D locations. A potential explanation is the dim lighting conditions during the experiment, which cause trouble for finding symmetric correspondences using SIFT. This shows that symmetry approximation is very sensitive to the environment lighting, as it makes heavy use of SIFT.

The reconstruction performance for experiment 3 is somewhere between the performance for experiment 1 and that for experiment 2. From Figure 25 we see that a few points are off the expected locations, but most of the white points are still around the handle area (which is covered by the book).

5 Conclusion

We show that using patch, symmetry and ratio approximation, we are able to reconstruct certain occluded parts of the target object, given that the occluded part is seen in only one view. The different approximations suffer from different drawbacks: patch approximation requires a good number of patches that can be approximated; symmetry approximation requires the target object to be nearly symmetric; and both symmetry and ratio approximation assume that the camera is close to affine. Therefore, a potential future direction for our project is to combine the different heuristics and compare the overall reconstruction result with the one achieved using the individual approximations.

Our code can be found at https://www.dropbox.com/sh/3a67yvxpe0a9osn/AAB3N4jJ5Hpvq0E8LHNZpMTea?dl=0.

References

Kazhdan, M., Bolitho, M., Hoppe, H. Poisson surface reconstruction. Symposium on Geometry Processing, Eurographics Association (2006).

Yezzi, A.J., Soatto, S. Structure from motion for scenes without features. Proc. CVPR (2003).

Furukawa, Y., Ponce, J. Accurate camera calibration from multi-view stereo and bundle adjustment. Proc. CVPR (2008).

Bell, S., Upchurch, P., Snavely, N., Bala, K. Material recognition in the wild with the Materials in Context Database. Proc. CVPR (2015).