Occlusion-Aware Multi-View Reconstruction of Articulated Objects for Manipulation
Xiaoxia Huang
Committee members:
Dr. Stanley Birchfield (Advisor)
Dr. Ian Walker
Dr. John Gowdy
Dr. Damon Woodard
Motivation
Domestic robots in many applications require manipulation of articulated objects:
- Tools: scissors, shears, pliers, stapler
- Furniture: cabinets, drawers, doors, windows, fridge
- Devices: laptop, cell phone
- Toys: truck, puppet, train, tricycle
Important problem: Learning kinematic models
Approach
Part 1: Reconstruct 3D articulated model using multiple perspective views
Part 2: Manipulate articulated objects – even occluded parts
Part 3: Apply RGBD sensor to improve performance
Related Work
[ Yan et al., PAMI 2008 ]
[ Katz et al., ISER 2010 ]
[ Ross et al., IJCV 2010 ]
[ Sturm et al., IJCAI 2009, IROS 2010 ]
[ Sturm et al., ICRA 2010 ]
Our Approach
Recovers kinematic structure from images
Features:
- Uses a single camera
- Produces dense 3D models
- Recovers both prismatic and revolute joints
- Handles multiple joints
- Provides occlusion awareness
Approach – Part 1: Reconstruct 3D articulated model using multiple perspective views
Procrustes-Lo-RANSAC (PLR)

[Pipeline diagram: image sets {I1}, {I2} → 3D reconstruction → point clouds P, Q → Alignment / Segmentation (R, t) → Axis direction estimation (u, θ) and Axis point estimation (w) → 3D joint estimation; a parallel 2D joint estimation path also yields w.]
Camera Calibration

Camera model: an object point X projects to an image point x' = K [R | t] X
- intrinsic parameters K
- extrinsic parameters R, t
Input: images {I} → SIFT features; Output: K, R, t
Bundler: http://phototour.cs.washington.edu/bundler/
SIFT Features

Input images → SIFT features [ Lowe, IJCV 2004 ] → matched SIFT features
Example: 658 and 651 keypoints yield 24 matches
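Raw SIFT matching keeps a correspondence only when its nearest descriptor is much closer than the second nearest (Lowe's ratio test), which is why far fewer matches survive than keypoints exist. A minimal pure-Python sketch (toy 2D descriptors; the 0.8 threshold follows Lowe, IJCV 2004):

```python
def ratio_test_matches(desc1, desc2, ratio=0.8):
    """Match each descriptor in desc1 to desc2 using Lowe's ratio test."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    matches = []
    for i, d1 in enumerate(desc1):
        dists = sorted((dist2(d1, d2), j) for j, d2 in enumerate(desc2))
        (best, j), (second, _) = dists[0], dists[1]
        # Accept only if the best match is clearly better than the runner-up
        if best < (ratio ** 2) * second:
            matches.append((i, j))
    return matches

desc1 = [[0.0, 1.0], [5.0, 5.0]]
desc2 = [[0.1, 1.0], [9.0, 9.0], [5.0, 5.1]]
print(ratio_test_matches(desc1, desc2))  # -> [(0, 0), (1, 2)]
```

Ambiguous keypoints (two near-identical candidates) are rejected by the same test, which is what keeps the match set clean enough for calibration.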
Camera Calibration

Camera model: x' = K [R | t] X
- intrinsic parameters K (initialized from the EXIF tag or a default value)
- extrinsic parameters R, t
Structure from motion: minimize the reprojection error to recover K, R, t
Input: images {I} → SIFT features; Output: K, R, t
Bundler: http://phototour.cs.washington.edu/bundler/
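The pinhole projection the calibration recovers can be sketched numerically; a minimal pure-Python example (the K, R, t values below are made up for illustration):

```python
def project(K, R, t, X):
    """Project a 3D point X into the image: x' = K (R X + t)."""
    # Transform into the camera frame
    Xc = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    # Apply intrinsics, then divide by depth
    x = [sum(K[i][k] * Xc[k] for k in range(3)) for i in range(3)]
    return (x[0] / x[2], x[1] / x[2])

K = [[500.0, 0.0, 320.0],   # fx, skew, cx
     [0.0, 500.0, 240.0],   # fy, cy
     [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # identity rotation
t = [0.0, 0.0, 0.0]
print(project(K, R, t, [0.0, 0.0, 2.0]))  # optical-axis point -> (320.0, 240.0)
```

Bundle adjustment then minimizes the squared distance between such projections and the observed SIFT locations over all cameras and points.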
3D Model Reconstruction

Patch-based multi-view stereo:
- Expand each patch to neighboring empty image cells
- Do not expand across a depth discontinuity
[Figure: image projection of a patch of the object into views I1 and I2]
PMVS: http://grail.cs.washington.edu/software/pmvs/
Alignment / Segmentation

[Flowchart: extract ASIFT features F1, F2 from image sets {I1}, {I2}; match features; project matches into the images to obtain 3D correspondences between point clouds P and Q; estimate R, t, σ with Procrustes + Lo-RANSAC; find closest points and update the estimate; if good, segment the clouds into S1 and S2; otherwise repeat.]
Procrustes Analysis

Procrustes analysis aligns two point sets with a shape-preserving similarity transformation (rotation, translation, and uniform scale).
[Illustration: the Greek myth of Procrustes; frames {A}, {B}]
http://www.mythweb.com/today/today07.html
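In 2D, the least-squares similarity transform between corresponding point sets has a closed form after centering; a pure-Python sketch of Procrustes alignment over exact correspondences (the full method additionally wraps this estimate in Lo-RANSAC to reject outliers):

```python
import math

def procrustes_2d(P, Q):
    """Find s, theta, (tx, ty) minimizing ||s R p + t - q|| over 2D pairs."""
    n = len(P)
    pcx = sum(p[0] for p in P) / n; pcy = sum(p[1] for p in P) / n
    qcx = sum(q[0] for q in Q) / n; qcy = sum(q[1] for q in Q) / n
    # Cross-terms of the centered sets give the optimal rotation angle
    a = b = pp = 0.0
    for (px, py), (qx, qy) in zip(P, Q):
        px -= pcx; py -= pcy; qx -= qcx; qy -= qcy
        a += px * qx + py * qy          # cosine component
        b += px * qy - py * qx          # sine component
        pp += px * px + py * py
    theta = math.atan2(b, a)
    s = math.hypot(a, b) / pp           # optimal scale
    c, si = math.cos(theta), math.sin(theta)
    tx = qcx - s * (c * pcx - si * pcy)
    ty = qcy - s * (si * pcx + c * pcy)
    return s, theta, (tx, ty)

P = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
Q = [(1.0, 1.0), (1.0, 3.0), (-1.0, 1.0)]  # P scaled by 2, rotated 90 deg, shifted (1, 1)
print(procrustes_2d(P, Q))
```

The 3D case follows the same structure with the rotation obtained from an SVD instead of a single angle.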
Object Model in 2D

[Figure: Link 0 and Link 1 with frames {A}, {B}, shown in Configuration 1 and Configuration 2]
Transformation of Link 0 between the two configurations: align Link 0.

Align Link 0

[Figure: with Link 0 aligned (frame {A}), Link 1 still differs between Configuration 1 and Configuration 2]
Transformation of Link 1 between the two configurations: after aligning Link 0, the residual motion of Link 1 reveals the joint.
3D Joint

The joint is classified using R.
- Prismatic joint: axis direction u = t / |t|; axis point w = mean({pi})
- Revolute joint: axis direction u; axis point w
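The classification step can be sketched as follows: recover the rotation angle of R from its trace; if it is near zero, the relative motion is essentially a pure translation, so the joint is prismatic with axis u = t / |t|. A pure-Python sketch (the 5° threshold is an assumption for illustration):

```python
import math

def classify_joint(R, t, angle_thresh_deg=5.0):
    """Classify a joint as prismatic or revolute from the relative motion (R, t)."""
    trace = R[0][0] + R[1][1] + R[2][2]
    # Rotation angle from the trace; clamp for numerical safety
    theta = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))
    if math.degrees(theta) < angle_thresh_deg:
        norm = math.sqrt(sum(x * x for x in t))
        u = [x / norm for x in t]       # prismatic axis = translation direction
        return "prismatic", u
    return "revolute", None

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(classify_joint(I3, [0.0, 0.0, 3.0]))   # pure translation -> prismatic
```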
Revolute Joint Direction

Axis-angle representation (u, θ) of the rotation R
(Singularity: θ = 0° or θ = 180°)
Two methods:
- Direct computation
- Eigenvalues / eigenvectors
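The direct computation extracts the axis from the skew-symmetric part of R, valid away from the θ = 0°/180° singularities; the eigenvector of R with eigenvalue 1 gives the same axis. A pure-Python sketch of the direct form:

```python
import math

def rotation_axis_angle(R):
    """Axis-angle (u, theta) of a rotation matrix R, away from 0/180 degrees."""
    theta = math.acos(max(-1.0, min(1.0, (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0)))
    s = 2.0 * math.sin(theta)
    # Axis from the skew-symmetric part of R
    u = [(R[2][1] - R[1][2]) / s,
         (R[0][2] - R[2][0]) / s,
         (R[1][0] - R[0][1]) / s]
    return u, theta

Rz90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(rotation_axis_angle(Rz90))  # axis (0, 0, 1), angle pi/2
```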
Approach – Part 2: Manipulate articulated objects – even occluded parts
Part 2: Object Manipulation

Robotic arm + eye-in-hand (camera on the end effector)
3D articulated model + scale estimation (σ) → object registration → manipulation
[Figure: 3D articulated model; robotic arm manipulating the object]
Hand-eye Calibration

- Calibration object: chessboard
- The robotic arm with a camera moves from P to Q
- A is the motion of the camera, B is the corresponding motion of the robot hand

Let X be the fixed transformation from the robot hand to the camera.
Then AX = XB.
Since A and B are measured, the only unknown is X.
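In the plane the AX = XB constraint can be solved in closed form, which makes its structure easy to see. A 2D sketch, not the full 3D solution: as in the classic Tsai–Lenz decoupling, assume the rotation of X is recovered first, after which the translation follows from the linear part (R_A − I) t_X = R_X t_B − t_A:

```python
import math

def solve_handeye_translation(theta_a, t_a, theta_x, t_b):
    """Given camera motion A = (theta_a, t_a), hand motion translation t_b,
    and the already-known rotation theta_x of the hand-eye transform X,
    recover X's translation from (R_A - I) t_x = R_X t_b - t_a (2D sketch)."""
    ca, sa = math.cos(theta_a), math.sin(theta_a)
    cx, sx = math.cos(theta_x), math.sin(theta_x)
    # Right-hand side r = R_X t_b - t_a
    r = [cx * t_b[0] - sx * t_b[1] - t_a[0],
         sx * t_b[0] + cx * t_b[1] - t_a[1]]
    # Solve the 2x2 system (R_A - I) t_x = r by Cramer's rule
    m = [[ca - 1.0, -sa], [sa, ca - 1.0]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(r[0] * m[1][1] - m[0][1] * r[1]) / det,
            (m[0][0] * r[1] - r[0] * m[1][0]) / det]

# Motions constructed from a ground-truth X with rotation 90 deg, translation (1, 2)
print(solve_handeye_translation(math.pi / 2, (3.0, 2.0), math.pi / 2, (1.0, 0.0)))
```

Note the system is only well-conditioned when the rotation of A is far from zero, which is why hand-eye calibration needs motions with substantial rotation.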
Object Pose Estimation

- Place the object in the camera field of view
- Take an image of the object at some viewpoint
- Detect 2D–3D correspondences
- Estimate the object pose with the POSIT algorithm

POSIT:
- does not require the correspondence points to be coplanar
- iteratively approximates the object pose using POS (Pose from Orthography and Scaling)
- POS approximates the perspective projection by a scaled orthographic projection
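The scaled-orthographic approximation that POS relies on replaces division by each point's own depth with division by a common reference depth; a toy comparison (the focal length, point, and reference depth are made-up values):

```python
def perspective(f, X):
    """Exact perspective projection: u = f * x / z."""
    return (f * X[0] / X[2], f * X[1] / X[2])

def scaled_orthographic(f, X, z0):
    """Scaled orthographic projection: divide by a common reference depth z0."""
    return (f * X[0] / z0, f * X[1] / z0)

f, z0 = 500.0, 10.0
X = (1.0, 0.5, 10.2)                  # depth close to the reference plane
print(perspective(f, X))              # exact projection
print(scaled_orthographic(f, X, z0))  # close when depth variation << z0
```

POSIT's iterations refine this approximation until it converges to the perspective pose.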
Experimental Results (1)

Camera calibration
20 different views of a chessboard (7×9 squares of 10×10 mm)
Experimental Results (1)

Corner extraction
Legend: + marks an extracted corner (accurate to 0.1 pixel); the box shows the corner finder window (5×5 mm)
Experimental Results (1)

Intrinsic parameters
Calibration results for a Logitech QuickCam Pro 5000
Experimental Results (1)

Reprojected corners
Legend: + marks an extracted corner; each corresponding reprojected corner is shown with its reprojection error and direction
Experimental Results (2)

Hand-eye calibration
Position (X, Y, Z) (mm) and rotation angles (Rx, Ry, Rz) (rad) from the robot system
Average reprojection error: 1.1 pixels
Experimental Results (3)
Object pose estimation
Ten images taken at arbitrary viewpoints
The corresponding images with the highest number of matched features
Experimental Results (4)

Object manipulation
Frames of a video sequence of the PUMA manipulating a toy truck
Once the hand-eye and object-pose transformations are obtained, the hand pose is provided by the robot kinematic system. Given any particular point on the object, the robot can move its end effector to that position by composing these transformations.
Approach – Part 3: Apply RGBD sensor to improve performance
RGBD Sensor

Motion sensing systems: Microsoft Kinect, Asus Xtion
Microsoft Kinect (Nov. 2010), built for the Xbox 360 video game console:
- CMOS color camera (8-bit VGA resolution, 30 Hz)
- Infrared (IR) laser projector (structured lighting)
- CMOS IR camera (11-bit VGA resolution)
- Multi-array microphone
[Figure: Microsoft Kinect sensor; projected IR light pattern]
KinectFusion – Depth map

Standard structured-lighting model: the projector L casts a light spot P on the object surface, and the camera O observes the spot. Given the baseline b and the angles α and β, solve the triangle (O, P, L) for the depth d.
Output: depth image (VGA, 11-bit)
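From the triangle (O, P, L) with baseline b and base angles α (at O) and β (at L), the law of sines gives the depth as the perpendicular distance from the baseline to P; a pure-Python sketch:

```python
import math

def depth_from_triangulation(b, alpha, beta):
    """Perpendicular distance from the baseline to the light spot P,
    given baseline b and the base angles alpha (at O) and beta (at L)."""
    # Law of sines gives side OP = b * sin(beta) / sin(alpha + beta);
    # its component perpendicular to the baseline is the depth.
    return b * math.sin(alpha) * math.sin(beta) / math.sin(alpha + beta)

# Equilateral sanity check: 60-degree base angles, unit baseline
print(depth_from_triangulation(1.0, math.radians(60), math.radians(60)))
```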
KinectFusion – Vertex and normal map

- The vertex map is a 3D point cloud: each 2D depth pixel is back-projected to a 3D vertex using its depth and the intrinsic matrix of the IR camera
- The normal vector indicates the direction of the surface at a vertex: the cross product of vectors to neighboring points
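The back-projection and the neighbor cross product can be sketched as follows (the intrinsics fx, fy, cx, cy are made-up values standing in for the IR camera's calibration):

```python
def backproject(u, v, d, fx, fy, cx, cy):
    """Depth pixel (u, v) with depth d -> 3D vertex in the camera frame."""
    return ((u - cx) * d / fx, (v - cy) * d / fy, d)

def normal(v0, vu, vv):
    """Surface normal at vertex v0 from neighbors in the +u and +v directions."""
    a = [vu[i] - v0[i] for i in range(3)]
    b = [vv[i] - v0[i] for i in range(3)]
    n = [a[1] * b[2] - a[2] * b[1],       # cross product a x b
         a[2] * b[0] - a[0] * b[2],
         a[0] * b[1] - a[1] * b[0]]
    nrm = sum(x * x for x in n) ** 0.5
    return [x / nrm for x in n]

fx = fy = 500.0; cx, cy = 320.0, 240.0
v0 = backproject(320, 240, 2.0, fx, fy, cx, cy)   # flat fronto-parallel patch
vu = backproject(321, 240, 2.0, fx, fy, cx, cy)
vv = backproject(320, 241, 2.0, fx, fy, cx, cy)
print(normal(v0, vu, vv))   # normal along the optical axis
```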
KinectFusion – Camera tracking

- Small motion between consecutive positions of the Kinect
- Find correspondences using projective data association
- Estimate the camera pose Ti by applying the ICP algorithm to the vertex and normal maps
[Figure: tracking the camera pose]
KinectFusion – Volumetric integration

Volumetric representation (3×3×3 m, 512 voxels per axis)
Truncated signed distance function:
- tsdf(g) ∈ (0, 1] : outside the surface
- tsdf(g) = 0 : on the surface
- tsdf(g) ∈ [-1, 0) : inside the surface
[Figure: TSDF volume grid; the surface lies at the zero-crossing between outside and inside]
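Each depth frame updates a voxel's truncated signed distance with a weighted running average; a minimal sketch of one voxel update (the truncation distance mu is an assumed value):

```python
def tsdf_update(tsdf, weight, voxel_depth, measured_depth, mu=0.03):
    """Fuse one depth measurement into a voxel.
    voxel_depth: the voxel's depth along the camera ray;
    measured_depth: the surface depth observed at the corresponding pixel."""
    sdf = measured_depth - voxel_depth          # + outside, - inside
    if sdf < -mu:
        return tsdf, weight                     # far behind the surface: skip
    d = min(1.0, sdf / mu)                      # truncate to [-1, 1]
    new_w = weight + 1.0
    return (tsdf * weight + d) / new_w, new_w   # weighted running average

t, w = tsdf_update(0.0, 0.0, voxel_depth=1.00, measured_depth=1.015)
print(t, w)   # voxel 1.5 cm outside the surface -> tsdf = 0.5, weight = 1
```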
KinectFusion – Surface rendering

Ray casting:
- Cast a ray through the focal point for each pixel
- Traverse the voxels along the ray
- Find the first surface by observing the sign change of tsdf(g)
- Compute the intersection point using points around the surface boundary
[Figure: ray casting through the TSDF volume grid; surface at the zero-crossing from outside to inside]
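The zero-crossing search along a ray reduces to finding a sign change between consecutive tsdf samples and interpolating between them; a 1D sketch:

```python
def find_surface(samples, step):
    """Walk tsdf samples along a ray (spacing `step`); return the ray
    distance of the first +/- zero-crossing via linear interpolation."""
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        if a > 0.0 and b <= 0.0:                # outside -> inside transition
            frac = a / (a - b)                  # linear interpolation
            return (i + frac) * step
    return None                                 # ray missed the surface

print(find_surface([1.0, 0.5, -0.5, -1.0], step=0.01))  # crossing at 0.015
```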
Experimental Results (1)
Build a 3D model of a microwave
color images
corresponding depth images
3D model
Experimental Results (2)
Build another 3D model of the microwave
color images
corresponding depth images
3D model
Experimental Results (3)
Interactively change the configuration of the microwave
color images
corresponding depth images
Conclusion

Reconstruct kinematic structure of articulated objects:
- Yields occlusion-aware multi-view models
- Does not require prior knowledge of the object
- Makes no assumptions regarding planarity of the object
- Supports both revolute and prismatic joints
- Automatically classifies joint type
- Works for objects with multiple joints
- Requires only two different configurations of the object
- Effective in a range of environmental conditions with various types of objects
- Useful in domestic robotics
Conclusion

Manipulate articulated objects:
- Given a particular point on the object, the robot can move its end effector to that position, even if the point is not visible in the current view
- Given a particular grasp point, the robot can grab the object at that point and move so as to exercise the articulated joint

Apply Kinect to the system:
- Does not require artificial markers attached to objects
- Yields much denser 3D models
- Reduces computation time