H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for...

34
H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions Buğra Tekin Microsoft MR & AI Lab, Zürich

Transcript of H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for...

Page 1: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

H+O: Unified Egocentric Recognition of

3D Hand-Object Poses and Interactions

Buğra Tekin

Microsoft MR & AI Lab, Zürich

Page 2: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Hand & Object Interactions

Page 3: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Outline1. Object Understanding

2. Hand + Object Understanding

1. B. Tekin, S. Sinha, P. Fua, “Real-time Seamless Single Shot 6D Object Pose Prediction”, CVPR’18

2. B. Tekin, F. Bogo, M. Pollefeys, “H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions”, CVPR’19

Page 4: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Outline1. Object Understanding

1. B. Tekin, S. Sinha, P. Fua, “Real-time Seamless Single Shot 6D Object Pose Prediction”, CVPR’18

Page 5: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Real-time Seamless Single Shot 6D Object Pose Prediction

Buğra Tekin, Sudipta Sinha, Pascal Fua, “Real-time Seamless Single Shot 6D Object Pose Prediction”, CVPR’18

Benchvise

Holepuncher

Driller

Cam

ApeDuck

Page 6: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

6D Object Pose Estimation

Estimating the pose (rotation and translation) of an object instance in the camera coordinate system

Camera

coordinate

system

Object

coordinate

system

Page 7: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Challenges

Variation in illumination Occlusion

Clutter Variation in viewpoint

Page 8: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Our Goal

Benchvise: 0.3

Holepuncher: 0.35

Driller: 0.45

Cam: 0.9

Ape: 0.85

Duck: 0.78

Predict the object pose, the class and the detection confidence on

the full image with a CNN in a single shot

Page 9: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Our Approach

Page 10: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Our Approach

• Predict relative coordinates of the 8 2D corner points

• Also predict the confidence and class probabilities

• At inference time, recover the pose of the object with PnP

Page 11: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Qualitative ResultsSequence containing “Ape” object

Sequence “Ape”

Page 12: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Sequence containing “Benchvise” object

Qualitative Results

Sequence “Benchvise”

Page 13: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Sequence containing “Cat” object

Qualitative Results

Sequence “Cat”

Page 14: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

• Evaluation metric: Percentage of correctly estimated poses

Quantitative Results

Page 15: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

• Runtime efficiency

• Evaluation metric: Percentage of correctly estimated poses

Real-time 94 fps speed

Quantitative Results

Page 16: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

• Runtime efficiency

• Evaluation metric: Percentage of correctly estimated poses

Quantitative Results

Page 17: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Multi-Object 6D Pose Estimation

Page 18: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Outline1. Object Understanding

2. Hand + Object Understanding

1. B. Tekin, S. Sinha, P. Fua, “Real-time Seamless Single Shot 6D Object Pose Prediction”, CVPR’18

2. B. Tekin, F. Bogo, M. Pollefeys, “H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions”, CVPR’19

Page 19: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions

B. Tekin, F. Bogo, M. Pollefeys, “H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions”, CVPR’19

Page 20: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

H+O

2D Hand+Object Pose 3D Hand+Object Pose

Per-frame predictions

Page 21: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

ObjectBrachmann et al., CVPR’16

Rad & Lepetit, ICCV’17

Kehl et al., ICCV’17

Tekin et al., CVPR’18

Sundermeyer et al., ECCV’18

Zimmermann & Brox, ICCV’17

Müller et al., CVPR’18

Spurr et al., CVPR’18

Iqbal et al., ECCV’18

Hand

Page 22: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

RGBAction

RecognitionObject Pose Hand Pose

Hamer et al., ICCV’09 ⚫

Oikonomidis et al., ICCV’11 ⚫ ⚫

Sridhar et al., ECCV’16 ⚫ ⚫

Tzionas et al, IJCV’16 ⚫ ⚫

Müller et al., ICCV’17 ⚫

Garcia-Hernando et al., CVPR’18 ⚫ ⚫

Hasson et al. CVPR’19 ⚫ ⚫ ⚫

OURS ⚫ ⚫ ⚫ ⚫

Hands in Interaction with Objects

Page 23: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Hand+Object

1. Unified framework for recognizing 3D hand and object poses and interactions, by

simultaneously solving 4 tasks

a) 3D hand pose estimation

b) 6D object pose estimation

c) Object recognition

d) Activity recognition

Page 24: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Hand+Object

1. Unified framework for recognizing 3D hand and object poses and interactions, by

simultaneously solving 4 tasks

2. A single-shot deep architecture that jointly solves for articulated and rigid pose estimation

Page 25: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Hand+Object

1. Unified framework for recognizing 3D hand and object poses and interactions, by

simultaneously solving 4 tasks

2. A single-shot deep architecture that jointly solves for articulated and rigid pose estimation

3. A temporal model to explicitly model interactions in 3D between hands and objects

Page 26: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Hand+Object

Unified Egocentric Scene Understanding

𝐩𝑖𝑎

Page 27: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Hand+Object

store 3D locations

in a 3D grid

𝐩𝑖𝑎

Page 28: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Hand+Object

3D hand skeleton

and 3D object BB

at the corresponding cell

𝐩𝑖𝑎

Page 29: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Hand+Object

encode 3D poses,

estimation confidence,

object and activity classes

𝐩𝑖𝑎

Page 30: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Hand+Object

Santoro et al., “A simple neural network module for relational reasoning”, NIPS’17.

explicitly model

hand-object interactions,

directly in 3D

𝐩𝑖𝑎

𝑓𝜙(𝑔𝜃(ෝ𝒚ℎ, ෝ𝒚𝑜))

Page 31: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Ablation Studies

Jointly using hand and

object poses improves

activity recognition

Page 32: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Ablation Studies

Unified framework yields

better accuracy than

task-dedicated

individual networks

Page 33: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

State-of-the-Art Comparison

Hand pose estimationObject pose estimationActivity Recognition

FPHA dataset

Garcia-Hernando et al., “First-Person Hand Action Benchmark with RGB-D Videos and 3D Hand Pose Annotations”, CVPR’18.

Page 34: H+O: Unified Egocentric Recognition of 3D Hand-Object ... · Hand+Object 1. Unified framework for recognizing 3D hand and object poses and interactions, by simultaneously solving

Thanks!

H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions