
FACTS - A Computer Vision System

for 3D Recovery and Semantic

Mapping of Human Factors

Lucas Paletta, Katrin Santner, Gerald Fritz, Albert Hofmann,

Gerald Lodron, Georg Thallinger, Heinz Mayer

Human Attention

& Environment

Selectively attending to

one aspect of the environment

Study of joint attention for

communication on objects

Human factors in the context of

environments

Study of attention, workload, memory,

stress, emotion and decision making

Study of wayfinding systems, marketing

concepts, usability of user interfaces and

products


Image source: http://theyellowbulb.wordpress.com/

Wearable „Eye Tracking Glasses“

HD camera


Eye Tracking Glasses (SMI ETG) wearable, 30 Hz binocular

Arousal (Affectiva Q)

Computational audition

Biosensor pulse sensor

acceleration

galvanic skin response

temperature

limb motion 6DOF

Eye Tracker static, 500 Hz binocular, SMI RED 500

Suite of (Wearable) Sensors


Human Factors Analysis, User Modeling, and Simulation


Wearable

Multimodal Sensing

User Interaction &

Human Factors

Analysis

User Model

Attention Model

Simulation in

3D Model

Statistical

Analysis

3D model

Motivation: 3D Gaze Estimation

Understanding behavior in task-specific environments

Localise real human gaze in the 3D environment: saliency map on attended infrastructure


© VRVis, JR, AIT, 2006

Previous Work on 3D Attention Mapping

Munn et al. [ETRA, 2008]

Introduced monocular eye tracking and triangulation of 2D gaze positions from subsequent key frames within the scene video of the eye-tracking system.

Reconstructed only single 3D points without reference to a complete 3D model, achieving an angular error of ≈3.8° (ours: ≈0.6°).

Voßkühler et al. [ECEM 2009], Pirri et al. [CVPR 2011]

Require a special stereo rig, not mass-marketed, in addition to a commercial eye-tracking device.

The achieved indoor accuracy is ≈3.6 cm at 2 m distance to the target (ours: ≈0.9 cm at the same distance with the proposed workflow).

No reference to 3D model


Workflow: Recovery of 3D Gaze & Semantics


3D Model Generation: RGB-D based Map Building


Pose trajectory

on ground

plane

pointcloud

Depth association

by means of

stereo calibration

3D Model Generation: Methodology

Fully automated 3D model generation

Grabbing RGB-D images of environment with Kinect

Performing depth-based visual SLAM using both image and depth information [*]

Reconstruction of a sparse point cloud consisting of 3D feature points

Each feature point is attached to a SIFT descriptor for robust data

association during pose estimation

Pose estimation using sliding window bundle adjustment while

minimizing reprojection error and depth discrepancy using 2D-3D

correspondences


[*] K. Pirker, G. Schweighofer, M. Rüther, H. Bischof. GPSlam: Marrying Sparse Geometric and

Dense Probabilistic Visual Mapping, Proc. 22nd British Machine Vision Conference (BMVC), 2011.
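
The pose estimation step above minimizes reprojection error and depth discrepancy over 2D-3D correspondences. The following is a minimal sketch of such a cost for one camera pose, assuming a pinhole camera model. The intrinsics (`fx`, `fy`, `cx`, `cy`), the weight `depth_weight`, and all function names are hypothetical; the slides do not state the actual cost function or optimizer.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def transform(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Map a world point into the camera frame: p_cam = R * p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def residual(R, t, point_w, obs_uv, obs_depth, fx, fy, cx, cy, depth_weight=1.0):
    """Residual of one 2D-3D correspondence: (du, dv) in pixels plus a weighted depth term.

    Assumes the point lies in front of the camera (z > 0).
    """
    x, y, z = transform(R, t, point_w)
    u = fx * x / z + cx  # pinhole projection
    v = fy * y / z + cy
    return (u - obs_uv[0], v - obs_uv[1], depth_weight * (z - obs_depth))


def cost(R, t, correspondences: List[tuple], fx, fy, cx, cy, depth_weight=1.0) -> float:
    """Sum of squared residuals over all correspondences (point_w, obs_uv, obs_depth)."""
    total = 0.0
    for point_w, obs_uv, obs_depth in correspondences:
        r = residual(R, t, point_w, obs_uv, obs_depth, fx, fy, cx, cy, depth_weight)
        total += sum(e * e for e in r)
    return total
```

Because the first two residual components are in pixels and the third is in metres, `depth_weight` stands in for whatever scaling balances the two terms. A bundle adjustment would minimize this cost over the poses (and points) inside the sliding window.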

3D Model Generation: Loop Closing

Loop closure detection through vocabulary tree search

Query frame

Potential loop closing candidates returned by the vocabulary tree

Returns a probability for each image in the map/tree

Geometric consistency check delivers candidate frame

Low memory and fast computation time
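
The retrieval idea behind loop closing can be sketched as follows. This is a simplified illustration, not the system's implementation: it uses a flat bag of visual words with TF-IDF weighting and cosine similarity instead of a hierarchical vocabulary tree, and it normalizes the scores to sum to one as one possible reading of "a probability for each image". Visual word ids and frame names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(words: List[int], idf: Dict[int, float]) -> Dict[int, float]:
    """TF-IDF vector of one frame, given its list of visual word ids."""
    counts = Counter(words)
    n = len(words)
    return {w: (c / n) * idf.get(w, 0.0) for w, c in counts.items()}


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def loop_candidates(query_words: List[int], map_frames: Dict[str, List[int]]) -> Dict[str, float]:
    """Score every map frame against the query; scores are normalized to sum to 1."""
    n_frames = len(map_frames)
    # Document frequency: in how many frames does each visual word occur?
    df = Counter(w for words in map_frames.values() for w in set(words))
    idf = {w: math.log(n_frames / d) for w, d in df.items()}
    q = tfidf(query_words, idf)
    scores = {name: cosine(q, tfidf(words, idf)) for name, words in map_frames.items()}
    total = sum(scores.values())
    return {name: (s / total if total > 0 else 0.0) for name, s in scores.items()}
```

The highest-scoring frames would then be passed to the geometric consistency check, which is not shown here.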

3D Model Generation: Dense Model

For human attention analysis and realistic surface

reconstruction, a dense environment model is

constructed afterwards

Using probabilistic occupancy grid mapping

Every depth image is inserted into the voxel space

Using pyramidal approach presented in [*]

Real-time performance using GPU implementation

Surface reconstruction is handled by standard

marching cubes algorithm [**]

[*] K. Pirker, G. Schweighofer, M. Rüther, H. Bischof: Fast and Accurate Environment Modeling using Three-Dimensional

Occupancy Grids, Proc. 1st IEEE/ICCV Workshop on Consumer Depth Cameras for Computer Vision, 2011.

[**] W. E. Lorensen, H. E. Cline: Marching Cubes: A high resolution 3D Surface Construction Algorithm, in Computer

Graphics, vol. 21, 1987, pp. 163-169.
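
Probabilistic occupancy grid mapping can be illustrated with a log-odds update per depth ray. The sketch below is a plain CPU version under stated assumptions: it is neither the pyramidal approach of [*] nor a GPU implementation, the sensor model constants `L_OCC` and `L_FREE` are hypothetical, and the ray is sampled at half-voxel steps rather than traversed exactly.

```python
import math
from collections import defaultdict

# Hypothetical inverse sensor model, expressed as log-odds increments.
L_OCC = math.log(0.7 / 0.3)   # evidence added to the voxel containing the depth hit
L_FREE = math.log(0.4 / 0.6)  # evidence added to voxels the ray passes through


def voxel_of(p, size):
    """Integer voxel index of a 3D point for a grid with edge length `size`."""
    return tuple(math.floor(c / size) for c in p)


def insert_ray(grid, origin, endpoint, size):
    """Update the grid with one depth measurement from `origin` to `endpoint`."""
    hit = voxel_of(endpoint, size)
    length = math.dist(origin, endpoint)
    steps = int(length / (0.5 * size))  # sample the ray at roughly half-voxel spacing
    free = set()
    for i in range(steps):
        s = i / steps
        p = tuple(o + s * (e - o) for o, e in zip(origin, endpoint))
        v = voxel_of(p, size)
        if v != hit:
            free.add(v)
    for v in free:
        grid[v] += L_FREE
    grid[hit] += L_OCC


def probability(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))
```

A grid can be created as `defaultdict(float)`, so unobserved voxels start at log-odds 0, that is, probability 0.5. Inserting every pixel of every depth image as one ray yields the voxel space from which a surface can later be extracted.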


Result: 3D Model


Image based Pose Estimation: Matching Process


Results in pose for every ETG frame

point cloud matching

matching

Image based Pose Estimation [**]

Estimate the user's pose within the previously reconstructed area

Sparse three-dimensional point cloud and its SIFT

keypoints build the matching model

ETG 2D image descriptors are matched against

those in the 3D point cloud (global/local)

Pose estimation through perspective n-point

algorithm [*]

RANSAC is used to eliminate matching outliers

[*] Lepetit V., Moreno-Noguer F. and Fua P.: EPnP: An Accurate O(n) Solution to the PnP Problem, International Journal

of Computer Vision, pp. 155-166, 2009.

[**] Santner, K., Paletta, L., Fritz, G., Mayer, H., Visual Recovery of Saliency Maps from Human Attention in 3D

Environments, Proc. ICRA 2013.
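
The matching step that precedes pose estimation can be sketched as a nearest-neighbour search over descriptors. This is an illustrative sketch only: the ratio test used to reject ambiguous matches is an assumption (it is common practice with SIFT but is not stated on the slides), descriptors are shortened to a few dimensions, and the function name `match_2d_3d` is hypothetical. The PnP solver and the RANSAC loop that would consume the resulting 2D-3D correspondences are not shown.

```python
import math
from typing import List, Sequence, Tuple

Feature2D = Tuple[Tuple[float, float], Sequence[float]]         # (pixel uv, descriptor)
Point3D = Tuple[Tuple[float, float, float], Sequence[float]]    # (xyz, descriptor)


def match_2d_3d(image_features: List[Feature2D],
                model_points: List[Point3D],
                ratio: float = 0.8) -> List[tuple]:
    """Return (uv, xyz) pairs whose descriptors match unambiguously.

    A match is kept only if the best descriptor distance is clearly smaller
    than the second best (ratio test, an assumption of this sketch).
    """
    matches = []
    for uv, desc in image_features:
        dists = sorted(
            (math.dist(desc, mdesc), idx)
            for idx, (_, mdesc) in enumerate(model_points)
        )
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (d1, best_idx), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((uv, model_points[best_idx][0]))
    return matches
```

The brute-force search is quadratic in the number of features; a real system would typically use an approximate nearest-neighbour index for the global matching case.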


Image based Pose Estimation: Issues

200 out of 2200 poses could not be estimated (~90% coverage)

Too few image feature points (textureless areas)

Rapid head movements (motion blur)

6 DOF Reconstruction of Human Gaze

Given the estimated camera pose: intersection of the viewing ray with the dense environment model

Fast interference detection using an object-oriented bounding box tree [*]

[*] Gottschalk S., Lin M. C., Manocha D.: OBB-Tree: A Hierarchical Structure for Rapid Interference Detection, Proc.

23rd Annual Conference on Computer Graphics and Interactive Techniques, 1996.
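
The intersection of the viewing ray with the environment model can be illustrated on a triangle mesh. The sketch below uses the Möller-Trumbore ray-triangle test and a brute-force loop over all triangles; it deliberately omits the OBB-tree acceleration named on the slide. The function names and the mesh representation (a list of vertex triples) are assumptions of this sketch.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_triangle(origin: Vec3, direction: Vec3, tri: Triangle, eps: float = 1e-9) -> Optional[float]:
    """Moeller-Trumbore test: ray parameter t of the hit, or None if there is none."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    h = cross(direction, e2)
    a = dot(e1, h)
    if abs(a) < eps:
        return None  # ray is parallel to the triangle plane
    f = 1.0 / a
    s = sub(origin, v0)
    u = f * dot(s, h)
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = f * dot(direction, q)
    if v < 0.0 or u + v > 1.0:
        return None
    t = f * dot(e2, q)
    return t if t > eps else None  # only hits in front of the origin count


def gaze_point(origin: Vec3, direction: Vec3, triangles: List[Triangle]) -> Optional[Vec3]:
    """Closest intersection of the gaze ray with the mesh, or None if nothing is hit."""
    best = None
    for tri in triangles:
        t = ray_triangle(origin, direction, tri)
        if t is not None and (best is None or t < best):
            best = t
    if best is None:
        return None
    return tuple(o + best * d for o, d in zip(origin, direction))
```

Here `origin` would be the estimated camera position and `direction` the gaze direction in world coordinates. A bounding-volume hierarchy such as an OBB-tree replaces the linear loop in `gaze_point` by a tree traversal that skips most triangles.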


Reconstruction of Human Gaze


Reconstruction of Human Gaze

Precision of Gaze Mapping

Angular Error

max. ≈0.6°

Euclidean Error

max. ≈1.1 cm

Continuous Estimation of 3D Attention


Large 3D Model

Mapping of Gaze and Arousal in Large Environments

„3D attention shop“

Attention Guided Behaviors: “Exploration” and “Visual Search”


Region (=objects) of interest (ROI) detection

Annotation in 2D → Annotation in 3D

ROIs for Visual Search

Towards Cognition from Attention Mapping

Dwell time: consecutive gaze samples / points of regard (PORs) fall within the same ROI

Dwell times on an ROI indicate conscious processing of object information (e.g., ROI #1)
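
Dwell times can be derived from the per-frame ROI assignment of the points of regard. The sketch below assumes one POR per frame at the 30 Hz rate of the eye tracking glasses and uses a hypothetical minimum run length `min_samples` to discard very short visits; the slides do not specify a threshold.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def dwell_times(roi_per_sample: List[Optional[int]],
                sample_rate_hz: float = 30.0,
                min_samples: int = 3) -> List[Tuple[int, float]]:
    """Return (roi_id, duration_in_seconds) for each run of consecutive PORs in one ROI.

    `roi_per_sample` holds the ROI id hit by each POR, or None if no ROI was hit.
    """
    dwells = []
    for roi, run in groupby(roi_per_sample):
        n = len(list(run))
        if roi is not None and n >= min_samples:
            dwells.append((roi, n / sample_rate_hz))
    return dwells
```

For example, six consecutive samples on ROI 1 at 30 Hz give one dwell of 0.2 s, while a run of two samples on another ROI is dropped with the default threshold.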


region of interest

(ROI)

Image source: http://www.tobii.com/en/eye-tracking-research/global/research/linguistics/


Related work: Context of the FACTS System

Eye-tracking videos

Computer vision / multisensor analysis applied

Driver analysis:

Driver distraction analysis

Usability engineering:

Mobile user behavior analysis

User modeling:

Eye contact behavior analysis

Related work: Driver Distraction Analysis

Driver with Eye Tracking

Glasses

Gaze tracked with optical

flow analysis

Projection onto reference

images

Collective saliency map onto

environment

Time analysis


Localisation of smartphone in eye-tracking videos

Attention on display vs. environment

Marker free tracking of the smartphone

Saliency mapping on display image capture, rectified

Behavior analysis

Related work: Mobile User Behavior Analysis

Smartphone eye-tracking; smartphone saliency mapping

Related work: Eye Contact Behavior Analysis

subject   A      B      C      D      mean
UAR       70 %   67 %   65 %   68 %   67.4 % ± .02
AUC       77 %   71 %   68 %   78 %   73.2 % ± .05

UAR: unweighted average recall; AUC: area under the ROC curve

Eyben, Schuller, Paletta, et al., submitted to IEEE Pervasive Computing 2013
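
For reference, the two reported measures can be computed as sketched below. UAR is the mean of the per-class recalls, so every class counts equally regardless of its size; AUC is computed here as the fraction of (positive, negative) pairs that the score ranks correctly, with ties counted as one half. Function names and any input data are hypothetical and unrelated to the study's results.

```python
from typing import List, Sequence


def unweighted_average_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recalls over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Requires at least one positive and one negative sample.
    """
    pos: List[float] = [s for y, s in zip(y_true, scores) if y == 1]
    neg: List[float] = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples, which is adequate for an illustration; a rank-based computation gives the same value more efficiently.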

System Components

Summary & Conclusions

Summary

Recovery of 3D gaze:

Automated reconstruction of a 3D model

Automated mapping of gaze into a 3D model

Full recovery of semantic analysis (in the frame of ROIs)

System approach – various applications

Future work

Multisensor positioning (accelerometer, vision)

Computational attention model using 3D information


JOANNEUM RESEARCH

Forschungsgesellschaft mbH

Institute for Information and

Communication Technologies

www.joanneum.at/digital

Dr. Lucas Paletta

+43 664 602 876 1769

[email protected]

Thank you for your attention
