Matthew Turk - Northwestern University · vlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22.
Vision Based Interaction
Matthew Turk
Computer Science Department and Media Arts and Technology Program
University of California, Santa Barbara
http://www.cs.ucsb.edu/~mturk
Schedule
• Vision based interaction – background and motivation
• VBI-related projects in the Four Eyes Lab
• The Allosphere
• Late afternoon group project
CVPR4HB Mission Statement
A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. To realize this prediction, next-generation computing will need to develop anticipatory user interfaces that are human-centered, built for humans, and based on naturally occurring multimodal human communication. Emerging interfaces will need to include the capacity to understand and emulate human communicative intentions as expressed through behavioral cues such as affective and social signals.
My background
1982 BS, Virginia Tech
1984 MS, Carnegie Mellon University
1984-87 Martin Marietta Aerospace
1991 PhD, MIT Media Lab
1992 Postdoc, LIFIA (Grenoble, France)
1993-94 Teleos Research
1994-2000 Microsoft Research
2000-pres UC Santa Barbara
Robotics and vision
Face recognition
Vision-based interaction, multimodal interfaces
Computer vision, multimodal interfaces, digital media, …
UCSB Four Eyes Lab
4 I’s: Imaging, Interaction, and Innovative Interfaces
Co-directors: Matthew Turk and Tobias Höllerer
Research in computer vision and human-computer interaction
– Vision based and multimodal interfaces
– Augmented reality and virtual environments
– Mobile human-computer interaction
– Multimodal biometrics
– Novel 3D displays and interaction
– Activity recognition and surveillance
– ….
http://ilab.cs.ucsb.edu
The history of computing
Purposes:
• Counting, manipulating numbers
• Assessing taxes, determining projectiles
• Creating tables of numbers
• Simulation (predicting the weather, the economy, material processes)
• Word processing and spreadsheets
• Email
• Audio + video display
• Mobile, multimedia communication
Form factors:
• Mainframes
• Lab computers
• Desktop
• Handheld
• Cell phone
• Immersive
• Wearable
Environments:
• Building
• Laboratory
• Desk
• Coffee shop
• Airport
• Everywhere
Computing has changed...
• Form, function, and context have all changed dramatically
• The central data element of computing has evolved:
– Numbers
– Text
– Image
– Audio+video
– 3D
– ...
– All data underlying communication
[Figure: Progress vs. Time]
• What has driven all this? Moore’s Law
But there has been no Moore’s Law progress for human-computer interaction!
The curse of the delta
[Figure: Progress vs. Time, with HW and SW curves separated by a gap Δ: the curse of the delta!]
Another view: There’s no Moore’s Law for people!
[Figure: Computing Capacity and Human Capacity vs. Time, separated by a gap Δ]
The result Video
What to do?
• Maybe we need to rethink the way we interact with computers
• Question: What’s the ultimate user interface?
a) A well-designed machine/instrument
b) An assistant or butler
c) None! UIs are a necessary evil
d) All of the above
• UI Goals:
– Transparency
– Minimal cognitive load
– Task-oriented, not technology-oriented
– Ease of learning, ease of use (adaptive)
Evolution of user interfaces
When    Implementation            Paradigm
1950s   Switches, punched cards   None
1970s   Command-line interface    Typewriter
1980s   Graphical UI (GUI)        Desktop
2000s   Perceptual UI (PUI)       Natural interaction
Perceptual Interfaces
Highly interactive, multimodal interfaces modeled after natural human-to-human interaction
• Goal: For people to be able to interact with computers in a similar fashion to how they interact with each other and with the physical world
Not just passive
Multiple modalities, not just mouse, keyboard, monitor
Natural human interaction
sight, sound, touch, taste (?), smell (?)
Sensing/perception
Cognitive skills
Social skills
Social conventions
Shared knowledge
Adaptation
Perceptual and multimodal interaction
vision, speech, graphics, haptics, taste (?), smell (?)
learning, user modeling
Sensing/perception
Cognitive skills
Social skills
Social conventions
Shared knowledge
Adaptation
Early example “Put That There” (Bolt 1980)
Video
Other examples...
Control vs. awareness/context
• Almost all current UI requires explicit (foreground) interaction
– Intentional control or communication w/ computer
– Often high physical and cognitive engagement
• Very few examples of system awareness
– Touching or releasing an input device
– User presence, location, attention, mood, arousal
– Back channels of communication (e.g., nodding, “hmm”)
How can we achieve the goals of PUI?
• To develop powerful, adaptive, compelling multimodal interfaces that reach well beyond the GUI, researchers need to develop and integrate various relevant sensing, display, and interaction technologies, such as:
Speech recognition
Speech synthesis
Natural language processing
Vision (recognition and tracking)
Graphics, animation, visualization
Haptic I/O
Affective computing
Tangible interfaces
Sound recognition
Sound generation
User modeling
Conversational interfaces
A strawman PUI architecture
Events: mouse, keyboard, window system, perceptual
Event stream
Event handlers: OnMouseClick, OnKeyboardDown, OnResizeWindow, OnPersonEnter, OnPersonLeave, OnSmile, OnWaving
Strawman PUI
• Superset of the GUI
• Adds perceptual events
• Presents a common, unified approach to PUI-based application development
• Platform opens the door to thousands of developers
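One way to picture the strawman is as an ordinary event dispatcher in which perceptual events sit next to GUI events. The following is only a minimal sketch under stated assumptions: the `EventBus` class, the `confidence` field, and the `min_confidence` threshold are hypothetical and are not part of the slides. Only the handler names (OnMouseClick, OnSmile) come from the architecture diagram. The confidence floor is one possible response to the fact that perceptual events are non-deterministic.

```python
from collections import defaultdict


class EventBus:
    """Routes named events to registered handlers (hypothetical helper)."""

    def __init__(self):
        # event name -> list of (handler, min_confidence) pairs
        self._handlers = defaultdict(list)

    def on(self, name, handler, min_confidence=0.0):
        """Register a handler; perceptual handlers may set a confidence floor."""
        self._handlers[name].append((handler, min_confidence))

    def emit(self, name, confidence=1.0, **payload):
        """Deliver an event and return how many handlers actually ran."""
        delivered = 0
        for handler, floor in self._handlers.get(name, []):
            if confidence >= floor:
                handler({"name": name, "confidence": confidence, **payload})
                delivered += 1
        return delivered


log = []
bus = EventBus()
# Conventional GUI event: deterministic, so the default confidence of 1.0 applies.
bus.on("OnMouseClick", lambda e: log.append(("click", e["x"], e["y"])))
# Perceptual event: only act when the (assumed) detector is fairly sure.
bus.on("OnSmile", lambda e: log.append(("smile", e["confidence"])), min_confidence=0.8)

bus.emit("OnMouseClick", x=10, y=20)
bus.emit("OnSmile", confidence=0.5)  # below the floor, so no handler runs
bus.emit("OnSmile", confidence=0.9)  # above the floor, so the handler runs
print(log)
```

In this sketch the GUI and perceptual handlers share one registration call, which is the "superset of the GUI" idea; whether a single confidence number is an adequate model of a perceptual event is left open.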
Some issues
• Is the event-based model appropriate?
• What defines a perceptual event?
• Is there a useful, reliable subset of perceptual events?
• Non-deterministic events
• Future progress (expanding the event set)
• Input/output modalities? (vision, speech, haptic, taste, smell?)
• Allocation of resources
• Multiple goal management
• Training, calibration
• Quality and control of sensors
• Privacy
Direct Manipulation objection
• Shneiderman (and others): HCI should be characterized by
– direct manipulation
– predictable interactions
– giving responsibility to the users
– giving users a sense of accomplishment
• Argument against intelligent, adaptive, agent-based, and anthropomorphic interfaces – and PUI
• ... But is it really either/or? Perhaps not.
PUI/multimodal interface research status
• Young field
• Growing interest
• Resonates with researchers with a wide range of interests (not just HCI researchers, or vision researchers, or …)
• Mixing up the “gene pool”
• Many existing projects and research efforts
• But … still asking basic questions
• Still narrow participation (but growing)
PUI, MLMI, ICMI
ICMI (1996, 1999, 2000, 2002-2010)
PUI Workshop (1997, 1998, 2001)
MLMI (2004-2008)
http://www.acm.org/icmi
Vision Based Interfaces (VBI)
• Visual cues are important in communication!
• Useful visual cues
– Presence
– Location
– Identity (and age, sex, nationality, etc.)
– Facial expression
– Body language
– Attention (gaze direction)
– Gestures for control and communication
– Lip movement
– Activity
VBI – using computer vision to perceive these cues
Elements of VBI
Head tracking
Gaze tracking
Lip reading
Face recognition
Facial expression
Hand tracking
Hand gestures
Arm gestures
Body tracking
Activity analysis
Some VBI application areas
• Accessibility, hands-free computing
• Entertainment and gaming
• Interactive art
• Social interfaces/agents
• Teleconferencing
• Improved speech recognition (speechreading)
• User-aware applications
• Intelligent environments
• Biometrics
• Movement analysis (medicine, sports)
• Visualization environments
What makes VBI difficult?
• User appearance
– size, sex, race, hair, skin, make-up, fatigue, clothing color & fit, facial hair, eyeglasses, aging….
• Environment
– lighting, background, movement, camera
• Multiple people and occlusion
• Intentionality of actions (ambiguity)
• Speed and latency
• Calibration, FOV, camera control, image quality
Some VBI examples
Myron Krueger
1980s
MIT Media Lab
1990s
HMM based ASL recognition Video
The KidsRoom Video
Interaction using hand tracking Video
Gesture recognition Video
Video
Commercial systems
2000s
Sony EyeToy Video
Reactrix Video
Microsoft Kinect (Project Natal)
• RGB camera, depth sensor, and microphone array in one package
– Xbox add-on
– RGB: 640x480, 30Hz
– Depth: 320x240, 16-bit precision, 1.2-3.5m
• Capabilities
– Full-body 3D motion capture and gesture recognition
• Two people, 20 joints per person (??)
• Track up to six people
– Face recognition
– Voice recognition, acoustic source localization
Video
Where we are today
• Perceptual interfaces
– Progress in component technologies (speech, vision, haptics, …)
– Some multimodal integration
– Growing area, but still a small part of HCI
• Vision based interfaces
– Solid progress towards robust real-time visual tracking, modeling, and recognition of humans and their activities
– Some first generation commercial systems available
– Still too brittle
• Big challenges
– Serious approaches to modeling user and context
– Interaction among modalities (except AVSP)
– Compelling applications
Moore’s Law progress
Year 1975
0.001 CPU cycles/pixel of video stream
Year 2000
57 cycles/pixel
Year 2025
3.7M cycles/pixel
(64k x speedup)
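The three figures on this slide are consistent with a single 64k-fold speedup per 25 years applied in both directions from the year 2000 value. The sketch below just checks that arithmetic. Reading "64k" as 2^16 = 65,536 is an assumption (it reproduces the quoted 3.7M more closely than 64,000 does), and the doubling period is derived from that assumption rather than stated on the slide.

```python
# Figure quoted on the slide for the year 2000.
cycles_per_pixel_2000 = 57

# Assumption: "64k x speedup" means 2**16 = 65,536 per 25 years.
speedup_per_25_years = 2 ** 16

# Scale backward to 1975 and forward to 2025.
cycles_per_pixel_1975 = cycles_per_pixel_2000 / speedup_per_25_years
cycles_per_pixel_2025 = cycles_per_pixel_2000 * speedup_per_25_years

# 2**16 over 25 years is 16 doublings, i.e. one doubling every 18.75 months.
months_per_doubling = 25 * 12 / 16

print(f"1975: {cycles_per_pixel_1975:.4f} cycles/pixel")
print(f"2025: {cycles_per_pixel_2025 / 1e6:.1f}M cycles/pixel")
print(f"doubling period: {months_per_doubling} months")
```

Under this reading, the 1975 value comes out near 0.0009 (the slide rounds to 0.001) and the 2025 value near 3.7 million cycles per pixel.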
Killer app?
• Is there a “killer app” for vision-based interaction?
– An application that will economically drive and justify extensive research and development in automatic gesture analysis
– Fills a critical void or creates a need for a new technology
• Maybe not, but there are, however, many practical uses
– Many that combine modalities, not vision-only
• This is good!!
– It gives us the opportunity to do the right thing
• The science of interaction
– Fundamentally multimodal
– Understanding people, not just computers
– Involves CS, human factors, human perception, ….
Some relevant questions about gesture
• What is a gesture?
– Blinking? Scratching your chin? Jumping up and down? Smiling? Skipping?
• What is the purpose of gesture?
– Communication? Getting rid of an itch? Expressing feelings?
• What does it mean to do gesture recognition?
– Just classification? (“Gesture #32 just occurred”)
– Semantic interpretation? (“He is waving goodbye”)
• What is the context of gesture?
– A conversation? Signaling? General feedback? Control?
– How does context affect the recognition process?
Gestures
• A gesture is the act of expressing communicative intent via one or more modalities
• Hand and arm gestures
– Hand poses, signs, trajectories…
• Head and face gestures
– Head nodding or shaking, gaze direction, winking, facial expressions
• Body gestures: involvement of full body motion
– One or more people
Gestures (cont.)
• Aspects of a gesture which may be important to its meaning:
– Spatial information: where it occurs
– Trajectory information: the path it takes
– Symbolic information: the sign it makes
– Affective information: its emotional quality
• Some tools for gesture recognition
– HMMs
– State estimation via particle filtering
– Finite state machines
– Neural networks
– Manifold embedding
– Appearance-based vs. (2D/3D) model-based
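To make the first tool in the list concrete, here is a small sketch of HMM-based gesture classification: each gesture gets its own discrete HMM, the forward algorithm scores an observation sequence under each model, and the highest-scoring model wins. Everything specific in it is an assumption for illustration: the two gestures ("wave", "swipe_right"), the two-symbol observation alphabet, and all probabilities are made up and do not come from any system mentioned in the slides.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm."""
    n = len(start)
    # Initialization: probability of starting in each state and emitting obs[0].
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    # Induction: propagate through the transitions, then weight by the emission.
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][symbol]
            for s in range(n)
        ]
    # Termination: sum over all possible final states.
    return sum(alpha)


# Hypothetical quantized hand motion per frame: 0 = moving left, 1 = moving right.
MODELS = {
    # Two states that tend to alternate: one mostly emits "left", one "right".
    "wave": {
        "start": [0.5, 0.5],
        "trans": [[0.2, 0.8], [0.8, 0.2]],
        "emit": [[0.9, 0.1], [0.1, 0.9]],
    },
    # A single state that mostly emits "right".
    "swipe_right": {
        "start": [1.0],
        "trans": [[1.0]],
        "emit": [[0.1, 0.9]],
    },
}


def classify(obs):
    """Return the name of the gesture model that best explains the sequence."""
    scores = {name: forward_likelihood(obs, **m) for name, m in MODELS.items()}
    return max(scores, key=scores.get)


print(classify([0, 1, 0, 1, 0, 1]))
print(classify([1, 1, 1, 1, 1, 1]))
```

Raw probabilities are used here only because the sequences are short; for longer sequences the usual practice is to work with scaled or log probabilities to avoid underflow. Note that this performs classification only ("Gesture #32 just occurred"), not semantic interpretation.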
A gesture taxonomy
Human movement
– Unintentional movements
– Gestures
  – Ergotic: manipulate the environment
  – Epistemic: tactile discovery
  – Semiotic: communicate
    – Acts: interpretation of the movement
      – Mimetic: imitate
      – Deictic: pointing
    – Symbols: linguistic role
      – Referential: object/action
      – Modalizing: complement to speech
Kendon’s gesture continuum
• Gesticulation
– Spontaneous movements of the hands and arms that accompany speech
• Language-like gestures
– Gesticulation that is integrated into a spoken utterance, replacing a particular spoken word or phrase
• Pantomimes
– Gestures that depict objects or actions, with or without accompanying speech
• Emblems
– Familiar gestures such as “V for victory”, “thumbs up”, and assorted rude gestures (these are often culturally specific)
• Sign languages
– Well-defined linguistic systems, such as ASL
McNeill’s gesture types
• Within the first category – spontaneous, speech-associated gesture – McNeill defined four gesture types:
– Iconic – representational gestures depicting some feature of the object, action or event being described
– Metaphoric – gestures that represent a common metaphor, rather than the object or event directly
– Beat – small, formless gestures, often associated with word emphasis
– Deictic – pointing gestures that refer to people, objects, or events in space or time
Gesture and context
• Context underlies the relationship between gesture and meaning
• Except in limited special cases, we can’t understand gesture (derive meaning) apart from its context
• We need to understand both gesture production and gesture recognition together (not individually)
• That is, “gesture recognition” research by itself is, in the long run, a dead end
– It will lead to mostly impractical toy systems!
So… the bottom line
• Gesture recognition is not just a technical problem in Computer Science
• A multidisciplinary approach is vital to truly “solve” gesture recognition – to understand it deeply
– “Thinkers” and “builders” need to work together
• Still, there is low-hanging fruit to be had, where specific gesture-based technologies can be useful before all the Big Problems are solved
– (Good…!)
Guidelines for gestural interface design
• Inform the user. People use different kinds of gestures for many purposes, from spontaneous gesticulation associated with speech to structured sign languages. Similarly, gesture may play a number of different roles in a virtual environment. To make compelling use of gesture, the types of gestures allowed and what they effect must be clear to the user.
• Give the user feedback. Feedback is essential to let the user know when a gesture has been recognized. This could be inferred from the action taken by the system, when that action is obvious, or by more subtle visual or audible confirmation methods.
• Take advantage of the uniqueness of gesture. Gesture is not just a substitute for a mouse or keyboard.
• Understand the benefits and limits of the particular technology. For example, precise finger positions are better suited to data gloves than vision-based techniques. Tethers from gloves or body suits may constrain the user’s movement.
![Page 55: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/55.jpg)
Guidelines for gestural interface design (cont.)
• Do usability testing on the system. Don’t just rely on the designer’s intuition.
• Avoid temporal segmentation if feasible. At least with the current state of the art, segmentation of gestures can be quite difficult.
• Don’t tire the user. Gesture is seldom the primary mode of communication. When a user is forced to make frequent, awkward, or precise gestures, the user can become fatigued quickly. For example, holding one’s arm in the air to make repeated hand gestures becomes tiring very quickly.
• Don’t make the gestures to be recognized too similar. For ease of classification and to help the user.
• Don’t use gesture as a gimmick. If something is better done with a mouse, keyboard, speech, or some other device or mode, use it – extraneous use of gesture should be avoided.
![Page 56: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/56.jpg)
Guidelines for gestural interface design (cont.)
• Don’t increase the user’s cognitive load. Having to remember the whats, wheres, and hows of a gestural interface can make it oppressive to the user. The system’s gestures should be as intuitive and simple as possible. The learning curve for a gestural interface is more difficult than for a mouse and menu interface, since it requires recall rather than just recognition among a list.
• Don’t require precise motion. Especially when motioning in space with no tactile feedback, it is difficult to make highly accurate or repeatable gestures.
• Don’t create new, unnatural gestural languages. If it is necessary to devise a new gesture language, make it as intuitive as possible.
![Page 57: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/57.jpg)
[Diagram of contributing fields: Computer Vision, Pattern Recognition/ML, Human Behavior Analysis, HCI, Communication, Sociology, Anthropology, Speech and Language Analysis, Social and Perceptual Psychology]
![Page 58: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/58.jpg)
Some VBI-related research at the UCSB Four Eyes Lab
![Page 59: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/59.jpg)
HandVu: Gestural interface for mobile systems
• Goal: To build highly robust CV methods that allow out-of-the-box use of hand gestures as an interface modality for mobile computing environments
![Page 60: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/60.jpg)
System components
• Detection
– Detect the presence of a hand in the expected configuration and image position
• Tracking
– Robustly track the hand, even when there are significant changes in posture, lighting, background, etc.
• Posture/gesture recognition
– Recognize a small number of postures/gestures to indicate commands or parameters
• Interface
– Integrate the system into a useful user experience
![Page 61: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/61.jpg)
HandVu
[Diagram: hand detection → hand tracking → posture recognition, with “success” and “failure” transitions between the stages]
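The three stages can be read as a small state machine. The following is a minimal sketch, not HandVu's actual control logic: it assumes that a failure at any stage sends the system back to detection (the diagram only labels "success" and "failure" transitions), and the names `Stage`, `next_stage`, and `run` are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    DETECTION = "hand detection"
    TRACKING = "hand tracking"
    RECOGNITION = "posture recognition"


def next_stage(stage: Stage, success: bool) -> Stage:
    """Advance on success; fall back to detection on failure (assumed)."""
    if not success:
        return Stage.DETECTION
    if stage is Stage.DETECTION:
        return Stage.TRACKING
    # While tracking keeps succeeding, stay in (or move to) posture recognition.
    return Stage.RECOGNITION


def run(outcomes):
    """Replay per-frame success flags and return the list of visited stages."""
    stage = Stage.DETECTION
    visited = [stage]
    for ok in outcomes:
        stage = next_stage(stage, ok)
        visited.append(stage)
    return visited
```

For example, `run([True, True, False])` walks from detection through tracking and recognition, then drops back to detection after the failed frame.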
![Page 62: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/62.jpg)
Robust hand detection
• Detection using a modified version of the Viola-Jones face detector, based on boosted learning
• Performance:
– Detection rate: 92%
– False positive (fp) rate: 1.01 × 10⁻⁸ (one false positive in 279 VGA-sized image frames)
– With color verification: few false positives per hour of live video!
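As a rough consistency check on these figures, the short calculation below relates the two numbers. It assumes the false positive rate is counted per scanned detection window, which the slide does not state explicitly.

```python
FP_RATE = 1.01e-8      # false positives per scanned window (assumed unit)
FRAMES_PER_FP = 279    # one false positive in 279 VGA-sized frames (from the slide)

# Windows scanned, on average, before one false positive occurs.
windows_per_fp = 1.0 / FP_RATE

# Implied number of windows evaluated in each 640x480 frame (roughly 3.5e5).
windows_per_frame = windows_per_fp / FRAMES_PER_FP

print(f"{windows_per_frame:,.0f} windows per 640x480 frame")
```

Under that assumption the detector evaluates a few hundred thousand windows per frame, which is the same order of magnitude as the 307,200 pixels in a VGA image and so is plausible for a scan over positions and scales.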
![Page 63: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/63.jpg)
Hand tracking
• “Flocks of Features”
– Fast 2D tracking method for non-rigid and highly articulated objects such as hands
– KLT features + foreground color model
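The slide names only the ingredients (KLT features and a foreground color model). The sketch below illustrates one way a "flock" constraint could combine them. It is an assumption-laden illustration, not the authors' implementation: after each KLT update, a feature that strays too far from the flock median or crowds another feature is moved to the most hand-colored pixel that satisfies both constraints. The function name, the thresholds `min_dist` and `max_dist`, and the relocation rule are all hypothetical.

```python
import numpy as np


def enforce_flock(points, color_prob, min_dist=3.0, max_dist=40.0):
    """Relocate features that break the (assumed) flocking constraints.

    points:     (N, 2) array of (x, y) feature positions after a KLT update.
    color_prob: (H, W) array, per-pixel probability of the foreground color.
    """
    pts = np.asarray(points, dtype=float).copy()
    median = np.median(pts, axis=0)

    # Candidate relocation targets: pixels in descending color probability.
    order = np.argsort(color_prob, axis=None)[::-1]
    ys, xs = np.unravel_index(order, color_prob.shape)
    candidates = np.stack([xs, ys], axis=1).astype(float)

    def valid(p, others):
        # Constraint 1: stay within max_dist of the flock median.
        if np.linalg.norm(p - median) > max_dist:
            return False
        # Constraint 2: keep at least min_dist from features already accepted.
        if len(others) and np.min(np.linalg.norm(others - p, axis=1)) < min_dist:
            return False
        return True

    kept = np.empty((0, 2))
    for p in pts:
        if not valid(p, kept):
            # Pick the most foreground-colored pixel meeting both constraints.
            p = next((c for c in candidates if valid(c, kept)), median)
        kept = np.vstack([kept, p])
    return kept
```

In a full tracker this step would run after each optical-flow update, so the feature set follows the hand as a loose group rather than as a rigid template.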
![Page 64: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/64.jpg)
![Page 65: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/65.jpg)
Tracking Video
![Page 66: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/66.jpg)
HandVu application Video
![Page 67: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/67.jpg)
Gesture recognition
• Really view-dependent posture recognition
– Recognizes six hand postures: sidepoint, victory, open, Lpalm, Lback, grab
![Page 68: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/68.jpg)
Driving a user interface
![Page 69: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/69.jpg)
![Page 70: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/70.jpg)
An AR application
![Page 71: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/71.jpg)
HandVu software
Google: “HandVu”
• A library for hand gesture recognition
– A toolkit for out-of-the-box interface deployment
• Features:
– User independent
– Works with any camera
– Handles cluttered background
– Adjusts to lighting changes
– Scalable with image quality and processing power
– Fast: 5-150ms per 640x480 frame (on 3GHz)
• Source/binary available, built on OpenCV
![Page 72: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/72.jpg)
Multiview 3D hand pose estimation
• Appearance-based approach to hand pose estimation
– Based on ISOSOM (ISOMAP + SOM) nonlinear mapping
• A MAP framework is used to fuse view information and bypass 3D hand reconstruction
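One way to picture the fusion step is the minimal sketch below. It rests on two assumptions the slide does not spell out: the views are conditionally independent given the pose, and each view yields a likelihood over a shared set of candidate poses. The function name `map_pose` and its inputs are hypothetical.

```python
import numpy as np


def map_pose(view_likelihoods, prior=None):
    """Return the index of the MAP pose candidate and the fused posterior.

    view_likelihoods: (V, K) array, p(observation in view v | pose k).
    prior:            (K,) array of pose priors; uniform if omitted.
    """
    lik = np.asarray(view_likelihoods, dtype=float)
    k = lik.shape[1]
    prior = np.full(k, 1.0 / k) if prior is None else np.asarray(prior, dtype=float)

    # Independence assumption: sum log-likelihoods over views, add the log prior.
    log_post = np.log(lik + 1e-12).sum(axis=0) + np.log(prior + 1e-12)
    log_post -= log_post.max()          # stabilize before exponentiating
    post = np.exp(log_post)
    post /= post.sum()
    return int(np.argmax(post)), post
```

Because the views are combined in pose-candidate space, no explicit 3D reconstruction of the hand is needed, which matches the stated goal of bypassing it.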
![Page 73: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/73.jpg)
The retrieval results of the MAP framework with two-view images
![Page 74: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/74.jpg)
Isometric self-organizing map (ISOSOM)
• A novel organized structure
– Kohonen’s Self-organizing Map
– Tenenbaum’s ISOMAP
– To reduce information redundancy and avoid exhaustive search by nonlinear clustering techniques
• Multi-flash camera for the depth edges
– Less background clutter
– Internal finger edges
![Page 75: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/75.jpg)
Experimental Results

| Number | IR | SOM | ISOSOM |
|---|---|---|---|
| Top 40 | 44.25% | 62.39% | 65.93% |
| Top 80 | 55.75% | 72.12% | 77.43% |
| Top 120 | 64.60% | 78.76% | 85.40% |
| Top 160 | 70.80% | 80.09% | 88.50% |
| Top 200 | 76.99% | 81.86% | 91.59% |
| Top 240 | 81.42% | 85.84% | 92.48% |
| Top 280 | 82.30% | 87.17% | 94.69% |

The correct retrieval rates
The performance comparisons
Pose retrieval results
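A "Top N" rate of this kind could be computed as in the sketch below. It assumes the rate is the fraction of queries whose true pose appears among the first N retrieved candidates; the slide does not define the metric, and the function name and inputs are hypothetical.

```python
def top_n_rate(ranked_results, ground_truth, n):
    """Fraction of queries whose true pose is among the first n retrieved items.

    ranked_results: one ranked list of candidate pose ids per query.
    ground_truth:   the correct pose id for each query, in the same order.
    """
    hits = sum(
        1
        for ranked, truth in zip(ranked_results, ground_truth)
        if truth in ranked[:n]
    )
    return hits / len(ground_truth)
```

Under this definition the rate can only grow as N increases, which is consistent with every column of the table rising from Top 40 to Top 280.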
![Page 76: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/76.jpg)
HandyAR: Inspection of objects in AR
![Page 77: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/77.jpg)
HandyAR Video
![Page 78: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/78.jpg)
Surgeon-computer interface S. Grange, EPFL
Uses depth data (stereo camera) and video
![Page 79: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/79.jpg)
[Diagram: system setup showing the interaction zone (50x50x50 cm), tool tracker, stereoscopic camera, 2D camera, and navigation GUI, with labeled distances of 30 cm and 1.5 to 3 m]
![Page 80: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/80.jpg)
Video
![Page 81: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/81.jpg)
Video
![Page 82: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/82.jpg)
Video
![Page 83: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/83.jpg)
Transformed Social Interaction
Studying nonverbal communication by manipulating reality in collaborative virtual environments
![Page 84: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/84.jpg)
Manipulating appearance and behavior
• Visual nonverbal communication is an important aspect of human interaction
• Since behavior is decoupled from its rendering in CVEs, the opportunity arises for new interaction strategies based on manipulating the visual appearance and behavior of the avatars.
• For example:
– Change identity, gender, age, other physical appearance
– Selectively filter, amplify, delete, or transform nonverbal behaviors of the interactant
– Culturally sensitive gestures, edit yawns, redirect eye gaze, …
– Could be rendered differently to every other interactant
![Page 85: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/85.jpg)
Transformed Social Interaction (TSI)
• TSI: Strategic filtering of communicative behaviors in order to change the nature of social interaction
![Page 86: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/86.jpg)
A TSI experiment: Non-zero-sum gaze
[Figure: a presenter and listeners under three gaze conditions: Reduced, Natural, Augmented]
• Is it possible to increase one’s power of persuasion by “augmented non-zero-sum (NZS) gaze”?
– Presenter gives each participant > 50% of attention
• Experiment: A presenter tries to persuade two listeners by reading passages of text. Gaze direction is manipulated.
![Page 87: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/87.jpg)
Non-zero-sum gaze
[Figure: presenter and listeners under the three NZSG conditions: Reduced, Natural, Augmented]
• Three levels of gaze of the presenter:
– Reduced: no eye contact
– Natural: unaltered, natural eye contact
– Augmented: 100% eye contact
![Page 88: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/88.jpg)
Initial results
[Plot: mean agreement (95% CI) by gender (Female, Male) for each gaze condition (Reduced, Natural, Augmented); vertical axis from -3 to 3]
![Page 89: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/89.jpg)
TSI conclusions
• TSI is an effective paradigm for the study of human-human interaction
• TSI should inform the study and development of multimodal interfaces
• TSI may help overcome deficiencies of remote collaboration and potentially offer advantages over even face-to-face communication
• This is just one study, somewhat preliminary – others are in the works….
![Page 90: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/90.jpg)
PeopleSearch: Finding Suspects IBM Research
![Page 91: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/91.jpg)
PeopleSearch
• Video Security Cameras
– Airports
– Train Stations
– Retail Stores
– Etc.
• For
– Eyewitness descriptions
– Missing people
– Tracking across cameras
• Large amounts of video data
– How to effectively search through these archives?
![Page 92: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/92.jpg)
Suspect Description Form
![Page 93: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/93.jpg)
Problem definition
• Given a Suspect Description Form, build a system to automatically search for potential suspects that match the specified physical attributes in surveillance video
• Query Example: “Show me all bearded people entering IBM last month, wearing sunglasses, a red jacket and blue pants.”
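A query like this can be pictured as a conjunction of attribute filters over indexed detections. The sketch below is illustrative only: the `Person` record, its field names, and the `search` function are hypothetical, and mapping "red jacket" onto a shirt-color field is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical index record produced by the attribute detectors."""
    clip_id: str
    facial_hair: str   # "beard", "moustache", or "none"
    eyewear: str       # "sunglasses", "eyeglasses", or "none"
    headwear: str      # "hat", "hair", or "bald"
    shirt_color: str
    pants_color: str


def search(index, **wanted):
    """Return clip ids of records that match every requested attribute."""
    return [
        p.clip_id
        for p in index
        if all(getattr(p, name) == value for name, value in wanted.items())
    ]
```

The example query would then read `search(index, facial_hair="beard", eyewear="sunglasses", shirt_color="red", pants_color="blue")`, with the time and location constraints ("entering IBM last month") handled by a separate filter on clip metadata.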
![Page 94: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/94.jpg)
Face Recognition
• Long-term recognition (need to be robust to makeup, clothing, etc.)
• Return the identity of the person
• Not reliable under pose and lighting changes
Our Approach: People Search by Attributes
• Short-term recognition (take advantage of makeup, clothing, etc.)
• Return a set of images that match the search attributes
• Based on reliable object detection technology
Query: Show me all people with moustache and hat
![Page 95: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/95.jpg)
System overview
[Diagram: video from camera feeds an Analytics Engine (Background Subtraction, Face Detection & Tracking, Attribute Detectors) connected to a Database Backend; the Search Interface takes a suspect description form (query specification) and returns thumbnails of clips matching the query]
![Page 96: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/96.jpg)
Human body analysis
• Face Detector: divide face into three regions
– Hair or Bald or Hat
– "No Glasses" or Eyeglasses or Sunglasses
– "No Facial Hair" or Moustache or Beard
• Shirt color
• Pants color
![Page 97: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/97.jpg)
Bald Hair Hat
No Glasses Sunglasses Eyeglasses
Beard Moustache No Facial Hair
![Page 98: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/98.jpg)
Adaboost learning w/Haar features
Integral Image
D = ii(4) + ii(1) – ii(2) – ii(3)
= (A+B+C+D)+(A)–(A+B)–(A+C)
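The four-lookup identity can be sketched in NumPy as follows. The function names are illustrative, the image is assumed to be an integer grayscale array, and the two-rectangle feature at the end is just one simple example of a Haar-like feature.

```python
import numpy as np


def integral_image(img):
    """ii[y, x] holds the sum of img[:y, :x]; a zero row and column pad the top-left."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii


def rect_sum(ii, top, left, height, width):
    """Rectangle sum from four lookups: ii(4) + ii(1) - ii(2) - ii(3)."""
    bottom, right = top + height, left + width
    return ii[bottom, right] + ii[top, left] - ii[top, right] - ii[bottom, left]


def haar_two_rect(ii, top, left, height, width):
    """Left half minus right half of a window; width is assumed to be even."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))
```

Each rectangle sum costs four array reads regardless of its size, which is what makes evaluating many Haar-like features per window affordable.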
![Page 99: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/99.jpg)
Adaboost learning
• Adaboost creates a single strong classifier from many weak classifiers
• Initialize sample weights
• For each cycle:
– Find a classifier that performs well on the weighted sample
– Increase weights of misclassified examples
• Return a weighted combination of classifiers
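The loop above can be sketched as discrete AdaBoost with single-threshold decision stumps as the weak classifiers. This is a generic textbook version, not the detector training code from the talk: the stump search, the function names, and the `rounds` parameter are assumptions, and a real detector would use Haar feature responses as the columns of `X`.

```python
import numpy as np


def train_adaboost(X, y, rounds=10):
    """Discrete AdaBoost with threshold stumps. X: (n, d) array, y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                       # initialize sample weights
    ensemble = []
    for _ in range(rounds):
        best = None
        # Weak learner: exhaustive search over feature, threshold, and polarity.
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = np.where(sign * (X[:, j] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()      # weighted error on the sample
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign, pred)
        err, j, thr, sign, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)     # vote of this weak classifier
        w *= np.exp(-alpha * y * pred)            # raise weights of misclassified examples
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble


def predict(ensemble, X):
    """Sign of the weighted combination of weak classifiers."""
    score = sum(a * np.where(s * (X[:, j] - t) >= 0, 1, -1) for a, j, t, s in ensemble)
    return np.where(score >= 0, 1, -1)
```

A one-dimensional pattern such as labels `+ + - - + +` cannot be separated by any single stump, but three boosted stumps are enough to classify it, which shows the "strong from weak" idea in miniature.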
![Page 100: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/100.jpg)
Cascade of Adaboost classifiers
![Page 101: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/101.jpg)
Applying the detector
Search over all possible window positions and scales
Apply the learned Adaboost classifier using the cascade scheme of Viola & Jones for each window position/scale
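The scan and the cascade's early rejection can be sketched together as below. This is a simplified illustration under stated assumptions: windows are square, the window sizes, scale step, and stride are arbitrary defaults, and each cascade stage is represented by a hypothetical `(score_fn, threshold)` pair standing in for a boosted stage classifier.

```python
def sliding_windows(img_w, img_h, base=24, scale_step=1.25, stride_frac=0.1):
    """Yield (x, y, size) for square windows over all positions and scales."""
    size = base
    while size <= min(img_w, img_h):
        stride = max(1, int(size * stride_frac))
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield x, y, size
        # Grow the window; the max() guards against a stalled size.
        size = max(size + 1, int(round(size * scale_step)))


def cascade_accepts(stages, window):
    """Run stages in order and reject as soon as one score is below its threshold."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False      # early rejection: later stages never run
    return True


def detect(img_w, img_h, stages, **window_kwargs):
    """Return every window that passes all cascade stages."""
    return [w for w in sliding_windows(img_w, img_h, **window_kwargs)
            if cascade_accepts(stages, w)]
```

Because most windows in an image are background, rejecting them in the first one or two cheap stages is what keeps the exhaustive scan fast.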
![Page 102: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/102.jpg)
Multiple detector learning
Beard Detector
Moustache Detector
"No Facial Hair" Detector
Sunglasses Detector
Eyeglasses Detector
"No Glasses" Detector
Bald Detector
Hair Detector
Hat Detector
![Page 103: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/103.jpg)
Results: Sunglasses Detector
![Page 104: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/104.jpg)
Results: Eyeglasses Detector
![Page 105: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/105.jpg)
Results: "No Glasses" Detector
![Page 106: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/106.jpg)
Results: Beard Detector
![Page 107: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/107.jpg)
Results: Moustache Detector
![Page 108: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/108.jpg)
Results: "No Facial Hair" Detector
![Page 109: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/109.jpg)
Results: Bald Detector
![Page 110: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/110.jpg)
Results: Hair Detector
![Page 111: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/111.jpg)
Results: Hat Detector
![Page 112: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/112.jpg)
Performance evaluation
![Page 113: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/113.jpg)
Examples of failure cases
(a) Lower Face Part: Shadow looks like beard
(b) Middle Face Part: Shadow looks like sunglasses
(c) Upper Face Part: Fringe confused with hat
![Page 114: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/114.jpg)
Multispectral/IR
Attribute detection in multispectral images
![Page 115: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/115.jpg)
Media Arts and Technology (MAT)
• Media Arts and Technology is a transdisciplinary graduate program at UCSB, founded to pursue emerging opportunities for education and research at the intersection of Art, Science, and Engineering.
Media Arts and Technology Graduate Program
![Page 116: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/116.jpg)
Devices for interactivity
![Page 117: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/117.jpg)
Interactive art
Sensing/Speaking Space @ SFMOMA
![Page 118: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/118.jpg)
Algorithmic art
Blink @ SBMA
![Page 119: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/119.jpg)
Tracking and recognition
![Page 120: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/120.jpg)
Augmented environments
![Page 121: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/121.jpg)
Interactive displays
![Page 122: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/122.jpg)
Sound synthesis
![Page 123: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/123.jpg)
Scientific visualization and auralization
![Page 124: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/124.jpg)
The Allosphere
http://www.mat.ucsb.edu/allosphere
![Page 125: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/125.jpg)
![Page 126: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/126.jpg)
![Page 127: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/127.jpg)
![Page 128: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/128.jpg)
What is the Allosphere?
• A three-story anechoic space containing a built-in spherical screen, 10 m in diameter, and a walkway through the center
• A large-scale immersive surround-view instrument
• A digital media center in the California Nanosystems Institute
• A cross-disciplinary community around the UCSB Media Arts and Technology Program
• An advanced instrument for scientific research
– The manipulation, exploration and analysis of large-scale data sets
• ... and for artistic exploration
![Page 129: Matthew Turk - Northwestern Universityvlpr2010.eecs.northwestern.edu/handout/mturk slides.pdf · 2010. 7. 22. · • Desktop • Handheld economy, material processes) • Word processing](https://reader033.fdocuments.us/reader033/viewer/2022060902/609ef0b6bb05f159d82d5110/html5/thumbnails/129.jpg)
Acknowledgements
• Tobias Höllerer, Mathias Kolsch, Rogerio Feris, Ya Chang, Haiying Guan, Changbo Hu, Longbin Chen, Sebastien Grange, Charles Baur, Taehee Lee, Ismo Rakkalainen, Ramesh Raskar, Andy Beall, Jim Blascovich, Jeremy Bailenson, Daniel Vaquero, JoAnn Kuchera-Morin, Allosphere group
• MERL, IBM, Nokia
• NSF
Computer Science Department and Media Arts and Technology Program
University of California, Santa Barbara
http://www.cs.ucsb.edu/~mturk