Representing People in Virtual Environments
Will Steptoe, 30th November 2010
What’s in this lecture?
• Part 1: Virtual Characters – History, Agency, Control in Immersive and Non-Immersive systems, Copresence and measures, Fidelity, Uncanny Valley.
• Part 2: 3D Studio Max Demo
• Part 3: Technical Aspects of Virtual Characters
– Motion Capture, Skeletal Subspace Deformation, Forward Kinematics, Inverse Kinematics, Blend Shapes
Early Virtual Characters
• “Mechanical Turk” chess-playing machine, 1770.
• Instead of just a machine, a human figure is presented.
• Makes experience more compelling, provides a focus for visual attention, relates to theory of social agency.
Social Agency - General
• “Individuals mindlessly apply social rules and expectations to computers”, Nass and Moon.
• People generally require minimal encouragement to view computer systems and applications as social agents, reading far more understanding than is warranted from symbols and graphical displays.
ELIZA, Weizenbaum, 1966
Social Agency - Eliza
• First documented example is ELIZA: a computer program for the study of natural language communication between man and machine (Weizenbaum, 1966).
• ELIZA used text-processing to rephrase input statements from users into questions. People often became emotionally engaged when “communicating” with ELIZA, and some even asked to be left alone with the system. Often termed the “ELIZA effect”.
• Due to the tendency for humans to unconsciously equate programmed computer behaviour as analogous to conscious human behaviour, despite conscious knowledge to the contrary.
• May be considered a precursor to many observations of immersion and presence reported in the VE literature.
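The rephrasing mechanism described above can be sketched in a few lines (an illustrative toy, not Weizenbaum's original script; the pattern and reflection table here are invented for this example):

```python
import re

# Minimal ELIZA-style rephraser: reflect first-person words and turn a
# statement into a question. Table and pattern are illustrative only.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

def reflect(fragment):
    """Swap first-person words for second-person ones."""
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(statement):
    """Rephrase 'I am X' / 'I feel X' style statements as questions."""
    m = re.match(r"i (?:am|feel) (.*)", statement.lower().rstrip("."))
    if m:
        return f"Why do you feel you are {reflect(m.group(1))}?"
    return "Please tell me more."

print(respond("I am worried about my exams"))
# → Why do you feel you are worried about your exams?
```

The emotional engagement reported with ELIZA arose despite the program doing nothing deeper than this kind of surface pattern matching.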
Agency – Virtual Characters
• In the specific context of software-based virtual humanoids, agency describes their method of control or interaction, with avatars and agents occupying either end of the agency spectrum.
• Agency is the extent to which a virtual human is perceived to be a representation of an individual in the ‘real’ world.
• Avatar/agent hybrids are common.
Agency – An Issue of Control
• For agents the behaviour is completely programmed.
• For avatars the behaviour is ideally completely determined by the behaviour of the real tracked human.
• In practice the human cannot be fully tracked – typically in VR only head and one hand movements are tracked!
Control Methods
• Typed Text, Emoticons, Traditional GUI, Speech, Full body tracking
Minimal Tracking for IK in VR
• Badler et al. showed a minimal configuration for IK representing the movements of a human in VR
– www.cis.upenn.edu/~hollick/presence/presence.html
• It was shown that 4 sensors are sufficient to reasonably reconstruct the approximate body configuration in real-time.
Embodiment in Collaborative Virtual Environments (CVEs)
• In shared VEs, users’ avatar embodiments act as the fundamental mediators of the visual component of an interaction.
• Avatars function both to identify users and to communicate nonverbal behaviour including position, identification, focus of attention, gesture and action.
• Avatars generally exhibit generic humanoid form, which reflects their status as a representation of a human user, and critically, enables a natural mapping between a user’s bodily movement and the corresponding virtual behaviour.
• Avatars that exhibit humanoid form and behaviour have been shown to evoke a richer sense of copresence in observers.
Immersive Collaborative Virtual Environments (ICVEs / CVEs)
Controlling Avatars in Non-Immersive CVEs: Spark (Morley, D. and Myers, K., 2004)
• Text chat based environment
• Parse users’ text input for interactional information
• Use this information to generate behaviour
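A minimal sketch of this kind of parsing, under the assumption that interactional information can be read off keywords and punctuation (the cue table and behaviour names below are hypothetical, not taken from the Spark system):

```python
# Hypothetical cue table mapping interactional markers in a chat message
# to avatar behaviours. Real systems use richer linguistic analysis.
CUES = {
    "?": "raise_eyebrows",      # questions invite a response
    "!": "emphatic_gesture",
    "hello": "wave",
    "yes": "nod",
    "no": "shake_head",
}

def behaviours_for(message):
    """Return the list of behaviours triggered by cues in the message."""
    msg = message.lower()
    return [b for cue, b in CUES.items() if cue in msg]

print(behaviours_for("Hello! Are you coming?"))
# → ['raise_eyebrows', 'emphatic_gesture', 'wave']
```

Because the behaviour is derived from the same text the user was typing anyway, no second control channel is needed.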
Controlling Avatars in Non-Immersive CVEs: Spark
Problems with Controlling Avatars in Non-Immersive Systems
• Two modes of control: at any moment the user must choose between either selecting a gesture from a menu or typing in a piece of text for the character to say. This means the subtle connections and synchronisations between speech and gestures are lost.
• Explicit control of behaviour: the user must consciously choose which gesture to perform at a given moment. As much of our expressive behaviour is subconscious, the user will simply not know what the appropriate behaviour to perform at a given time is.
[BodyChat, Vilhjalmsson, H. and Cassell, J., 1998]
Problems with Controlling Avatars in Non-Immersive Systems
• Emotional displays: current systems mostly concentrate on displays of emotion whereas Thórisson and Cassell (1998) have shown that envelope displays – subtle gestures and actions that regulate the flow of a dialog and establish mutual focus and attention – are more important in conversation.
• User tracking: direct tracking of a user’s face or body does not help as the user resides in a different space from that of the avatar and so features such as direction of gaze will not map over appropriately.
[BodyChat, Vilhjalmsson, H. and Cassell, J., 1998]
Solutions for Controlling Avatars in Non-Immersive Systems
• Always ensure that any control is done through a single interface (e.g. through text chat)
• BUT…
• The body language of an avatar should be largely autonomous, and indirectly controlled by users
• Minimize the level of control needed
[BodyChat, Vilhjalmsson, H. and Cassell, J., 1998]
Copresence
• Referred to as Copresence in the CVE literature, referred to as Social Presence in general telecommunications.
• Theory of social presence in telecommunication systems is the degree of salience of another person taking part in the interaction, with a particular emphasis on how the transmission of nonverbal cues is supported by the medium (Short, 1976).
• In multi-user VEs, it is the sense of being in the company of another person during the course of mediated interaction.
• The term is parallel to the established usage of ‘presence’ that entails the sense of being present in a VE.
Measuring Copresence
• Success of a VE is measured in terms of the extent to which sensory data projected within a virtual environment replaces the sensory data from the physical world
– quantified by rating the individuals’ sense of presence during the experience
• For virtual characters: success is taken as the extent to which participants act and respond to the agents as if they were real
– Subjective: Questionnaires, Interviews
– Objective: Physiological, Behavioural
Subjective means
• Traditional methods: Questionnaires and interviews
– Various questionnaires exist
– http://www.presence-research.org
• Criticised due to its various dependencies:
– the individual’s accurate post-hoc recall,
– processing and rationalisations of their experience in the VE, and
– varying interpretations of the word ‘presence’
Objective: Responses to stimuli
• Numerous possible objective measures
– Subconscious responses
• Threat-related facial cues provoke individuals to use different viewing strategies
– Neural responses
• Different areas of the brain are activated during +ve, -ve and neutral situations
– Psychological responses• Stress and Anxiety in response to threat
– Physiological responses
• Galvanic Skin Responses, Heart Rate Variability, Electrocardiograms, Electromyography, Respiratory activity, Pupil dilation
– Behavioural responses
• Flight or Fight (based on cognitive appraisal)
• Vary based on cognitive factors, personality, emotional state, gender etc.– How do we interpret the data and results?
Categories of behavioural cues
Argyle, M. (1998). Bodily Communication. Methuen & Co Ltd, second edition.
• Vocal properties– Tone, Pitch, Loudness…
• Facial expressions
– The most studied behavioural cue due to its role in communication
• Gaze behaviour
– Probably the most intense social signallers
• Kinesics: Posture and Motion
– Numerous gestures depending on culture for instance
• Proxemics
– Culture and gender dependent
Realistic Social Responses in VEs
• People’s response to virtual representations of humans is automatic, and leads to copresence
• Despite others being represented by avatars, social norms of gender, proxemics, and gaze transfer into CVEs:
– male-male dyads maintain greater interpersonal distance than female-female dyads,
– male-male dyads maintain less eye contact than female-female dyads,
– decreases in interpersonal distance are compensated with gaze avoidance, echoing Argyle et al.’s equilibrium theory specifying an inverse relationship between mutual gaze and interpersonal distance.
Realistic Social Responses in VEs
• Proxemics (interpersonal space), Bailenson, Blascovich, Beall, Loomis, 2001.
Different Levels of Realism
• Visual Realism
– What it looks like (pictures, games, VE, film)
• Animation Realism– How it moves, animation (film, games, VE)
• Behavioural Realism
– How it responds and interacts to stimuli (games/VE)
Appearance vs. Behaviour
Vinayagamoorthy, V., Garau, M., Steed, A., and Slater, M. (2004b). An eye gaze model for dyadic interaction in an immersive virtual environment: Practice and experience. Computer Graphics Forum, 23(1):1–11.
Appearance vs. Behaviour
• Sparse environment – abandoned building
– Minimise visual distraction
– One genderless cartoon form character
– Two gender-matched higher fidelity characters
• Behaviour
– Common limb animations and condition-dependent gaze animations
– Individuals listening in a conversation look at their conversational partner for longer periods of time and more often than when they are talking
• Negotiation task to avoid a scandal - 10 minutes
Appearance vs. Behaviour
• In each of the responses, the higher fidelity avatar had a higher response with the inferred-gaze model
• And a low response with the random-gaze model
– Important to note that the differences between both the gaze models were very subtle
• Saccadic velocity and inter-saccadic intervals (means)
• Analysis demonstrated a very strong interaction effect between the type of avatar and the fidelity of the gaze model
– The higher-fidelity avatar did not outperform the cartoon-form avatar
– Similar hypothesis in the fields of robotics
Mismatch in Realism
• Maybe the problem is that levels of movement and behavioural realism do not match graphical realism
• This mismatch disturbs us: something that looks human but does not act like a human
• Consistency
Uncanny Valley
• As the behaviour and representation of robots (and other facsimiles) of humans approaches that of actual humans, it causes a response of revulsion among human observers.
• Theory from the 70s by roboticist Masahiro Mori
– Controversial, it’s not very rigorous or scientific, many people don’t believe it
– There are problems but it maybe captures something
The Uncanny Valley
The Uncanny Valley
• Dreamworks reduced realism of Princess Fiona (Shrek):
– “she was beginning to look too real, and the effect was getting distinctly unpleasant”
• Final Fantasy
– “it begins to get grotesque. You start to feel like you're puppeteering a corpse”
Uncanny Valley
• At low levels of realism, the more realistic a character the more people like it (even this is dubious)
• But when you get almost real then characters start to get disturbing
• This is very strong, the uncanny means very disturbing, corpses are used a lot as metaphors
• Interestingly, there are 2 graphs, movement and appearance; movement is more important
Realism vs Believability
• The lesson is that we need to be careful with realism for virtual humans
• Often we prefer to use the term “Believability”
– Not how much a character is objectively like a human
– How much we feel it is/respond to it as if it is
– Bugs Bunny is very Believable
• Photorealism is only one element of believability
– But don’t turn into an anti-realism zealot!
Highly realistic characters
• Highly realistic characters can cause more perceptual problems than simple ones
• Perceptually-realistic characters existing in stills, beginning to appear in movies, less so in games and VEs
• Not just a computing power issue, as minimal fidelity can have significant impact on response.
• There are a lot of complex issues to deal with when you have more realistic characters
Designing virtual humans
• GOAL: Represent the Person in VE consistently
– With perceived realism, aspects of believability …
• Induce responses to the virtual human
– Inducing realistic/lifelike responses
• Enhancing the collaborative experience
• Facilitate social communication and interpersonal relationships
State-of-the-Art
• Real-time: Heavy Rain, Quantic Dream, 2009
• Pre-rendered: The Curious Case of Benjamin Button, Paramount Pictures, 2008
• Robotics: Actroid-F, Kokoro Co. Ltd & ATR, 2010
Part 2: 3DSMax Demo
Part 3: Technical Aspects of Virtual Characters
• Motion Capture, Skeletal Subspace Deformation, Forward Kinematics, Inverse Kinematics, Blend Shapes
Graphics
• Techniques: Meshes, texture mapping, standard graphics stuff
• Hand modelling: can be cartoony or highly realistic
• 3D Scanning/phototextures: can have very high realism
• Rendering Opacity: Subsurface scattering
Modelling
A scanned body results in a huge mesh which can be rendered at different resolutions (numbers of polygons)
Body Animation
• Often use motion capture: optical, inertial, markerless, mechanical.
• Can also hand animate the skeleton (Pixar), or use both.
• Real data = Realism (?), relates back to consistency between visual and behavioural fidelity.
Motion Capture Post-processing
• Motion capture often gives a noisy, incomplete set of marker positions, so need to get rid of noise.
• Convert to joint angles (use simple analytic IK type methods).
• Deal with problems of missing markers.
• Mo-cap systems all come with standard software to do this.
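Two of the clean-up steps above, gap filling for missing markers and noise suppression, can be sketched on a single 1-D marker track; these are generic textbook methods, not any particular vendor's pipeline:

```python
# Sketch of mo-cap post-processing on one marker coordinate over time:
# fill dropped-marker gaps by linear interpolation, then suppress jitter
# with a small centred moving-average window.

def fill_gaps(track):
    """Linearly interpolate over None entries in a 1-D marker track."""
    out = list(track)
    for i, v in enumerate(out):
        if v is None:
            prev = next(j for j in range(i - 1, -1, -1) if out[j] is not None)
            nxt = next(j for j in range(i + 1, len(out)) if out[j] is not None)
            t = (i - prev) / (nxt - prev)
            out[i] = out[prev] * (1 - t) + out[nxt] * t
    return out

def smooth(track, window=3):
    """Centred moving average; the window shrinks at the track ends."""
    half = window // 2
    return [sum(track[max(0, i - half):i + half + 1]) /
            len(track[max(0, i - half):i + half + 1])
            for i in range(len(track))]

track = [0.0, 1.0, None, 3.0, 4.2]      # one marker dropout at frame 2
print(smooth(fill_gaps(track)))
```

Commercial pipelines use more careful filters (and spline fits for longer gaps), but the structure is the same: repair first, then smooth.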
Applying it to a character
• The joint angles are typically saved to a text-based format like BVH or BIP.
• This is a sequence of rotation keyframes for each joint.
• This can be directly applied to characters using the techniques discussed.
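Applying such a keyframe sequence reduces to interpolating the stored rotations over time. A minimal sketch for one joint channel (real BVH files store three rotation channels per joint, and production systems usually interpolate quaternions rather than Euler angles):

```python
# Sample a single joint's rotation channel from (time, angle) keyframes
# using linear interpolation. Simplified: one Euler angle per joint.

def sample(keyframes, t):
    """Interpolate a sorted (time, angle_degrees) keyframe list at time t."""
    for (t0, a0), (t1, a1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            u = (t - t0) / (t1 - t0)
            return a0 * (1 - u) + a1 * u
    return keyframes[-1][1]        # clamp past the last keyframe

elbow_keys = [(0.0, 0.0), (0.5, 45.0), (1.0, 90.0)]
print(sample(elbow_keys, 0.25))    # → 22.5
```

At playback, each frame samples every joint channel at the current time and applies the resulting rotations to the skeleton.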
Skeletal Animation
• The fundamental aspect of human body motion is the motion of the skeleton.
• The motion of rigid bones linked by rotational joints (first approximation).
Typical Skeleton
• Circles are rotational joints, lines are rigid links (bones)
• The red circle is the root (position and rotation offset from the origin)
• The character is animated by rotating joints and moving and rotating the root
Making it look good – “Skinning” / “Rigging”
• A skeleton is a great way of animating a character but it doesn’t necessarily look very realistic when rendered.
• Need to add a graphical “skin” around the character.
Segmented Characters
• The simplest way is to attach separate pieces of geometry to each joint
• Leads to body being broken up – may work for robots, but not human characters
Skeletal subspace deformation
• We want to represent a character as a single smooth mesh (a “skin”)
• This should deform smoothly based on the motion of the skeleton – similar to humans and zombies
Map skeleton to geometry
• Associate each vertex in a mesh with one or more joints
• The vertices are transformed individually by their associated joints
• Each vertex has a weight for each joint
• The resulting position is a weighted sum of the individual joint transforms
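The weighted sum described above (often called linear blend skinning) can be sketched in 2-D for a single vertex influenced by two joints; the joint representation here (a rotation angle plus an origin) is a simplification of the usual 4×4 bone matrices:

```python
import math

# Linear blend skinning sketch: each joint transforms the vertex rigidly,
# and the skinned position is the weighted sum of those transformed
# positions (weights sum to 1).

def rot(theta, p):
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def skin(vertex, joints, weights):
    """joints: list of (angle, origin) pairs; weights sum to 1."""
    x = y = 0.0
    for (theta, origin), w in zip(joints, weights):
        # rigid transform of the vertex by this joint (rotate about origin)
        local = (vertex[0] - origin[0], vertex[1] - origin[1])
        rx, ry = rot(theta, local)
        px, py = rx + origin[0], ry + origin[1]
        # accumulate the weighted contribution
        x += w * px
        y += w * py
    return (x, y)

# A vertex near the elbow, influenced half by the upper arm (at rest)
# and half by the forearm (bent 90 degrees about the elbow at x = 2).
joints = [(0.0, (0.0, 0.0)), (math.pi / 2, (2.0, 0.0))]
print(skin((2.5, 0.0), joints, [0.5, 0.5]))
```

Averaging rigid transforms this way is cheap but causes the well-known volume-loss artefacts (e.g. the "candy-wrapper" effect at twisting joints), which is one motivation for the multi-layered methods mentioned later.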
Representation
• Layered representation
– Skeleton structure forms a scene graph
– Scene graph embodies a set of joints
– A mesh overlays the scene graph
– As the skeletal structure moves the mesh must deform appropriately (otherwise there are holes)
MPEG4 example
http://ligwww.epfl.ch/~maurel/Thesis98.html
Multi-layered Methods
• The deformation of a human body doesn’t just depend on the motion of the skeleton
• The movement of muscle and fat also affect the appearance
• These soft tissues need different techniques from rigid bones
Forward Kinematics (FK)
• The position of a link is calculated by concatenating rotations and offsets
[Diagram: two-link chain with joint rotations R0 and R1, link offsets O0, O1, O2, and end point P2]
Forward Kinematics (FK)
• First you choose a position on a link (the end point)
• This position is rotated by the rotation of the joint above the link
• Translate by the length (offset) of the parent link and then rotate by its joint. Go up to its parent and iterate until you get to the root
• Rotate and translate by the root position
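The procedure above can be sketched for a planar chain, working from the end point up to the root (a 2-D simplification; real systems concatenate 4×4 transform matrices):

```python
import math

# Forward kinematics for a planar chain: starting from the end point,
# translate by each link's offset and rotate by its joint, iterating
# up the chain until the root is reached.

def rot(theta, p):
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def fk(angles, lengths, root=(0.0, 0.0)):
    """Return the end-effector position of a planar chain.

    angles[i] is the rotation of joint i; lengths[i] is the offset of the
    link below joint i. Work from the last link up to the root."""
    p = (0.0, 0.0)                       # end point in the last link's frame
    for theta, length in zip(reversed(angles), reversed(lengths)):
        p = (p[0] + length, p[1])        # translate by the link offset
        p = rot(theta, p)                # rotate by this link's joint
    return (p[0] + root[0], p[1] + root[1])

# Two unit links, both joints at 45 degrees: the second link points at 90.
print(fk([math.pi / 4, math.pi / 4], [1.0, 1.0]))
```

The result matches the direct sum of link directions, (cos 45° + cos 90°, sin 45° + sin 90°), confirming the concatenation order.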
Forward Kinematics (FK)
• Simple and efficient
• Comes for free in a scene graph architecture
• Difficult to animate with,
– often we want to specify the positions of a character’s hands, not the rotations of its joints
• The Inverse Kinematics problem:
– Calculating the required rotations of joints needed to put a hand (or other body part) in a given position.
Inverse Kinematics
• A number of ways of doing it: http://chrishecker.com/Inverse_Kinematics
• Matrix methods (hard)
• Cyclic Coordinate Descent (CCD)
– A geometric method (secretly matrices underneath)
[Diagram: two-link chain with joint rotations R0 and R1, link offsets O1 and O2, and target point Pt]
Inverse Kinematics
• Start with the final link
Inverse Kinematics
• Rotate it towards the target
Inverse Kinematics
• Then go to the next link up
Inverse Kinematics
• Rotate it so that the end effector points towards the target
Inverse Kinematics
• And the next…
Inverse Kinematics
• And iterate until you reach the target
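The CCD iteration shown in the last few slides can be sketched for a planar chain by storing joint positions directly and rigidly rotating each joint's descendants (a simplification; angle-based implementations apply the same geometry to joint rotations):

```python
import math

# Cyclic Coordinate Descent: starting from the last joint, rotate each
# joint so the end effector points towards the target, and sweep the
# chain repeatedly until the effector is close enough.

def ccd(joints, target, iterations=50, tol=1e-4):
    """joints: mutable list of [x, y] positions; joints[-1] is the effector."""
    for _ in range(iterations):
        for i in range(len(joints) - 2, -1, -1):   # last joint first
            jx, jy = joints[i]
            ex, ey = joints[-1]
            # angle between joint->effector and joint->target
            a1 = math.atan2(ey - jy, ex - jx)
            a2 = math.atan2(target[1] - jy, target[0] - jx)
            c, s = math.cos(a2 - a1), math.sin(a2 - a1)
            # rigidly rotate every descendant about this joint
            for k in range(i + 1, len(joints)):
                dx, dy = joints[k][0] - jx, joints[k][1] - jy
                joints[k] = [jx + c * dx - s * dy, jy + s * dx + c * dy]
        ex, ey = joints[-1]
        if math.hypot(target[0] - ex, target[1] - ey) < tol:
            break
    return joints

chain = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]   # two unit links along x
ccd(chain, (1.0, 1.0))
print(chain[-1])   # the end effector ends up close to (1, 1)
```

Because each step is a pure rotation, link lengths are preserved automatically, which is part of CCD's appeal over unconstrained optimisation.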
Inverse Kinematics
• IK is a very powerful tool
• However, it’s computationally intensive
• IK is generally used in animation tools and for applying specific constraints
• FK is used for the majority of real time animation systems
Blend Shapes (AKA Morph Targets)
• Primarily used for facial animation
• Don’t have a common underlying structure like a skeleton
• Blend between meshes of vertices
• Animate by moving individual vertices
Morph Targets
• Have a number of facial expressions, each represented by a separate mesh
• Each of these meshes must have the same number of vertices as the original mesh but with different positions
• Build new facial expressions out of these base expressions (called Morph Targets)
Morph Targets
Morph Targets
• Smoothly blend between targets
• Give each target a weight between 0 and 1
• Do a weighted sum of the vertices in all the targets to get the output mesh
v_i = Σ_{t ∈ morph_targets} w_t · v_{t,i},   with   Σ_t w_t = 1
(output vertex v_i is the weighted sum of vertex i across all targets)
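The weighted sum can be sketched directly (toy one-vertex meshes here; real targets share thousands of vertices):

```python
# Weighted morph-target blend: every output vertex is the weighted sum
# of the corresponding vertex in each target mesh, weights summing to 1.

def blend(targets, weights):
    """targets: {name: list of (x, y, z)}; weights: {name: float}."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    n = len(next(iter(targets.values())))   # all meshes share a vertex count
    out = []
    for i in range(n):
        x = sum(weights[t] * targets[t][i][0] for t in targets)
        y = sum(weights[t] * targets[t][i][1] for t in targets)
        z = sum(weights[t] * targets[t][i][2] for t in targets)
        out.append((x, y, z))
    return out

# 70% neutral face, 30% smile, on a single illustrative vertex.
targets = {"neutral": [(0.0, 0.0, 0.0)], "smile": [(0.0, 1.0, 0.0)]}
print(blend(targets, {"neutral": 0.7, "smile": 0.3}))
```

In practice the blend is often expressed as the neutral mesh plus weighted per-target offsets, which is algebraically the same thing when the weights sum to one.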
Using Morph Targets
• Morph targets are a good low level animation technique
• Also need ways of choosing morph targets
• Could let the animator choose (nothing wrong with that)
• But there are also more principled ways such as emotional modelling (see FACS, Ekman 1976).
Summary
• VHs are represented typically as ‘skinned’ skeletal scene graphs, representing sets of joints.
• Forward kinematics determines overall configuration given joint angles, and Inverse kinematics determines joint angles from requirements for end-effectors
• Representations typically need to be a mixture based on tracking data and inferred state.
• Morph targets are a method of mesh deformation often used for facial animation