
Signal Processing: Image Communication 28 (2013) 168–180


Avatar interoperability and control in virtual worlds

Marius Preda, Blagica Jovanova

ARTEMIS, Institut TELECOM/TELECOM SudParis, 9 rue Charles Fourier, 91011 Evry CEDEX; MAP5, CNRS UMR 8145, University Paris Descartes, 45, rue des Saints Pères, 75270 Paris CEDEX 06, France

Article info

Available online 7 November 2012

Keywords: Avatar; Virtual character; Interoperability; Standards; Personalization

http://dx.doi.org/10.1016/j.image.2012.10.012

Corresponding author. E-mail address: [email protected] (M. Preda).

Abstract

Virtual worlds (VWs), and especially the ones using 3D graphics content, have been massively developed and deployed over the last few years. Technological developments, both in hardware and software, bring significant improvements in the areas of display, graphics, animation, distributed systems, networking and mobile phone technology. These improvements allow almost every user on Web 2.0 to have a representation in different VWs. In those VWs, avatars have a specific place, representing a graphical appearance of human beings and a container for the users' specific data. Therefore, the need to reuse them across different VWs and across different platforms increases significantly.

In this paper we present a framework that allows interoperability of avatars between VWs. The proposed method consists of considering the avatar as a combination of a template and a set of characteristics that personalizes the template. The former, which is specific to each VW, can remain proprietary; the latter is exchanged between the VWs. The proposed solution thus balances the interests of end-users and VW providers, making it unnecessary to recreate an avatar for each VW while ensuring that user-defined avatars are consistent with the VW, technically and businesswise.

1. Virtual worlds and avatars

A recent trend in online graphics applications is the development of virtual worlds (VWs). Compared to other online applications (especially games), they are more open with respect to user-generated content. VWs were initially conceived for social purposes, as a support for textual and voice communication, and offered awareness about the interlocutors' presence, availability and mood. VWs are now reaching a milestone: the technology for representing and visualizing 3D content has become available and widely accessible. Therefore, VWs become 3D-VWs and include new paradigms for content representation and for interaction in a more natural manner. One facilitator of the 3D migration is the fast development


of high-performing 3D graphics cards and their availability in ordinary computers – a trend driven by the powerful market of computer games – making almost any internet user a potential player of a 3D VW.

After a significant step toward the awareness and democratization of VWs, provided mainly by the success of Second Life, VWs are now looking for sustainable business models. The most probable situation is that, in the near future, several VWs will be available, offering complementary functionalities and user experiences. The issue of interoperability between them, or at least re-usability of assets, avatars and media content, will become more and more important. Standards for the representation of graphics and media assets are nowadays available, MPEG being a very active community in providing tools, mainly the ones published as part of MPEG-4, for the compressed representation of any type of media: image/video, sound, graphics, text, etc.

MPEG-4 can be used as a base layer for ensuring interoperability at the level of data representation.


A VW compliant with MPEG-4 would implement importers and exporters able to convert between the compressed form provided by MPEG-4 and the VW's internal structure. While exporting content from a VW may be acceptable (businesswise) and possible (technically), importing graphical assets into a VW is challenging. In general, the creators of VWs keep strict control of the content presented in the world. On one hand, this content is a valuable asset and an entire economic model is based on it. On the other, the content should answer to specific requirements in terms of resolution (mesh, texture, and animation) to ensure that it can be played on ordinary PCs. Therefore, the only possibility for VW users to "create" content is to do it inside the VW, by personalizing templates or by building it under constraints.
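To make the importer/exporter idea concrete, the sketch below shows how such an adapter could be organized. This is a minimal illustration of ours, not code from the paper or from any actual VW: the class names, the internal avatar structure and the exchanged dictionary keys are all hypothetical stand-ins.

from dataclasses import dataclass, field

@dataclass
class InternalAvatar:
    # Simplified stand-in for a VW's proprietary avatar structure.
    height_cm: float = 170.0
    head_shape: str = "oval"
    clothes: list = field(default_factory=list)

class VWAdapter:
    # Hypothetical importer/exporter between a VW and an exchange format.

    def export_avatar(self, avatar: InternalAvatar) -> dict:
        # Serialize only the exchangeable characteristics, not the template.
        return {
            "BodyHeight": avatar.height_cm,
            "HeadShape": avatar.head_shape,
            "Clothes": list(avatar.clothes),
        }

    def import_avatar(self, data: dict) -> InternalAvatar:
        # Personalize the local template; ignore fields the VW cannot honor.
        avatar = InternalAvatar()
        avatar.height_cm = float(data.get("BodyHeight", avatar.height_cm))
        avatar.head_shape = data.get("HeadShape", avatar.head_shape)
        avatar.clothes = list(data.get("Clothes", []))
        return avatar

adapter = VWAdapter()
payload = adapter.export_avatar(InternalAvatar(182.0, "round", ["blouse_red"]))
print(adapter.import_avatar(payload))

The point of the split is that export stays cheap and safe for the provider, while import remains a controlled personalization of the local template rather than an injection of arbitrary content.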

Based on this analysis, the MPEG committee identified additional needs and started a new work item, called MPEG-V (Information Exchange for Virtual Worlds). One goal of MPEG-V is to standardize metadata that, combined with media representation, will ensure a completely interoperable framework. Part 4 of this standard deals exclusively with the interoperability of avatars and virtual goods in VWs, and this paper presents the work performed by the authors as contributions to the standard.

A specificity of 3D VWs with respect to other multimedia applications consists in the (visual) representation of the user inside the environment. This representation usually takes the form of an avatar, a graphic object that serves different purposes, such as making the presence of a real user in the VW visible, characterizing the user within the VW, or supporting the user's interaction with the VW. Via the avatar, the user can directly participate: he can communicate with other avatars; he is able to speak and ask for help from 3D agents, buy from virtual sellers, learn from virtual avatars, and perform other actions as similar as possible to reality, with the big advantage of removing the limitations due to physical space. The uses of avatars in real-time communications, virtual worlds and games are well known. Among them, Second Life, Active Worlds, There, Sims, HiPiPi and IBM QWAQ (mainly dedicated to project management and document sharing) are very popular. Some illustrations of current 3D-VWs are provided in Fig. 1.

Moreover, several applications use virtual characters as web-assistants, thus facilitating web navigation, or for teaching dedicated skills. The avatar becomes an agent, empowered with some intelligence. Examples include agents trained for teaching the Braille alphabet to blind

Fig. 1. Snapshots from two 3D VWs: IMVU and Second Life.

people, or sign languages to deaf people. Yet another kind of application using avatars, with impact on real economy or communication, is the virtual meeting where, based on avatar representation, the efficiency of distance communication can be increased.

Nowadays, it has become almost a practice (driven by users' desires and needs) that applications firstly designed for the PC can also run on the mobile phone, adapted as well as possible. For example, many web pages, chat applications, and social networks (like MySpace and Facebook) can be reached by mobile. Reading e-mail or streaming video and audio is possible, on top of making voice calls. Challenging applications for mobile devices are the ones that use 3D graphics and real-time communication, such as 3D-VWs. Some mobile applications that use 3D graphics are, for example, Cellufun [1] (illustrated in Fig. 2a), Itsmy [2] and Sparkle [3] (illustrated in Fig. 2b).

The difficulty, time consumption and exceptional effort needed to reproduce content bring up the increasingly important issue of interoperability, or at least re-usability, of assets, in particular of avatars, between VWs. In the last two decades, the research community has actively worked on developing tools for the creation, representation, animation, transmission and display of avatars, and several standards and recommendations have been created to represent graphics objects and, in particular, avatars. Standards such as X3D and COLLADA ensure the representation of graphics assets, and MPEG is a very active community in providing tools for compressing them.

Let us note that current VWs are populated with other types of objects than avatars, which are not directly driven by end-users. The modeling and animation of some of them (especially animals) can be very similar to the techniques used for human avatars. However, behind this type of objects there is no end-user, only computer programs that drive them. In the perspective of the current research, an avatar can be any 3D object that represents and is driven by the end-user.

The paper is structured as follows. Section 2 provides an overview of the existing standards and formalisms related to 3D avatars. Section 3 presents the main contribution of this work: a set of descriptors representing the avatar metadata model. Section 4 introduces several use-cases demonstrating different levels of interoperability that can be achieved between VWs. The last section concludes the paper.

Fig. 2. Avatars on mobile phone: Cellufun (a) and Sparkle (b).



2. Standards and open formalisms for avatars

Avatars represent an interesting topic for different standardization groups, mainly due to the big potential of applications involving them. There are currently two types of standards for avatars: the ones interested in the appearance and the animation of the avatar in 3D graphics applications (the avatars as representation objects), and the ones interested in the avatars' characteristics, such as personality, emotions, etc. (the avatars as agents). In addition, there are several proprietary formats, imposed as de facto standards by the authoring tools or VW providers. In this section, we briefly introduce some of the standards from each category and provide the main motivation behind the need for a new avatar metadata model.

2.1. Avatars as representation objects

In the last decade, several efforts have been made to develop a unique data format for 3D graphics. In the category of open standards, X3D [4], based on the Virtual Reality Modeling Language (VRML) [5], and the COLLAborative Design Activity (COLLADA) [6] are the best known, the latter being the most commonly adopted by current tools. While COLLADA concentrates on representing 3D objects or scenes, X3D pushes the standardization further by addressing user interaction and application behavior as well. This is performed thanks to an event model in which scripts, possibly external to the file containing the 3D scene, may be used to control the behavior of its objects. Also in the category of open standards, but specifically treating the compression of media objects, there is MPEG-4.

VRML was designed to allow the distribution of 3D objects over the Internet, as HTML does for text. Its first version was specified in 1994. In 1997, a new version of the format was finalized, known as VRML97. VRML was superseded in 2005 by X3D, which also specifies a version for fast transfer through networks, called X3Db. VRML/X3D allows composing files by using hyperlinks to refer to external files, reusing data through a referencing mechanism (DEF, USE), and reusing complex content through macro-definitions (PROTO, EXTERNPROTO). It also addresses user interaction, performed thanks to an event model in which scripts, possibly external to the file containing the 3D scene, may be used to control the behavior of its objects.

Avatars in VRML/X3D are defined as specific objects, standardized under the name of H-Anim [7], which provides a method of representing humanoids in a network-enabled 3D graphics and multimedia environment. H-Anim identifies a standardized humanoid skeletal system for characters structured for animation. Additionally, it specifies the semantics of avatar animation as an abstract functional behavior of time-based, interactive 3D, multimedia articulated characters. It does not define physical shapes for such characters, but does specify how such characters can be structured for animation.

COLLADA aims to establish an interchange file format for interactive 3D applications, focusing on representing 3D objects and scenes. The COLLADA Schema supports all the features that modern 3D interactive applications need. COLLADA is based on XML, making it an easily-readable textual format and allowing users to extend it for their own specific purposes. Defined as an open file format, COLLADA is already supported by numerous tools, such as Maya, 3DS Max, Softimage XSI and Blender. Game engines, such as the Unreal engine, and Google Earth have also implemented this format. However, being highly extensive and complete makes it inappropriate for constrained devices, such as mobile phones, mainly due to high memory requirements. Furthermore, the lack of a binary representation limits its usage to an interchange format between content creation tools, rather than a delivery format.

In COLLADA there is no distinction between a human avatar and a generic skinned model; therefore there is no separate scheme for modeling three-dimensional human figures. The general approach adopted is based on defining 3D objects once, using object-containers, and instancing them when used. However, all the necessary elements to define an avatar, such as support for an object-graph, geometry, appearance and animation, exist.

Built on top of VRML, MPEG-4 contained, already in its first two versions [8], tools for the compression and streaming of 3D graphics assets, enabling the compact description of the geometry and appearance of generic but static objects, and also the animation of human-like characters. Since then, MPEG has kept working on improving its 3D graphics compression toolset and published two editions of MPEG-4 Part 16, Animation Framework eXtension (AFX) [9], which addresses the requirements above within a unified and generic framework and provides many more tools to compress generic textured, animated 3D objects more efficiently. In particular, AFX contains several technologies for the efficient streaming of compressed multi-textured polygonal 3D meshes that can be easily and flexibly animated thanks to the Bone-Based Animation (BBA) toolset, making it possible to represent and animate all kinds of avatars.

Table 1 summarizes the similarities and differences between the previously described technologies for avatar representation. Let us observe that MPEG-4 is the most complete standard with respect to avatar representation and compression.

2.2. Avatars as agents

While offering a full set of features allowing the display of avatars, none of the above-mentioned standards includes semantic data related to the avatar. Several other recommendations, standards or markup languages provide semantics on top of virtual characters, mainly to describe features that do not necessarily have a visual representation (such as personality or emotions) or to expose properties that may be used by an agent (language skills, communication modality, etc.).

The Human Markup Language (HumanML) [10] by OASIS Web Services is an attempt to codify the characteristics that define human physical description, emotion, action, and culture through the mechanisms of XML, RDF


Table 1. Summary of several features of the VRML/X3D, COLLADA and MPEG-4 (XMT/BIFS) standards.

- Object graph. VRML/X3D: nodes for mesh, appearance, animation; dedicated avatar graph in H-Anim. COLLADA: support for the avatar as a regular 3D object with mesh, appearance, animation, skeleton, physics. MPEG-4: nodes for mesh, appearance, animation, morph, and dedicated nodes for avatars such as skeleton and muscle.
- Graph compression. VRML/X3D: Zip. COLLADA: Zip. MPEG-4: BIFS.
- Geometry. VRML/X3D: mesh, NURBS. COLLADA: mesh, NURBS. MPEG-4: mesh, NURBS.
- Geometry compression. VRML/X3D: Zip. COLLADA: Zip. MPEG-4: BIFS, WSS, 3DMC, SC3DMC.
- Appearance. VRML/X3D: color, texture. COLLADA: color, texture. MPEG-4: color, texture.
- Appearance compression. VRML/X3D: image compression. COLLADA: image compression. MPEG-4: image compression.
- Animation. VRML/X3D: key-frame animation by linear interpolators. COLLADA: key-frame animation, interpolators. MPEG-4: Animator node in BIFS, BIFS-Anim, BIFS Commands, interpolators, face and body animation, skeletal and morph animation.
- Animation compression. VRML/X3D: Zip. COLLADA: Zip. MPEG-4: BIFS-Anim; coordinate, orientation and position interpolator compression; face and body animation; bone-based animation; frame-based animation compression.
- Animation streaming. VRML/X3D: no. COLLADA: no. MPEG-4: yes.


and other appropriate schemas. HumanML is intended to provide a basic framework for a number of endeavors, including (but, as with human existence itself, hardly limited to) the creation of standardized profiling systems for various applications. It builds a framework for describing the emotional state and response of both people and avatars, laying the foundation for the interpretation of gestures in both person-to-person and person-to-computer settings, and for the encoding of gestures and expressions to facilitate a better understanding of modes of communication.

EmotionML (EML), by W3C, covers three classes of applications: manual annotation of material involving emotionality, such as annotation of videos, of speech recordings, of faces, of texts, etc.; automatic recognition of emotions from sensors, including physiological sensors, speech recordings, facial expressions, etc., as well as from multi-modal combinations of sensors; and generation of emotion-related system responses, which may involve reasoning about the emotional implications of events, emotional prosody in synthetic speech, facial expressions and gestures of embodied agents or robots, and the choice of music and colors of lighting in a room, etc.

Behavior Markup Language (BML) [11] is an XML-based language that can be embedded in a larger XML message or document simply by starting a <bml> block and filling it with behaviors that should be realized by an animated agent. The possible behavior elements include the coordination of speech, gesture, gaze, head, body, torso, face, legs and lips movement, and a wait behavior.

Multimodal Presentation Markup Language (MPML) [12] is a script language that facilitates the creation and distribution of multimodal contents with character presenters. It also supports media synchronization with the character agents' actions and voice commands that conform to the SMIL specification.

Virtual Human Markup Language (VHML) is designed to accommodate the various aspects of Human–Computer Interaction with regard to facial animation, body animation, Dialog Manager interaction, Text-to-Speech production, and emotional representation, plus hyper- and multi-media information.

Character Markup Language (CML) [13] is an XML-based character attribute definition and animation scripting language designed to aid in the rapid incorporation of life-like characters/agents into online applications or VWs. This multi-modal scripting language is designed to be easily understandable by human animators and easily generated by a software process such as software agents. CML is constructed jointly on the motion and multi-modal capabilities of virtual life-like figures.

While providing a rich set of characteristics, the above-mentioned languages do not address the issue of avatar interoperability in VWs. In the following section we propose a description model for this purpose.

3. A metadata model for avatar interoperability in VWs

Despite the fact that several languages related to avatars and virtual agents exist, interoperability for avatars between different VWs cannot yet be obtained in an easy, ready-to-use and integrated manner. Identifying this gap, and recognizing that only the existence of a standardized format can enable the deployment of VWs at a very large scale, MPEG initiated in 2008 a new project called MPEG-V. Concerning avatars, the following requirements were formulated during the design of MPEG-V:

a) it should be possible to easily create importers/exporters for various VW implementations,
b) it should be easy to control an avatar within a VW, and
c) it should be possible to modify a local template of the avatar by using data contained in an MPEG-V file.

In the MPEG-V vision, once the avatar is created (possibly by an authoring tool independent of any VW, or inside a VW), it can be used in any existing or future VW. A user can have a unique representation inside all VWs, like in real life. He can change, upgrade, or teach his avatar, i.e. his "virtual self", in one VW, and then all the new properties will be available in all the others. The avatar itself should then contain representation and animation features, but also higher-level semantic


information. However, a VW will have its own internal structure for handling avatars. MPEG-V does not impose any specific constraints on the internal data representation structure of a VW, but only proposes a descriptive format able to drive the transformation of a template or the creation from scratch of an avatar compliant with the VW. All the characteristics of the avatar (including the associated motion) can be exported from a VW into MPEG-V and then imported into another VW. In the case of the interface between virtual worlds and the real world (requirement 2), avatar motions can be created in the virtual world and mapped onto a real robot, for use in dangerous areas, for maintenance tasks, for the support of disabled or elderly people, and the like. The inverse operation is also possible: avatars can be animated by signals captured from the real world, as in the case of motion capture systems. Since MPEG-V is a descriptive format specifying the avatar features, it may be combined with MPEG-4 Part 16 (which includes a framework for defining and animating avatars) to provide a fully interoperable solution.

Defining an interoperable schema as intended by MPEG-V can be of major economic value, being one step towards the transformation of current VWs from stand-alone and independent applications into an interconnected communication system, similar to the current Internet where a browser can interpret and present the content of any web site. At that moment, VW providers will no longer be providers of technology, but will concentrate their efforts on creating content, once again the key to the success of the Internet.

In the next section we describe the schema representation for avatars allowing interoperability between different VWs. This schema was proposed [14,15] to MPEG for inclusion in the MPEG-V standard and was adopted as Part 4 of this standard. Let us note that Part 4 contains the schema for generic virtual objects, to which other MPEG participants contributed as well.

3.1. Analysis of existing VWs

The main aim of creating the common schema is to establish the possible mapping or transfer between different avatar attributes, extracted from the examined VWs/techniques/standards. Therefore, the first step is to identify the exhaustive set of features that should be considered. Thus, we performed an analysis of existing

Table 2. Different VWs and their features (columns: Name; Launch date; 3D avatar; 3D objects; Chat; Age; Entertainment/games; Specific).

- Habbo [16]: January 2001; Yes; Yes; Yes; 10–16; Yes/yes.
- Yoo-walk [17]: 2006; Yes; No; No; All; Yes/no.
- HiPiPi [18]: 2005; Yes; Yes; N/a; All; Yes/no.
- IBM QWAQ [19]: N/a; Yes; Yes; N/a; N/a; Yes/no; business friendly, project management.
- Sony PlayStation Home [20]: N/a; Yes; Yes; Yes; All; Yes/yes.
- Active Worlds [21]: 1997 (in beta 1995); Yes; Yes; Yes; All; Yes/no; to develop content.
- Google Lively: defunct; Yes; Yes; Yes; N/a; N/a.
- Second Life [22]: 23 June 2003; Yes; Yes; Yes; 18+; Yes/yes; to develop content.
- Teen Second Life [22]: N/a; Yes; Yes; Yes; up to 17; Yes/yes.
- There [23]: 27 October 2003; Yes; Yes; Yes; 13+; Yes/yes.
- Sims [24]: 2002; Yes; N/a; Yes; All; Yes/yes.

VWs and the most popular games, tools and techniques from content authoring packages, together with a study of different virtual-human-related markup languages. Table 2 lists several VWs and their features.

Let us note the following:


- While in data representation standards a scene-graph is exposed, the usual practice in VWs and games is to keep it strictly internal. Therefore, the schema we propose does not include scene-graph representation in the data model.
- The avatar appearance is exposed to the user by numerous VW implementations; therefore, we provide a set of parameters allowing to describe it.
- The avatar geometry is strongly VW/game dependent, in the sense that the resolution is carefully fine-tuned because the global rendering performance is dictated by the number of polygons displayed on the screen. As a result, we include in the proposed data model only means of controlling global geometric measures, and we consider the geometry as an external resource that can optionally be imported into the VW.
- Concerning the animation, most VWs/games expose the avatar skeleton (bones) in order to control the animation. Therefore, in the data model we propose, we include a skeleton characterization and a set of pre-defined animation sequences.

We propose to expose a set of personalization parameters grouped in three categories: appearance, animation and control. In the following section, we present in detail the three semantic groups, as defined by already existing implementations and the ones we propose. The whole schema for avatar representation, and how it can be used, completes this section.

3.2. Proposed metadata model

3.2.1. Appearance elements

In order to create a metadata model for avatar appearance, we evaluated the following VWs, game platforms and standards: Second Life, IMVU, Entropia Universe, Nintendo Wii, Sony PlayStation and HumanML. They expose the avatar characteristics in different quantities. For example, in the group Body, Second Life, Entropia Universe, Nintendo Wii, Sony PlayStation and HumanML



Fig. 4. Avatar appearance.


expose 22, 2, 1, 4 and 3 components, respectively (illustrated in Fig. 3).

It can be observed that Second Life exposes the most complete set of appearance features. Other VWs, such as Nintendo Wii, use simplistic avatars, and therefore the set of personalization parameters is relatively small. On the standard formalisms' side, only a few of them deal with defining metadata for appearance features. HumanML is one that exposes some features; however, this set is under-defined and far from providing real personalization support.

Based on this analysis, we designed a model containing more than 150 different parameters, clustered in 14 groups, used for describing appearance features. The groups are illustrated in Fig. 4.

Therefore, the "Appearance" element contains descriptions of the avatar's different anatomic segments (size, form, and anthropometric parameters), as well as references to external resources containing the geometry and texture resources. While the first can be used to adapt the internal structure of the VW avatar by personalizing it, the second can be used to completely overwrite it. The second operation can be performed only when the format of the resource itself is also known by the importer/exporter, as is the case when

Fig. 3. Elements describing the human "Body" in Second Life (a), Entropia Universe (b), Nintendo Wii (c), Sony PlayStation (d), and HumanML (e).

using MPEG-4 3D Graphics. In addition, this element also contains characteristics of objects that are related to the avatar, such as clothes, shoes or accessories. The corresponding XML Schema of the "Appearance" element is presented in Fig. 5.

A simple example of usage of the "Appearance" element is provided below.

<Appearance>
  <Body>
    <BodyHeight value="165"/>
    <BodyFat value="15"/>
  </Body>
  <Head>
    <HeadShape value="oval"/>
    <EggHead value="true"/>
  </Head>
  <Clothes ID="1" Name="blouse_red"/>
  <AppearanceResources>
    <AvatarURL value="my_mesh.mp4"/>
  </AppearanceResources>
</Appearance>
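To illustrate how an importer might consume such a fragment, the following Python snippet (our own sketch, not normative MPEG-V code; the template dictionary and its keys are hypothetical) parses the "Appearance" element with the standard library and copies the recognized values onto a local template, leaving unknown features at their VW defaults:

import xml.etree.ElementTree as ET

APPEARANCE_XML = """
<Appearance>
  <Body>
    <BodyHeight value="165"/>
    <BodyFat value="15"/>
  </Body>
  <Head>
    <HeadShape value="oval"/>
  </Head>
</Appearance>
"""

def apply_appearance(xml_text, template):
    # Copy recognized appearance values onto a template dictionary.
    root = ET.fromstring(xml_text)
    for group in root:            # Body, Head, ...
        for feature in group:     # BodyHeight, HeadShape, ...
            value = feature.get("value")
            if value is not None and feature.tag in template:
                template[feature.tag] = value
    return template

# Hypothetical local template holding the VW defaults.
template = {"BodyHeight": "170", "BodyFat": "10", "HeadShape": "square"}
print(apply_appearance(APPEARANCE_XML, template))
# -> {'BodyHeight': '165', 'BodyFat': '15', 'HeadShape': 'oval'}

Features absent from the local template are simply ignored, which matches the personalization philosophy: the fragment adapts the template, it does not replace it.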

3.2.2. Animation elements

Several VWs and formalisms were analyzed for animation capabilities. Many of them provide a rich set of animation parameters, mainly for describing the emotional state of the avatar, related to facial (sadness, anger, happiness, etc.) and body (walk, dance, fight, etc.) movement.

For example, in the group "greetings", Second Life, Sony PlayStation and Microsoft Agent expose 5, 4, and 2 components, respectively (illustrated in Fig. 6).

Based on this analysis, we designed a model containing around 400 different animations, clustered in 12 groups. The groups are illustrated in Fig. 7.

Therefore, the "Animation" element contains a complete set of animations that the avatar is able to perform, grouped by semantic similarity. A special group contains


common actions, such as Drink, Eat, Talk, Read and Sit. As in the previous case, the animation's geometrical parameters are represented by external resources, MPEG-V providing only the label of the animation sequences. The corresponding XML Schema of the "Animation" element is provided in Fig. 8.

A simple example of use of the "Animation" element is provided below.

Fig. 5. "Appearance" type definition.

<Animation>
  <Greeting>
    <Salute>salut</Salute>
    <Cheer>cheer</Cheer>
  </Greeting>
  <Fighting>
    <shoot>pousse</shoot>
    <throw>throw</throw>
  </Fighting>
  <Common_Actions>
    <drink>boire</drink>
    <eat>manger</eat>
    <type>type</type>
    <write>ecrire</write>
  </Common_Actions>
  <AnimationResources>
    <AnimationURL>my_anim.mp4</AnimationURL>
  </AnimationResources>
</Animation>
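One simple way for a receiving world to exploit such a fragment (a sketch of ours, with no claim about how actual VW importers are written) is to flatten it into a lookup table from the standardized animation names to the world-specific labels:

import xml.etree.ElementTree as ET

ANIMATION_XML = """
<Animation>
  <Greeting>
    <Salute>salut</Salute>
    <Cheer>cheer</Cheer>
  </Greeting>
  <Common_Actions>
    <drink>boire</drink>
    <eat>manger</eat>
  </Common_Actions>
</Animation>
"""

def animation_labels(xml_text):
    # Map standardized animation names to world-specific clip labels.
    root = ET.fromstring(xml_text)
    table = {}
    for group in root:        # Greeting, Common_Actions, ...
        for anim in group:    # Salute, drink, ...
            table[anim.tag] = (anim.text or "").strip()
    return table

print(animation_labels(ANIMATION_XML))
# -> {'Salute': 'salut', 'Cheer': 'cheer', 'drink': 'boire', 'eat': 'manger'}

Since MPEG-V carries only the labels, the actual motion data stays with the world or in an external resource, such as the my_anim.mp4 reference above.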

3.2.3. Control elements

The main purpose of the "Control" element is to define the places on the human body where sensors can be installed for motion capture, or to identify places on the virtual body that can be controlled by external signals. These locations are grouped in body features (bones of the body skeleton) and face features (3D locations on the avatar's face).

Several VWs and formalisms were analyzed for virtual human control capabilities. All of them use a more or less complex version of the skeleton and some defined feature



Fig. 6. Animations in the "Greetings" group for Second Life (a), Sony PlayStation (b) and Microsoft Agent (c).

Fig. 7. Avatar animation.


points on the face. For example, for the upper part of the body, H-Anim, Second Life, IMVU, VHML/BAML and AML expose 12, 6, 10, 8 and 12 components, respectively (illustrated in Fig. 9).

Based on this analysis, we propose two types of control: body and face. The first consists in a skeleton that contains all the bones in a typical human body, a set of 101 bones grouped in 5 categories, as illustrated in Fig. 10 and detailed in Fig. 11 for the category "upper body".

The corresponding "Control" element schema is illustrated in Fig. 12.

A simple example of using the "Control" element is provided below.

<Control>
  <BodyFeaturesControl>
    <UpperBodyBones>
      <LClavicle>my_LClavicle</LClavicle>
      <RClavicle>my_RClavicle</RClavicle>
    </UpperBodyBones>
    <DownBodyBones>
      <LFemur>my_l_femur</LFemur>
      <LTibia>my_l_tibia</LTibia>
      <RFemur>my_r_femur</RFemur>
      <RTibia>my_r_tibia</RTibia>
    </DownBodyBones>
  </BodyFeaturesControl>
  <FaceFeaturesControl>
    <HeadOutline>
      <Left X="0.23" Y="1.25" Z="7.26"/>
      <Right X="0.25" Y="1.25" Z="7.21"/>
      <Top X="2.5" Y="3.1" Z="4.2"/>
      <Bottom X="0.2" Y="3.1" Z="4.1"/>
    </HeadOutline>
  </FaceFeaturesControl>
</Control>
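As an illustration of how the two control groups could be read back (again a sketch under our own naming assumptions, not normative code), the snippet below separates bone bindings from 3D face feature points:

import xml.etree.ElementTree as ET

CONTROL_XML = """
<Control>
  <BodyFeaturesControl>
    <UpperBodyBones>
      <LClavicle>my_LClavicle</LClavicle>
      <RClavicle>my_RClavicle</RClavicle>
    </UpperBodyBones>
  </BodyFeaturesControl>
  <FaceFeaturesControl>
    <HeadOutline>
      <Left X="0.23" Y="1.25" Z="7.26"/>
    </HeadOutline>
  </FaceFeaturesControl>
</Control>
"""

def read_control(xml_text):
    # Collect bone bindings (leaf elements with text) and face points
    # (leaf elements carrying X/Y/Z attributes).
    root = ET.fromstring(xml_text)
    bones, points = {}, {}
    for elem in root.find("BodyFeaturesControl").iter():
        if len(elem) == 0 and elem.text and elem.text.strip():
            bones[elem.tag] = elem.text.strip()
    for elem in root.find("FaceFeaturesControl").iter():
        if {"X", "Y", "Z"} <= elem.attrib.keys():
            points[elem.tag] = tuple(float(elem.get(k)) for k in "XYZ")
    return bones, points

print(read_control(CONTROL_XML))
# -> ({'LClavicle': 'my_LClavicle', 'RClavicle': 'my_RClavicle'},
#     {'Left': (0.23, 1.25, 7.26)})

The bone dictionary is exactly the kind of correspondence table exploited in Section 4.3 for animation mapping between worlds.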

Let us note that the proposed formalism for appearance, animation and control was accepted as the standard schema for avatar characteristics in MPEG-V Part 4. During the standardization process, this schema was completed by three additional elements: "CommunicationSkills" and "Personality", originally proposed by Oyarzun et al. [25], and "HapticProperties", originally proposed by Cha et al. [26]. The definition of these additional elements is the following:

- The "CommunicationSkills" element contains a set of descriptors providing information on the different modalities in which an avatar is able to communicate.
- The "Personality" element contains a set of descriptors defining the personality of the avatar.
- The "HapticProperties" element contains the high-level description of the haptic properties.

A detailed explanation of these elements, including the schema definition, is provided in [27]. The overall XML Schema defining the "Avatar" element in MPEG-V, grouping together the six categories introduced above, is illustrated in Fig. 13.

In the following section, we provide several examples of practical usage of the proposed schema, as specified in MPEG-V, as an intermediate layer for ensuring interoperability between different VWs. There are two scenarios for using the MPEG-V avatar framework. In the first one, only avatar characteristics are shared between VWs; in the second one, MPEG-V is completed by the low-level representation of the avatar (such as the one defined in MPEG-4), allowing the export of full avatars and their visualization on external players.


Fig. 8. Animation element compositing elements.


4. Implementation of the interoperability framework

The vast majority of VWs and online games allow designing the player's avatar with a built-in editor. For this purpose, they offer one or several avatar templates that the user can modify. The dedicated built-in tools, containing a predefined set of default elements, can be manipulated in order to personalize the avatar. The set usually includes different head and body shapes, skin color, and the position and design of the ears, eyes, nose and mouth. Additionally, some of them offer assets for dressing up the avatar: clothes, shoes, accessories, etc.

Building an avatar from a template simplifies the design phase significantly, making it possible for unskilled users to create avatars. However, this task remains a time-consuming process, as designing an avatar sometimes takes days, and usually it is a continuous process: the avatar can be upgraded at any time. Not having the possibility to reuse the designed avatar in other VWs obliges users to personalize their avatars in each VW separately. The proposed interoperability framework addresses this issue: the set of metadata characterizing the avatar can be used to personalize a pre-existing template.

4.1. Transferring ‘‘characteristics’’

In order to validate the proposed framework for avatar interoperability, we performed the following scenario. The user creates one avatar in any VW that exposes its resources and characteristics. An example of avatar appearance modeled in the VW is presented in Fig. 14.

By using the proposed schema, it is possible to map the avatar characteristics into standardized elements. These elements are imported into a second VW which "understands" this XML, therefore applying the personalization to its own

avatar template. The mapping can be considered as an automation of the process of redesigning the avatar in the second VW. It is also possible that some characteristics will be lost or will not look exactly the same, due to differences in templates; however, many similarities will occur. An example of the structure representing the current avatar, mapped according to the schema, is provided below.

<Avatar xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="location of the XSD schema">
  <AvatarAppearance>
    <Body>
      <BodyHeight>1.65</BodyHeight>
      <BodyThickness>1.23</BodyThickness>
      <BodyFat>high</BodyFat>
      <TorsoMuscles>high</TorsoMuscles>
      <NeckThickness>1.23</NeckThickness>
      <NeckLength>1.23</NeckLength>
      <Shoulders>1.23</Shoulders>
      ...

A second capability exposed by the proposed schema is related to how animation can be triggered in different VWs. In the case where only characteristics are transferred, each VW uses its own built-in resources for animations. If one's avatar "runs" in VW1, simply sending a fragment of XML, such as the one exemplified below, will trigger "run" on the avatar in VW2. The two animations may be the same or slightly different; still, the second world can understand the transmitted XML and perform the exact action.

<Walk>
  <DefaultRun>
    <Url>run</Url>
  </DefaultRun>
</Walk>
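On the receiving side, the dispatch can be as simple as matching the standardized action name against the world's own clip table. The sketch below is purely illustrative; the clip names and the table are invented for the example:

import xml.etree.ElementTree as ET

# Hypothetical clip table of the receiving world (VW2).
LOCAL_CLIPS = {"DefaultRun": "vw2_run_cycle", "DefaultWalk": "vw2_walk_cycle"}

def handle_trigger(xml_fragment):
    # Resolve a standardized trigger fragment to a local animation clip.
    root = ET.fromstring(xml_fragment)   # e.g. <Walk>...</Walk>
    for action in root:                  # e.g. <DefaultRun>
        clip = LOCAL_CLIPS.get(action.tag)
        if clip:
            print(f"playing local clip '{clip}' for action '{action.tag}'")
        else:
            print(f"no local equivalent for '{action.tag}', using a default pose")

handle_trigger("<Walk><DefaultRun><Url>run</Url></DefaultRun></Walk>")
# -> playing local clip 'vw2_run_cycle' for action 'DefaultRun'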




4.2. Teleporting avatars

Fig. 9. Defined bones for the upper part of the body: H-Anim (a), Second Life (b), IMVU (c), VHML/BAML (d) and AML (e).

Although the proposed schema is generally used as a container for avatar characteristics, it can refer to external resources (that can be formalized in MPEG-4 Part 11 and

Part 16) for avatar geometry, appearance and animation. Therefore, completely and exactly the same avatar can be "teleported" through VWs, on the condition of implementing exporters and importers compliant with MPEG-V and MPEG-4. The example provided below contains a


fragment of XML where the mesh is specified to be used from an externally-defined resource, called "myavatar_mesh.mp4". The element <AppearanceResources> from the schema is reserved for that purpose.

<AppearanceResources>
  <AvatarURL>myavatar_mesh.mp4</AvatarURL>
</AppearanceResources>

Fig. 10. Avatar skeleton.

Fig. 11. Human upper body bones.

Fig. 12. Control element compositing elements.

In the same manner, animation can be loaded from an external resource. The element <AnimationResources> is used as a holder of the animation resource, as exemplified below.

<AnimationResources>
  <AnimationURL>myanimation.mp4</AnimationURL>
</AnimationResources>

However, usual implementations of VWs are rarely open: they do not allow importing assets created outside, but usually enable exporting them. Therefore, a possible usage is to visualize the avatar in external players. Since MPEG-V can refer to MPEG-4 resources for avatar geometry, appearance and animation, an MPEG-4 player can be used for visualization. Fig. 15 shows an avatar exported from a VW and visualized in an MPEG-4 player.

4.3. Animation mapping between VWs

Another example of mapping features between VWs uses the element "Control", as illustrated in Fig. 16. Here, it is possible to compute an animation track even if the corresponding animation, indicated by its standardized name, is not originally available in the virtual world.

Let us consider the following situation: in VW1, the template contains the definition of the skeleton and the "walk" animation is defined; in VW2, the template also contains the definition of the skeleton, but no "walk" animation. In general, the templates may be different in different VWs; therefore, a priori, the skeleton definitions are different. For example, let us suppose that a skeleton in VW1 is represented, for simplicity, only by three bones, Bone1, Bone2 and Bone3, and the skeleton in VW2 is represented by BoneA, BoneB and BoneC. By using the "Control" element, it is possible to create the correspondence between the elements of the skeletons, as presented in Table 3. Here, the elements LClavicle, RClavicle and LFemur from the MPEG-V schema are mapped to Bone1, Bone2 and Bone3 respectively in VW1, and to BoneA, BoneB and BoneC respectively in VW2. Therefore, the correspondence between bones in VW1 and VW2 is established.

Therefore, an animation track referred to by the "Animation" element of VW1 can be mapped to an animation track in VW2 simply by updating the bone identifiers and re-using the geometric transformations.
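In code, this retargeting reduces to renaming the bone identifiers of the track through the MPEG-V correspondence while keeping the geometric transformations untouched. The sketch below uses the toy bone names of this example and an invented per-frame track structure; it is an illustration of the mapping, not a reference implementation:

# Correspondences established through the MPEG-V "Control" element.
VW1_TO_MPEGV = {"Bone1": "LClavicle", "Bone2": "RClavicle", "Bone3": "LFemur"}
MPEGV_TO_VW2 = {"LClavicle": "BoneA", "RClavicle": "BoneB", "LFemur": "BoneC"}

def retarget(track):
    # Rename bone identifiers VW1 -> MPEG-V -> VW2; transforms are reused.
    out = {}
    for bone, transforms in track.items():
        standard = VW1_TO_MPEGV.get(bone)
        target = MPEGV_TO_VW2.get(standard) if standard else None
        if target:
            out[target] = transforms
    return out

# Toy "walk" track from VW1: bone -> per-frame rotation angles (degrees).
walk_vw1 = {"Bone1": [0, 5, 10], "Bone2": [0, -5, -10], "Bone3": [10, 0, -10]}
print(retarget(walk_vw1))
# -> {'BoneA': [0, 5, 10], 'BoneB': [0, -5, -10], 'BoneC': [10, 0, -10]}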



Fig. 13. "Avatar" element compositing elements and attributes.

Fig. 14. The avatar in the original VW [28].

Fig. 15. Avatar from VW in MPEG-4 player on desktop (a) and mobile phone (b).

Fig. 16. Motion retargeting.


<AvatarAnimation>
  <AnimationResources>
    <Description>VW1 Avatar's Walk</Description>
    <Uri>walk.mp4</Uri>
  </AnimationResources>
</AvatarAnimation>


Table 3. Example of bone mapping.

VW1      MPEG-V                            VW2
         <Control>
          <BodyFeaturesControl>
           <TopBodyBones>
Bone1       <LClavicle></LClavicle>        BoneA
Bone2       <RClavicle></RClavicle>        BoneB
           </TopBodyBones>
           <DownBodyBones>
Bone3       <LFemur></LFemur>              BoneC
           </DownBodyBones>
          </BodyFeaturesControl>
         </Control>


5. Conclusion

This paper introduces the avatar data model defined in MPEG-V, which enables transferring the characteristics of an avatar from one virtual world to another. It is based on the definition of a set of properties that allows the personalization of templates defined within the VW. "Teleporting" avatars completely (full geometry, appearance, animation, and characteristics) from one VW to another raises technical and non-technical issues. The first refers to the need to fine-tune the mesh/texture/animation resolution to ensure acceptable rendering performance in the targeted VW; the second is related to the reticence of VW creators to let the VW be invaded by objects and avatars that they do not control entirely. The proposed model is a compromise between the uniqueness of the avatar across different virtual worlds and the capability of the virtual world to import avatars defined by external tools.

References

[1] Social games, www.cellufun.com.
[2] The play anywhere game cloud, mobile.itsmy.com.
[3] Second Life on the iPhone, sparkle.genkii.com.
[4] X3D, H-Anim and VRML specifications, www.web3d.org/x3d/specifications.
[5] ISO/IEC 14772-1, The Virtual Reality Modeling Language, 1997, www.web3d.org/x3d/specifications/vrml.
[6] COLLADA, 3D Asset Exchange Schema, www.khronos.org/collada.
[7] ISO/IEC 19774:2006, Information technology, Computer graphics and image processing, Humanoid Animation (H-Anim), h-anim.org.
[8] ISO/IEC 14496-1, MPEG-4 Part 1: Systems, www.iso.ch.
[9] ISO/IEC 14496-16, MPEG-4 Part 16: Animation Framework eXtension (AFX), www.iso.ch.
[10] R. Brooks, K. Cagle, The Web Services Component Model and HumanML, Technical Report, OASIS/HumanML Technical Committee, 2002.
[11] H. Vilhjalmsson, N. Cantelmo, J. Cassell, N.E. Chafai, M. Kipp, S. Kopp, M. Mancini, S. Marsella, A.N. Pelachaud, Z.M. Ruttkay, K. Thorisson, H.V. Werf, The behavior markup language: recent developments and challenges, in: C. Pelachaud, J.-C. Martin, E. Andre, G. Collet, K. Karpouzis, D. Pele (Eds.), Proceedings of the 7th International Conference on Intelligent Virtual Agents, Lecture Notes in Artificial Intelligence, Springer, Berlin, 2007, pp. 90–111.
[12] T. Tsutsui, S. Saeyor, M. Ishizuka, MPML: a multimodal presentation markup language with character agent control functions, in: Proceedings (CD-ROM) of the WebNet 2000 World Conference on the WWW and Internet, San Antonio, Texas, USA, 2000.
[13] Y. Arafa, A. Mamdani, Scripting embodied agents behavior with CML: character markup language, in: IUI '03: Proceedings of the Eighth International Conference on Intelligent User Interfaces, ACM, New York, NY, USA, 2003, pp. 313–316.
[14] B. Jovanova, M. Preda, F. Preteux, The role of interoperability in virtual worlds, analysis of the specific cases of avatars, Journal of Virtual Worlds Research, SI on Technology, Economy, and Standards 2 (3) (2009).
[15] B. Jovanova, M. Preda, Avatars interoperability in virtual worlds, in: Proceedings of the MMSP'10 IEEE International Workshop on Multimedia Signal Processing, Saint-Malo, France, October 4–6, 2010, pp. 263–268.
[16] Make friends, ..., www.habbo.com.
[17] Free virtual world to walk ..., www.yoowalk.com.
[18] China's pioneer of the 3D Virtual World Platform, www.hipihi.com/en.
[19] J. Essid, In a strange land, available from: iggyo.blogspot.com/2009/03/qwaq-and-suits-should-linden-lab-worry.html.
[20] Official PlayStation web site, www.playstation.com.
[21] 3D Virtual Worlds, build your own 3D avatar world in minutes, www.activeworlds.com.
[22] Virtual Worlds, Avatars, free 3D chat, online chatting, secondlife.com.
[23] There, the online virtual world that is your ..., www.there.com.
[24] The SIMS game portal, www.thesims.com.
[25] D. Oyarzun, A. Ortiz, M. del Puy Carretero, J. Gelissen, A. Garcia-Alonso, Y. Sivan, ADML: a framework for representing inhabitants in 3D virtual worlds, in: S.N. Spencer (Ed.), Proceedings of the 14th International Conference on 3D Web Technology (Web3D '09), ACM, New York, NY, USA, 2009, pp. 83–90.
[26] J. Cha, Y. Seo, Y. Kim, J. Ryu, An authoring/editing framework for haptic broadcasting: passive haptic interactions using MPEG-4 BIFS, in: Proceedings of the Second Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems (WHC '07), IEEE Computer Society, Washington, DC, USA, 2007, pp. 274–279.
[27] M. Preda (Ed.), Text of ISO/IEC CD 23005-4 Avatar Information, w10786, 89th MPEG Meeting, London, 2009.
[28] The Orange Box, Half-Life 2, half-life2.com.