
SLTAT 2015 programme

Dates: April 9–10, 2015

Venue: ISCC, 20 rue Berbier du Mets, Paris, France

Day 1

Registration and welcome, 9:15 at ISCC

Coffee & biscuits served, opening at 9:45

Morning session

10:00 Invited talk on necessities, feasibility, untackled problems (Interpretis & Orange)

11:15 Talks on systems to be presented in afternoon session (motivations, features, science)

Lunch, 12:30 in the square René Le Gall

Coffee served at 13:30 in ISCC

Afternoon session, 14:00 at ISCC

System demos, coffee & biscuits during break

Evening event, 17:30 at ELDA

1. Presentation of a deaf University project by Mains Diamant

2. Aperitif served

Dinner, 20:00 at restaurant (tbd)

Day 2

Morning session, 9:30 with a 20-min coffee break

User feedback and discussion, from video interviews of the previous day

Lunch, 12:00 at the square René Le Gall

Coffee served at 13:00 in ISCC

Afternoon session, 13:30 with a 20-min coffee break

Plenary scientific discussions chaired by A. Heloir

Closing remarks (mot de la fin)

16:30 Concluding talk

17:00 Closing


Map and locations

1. ISCC, for morning and afternoon sessions: 20 rue Berbier du Mets

2. Square René Le Gall, for lunch breaks

3. ELDA, for the social event: 9 rue des Cordelières


List of systems presented

1. ProDeaf – 3D avatar based app for translation from Portuguese to Brazilian Sign Language (Libras)

   M. Amorim, R. Kimura, J. Fernandes

2. A grammar knowledge based MT system for the language pair: Greek-GSL

   E. Efthimiou, S. E. Fotinea, K. Marmarellis, T. Goulas, O. Giannoutsou, D. Kouremenos

3. Exploring novel interaction methods for authoring sign language animations

   A. Heloir, F. Nunnari

4. Augmenting EMBR virtual human animation system with MPEG-4 controls for producing ASL facial expressions

   M. Huenerfauth, H. Kacorri

5. Learning sign language in a playful way with SiGame – The App

   G. Tschare

6. Automatic translation system to JSL (Japanese Sign Language) about weather information

   S. Umeda, M. Azuma, T. Uchida, T. Miyazaki, N. Kato, S. Inoue, N. Hiruma

7. Inferring biomechanical kinematics from linguistic data: a case study for role shift

   R. Wolfe, J. C. McDonald, R. Moncrief, S. Baowidan, M. Stumbo


ProDeaf – 3D avatar based app for translation from Portuguese to Brazilian Sign Language (Libras)

AMORIM, Marcelo*; KIMURA, Renato; FERNANDES, Juliana

(*) [email protected]

Motivation

Worldwide, many people have some degree of hearing impairment. In Brazil alone, according to IBGE, the Brazilian Institute of Geography and Statistics, there were over ten million hearing-impaired people in 2010; despite that, only a few hearing people are fluent in sign language.

To address this problem, an initiative was created at the Federal University of Pernambuco, Brazil. The resulting engine, ProDeaf, is a platform for communication between deaf and hearing people, born from the need for communication among hearing and deaf students.

Features

The ProDeaf engine is used in many contexts, and several of its tools are free for use by people who rely on sign language. The platform includes:

I. An app for mobile devices, which translates text/speech from Portuguese to Libras and includes a bilingual dictionary;

II. A website version, similar to the mobile app, with a dictionary and a collaborative tool for registering new signs;

III. A translator for websites, which makes textual content accessible.

These solutions follow different translation procedures, which may be automated or assisted, but they are always preceded by professional translation work, with its inherent linguistic and grammatical considerations. Both automatic and assisted translation are based on a methodology designed and tested by a multidisciplinary team of programmers, linguists, designers, translators and deaf people. This team is also responsible for researching and curating the software's database and processes, as well as its linguistic and grammatical challenges, since sign languages are living and highly organic. Thanks to this work, the platform currently has over 4,000 signs.


For Sign Language Translation and Avatar Technology 2015, we will provide the free app, which currently translates Portuguese to Libras. We will also show a functional demo version of ProDeaf translating English to International Sign (IS).

Science

Through complex computer algorithms, it is possible to translate text and speech from Portuguese to Libras (Brazilian Sign Language). The visual-spatial representation is performed by 3D humanoid avatars, which are slightly distorted in their visual design in order to highlight communicative features of sign languages, such as facial expressions and handshapes.

For the research needed for the translation process, a multidisciplinary team was assembled, including deaf people, designers, programmers, Libras specialists and linguists, among others. The team focused on breaking the communication barrier between deaf and hearing people, allowing people who do not know Libras to start learning the language and to communicate with deaf people.

This required, among other things, a commissioned survey on sign language segregation and research on natural 3D human movement.

ProDeaf contributes to the state of the art in 3D technologies for sign language thanks to its malleable structure. The signing avatar is fully articulated, and translations are done in real time and can be automated. This is made possible by the avatar's ability to interpret XML files containing configuration coordinates, producing the Libras sign according to the language's parameters, namely: hand configuration, orientation, point of articulation, movement, and non-manual features (such as facial expression).
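To make the XML-driven mechanism above concrete, here is a minimal, purely illustrative Python sketch of how an engine might consume such a sign description. The element and attribute names are invented for this example; ProDeaf's actual schema is not documented in this abstract.

    # Minimal, purely illustrative sketch of how an engine might consume an XML
    # sign description of the kind outlined above. The element and attribute
    # names are invented; ProDeaf's actual schema is not documented here.
    import xml.etree.ElementTree as ET

    SIGN_XML = """
    <sign gloss="HOUSE">
      <hand side="right" shape="B" orientation="palm-in"/>
      <hand side="left" shape="B" orientation="palm-in"/>
      <location value="neutral-space"/>
      <movement type="converge" duration_ms="400"/>
      <nonmanual facial="neutral"/>
    </sign>
    """

    def load_sign(xml_text):
        """Turn one XML sign description into a plain dictionary of parameters."""
        root = ET.fromstring(xml_text)
        return {
            "gloss": root.get("gloss"),
            "hands": [h.attrib for h in root.findall("hand")],
            "location": root.find("location").get("value"),
            "movement": root.find("movement").attrib,
            "nonmanual": root.find("nonmanual").attrib,
        }

    if __name__ == "__main__":
        print(load_sign(SIGN_XML))

Keeping the sign description declarative in this way is what allows the presentation layer (the avatar) to stay separate from the structural layer (the sign database), as described next.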

Separating the presentation layer from the structural layer of the signs makes it possible to expand the sign database significantly and to produce much lighter visual representations, allowing the platform to be used in a variety of scenarios.

Keywords: translation, Libras, sign language, 3D avatar.

Page 6: SLTAT 2015 programme - LIMSIstart to learn the language and communicate with deaf people. For this it was necessary a survey commissioned about sign language segregation and researches

Eleni Efthimiou1 <[email protected]>, Stavroula-Evita Fotinea1,Konstantinos Marmarellis1,2, Theodor Goulas1, Olga Giannoutsou1, Dimitris Kouremenos2

1 ILSP-Institute for Language and Speech Processing / ATHENA RC2 Electrical and Computer Engineering Dpt, National Technical University of Athens,

Athens, Greece

System demonstration proposal

A grammar knowledge based MT system for the language pair: Greek-GSL

We intend to demonstrate the current state of performance of an MT system for automatic translation from written Greek to Greek Sign Language (GSL). The system receives as input parsed Greek sentences, chunked according to lexical and structural information available for the Greek language, and produces their translation output in the form of structured GSL phrases, either as glossed sequences of structured signs or as HamNoSys coded structured phrases resulting from various matching rules between the two languages. For synthesis at the output side, an electronic grammar of GSL is exploited in combination with appropriate information from a lexicon database. The translation output feeds a SiGML driven signing avatar, thus enabling visualization of the produced GSL phrases.

Fig. 1: Basic MT system architecture

The MT system is open source and incorporates most of the widely accepted standards. It is programmed in Java to allow for quick and efficient design and development and compatibility with all system platforms, while XML technology has been utilised to provide structured documents in a reusable format. The system exploits two distinct modules and four language resource repositories.


The two modules are:

i) A robust parser of Modern Greek that provides structural chunks of written utterances decorated with part-of-speech (POS) tags, which maintain the major grammar properties of the input sentences and serve as input to the transfer module.

ii) A transfer module which performs matching of grammar structures and lexical items between the languages of the translation pair.

The resource repositories exploited in the MT process are the following:

a) A morphological computational lexicon of Modern Greek,

b) A robust analysis grammar component of Modern Greek,

c) A bilingual Modern Greek-GSL lexicon, and

d) A GSL grammar component which provides a formal representation of the core phenomena of GSL grammar.

The system can handle affirmative sentences with stative and motion predicates, placement of arguments in the signing space, variation due to classifier use, lexicalized/prosodic and structural negation and interrogation, as well as the mechanisms GSL exploits for the declaration of Tense/Aspect values. The system's performance is rather robust with respect to its core grammar. This is verified by extensive testing with test suites of sentences structured to test different levels of complexity and rule combinations. In parallel, the system is being constantly enriched with new features which enable extension of coverage with respect to the grammar phenomena handled. However, in the current state of the system, which is more mature than the initial one, the scientific goal is to place emphasis on optimization of the system's grammar writing, targeting more efficient rule definitions that allow for wider generalizations and thus reduce the need for rules with hapax application.
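To give a flavour of the transfer step in this pipeline, the following toy Python sketch (not the authors' code) reorders chunked, POS-tagged Greek input with a single invented structural rule and maps lemmas through a miniature bilingual lexicon to GSL glosses; the real system uses a full analysis grammar, a GSL grammar component and HamNoSys output.

    # Illustrative sketch (not the authors' code) of a transfer-based step of the
    # kind described above: chunked, POS-tagged Greek input is reordered by a toy
    # structural rule and mapped through a miniature bilingual lexicon to GSL
    # glosses. All rules, entries and glosses here are invented for illustration.

    BILINGUAL_LEXICON = {"εγώ": "IX-1", "θέλω": "WANT", "καφές": "COFFEE"}

    def transfer_order(chunks):
        """Toy structural transfer rule: reorder an SVO chunk sequence to SOV."""
        roles = [c["role"] for c in chunks]
        if roles == ["subj", "verb", "obj"]:
            chunks = [chunks[0], chunks[2], chunks[1]]
        return chunks

    def to_gsl_glosses(chunks):
        """Map each chunk's lemma to a GSL gloss via the bilingual lexicon."""
        return [BILINGUAL_LEXICON.get(c["lemma"], c["lemma"].upper())
                for c in transfer_order(chunks)]

    if __name__ == "__main__":
        parsed_greek = [  # hypothetical output of the Greek parser, already chunked
            {"lemma": "εγώ", "role": "subj", "pos": "PRON"},
            {"lemma": "θέλω", "role": "verb", "pos": "VERB"},
            {"lemma": "καφές", "role": "obj", "pos": "NOUN"},
        ]
        print(to_gsl_glosses(parsed_greek))  # ['IX-1', 'COFFEE', 'WANT']

In the real system the glossed (or HamNoSys-coded) phrase would then feed the SiGML-driven signing avatar described above.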

Fig. 2: System input and output screens displaying information from rule application and lexicon check

Indicative group references

Efthimiou, E., Fotinea, S. E. and Sapountzaki, G. 2006. Processing linguistic data for GSL structure representation. In Proceedings of the Workshop on the Representation and Processing of Sign Languages: Lexicographic matters and didactic scenarios, Satellite Workshop to LREC-2006 Conference, 49-54.

Kouremenos, D., Fotinea, S. E., Efthimiou, E., and Ntalianis, K. 2010. A Prototype Greek Text to Greek Sign Language (GSL) Conversion System. Behaviour & Information Technology (TBIT), 29, 5, 467-481, DOI: 10.1080/01449290903420192.

Efthimiou, E., Fotinea, S.-E., Goulas, T., Dimou, A.-L., Kouremenos, D. 2014. From Grammar Based MT to Post-processed SL Representations. Universal Access in the Information Society (UAIS). Springer (to appear).


Exploring novel interaction methods for authoring sign language animations

System Presentation

Alexis Heloir*

LAMIH-UMR CNRS 8201, Valenciennes, France
SLSI group, DFKI-MMCI, Saarbrücken, Germany

[email protected]

Fabrizio Nunnari

SLSI group, DFKI-MMCI, Saarbrücken, Germany

[email protected]

1. INTRODUCTION

We are developing an online collaborative framework allowing Deaf individuals to author intelligible signs using a dedicated authoring interface controlling the animation of a 3D avatar. The system that we present mainly focuses on a novel architecture for authoring 3D animation of human figures. The user interface (UI) is assisted by novel gesture-based input devices. We claim that this tool can benefit not only the Deaf but also linguists, by providing them with a novel kind of material consisting of intelligible sign language animation together with a fine-grained log of the user's edit actions.

2. MOTIVATION

Current signing avatar technology either relies on a symbolic representation which encodes the avatar's utterance or uses a concatenative approach consisting of stitching together captured or manually authored Sign Language (SL) clips. In both cases, enriching the signing avatar system requires experts, either in the symbolic representation language or in motion capture or 3D animation. We believe that this is a big challenge to the adoption of signing avatar technology, since such experts are hard to find and expensive to train. We are tackling this challenge by proposing a dedicated software tool for authoring the animation of 3D avatars. We believe that this attempt could not only increase the adoption of SL avatar technology but also be beneficial to research in SL linguistics.

For the Deaf1, such a tool, when online, has the potential to enable multiple linguistic communities to illustrate and share new concepts, invent new signs and develop dictionaries. Eventually, it might also be an alternative to video recording, unlocking the possibility for deaf individuals to express their opinion anonymously, on the internet, using their primary language. Such a tool would also put SL studies back into the hands of the Deaf by allowing them to develop large corpora of animation data. For linguists, large corpora of intelligible animation data would be very valuable research material. Firstly, linguists would have access to concise multimodal 3D animation that would be much more “readable” than traditional motion capture, since it would mostly convey the “essential” part of the animation that makes the sequence understandable, whereas the essence of the animation tends to be buried under the dense stream of data generated by motion-capture systems. Secondly, provided that the sequence of the user's edit actions is logged, linguists would also have access to the actual authoring strategy adopted by the SL author. This could shed new light on the user's actual intent.

1 We follow the convention of writing Deaf with a capitalized “D” to refer to members of the Deaf community [3] who use sign language as their preferred language, whereas deaf refers to the audiological condition of not hearing.

Digital character animation is a highly cross-disciplinary domain that spans acting, psychology, movie-making, computer animation, and programming. Not surprisingly, learning the art of traditional animation requires time, effort and dedication. However, recent consumer-range technology such as the one presented by Sanna et al. [4] has proved capable of enabling inexperienced users to author animations of human-like bodies and interactively control physical or digital puppets. We are aiming at a similar goal for sign language animation. Our system is innovative because not only does it allow novice users to naturally edit complex animations using natural input devices like the Kinect2 and the Leap Motion3, but it also allows them to switch seamlessly between traditional space-time constraint editing and interactive performance capture recording. The use of low-cost devices, available off-the-shelf, allows and encourages the adoption of the system by a large number of users.

In the following, we briefly describe the system which will be demonstrated during the SLTAT workshop. It consists of a 3D animation authoring platform that supports gesture input and is capable of recording the user's facial expressions, handshapes and arm movements.

2 http://www.microsoft.com/en-us/kinectforwindows/ - 15 Jan. 2015
3 https://www.leapmotion.com/ - 15 Jan. 2015


Figure 1: The authoring pipeline overview. (Recoverable labels: User, Kinect, Leap Motion, Performance Capture, Data Simplification, Manual Edit, Control Rig, Animation; straight-ahead and pose-to-pose paths.)

3. PROPOSED DESIGN

The contribution of the architecture we propose consists of endowing the user with the capability to seamlessly record and edit character animation using both pose-to-pose animation and performance capture. The framework is depicted in Fig. 1.

In performance capture mode, the animator drives the animation like a puppeteer. The motion of both the hands and the face of the animator is tracked in real time, and this live performance drives the animation of the avatar, which is recorded at a rate of 25 frames per second. The facial animation is recorded by the Kinect. The motion of hands and fingers is recorded by the Leap Motion. The frame density is later reduced in order to allow a later manual pose-to-pose edit.
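The frame-density reduction can be pictured with a small sketch; the authors do not specify their algorithm, so the change-threshold decimation below is only one plausible reading, with an assumed pose representation and threshold.

    # The abstract does not say how the frame density is reduced; this minimal
    # sketch assumes a simple change-threshold decimation of the 25 fps capture,
    # keeping a frame only when it differs noticeably from the last retained key
    # pose. The pose representation and threshold are assumptions for illustration.

    def decimate(frames, threshold=0.05):
        """Return indices of retained key frames from a dense capture.

        frames: list of poses, each pose a list of joint values (e.g. angles)."""
        if not frames:
            return []
        kept = [0]
        for i in range(1, len(frames)):
            last = frames[kept[-1]]
            change = max(abs(a - b) for a, b in zip(frames[i], last))
            if change > threshold:
                kept.append(i)
        if kept[-1] != len(frames) - 1:
            kept.append(len(frames) - 1)   # always keep the final pose
        return kept

    if __name__ == "__main__":
        capture = [[0.0, 0.0], [0.01, 0.0], [0.2, 0.1], [0.21, 0.1], [0.5, 0.3]]
        print(decimate(capture))  # [0, 2, 4]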

In pose-to-pose edit mode, the user controls one hand at a time using the Leap Motion. In this mode, the user's hand posture is not directly followed by the system; rather, the system integrates sequences of consecutive relative edits that consist of small grab-and-release actions on one of the character's hands. This edit mode permits much finer and more precise positioning of the character's body parts (hands, head, eyes, torso, hips). The user thus uses her or his own body to apply offsets to the key poses resulting from the Data Simplification phase. The idea is to keep a correspondence between the body parts of the author and those of the virtual character. However, in contrast with the direct control of performance capture, the author performs a "relative" control on the character's current posture. For example, the movement of the author's hand from an arbitrary position is used to apply an offset to the position of one of the virtual character's hands. We can call this a form of body-coincident puppeteering.
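The relative control can be made concrete with a small sketch: the character's hand is moved by the offset the author's hand accumulates between a grab and the following release, regardless of where the author's hand started. The class, parameter names and scale factor below are invented for illustration.

    # Minimal sketch of the relative grab-and-release edit described above: while a
    # grab is active, the displacement of the author's hand from where the grab
    # started is added (optionally scaled) to the character hand's position. The
    # class, parameter names and the scale factor are invented for illustration.

    class RelativeHandEditor:
        def __init__(self, character_hand_pos, scale=1.0):
            self.char_pos = list(character_hand_pos)   # avatar hand position (3D)
            self.scale = scale
            self.grab_origin = None                    # author hand position at grab time

        def grab(self, author_hand_pos):
            self.grab_origin = list(author_hand_pos)

        def move(self, author_hand_pos):
            if self.grab_origin is None:
                return self.char_pos                   # not grabbing: input ignored
            offset = [(a - o) * self.scale
                      for a, o in zip(author_hand_pos, self.grab_origin)]
            return [c + d for c, d in zip(self.char_pos, offset)]

        def release(self, author_hand_pos):
            self.char_pos = self.move(author_hand_pos)  # commit the accumulated offset
            self.grab_origin = None
            return self.char_pos

    if __name__ == "__main__":
        editor = RelativeHandEditor([0.3, 1.2, 0.4])
        editor.grab([0.0, 0.0, 0.0])                 # the author's hand may start anywhere
        print(editor.release([0.1, 0.0, -0.05]))     # roughly [0.4, 1.2, 0.35]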

Hence, the user does not see the digital character as a mirrored interpretation of himself: the screen is no longer a mirror, but rather a virtual window onto the 3D editing space. Concerning camera control, we aim at an authoring setup where the user focuses only on character posture editing without needing to control the camera position; past studies have already demonstrated that several solutions can be applied to successfully enable depth perception in character animation, thus eliminating the need to rotate the camera viewpoint [1].

4. RESEARCH QUESTION

This work is an intermediate step in a global project aiming at developing a crowd-sourced sign-language animation editor for the many Deaf communities. The prototypes we have implemented so far only run on a single computer and cannot be accessed by multiple users over the internet. However, using these prototypes, we could conduct two user studies which let us assess and validate a number of interaction metaphors that will be implemented in an online platform. The online platform will leverage modern CSS3/HTML5 user interface technologies as well as the emerging XML3D4 technology [2]. We are currently working in close collaboration with the XML3D developers in order to guarantee that all the features required by our animation platform can be implemented in XML3D.

5. REFERENCES

[1] M. Kipp and Q. Nguyen. Multitouch puppetry: creating coordinated 3D motion for an articulated arm. Page 147. ACM Press, 2010.

[2] F. Klein, K. Sons, D. Rubinstein, and P. Slusallek. XML3D and Xflow: Combining declarative 3D for the web with generic data flows. IEEE Computer Graphics & Applications (CG&A), 33(5):38–47, 2013.

[3] C. Padden and T. Humphries. Deaf in America: voices from a culture. Harvard University Press, Cambridge, Mass., 1988.

[4] A. Sanna, F. Lamberti, G. Paravati, and F. D. Rocha. A Kinect-based interface to animate virtual characters. Journal on Multimodal User Interfaces, 7(4):269–279, Oct. 2013.

4 http://xml3d.org/ - 15 Jan. 2015


Augmenting EMBR Virtual Human Animation System with MPEG-4 Controls for Producing ASL Facial Expressions

Matt Huenerfauth Rochester Institute of Technology (RIT)

Golisano College of Computing and Information Sciences 20 Lomb Memorial Drive, Rochester, NY 14623

[email protected]

Hernisa Kacorri The Graduate Center, CUNY

Computer Science Ph.D. Program 365 Fifth Ave, New York, NY 10016

[email protected]

1. Motivations

Our laboratory is investigating technology for automating the synthesis of animations of American Sign Language (ASL) that are linguistically accurate and support comprehension of information content. A major goal of this research is to make it easier for companies or organizations to add ASL content to websites and media. Currently, website owners must generally use videos of humans if they wish to provide ASL content, but videos are expensive to update when information must be modified. Further, the message cannot be generated automatically based on a user query, which is needed for some applications. Having the ability to generate animations semi-automatically, from a script representation of sign-language sentence glosses, could increase information accessibility for many people who are deaf by making it more likely that sign language content would be provided online. Further, synthesis technology is an important final step in producing animations from the output of sign language machine translation systems, e.g. [1].

Synthesis software must make many choices when converting a plan for an ASL sentence into a final animation, including details of speed, timing, and transitional movements between signs. Specifically, in recent work, our laboratory has investigated the synthesis of syntactic ASL facial expressions, which co-occur with the signs performed on the hands. These types of facial expressions are used to convey whether a sentence is a question, is negated in meaning, has a topic phrase at the beginning, etc. In fact, linguists have described how a sequence of signs performed on the hands can have different meanings, depending on the syntactic facial expression that is performed [8]. For instance, an ASL sentence like “MARY VISIT PARIS” (English: Mary is visiting Paris.) can be negated in meaning with the addition of a Negation facial expression during the final verb phrase. As another example, it can be converted into a Yes/No question (English: Is Mary visiting Paris?) with the performance of a Yes-No-Question facial expression during the sentence.

The timing, intensity, and other variations in the performance of ASL facial expressions depend upon the length of the phrase with which the expression co-occurs (the sequence of signs), the location of particular words in the sentence (e.g., the intensity of a Negation facial expression peaks during the sign NOT), and other factors [8]. Thus, it is insufficient for a synthesis system to merely play a fixed facial recording during all sentences of a particular syntactic type. So, we are studying methods for planning the timing and intensity of facial expressions based upon the specific words in the sentence. As surveyed in [7], several SLTAT community researchers have conducted research on facial expression synthesis, e.g., interrogative questions with co-occurrence of affect [11], using clustering techniques to produce facial expressions during specific words [10], the use of motion-capture data for face animation [2], among others.

2. Features

To study facial expressions for animation synthesis, we needed an animation platform with a specific set of features:

1. The platform should provide a user-interface for specifying the movements of the character so that new signs and facial expressions can be constructed by fluent ASL signers on our research team. These animations become part of our project's lexicon, enabling us to produce example sentences so that our animations can be tested in studies with ASL signers.

2. The virtual human platform must include the ability to specify animations of hand, arm, and body movements (ideally, with inverse kinematics and timing controls), so that we can rapidly produce those elements of the animation.

3. The platform must provide sufficiently detailed face controls so that we can create subtle variations in the face and head pose, to enable us to experiment with variations in the movement and timing of elements of the face.

4. The platform should allow the face to be controlled using a standard parameterization technique so that we can use face movement data from human signers to animate the character.

Fig. 1: EMBR user-interface for controlling virtual human

The open-source EMBR animation platform [3] already supported features 1 and 2, listed above. To provide features 3 and 4, we selected the character “Max” from this platform, and we enhanced the system with a set of face and head movement controls, following the MPEG-4 Facial Animation Parameter standard [5]. Specifically, we added controls for the nose, eyes, and eyebrows of the character. While the mouth is used for some ASL linguistic facial expressions, the upper face is essential for syntactic facial expressions [8]. The MPEG-4 standard was chosen because it is a well-defined face control formalism (thereby making any models of face animation we investigate more readily applicable to other virtual human animation research), and there are various software libraries available, e.g., [9], for automatically analyzing human face movements in a video, to produce a stream of MPEG-4 parameter values representing face movements over time.

At the workshop, we will demonstrate our system for constructing facial expressions using MPEG-4 controls in EMBR; we will also show animation examples synthesized by the system. Specifically, our laboratory has implemented the following:

• We added facial morphs to the system for each MPEG-4 facial animation parameter: each of these parameters specifies vertical or horizontal displacements of landmark points on the human face, normalized by the facial proportions of the individual person's face. Thus, the morph controls for the Max character's face had to be calibrated to ensure that they numerically followed the MPEG-4 standard (a sketch of this calibration appears after this list).

• Prior researchers have described how wrinkles that form on the forehead of a virtual human are essential to the perception of eyebrow raising in ASL animations [11]. While computer graphics researchers study various methods for face wrinkling [0], to work within the EMBR engine, we increased the granularity of the wireframe mesh of the character’s face where natural wrinkles should appear, and wrinkle formation was incorporated into the facial morphs.

• To aid in the perception of wrinkles and face movements, a lighting scheme was designed for the character (see Fig. 2).

• Our laboratory implemented software to adapt MPEG-4 recordings of a human face movement to EMBRscript, the script language supported by the EMBR platform. In this way, our laboratory can directly drive the movements of our virtual human from recordings of human ASL signers. The MPEG-4 face recordings can be produced by a variety of commercial or research face-tracking software; our laboratory has used the Visage Face Tracker [9].
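As an illustration of the calibration mentioned in the first bullet above, the sketch below converts a single MPEG-4 FAP value into a morph weight for the avatar's face. The face distances, the morph range and the table are assumptions for this example, not values from the EMBR system; the 1/1024 scaling reflects our reading of the MPEG-4 FAPU definitions.

    # Illustrative sketch of the calibration idea from the first bullet above:
    # an MPEG-4 FAP value is converted to a landmark displacement on the avatar's
    # face and then to a normalized morph weight. The face distances, the morph
    # range, and the table below are invented; the 1/1024 scaling reflects our
    # reading of the MPEG-4 FAPU definitions, not values from the EMBR system.

    # Key face distances measured once on the neutral face of the avatar (model units).
    FACE_DISTANCES = {"ENS": 0.042, "ES": 0.065, "MNS": 0.030, "MW": 0.055}

    # For each FAP: which face distance defines its unit, and the displacement at
    # which the corresponding facial morph reaches full strength (weight 1.0).
    FAP_TABLE = {
        "raise_l_i_eyebrow": {"fapu": "ENS", "full_morph_displacement": 0.015},
    }

    def fap_to_morph_weight(fap_name, fap_value):
        """Map one FAP value (in FAPU units) to a morph weight clamped to [-1, 1]."""
        entry = FAP_TABLE[fap_name]
        fapu = FACE_DISTANCES[entry["fapu"]] / 1024.0   # one FAPU step
        displacement = fap_value * fapu
        weight = displacement / entry["full_morph_displacement"]
        return max(-1.0, min(1.0, weight))

    if __name__ == "__main__":
        print(fap_to_morph_weight("raise_l_i_eyebrow", 200))  # about 0.55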

Fig 2: (a) forehead with eyebrows raised before the addition of MPEG-4 controls, facial mesh with wrinkling, and lighting enhancements; (b) eyebrows raised in our current system

3. Science

The primary goal of our implementation work has been to support our scientific agenda: to investigate models of the timing and intensity of syntactic facial expressions in ASL. As part of this work, it will be necessary for us to periodically conduct user-based studies with native ASL signers evaluating the quality of animations that we have synthesized.

As an initial test of our ability to synthesize animations of ASL with facial expressions using this new animation platform, we conducted a pilot test with 18 native ASL signers who viewed animations that were generated by our new system: full details appear in [6]. The animations displayed in the study consisted of short stories with Yes-No Question, WH-Question, and Negation facial expressions, based upon stimuli that we released to the research community in [4]. The participants answered scalar-response questions about the animation quality and comprehension questions about their information content.

In this pilot study, the participants saw animations that were driven by the recording of a human; we previously released this MPEG-4 data recording of a human ASL signer performing syntactic facial expressions in [4]. The hand movements were synthesized based on our project's animation dictionary, which native signers in the lab have been constructing using the EMBR user-interface tool. Because the data-driven animations contained facial expressions and head movement, they utilized the skin-wrinkling, lighting design, and MPEG-4 controls of our new animation system. As compared to animations without facial expression, shown as a lower baseline, participants reported that they noticed the facial expressions in the data-driven animations, and their comprehension scores were higher [6]. While this pilot study was just an initial test of the system, these results suggested that our laboratory will be able to use this augmented animation system for evaluating our on-going research on designing new methods for automatically synthesizing syntactic facial expressions of ASL. In future work, we intend to produce models for synthesizing facial expressions, instead of simply replaying human recordings.

4. ACKNOWLEDGMENTS

This material is based upon work supported by the National Science Foundation under awards 1506786, 1462280, & 1065009. We thank Andy Cocksey and Alexis Heloir for their assistance.

5. REFERENCES

[0] Bando, Y., Kuratate, T., Nishita, T. 2002. A simple method for modeling wrinkles on human skin. In Proceedings of Pacific Graphics '02.

[1] Ebling, S., Way, A., Volk, M., Naskar, S.K. 2011. Combining Semantic and Syntactic Generalization in Example-Based Machine Translation. In: M.L. Forcada, H. Depraetere, V. Vandeghinste (eds.), Proc. 15th Conf. of the European Association for Machine Translation, p. 209-216.

[2] Gibet, S., Courty, N., Duarte, K., Naour, T.L. 2011. The SignCom system for data-driven animation of interactive virtual signers: methodology and evaluation. ACM Transactions on Interactive Intelligent Systems (TiiS), 1(1), 6.

[3] Heloir, A., Nguyen, Q., and Kipp, M. 2011. Signing Avatars: a Feasibility Study. 2nd Int'l Workshop on Sign Language Translation and Avatar Technology (SLTAT).

[4] Huenerfauth, M., Kacorri, H. 2014. Release of experimental stimuli and questions for evaluating facial expressions in animations of American Sign Language. Workshop on the Representation & Processing of Signed Languages, LREC'14.

[5] ISO/IEC IS 14496-2 Visual, 1999.

[6] Kacorri, H., Huenerfauth, M. 2015. Comparison of Finite-Repertoire and Data-Driven Facial Expressions for Sign Language Avatars. Universal Access in Human-Computer Interaction. Lecture Notes in Computer Science. Switzerland: Springer International Publishing.

[7] Kacorri, H. 2015. TR-2015001: A Survey and Critique of Facial Expression Synthesis in Sign Language Animation. Computer Science Technical Reports. Paper 403.

[8] Neidle, C., Kegl, J., MacLaughlin, D., Bahan, B., and Lee, R.G. 2000. The syntax of ASL: functional categories and hierarchical structure. Cambridge: MIT Press.

[9] Pejsa, T., and Pandzic, I. S. 2009. Architecture of an animation system for human characters. In 10th Int'l Conf. on Telecommunications (ConTEL) (pp. 171-176). IEEE.

[10] Schmidt, C., Koller, O., Ney, H., Hoyoux, T., and Piater, J. 2013. Enhancing Gloss-Based Corpora with Facial Features Using Active Appearance Models. 3rd Int'l Symposium on Sign Language Translation and Avatar Technology (SLTAT).

[11] Wolfe, R., Cook, P., McDonald, J. C., and Schnepp, J. 2011. Linguistics as structure in computer animation: Toward a more effective synthesis of brow motion in American Sign Language. Sign Language & Linguistics, 14(1), 179-199.


Learning sign language in a playful way with SiGame – The App

SiGame is the world's first game app for sign language. Not a human being but an artificial character - the avatar SiMAX - signs and guides the user through the app. He acts as game partner and teacher at the same time and guarantees pleasurable language acquisition for deaf and hearing people. The fictional character was specially developed for this app - this is also a world first.

SiGame is based on an avatar developed for SiMAX, a semi-automatic translation machine for sign language. The software is operated by a human translator who only needs to adjust translations suggested by a “learning” database. The resulting translation is then signed by an avatar and delivered as video. Thanks to powerful 3D software, the avatar is able to display facial expressions - especially those which carry grammatical meaning in sign language - a feature that is unique. It furthermore displays emotions and body language - all in a very smooth way, significantly advancing the quality of existing avatar-based systems. SiMAX is currently at a prototype stage and therefore not yet open for testing. The first application of the avatar is SiGame.

1. Motivations

You can play SiGame in different sign languages such as American Sign Language, International Sign and German Sign Language. For the first time, deaf people are offered the opportunity to play a game in their mother tongue. Additionally, SiGame enables learning a different sign language in order to connect internationally or to prepare for holidays or a congress abroad.

The game addresses everyone: hearing and deaf, young and old, game lovers and beginners. With the game, people working in social services can easily acquire additional skills on the one hand, and children are pedagogically stimulated by the simultaneous training of the left and right brain hemispheres on the other. Young people can reasonably indulge in their passion for games, and older people improve their mental fitness. In contrast to other serious games, SiGame stands out for its ease of use, so that children and older people will not have any problems and will easily find their way within the app. Because SiGame addresses everyone, it also works as a kind of “bridge” between the world of the deaf and the world of the hearing, as it is a very easy way not only to get in touch with sign language but also to learn it.

2. Features

With SiGame the users can train their knowledge of signs by playing memory, a quiz or “one against one”. By connecting to their Facebook account, users can compare their scores with their friends. SiGame also includes a training programme for learning signs, with a learning curve which enables users to check their progress. Of course, SiGame also has a kind of “dictionary” of signs where users can look up specific signs.


Besides the three different sign languages, the user can choose between the written languages English, German, French and Spanish.

SiGame was launched at the end of 2014/beginning of 2015 in the Apple App Store and Google Play Store. SiGame will be open to test at the workshop. We will provide a QR code so every participant can download it to their mobile device, including the basic package of signs.

3. Research

The design of the avatar was developed in cooperation with deaf persons. The challenge was to find a design that avoids the “uncanny valley” effect - when features look and move almost, but not exactly, like natural beings, this causes a response of revulsion among some observers. The aim was to design an appealing character everybody can connect with.

Great emphasis was placed on the choice of the signs for SiGame, e.g. which subject areas should be covered by the signs, so that they are interesting and relevant to the daily life of as many users as possible. The process of choosing the signs and the corresponding glosses could be further improved, e.g. in a possible follow-up project, as it was limited in this project by the available budget and time. Many signs are not gender sensitive; for example, in Austrian Sign Language the sign for “boy” refers through its iconicity to a small horn and the sign for “girl” to an earring. Therefore, every sign chosen for SiGame had to undergo a “gender check”. Research was also done on teaching methodology for an educational game in sign language, as existing solutions for educational games in spoken language cannot simply be transferred to educational games in sign language. The logical structure and also the design of the game have been tested by a representative group of deaf and hearing persons.

Concerning the technical features of SiGame, it was necessary to test how signs can be displayed on smartphones in the best possible way so that they can be understood by the users. Therefore, versions differing in the size of the signs, colours and speed of signing were tested. Different possibilities for the users to influence these settings have also been tested.

www.sigame-app.com

Sign Time GmbH

Dr. Georg Tschare, CEO

Schottenring 33

1010 Vienna

Austria

Phone: +43 (0)660 / 800 10 12

E-Mail: [email protected]


Automatic Translation System to JSL (Japanese Sign Language) about Weather Information

Shuichi UMEDA†, Makiko AZUMA, Tsubasa UCHIDA, Taro MIYAZAKI, Naoto KATO, Seiki INOUE, and Nobuyuki HIRUMA

NHK (Japan Broadcasting Corporation), Tokyo, Japan

Key words: JSL (Japanese Sign Language), Weather Information, Motion Capture, JSL Corpus

1. Motivation

NHK is Japan’s sole public broadcaster. In response to social demands for barrier-free delivery of information, NHK has been promoting the use of sign language in television. The percentage of signed broadcasting programs, however, is still low (fig.1).

fig.1 Japanese daily sign language news program

Since it is difficult to keep sign-language interpreters on stand-by for 24 hours, including for late-night and early-morning programs, NHK aims to use sign-language CG translation systems in broadcasting emergency weather information, as the first step in expanding our sign-language broadcasting services.

2. Features

NHK has its own laboratory for research on broadcasting technologies, and has been involved mainly in research on machine translation from Japanese to Japanese Sign Language (JSL) (fig.2). Aside from the difference in word order between Japanese and JSL, the grammatical structure of JSL has not been elucidated. Thus, rule-based translation methods, such as those used in the ATLAS Project, could not be used in translating JSL.

fig.2 Prototype of machine translation system

E-mail: † [email protected]


NHK has therefore been carrying out research on machine translation through trials using methods that combine example-based and statistical translation, based on a sign language corpus. Limiting the corpus to sentences related to weather, we have so far collected around 90,000 sentences. Since translation accuracy is still insufficient and translation takes time, however, the system still cannot be used in broadcasting. At this point, we are collecting data from signed weather broadcasts to enrich the corpus and increase translation accuracy.

“Motions,” which form the basic elements of movements, are being recorded using Vicon’s optical motion-capture system. Thus far, we have captured motions for 7,000 words.

A dictionary of sign language words has been made available to the public through the following website to solicit opinions from a wide audience regarding the movements and quality of the CG models: http://cgi2.nhk.or.jp/signlanguage/index.cgi

CG character posture is computed based on the bone structure and BVH format generally used in 3DCG. The BVH format specifies the angle values of each joint, and a smooth connection between two words is made by linear interpolation of the BVH angular values.
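The word-to-word transition described above amounts to a per-joint linear interpolation of Euler angles; a minimal sketch follows, with invented joint names and angle values.

    # Minimal sketch of the transition scheme described above: the last pose of one
    # word and the first pose of the next are given as BVH-style Euler angles per
    # joint, and the in-between frames are produced by linear interpolation of the
    # angle values. Joint names and angles below are invented for illustration.

    def lerp_pose(pose_a, pose_b, t):
        """Linearly interpolate two poses, each {joint: (x, y, z) Euler angles in degrees}."""
        return {joint: tuple(a + (b - a) * t for a, b in zip(pose_a[joint], pose_b[joint]))
                for joint in pose_a}

    def transition_frames(end_of_word1, start_of_word2, n_frames):
        """Frames inserted between two signed words to connect them smoothly."""
        return [lerp_pose(end_of_word1, start_of_word2, i / (n_frames + 1))
                for i in range(1, n_frames + 1)]

    if __name__ == "__main__":
        word1_end = {"RightShoulder": (10.0, 0.0, 45.0), "RightElbow": (0.0, 80.0, 0.0)}
        word2_start = {"RightShoulder": (30.0, 0.0, 20.0), "RightElbow": (0.0, 20.0, 0.0)}
        for frame in transition_frames(word1_end, word2_start, 3):
            print(frame)

Section 3 below attributes part of the perceived unnaturalness precisely to this linear interpolation of joint angle values, which motivates the inverse-kinematics-based interpolation research mentioned there.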

Designation of BVH files and character models is carried out using TV Program Making Language (TVML), developed at NHK, while DirectX is used for 3D rendering.

Recently, based on our research results to date, we have been testing systems that have the potential for practical use (fig.3).

The machine translation system currently being researched still does not have sufficient translation accuracy and speed to make it useful in broadcasting. Thus, we created a prototype of a video system for generating sign-language CG for previously recorded fixed phrases input from weather forecasts. Although translation of sentences other than the fixed sentences is not possible, the system has the potential for making accurate flash reports on weather information. In anticipation of its use in television broadcasts, we plan to carry out website-based test operations (we will demonstrate the weather information system).

3. Science (Future Activity)

As has been pointed out for sign language CG systems in other countries as well, the unnaturalness of the expressions in NHK's prototype system needs to be addressed. We are therefore carrying out research to portray expressions naturally using CG. We have also conducted image analysis to elucidate the causes of the unnaturalness of expressions arising from the linear interpolation of joint angle values. To resolve this problem, we are carrying out research on interpolation methods that are based on the inverse kinematics concept.

Since our final goal is to use the system for television, we are also doing research on methods for controlling the playback speed and display of sign-language CG, and on methods for synchronizing the display of sign-language CG with the main line image and voice.

fig.3 Practical fixed-phrase generator for weather information


Inferring biomechanical kinematics from linguistic data: A case study for role shift

Rosalee Wolfe, John C. McDonald, Robyn Moncrief, Souad Baowidan, Marie Stumbo

DePaul University, Chicago IL, USA

{wolfe, jmcdonald}@cs.depaul.edu, {rkelley5, sbaowida, mstumbo}@mail.depaul.edu

Over the past two decades, researchers have made great strides in developing avatars for use in Deaf education (Efthimiou & Fotinea, 2007), automatic translation (Elliott, Glauert, Kennaway, & Marshall, 2000), (Filhol, 2012), interpreter training (Jamrozik, Davidson, McDonald, & Wolfe, 2010), validation of transcription (Hanke, 2010), and improving accessibility to transportation and government services (Segouat, 2010) (Ebling, 2013) (Cox, et al., 2002). Creating lifelike, convincing motion continues to be one of the key goals of signed language synthesis research. Avatars that sign with smooth, natural movements are easier to understand and more acceptable than those that move in an unnatural or robotic manner.

Motivation

Current research efforts in sign synthesis either use libraries of motion-captured signs (Awad, Courty, Duarte, Le Naour, & Gibet, 2010) or libraries of sparse key-frame animations transcribed by artists (Delorme, Filhol, & Braffort, 2009). Entries from libraries are then procedurally combined to produce longer signed utterances. An excellent review of the current literature on sign synthesis can be found in (Courty & Gibet, 2010).

Sign synthesis based on motion capture produces outstanding natural motion. The myriad tiny and subtle details in the data create smooth, naturally flowing movement in an avatar. However, it is difficult to maintain the same naturalness in the transitions when modifying the data to accommodate new utterances. The high temporal density of captured detail that creates the beautiful movement also requires substantial resources to modify.

Applying linguistic rules to modify animation is easier with sparse sets of keys that correlate well to the structure of linguistic models. Unfortunately, the ease of modification is offset by a lack of realism in the animation. The linguistic parameters contain no information about the subtle body movements which are not considered to be linguistically significant, but are nonetheless required for natural motion.

The ideal system would combine the best aspects of both approaches. It would support ease of key modification while still producing natural, lifelike motion. This presentation details a step towards a new method that automatically layers biomechanical, sublinguistic movement under the motion dictated by linguistic data. The approach is designed to improve the quality of avatar motion without requiring researchers to acquire more data.


Science

This presentation will discuss the theory of the new approach in the context of generating role shifts. In a role shift, a signer uses a body turn to assume the role of a protagonist in a constructed dialog (Lillo-Martin, 2012). From the linguistic information, an animation system can compute a global orientation that dictates the avatar's pose when assuming a role. Previous work (McDonald, et al., 2013) used range of motion data to distribute the global orientation down the spinal column as local rotations, but the timing of the transitions proved problematic. In a turn, the transition begins with the eyes, followed by the neck, hips, spine, and shoulders. The eyes and head complete their rotation before the remaining linkage begins movement.
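The earlier approach referenced here can be illustrated with a small sketch that splits a single global torso yaw across the spinal joints in proportion to each joint's range of motion; the joint list and range values are invented for illustration and are not taken from McDonald et al. (2013).

    # Illustrative sketch (not the implementation of McDonald et al., 2013) of the
    # earlier idea referenced above: a single global torso yaw for the role shift
    # is split into local rotations along the spinal chain, each joint taking a
    # share proportional to its range of motion. Joints and ranges are invented.

    RANGE_OF_MOTION_YAW = {"hips": 15.0, "spine": 20.0, "chest": 25.0, "neck": 40.0}

    def distribute_yaw(target_yaw_deg):
        """Split a global yaw (degrees) into per-joint local yaws, proportional to ROM."""
        total = sum(RANGE_OF_MOTION_YAW.values())
        return {joint: target_yaw_deg * rom / total
                for joint, rom in RANGE_OF_MOTION_YAW.items()}

    if __name__ == "__main__":
        print(distribute_yaw(40.0))
        # hips, spine, chest and neck contribute 6, 8, 10 and 16 degrees: together 40.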

A nontrivial problem occurred with the previous work because the eyes and head were descendants of the hips in the transformational hierarchy. Whenever the hips rotated, the eyes and head rotated in concert. This induced an additional rotation on the head that was not the intent of the animator.

A traditional approach for holding objects in a given orientation is to add lookat constraints that apply a global rotation and ignore the transformation hierarchy. With lookat constraints, the timing and orientation of the eyes and head were preserved throughout the onset and duration of the role shift. Unfortunately, it proved difficult to blend between the global rotation in the lookat constraints and the local rotation used when the avatar is in the narrator role. The visual result was a visible head bobble at the end of the transition.

The transformational hierarchy also disrupted the staggered timing in the shoulders, which are supposed to remain stationary while the hips are starting their rotation. But because they are descendants of the hips, they began their rotation synchronously with the hips, defeating the attempt to stagger the timing.

Features

The new system uses no lookat constraints, and all joints remain in the transformation hierarchy. In a preliminary step, the system computes the transition of each joint as a global orientation. It then computes compensatory motion to implement the timing as animation keys cast in local coordinates.
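A minimal sketch of this compensation, reduced to yaw-only rotations so the arithmetic stays readable: each joint's desired global yaw at a given instant is converted into a local key by subtracting whatever yaw its ancestor chain already contributes at that instant. The joint chain and the numbers below are illustrative assumptions, not the authors' data.

    # Minimal sketch of the compensation described above, reduced to yaw-only
    # rotations so the arithmetic stays readable: each joint's desired GLOBAL yaw
    # at one instant is turned into a LOCAL key by subtracting whatever yaw its
    # ancestor chain already contributes at that instant. The chain, the joint
    # names and the numbers are illustrative assumptions, not the authors' data.

    CHAIN = ["hips", "spine", "shoulders", "neck", "head"]   # parent to child

    def local_keys_from_global(desired_global_yaw):
        """desired_global_yaw: {joint: yaw in degrees, in world space, at one instant}.
        Returns local yaws whose accumulation down the chain reproduces the desired
        global values even while ancestor joints are still mid-rotation."""
        local, accumulated = {}, 0.0
        for joint in CHAIN:
            local[joint] = desired_global_yaw[joint] - accumulated
            accumulated += local[joint]   # now equals the joint's desired global yaw
        return local

    if __name__ == "__main__":
        # Mid-transition: eyes/head have already completed the 40-degree turn,
        # while the hips and spine are only part of the way there.
        desired = {"hips": 10.0, "spine": 18.0, "shoulders": 18.0, "neck": 40.0, "head": 40.0}
        print(local_keys_from_global(desired))
        # The shoulders get a 0-degree local key (they hold still globally for now),
        # and the neck key compensates for the hips/spine rotation underneath it.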

The present implementation is applied to a simple figure with controls to change the global orientation of the torso and the speed of the transition. We invite participants to a hands-on evaluation of the system at the conclusion of the presentation or at any time during the course of the workshop.


Bibliography

Awad, C., Courty, N., Duarte, K., Le Naour, T., & Gibet, S. (2010). A combined semantic and motion capture database for real-time sign language synthesis. Intelligent Virtual Agents, 432-438.

Courty, N., & Gibet, S. (2010). Why is the creation of a virtual signer challenging computer animation? Motion in Games, 290-300.

Cox, S., Lincoln, M., Tryggvason, J., Nakisa, M., Wells, M., Tutt, M., & Abbott, S. (2002). TESSA: a system to aid communication with deaf people. Proceedings of the fifth international ACM conference on assistive technologies (ASSETS 02) (pp. 205-212). Edinburgh, UK: ACM.

Delorme, M., Filhol, M., & Braffort, A. (2009). Animation generation process for Sign Language synthesis. International Conference on Advances in Computer-Human Interaction (ACHI '09) (pp. 386-390). Cancun, Mexico: IEEE.

Ebling, S. (2013). Evaluating a Swiss German Sign Language Avatar among the Deaf Community. Third International Symposium on Sign Language Translation and Avatar Technology (SLTAT). Chicago, IL.

Efthimiou, E., & Fotinea, S.-E. (2007). An environment for deaf accessibility to education content. International Conference on ICT & Accessibility, (pp. GSRT, M3. 3, id 35). Hammamet, Tunisia.

Elliott, R., Glauert, J. R., Kennaway, J. R., & Marshall, I. (2000). The development of language processing support for the ViSiCAST project. Proceedings of the fourth international ACM conference on Assistive technologies (ASSETS 2000) (pp. 101-108). Arlington, VA: ACM.

Filhol, M. (2012). Combining two synchronisation methods in a linguistic model to describe Sign Language. Gesture and Sign Language in Human-Computer Interaction and Embodied Communication, 192-203.

Hanke, T. (2010, June 14-16). An overview of the HamNoSys phonetic transcription system. Retrieved December 28, 2011, from Sign Linguistics Corpora Website: http://www.ru.nl/publish/pages/570576/slcn3_2010_hanke.pdf

Jamrozik, D. G., Davidson, M. J., McDonald, J., & Wolfe, R. (2010). Teaching Students to Decipher Fingerspelling through Context: A New Pedagogical Approach. Proceedings of the 17th National Convention Conference of Interpreter Trainers, (pp. 35-47). San Antonio, TX.

Liddell, S. (2003). Grammar, Gesture, and Meaning in American Sign Language. Cambridge, UK: Cambridge University Press.

Lillo-Martin, D. (2012). Utterance reports and constructed action. In R. Pfau, M. Steinbach, & B. Woll (Eds.), Sign Language: An International Handbook HSK 37 (pp. 365-387).

McDonald, J., Wolfe, R., Schnepp, J., Hochgesang, J., Jamrozik, D. G., Stumbo, M., & Berke, L. (2013). Toward Lifelike Animations of American Sign Language: Achieving Natural Motion from the Movement-Hold Model. Third International Symposium on Sign Language Translation and Avatar Technology (SLTAT 2013). Chicago, IL.

Segouat, J. (2010). Modélisation de la coarticulation en Langue des Signes Française pour la diffusion automatique d'informations en gare ferroviaire à l'aide d'un signeur virtuel. Doctoral Dissertation, Université Paris Sud, Orsay, France. Retrieved February 21, 2013, from http://hal.upmc.fr/docs/00/60/21/17/PDF/these-segouat2010.pdf