SLTAT 2015 programme
Dates: April 9–10, 2015
Venue: ISCC, 20 rue Berbier du Mets, Paris, France
Day 1
Registration and welcome, 9:15 at ISCC
Coffee & biscuits served, opening at 9:45
Morning session
10:00 Invited talk on necessities, feasibility, untackled problems (Interpretis & Orange)
11:15 Talks on systems to be presented in afternoon session (motivations, features, science)
Lunch, 12:30 in the square René Le Gall
Coffee served at 13:30 in ISCC
Afternoon session, 14:00 at ISCC
System demos, coffee & biscuits during break
Evening event, 17:30 at ELDA
1. Presentation of a deaf University project by Mains Diamant
2. Aperitif served
Dinner, 20:00 at restaurant (tbd)
Day 2
Morning session, 9:30 with a 20-min coffee break
User feedback and discussion, from video interviews of the previous day
Lunch, 12:00 at the square René Le Gall
Coffee served at 13:00 in ISCC
Afternoon session, 13:30 with a 20-min coffee break
Plenary scientific discussions chaired by A. Heloir
Closing remarks
16:30 Conclusive talk
17:00 Closing
Map and locations
1. ISCC for morning and afternoon sessions: 20 rue Berbier du Mets
2. Square René Le Gall for lunch breaks
3. ELDA for social event: 9 rue des Cordelières
List of systems presented
1. ProDeaf – 3D avatar based app for translation from Portuguese to Brazilian Sign Language (Libras)
M. Amorim, R. Kimura, J. Fernandes
2. A grammar knowledge based MT system for the language pair: Greek-GSL
E. Efthimiou, S. E. Fotinea, K. Marmarellis, T. Goulas, O. Giannoutsou, D. Kouremenos
3. Exploring novel interaction methods for authoring sign language animations
A. Heloir, F. Nunnari
4. Augmenting EMBR virtual human animation system with MPEG-4 controls for producing ASL facial expressions
M. Huenerfauth, H. Kacorri
5. Learning sign language in a playful way with SiGame – The App
G. Tschare
6. Automatic translation system to JSL (Japanese Sign Language) about weather information
S. Umeda, M. Azuma, T. Uchida, T. Miyazaki, N. Kato, S. Inoue, N. Hiruma
7. Inferring biomechanical kinematics from linguistic data: a case study for role shift
R. Wolfe, J. C. McDonald, R. Moncrief, S. Baowidan, M. Stumbo
ProDeaf – 3D avatar based app for translation from Portuguese to
Brazilian Sign Language (Libras)
AMORIM, Marcelo*; KIMURA, Renato; FERNANDES, Juliana
(*) marcelo.amorim@gmail.com
Motivation
Worldwide, many people have some degree of hearing impairment. In Brazil alone, according to IBGE, the Brazilian Institute of Geography and Statistics, there were over ten million hearing-impaired people in 2010; despite this, few hearing people are fluent in Sign Language.
To address this problem, an initiative was created at the Federal University of Pernambuco, Brazil. The resulting engine, called ProDeaf, is a platform for communication between the deaf and the hearing, born from the need for communication among hearing and deaf students.
Features
The ProDeaf engine is used in many contexts, and several of its applications are free for people who use Sign Language. The platform includes:
I. An app for mobile devices, which translates text/speech from Portuguese to Libras and also offers a bilingual dictionary;
II. A website version, similar to the mobile app, with a dictionary and a collaborative tool for registering new signs;
III. A translator for websites, making textual content accessible.
These solutions have different translation procedures, which may be automated or assisted, but are always preceded by professional translation activity, bearing inherent linguistic and grammatical conceptions. Both automatic and assisted translation are based on a methodology designed and tested by a multidisciplinary team of programmers, linguists, designers, translators and deaf people. This team is also responsible for researching and maintaining the software's database and processes, as well as for its linguistic and grammatical challenges, since Sign Languages are living and highly organic. Thanks to this work, the platform currently has over 4,000 signs.
For Sign Language Translation and Avatar Technology 2015, we will provide the free app, which can currently translate Portuguese to Libras. We will also show a functional demo version of ProDeaf translating English to International Sign (IS).
Science
Through complex computer algorithms, it is possible to translate text and speech from Portuguese to Libras (Brazilian Sign Language). The visual-spatial representation is performed by 3D humanoid avatars, whose visual design is slightly stylized in order to highlight communicative features of Sign Languages, such as facial expressions and handshapes.
For the research needed for the translation process, a multidisciplinary team was assembled, including deaf people, designers, programmers, Libras specialists and linguists, among others. The team focused on breaking the communication barrier between deaf and hearing people, allowing people who do not know Libras to start learning the language and communicating with deaf people.
This required, among other efforts, a commissioned survey about sign language segregation and research on natural 3D human movement.
ProDeaf contributes to the state of the art of 3D technologies for sign language thanks to its malleable structure. The signer avatar is fully articulated, and translations are done in real time and can be automated. This mechanism is made possible by the avatar's ability to interpret XML files with configuration coordinates, producing the Libras sign according to the language's parameters, namely: Hand Configuration, Orientation, Point of Articulation, Movement, and Non-Manual Features (such as facial expression).
Separating the presentation layer from the structural layer of the signs makes it possible to significantly expand the sign database and to make the visual representations much lighter, allowing their use in various scenarios.
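The XML-driven mechanism described above can be sketched as follows. The actual ProDeaf schema is not published, so every element and attribute name in this example is invented purely for illustration of the idea of separating a sign's structural description from its presentation:

```python
import xml.etree.ElementTree as ET

# Hypothetical sign description covering the five Libras parameters.
# The real ProDeaf XML format is not public; all names here are invented.
SIGN_XML = """
<sign gloss="HOUSE">
  <handshape value="flat"/>
  <orientation value="palms-facing"/>
  <articulation point="neutral-space"/>
  <movement type="converge"/>
  <nonmanual face="neutral"/>
</sign>
"""

def parse_sign(xml_text):
    """Extract the five sign parameters from a structural description."""
    root = ET.fromstring(xml_text)
    return {
        "gloss": root.get("gloss"),
        "handshape": root.find("handshape").get("value"),
        "orientation": root.find("orientation").get("value"),
        "articulation": root.find("articulation").get("point"),
        "movement": root.find("movement").get("type"),
        "nonmanual": root.find("nonmanual").get("face"),
    }

sign = parse_sign(SIGN_XML)
print(sign["gloss"], sign["movement"])
```

Because the avatar only consumes such structural descriptions, any renderer that understands the parameters can animate the same database, which is what makes the representations light and reusable.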
Keywords: translation, Libras, sign language, 3D avatar.
Eleni Efthimiou1 <eleni_e@ilsp.gr>, Stavroula-Evita Fotinea1, Konstantinos Marmarellis1,2, Theodor Goulas1, Olga Giannoutsou1, Dimitris Kouremenos2
1 ILSP-Institute for Language and Speech Processing / ATHENA RC
2 Electrical and Computer Engineering Dept, National Technical University of Athens, Athens, Greece
System demonstration proposal
A grammar knowledge based MT system for the language pair: Greek-GSL
We intend to demonstrate the current state of performance of an MT system for automatic translation from written Greek to Greek Sign Language (GSL). The system receives as input parsed Greek sentences, chunked according to lexical and structural information available for the Greek language, and produces their translation output in the form of structured GSL phrases, either as glossed sequences of structured signs or as HamNoSys coded structured phrases resulting from various matching rules between the two languages. For synthesis at the output side, an electronic grammar of GSL is exploited in combination with appropriate information from a lexicon database. The translation output feeds a SiGML driven signing avatar, so enabling visualization of the produced GSL phrases.
Fig. 1: Basic MT system architecture
The MT system is open source and incorporates most of the widely accepted standards. It is programmed in Java to allow for quick and efficient design and development, and compatibility with all system platforms, while XML technology has been utilised to provide structured documents in a reusable format. The system exploits two distinguished modules and four language resources repositories.
The two modules are:
i) A robust parser of Modern Greek that provides structural chunks of written utterances decorated with part-of-speech (POS) tags, which maintain major grammar properties of the input sentences and serve as input to the transfer module.
ii) A transfer module which performs matching of grammar structures and lexical items between the languages of the translation pair.
The resources repositories exploited in the MT process are the following:
a) A morphological computational lexicon of Modern Greek,
b) A robust analysis grammar component of Modern Greek,
c) A bilingual Modern Greek-GSL lexicon, and
d) A GSL grammar component which provides a formal representation of the core phenomena of GSL grammar.
The system can handle affirmative sentences with stative and motion predicates, placement of arguments in the signing space, variation due to classifier use, lexicalized/prosodic and structural negation and interrogation, as well as the mechanisms GSL exploits for the declaration of Tense/Aspect values. The system's performance is rather robust with respect to its core grammar. This is verified by extensive testing with test suites of sentences structured to test different levels of complexity and rule combinations. In parallel, the system is being constantly enriched with new features which enable extension of coverage with respect to the grammar phenomena handled. However, in the current, more mature state of the system, the scientific goal is to place emphasis on optimization of the system's grammar writing, targeting more efficient rule definitions that allow wider generalizations and thus narrow the need for rules with hapax application.
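As a toy sketch of the transfer step, a structural rule can map a sequence of POS-tagged Greek chunks onto a GSL gloss order. The rule format, lexicon entries and example sentence below are invented for illustration and are much simpler than the system's actual formalism:

```python
# Toy transfer-rule sketch: POS-tagged chunks of a parsed Greek sentence
# are matched against a structural pattern and reordered into GSL glosses.
# Both the lexicon and the rule are hypothetical, minimal stand-ins.

# Hypothetical bilingual lexicon: Greek lemma -> GSL gloss.
LEXICON = {"Μαρία": "MARY", "βλέπω": "SEE", "σπίτι": "HOUSE"}

# One hypothetical structural rule: Greek NP-VP-NP -> subject, object, verb.
RULES = {("NP", "VP", "NP"): (0, 2, 1)}

def transfer(chunks):
    """chunks: list of (pos_tag, lemma) pairs for one parsed sentence."""
    pattern = tuple(pos for pos, _ in chunks)
    # Fall back to the source order when no structural rule matches.
    order = RULES.get(pattern, tuple(range(len(chunks))))
    return [LEXICON.get(chunks[i][1], chunks[i][1].upper()) for i in order]

glosses = transfer([("NP", "Μαρία"), ("VP", "βλέπω"), ("NP", "σπίτι")])
print(glosses)  # a glossed GSL-style sequence
```

In the real system the output side would additionally carry HamNoSys coding and feed the SiGML avatar; this sketch stops at the glossed sequence.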
Fig. 2: System input and output screens displaying information from rule application and lexicon check
Indicative group references
Efthimiou, E., Fotinea, S. E. and Sapountzaki, G. 2006. Processing linguistic data for GSL structure representation. In Proceedings of the Workshop on the Representation and Processing of Sign Languages: Lexicographic matters and didactic scenarios, Satellite Workshop to LREC-2006 Conference, 49-54.
Kouremenos, D., Fotinea, S. E., Efthimiou, E., and Ntalianis, K. 2010. A Prototype Greek Text to Greek Sign Language (GSL) Conversion System. Behaviour & Information Technology Journal (TBIT), 29, 5, 467-481, DOI: 10.1080/01449290903420192.
Efthimiou, E., Fotinea, S.-E., Goulas, T., Dimou, A.-L., Kouremenos, D. 2014. From Grammar Based MT to Post-processed SL Representations. Universal Access in the Information Society (UAIS) journal. Springer (to appear).
Exploring novel interaction methods for authoring sign language animations
System Presentation
Alexis Heloir∗
LAMIH-UMR CNRS 8201, Valenciennes, France
SLSI group, DFKI-MMCI, Saarbrücken, Germany
alexis.heloir@univ-valenciennes.fr
Fabrizio Nunnari
SLSI group, DFKI-MMCI, Saarbrücken, Germany
fabrizio.nunnari@dfki.de
1. INTRODUCTION
We are developing an online collaborative framework allowing Deaf individuals to author intelligible signs using a dedicated authoring interface controlling the animation of a 3D avatar. The system that we present mainly focuses on a novel architecture for authoring 3D animation of human figures. The user interface (UI) is assisted by novel gesture-based input devices. We claim that this tool can benefit not only the Deaf but also linguists, by providing them with a novel kind of material consisting of intelligible sign language animation together with a fine-grained log of the user's edit actions.
2. MOTIVATION
Current signing avatar technology either relies on a symbolic representation which encodes the avatar's utterance or uses a concatenative approach consisting of stitching together captured or manually authored Sign Language (SL) clips. In both cases, enriching the signing avatar system requires experts, either in the symbolic representation language or in motion capture and 3D animation. We believe that this is a big challenge to the adoption of signing avatar technology, since such experts are hard to find and expensive to train. We are tackling this challenge by proposing a dedicated software tool for authoring the animation of 3D avatars. We believe that this attempt could not only increase the adoption of SL avatar technology but also be beneficial to research in SL linguistics.
For the Deaf1, such a tool, when online, has the potential to enable multiple linguistic communities to illustrate and share new concepts, invent new signs and develop dictionaries. Eventually, it might also be an alternative to video recording, unlocking the possibility for deaf individuals to express their opinion anonymously, on the internet, using their primary language. Such a tool would also put SL studies back into the hands of the Deaf by allowing them to develop large corpora of animation data. For linguists, large corpora of intelligible animation data would be very valuable research material. Firstly, linguists would have access to a concise multimodal 3D animation that would be much more "readable" than traditional motion capture, since it would mostly convey the "essential" part of the animation that makes the sequence understandable, whereas that essence tends to be buried under the dense stream of data generated by motion-capture systems. Secondly, provided that the sequence of the user's edit actions is logged, linguists would also have access to the actual authoring strategy adopted by the SL author. This could shed new light on the user's actual intent.
1 We follow the convention of writing Deaf with a capitalized "D" to refer to members of the Deaf community [3] who use sign language as their preferred language, whereas deaf refers to the audiological condition of not hearing.
Digital character animation is a highly cross-disciplinary domain that spans acting, psychology, movie-making, computer animation, and programming. Not surprisingly, learning the art of traditional animation requires time, effort and dedication. However, recent consumer-range technology such as the one presented by Sanna et al. [4] has proved capable of enabling inexperienced users to author animations of human-like bodies and to interactively control physical or digital puppets. We are aiming at a similar goal for sign language animation. Our system is innovative because not only does it allow novice users to naturally edit complex animations using natural input devices like the Kinect2 and the Leap Motion3, but it also allows them to switch seamlessly between traditional space-time constraint editing and interactive performance capture recording. The use of low-cost, off-the-shelf devices allows and encourages the adoption of the system by a large number of users.
In the following, we briefly describe the system which will be demonstrated during the SLTAT workshop. It consists of a 3D animation authoring platform that supports gesture input and is capable of recording the user's facial expressions, handshapes and arm movements.
2 http://www.microsoft.com/en-us/kinectforwindows/ - 15 Jan. 2015
3 https://www.leapmotion.com/ - 15 Jan. 2015
Figure 1: The authoring pipeline overview (Kinect and Leap Motion performance capture, data simplification, manual edit via a control rig; straight-ahead and pose-to-pose animation).
3. PROPOSED DESIGN
The contribution of the architecture we propose consists of endowing the user with the capability to seamlessly record and edit character animation using both pose-to-pose animation and performance capture. The framework is depicted in Fig. 1.
In performance capture mode, the animator drives the animation like a puppeteer. The motion of both hands and the face of the animator is tracked in real time, and this live performance drives the animation of the avatar, which is recorded at a rate of 25 frames per second. The facial animation is recorded by the Kinect; the motion of hands and fingers is recorded by the Leap Motion. The frame density is later reduced in order to allow a later manual pose-to-pose edit.
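The frame-density reduction can be sketched as dropping every frame that linear interpolation between its kept neighbours already predicts. This is a minimal sketch for one scalar animation channel with an invented tolerance; the paper does not specify the actual simplification algorithm:

```python
def simplify(frames, tol=0.05):
    """Thin a dense capture stream into key poses.
    frames: list of (time, value) samples for one animation channel.
    A frame is dropped when it lies within tol of the straight line
    between the last kept frame and the next frame."""
    if len(frames) < 3:
        return list(frames)
    kept = [frames[0]]
    for i in range(1, len(frames) - 1):
        (t0, v0), (t1, v1), (t2, v2) = kept[-1], frames[i], frames[i + 1]
        predicted = v0 + (v2 - v0) * (t1 - t0) / (t2 - t0)
        if abs(v1 - predicted) > tol:
            kept.append(frames[i])
    kept.append(frames[-1])
    return kept

# Near-linear samples are dropped; deviating poses survive as keys.
stream = [(0, 0.0), (1, 0.1), (2, 0.2), (3, 0.9), (4, 0.4)]
print(simplify(stream))
```

A production system would apply this per joint channel (or use a curve-fitting variant), but the effect is the same: a 25 fps stream collapses into a sparse set of editable key poses.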
In pose-to-pose edit mode, the user controls one hand at a time using the Leap Motion. In this mode, the user's hand posture is not directly followed by the system; rather, the system integrates sequences of consecutive relative edits that consist of small grab and release actions on one of the character's hands. This edit mode permits a much finer and more precise positioning of the character's body parts (hands, head, eyes, torso, hips). The user thus uses her or his own body to apply offsets to the key poses resulting from the Data Simplification phase. The idea is to keep a correspondence between the body parts of the author and the virtual character's. However, in contrast with the direct control of performance capture, the author performs a "relative" control on the character's current posture. For example, the movement of the author's hand from an arbitrary position is used to apply an offset to the position of one of the virtual character's hands. We can call this a form of body-coincident puppeteering.
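The relative grab-and-release control amounts to applying the delta of the author's tracked hand between grab and release to the character's hand, rather than teleporting it to an absolute position. A schematic sketch, with invented coordinates:

```python
# Sketch of the "body-coincident puppeteering" edit: between a grab and a
# release, only the *delta* of the author's tracked hand is applied to the
# character's hand. All positions are illustrative (x, y, z) tuples.

def apply_relative_edit(char_pos, grab_pos, release_pos):
    """char_pos: character hand position when the grab starts;
    grab_pos / release_pos: author's tracked hand at grab and release."""
    delta = tuple(r - g for r, g in zip(release_pos, grab_pos))
    return tuple(c + d for c, d in zip(char_pos, delta))

# The author moves 10 cm along x from wherever the grab began; the
# character's hand shifts by the same offset, regardless of where the
# author's hand is in absolute space.
new_pos = apply_relative_edit((0.3, 1.2, 0.4), (0.0, 0.9, 0.1), (0.1, 0.9, 0.1))
print(new_pos)
```

Because only deltas matter, the author can re-grab from any comfortable posture, which is what frees the screen from acting as a mirror.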
Hence, the user doesn't see the digital character as a mirrored interpretation of himself: the screen is no longer a mirror but rather a virtual window on the 3D editing space. Concerning camera control, we aim at an authoring setup where the user focuses only on character posture editing, without the need to control the camera position; past studies have already demonstrated that several solutions can be applied to successfully enable depth perception in character animation, thus eliminating the need to rotate the camera viewpoint [1].
4. RESEARCH QUESTION
This work is an intermediate step in a global project aiming at developing a crowd-sourced sign-language animation editor for the many Deaf communities. The prototypes we implemented so far only run on a single computer and cannot be accessed by multiple users over the internet. However, using these prototypes, we could conduct two user studies which let us assess and validate a number of interaction metaphors that will be implemented in an online platform. The online platform will leverage modern CSS3/HTML5 user interface technologies as well as the emerging XML3D4 technology [2]. We are currently working in close collaboration with the XML3D developers in order to guarantee that all the features required by our animation platform can be implemented in XML3D.
5. REFERENCES
[1] M. Kipp and Q. Nguyen. Multitouch puppetry: creating coordinated 3D motion for an articulated arm. Page 147. ACM Press, 2010.
[2] F. Klein, K. Sons, D. Rubinstein, and P. Slusallek. XML3D and Xflow: Combining declarative 3D for the web with generic data flows. IEEE Computer Graphics & Applications (CG&A), 33(5):38–47, 2013.
[3] C. Padden and T. Humphries. Deaf in America: Voices from a culture. Harvard University Press, Cambridge, Mass., 1988.
[4] A. Sanna, F. Lamberti, G. Paravati, and F. D. Rocha. A Kinect-based interface to animate virtual characters. Journal on Multimodal User Interfaces, 7(4):269–279, Oct. 2013.
4 http://xml3d.org/ - 15 Jan. 2015
Augmenting EMBR Virtual Human Animation System with MPEG-4 Controls for Producing ASL Facial Expressions
Matt Huenerfauth Rochester Institute of Technology (RIT)
Golisano College of Computing and Information Sciences 20 Lomb Memorial Drive, Rochester, NY 14623
matt.huenerfauth@rit.edu
Hernisa Kacorri The Graduate Center, CUNY
Computer Science Ph.D. Program 365 Fifth Ave, New York, NY 10016
hkacorri@gc.cuny.edu
1. Motivations
Our laboratory is investigating technology for automating the synthesis of animations of American Sign Language (ASL) that are linguistically accurate and support comprehension of information content. A major goal of this research is to make it easier for companies or organizations to add ASL content to websites and media. Currently, website owners must generally use videos of humans if they wish to provide ASL content, but videos are expensive to update when information must be modified. Further, the message cannot be generated automatically based on a user-query, which is needed for some applications. Having the ability to generate animations semi-automatically, from a script representation of sign-language sentence glosses, could increase information accessibility for many people who are deaf by making it more likely that sign language content would be provided online. Further, synthesis technology is an important final step in producing animations from the output of sign language machine translation systems, e.g. [1].
Synthesis software must make many choices when converting a plan for an ASL sentence into a final animation, including details of speed, timing, and transitional movements between signs. Specifically, in recent work, our laboratory has investigated the synthesis of syntactic ASL facial expressions, which co-occur with the signs performed on the hands. These types of facial expressions are used to convey whether a sentence: is a question, is negated in meaning, has a topic phrase at the beginning, etc. In fact, linguists have described how a sequence of signs performed on the hands can have different meanings, depending on the syntactic facial expression that is performed [8]. For instance, an ASL sentence like "MARY VISIT PARIS" (English: Mary is visiting Paris.) can be negated in meaning with the addition of a Negation facial expression during the final verb phrase. As another example, it can be converted into a Yes/No question (English: Is Mary visiting Paris?) with the performance of a Yes-No-Question facial expression during the sentence.
The timing, intensity, and other variations in the performance of ASL facial expressions depend upon the length of the phrase with which they co-occur (the sequence of signs), the location of particular words in the sentence (e.g., the intensity of a Negation facial expression peaks during the sign NOT), and other factors [8]. Thus, it is insufficient for a synthesis system to merely play a fixed facial recording during all sentences of a particular syntactic type. So, we are studying methods for planning the timing and intensity of facial expressions, based upon the specific words in the sentence. As surveyed in [7], several SLTAT community researchers have conducted research on facial expression synthesis, e.g., interrogative questions with co-occurrence of affect [11], using clustering techniques to produce facial expressions during specific words [10], the use of motion-capture data for face animation [2], among others.
2. Features
To study facial expressions for animation synthesis, we needed an animation platform with a specific set of features:
1. The platform should provide a user-interface for specifying the movements of the character, so that new signs and facial expressions can be constructed by fluent ASL signers on our research team. These animations become part of our project's lexicon, enabling us to produce example sentences so that our animations can be tested in studies with ASL signers.
2. The virtual human platform must include the ability to specify animations of hand, arm, and body movements (ideally, with inverse kinematics and timing controls), so that we can rapidly produce those elements of the animation.
3. The platform must provide sufficiently detailed face controls so that we can create subtle variations in the face and head pose, to enable us to experiment with variations in the movement and timing of elements of the face.
4. The platform should allow the face to be controlled using a standard parameterization technique so that we can use face movement data from human signers to animate the character.
Fig. 1: EMBR user-interface for controlling virtual human
The open-source EMBR animation platform [3] already supported features 1 and 2, listed above. To provide features 3 and 4, we selected the character “Max” from this platform, and we enhanced the system with a set of face and head movement controls, following the MPEG-4 Facial Action Parameter standard [5]. Specifically, we added controls for the nose, eyes, and eyebrows of the character. While the mouth is used for some ASL linguistic facial expressions, the upper face is essential for syntactic facial expressions [8]. The MPEG-4 standard was chosen because it is a well-defined face control formalism (thereby making any models we investigate of face animation more readily applicable to other virtual human animation research), and there are various software libraries available, e.g., [9], for automatically analyzing human face movements in a video, to produce a stream of MPEG-4 parameter values representing face movements over time.
At the workshop, we will demonstrate our system for constructing facial expressions using MPEG4 controls in EMBR; we will also
show animation examples synthesized by the system. Specifically, our laboratory has implemented the following:
• We added facial morphs to the system for each MPEG-4 facial action parameter: Each of these parameters specifies vertical or horizontal displacements of landmark points on the human face, normalized by the facial proportions of the individual person’s face. Thus, the morph controls for the Max character’s face had to be calibrated to ensure that they numerically followed the MPEG-4 standard.
• Prior researchers have described how wrinkles that form on the forehead of a virtual human are essential to the perception of eyebrow raising in ASL animations [11]. While computer graphics researchers study various methods for face wrinkling [0], to work within the EMBR engine, we increased the granularity of the wireframe mesh of the character’s face where natural wrinkles should appear, and wrinkle formation was incorporated into the facial morphs.
• To aid in the perception of wrinkles and face movements, a lighting scheme was designed for the character (see Fig. 2).
• Our laboratory implemented software to adapt MPEG-4 recordings of a human face movement to EMBRscript, the script language supported by the EMBR platform. In this way, our laboratory can directly drive the movements of our virtual human from recordings of human ASL signers. The MPEG-4 face recordings can be produced by a variety of commercial or research face-tracking software; our laboratory has used the Visage Face Tracker [9].
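The calibration in the first bullet can be illustrated with the MPEG-4 FAPU scaling idea: a displacement measured on one face is expressed in face-proportion units, and scaled back by the target character's own proportions. The base distances below are invented for illustration; only the divide-by-1024 FAPU convention comes from the MPEG-4 standard:

```python
# MPEG-4 FAPs express landmark displacements in face-relative units (FAPU),
# so one parameter stream animates differently proportioned faces. A FAPU
# is a characteristic face distance (e.g., eye-to-nose separation, ENS)
# divided by 1024. The specific base distances here are invented.

def to_fap(displacement, fapu_base):
    """Normalize a raw displacement (model units) into a FAP value."""
    return displacement / (fapu_base / 1024.0)

def from_fap(fap_value, fapu_base):
    """Denormalize a FAP value into a displacement on a target face."""
    return fap_value * (fapu_base / 1024.0)

human_ens, avatar_ens = 4.0, 5.0   # hypothetical ENS distances
fap = to_fap(0.5, human_ens)       # eyebrow raise tracked on a human face
print(from_fap(fap, avatar_ens))   # same expression scaled to the avatar
```

This round trip is what lets face-tracking output recorded from one signer drive a character whose facial proportions differ, which is the point of calibrating the Max character's morphs to the standard.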
Fig. 2: (a) Forehead with eyebrows raised before the addition of MPEG-4 controls, facial mesh with wrinkling, and lighting enhancements; (b) eyebrows raised in our current system.
3. Science
The primary goal of our implementation work has been to support our scientific agenda: to investigate models of the timing and intensity of syntactic facial expressions in ASL. As part of this work, it will be necessary for us to periodically conduct user-based studies with native ASL signers evaluating the quality of animations that we have synthesized.
As an initial test of our ability to synthesize animations of ASL with facial expressions using this new animation platform, we conducted a pilot test with 18 native ASL signers who viewed animations that were generated by our new system: full details appear in [6]. The animations displayed in the study consisted of short stories with Yes-No Question, WH-Question, and Negation facial expressions, based upon stimuli that we released to the research community in [4]. The participants answered scalar-response questions about the animation quality and comprehension questions about their information content.
In this pilot study, the participants saw animations that were driven by the recording of a human; we previously released this MPEG-4 data recording of a human ASL signer performing syntactic facial expressions in [4]. The hand movements were synthesized based on our project’s animation dictionary, which native signers in the lab have been constructing using the EMBR user-interface tool. Because the data-driven animations contained
facial expressions and head movement, they utilized the skin-wrinkling, lighting design, and MPEG-4 controls of our new animation system. As compared to animations without facial expression shown as a lower baseline, participants reported that they noticed the facial expressions in the data-driven animations, and their comprehension scores were higher [6]. While this pilot study was just an initial test of the system, these results suggested that our laboratory will be able to use this augmented animation system for evaluating our on-going research on designing new methods for automatically synthesizing syntactic facial expressions of ASL. In future work, we intend to produce models for synthesizing facial expressions, instead of simply replaying human recordings.
4. ACKNOWLEDGMENTS
This material is based upon work supported by the National Science Foundation under awards 1506786, 1462280, & 1065009. We thank Andy Cocksey and Alexis Heloir for their assistance.
5. REFERENCES
[0] Bando, Y., Kuratate, T., Nishita, T. 2002. A simple method for modeling wrinkles on human skin. In Proceedings of Pacific Graphics '02.
[1] Ebling, S., Way, A., Volk, M., Naskar, S.K. (2011). Combining Semantic and Syntactic Generalization in Example-Based Machine Translation. In: M.L. Forcada, H. Depraetere, V. Vandeghinste (eds.), Proc. 15th Conf of the European Association for Machine Translation, p. 209-216.
[2] Gibet, S., Courty, N., Duarte, K., Naour, T.L. 2011. The SignCom system for data-driven animation of interactive virtual signers: methodology and evaluation. ACM Transactions on Interactive Intelligent Sys (TiiS), 1(1), 6.
[3] Heloir. A, Nguyen, Q., and Kipp, M. 2011. Signing Avatars: a Feasibility Study. 2nd Int’l Workshop on Sign Language Translation and Avatar Technology (SLTAT).
[4] Huenerfauth, M., Kacorri, H. 2014. Release of experimental stimuli and questions for evaluating facial expressions in animations of American Sign Language. Workshop on the Representation & Processing of Signed Languages, LREC’14.
[5] ISO/IEC IS 14496-2 Visual, 1999.
[6] Kacorri, H., Huenerfauth, M. 2015. Comparison of Finite-Repertoire and Data-Driven Facial Expressions for Sign Language Avatars. Universal Access in Human-Computer Interaction. Lecture Notes in Computer Science. Switzerland: Springer International Publishing.
[7] Kacorri, H. 2015. TR-2015001: A Survey and Critique of Facial Expression Synthesis in Sign Language Animation. Computer Science Technical Reports. Paper 403.
[8] Neidle, C., D. Kegl, D. MacLaughlin, B. Bahan, and R.G. Lee. 2000. The syntax of ASL: functional categories and hierarchical structure. Cambridge: MIT Press.
[9] Pejsa, T., and Pandzic, I. S. 2009. Architecture of an animation system for human characters. In 10th Int'l Conf on Telecommunications (ConTEL) (pp. 171-176). IEEE.
[10] Schmidt, C., Koller, O., Ney, H., Hoyoux, T., and Piater, J. 2013. Enhancing Gloss-Based Corpora with Facial Features Using Active Appearance Models. 3rd Int’l Symposium on Sign Language Translation and Avatar Technology (SLTAT).
[11] Wolfe, R., Cook, P., McDonald, J. C., and Schnepp, J. 2011. Linguistics as structure in computer animation: Toward a more effective synthesis of brow motion in American Sign Language. Sign Language & Linguistics, 14(1), 179-199.
SignTime GmbH
Schottenring 33 • A-1010 Wien • office@signtime.tv
IBAN: AT063258500003425709 • BIC: RLNWATWW0BG
ATU 63863019 • Handelsgericht Wien • FN 304045a
Learning sign language in a playful way with SiGame – The App
SiGame is the world's first game app for sign language. Not a human being but an artificial character, the avatar SiMAX, signs and guides the user through the app. He acts as game partner and teacher at the same time and guarantees pleasurable language acquisition for deaf and hearing people. The fictional character was specially developed for this app; this, too, is a world first.
SiGame is based on an avatar developed for SiMAX, a semi-automatic translation machine for sign
language. The software is operated by a human translator who only needs to adjust translations
suggested by a “learning” database. The resulting translation is then signed by an avatar and delivered
as video. Thanks to powerful 3D software, the avatar is able to display facial expressions – especially
those which are connected with grammatical meanings in sign language – a feature that is unique. It
furthermore displays emotions and body language – all in a very smooth way, significantly advancing the
quality of existing avatar based systems. SiMAX is currently at a prototype stage and therefore not yet
open for testing. The first application of the avatar is SiGame.
1. Motivations
SiGame can be played in different sign languages such as American Sign Language, International Sign and German Sign Language. For the first time, deaf people are offered the opportunity to play a game in their mother tongue. Additionally, SiGame enables users to learn a different sign language in order to connect internationally or to prepare for holidays or a congress abroad.
The game addresses everyone: hearing and deaf, young and old, game lovers and beginners. People working in social services can easily acquire additional skills, while children benefit pedagogically from the simultaneous training of the left and right brain hemispheres. Young people can indulge in their passion for games, and older people can improve their mental fitness. In contrast to other serious games, SiGame stands out for its ease of use, so that children and older people will have no trouble finding their way around the app. Because SiGame addresses everyone, it also acts as a kind of "bridge" between the world of the deaf and the world of the hearing: it offers a very easy way not only to get in touch with sign language but also to learn it.
2. Features
With SiGame, users can train their knowledge of signs by playing memory, a quiz or "one against one". By connecting their Facebook account, users can compare their scores with their friends. SiGame also includes a training programme for learning signs, with a learning curve that enables users to check their progress. SiGame also has a kind of "dictionary" of signs where users can look up
specific signs. Besides the three sign languages, the user can choose among the written languages English, German, French and Spanish.
SiGame was launched at the end of 2014/beginning of 2015 in the Apple App Store and Google Play Store. SiGame will be open for testing at the workshop. We will provide a QR code so that every participant can download it, including the basic package of signs, on their own mobile device.
3. Research
The design of the avatar was developed in cooperation with deaf persons. The challenge was to find a design that avoids the "uncanny valley" effect: when features look and move almost, but not exactly, like those of natural beings, some observers respond with revulsion. The aim was to design an appealing character everybody can connect with.
Great emphasis was placed on the choice of signs for SiGame, e.g. which subject areas the signs should cover so that they are interesting and relevant to the daily life of as many users as possible. The process of choosing the signs and corresponding glosses could be further improved, e.g. in a possible follow-up project, as it was limited here by the available budget and time. Many signs are not gender sensitive: in Austrian Sign Language, for example, the sign for "boy" refers through its iconicity to a small horn, and the sign for "girl" to an earring. Every sign chosen for SiGame therefore had to undergo a "gender check". Research was also done on teaching methodology for an educational game in sign language, as existing solutions for educational games in spoken language cannot simply be transferred to sign language. The logical structure and the design of the game were tested by a representative group of deaf and hearing persons.
Concerning the technical features of SiGame, it was necessary to test how signs can best be displayed on smartphones so that users can understand them. Versions differing in sign size, colours and signing speed were therefore tested, as were different ways for users to influence these settings.
www.sigame-app.com
Sign Time GmbH
Dr. Georg Tschare, CEO
Schottenring 33
1010 Vienna
Austria
Phone: +43 (0)660 / 800 10 12
E-Mail: georg.tschare@signtime.tv
Automatic Translation System to JSL (Japanese Sign Language) about Weather Information
Shuichi UMEDA†, Makiko AZUMA, Tsubasa UCHIDA, Taro MIYAZAKI, Naoto KATO, Seiki INOUE, and Nobuyuki HIRUMA
NHK (Japan Broadcasting Corporation) Tokyo, Japan
Key words: JSL (Japanese Sign Language), Weather Information, Motion Capture, JSL Corpus
1. Motivation
NHK is Japan’s sole public broadcaster. In response to social demands for barrier-free delivery of information, NHK has been promoting the use of sign language on television. The percentage of signed broadcast programs, however, is still low (fig.1).
fig.1 Japanese daily sign language news program
Since it is difficult to keep sign-language interpreters on stand-by 24 hours a day, including for late-night and early-morning programs, NHK aims to use sign-language CG translation systems for broadcasting emergency weather information, as the first step in expanding our sign-language broadcasting services.
2. Features
NHK has its own laboratory for research on
broadcasting technologies, and has been involved
mainly in research on machine translation from
Japanese to Japanese Sign Language (JSL) (fig.2).
Aside from the difference in word order between Japanese and JSL, the grammatical structure of JSL has not been fully elucidated. Thus, rule-based translation methods, such as those used in the ATLAS Project, could not be used to translate JSL.
fig.2 Prototype of machine translation system
E-mail: † umeda.s-hg@nhk.or.jp
NHK has therefore been carrying out research
on machine translation through trials using methods
that combine example-based and statistical
translation, based on a sign language corpus.
Limiting the corpus to sentences related to weather,
we have so far collected around 90,000 sentences.
Since translation accuracy is still insufficient and
translation takes time, however, the system still
cannot be used in broadcasting. At this point, we are
collecting data from signed weather broadcasts to
enrich the corpus and increase translation accuracy.
“Motions,” which form the basic elements of
movements, are being recorded using Vicon’s
optical motion-capture system. Thus far, we have
captured motions for 7,000 words.
A dictionary of sign language words has been
made available to the public through the following
website to solicit opinions from a wide audience
regarding the movements and quality of the CG
models: (http://cgi2.nhk.or.jp/signlanguage/index.cgi)
CG character posture is computed based on a bone structure and the BVH format generally used in 3DCG. The BVH format specifies the angle values of each joint, and a smooth connection between two words is made by linear interpolation of the BVH angular values.
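The transition scheme described above, linear interpolation of per-joint BVH channel values between the last frame of one word and the first frame of the next, can be sketched as follows. This is a minimal illustration of the general technique, not NHK's actual implementation; the function name and array layout are our own assumptions.

```python
import numpy as np

def lerp_joint_angles(pose_a, pose_b, n_frames):
    """Linearly interpolate per-joint Euler angles (degrees), as stored
    in BVH rotation channels, to bridge the gap between two sign words.

    pose_a: final frame of the first word, shape (n_joints, 3)
    pose_b: first frame of the next word, shape (n_joints, 3)
    Returns the transition frames, shape (n_frames, n_joints, 3).
    """
    pose_a = np.asarray(pose_a, dtype=float)
    pose_b = np.asarray(pose_b, dtype=float)
    # t runs from 0 to 1 inclusive over the transition frames
    t = np.linspace(0.0, 1.0, n_frames)[:, None, None]
    return (1.0 - t) * pose_a + t * pose_b

# Example: one elbow joint moving from 10 to 50 degrees of flexion over 5 frames
transition = lerp_joint_angles([[10.0, 0.0, 0.0]], [[50.0, 0.0, 0.0]], 5)
print(transition[:, 0, 0])  # [10. 20. 30. 40. 50.]
```

Interpolating each channel independently like this is simple and fast, but, as noted below in the discussion of future work, it can produce unnatural trajectories for end effectors such as the hands.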
Designation of BVH files and character models is
carried out using TV Program Making Language
(TVML), a program developed at NHK, while
DirectX is used for 3D rendering.
Recently, based on our research results to date, we have been testing systems that have the potential for practical use (fig.3).
The machine translation system currently being researched still does not have sufficient translation accuracy and speed to be useful in broadcasting. We therefore created a prototype video system that generates sign-language CG for previously recorded fixed phrases taken from weather forecasts. Although sentences other than the fixed ones cannot be translated, the system has the potential to deliver accurate flash reports on weather information. In anticipation of its use in television broadcasts, we plan to carry out website-based test operations (a demonstration of the weather information system will be performed).
3. Science (Future Activity)
As has been pointed out for sign language CG systems in other countries as well, the unnaturalness of the expressions in NHK’s prototype system needs to be addressed. We are therefore carrying out research on portraying expressions naturally using CG. We also conducted image analysis to elucidate the causes of the unnaturalness of expressions arising from the linear interpolation of joint angle values. To resolve this problem, we are carrying out research on interpolation methods based on the concept of inverse kinematics.
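To see why inverse-kinematics-based interpolation can look more natural than joint-angle interpolation, note that interpolating joint angles moves the hand along an arc, whereas interpolating the hand's target position and solving IK per frame moves it along a straight line. The sketch below uses a textbook two-link planar arm; it is a generic illustration under our own simplifying assumptions, not NHK's interpolation method.

```python
import math

def two_link_ik(x, y, l1, l2):
    """Analytic inverse kinematics for a planar two-link arm.

    Given a wrist target (x, y) and segment lengths l1 (upper arm)
    and l2 (forearm), return (shoulder, elbow) angles in radians.
    """
    d2 = x * x + y * y
    # Law of cosines for the elbow angle
    cos_elbow = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    cos_elbow = max(-1.0, min(1.0, cos_elbow))  # clamp for reachability
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(l2 * math.sin(elbow),
                                             l1 + l2 * math.cos(elbow))
    return shoulder, elbow

def interpolate_wrist_path(p_start, p_end, n, l1=1.0, l2=1.0):
    """Interpolate the WRIST POSITION linearly and solve IK per frame,
    so the hand travels a straight line instead of the curved path
    produced by interpolating the joint angles directly."""
    frames = []
    for i in range(n):
        t = i / (n - 1)
        x = (1 - t) * p_start[0] + t * p_end[0]
        y = (1 - t) * p_start[1] + t * p_end[1]
        frames.append(two_link_ik(x, y, l1, l2))
    return frames

# Example: move the wrist from full right extension to full upward extension
frames = interpolate_wrist_path((2.0, 0.0), (0.0, 2.0), 5)
```

A production system would of course work in 3D with joint limits and multiple chains, but the core idea, interpolating in end-effector space and recovering joint angles per frame, is the same.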
Since our final goal is to use the system for
television, we are also doing research on methods
for controlling playback speed and displaying
sign-language CG, and on methods for
synchronizing the display of sign-language CG with
the main line image and voice.
fig.3 Practical fixed phrases generator about weather information
Inferring biomechanical kinematics from linguistic data: A case study for role shift
Rosalee Wolfe, John C. McDonald, Robyn Moncrief, Souad Baowidan, Marie Stumbo
DePaul University, Chicago IL, USA
{wolfe, jmcdonald}@cs.depaul.edu, {rkelley5, sbaowida, mstumbo}@mail.depaul.edu
Over the past two decades, researchers have made great strides in developing avatars for use in Deaf education (Efthimiou & Fotinea, 2007), automatic translation (Elliott, Glauert, Kennaway, & Marshall, 2000), (Filhol, 2012), interpreter training (Jamrozik, Davidson, McDonald, & Wolfe, 2010), validation of transcription (Hanke, 2010), and improving accessibility to transportation and government services (Segouat, 2010) (Ebling, 2013) (Cox, et al., 2002). Creating lifelike, convincing motion continues to be one of the key goals of signed language synthesis research. Avatars that sign with smooth, natural movements are easier to understand and more acceptable than those that move in an unnatural or robotic manner.
Motivation
Current research efforts in sign synthesis either use libraries of motion-captured signs (Awad, Courty, Duarte, Le Naour, & Gibet, 2010) or libraries of sparse key-frame animations transcribed by artists (Delorme, Filhol, & Braffort, 2009). Entries from libraries are then procedurally combined to produce longer signed utterances. An excellent review of the current literature on sign synthesis can be found in (Courty & Gibet, 2010).
Sign synthesis based on motion capture produces outstanding natural motion. The myriad tiny and subtle details in the data create smooth, naturally flowing movement in an avatar. However, it is difficult to maintain the same naturalness in the transitions when modifying the data to accommodate new utterances. The high temporal density of captured detail that creates the beautiful movement also requires substantial resources to modify.
Applying linguistic rules to modify animation is easier with sparse sets of keys that correlate well to the structure of linguistic models. Unfortunately, the ease of modification is offset by a lack of realism in the animation. The linguistic parameters contain no information about the subtle body movements which are not considered to be linguistically significant, but are nonetheless required for natural motion.
The ideal system would combine the best aspects of both approaches. It would support ease of key modification while still producing natural, lifelike motion. This presentation details a step towards a new method that automatically layers biomechanical, sublinguistic movement under the motion dictated by linguistic data. The approach is designed to improve the quality of avatar motion without requiring researchers to acquire more data.
Science
This presentation will discuss the theory of the new approach in the context of generating role shifts. In a role shift, a signer uses a body turn to assume the role of a protagonist in a constructed dialog (Lillo-Martin, 2012). From the linguistic information, an animation system can compute a global orientation that dictates the avatar’s pose when assuming a role. Previous work (McDonald, et al., 2013) used range-of-motion data to distribute the global orientation down the spinal column as local rotations, but the timing of the transitions proved problematic. In a turn, the transition begins with the eyes, followed by the neck, hips, spine, and shoulders. The eyes and head complete their rotation before the remaining linkage begins movement.
A nontrivial problem occurred with the previous work because the eyes and head were descendants of the hips in the transformational hierarchy. Whenever the hips rotated, the eyes and head rotated in concert. This induced an additional rotation on the head that was not the intent of the animator.
A traditional approach for holding objects in a given orientation is to add lookat constraints that apply a global rotation and ignore the transformation hierarchy. With lookat constraints, the timing and orientation of the eyes and head were preserved throughout the onset and duration of the role shift. Unfortunately, it proved difficult to blend between the global rotation in the lookat constraints and the local rotation used when the avatar is in the narrator role. The visual result was a visible head bobble at the end of the transition.
The transformational hierarchy also disrupted the staggered timing in the shoulders, which are supposed to remain stationary while the hips are starting their rotation. But because they are descendants of the hips, they began their rotation synchronously with the hips, defeating the attempt to stagger the timing.
Features
The new system uses no lookat constraints, and all joints remain in the transformation hierarchy. In a preliminary step, the system computes the transition of each joint as a global orientation. It then computes compensatory motion to implement timing as animation keys cast in local coordinates.
The present implementation is applied to a simple figure with controls to change the global orientation of the torso and the speed of the transition. We invite participants to a hands-on evaluation of the system at the conclusion of the presentation or at any time during the course of the workshop.
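The compensatory-motion idea can be made concrete with a small sketch: if a child joint must hold a desired global orientation while an ancestor rotates beneath it, the local key that achieves this is the ancestor's global rotation removed from the desired global orientation. The code below is our own illustrative sketch (rotations about the vertical axis only, represented as 3x3 matrices), not the authors' implementation.

```python
import numpy as np

def rot_z(deg):
    """Rotation about the vertical axis, as a 3x3 matrix."""
    r = np.radians(deg)
    c, s = np.cos(r), np.sin(r)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def compensatory_local(parent_global, desired_global):
    """Local rotation such that child_global == desired_global when
    composed under the parent: child_global = parent_global @ local."""
    return parent_global.T @ desired_global  # inverse of a rotation is its transpose

# During a role shift the hips may already have turned 30 degrees while the
# head should still face the viewer (0 degrees global). The head's local key
# must therefore counter-rotate by 30 degrees.
hips = rot_z(30.0)
head_target = rot_z(0.0)
local = compensatory_local(hips, head_target)
achieved = hips @ local  # identity: the head stays facing forward
```

Because the compensatory key lives in local coordinates, it stays inside the transformation hierarchy, which is the property that lets the new system avoid lookat constraints entirely.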
Bibliography
Awad, C., Courty, N., Duarte, K., Le Naour, T., & Gibet, S. (2010). A combined semantic and motion capture database for real-time sign language synthesis. Intelligent Virtual Agents, 432-438.
Courty, N., & Gibet, S. (2010). Why is the creation of a virtual signer challenging computer animation? Motion in Games, 290-300.
Cox, S., Lincoln, M., Tryggvason, J., Nakisa, M., Wells, M., Tutt, M., & Abbott, S. (2002). TESSA: a system to aid communication with deaf people. Proceedings of the fifth international ACM conference on assistive technologies (ASSETS 02) (pp. 205-212). Edinburgh, UK: ACM.
Delorme, M., Filhol, M., & Braffort, A. (2009). Animation generation process for Sign Language synthesis. International Conference on Advances in Computer-Human Interaction (ACHI '09) (pp. 386-390). Cancun, Mexico: IEEE.
Ebling, S. (2013). Evaluating a Swiss German Sign Language Avatar among the Deaf Community. Third International Symposium on Sign Language Translation and Avatar Technology (SLTAT). Chicago, IL.
Efthimiou, E., & Fotinea, S.-E. (2007). An environment for deaf accessibility to education content. International Conference on ICT & Accessibility, (pp. GSRT, M3. 3, id 35). Hammamet, Tunisia.
Elliott, R., Glauert, J. R., Kennaway, J. R., & Marshall, I. (2000). The development of language processing support for the ViSiCAST project. Proceedings of the fourth international ACM conference on Assistive technologies (ASSETS 2000) (pp. 101-108). Arlington, VA: ACM.
Filhol, M. (2012). Combining two synchronisation methods in a linguistic model to describe Sign Language. Gesture and Sign Language in Human-Computer Interaction and Embodied Communication, 192-203.
Hanke, T. (2010, June 14-16). An overview of the HamNoSys phonetic transcription system. Retrieved December 28, 2011, from Sign Linguistics Corpora Website: http://www.ru.nl/publish/pages/570576/slcn3_2010_hanke.pdf
Jamrozik, D. G., Davidson, M. J., McDonald, J., & Wolfe, R. (2010). Teaching Students to Decipher Fingerspelling through Context: A New Pedagogical Approach. Proceedings of the 17th National Convention Conference of Interpreter Trainers, (pp. 35-47). San Antonio, TX.
Liddell, S. (2003). Grammar, Gesture, and Meaning in American Sign Language. Cambridge, UK: Cambridge University Press.
Lillo-Martin, D. (2012). Utterance reports and constructed action. In R. Pfau, M. Steinbach, & B. Woll (Eds.), Sign Language: An International Handbook HSK 37 (pp. 365-387).
McDonald, J., Wolfe, R., Schnepp, J., Hochgesang, J., Jamrozik, D. G., Stumbo, M., & Berke, L. (2013). Toward Lifelike Animations of American Sign Language: Achieving Natural Motion from the Movement-Hold Model. Third International Symposium on Sign Language Translation and Avatar Technology (SLTAT 2013). Chicago, IL.
Segouat, J. (2010). Modélisation de la coarticulation en Langue des Signes Française pour la diffusion automatique d'informations en gare ferroviaire à l'aide d'un signeur virtuel. Doctoral Dissertation, Université Paris Sud, Orsay, France. Retrieved February 21, 2013, from http://hal.upmc.fr/docs/00/60/21/17/PDF/these-segouat2010.pdf