Smil for Mms


Page 1: Smil for Mms

Viewing Multimedia Messages with 3GPP SMIL in a Size Driven Mobile Terminal

ASMO SOINIO

Master of Science Thesis Supervisors: Professor Kaisa Sere (ÅA) and M.Sc. Reijo Siira (NMP)

Software Engineering Laboratory Faculty of Chemical Engineering

Åbo Akademi University August 2004


ABSTRACT

Mobile messaging has become a huge business all around the world. Currently, the richness of multimedia messages is limited by the language used for describing media timing and interactions, for which Synchronized Multimedia Integration Language (SMIL) is a de facto standard. A proposed, richer SMIL version for multimedia messages is the Third Generation Partnership Project (3GPP) SMIL.

This thesis studies the use of 3GPP SMIL for scene description in multimedia messages. The Nokia Series 40 mobile platform is used as the target.

The Multimedia Messaging Service (MMS), SMIL, and the Series 40 platform are presented in sufficient detail. Then the problems that may arise when implementing a 3GPP SMIL viewer are discussed and solutions presented.

The conclusion is that 3GPP SMIL is a feasible standard, and can be utilized on a Series 40 mobile phone. The 3GPP SMIL profile is much more complex than the SMIL profile currently used in multimedia messages, and provides a richer user experience.

Interoperability will be an issue, especially for messages sent from a PC. The small screen size of the terminal and its smaller processor and memory capacity will cause problems. User interface design for a 3GPP SMIL viewer is non-trivial, because there are so many actions the user should be able to take. A user interface that supports all of the features is presented.

Keywords: Synchronized Multimedia Integration Language (SMIL), Multimedia Messaging Service (MMS), mobile phones


REFERAT

Mobile messaging has become a big business all around the world. At present, the richness of multimedia messages is limited by the language used for describing the interaction between and synchronization of the media. Synchronized Multimedia Integration Language (SMIL) has become the industry standard for this purpose, and Third Generation Partnership Project (3GPP) SMIL is a proposal for the next version of SMIL for multimedia messages.

This thesis examines the use of 3GPP SMIL for describing messages. Nokia Series 40 is used as the target platform.

The Multimedia Messaging Service, SMIL, and the Series 40 platform are described in sufficient detail. Then the problems that may arise when implementing a 3GPP SMIL viewer are discussed, together with solutions to them.

The conclusion of the work is that 3GPP SMIL is a feasible standard and can be utilized on a Series 40 mobile phone. The 3GPP SMIL profile is considerably more extensive than the one currently used in multimedia messages, and brings an ever richer user experience.

Interoperability between different kinds of terminals will cause problems, especially with messages sent from a personal computer, because the receiving terminal then has a much smaller screen and less processor and memory capacity.

User interface design for a 3GPP SMIL viewer is challenging because there are so many operations the user must be able to perform. A user interface that supports all of the features is presented.

Keywords: Synchronized Multimedia Integration Language (SMIL), mobile phone, multimedia message


CONTENTS

ABSTRACT

REFERAT

CONTENTS

PREFACE

ABBREVIATIONS

1 INTRODUCTION

2 MULTIMEDIA MESSAGING SERVICE
    2.1 FROM SMS TO MMS – VIA EMS?
    2.2 THE FEATURES OF MMS
    2.3 MMS ARCHITECTURE AND MULTIMEDIA MESSAGE DELIVERY
    2.4 STRUCTURE OF A MULTIMEDIA MESSAGE
    2.5 USE CASES FOR MMS

3 SMIL
    3.1 HISTORY
    3.2 SYNTAX AND STRUCTURE
    3.3 FUNCTIONAL AREAS AND MODULES
    3.4 SPATIAL DESCRIPTION
    3.5 TEMPORAL DESCRIPTION
    3.6 OTHER SMIL FEATURES
    3.7 ACTUAL SMIL PROFILES

4 MMS VERSIONS
    4.1 MMS WITHOUT SCENE DESCRIPTION
    4.2 OMA MMS
    4.3 3GPP MMS

5 SERIES 40 PLATFORM
    5.1 USER INTERFACE
    5.2 MEMORY AND PROCESSOR
    5.3 SOFTWARE

6 DEALING WITH TERMINAL CONSTRAINTS
    6.1 DISPLAY SIZE
    6.2 PROCESSING POWER
    6.3 MEMORY

7 USER INTERFACE DESIGN
    7.1 AN ADVANCED USER INTERFACE
    7.2 SIMPLIFYING THE USER INTERFACE
    7.3 AN ALTERNATIVE TO MEDIA SCROLLING

8 DISCUSSION

9 SUMMARY

10 REFERENCES

A SWEDISH SUMMARY (SVENSK SAMMANFATTNING)
    A.1 INTRODUCTION
    A.2 MULTIMEDIA MESSAGING SERVICE
    A.3 SMIL
    A.4 3GPP MULTIMEDIA MESSAGES
    A.5 SERIES 40 PLATFORM
    A.6 3GPP SMIL ON A SERIES 40 TERMINAL
    A.7 USER INTERFACE DESIGN
    A.8 DISCUSSION


PREFACE

This work is dedicated to my Päivi; for making me whole, for always being by my side, and for bringing light to the occasional hours of darkness.

This thesis was written during 2003-2004 for Nokia Mobile Phones. It has been a preliminary study for a software project, and thus I have been able to work full-time on the thesis. I would like to give my warmest thanks for this opportunity, especially to my bosses Teija Pääkkönen and Reijo Siira. The latter acted as the supervisor of the thesis and helped a lot with finding a proper subject, forming the structure of this work, and proofreading. Thank you!

My supervisor from the Åbo Akademi University’s side has been Kaisa Sere. I thank her for advice and comments. Jaakko Arvilommi was of great help with proofreading, and Thomas “Toma” Andersson helped with guiding me through the mysteries of the Swedish language.

I would still like to send my general greetings to (in alphabetical order): Anna, Hese, Honkkari, Härtsi, isä, Jenki, Jesse, Keso, Leski, Marcus, Olavi, Osku, Paksa, Pauski, Pete, Tanja, Turku Terror, the good people of DaTe and äiti.

Salo, August 13th (a Friday), 2004

Asmo Soinio


ABBREVIATIONS

3GPP Third Generation Partnership Project

CDMA Code Division Multiple Access

CSD Circuit Switched Data

DSP Digital Signal Processor

EMS Enhanced Messaging Service

GPRS General Packet Radio Service

GSM Global System for Mobile Communications

HTML Hypertext Markup Language

MCU Microprocessor Control Unit

MIME Multipurpose Internet Mail Extension

MM Multimedia Message

MMS Multimedia Messaging Service

MMSC MMS Center

MMSE MMS Environment

MSISDN Mobile Station Integrated Service Digital Network Number

OMA Open Mobile Alliance

RAM Random Access Memory

SMIL Synchronized Multimedia Integration Language

SMS Short Message Service

SVG Scalable Vector Graphics

UCS Universal Character Set

URI Uniform Resource Identifier

URL Uniform Resource Locator

VAS Value Added Service

W3C World Wide Web Consortium

WAP Wireless Application Protocol

XML Extensible Markup Language


1 INTRODUCTION

Mobile messaging has become a part of the daily lives of many people, and a huge business all around the world. Short Message Service (SMS) has been an unforeseen success, and the more advanced Multimedia Messaging Service (MMS) is catching on as well. To make multimedia messaging even richer, Synchronized Multimedia Integration Language (SMIL) was introduced to describe relationships between the media in a message. The first version of SMIL for MMS was greatly simplified so that it would be easy to implement on different platforms, ensuring conformance. A proposed next SMIL version for MMS is 3GPP SMIL, defined by the Third Generation Partnership Project (3GPP).

The purpose of this thesis is to study whether 3GPP SMIL is a valid standard for MMS scene description. This is done by studying the problems that arise when it is implemented on a size driven mobile terminal, and by presenting solutions to those problems.

The Nokia Series 40 mobile platform is the target for this study. It was a natural choice due to the needs of the commissioner of this thesis, but also because it is a widespread platform and known for its good user interface. It is a clearly size driven platform, whereas so-called smart phones, mostly built on Symbian OS, are more feature driven, and have better capabilities. It should be noted that every mass product also fundamentally has to be price driven, which makes it infeasible to create a product that would be both small and among the most powerful.

This thesis starts by presenting MMS, its architecture, its relation to SMS, and some use cases. Chapter 3 presents the SMIL language. It introduces practically all features of SMIL, focusing on those included in the 3GPP SMIL profile. Chapter 3 can also be used, and has actually been used in our department, independently as an introduction to the SMIL language. Chapter 4 lists the current MMS versions, their media types and formats, and their relation to SMIL. Chapter 5 introduces the Series 40 platform, with focus on the features related to viewing SMIL presentations.

Chapter 6 lists terminal constraints that may cause problems when implementing a 3GPP SMIL viewer, and proposes solutions to these problems. User interface design is among these problems, and Chapter 7 is devoted to discussing a user interface that enables all of the 3GPP SMIL features on the Series 40 platform.

Chapter 8 discusses the problems and solutions presented, and Chapter 9 briefly summarizes the whole thesis.

The ideas in Chapters 6 and 7 are my own, except for the alternative scrolling mechanism presented in sub-section 7.3, which was a result of the discussions during the design process of the actual SMIL viewer software specification at Nokia.


2 MULTIMEDIA MESSAGING SERVICE

This chapter will give an overview of the Multimedia Messaging Service (MMS), focusing on the overall architecture and the end-user experience. Differences between the versions of MMS conformance documents and specifications are discussed in the later chapters. The Short Message Service (SMS) and Enhanced Messaging Service (EMS) are also briefly introduced, as they are the predecessors of MMS and their features were a base for defining it.

Technical details about transmission of the multimedia message (MM) data between two mobile devices or a server and a mobile device are mostly outside the scope of this thesis. For such information, please refer to the 3GPP and OMA documents listed in the references.

2.1 From SMS to MMS – via EMS?

2.1.1 Short Message Service

SMS was introduced to Global System for Mobile Communications (GSM) networks and commercially launched in 1992. The service allows the transfer of textual messages with at most 140 octets of data. The number of characters available with different encodings is shown in Table 2.1. Transfer of the messages is based on the store-and-forward principle, i.e. if the receiving device is not available, the message is stored in the service center until it can be delivered or the validity period of the message expires. Most new phones also support concatenation of several messages (with a few octets less payload in each), but with most operators, the user has to pay for each of them separately.

SMS has been an enormous success and nowadays hundreds of billions of short messages are sent yearly. SMS has also been implemented on other network technologies like GPRS and CDMA, and messages can be sent across national borders and to networks using different network technologies than the sender’s network.

Despite the limited amount of data, there is a wide range of additional services that use SMS, for example news, email and weather services. Short messages are also widely used for public voting and feedback in TV and radio shows.

Many manufacturers have expanded the SMS standard on application-level with possibilities for richer media. Nokia’s Smart Messaging is one such expansion and it enables sending of black and white images and monophonic ringing tones. It is an open specification, but has not been adopted by other manufacturers.

Table 2.1 The amount of data in a short message

Encoding                                     Amount of data in one short message
GSM alphabet, 7 bits                         160 characters
8-bit data                                   140 octets
UCS2, 16 bits (supports Asian characters)    70 characters
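The figures in Table 2.1 follow directly from the 140-octet payload. A minimal sketch of the arithmetic in Python is given below; the function and variable names are illustrative only and are not taken from any specification.

```python
# Capacity of a single short message for the encodings in Table 2.1.
PAYLOAD_OCTETS = 140  # maximum user data in one short message


def characters_per_message(bits_per_character: int,
                           payload_octets: int = PAYLOAD_OCTETS) -> int:
    """Return how many whole characters fit into the payload."""
    return (payload_octets * 8) // bits_per_character


ENCODINGS = {"GSM alphabet": 7, "8-bit data": 8, "UCS2": 16}

if __name__ == "__main__":
    for name, bits in ENCODINGS.items():
        print(f"{name}: {characters_per_message(bits)} characters")
```

The 7-bit GSM alphabet packs 1120 bits into 160 characters, which is why a short message can hold more characters than it has octets.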


2.1.2 Enhanced Messaging Service

EMS is an application-level extension to SMS messaging introduced by Ericsson and standardized by 3GPP [3GPP23.040] in 1999. It adds richer media to SMS, but requires no infrastructure update from the network operators. The standardization of EMS has been evolutionary, but [Le Bodic] divides it into two distinguishable steps: Basic EMS, introduced in [3GPP23.040] release 99, and extended EMS, introduced in [3GPP23.040] release 5 (June 2002).

Basic EMS messages can contain simple text formatting, monophonic melodies, black and white pictures, and black and white animations (up to four pictures, 16 x 16 pixels each). Such messages can be concatenated, but an enhanced element (melody, image or animation) cannot be spread over several segments. Therefore, the maximum size of an element in Basic EMS is about 140 bytes. This limits for example the melodies to a few seconds. The standard also specifies some predefined animations and melodies that can be used without sending the full representation of them.

Extended EMS breaks many limitations of the Basic EMS and has the following additional features:

• The elements can be distributed over many message segments, and the maximum size for an element is limited to 255 messages (about 34 kilobytes). In practice, elements bigger than 8 messages (about one kilobyte) should be avoided

• There is support for compression of objects, to cope with these bigger elements

• New media types:
    o Up to 64-colour bitmap images
    o Up to 64-colour bitmap animations
    o vCards (phonebook entries, business cards)
    o vCalendar data (calendar entries)
    o Polyphonic MIDI melodies
    o Vector graphics

• Text background and foreground color formatting

• Hyperlinks
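As a rough check of the size limits quoted in the first item of the list, the sketch below assumes that every segment of a concatenated message gives up 6 octets to a concatenation header (the variant with an 8-bit reference number). The text only says that concatenation costs "a few octets", so this value is an assumption, and any additional EMS element headers are ignored.

```python
SEGMENT_OCTETS = 140       # user data of one short message (section 2.1.1)
CONCAT_HEADER_OCTETS = 6   # assumption: concatenation header, 8-bit reference


def max_element_octets(segments: int) -> int:
    """Approximate payload available in a message of `segments` parts."""
    return segments * (SEGMENT_OCTETS - CONCAT_HEADER_OCTETS)


if __name__ == "__main__":
    for segments in (8, 255):
        print(segments, "segments:", max_element_octets(segments), "octets")
```

Under this assumption, 255 segments give 34170 octets and 8 segments give 1072 octets, which agrees with the "about 34 kilobytes" and "about one kilobyte" figures above.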

The first EMS-capable devices were introduced in 2001, and there are many manufacturers supporting the basic version of EMS. However, according to [Le Bodic] in 2002 there were no commercial products supporting extended EMS. In addition, the original author of EMS, Sony Ericsson, lists in their EMS guide [SNE-EMS-Dg] (published in August 2003) only the features of Basic EMS. Therefore, it can be assumed that there is no wide support, if any, for extended EMS at the time of writing this thesis.

The fact that EMS uses standard SMS as the transfer method makes the service available in almost all of today’s mobile networks, but SMS was never designed for multimedia content transfer and its bandwidth is very limited. In addition, operators do not currently have different billing schemes for EMS and SMS, which can make the costs of sending an EMS message quite high. This limits, for example, the usage of big color images that would take up tens of message segments.

Multimedia Messaging Service (MMS), on the other hand, can take advantage of the high bandwidth of GPRS and UMTS network technologies, and there have been commercial products supporting MMS since the second quarter of 2002. Therefore, it is likely that commercial use of EMS will be limited to Basic EMS, at least if the introduction of MMS continues rapidly and successfully.

The rest of this chapter will focus on MMS.

2.2 The features of MMS

As the name implies, Multimedia Messaging Service (MMS) allows the transfer of truly multimedia content. To the end-user, it might not seem that different from SMS or especially EMS, but the technology behind these services is very different. MMS was designed to enable transfer of any kind of content and is not tied to a single transport technology: It can make use of the advantages of third generation mobile networks (3G), but can also be used in standard GSM networks (2G) with Circuit Switched Data (CSD). It provides interoperability with Internet electronic mail (email) and adopts many of the transport protocols and message formats that are already in use on the Internet. Such features as group sending, delivery and read-reply reports, message priorities, and message classes have been adopted from Internet messaging systems.

The following sub-section will give an overview of the quite complicated standardization work done to realize MMS and the rest of this chapter will focus on the details of MMS.

2.2.1 MMS specifications

The definition of MMS has required a great amount of work from different standardization organizations. The main responsible organizations have been the Third Generation Partnership Project (3GPP) and the Open Mobile Alliance (OMA). Earlier versions of the OMA documents were produced by the WAP Forum, which was merged into OMA in June 2002. In addition, standards by many other organizations play a role in the realization of the service, especially definitions of media formats and Internet messaging standards.

[Le Bodic] clarifies the division of work: 3GPP focuses on the high-level service requirements, architectural aspects of MMS and content formats, and OMA focuses on technical realization of MMS on the basis of WAP and Internet transport protocols. The documents defining the fundamental service are listed in Table 2.2.

Table 2.2 Documents that define MMS (not an exhaustive list)

Author    Documents
3GPP      • Multimedia Messaging Service (MMS); Stage 1 (TS 22.140) [3GPP22.140]
          • Multimedia Messaging Service (MMS); Functional description; Stage 2 (TS 23.140) [3GPP23.140]
OMA       • Multimedia Messaging Service: Architecture Overview [OMA-MMSArc]
          • Multimedia Messaging Service: Client Transactions [OMA-MMSCTr]
          • Multimedia Messaging Service: Encapsulation Protocol [OMA-MMSEnc]


2.3 MMS architecture and Multimedia Message delivery

This section introduces the high-level architectural elements of MMS and the basic delivery technique. These are mostly specified in [3GPP22.140], [3GPP23.140] and [OMA-MMSArc].

The Multimedia Messaging Service Environment (MMSE), illustrated by an ellipse in Figure 2.1, includes all the MMS specific network elements under the control of a single MMS provider (often a mobile network operator). Outside it are the MMS User Agents – the mobile devices capable of viewing, composing and handling multimedia messages (MM) – and a Wired Email Client representing the interoperability with Internet Email.

[Figure: diagram of the MMSE and its surroundings. Labeled elements: MMS Relay, MMS Server, message store, user databases (e.g. profiles, subscription, HLR), MMS VAS Applications, External Server, Wired Email Client, MMS User Agents, Roaming MMS User Agent, 2G Mobile Network A, 3G Mobile Network A, Mobile Network B, Internet/IP network.]

Figure 2.1 MMS Architectural Elements [3GPP23.140]

The heart of the MMSE is the MMS Relay/Server, often referred to as the MMS Center (MMSC). It is in charge of storing and managing messages, reports, and notifications. The MMS Server provides storage services and operational support for the system, and the MMS Relay transfers messages to and from other MMSCs and other messaging systems (SMS centers and email servers). The MMSC might also do content adaptation according to an MMS user agent’s capabilities (for example support for colors or screen size) or when sending a message to a legacy messaging system (i.e. SMS).

The MMS VAS Applications element is a server that acts much like a user agent, i.e. it can receive and send messages. It provides machine-to-person services to the end-users and may additionally be able to create Charging Data Records (CDR) for service specific charging. See section 2.5.2 for examples.

The delivery of a multimedia message through these elements is clarified by the following use case from [OMA-MMSArc], adapted to suit the elements in Figure 2.1. This use case example concerns person-to-person messaging between two mobile terminals.

1. User activates MMS User Agent.
2. User selects or enters MM target address.
3. User composes/edits MM to be sent.
4. User requests that MM is sent.
5. MMS Client submits the message to its associated MMS Relay.
6. MMS Relay resolves the MM target address.
7. MMS Relay routes forward the MM to the target MMS Relay (included in External Servers).
8. The MM is stored by the MMS Server associated with the target MMS Relay.
9. Target MMS Relay sends a notification to target MMS User Agent.
10. Target MMS User Agent retrieves the MM from the MMS Server.
11. Target MMS User Agent notifies target user of new MM available.
12. Target user requests rendering of received MM.
13. Target MMS User Agent renders MM on target user’s terminal.

It might be that the user agent is configured so that steps 10 and 11 would be in reverse order, meaning that the MM is retrieved only after the user has approved the retrieval. This might happen especially when the user is roaming (using another network than the home network) because of billing reasons.
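The two possible orderings of steps 10 and 11 can be illustrated with a small Python sketch. The class, its attribute names, and the dictionary standing in for the MMS Server are hypothetical and serve only to show the control flow; they do not model the actual protocol transactions.

```python
from dataclasses import dataclass, field


@dataclass
class UserAgent:
    """Hypothetical stand-in for the receiving MMS User Agent."""
    auto_retrieve: bool = True
    log: list = field(default_factory=list)

    def on_notification(self, server: dict, message_id: str,
                        user_accepts: bool = True):
        """Handle the notification of step 9 and return the MM, if fetched."""
        if self.auto_retrieve:
            message = server.pop(message_id)   # step 10: retrieve first
            self.log.append("retrieved")
            self.log.append("user notified")   # step 11
            return message
        self.log.append("user notified")       # step 11 comes first
        if not user_accepts:
            return None                        # MM stays on the MMS Server
        message = server.pop(message_id)       # step 10 only after approval
        self.log.append("retrieved")
        return message


if __name__ == "__main__":
    server = {"mm-1": "holiday photo"}
    roaming_agent = UserAgent(auto_retrieve=False)
    print(roaming_agent.on_notification(server, "mm-1"), roaming_agent.log)
```

With deferred retrieval, a declined message is never transferred over the air, which is the billing advantage mentioned above for roaming users.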

Some or even all of the actual contents might also be transferred using streaming protocols. In streaming, data chunks are directly rendered on the recipient’s device without waiting for the whole message to be retrieved. These data chunks can then be discarded. This enables the user to start viewing a message before it has been fully transferred, and saves memory of the receiving device.
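The memory benefit of streaming can be sketched as follows. This is only an illustration of the principle under simplifying assumptions: the chunk source is a local byte string rather than a network stream, and "rendering" is reduced to counting octets. All names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def chunks(data: bytes, size: int) -> Iterator[bytes]:
    """Yield successive chunks of at most `size` octets."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


def render_streamed(stream: Iterable[bytes]) -> Tuple[int, int]:
    """Consume a stream chunk by chunk.

    Returns (total octets rendered, largest number of octets held at once).
    """
    total = 0
    peak = 0
    for chunk in stream:
        # A real player would decode and play the chunk here.
        total += len(chunk)
        peak = max(peak, len(chunk))
        # The chunk is not stored; it is discarded on the next iteration.
    return total, peak


if __name__ == "__main__":
    clip = bytes(1000)  # placeholder for 1000 octets of media data
    print(render_streamed(chunks(clip, 64)))
```

The receiver never holds more than one chunk at a time, regardless of the total size of the clip.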

The data transfer between the user agent and the MMSC can be implemented on top of the Wireless Application Protocol (WAP). This configuration can use the whole range of wireless networks from 2G to 3G. In this architectural configuration, an additional network element – a WAP Gateway – is introduced as illustrated in Figure 2.2. The communication between the gateway and the MMSC is done using Hypertext Transfer Protocol (HTTP) over an Internet Protocol (IP) network, whereas the user agent and the gateway communicate using the WAP protocol stack and the Wireless Session Protocol (WSP).

[Figure: the MMS User Agent exchanges the WSP payload with the WAP Gateway over the wireless network; the WAP Gateway exchanges the HTTP payload with the MMS Relay/Server over the Internet/IP network.]

Figure 2.2 MMS data transfer over WAP [3GPP23.140]


2.4 Structure of a Multimedia Message

A multimedia message is divided into headers and a message body. This is done according to [RFC-2822], which specifies a message as an envelope and contents. The envelope contains so-called header fields with information like recipient address, address of the user who sent the message, date and time when the message was sent and the message’s subject. The addressing scheme of MMS combines email [RFC-2822] and MSISDN addresses [ITU-E.164]. Examples of possible address fields in multimedia messages are shown in Figure 2.3.

To: 0401234567/TYPE=PLMN
To: +358501234567/TYPE=PLMN
To: Joe User <[email protected]>

Figure 2.3 Examples of MMS addresses [OMA-MMSEnc]
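A receiving application could tell the two address forms of Figure 2.3 apart roughly as sketched below. The function name and the returned dictionary layout are my own, and the email address in the usage example is a made-up placeholder; only `email.utils.parseaddr` is an existing Python standard library function.

```python
from email.utils import parseaddr


def parse_mms_address(value: str) -> dict:
    """Split an MMS address field value into its kind and address."""
    if "/TYPE=" in value:
        # Device address, e.g. a phone number followed by /TYPE=PLMN
        number, _, address_type = value.partition("/TYPE=")
        return {"kind": address_type, "address": number.strip()}
    # Otherwise treat the value as an Internet email address
    display_name, mailbox = parseaddr(value)
    return {"kind": "email", "address": mailbox, "name": display_name}


if __name__ == "__main__":
    for field_value in ("0401234567/TYPE=PLMN",
                        "+358501234567/TYPE=PLMN",
                        "Joe User <joe@example.com>"):  # placeholder address
        print(parse_mms_address(field_value))
```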

The RFC 2822 representation restricts the contents of a message to a single part of US-ASCII text. This format is therefore extended with the Multipurpose Internet Mail Extension (MIME) [RFC-2045, RFC-2046, RFC-2047, RFC-2387]. It allows the representation of multiple non-textual parts in one message. The types of these parts are described with a content type, also known as MIME-type, which is composed of a media type, a media subtype and optional parameters. Examples of content types are listed in Table 2.3.

Table 2.3 Examples of MIME content types

MIME content type                 Description
image/jpeg                        An image with the format JPEG
text/plain; charset="us-ascii"    Text with the character set US-ASCII
application/octet-stream          A sequence of octets with an unknown structure
multipart/mixed                   The basic multipart subtype, a message composed of one or more parts
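The decomposition of a content type into media type, subtype, and parameters can be tried out with Python's standard `email` package, as in the following sketch. The helper function and its return format are illustrative; the `Message` methods used are part of the standard library.

```python
from email.message import Message


def describe_content_type(header_value: str) -> dict:
    """Split a Content-Type value into media type, subtype and parameters."""
    part = Message()
    part["Content-Type"] = header_value
    return {
        "media_type": part.get_content_maintype(),
        "subtype": part.get_content_subtype(),
        # get_params() lists the type itself first, then the parameters
        "parameters": dict(part.get_params()[1:]),
    }


if __name__ == "__main__":
    for value in ("image/jpeg",
                  'text/plain; charset="us-ascii"',
                  "application/octet-stream",
                  "multipart/mixed"):
        print(describe_content_type(value))
```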

Even though MIME can represent binary data, it is a textual format with human-readable field names and values. In order to decrease the size of the data sent over the mobile network, the WAP Forum has defined a straightforward translation from a MIME message to a binary format [WAP-230WSP: Section 8.5]. In this format the most often occurring character strings are replaced with a short binary value.
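The idea of replacing frequent strings with short binary values can be sketched as follows. The token table below is hypothetical: its byte values are placeholders chosen for this illustration and are not the numbers assigned in the WSP specification, and the fallback encoding for unknown strings is likewise a simplification.

```python
# Hypothetical token table; the byte values are placeholders, NOT the
# assigned numbers from [WAP-230WSP].
TOKENS = {
    "Content-Type": 0x81,
    "image/jpeg": 0x82,
    "text/plain": 0x83,
    "multipart/mixed": 0x84,
}


def encode_field(value: str) -> bytes:
    """Encode a well-known string as one octet, others as terminated text."""
    token = TOKENS.get(value)
    if token is not None:
        return bytes([token])
    return value.encode("ascii") + b"\x00"


def encode_header(name: str, value: str) -> bytes:
    """Encode one header field as name token followed by value token."""
    return encode_field(name) + encode_field(value)


if __name__ == "__main__":
    textual = "Content-Type: image/jpeg\r\n".encode("ascii")
    binary = encode_header("Content-Type", "image/jpeg")
    print(len(textual), "octets as text,", len(binary), "octets tokenized")
```

Strings that are not in the table still have to be sent in full, so the saving depends on how well the table matches the headers actually used.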

A multimedia message is meant to be a presentation, not just a pile of unrelated files. To achieve this, a multimedia message may contain a file, called the scene description, that describes the graphical layout and temporal synchronization of the message’s elements. The binary MIME-transformation includes a Start header field that defines which of the parts is that description. The model of a multimedia message is shown in Figure 2.4. The de facto standard for the format and language of a scene description in MMS is Synchronized Multimedia Integration Language (SMIL), but XHTML and WML are also mentioned in the specifications. SMIL can describe slideshows with timed and interactive events, and can include links to parts of the slideshow and to external resources. Chapter 3 will focus on the features and different versions of SMIL.

Figure 2.4 Model of a multimedia message [OMA-MMSEnc]
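The role of the start information can be illustrated with the textual MIME counterpart, a multipart/related message whose `start` parameter names the scene description part [RFC-2387]. The sketch below uses Python's standard `email.mime` classes; the content identifiers, the helper function names, and the sample contents are assumptions made for this example, and the binary encoding used over WAP is not modelled.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(smil_source: str, caption: str) -> MIMEMultipart:
    """Build a multipart/related message whose start parameter names the scene."""
    message = MIMEMultipart("related", type="application/smil",
                            start="<scene>")
    text = MIMEText(caption, "plain", "us-ascii")
    text["Content-ID"] = "<caption>"
    scene = MIMEApplication(smil_source.encode("utf-8"), _subtype="smil")
    scene["Content-ID"] = "<scene>"
    # The scene description is deliberately not the first part: it is found
    # through the start parameter, not through its position.
    message.attach(text)
    message.attach(scene)
    return message


def find_scene_description(message):
    """Return the part referenced by the start parameter, or None."""
    start = message.get_param("start")
    if start is None:
        return None
    for part in message.get_payload():
        content_id = part["Content-ID"]
        if content_id and content_id.strip("<>") == start.strip("<>"):
            return part
    return None


if __name__ == "__main__":
    smil = "<smil><body><par><text src='caption.txt'/></par></body></smil>"
    scene_part = find_scene_description(build_message(smil, "Greetings!"))
    print(scene_part.get_content_type())
```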

2.5 Use cases for MMS

Even though advertising, at least nowadays in Europe, presents MMS as a picture messaging service only, there is much more to it. The messages can contain many kinds of media files and can even be interactive in the sense that the user can browse through them, select links and so on. As the type of data sent via MMS is not limited, it would be possible to send arbitrary data and install a third-party plug-in that handles it.

Because MMS is such a wide and, in many senses, unlimited standard, it is extremely hard to predict the killer application, or to give even a nearly exhaustive list of the actual use cases. According to [Le Bodic], the ones listed here represent the basis that standardization organizations used when determining the high-level requirements for MMS.

2.5.1 Person-to-person

The main use case for SMS, covering about 80% of the revenues by operators, is person-to-person messages. Therefore it is likely that it will also be the main application for MMS, even though creating the media used in MMS is not that trivial for an end-user.

Picture or video messaging is the use case that has been mostly advertised so far. Many modern mobile phones have a camera either built-in or available as an accessory. The first phones with cameras were only able to capture still photos, but video recording is possible in most of the newer models. These images and video clips can be instantly sent in multimedia messages to a phone supporting MMS or to an email address. A typical scenario is a subscriber on a trip taking a photo with a mobile phone, adding some text and sending it to friends back home like an instant postcard. Figure 2.5 shows an example of this case.

Figure 2.5 An example multimedia message, viewed with RealPlayer 8.

Voicemail can also be sent via MMS, as the standard supports AMR voice clips. In GSM systems, voicemail can usually be stored on the operator’s servers if the device being called cannot be reached, and the receiver is then notified by an SMS. With MMS there are at least two ways to implement voicemail. The network operator could upgrade the usual voicemail system by sending the voice mails directly to the receiver’s phone as MMS, possibly with some extra data about when the message was sent, who sent it and so on. Alternatively, the user could record a voice memo, attach it to an MMS and send it to the receiver directly, without disturbing the receiver by making a call.

An operator could provide storage services for a user’s own content. When the user has created some content with his phone that he would like to store or share, he sends it to the operator’s server where it is stored. A web interface could be provided to the contents so that the user can easily share these with his friends by sending the Uniform Resource Locator (URL) of his contents.

2.5.2 Machine-to-person

Another category of MMS usage is the messages sent by servers to a user or a list of users. These are also referred to as value added services (VAS). In this category, a service provider provides the contents. The user often needs to subscribe to a service in order to receive the content.

An example service could be a weather service: the user subscribes to a weather report for a given region by sending a multimedia message (or short message) with the region’s name to the server, and the server responds with a graphical weather report for the region. The server could also send these reports daily until the user cancels the service. These kinds of services will most likely be the ones that benefit most from the advanced features of 3GPP SMIL, as professional designers create the content on a workstation.

Another already implemented and moderately used service that uses MMS is the purchasing of additional applications for mobile phones. Application archives containing Java Midlets or Symbian software can be ordered with short messages and are then delivered to the user in multimedia messages. Most phones that support such additional software can also handle the installation from a multimedia message.

Figure 2.6 shows an example of an advanced advertisement presentation. When the user focuses on the buttons, the picture on the left changes and shows the phone from different viewpoints. There is also a sub-menu, which is shown when the fourth item is focused. (Used by courtesy of Grassel Guido / Nokia)

Figure 2.6 An advertisement multimedia message, viewed with InterObject SMIL Viewer

3 SMIL

This chapter will introduce Synchronized Multimedia Integration Language (SMIL) [W3C-SMIL2], how and where it is used, its profiling mechanism, and differences between some of the current profiles. Every major feature of the language will be presented but the parts that are relevant for MMS will be emphasized.

SMIL, pronounced ‘smile’, is an XML [W3C-XML] based language for describing multimedia presentations. It was created for describing multimedia contents on a PC, but was also adopted into MMS for scene description.

3.1 History

Multimedia presentations have been around for a long time, but a common, open format for describing such presentations has not existed. Temporal elements such as audio and video have also increasingly been taken into use on the Web. The World Wide Web Consortium (W3C), an organization that develops common Web protocols, recognized the need for a declarative format for expressing media synchronization. A working group focusing on the design of such a language was established in 1997, and it gave rise to SMIL 1.0 [W3C-SMIL1] in 1998. Another working group was founded to continue on the subject, and SMIL 2.0 [W3C-SMIL2] became a W3C recommendation in September 2001.

Usage of SMIL on the Internet is still quite limited, even though major media players like RealPlayer and QuickTime have been supporting it almost since its advent. Multimedia components on the Web are still mostly done using proprietary players like Flash by Macromedia. SMIL has been selected as one of the scene description languages for MMS, and it seems to have gained de facto status in that application. Therefore it is likely that if phone to email messaging with MMS becomes successful then SMIL players will spread to the PCs of average users. This might also push forward the usage of SMIL on the Internet.

3.2 Syntax and structure

The idea of SMIL is to enable the description of multimedia presentations where audio, video, text, and graphics are combined in a timed fashion. It is a language for describing how and when the contents are shown, with the actual contents kept in separate files; in effect, it is the glue that holds it all together. One of its merits is that the language can be easily authored with a simple text editor.

The two major versions of SMIL, 1.0 and 2.0, are syntactically mostly similar but the newer version adds a lot of functionality. Particularly the addition of profiling, the ability to easily define sub-sets of the full SMIL 2.0 language, is valuable in the context of mobile devices. Therefore, this work will focus solely on SMIL 2.0, and the term SMIL is used to refer to SMIL 2.0. In the few references to SMIL 1.0 the full version is stated.

Anyone familiar with Hypertext Markup Language (HTML) will find the syntax of SMIL immediately approachable. Like any XML based language, SMIL consists of elements that may contain attributes and other elements, forming a tree structure. The root element in SMIL is, unsurprisingly, smil and it is always present in a well-formed SMIL document. The root element may contain two sub elements, head and body, like in HTML. The head element contains header information concerning the whole document, and the body describes the actual contents. A simple SMIL document is shown in Figure 3.1. It describes the example presentation in Figure 2.5. Notice that the two lines containing the region elements are divided into two rows for layout reasons, and there is no new line character in those places in the actual SMIL document.
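As a minimal sketch of this tree structure, the document of Figure 3.1 can be read with a generic XML parser. The example uses Python's standard xml.etree.ElementTree module; the helper name summarize is an assumption made for illustration and is not part of any SMIL tool.

```python
import xml.etree.ElementTree as ET

# Namespace of the SMIL 2.0 language, as declared on the smil root element.
NS = "{http://www.w3.org/2001/SMIL20/Language}"

SMIL_DOC = """
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <layout>
      <root-layout width="170" height="208" />
      <region id="Text" width="100%" height="25%" left="0%" top="75%" fit="scroll" />
      <region id="Image" width="100%" height="75%" left="0%" top="0%" fit="slice" />
    </layout>
  </head>
  <body>
    <par dur="8000ms">
      <img src="lapland.jpg" region="Image" />
      <text src="lapland.txt" region="Text" />
    </par>
  </body>
</smil>
"""

def summarize(document):
    """Return (region ids, [(element name, src, region)]) for a SMIL document."""
    root = ET.fromstring(document)
    # Regions live in the head, media objects in the body.
    regions = [r.get("id") for r in root.iter(NS + "region")]
    media = [(m.tag[len(NS):], m.get("src"), m.get("region"))
             for m in root.find(NS + "body").iter() if m.get("src")]
    return regions, media

regions, media = summarize(SMIL_DOC)
print(regions)
print(media)
```

The head and body are visited separately here only to mirror the division described above; a real viewer would of course also interpret the timing containers.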

    <smil xmlns="http://www.w3.org/2001/SMIL20/Language">
      <head>
        <layout>
          <root-layout width="170" height="208" />
          <region id="Text" width="100%" height="25%" left="0%"
                  top="75%" fit="scroll" />
          <region id="Image" width="100%" height="75%" left="0%"
                  top="0%" fit="slice" />
        </layout>
      </head>
      <body>
        <par dur="8000ms">
          <img src="lapland.jpg" region="Image" />
          <text src="lapland.txt" region="Text" />
        </par>
      </body>
    </smil>

Figure 3.1 A simple SMIL document

Table 3.1 The functional areas of SMIL

Functional area     Modules  Description of modules
Timing              19       Temporal descriptions and events
Time Manipulations  1        Controls the rate or speed of time for media elements; fast forward, rewind etc.
Animation           2        Manipulates media element properties with time
Content Control     4        Selects media contents depending on system capabilities
Layout              4        Positioning of elements
Linking             3        Hyperlinking and navigation
Media Objects       7        Description of media objects and their parameters
Metainformation     1        Description of data in a SMIL document
Structure           1        The basic structure of SMIL documents
Transitions         3        Transitions between media objects; fades, wipes etc.

3.3 Functional areas and modules

The specification of SMIL is divided into ten major functional areas, listed in Table 3.1. Each of the areas is composed of several modules, adding up to 45 different modules. These modules consist of a number of semantically related SMIL elements, attributes, and attribute value definitions.

This organization enables other parties to easily define a SMIL profile by selecting which of the modules to implement. There are dependencies between the modules, meaning that a module may build on the functionality provided by some other module and cannot be used without it. SMIL modules are also used to integrate SMIL features into other XML based languages, like in the XHTML + SMIL profile. The next three sections will discuss SMIL features in general, and section 3.6.6 will define which features are included in which SMIL profiles.

3.4 Spatial description

SMIL allows the definition of complex, layered layouts by defining regions that hold the visual contents of the presentation. These spatial definitions are always in the head-section of a SMIL document, under the layout element, and apply to the whole presentation. The regions are defined without any content, and actual visual layouts during the presentation are determined by how and when some visual components are visible in these regions.

3.4.1 Region

The base for layouts in SMIL is the region element. A region is a rectangular area that can contain a number of visual components. Regions can be placed arbitrarily, also on top of each other or partly or totally outside the visible area. The previous example in Figure 3.1 defines two regions, each with dimensions and a position relative to the element containing the region (discussed in the next section), an identifier and a fit parameter. The layout of these regions is shown in Figure 3.2.

Figure 3.2 The regions defined by the SMIL document in Figure 3.1

The fit attribute specifies how a graphical element is displayed in this region. The possible values are: fill, hidden, meet, scroll and slice. The effect of these values is demonstrated in Figure 3.3 (page 14). Note that the difference between fill and slice is that with slice the aspect ratio of the original image is preserved and parts of the image are not shown, whereas with fill the image is just scaled to fit the region. If the fit attribute is not defined, it defaults to hidden.
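The scaling rules behind the five fit values can be sketched as a small function. This is an illustration of the behavior described above, not code from any SMIL viewer; the function name fitted_size and the tuple-based interface are assumptions.

```python
def fitted_size(image, region, fit="hidden"):
    """Return the (width, height) at which an image is drawn in a region.

    Parts that do not fit in the region are clipped (hidden, slice)
    or reachable by scrolling (scroll).
    """
    iw, ih = image
    rw, rh = region
    if fit in ("hidden", "scroll"):
        return (iw, ih)                  # no scaling at all
    if fit == "fill":
        return (rw, rh)                  # aspect ratio may change
    if fit == "meet":
        scale = min(rw / iw, rh / ih)    # the whole image stays visible
    elif fit == "slice":
        scale = max(rw / iw, rh / ih)    # the region is completely covered
    else:
        raise ValueError("unknown fit value: " + fit)
    return (iw * scale, ih * scale)

# A 200x100 image in a 100x100 region:
print(fitted_size((200, 100), (100, 100), "meet"))   # scaled down to fit
print(fitted_size((200, 100), (100, 100), "slice"))  # kept, sides clipped
```

The only difference between meet and slice is whether the smaller or the larger of the two scale factors is chosen, which is why both preserve the aspect ratio while fill does not.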

The identifiers, specified with the id attribute, are unique character strings used to refer to these regions (or to any other elements in a SMIL document). The attribute regionName is actually the primary reference for regions, and many regions may share the same name, so that a media object placed in this destination is shown in all of the regions simultaneously (if possible). The id attribute is used only if no region with the given regionName is found.

Dimensions and position for a region may also be given as exact pixel values, like "100 px", where the unit qualifier px (pixels) can also be omitted. If the dimensions of a region are omitted, partly or fully, the region will be placed according to the limits of the parent region. Dimensions can also be defined using the right and bottom attributes of the region instead of the width and height, or any combination of these.
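How a viewer might resolve the horizontal position of a region from left, width and right can be sketched as follows. The function names are hypothetical, and the rule that right is ignored when all three attributes are given is an assumption of this sketch rather than a statement taken from the text above. The vertical axis (top, height, bottom) works in the same way.

```python
def to_pixels(value, parent):
    """Convert "25%", "100px" or "100" to pixels relative to the parent size."""
    text = value.strip()
    if text.endswith("%"):
        return parent * float(text[:-1]) / 100.0
    if text.endswith("px"):
        text = text[:-2]
    return float(text)

def horizontal_extent(parent_width, left=None, width=None, right=None):
    """Return (left, width) in pixels; omitted attributes follow the parent limits."""
    l = to_pixels(left, parent_width) if left is not None else None
    w = to_pixels(width, parent_width) if width is not None else None
    r = to_pixels(right, parent_width) if right is not None else None
    if w is None:
        # Width omitted: stretch between the left and right offsets.
        l = 0.0 if l is None else l
        w = parent_width - l - (r or 0.0)
    elif l is None:
        # Width given without left: align to the right offset if there is one.
        l = parent_width - w - r if r is not None else 0.0
    # If all three are given, right is ignored (assumption of this sketch).
    return (l, w)

print(horizontal_extent(170, left="0%", width="100%"))
print(horizontal_extent(170, width="50%", right="10"))
```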

If two regions overlap, their order may be defined with the z-index attribute that is defined as an integer. A region with a bigger z-index value is stacked on top of a region with a smaller z-index value. The color of the parts of the region that are not filled by the media can be defined with the backgroundColor attribute.¹

Figure 3.3 Effect of the fit attribute (results of the values hidden, meet, fill, slice and scroll for an image smaller than the region and for an image bigger than the region)

1 The name for this attribute in SMIL 1.0 is background-color. A SMIL 2.0 player may also support that name, but this use is deprecated. All attributes in SMIL 2.0 have ‘lower camel case’ names, but in SMIL 1.0 hyphenated names are used.

3.4.2 Other layout elements

In the example earlier, the first child of the layout element was a root-layout element. This defines the size of the main visual element of the presentation, and in a PC viewer usually the size of the application window used to show the presentation. There may be only one root-layout element in a SMIL document, and it has no children. Note that this is actually not completely logical: in the example, the region elements that are siblings of the root-layout are conceptually clearly ‘inside’ the root-layout and should therefore be its children. The pixel sizes of the regions are calculated relative to the size of the root-layout.

A SMIL presentation can also be defined to use many separate windows using any number of topLayout elements. Regions that are children of a topLayout element will be placed in that separate window. The HierarchicalLayout Module, among some other things, extends the basic layout model with support for hierarchical region layouts: regions nested inside other regions. An example of these features is shown in Figure 3.4.

    <layout>
      <topLayout width="640px" height="480px">
        <region id="left" top="0%" left="0%" width="50%" height="100%" />
        <region id="right" top="0%" left="50%" width="50%" height="100%">
          <region id="inset" top="25%" left="25%" width="50%" height="50%" />
        </region>
      </topLayout>
    </layout>

Figure 3.4 Use of topLayout and nested regions [W3C-SMIL2: Chapter 5.9] (panels: the layout element, shown above, and the resulting layout)

3.5 Temporal description

The layout section of a SMIL document specifies regions that describe where graphical objects can be shown. This is defined inside the head element. In the body of the document, the rendering of media objects is specified using these regions and a number of synchronization elements and attributes. The two basic timing elements are seq and par. Both of these function as containers for media objects or other timing elements. SMIL temporal description is very complicated (about a hundred pages in the specification), and this section only covers the basic cases.

3.5.1 Sequential and parallel containers

The objects inside a seq element are played sequentially in the order that they are specified. Each of the objects is rendered in turn and the rendering of an object starts when the rendering of the previous has ended. An example of the timeline of a seq element is shown in Figure 3.5. It also shows how rendering of a media object can be limited using the dur (duration) attribute and delayed with the begin attribute. These will be discussed in more detail later in this section. Also note that the audio elements have no region attribute like the image and text in the example in Figure 3.1, because audio objects do not have any graphical elements.

    <body>
      <seq>
        <audio id="sound1" src="FourSecSample.amr" />
        <audio id="sound2" dur="3s" src="ReallyLongSample.amr" />
        <audio id="sound3" begin="1s" src="ThreeSecSample.amr" />
      </seq>
    </body>

Figure 3.5 An example seq element and its timing

The par element was already introduced in the first SMIL example, where its function was to allow the image and the text to be rendered simultaneously. All the elements inside a par element will start rendering at the same time, if not delayed by some special attributes. These two basic containers can be freely nested to create timing schemes that are more advanced. Figure 3.6 illustrates this behavior: sound one and the sequential container start at the same time, sound five starts after one second, and sounds three and four start together once sound two has ended. The body tags have been omitted from the rest of the examples.
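The nesting rules of seq and par can be made concrete with a small recursive scheduler, sketched here for the markup of Figure 3.6. It is a simplification made for illustration: only "Ns" clock values are parsed, a dur attribute on a container is not handled, and the implicit length of ReallyLongSample.amr (60 seconds) is an assumed value.

```python
import xml.etree.ElementTree as ET

FIGURE_3_6 = """
<par>
  <audio id="sound1" src="FourSecSample.amr" />
  <seq>
    <audio id="sound2" dur="3s" src="ReallyLongSample.amr" />
    <par>
      <audio id="sound3" src="ThreeSecSample.amr" />
      <audio id="sound4" src="FourSecSample.amr" />
    </par>
  </seq>
  <audio id="sound5" begin="1s" src="FiveSecSample.amr" />
</par>
"""

# Implicit durations in seconds; the length of ReallyLongSample.amr is assumed.
IMPLICIT = {"FourSecSample.amr": 4, "ThreeSecSample.amr": 3,
            "FiveSecSample.amr": 5, "ReallyLongSample.amr": 60}

def seconds(value):
    """Parse the simple "<number>s" values used in this sketch."""
    return float(value.strip().rstrip("s"))

def schedule(node, start, timeline):
    """Record (begin, end) of every media object and return the end of node."""
    begin = start + seconds(node.get("begin", "0s"))
    if node.tag == "seq":
        end = begin
        for child in node:               # each child starts when the previous ends
            end = schedule(child, end, timeline)
    elif node.tag == "par":
        end = begin
        for child in node:               # all children start together
            end = max(end, schedule(child, begin, timeline))
    else:
        end = begin + IMPLICIT[node.get("src")]
        if node.get("dur") is not None:  # an explicit dur overrides the implicit one
            end = begin + seconds(node.get("dur"))
        timeline[node.get("id")] = (begin, end)
    return end

timeline = {}
total = schedule(ET.fromstring(FIGURE_3_6), 0.0, timeline)
for ident, (begin, end) in sorted(timeline.items()):
    print(ident, begin, end)
```

With these assumptions sound three and sound four both start at three seconds, when sound two is cut off by its dur attribute, and the whole presentation lasts seven seconds.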

    <par>
      <audio id="sound1" src="FourSecSample.amr" />
      <seq>
        <audio id="sound2" dur="3s" src="ReallyLongSample.amr" />
        <par>
          <audio id="sound3" src="ThreeSecSample.amr" />
          <audio id="sound4" src="FourSecSample.amr" />
        </par>
      </seq>
      <audio id="sound5" begin="1s" src="FiveSecSample.amr" />
    </par>

Figure 3.6 Nested seq and par elements and their timing

3.5.2 Duration attribute

The dur attribute was introduced in the previous examples. It is used to specify how long a media object is active. If it is not specified, the media’s implicit duration is used. This is zero for images, so an image that is the only element of a seq container will not be shown if no duration is specified. The dur attribute can also be defined for containers, and the children of a container always end when the duration of their parent ends. In this case, an image will be shown for the whole duration of the parent container.

If the duration of a video object is set to be longer than its implicit duration, its audio will end normally and the last frame will be shown for the rest of the specified duration. If the duration of an audio object is extended, it will be silent after its implicit end, but active in the sense of element timing.

An example of these features is shown in Figure 3.7. The duration of the parallel container is set to five seconds, which limits the duration of all of the media objects. Image two is ignored because its implicit duration is zero. The active duration for sound one is six seconds, even though the sample is only three seconds long, with the result that sound two is never played. The video element has an implicit duration of three seconds, but this is extended to six seconds by its specified duration and finally limited to five by the container’s duration.
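The arithmetic of the Figure 3.7 example can be followed step by step with a small helper. This is only a sketch: the function name active_interval is hypothetical, the implicit length of AnotherSample.amr (four seconds) is an assumed value, and image1, which simply stays visible for the whole container, is left out.

```python
def active_interval(begin, implicit, dur=None, parent_end=None):
    """Return (begin, end) in seconds, or None if the element never plays."""
    end = begin + (implicit if dur is None else dur)
    if parent_end is not None:
        if begin >= parent_end:
            return None              # the parent ended before this element began
        end = min(end, parent_end)   # children always end with their parent
    return (begin, end)

PAR_END = 5.0                        # <par dur="5s">

video1 = active_interval(0.0, 3.0, dur=6.0, parent_end=PAR_END)
image2 = active_interval(0.0, 0.0, parent_end=PAR_END)   # first child of the seq
sound1 = active_interval(image2[1], 3.0, dur=6.0, parent_end=PAR_END)
# Inside the seq, sound2 waits for the unclipped end of sound1 (0 s + 6 s).
sound2 = active_interval(0.0 + 6.0, 4.0, parent_end=PAR_END)

print(video1, image2, sound1, sound2)
```

The zero-length interval of image2 corresponds to the image being skipped, and the None result for sound2 to the sound never being played.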

    <par dur="5s">
      <img id="image1" src="RedSquare.jpg" region="LeftImage"/>
      <seq>
        <img id="image2" src="lapland.jpg" region="RightImage"/>
        <audio id="sound1" dur="6s" src="ThreeSecSample.amr" />
        <audio id="sound2" src="AnotherSample.amr" />
      </seq>
      <video id="video1" dur="6s" src="ThreeSec.3gp" region="Video"/>
    </par>

Figure 3.7 Duration attribute example

Durations are defined as SMIL clock values, consisting of a number and a qualifier (h, min, s or ms). The number may also include fractions, or the value may define hours, minutes, seconds, and milliseconds in clock notation. If no qualifier is given, it defaults to seconds. Some examples are given in Table 3.2. The value "indefinite" is used when the duration should be based on the elements inside a container or the elements parallel to the element.

Table 3.2 SMIL clock values

Clock value    Value
"12s"          12 seconds
" 5.3 "        5.3 seconds (leading and trailing spaces are ignored)
"0.001h"       3.6 seconds (1/1000th of an hour)
"02:30:03.36"  2 hours, 30 minutes, 3 seconds and 360 milliseconds
"01:20"        1 minute and 20 seconds
" 6 s "        Invalid, spaces are not allowed inside the definition
"02:100:10"    Invalid, only two characters allowed for minutes

3.5.3 Repeating elements

The playback of an object can be repeated by defining either the repeatCount or the repeatDur attribute. These can be applied to both media objects and containers. The former defines how many times an object is played, with a numeric value or "indefinite", meaning that the object is repeated until the parent time container ends. The repeatDur attribute is used to determine how long an object should be repeated, using a clock value or "indefinite". The object will repeat as many times as is needed to fill this duration. Figure 3.8 illustrates these behaviors. The sequence will be repeated until stopped by the user. Note also the dur attributes for the first two elements.
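The clock values of Table 3.2 and the repeat rules above can be sketched in a few lines. The regular expressions are a simplified reading of the clock value syntax, the function names are hypothetical, and the value "indefinite" is deliberately not handled.

```python
import re

_TIMECOUNT = re.compile(r"^(\d+(?:\.\d+)?)(h|min|s|ms)?$")
_CLOCK = re.compile(r"^(?:(\d+):)?([0-5]\d):([0-5]\d(?:\.\d+)?)$")
_UNIT = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001, None: 1.0}

def parse_clock(value):
    """Convert a SMIL clock value to seconds; raise ValueError if invalid."""
    text = value.strip()                  # leading and trailing spaces are ignored
    match = _TIMECOUNT.match(text)
    if match:                             # "12s", "5.3", "0.001h"
        return float(match.group(1)) * _UNIT[match.group(2)]
    match = _CLOCK.match(text)
    if match:                             # "02:30:03.36" or "01:20"
        hours = int(match.group(1) or 0)
        return hours * 3600 + int(match.group(2)) * 60 + float(match.group(3))
    raise ValueError("invalid clock value: %r" % value)

def repeated_duration(simple, repeat_count=None, repeat_dur=None):
    """Active duration in seconds of an element with simple duration `simple`."""
    limits = []
    if repeat_count is not None:
        limits.append(simple * repeat_count)
    if repeat_dur is not None:
        limits.append(repeat_dur)
    return min(limits) if limits else simple

# The three children of the sequence in Figure 3.8:
first = repeated_duration(parse_clock("2.5s"), repeat_count=2)
second = repeated_duration(parse_clock("2s"), repeat_dur=parse_clock("3s"))
third = repeated_duration(2.0, repeat_count=2.5)   # implicit duration of two seconds
print(first, second, third)
```

Under these rules one pass through the sequence of Figure 3.8 takes 5 + 3 + 5 seconds before the indefinite repeat of the seq starts it again.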

    <seq repeatCount="indefinite">
      <audio dur="2.5s" src="TwoSecSample.amr" repeatCount="2" ... />
      <video dur="2s" src="ThreeSec.3gp" repeatDur="3s" ... />
      <audio src="TwoSecSample.amr" repeatCount="2.5" ... />
    </seq>

Figure 3.8 Repeating playback example

3.5.4 Begin and end attributes

As shown in the previous examples, the begin attribute can be used to delay the rendering of an object relative to the start time of its parent (parallel container) or the ending time of the previous object (sequence). This can be extended by using identifiers to relate the time to the beginning or end of an arbitrary object. For example, begin="video1.end+1" will make an object start one second after video1 has ended. It is also possible to define a list of begin values, separated by semicolons, for example begin="1s; video1.end+2min; 5min". In this case, the element will start, or restart if already rendering, at each of these times.

Section 3.5.2 introduced the dur attribute, used for defining durations for objects. The same thing can be achieved with the end attribute, which defines the ending time for an object relative to the same moment as the beginning time. Values relative to other objects can be used here as well. Figure 3.9 shows two code fragments that define similar timing schemes.
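That the two fragments of Figure 3.9 describe the same timing can be checked with a small resolver for values such as "img1.begin+3s". This is a sketch under simplifying assumptions: only positive offsets in seconds are handled, referenced elements must already be resolved, and the function names are hypothetical.

```python
def seconds(value):
    """Parse "2s" or "2"; without a qualifier the value is in seconds."""
    return float(value.strip().rstrip("s"))

def resolve(value, known):
    """Resolve "2s", "img1.begin+3s" or "img1.end+2s" to seconds.

    `known` maps element ids to their already resolved (begin, end) pairs.
    """
    base, _, offset = value.partition("+")
    base = base.strip()
    if base[0].isdigit():
        return seconds(base)              # plain offset from the parent's begin
    ident, edge = base.split(".")
    anchor = known[ident][0] if edge == "begin" else known[ident][1]
    return anchor + (seconds(offset) if offset else 0.0)

def interval(known, begin, dur=None, end=None):
    """Return (begin, end) for an element that has either dur or end."""
    start = resolve(begin, known)
    stop = start + seconds(dur) if dur is not None else resolve(end, known)
    return (start, stop)

# First fragment of Figure 3.9, written with dur attributes.
a = {}
a["img1"] = interval(a, begin="2s", dur="5s")
a["video"] = interval(a, begin="img1.begin+3s", dur="4s")

# Second fragment, written with end attributes.
b = {}
b["img1"] = interval(b, begin="2s", end="7s")
b["video"] = interval(b, begin="img1.begin+3s", end="img1.end+2s")

print(a == b, a)
```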

    <par dur="10s">
      <img id="img1" begin="2s" dur="5s" ... />
      <video begin="img1.begin+3s" dur="4s" ... />
    </par>

    <par end="10s">
      <img id="img1" begin="2s" end="7s" ... />
      <video begin="img1.begin+3s" end="img1.end+2s" ... />
    </par>

Figure 3.9 Use of end and dur attributes: these two code fragments result in similar timing

3.5.5 Event based timing

Nothing introduced so far has really justified the existence of both end and dur attributes, because everything could have been defined using only one of them. This is because the timing components so far have been static. To achieve dynamic timings, events can be used in begin and end attributes.

SMIL events always occur on a SMIL object, i.e. a media object, container or top-level window. The format of a reference is familiar from many programming languages: begin="image1.activateEvent" will make the object begin when an activateEvent occurs on object image1. If no object name is given, the event is assumed to happen on the object itself.

Table 3.3 lists the SMIL events specified for the SMIL 2.0 Language Profile (more information about it follows in section 3.7.4). The general SMIL documentation also mentions the events click, load and repeat (same as repeatEvent). Note that the events endEvent and beginEvent have in practice² the same results as the synchronizing attribute values end and begin introduced in the previous sub-section.

To delay the beginning or ending of an object, a clock time can be added after the event, for example "img.inBoundsEvent+1s". The repeatEvent can be given an integer argument so that the media object responds to one specific repeat, not to every repeat.

2 The only differences occur when using negative begin or end times, which are not covered in this work.

Table 3.3 Events in SMIL 2.0 Language Profile

Event name           Description
activateEvent        The media object is activated by the user, e.g. by clicking on the object
focusInEvent         The media object gets the keyboard focus, e.g. is selected by the user by some means
focusOutEvent        The media object loses the keyboard focus
beginEvent           The object (could be a container) begins playback
endEvent             The object ends playback
repeatEvent          The object's playback repeats (due to a repeat attribute, not due to multiple begin times)
inBoundsEvent        The mouse pointer, or some other implementation specific “cursor”, enters this media element’s area
outOfBoundsEvent     Opposite of the inBoundsEvent
topLayoutCloseEvent  A top-level window (topLayout) is closed
topLayoutOpenEvent   A top-level window (topLayout) is opened

The example in Figure 3.10 shows how easily play and stop buttons can be implemented for a video using the activateEvent. The video starts playing when playImg is clicked, and stops when either the video itself or stopImg is clicked.

    <par dur="indefinite">
      <img id="playImg" src="play.png" ... />
      <img id="stopImg" src="stop.png" ... />
      <video begin="playImg.activateEvent"
             end="activateEvent; stopImg.activateEvent" ... />
    </par>

Figure 3.10 Play and stop buttons for a video using events

The use of the argument for the repeatEvent is demonstrated by the example in Figure 3.11. The image is shown at each repeat of the video, but the audio is played only one second after the video repeats for the third time, that is, when its fourth and last playback has begun.
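The play and stop behavior of Figure 3.10 can be imitated with a very small event model. The class below is a sketch for illustration only: the id "video" is assumed (the figure elides it), restart rules and time offsets are ignored, and an event written without an object name is taken to occur on the element itself.

```python
class EventTimedElement:
    """An element whose begin and end are lists of (source id, event name)."""

    def __init__(self, ident, begin, end):
        self.ident = ident
        self.begin = set(begin)
        self.end = set(end)
        self.active = False

    def dispatch(self, source, event):
        """Deliver one event and return whether the element is active afterwards."""
        key = (source, event)
        if self.active and key in self.end:
            self.active = False
        elif not self.active and key in self.begin:
            self.active = True
        return self.active

video = EventTimedElement(
    "video",
    begin=[("playImg", "activateEvent")],
    end=[("video", "activateEvent"), ("stopImg", "activateEvent")],
)

states = [video.dispatch(*event) for event in [
    ("stopImg", "activateEvent"),   # ignored, the video is not playing yet
    ("playImg", "activateEvent"),   # starts the video
    ("video", "activateEvent"),     # a click on the video itself stops it
    ("playImg", "activateEvent"),   # starts it again
    ("stopImg", "activateEvent"),   # the stop button stops it
]]
print(states)
```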

    <par>
      <video id="vid" src="ThreeSec.3gp" repeatCount="4" .../>
      <img dur="2s" src="repeated.png" begin="vid.repeatEvent" ... />
      <audio src="LastTime.amr" begin="vid.repeatEvent(3)+1s" .../>
    </par>

Figure 3.11 Using repeatEvent with an argument

3.5.6 Other timing related elements and attributes

If better control of element timing is needed when using events, the attributes min and max can be used. They set the lower and upper bound for an element’s duration. These override the value of the end and dur attributes. In the example in Figure 3.12, the video is viewed at least once, and at most three times. If the user clicks on the stop image before ten seconds have passed, the video will be played once and then stopped.

    <par>
      <img id="stop" ... />
      <video src="TenSeconds.3gp" end="stop.activateEvent" min="10s" max="30s"
             repeatCount="indefinite" ... />
    </par>

Figure 3.12 Example of min and max attributes

SMIL also includes a third time container that was not introduced with seq and par, because it is only useful with event timing. This container is excl, and it is otherwise like a parallel container, but allows only one of its children to play at any given time. This is useful when there are many media objects from which the user should select one that is played. The example in Figure 3.13 defines a video for which the user can select the audio of the preferred language. When the user selects a language, the new audio will replace any previous selection. The audio elements are placed inside parallel containers in order to preserve the synchronization with the video: the audios all begin in sync with the video, even though they are not active at that time.
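Both mechanisms of this section can be sketched briefly. The first function clamps a requested duration between min and max, as in Figure 3.12; the class imitates the rule that starting one child of an excl container stops the child that was playing. The names are hypothetical and the child identifiers only stand in for the three audio containers of Figure 3.13.

```python
def bounded_duration(requested, minimum=0.0, maximum=float("inf")):
    """Clamp a requested active duration (in seconds) between min and max."""
    return max(minimum, min(requested, maximum))

class Excl:
    """Tracks which child of an excl container is playing, if any."""

    def __init__(self, children):
        self.children = list(children)
        self.current = None

    def begin(self, child):
        """Start a child and return the child that had to stop, or None."""
        if child not in self.children:
            raise KeyError(child)
        stopped, self.current = self.current, child
        return stopped

# Figure 3.12 with min="10s" and max="30s": the stop image is clicked at 4 s,
# at 17 s, or never.
print(bounded_duration(4, 10, 30))
print(bounded_duration(17, 10, 30))
print(bounded_duration(float("inf"), 10, 30))

audio = Excl(["english", "french", "swahili"])
audio.begin("english")
print(audio.begin("french"))   # the English audio is replaced
```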

    <par>
      <video id="vid1" .../>
      <excl>
        <par begin="englishBtn.activateEvent" >
          <audio begin="vid1.begin" src="english.au" />
        </par>
        <par begin="frenchBtn.activateEvent" >
          <audio begin="vid1.begin" src="french.au" />
        </par>
        <par begin="swahiliBtn.activateEvent" >
          <audio begin="vid1.begin" src="swahili.au" />
        </par>
      </excl>
    </par>

Figure 3.13 Example of an excl container [W3C-SMIL2: Chapter 10.3.2]

The clipBegin and clipEnd attributes can be used to limit the playable part of a continuous media object. For example, clipBegin="2s" clipEnd="5s" would cause an audio clip to be played only between the offsets of two and five seconds.

3.6 Other SMIL features

The two previous sections introduced the spatial and temporal descriptions of SMIL. This section will show how actual media objects and links are defined, as well as some special tricks and goodies.

3.6.1 Media object definitions

The inclusion of media objects into a SMIL presentation has already been introduced in some of the previous examples. The syntax closely follows the syntax of an img tag in HTML. The seven media object elements in SMIL are listed in Table 3.4. They all have similar syntax, and any type of media can be defined with any of these. Different elements have been introduced only to improve readability. The generic reference is used when the type of a media object is unclear.

Table 3.4 SMIL media object elements [W3C-SMIL2: Chapter 7.3.1]

Media object element  Description
ref                   Generic media reference
animation             Animated vector graphic or other animation format
audio                 Audio clip
img                   Still image
text                  Text reference
textstream            Streaming text
video                 Video clip

The most important attribute of these elements is the source, src. It defines the object’s Uniform Resource Identifier (URI), which is used by the SMIL viewer to fetch the content. The URI could be an HTTP address (like "http://www.abo.fi/image.jpg") or, in the case of a multimedia message, the random content id given to a file (like "cid:rBWJlxq1YW"). Some SMIL viewers also support the data URL scheme [RFC-2397]. It allows small pieces of data to be inserted directly into the URL, and can be handy for embedding small texts or very small images inside a SMIL document, for example "data:,A%20brief%20note". Note the coding of the two spaces (%20).
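As an illustration of the data URL scheme, the sketch below decodes the payload of such a URL with Python's standard library. The function name decode_data_url is hypothetical, and the media type and charset parameters of the header are ignored for brevity.

```python
import base64
from urllib.parse import unquote_to_bytes

def decode_data_url(url):
    """Return the payload of a data URL as bytes."""
    if not url.startswith("data:"):
        raise ValueError("not a data URL")
    header, comma, payload = url[len("data:"):].partition(",")
    if not comma:
        raise ValueError("a data URL needs a comma before the payload")
    if header.endswith(";base64"):
        return base64.b64decode(payload)      # binary data, e.g. a tiny image
    return unquote_to_bytes(payload)          # percent-encoded text

print(decode_data_url("data:,A%20brief%20note"))
```

A viewer would hand the resulting bytes to the same renderer it uses for content fetched over HTTP or taken from a message part.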

As mentioned earlier, the region attribute is used for graphical objects to define the region in which the object is rendered, and the id attribute is used to give any element in SMIL a specific identifier. The alt attribute should be used to define an alternative text for an object, and a URI to a longer description should be defined with longdesc.

There is also an additional brush ‘media object’, which can be used to paint a solid color in a region. This color is defined by the color attribute. A brush does not have the src attribute, and it is not interchangeable with the other media object elements.

3.6.2 Linking

A SMIL document may contain links to other SMIL presentations, a specified time in a SMIL presentation or external files like HTML documents. As with many SMIL elements, the syntax is closely related to the syntax of similar features in HTML. The basic linking element is a, which contains the elements that can be used to open the link. The element a always has an attribute href, which defines the destination of the link. The hash separator (#) is used to define a specific element in a SMIL presentation; the destination presentation will be played from the beginning of the element with the given id. The link may be opened to a region of the current presentation, an HTML frame or a new SMIL or browser window. This is specified with the target attribute. In the example in Figure 3.14 the first link would restart the presentation from the beginning of video1. The second link could be handled by opening a web browser with the given address. The third link would open the presentation another.smil to the region region_A. The last link would open third.smil and start playing it from the beginning of the element video2.

    <par>
      <seq>
        <video id="video1" ... />
        ...
      </seq>
      <a href="#video1">
        <img src="jump_to_first.png" ... />
      </a>
      <a href="http://www.abo.fi/help.html" target="new"> ...
      <a href="another.smil" target="region_A"> ...
      <a href="third.smil#video2"> ...
      ...

Figure 3.14 Linking example

The element area can be used to associate a link with only a part of a visual media object. This element is a child of the media object element, and it can contain the same attributes as an a element, with some additional details. These include coords, which defines the coordinates of the area, begin and end, which specify the time when the link is active, and shape, which defines a shape other than a rectangle.
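How a viewer might take apart the href values of Figure 3.14 can be sketched with the standard URL functions of Python. The classification itself, in particular recognizing SMIL presentations by their file extension, is an assumption made for this example only.

```python
from urllib.parse import urldefrag, urlsplit

def classify_link(href):
    """Return (kind, document, element id) for an href value."""
    document, fragment = urldefrag(href)     # split at the hash separator
    if not document:
        kind = "same presentation"
    elif urlsplit(document).path.endswith((".smil", ".smi")):
        kind = "SMIL presentation"
    else:
        kind = "external resource"
    return (kind, document, fragment or None)

for href in ("#video1", "http://www.abo.fi/help.html",
             "another.smil", "third.smil#video2"):
    print(classify_link(href))
```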

3.6.3 Content control

One valuable SMIL feature, especially for mobile devices, is content control. It can be used to select among a number of layouts or media objects based on system properties. It is based on the switch element, which allows only one of its child elements to be chosen: the first one that is acceptable. It can be used anywhere in a SMIL document. There are twelve defined test attributes, for example systemBitrate, systemLanguage and systemScreenSize. For a numeric attribute, any number bigger than the limit will be accepted. Figure 3.15 shows two examples of switch: the first one selects an appropriate audio file based on the preferred language and the second one selects the layout based on screen size. Note that this is the only case when multiple layout elements are allowed in a SMIL document.

Another content control related element is prefetch, which suggests to the player that the file specified by its src attribute should be fetched, usually from a server, even though it is not yet displayed. When the file is rendered, the data will be directly available. This element gives the author the ability to control content download so that the presentation will be shown smoothly.

    <par>
      <video src="video.mpg" .../>
      <switch>
        <audio src="finnish.au" systemLanguage="fi"/>
        <audio src="dutch.au" systemLanguage="nl"/>
        <!-- English as the default, no tests -->
        <audio src="english.au" />
      </switch>
    </par>

    <head>
      <switch>
        <layout systemScreenSize="1024X1280">
          ... define a big, complicated layout ...
        </layout>
        <layout systemScreenSize="480X640">
          ... define a smaller layout ...
        </layout>
        <layout>
          ... define a small and simple layout ...
        </layout>
      </switch>
    </head>

Figure 3.15 Selecting content and layout with the switch element
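The selection rule of the switch element, where the first acceptable child wins, can be sketched as follows. Only three of the test attributes are evaluated, language matching is reduced to an exact comparison, and the keys of the system dictionary are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET

def passes(element, system):
    """Check the test attributes this sketch knows about; others are ignored."""
    language = element.get("systemLanguage")
    if language is not None:
        wanted = [code.strip() for code in language.split(",")]
        if system["language"] not in wanted:
            return False
    size = element.get("systemScreenSize")
    if size is not None:
        # The value is written as screen height X screen width.
        height, width = (int(part) for part in size.upper().split("X"))
        if system["height"] < height or system["width"] < width:
            return False
    bitrate = element.get("systemBitrate")
    if bitrate is not None and system["bitrate"] < int(bitrate):
        return False
    return True

def select(switch, system):
    """Return the first acceptable child of a switch element, or None."""
    for child in switch:
        if passes(child, system):
            return child
    return None

AUDIO_SWITCH = ET.fromstring("""
<switch>
  <audio src="finnish.au" systemLanguage="fi"/>
  <audio src="dutch.au" systemLanguage="nl"/>
  <audio src="english.au"/>
</switch>
""")

phone = {"language": "nl", "height": 208, "width": 170, "bitrate": 40000}
print(select(AUDIO_SWITCH, phone).get("src"))
```

Because the child without test attributes always passes, placing it last gives the default behavior used in both examples of Figure 3.15.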

3.6.4 Transitions

To easily allow enhanced slide shows, SMIL supports transition effects at the beginning and end of a media object. Transitions are defined in the head section of a SMIL document, and then applied to media objects using the transIn and transOut attributes. There are over a hundred different transitions supported by SMIL, categorized by types and subtypes, but only four of the transitions are mandatory. These are the default subtypes of barWipe, irisWipe, clockWipe and snakeWipe. Figure 3.16 illustrates the use and timing of irisWipe. Note the fill attribute, which makes the first image stay visible until the transition has finished. Without it, the first image would disappear when the second becomes active.

    ...
    <transition id="iris1s" type="irisWipe" subtype="rectangle" dur="1s" />
    ...
    <par>
      <img id="redSquare" transIn="iris1s" dur="4s" fill="transition" ... />
      <img id="lapland" transIn="iris1s" dur="4s" ... />
      ...
    </par>
    ...

Figure 3.16 irisWipe transition with subtype rectangle

3.6.5 Metadata

There are two SMIL elements for defining data about the document or its parts, also known as metadata. These elements are meta and metadata and they are always in the head part of a SMIL document. The meta element is a simple element that defines data about the whole document using the attributes name and content. This could be, for example, that “Publisher” (name) is “W3C” (content). The metadata element acts as the root element for a Resource Description Framework tree, which may include data about any of the elements in the SMIL document.


3.6.6 Animation

The SMIL animation modules define a complex set of functionality and cover about fifty pages of the documentation. The animation elements allow the manipulation of virtually any attributes of a SMIL element as a function of time, and the same syntax is used to define animations in Scalable Vector Graphics (SVG).

3.6.7 Time manipulation

The time manipulations allow control of the speed, or rate, of time for a SMIL element. They can be applied to both time containers and single media objects. They could be used to play a part of a presentation at double speed, or to play it first forwards and then backwards. The speed of time can also be defined to accelerate and decelerate. It should be noted that some media formats might not support playing backwards or at different playback rates.
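As a brief illustration, the fragment below sketches how these controls could look in a document written for a player that implements the SMIL 2.0 TimeManipulations module. The attribute names speed, autoReverse, accelerate and decelerate come from that module; the file names, the duration and the proportions are made up for the example. Because the module belongs to neither the SMIL 2.0 Language Profile nor 3GPP SMIL, a mobile viewer should not be expected to honour these attributes.

```xml
<seq>
  <!-- Hypothetical clip played at twice its normal rate -->
  <video src="intro.mpg" speed="2.0"/>
  <!-- Played first forwards and then backwards -->
  <video src="bounce.mpg" autoReverse="true"/>
  <!-- Time speeds up over the first quarter of the duration
       and slows down over the last quarter -->
  <video src="pan.mpg" dur="10s" accelerate="0.25" decelerate="0.25"/>
</seq>
```

Whether the second and third clips play as intended depends on the media format, since not every format can be decoded backwards or at a varying rate.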

3.7 Actual SMIL profiles

The last two sections presented the fundamental features of the full SMIL language, which is far from suitable for mobile or other resource-limited devices. The profiling mechanism introduced in Section 3.3 supports the definition of subsets of the full language. When designing SMIL, W3C also defined two SMIL profiles: SMIL 2.0 Language Profile and SMIL 2.0 Basic Profile. Phone and network manufacturers have defined a greatly limited set to be used in MMS, the so-called MMS SMIL, and 3GPP has since defined the much richer 3GPP PSS SMIL Profile for the same purpose. These four profiles are introduced in this section. Their relations are shown in Figure 3.17.

Figure 3.17 Relations between the SMIL profiles presented in this section

3.7.1 MMS SMIL

MMS SMIL was introduced in the MMS Conformance Document, created by Nokia and Ericsson in 2001. The specification has since been revised by the original authors together with six other mobile industry companies, and was transferred to the Open Mobile Alliance (OMA) in 2002 [OMA-MMSCon].

Even though the MMS Conformance Document has been updated several times, the SMIL definition in it has not changed significantly. MMS SMIL defines a presentation as a collection of slides, which all have the same layout, as in Figure 3.18. Each of these slides is represented by a par element in the SMIL document, containing at most one image, one text, and one audio element. The size of the layout is set using the root-layout element. Each slide defines its duration in the dur attribute of the par element.
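A minimal sketch of such a presentation is shown below, assuming a two-slide message. The file names, the region names Image and Text, and all pixel sizes and durations are hypothetical values chosen for the example, not values taken from the conformance document.

```xml
<smil>
  <head>
    <layout>
      <!-- One layout shared by every slide -->
      <root-layout width="160" height="120"/>
      <region id="Image" left="0" top="0" width="160" height="90"/>
      <region id="Text" left="0" top="90" width="160" height="30"/>
    </layout>
  </head>
  <body>
    <!-- Slide 1: one image, one text and one audio element, shown for 8 s -->
    <par dur="8s">
      <img src="photo1.jpg" region="Image"/>
      <text src="caption1.txt" region="Text"/>
      <audio src="greeting.amr"/>
    </par>
    <!-- Slide 2: image and text only, shown for 5 s -->
    <par dur="5s">
      <img src="photo2.jpg" region="Image"/>
      <text src="caption2.txt" region="Text"/>
    </par>
  </body>
</smil>
```

The slides play one after another in document order, so a viewer only has to keep track of the current par element and its dur value.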

This limited functionality allows easy implementation of an MMS viewer for a resource-constrained device, as there is no interactivity (except the possibility for the user to go to the next or previous slide) and the timing is simple. Another important feature is that adapting the original layout to fit the display of the receiving device is simple, because the layout itself is simple. These presentations are still viewable on a viewer supporting a richer set of SMIL. MMS SMIL is not SMIL Host Language Conforming, as defined in [W3C-SMIL2: Chapter 2.4.1].

Figure 3.18 An MMS SMIL presentation [OMA-MMSCon]

3.7.2 SMIL 2.0 Basic

The smallest set of modules that are required for SMIL 2.0 Host Language Conformance is the set defined by SMIL 2.0 Basic Profile. It was designed by W3C as a language for resource constrained devices, but was too rich for the first MMS implementations, which led to the creation of MMS SMIL. On the other hand, 3GPP – with future MMS implementations in mind – considered the features of SMIL 2.0 Basic too limited, and defined a richer profile that is presented in the next section. This leaves SMIL 2.0 Basic outside the scope of this work. Many PDA and PC SMIL players support it, some as the intermediate stage before full SMIL 2.0 Language Profile support.

3.7.3 3GPP PSS SMIL Profile

Introduced as a part of 3GPP’s Packet-switched Streaming Service (PSS) specification [3GPP26.234], the 3GPP PSS SMIL Profile - or just 3GPP SMIL - was first designed for scene description in streaming services. It was later defined as the mandatory scene description language in the MMS specifications by 3GPP [3GPP26.140]. This section will focus on 3GPP SMIL release 5; the one defined in release 4 does not include the BasicTransitions module.


The 18 modules included in 3GPP SMIL are listed in Table 3.5. Even though these numerically cover only about one third of the full set of modules, all the basic features are included. Of the features presented in the previous sections, only the following are not included in 3GPP SMIL: multi-window layouts, nested regions, the exclusive (excl) container, animations of SMIL attributes (animations as media objects are included), the brush media object, and time manipulations.

This means that supporting 3GPP SMIL in a SMIL viewer is a lot more complex than supporting MMS SMIL. Whereas MMS SMIL is quite static in nature, having only a number of slides that advance in a timed fashion, 3GPP SMIL is dynamic and much more complex because of nested timing elements, events, and links that can point both to external resources and to the beginning of specific elements in the presentation. This will be discussed in detail in Chapter 6.

Table 3.5 SMIL modules in 3GPP PSS SMIL (release 5)

SMIL Module           Contents
BasicContentControl   switch element
SkipContentControl    Allows compatibility between different SMIL versions
PrefetchControl       prefetch element
BasicLayout           Basic regions (no hierarchical regions)
BasicLinking          a and area elements
LinkingAttributes     A lot of attributes for the above elements, like target
BasicMedia            Basic media elements
MediaClipping         clipBegin and clipEnd attributes
MediaAccessibility    alt, longDesc and other accessibility attributes for media
MediaDescription      author, title and other descriptive attributes for media
Metainformation       Metadata about a SMIL presentation
Structure             smil, head and body elements
BasicInlineTiming     begin, end and dur attributes for media
MinMaxTiming          min and max attributes for media
BasicTimeContainers   par and seq elements
RepeatTiming          repeatDur and repeatCount attributes for media
EventTiming           Support for events in begin and end attributes
BasicTransitions      Transitions between two visual media objects

3.7.4 SMIL 2.0 Language Profile

Only six of the SMIL modules are not included in the SMIL 2.0 Language Profile, among them the Time Manipulations module. The profile was created by W3C for Web clients that support SMIL. Currently, there are no players with support for all of the features, but many PC players come close. This profile is inappropriate for small, resource-constrained mobile devices, and will probably be used only in the PC world. The older SMIL 1.0 is a subset of the SMIL 2.0 Language Profile, meaning that SMIL 2.0 players will also be able to play the old SMIL 1.0 files.


4 MMS VERSIONS

This chapter introduces the variety in MMS features caused by its ongoing development. For this I have divided MMS into three different versions: MMS without scene description, MMS according to the OMA’s MMS Conformance Document (‘OMA MMS’) and MMS as specified by the specifications of 3GPP (‘3GPP MMS’). The focus will be on describing in which ways 3GPP MMS is more complex than OMA MMS, because OMA MMS is roughly the current implementation level in Series 40, and 3GPP MMS is the object of this work. The relationship between all of these versions is shown in Figure 4.1.

There lies a possible source of misunderstanding in this naming: MMS as a whole is specified by both OMA and 3GPP, and most of their MMS documents apply to all of the versions listed here. The document that separates OMA MMS and 3GPP MMS is OMA’s MMS Conformance Document. It originates from Nokia and Ericsson but has later been adopted as part of the OMA MMS documents.

It should also be noted that in this comparison it is crucial to take into account the versions of the specifications listed, since future versions of the OMA Conformance Document will probably introduce more advanced features. The current OMA Conformance Document is meant to be an intermediate phase that enables conformance between phones from different manufacturers even at this early stage of MMS evolution.

Figure 4.1 Relations between the MMS versions introduced in this chapter

4.1 MMS without scene description

Some of the early MMS phones, though not the first one (Ericsson T68i), send multimedia messages without a scene description. This stage of MMS is not actually specified in any document; it is merely something that happened to be implemented in practice. These messages most often include one textual part and additional images or audio clips as attachments, like an email message. The encoding of these messages is done according to the MMS specifications [OMA-MMSEnc], using MIME as introduced in Section 2.4. The formats for media are not specified, but the ones listed in the MMS Conformance Document (see next section) are widely used.

Many of the first MMS phones ignore the SMIL scene description when viewing a multimedia message, but still send multimedia messages with a valid SMIL description. Such implementations do not belong to this category.

4.2 OMA MMS

Media formats, media codecs, scene description, and other features of OMA MMS are defined in MMS Conformance Document v2.0.0 [OMA-MMSCon], a part of the OMA MMS 1.1 specification. As already mentioned in Chapter 3.7.1, Nokia and Ericsson created the document in order to improve interoperability of the first MMS implementations, because not all of the features specified by the 3GPP documents were feasible within the time frame in which MMS was supposed to be commercially introduced. The contents of the document have not changed significantly since the first release in 2001.

The media formats for both OMA MMS and 3GPP MMS are listed in Table 4.1. The mandatory scene description of OMA MMS is MMS SMIL, which was introduced in Chapter 3.7.1. The standard includes the most widely used image formats, as well as Wireless Bitmap (WBMP), which is mostly used in WAP pages. It is worth noticing that interoperability is guaranteed only for images with a maximum resolution of 160x120 pixels. This is because images taken with, for example, a digital camera can contain several megabytes of data when decoded, which would be inappropriate in a mobile phone with a memory size of a few megabytes, or even less. AMR, a codec specified by 3GPP for speech, is the only audio codec supported. Text is supported without any formatting. For sending phonebook entries the vCard format must be supported and, in a phone with a calendar, calendar notes in vCalendar format. The total size of a message is limited to 30 kilobytes.
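To give a rough sense of scale for the image-size argument above, the decoded size of an image can be estimated as width times height times bytes per pixel. The figures below assume 3 bytes per pixel (24-bit RGB) and, for the camera case, a hypothetical 1600x1200 image; neither value comes from the conformance document.

```latex
\[
\begin{aligned}
160 \times 120 \times 3\ \text{bytes} &= 57\,600\ \text{bytes} \approx 56\ \text{KB} \\
1600 \times 1200 \times 3\ \text{bytes} &= 5\,760\,000\ \text{bytes} \approx 5.5\ \text{MB}
\end{aligned}
\]
```

Under these assumptions the camera image needs a hundred times more memory than an image at the guaranteed resolution, which is more than the few megabytes the text above gives as the memory size of such a phone.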

OMA MMS does not mention MIDI, a format for synthetic audio, or any video formats, even though such might be useful in modern phones. This is because OMA MMS was designed to be possible to implement even on the low-end products.

The level of features in OMA MMS is supported by most of the newest phone models, naturally with the exception of the most price-driven phones without MMS. However, because of its limitations, many manufacturers have added some features to OMA MMS. Most phones with a built-in or accessory camera can optionally send the photos in the full resolution, for example 352x288 or 640x480. Many phones that support polyphonic ringing tones support receiving of SP-MIDI files. As the newest phones with cameras can be used to record video clips, such phones can usually send these clips in multimedia messages. The formats defined for 3GPP MMS are widely used for these kinds of enhancements.


Table 4.1 Media formats in 3GPP MMS and OMA MMS

3GPP MMS column: 3GPP TS 26.140 V5.2.0
OMA MMS column: MMS Conformance Document v2.0.0

Format                                    3GPP MMS   OMA MMS

Audio
  AMR                                     x          x
  MPEG-4 AAC-LC / 48 kHz, mono & stereo   x          -
  MPEG-4 AAC-LTP                          o          -
  SP-MIDI, format 0 or 1                  x          -

Video
  H.263 Profile 0 level 10                (1)        -
  H.263 Profile 3 level 10                o          -
  MPEG-4 Visual Simple profile L0         o          -

PIM
  vCalendar 1.0                           -          (2)
  vCard 2.1                               -          x

Still images
  JPEG, baseline DCT                      x          x
  JPEG, progressive DCT                   o          -

Bitmap graphics (3)
  GIF87a                                  x          x
  GIF89a                                  x          x
  PNG                                     x          x
  WBMP                                    -          x

Vector graphics
  SVG Tiny profile                        (4)        -
  SVG Basic profile                       o          -

Text
  XHTML Mobile Profile (no images)        x          -
  US-ASCII                                x          x
  UTF-8                                   x          x
  UTF-16                                  x          x
  UCS-2 Unicode                           x          -

Scene description
  3GPP PSS SMIL Profile                   x          -
  MMS SMIL                                (5)        x
  XHTML Mobile Profile                    o          -

x = Mandatory
o = Optional

(1) = Mandatory for terminals supporting media type video
(2) = Mandatory if the phone has got a calendar
(3) = Interoperability guaranteed for 160x120 pixels
(4) = Mandatory for terminals supporting media type "2D vector graphics"
(5) = MMS SMIL is a subset of 3GPP SMIL and thus included in support for 3GPP SMIL


The newest version of Nokia Series 40 MMS viewer, which is in use for example in the model 6220, supports OMA MMS with the following additional features:

• The maximum message size is 100 kilobytes
• There is no specified limit for image resolution
• SP-MIDI format for receiving and sending polyphonic ringing tones
• Video clips can be used (H.263 codec)
• Digital rights management (DRM) is supported (Mobile DRM)

The two additional file formats of the Series 40 player, SP-MIDI and H.263, are both mandatory in 3GPP MMS. Digital rights management has not been specified for MMS at all, but has been seen as an important feature because of the rich media contents in MMS. It allows a file to be encrypted and viewed only with a special key, so that the user is able to use the file but not copy it: if the content is forwarded, it cannot be viewed because the key is not present. This is useful, for example, if the user purchases a ringing tone that should not be copied to anybody else. The DRM scheme used is Mobile DRM.

Version 1.2 of the OMA MMS specifications, which currently exists only as a candidate, takes into account the need for interoperability in richer media. The conformance document divides features into five categories, currently called Text, Image Basic, Image Rich, Video Basic, and Video Rich. This enables a manufacturer to easily specify the level of functionality in a phone by referring to these categories. In addition, version 1.3 is currently being discussed. It will most probably include 3GPP SMIL with some restrictions.

4.3 3GPP MMS

The term ‘3GPP MMS’ is in this thesis used to refer to MMS with the media formats defined in [3GPP26.140] version 5.2.0 and the scene description in [3GPP26.234] version 5.5.0. It is worth pointing out that [3GPP26.234] is a part of the 3GPP streaming specifications, and only the chapter defining 3GPP PSS SMIL is related to MMS. The specification also defines media formats and other details, but these are for streaming, not for MMS.

The specifications of 3GPP MMS are of a different type and style than those of OMA MMS. This is because the latter is defined by a document striving for conformance, whereas the specifications for 3GPP MMS are general service specifications. It might be that not all of the features of 3GPP MMS will ever be fully utilized by commercial products, but it still serves as a good source of features to be studied, and the future conformance documents will probably require more of these features. This thesis will also present the features specified as optional, not just the smallest set that enables calling a product 3GPP SMIL compliant.

The full listing of the media formats in 3GPP MMS is shown in Table 4.1. The mandatory media types are audio, image, and text. The only media formats that are included in OMA MMS but not in 3GPP MMS are WBMP images and vCard and vCalendar files. For both video and vector graphics the 3GPP specification defines mandatory media formats in case the terminal supports the media type. The format defined for the latter, Scalable Vector Graphics (SVG), is an XML language for describing two-dimensional vector graphics. It is specified by W3C, and modularized like SMIL, so that profiles with limited functionality can be defined. If supported, the profile Tiny is mandatory and Basic optional.

3GPP MMS specifications do not state anything about the maximum size of a message, because it is clearly a figure that will change with time with the continuous improvement of network and hardware technologies.

The mandatory scene description language in 3GPP MMS is 3GPP SMIL; XHTML Mobile Profile is in addition mentioned as optional. 3GPP SMIL is the fundamental difference between OMA and 3GPP MMS, as it is much more complex to support than MMS SMIL. This is caused especially by the following details in 3GPP SMIL:

• The spatial layout can include an arbitrary number of arbitrarily placed regions, requiring the software to keep track of the z-indexes of all of the items drawn and then take care of the overlapping parts, or to do a lot of unnecessary redrawing. Especially partly transparent images or animations on top of video are tricky to render
• There can be many simultaneous video and audio objects, which can lead to very processor-demanding rendering
• The timing of 3GPP SMIL is very complex, especially because of sporadic events, which make the whole timing dynamic. Much more advanced data structures are needed to handle all the timing relations than in MMS SMIL. The ability to link to the beginning of a specified SMIL element also increases the complexity of the timing of a presentation
• Besides the impact on timing in the case of links within a presentation, the linking features add the need for a pointer or a cursor that points to the active element, so that the user can select a link. This is further complicated by the possibility to associate links with parts of a visual element, not just the whole element
• Transitions, even though the specification states that transitions may be implemented partly or not at all
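The fragment below is a small sketch that combines several of the features listed above in one presentation: overlapping regions with z-index values, simultaneous video and audio, an event-based begin time, a link to the beginning of another element, and a transition. All ids, file names, sizes and durations are hypothetical, the namespace declaration of the profile is omitted, and whether a particular viewer handles activateEvent and transitions as written here is an assumption.

```xml
<smil>
  <head>
    <layout>
      <root-layout width="128" height="128"/>
      <!-- The badge region overlaps the clip region and is drawn on top of it -->
      <region id="clip" left="0" top="0" width="128" height="96" z-index="1"/>
      <region id="badge" left="92" top="4" width="32" height="32" z-index="2"/>
      <region id="caption" left="0" top="96" width="128" height="32"/>
    </layout>
    <transition id="wipe1s" type="barWipe" dur="1s"/>
  </head>
  <body>
    <seq>
      <par id="part1">
        <!-- Video and audio decoded at the same time -->
        <video src="clip.3gp" region="clip"/>
        <audio src="comment.amr" begin="2s"/>
        <img id="badgeImg" src="info.png" region="badge" dur="20s"/>
        <!-- Sporadic event: the text begins only if the user activates the badge;
             selecting the text then jumps to the beginning of part2 -->
        <a href="#part2">
          <text src="more.txt" region="caption"
                begin="badgeImg.activateEvent" dur="5s"/>
        </a>
      </par>
      <par id="part2">
        <img src="end.jpg" region="clip" dur="5s" transIn="wipe1s"/>
      </par>
    </seq>
  </body>
</smil>
```

Even in this short example the viewer cannot compute a fixed timeline in advance: the begin time of the text, and therefore the moment at which the link becomes selectable, is known only once the user acts.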

Many of the problems of a mobile device can be easily handled with appropriate use of SMIL's content control, using the switch element (introduced in Chapter 3.6.3) to control the behavior of the presentation depending on the properties of the device. This is likely to be used in SMIL presentations created by professional designers, mostly in machine-to-person scenarios, but the MMS generator software in mobile phones will probably not be able to do this so elegantly, because the set of possible target devices is not known when the software is written.

It is understood that the whole feature set of 3GPP SMIL is very demanding, and will not be fully supported on all mobile devices. The 3GPP streaming specification contains a chapter about SMIL authoring guidelines [3GPP26.234: Annex B]. These are valid also in the scope of MMS. The major points are:

• The linking of a presentation should not depend on the area element; valuable links should be made using the a element
• Because the layout may be discarded by the target device if it is unsuitable, the switch element should be used to define different layouts for the whole range of targeted devices
• The fit attribute value "scroll" should only be used for text components, not for image or video
• Scaling of video or even images might not be possible on a constrained device; "hidden" fit and a suitable size in pixels are therefore recommended, especially for video
• The events inBoundsEvent and outOfBoundsEvent assume that the terminal has a pointer device for focusing elements, and should thus be used with care
• XHTML as media element:
  o No images should be defined in the XHTML parts; these should be included in the SMIL document
  o Tags that are not in XHTML Basic might not be rendered correctly (3GPP MMS supports XHTML Mobile Profile, which is a superset of Basic)
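A presentation that follows the first four guidelines could look roughly like the sketch below. The screen sizes, region geometry, file names and the link target are hypothetical; the regionName attribute is used so that both alternative layouts can offer regions under the same names without repeating an id.

```xml
<smil>
  <head>
    <switch>
      <!-- Chosen only if the display can show 160 (height) X 128 (width) pixels -->
      <layout systemScreenSize="160X128">
        <root-layout width="128" height="160"/>
        <!-- Video is clipped rather than scaled -->
        <region regionName="clip" left="0" top="0"
                width="128" height="96" fit="hidden"/>
        <!-- Scrolling is used for text only -->
        <region regionName="story" left="0" top="96"
                width="128" height="64" fit="scroll"/>
      </layout>
      <!-- Default layout for smaller displays, no tests -->
      <layout>
        <root-layout width="128" height="128"/>
        <region regionName="clip" left="0" top="0"
                width="128" height="96" fit="hidden"/>
        <region regionName="story" left="0" top="96"
                width="128" height="32" fit="scroll"/>
      </layout>
    </switch>
  </head>
  <body>
    <par>
      <!-- Video prepared in advance at the pixel size of the clip region -->
      <video src="clip-128x96.3gp" region="clip"/>
      <!-- The whole text is the link anchor: the a element, not area -->
      <a href="http://www.example.com/more.smil">
        <text src="story.txt" region="story" dur="20s"/>
      </a>
    </par>
  </body>
</smil>
```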

The first phone supporting 3GPP MMS, Nokia 6600, was released in November 2003. It is built on the Series 60 platform and uses the Symbian operating system. The MMS viewer of the phone supports the full 3GPP MMS, even though there are some minor details that do not follow the specifications. Because there are not yet any other products supporting this set of MMS features, the MMS composer does not support creating such messages, so they can only be used for machine-to-person messages when the receiver's phone is known to be a Nokia 6600. The phone also supports some additional features, like Mobile DRM.

The next chapter will introduce the Series 40 platform and Chapter 6 will focus on how the feature set of 3GPP MMS can be implemented, possibly with some adaptation, on the Series 40 platform. Chapter 7 focuses on user interface design.


5 SERIES 40 PLATFORM

This chapter introduces Nokia's Series 40 mobile phone platform, with emphasis on the details that are related to viewing SMIL presentations. Platforms that other manufacturers have designed for a similar usage profile most probably have somewhat similar features.

The Series 40 platform is a size driven, but still feature rich platform. This means that small size is favored over excessive amounts of features. It is used in many of the mobile phones Nokia has released lately, for example the products 7210, 6220, and 6800, shown in Figure 5.1. The simpler, price driven Nokia phones are based on the Series 30 platform, and the most advanced one-hand operated feature driven Nokia phones are built on the Series 60 platform, which uses the Symbian operating system.

Figure 5.1 Examples of Series 40 products: Nokia 7210, 6220 and 6800

There is some variation among the features of the Series 40 phones. For example, the Nokia 6800 includes a full QWERTY keyboard that can be revealed by flipping the face of the phone. This chapter will focus on the features of the newest models, because such a set of features is definitely available in the phones on which 3GPP SMIL will be implemented. Some ideas will also be given about how the platform might be developed in the future and which details are unlikely to change.

5.1 User interface

5.1.1 Display

Most of the Series 40 terminals so far have a display that is composed of 128x128 square pixels and is capable of showing 4096 separate colors. The physical size of this display is about 27 x 27 mm. There are two models with a bigger display of 128x160 pixels; these are on some occasions referred to as Series 45 phones. Two of the most recently released models have a display with 65536 (64K) colors, and it is likely that such displays, and even better ones, will be used in all Series 40 products in the future.

The Series 40 platform is size driven, which limits the maximum physical size of the display. The face of the phones is not totally utilized yet, so some improvement is possible if the size of the other components in the phone can be reduced. It is likely that display technologies will be developed so that the size of the pixels is reduced, which will also increase the number of pixels that can be displayed. The extra pixels gained by reducing pixel size cannot, however, change the fact that there is a limit to the amount of information the human eye can conveniently interpret from a given area. Smaller pixels will, at least after some point, only add to the clarity of the image, not to the amount of information that can be shown simultaneously.

5.1.2 Keypad

There is some variation in the keypads of the Series 40 phones. All of the phones have the following keys:

• Number keys: 0-9, star and hash
• Scroll up, down, left and right (4-way scroll key)
• 2 softkeys (left and right)
• Send key (dial) and end call key
• Power key

Many of the Series 40 phones have separate volume keys; otherwise, volume is adjusted with scroll left and right. Some models have a full QWERTY-keyboard that can be flipped forth from under the normal keypad, but these are a minority. The model 6108 has a separated pen input pad enabling the inputting of characters using a stylus. It has been designed especially for inputting Chinese.

One feature that has been introduced to the newer models is that the 4-way scroll key, which can also be seen as one key with 4 different functions, can also be clicked inwards. This is often called a 5-way scroll key. It is used as a third softkey, and is assumed to be present in the next chapter when user interface for the MMS viewer is discussed. Softkeys are introduced in the next sub-section.


Figure 5.2 shows the keypad of Nokia 6230. It has the standard keys listed above complemented with volume keys and the 5-way functionality in the scroll key.

Figure 5.2 The keypad of the Nokia 6230

5.1.3 User interface logic

The basic ideas of the Series 40 user interface logic are presented in this sub-section to give background information to the next chapter that also discusses the user interface of the 3GPP MMS viewer application.

The idea of the softkeys is that the functionality of these keys changes depending on the state of the current application. In Series 40 user interface the functionality for each key is shown in the bottom of the display, right above the particular key. Figure 5.3 exemplifies this with the camera layout for a Series 40 product that has three softkeys. The left softkey opens the options list, middle softkey captures the image, and the right softkey exits the camera application.

Figure 5.3 Camera viewfinder layout in a Series 40 product with three softkeys


[NOK-S40UI] describes the logic of the user interface only with two softkeys (a newer version of the document will probably be released before this thesis is published), but this can easily be adapted to three softkeys. Basic usage of an application builds around the softkeys and the scroll keys.

The left softkey is used for positive and forward-going actions, like Select, OK, Options and Yes. If there are multiple possible actions, they are collected in an options list, which is accessible via the left softkey. The middle softkey contains the action that is most important, like Capture in the case of the camera viewfinder. The right softkey is used for negative and back-stepping actions, like Exit, Delete, Back, and No.

It should be noted that the example layout, camera viewfinder, does not really work according to this logic in the case of two softkeys, because accessing Capture via an options list would not be appropriate. Therefore, in the case of two softkeys the Capture action is in the left softkey, and the options list is not accessible when viewfinder is active.

The scroll keys are naturally used to move the cursor or focus in the four possible directions, or to scroll the visible data if it does not fit the display. When entering a number (e.g. a time or date), scroll up and down change the currently focused value, and scroll left and right move the focus. In a product without volume keys, scroll left and right are used to adjust the volume. This might cause problems in an application where both horizontal scrolling and volume adjustment should be easily accessible, like the MMS viewer, because the user interface might change significantly between products that have the volume keys and products that do not.

The other keys of the keypad are not used for basic application usage. The end call key always functions as a global exit and it should always end the currently active application. The send key is used for making a call or as a shortcut to activate a sending operation, like sending the current image taken with the camera. A long press of the power key always turns the phone off. The number keys, especially star and hash, are used in many applications to implement shortcuts for an advanced user. The keys 2, 4, 6, and 8 are also used as optional up, left, right and down controls in many games.

There are no touch screens in the Series 40 products; the only Nokia with a touch screen is the 7700, built on the Series 90 platform. Neither is there anything that would function like a mouse on a workstation. Therefore, the user interface does not have a concept of a pointer other than the cursor in text editing and the focus in a selection list or grid. This must be taken into account when designing the user interface for the MMS viewer.

5.2 Memory and processor

The total memory of a Series 40 product consists of Random Access Memory (RAM) and flash memory, which does not need power to maintain its contents but is much slower to access, especially to write to, than RAM. The RAM is used for volatile, dynamic application data. The flash memory is used to store the software, data needed by the software, e.g. images, ringing tones and texts, and the user data, for example additional applications, multimedia messages, and calendar entries. There are also some smaller cache memories and a small unchangeable read-only memory. Some products also support extending the user data storage with a removable Multimedia Card (MMC).


The memory sizes of the Series 40 products vary – the newer phones tend to have more memory as the capacity of memory chips is constantly growing and the price is getting lower. In general, memory is still quite an expensive component in a mobile phone, and a target of optimization because even small saving per unit will mean big amounts of money when the volume is up to hundreds of millions.

The size of the RAM in the current Series 40 products is between 4 and 8 megabytes, depending on the features present. It is hard to say even for a specific product how much of this is available for an application, but the phone's network-related and other basic functionality will take up much of it. Something between a few hundred kilobytes in the older products and a couple of megabytes in the newer ones is a fair estimate. Compared to a PC this is not much, but the user interface and the applications are also much simpler, and more optimized for the specific purpose. There is no memory to waste in the MMS viewer, and for example huge images (millions of pixels) and hundreds of slides might cause problems with memory, but there should be no problems with reasonably normal multimedia messages.

The size of the flash memory varies much more than that of RAM, but most of the current Series 40 products have 16 megabytes of it. The software and data needed by it takes most of this, and about one megabyte is left for user data. There are currently two models that have exceptional user data capacities: The 7600, which has got about 29 megabytes of flash memory available, and the 6230, which supports MMCs with at least 32 megabytes for user data. Both of them support playback of MP3 and AAC audio files, which require much space.

There are two processors in the Series 40 terminals, one Microprocessor Control Unit (MCU) and one Digital Signal Processor (DSP), a processor specially designed for high-performance, repetitive, numerically intensive tasks. The former is mostly used for executing standard application logic and the latter is used for audio and video processing. The speeds of the MCUs used are about 50-100 MHz, whereas the DSPs run at approximately 100-200 MHz. If the terminal is not doing anything else – e.g. a phone or data call – then most of the processing power should be available to the MMS viewer, even though some is naturally used by the basic network functionality. In practice, these figures mean that, for example, video can be decoded in real-time, but resizing of video or many videos and audios that should be decoded simultaneously will cause problems. Processor speed is fortunately a figure that will definitely improve a lot in the course of time.

5.3 Software

The Series 40 platform is based on the proprietary Nokia Operating System, which has been developed by Nokia especially for mobile phones. The general architecture of the system is based on clients and servers: Each of the phone’s resources is controlled by one or more servers, and these servers provide services to clients. A client could be a user application or a server needing access to some other resources than the ones it controls. A resource could be for example the SIM-card, loudspeaker, or microphone.

Besides such low level services, the software platform has servers and other software components that provide services like image viewing, video rendering, and audio file playback. This means that the MMS viewer does not have to implement such things, but merely controls the playback of media elements using the appropriate components. This also means that if support for some media type is implemented, it is available for all the applications in a terminal, not just the MMS viewer.

6 DEALING WITH TERMINAL CONSTRAINTS

This chapter discusses the problems that arise when multimedia messages according to the 3GPP standards are played on a mobile terminal. The Series 40 platform is used as the reference for specific details, but the same kinds of limitations also apply to other platforms. The limitations are discussed first, and then solutions to the problems that arise from them are presented.

Even though the task of implementing a 3GPP MMS viewer is far from trivial, there are only a few very troublesome features. The hardest problems are caused by the need for interoperability between all the devices supporting SMIL. When a multimedia message is composed on a Series 40 terminal, the features of the presentation will naturally be limited by the limitations of the viewer, and viewing such messages will not cause any problems. In addition, most other mobile phones have limitations quite similar to those of the Series 40 terminals, and messages from them will not be problematic, even though somewhat bigger displays are common in high-end products and presentations optimized for such displays may cause problems. The biggest issue is SMIL presentations created for a PC. The display size, runtime memory, and processing power available on the original target device of such presentations are enormous compared to those of a Series 40 terminal.

It is possible for the Multimedia Messaging Service Center (MMSC) to do content adaptation according to the destination device’s properties. This means that viewing messages sent via an MMSC with this feature is an easy task. Unfortunately, this adaptation is optional and the phone software must be able to work even without it in order for the phone to be able to function in all possible network environments.

If there is no way to render a multimedia message according to the scene description, the phone can, as a last resort, show the user a list of the files included in the message, with the possibility to open the files separately. There should be options to save the files to the phone’s memory or to view them individually with the Media Player, a standard component in the Series 40 software, or some other appropriate application. This way the user can at least view the data included in the message, even though much of the content might be uninteresting when separated from the presentation. Any unreferenced files in a multimedia message, meaning files that are not referred to by the presentation, should also be handled this way.
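As an illustration of how unreferenced files could be found, the following minimal sketch collects every src attribute in the scene description and compares the result with the attachment list. The function name and the example message are hypothetical, and the sketch assumes that src values are plain file names; real messages may use other reference forms, which would need to be resolved first.

```python
import xml.etree.ElementTree as ET


def unreferenced_files(smil_text, attachment_names):
    """Return the attachments that no element in the SMIL document refers to."""
    root = ET.fromstring(smil_text)
    # Any element carrying a src attribute counts as a reference.
    referenced = {el.get("src") for el in root.iter() if el.get("src")}
    return [name for name in attachment_names if name not in referenced]


# Hypothetical message: two referenced media files and one extra attachment.
EXAMPLE_SMIL = """<smil><body><par>
  <img src="photo.jpg" region="Image"/>
  <text src="hello.txt" region="Text"/>
</par></body></smil>"""

print(unreferenced_files(EXAMPLE_SMIL, ["photo.jpg", "hello.txt", "card.vcf"]))
# prints ['card.vcf']
```

The files returned by such a check would be offered to the user through the same file list that serves as the last resort.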

6.1 Display size

The display size of a Series 40 phone is smaller than that of many devices that the terminals share content with; the width of the display compared to a PC screen might be one tenth, or even less. It is likely that in the future the Series 40 displays will be a bit bigger, both physically and in pixels per inch. This will improve the situation, but the problem of viewing content designed for bigger screens will always exist.

There are many studies about viewing web content on a mobile device, discussing both content and user interface adaptation. Regrettably, the methods used for HTML are not appropriate for SMIL content, because an HTML page is mostly static and consists of media and links laid out on one level, whereas a SMIL presentation has temporal behavior and its content is on many visual levels.

6.1.1 A presentation that is bigger than the display

If a presentation is bigger than the display size of the rendering device, the MMS viewer application has the following four options for handling the rendering:

1. Just show the presentation in its original size so that some of the content is not shown

2. Resize the presentation to fit the display

3. Show the presentation in its original size, but with scrollbars so that the user can view the whole area

4. Use the last resort explained earlier, i.e. just show the content files separately

These approaches are in no way mutually exclusive. The last one should probably always be available as an option, even if the presentation is not bigger than the display.

Using the first approach is appropriate only if the presentation is slightly bigger than the display, maybe up to ten pixels, so that only some pixels near the border are missed. In case of a bigger presentation, the user might miss much of the content.

Figure 6.1 exemplifies methods two and three. It shows a presentation designed to be viewed in full screen on a Series 60 phone, and how it looks when scrolling versus resizing is used to fit it on the Series 40 display. The softkey texts and the header in the rightmost example will be discussed later on.

Figure 6.1 Viewing a presentation designed for a bigger screen (the screens are printed approximately in their physical size)

Resizing the presentation to fit the display is in many cases a good solution, especially because it is so seamless to the user. Nevertheless, if the content includes many crucial visual details, e.g. small text in images, such details might be invisible after the resizing. This can be seen in Figure 6.1: the texts in the last two cases are mostly unreadable, even though the original presentation has only been resized by factors of 0.62 and 0.46. A presentation for a PC might be for example 800 times 600 pixels, and would totally lose details like small texts if resized to fit the Series 40 display. The MMS viewer cannot know the types of the contents in a presentation, so the user should also be able to select some viewing method other than resizing: scrolling or viewing the files separately.

In the example, the resizing is done so that the aspect ratio of the original presentation is preserved. In cases like this, where the aspect ratio of the original presentation is not extremely different from that of the target size, it could be more beneficial to fit the presentation to the whole area available in order to utilize the screen area more effectively. Of course, this will not look good if the original presentation is stretched so much that it totally loses its form, so the aspect ratio should not be changed too much.
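The aspect-ratio-preserving resize can be expressed as a single scale factor. The sketch below is only an illustration: the function name is hypothetical, and the 176x208 source size and the 128x128 display size are assumptions that are not stated in the text. The 14 and 18 pixel rows reserved for the header and the softkey texts are the figures given in section 6.1.2.

```python
def fit_scale(src_w, src_h, dst_w, dst_h):
    """Largest uniform scale at which the source fits inside the destination."""
    return min(dst_w / src_w, dst_h / src_h)


# Assumed sizes: a 176x208 presentation shown on a 128x128 display,
# first using the whole display, then with the header (14 rows) and
# the softkey texts (18 rows) left visible.
full_screen = fit_scale(176, 208, 128, 128)
with_header_and_softkeys = fit_scale(176, 208, 128, 128 - 14 - 18)

print(round(full_screen, 2), round(with_header_and_softkeys, 2))
# prints 0.62 0.46
```

Under these assumed sizes the height is the limiting dimension in both cases, which is why reserving rows for the header and the softkey texts reduces the scale factor so noticeably.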

When the presentation is shown with scrollbars, the content can be shown in the size originally intended and no details are missed. Even this approach has some drawbacks: some timed visual parts might easily be missed and the user interface becomes more complex. There are also limits to how big the scrollable area can be before it becomes unusable. Presentations bigger than, for example, four times four display sizes should probably be reduced to that limit in order to maintain usability.

When scrolling is used, the user views only a part of the whole presentation at a time. This works fine for viewing a static presentation, or one that only responds to user input. If there are any components that have timed behavior, the user may miss some of the content, because for example an animation might be running in a part of the presentation that is outside the user’s viewport.

The possibility to scroll the whole presentation adds one more feature to the user interface. This is problematic, because the user interface of any application in a phone should be as simple as possible, and there are already many activities in the MMS viewer that should be easily accessible to the user. This will be discussed more in Chapter 7.

Because resizing and scrolling are best suited for different kinds of content, it might be the best solution to provide the user with both of the methods. It would be convenient to start the MMS viewer with the whole presentation fitted to screen and provide zoomed viewing, i.e. natural size with scrollbars, as an optional view.
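The choices discussed in this section can be summarized as a small decision routine. The sketch below is one possible reading, not a definitive design: the function names and return values are hypothetical, while the ten pixel tolerance and the limit of four display sizes are the example figures mentioned above.

```python
def initial_view(pres_w, pres_h, disp_w, disp_h, crop_tolerance=10):
    """Pick how a presentation is first shown on a disp_w x disp_h area."""
    over_w = pres_w - disp_w
    over_h = pres_h - disp_h
    if over_w <= 0 and over_h <= 0:
        return "original"   # the presentation already fits
    if over_w <= crop_tolerance and over_h <= crop_tolerance:
        return "crop"       # only a few pixels near the border are lost
    return "resize"         # fitted view first, scrolled view stays optional


def scroll_area_scale(pres_w, pres_h, disp_w, disp_h, max_screens=4):
    """Scale applied in scrolled mode so the area stays within the limit."""
    return min(1.0,
               max_screens * disp_w / pres_w,
               max_screens * disp_h / pres_h)


print(initial_view(800, 600, 128, 96))       # prints resize
print(scroll_area_scale(800, 600, 128, 96))  # prints 0.64
```

In this reading, an 800 times 600 pixel presentation would start in the resized view, and the optional scrolled view would show it at 64 percent of its original size rather than at full size.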

6.1.2 Optimizing the use of the display

The display size is a limitation for the MMS viewer, and therefore the usage of the display’s pixels should be optimized. In standard Series 40 applications, the header uses 14 pixel rows and the softkey texts use 18 pixel rows, like in the camera layout in Figure 5.3 (page 38). This means that the area left for application specific free content is only 128x96 pixels. The header and the softkey texts could be hidden in order to use the screen area more effectively.

The header and the softkey texts are visible in all of the current native Series 40 applications, and removing them from the MMS viewer would make it look quite different. It is questionable whether this is acceptable. From a usability point of view the header is not so necessary: it would probably hold content like the timer and possibly an icon to notify the user if the presentation includes sound. These could be implemented by drawing them on top of the contents. Removing the header would increase the height of the usable area to 110 pixels.

The softkey texts are an essential part of the whole softkey idea. Without them the user would not know exactly how the phone responds to a key press in each situation. It could still be workable to hide these texts when playing a SMIL presentation, if the viewer paused the presentation and showed the softkey texts whenever any key was pressed. Scrolling, and possibly selecting, media objects should still be available while a presentation is playing, which complicates the matter.

6.1.3 A presentation that is smaller than the display

It is worth noticing that a presentation might define a root layout smaller than the display. The first idea about viewing such presentations might be to show them in the original, intended size. This is not the best solution, because the screen size is quite small and it would be foolish not to utilize it as fully as possible. Therefore, presentations that are smaller than the display should definitely be resized to fill as much of the display as possible, probably so that the media elements are also resized. This will help the user to see as much of the presentation as possible.

6.2 Processing power

The speed of the processors in a Series 40 device can be a limiting factor for rendering audio, video, transitions, and possibly images if a really fast pace is needed. This section discusses some processor hungry situations and how they could be resolved.

6.2.1 Transitions

Transitions demand a large amount of processing power. They can be implemented by generating a mask, which represents the state of the transition, and copying the appearing content on top of the old one using that mask. This operation has to be done many times per second in order for the transition to be smooth. Therefore, many simultaneous transitions are too heavy to calculate, and a simple solution is to skip any transitions if some given number of them is already active.

As the drawing of the mask and the result consumes most of the processing power, multiple transitions for small areas might be less demanding to render than one occupying the full screen. As a result, the optimal way to limit the number of simultaneous transitions would be to keep track of the amount of processing power left at a given moment and, based on that, decide whether a new transition should be skipped. This would, however, add unnecessary complexity to the software, and the gain is probably not worth the trouble.
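The mask-based approach and the simple limit on active transitions can be illustrated with a short sketch. It is not the Series 40 implementation: images are modeled as lists of pixel rows, the left-to-right wipe is just one example of a transition, and all names are hypothetical.

```python
def wipe_mask(width, height, progress):
    """Left-to-right wipe: True where the new content should already show."""
    edge = int(width * progress)
    return [[x < edge for x in range(width)] for _ in range(height)]


def composite(old, new, mask):
    """Copy new pixels on top of old ones wherever the mask is set."""
    return [[n if m else o for o, n, m in zip(old_row, new_row, mask_row)]
            for old_row, new_row, mask_row in zip(old, new, mask)]


class TransitionLimiter:
    """Skip new transitions once max_active of them are already running."""

    def __init__(self, max_active=1):
        self.max_active = max_active
        self.active = 0

    def try_start(self):
        if self.active >= self.max_active:
            return False          # caller shows the new content immediately
        self.active += 1
        return True

    def finish(self):
        self.active = max(0, self.active - 1)


old = [[0] * 4 for _ in range(2)]
new = [[1] * 4 for _ in range(2)]
print(composite(old, new, wipe_mask(4, 2, 0.5)))
# prints [[1, 1, 0, 0], [1, 1, 0, 0]]
```

Both the mask generation and the compositing touch every pixel of the affected area on every update, which is the reason the cost grows with the area of the transition rather than with the number of transitions alone.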

Rendering transitions is especially heavy for video, because each video frame drawn should be manipulated according to the transition. The current Series 40 hardware and software architecture, which has been optimized for the speed of standard video decoding, does not effectively support this. Therefore, transitions for video will most probably have to be skipped.

6.2.2 Video

Real-time decoding of video is a demanding process, because the codecs use complex algorithms to reduce the size of the files by enormous ratios. The decoding has to be done in real-time because, unpacked, a few seconds of even a small video would take up several hundred kilobytes of space and the unpacking would take a remarkably long time. A video in a mobile phone usually includes about 10 to 20 frames per second, so there is also a demand to update the screen at a rapid pace. Normal viewing of a video requires much of the processing power on the current terminals, and therefore operations like resizing, transitions, z-indexing, and simultaneous rendering of many instances are non-trivial.

The current hardware and software architecture may also cause problems. If the separate DSP is used for decoding the video, which is often the case, it can operate independently of the MCU and decode the video directly to the screen. This makes decoding very efficient, but complicates the implementation of z-index and transitions for video, as the MCU should be able to change each decoded frame before it is shown. It may also make decoding many videos simultaneously inefficient, if the architecture has been optimized for one video instance at a time.

To implement all the fit parameter values for video elements, it should be possible to resize a video to an arbitrary size. This is not feasible with the current processors, even though clever design of the decoder software component can make it more efficient than resizing each of the video frames separately like an image. Hence, fit parameter values have to be implemented by cropping the video to the needed size so that the center of the video is preserved. This way the intended layout can be maintained, but parts of the video will be invisible. Figure 6.2 exemplifies this.

Figure 6.2 Replacing video resizing with cropping
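The cropping that replaces resizing amounts to selecting a centered rectangle of the decoded video. A minimal sketch follows; the function name is hypothetical and the 176x144 video and 128x96 region are assumed example sizes only.

```python
def center_crop(video_w, video_h, region_w, region_h):
    """Return (x, y, w, h) of the centered part of the video shown in the region."""
    w = min(video_w, region_w)
    h = min(video_h, region_h)
    x = (video_w - w) // 2
    y = (video_h - h) // 2
    return x, y, w, h


# Assumed example: a 176x144 video placed in a 128x96 region.
print(center_crop(176, 144, 128, 96))
# prints (24, 24, 128, 96)
```

A video that is smaller than its region is returned unchanged by this sketch, since enlarging it would again require resizing.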

Drawing something on top of a video element demands that the elements on top are redrawn after each frame of video, probably to some buffer and then to the display. This can be an expensive operation, especially if the element on top of the video is transparent. For elements on top of video, the overlapping areas might thus have to be left invisible in order to make the processing feasible. There are two types of transparent images: binary transparent, meaning that selected pixels are fully transparent, and alpha transparent, meaning that each pixel has a visibility level value. Alpha transparency is more expensive to render than binary transparency. GIF animations and vector graphics may also include transparency. Figure 6.3 shows examples of these features.

Figure 6.3 A transparent element on top of video

Decoding multiple videos simultaneously is not feasible with the current Series 40 hardware and neither does the current software support it effortlessly. Therefore, only one video should be active at a time. If the original timing of the presentation is preserved and simultaneous videos skipped, the user will miss some of the contents. If this should be avoided, the timing of the presentation needs to be redone. A possible solution to this is the following:

• If a video starts when some other video is active, freeze the former one to the current frame

• The new video is shown for the duration intended – this time does not affect the timing of the whole presentation and other objects should not change their state, unless some interactive timing event is fired

• The former video continues and the presentation timing continues as usual

This solution works with interactive timing and allows the user to see all of the contents, but may result in somewhat odd timing. Thus, a choice has to be made either to respect the original timing or to always let the user see all of the intended contents.

It should be noted that in many cases the new video should replace the former one, as that is the timing specified by the SMIL. In such cases, there is naturally no need for special handling.

6.2.3 Audio

Like video files, audio files also have to be decoded in real-time, as storing them unpacked would need great amounts of memory. This means that processing power can become a limiting factor, especially with high-quality audio like MP3 or AAC files with a high bit rate. Besides, as is the case with video, the current Series 40 audio architecture has been designed for decoding one signal at a time. This limitation will most probably be solved in the future, but on the current terminals only one audio file can play at a time.

The obvious solution for multiple simultaneous audio files is that newer sounds override older ones. It would be good not to stop the older audio while this happens, so that it will continue as if there had been no interruption once the overriding sample has ended. This way, if there is for example background music and a short click that is played when certain events occur, the music will seem to continue normally even though it is interrupted for a while. The only problem is that the user will miss some of the contents if a presentation depends on playing many audio files simultaneously, but this is hardly a common use case.
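The override rule can be sketched as a function that picks, for a given moment, the most recently started clip that is still running. The names and the example timings below are hypothetical. Older clips keep their own positions, so they appear to continue normally once the overriding clip has ended.

```python
def audible_clip(clips, t):
    """clips: list of (name, start, duration). Return the name heard at time t."""
    active = [(start, name) for name, start, duration in clips
              if start <= t < start + duration]
    if not active:
        return None
    # The clip that started last overrides the older ones.
    return max(active)[1]


# Hypothetical example: background music and a short click sound.
clips = [("music", 0.0, 30.0), ("click", 5.0, 1.0)]
print([audible_clip(clips, t) for t in (4.0, 5.5, 7.0, 31.0)])
# prints ['music', 'click', 'music', None]
```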

6.3 Memory

The amount of run-time memory available does not set any specific limitations for viewing 3GPP MMS: there is enough memory to handle normal messages sent from mobile phones. However, 3GPP MMS does not have the same limitations on content size and amount as OMA MMS, so whatever the memory size, it is possible to receive a presentation that causes the MMS viewer application to run out of memory. The application should be aware of this and monitor the amount of free memory. In case of an out of memory error, it should show an error message and let the user view the content files separately.

7 USER INTERFACE DESIGN

This chapter shows that it is possible to design a user interface supporting 3GPP SMIL features in a small mobile terminal. The Series 40 platform is again used as a reference, but most of today’s mobile terminals have a somewhat similar user interface. Designing a truly useful and optimized user interface that complies with the conventions of Series 40 applications is out of the scope of this thesis, as it would require usability studies and creation of simulation software.

The added complexity in 3GPP MMS compared to OMA MMS has much impact on the user interface design. There is much more functionality, as there can for example be many scrollable items on the screen simultaneously.

Additionally, all the functions in a phone should be easily usable and have similar user interfaces. This requirement is non-trivial for the MMS viewer application, because the content viewed needs interactions of many kinds. This chapter proposes two different user interfaces for the MMS viewer in a Series 40 terminal with three softkeys and volume keys. The Nokia 6230, shown in Figure 5.2 (on page 38), is an example of such a device.

When viewing a 3GPP MMS the user should be able to:

1. Select any of the interactive objects (media elements that cause events that are used by other elements)

2. Select any of the links available

3. Vertically and horizontally scroll any of the visible media objects that have a scroll bar

4. Pause or stop the presentation

5. Restart the presentation

6. Fast forward or rewind the presentation

7. Set audio volume

8. Mute the audio

9. Select viewing of the content files separately

10. Send (forward) the presentation as MMS

11. Select between resized and scrolled mode

12. Vertically and horizontally scroll the whole presentation

The last two actions are only needed if resizing and scrolling of the whole presentation are both supported viewing modes and the currently active presentation does not fit the screen.

The next sub-section will present a user interface that includes all of these actions. Because it is quite advanced and might be considered too complex, a simplified version that lacks some functionality is presented afterwards.

7.1 An advanced user interface

The scrolling of the whole presentation is a valuable feature, because it improves MMS interoperability between the phone and devices with a bigger screen. It is therefore included in this user interface design.

To be consistent with other Series 40 applications and the user interface style of the whole system, the user interface of the MMS viewer should be built around using the softkeys and the four scroll keys. Number keys and the send key should not be needed for standard operation, but they may be used for shortcuts.

The logic of the whole Series 40 user interface is based on the principle that the user can always return to the previous state with the right softkey, mostly labeled “Back”. This should also be implemented in the MMS viewer. As in any Series 40 application, all the actions that are not needed instantly while viewing a presentation are accessed via an options list, which is opened with the left softkey. The actions that can be placed in the options list are 4, 5, 8, 9, 10, and 11 in the previous listing. Pausing the presentation should actually always happen when the options list is opened, so that action does not have to be visible in the list. Setting the volume is naturally handled with the volume keys. The hash key could be used as a shortcut to mute audio, like in some current Series 40 applications.

The user interface has no mouse pointer or touch screen, but there should be means to select an active block, i.e. any element whose activateEvent has been used by some other element (more in section 3.5.5) or any element or part of an element that is a link (more in section 3.6.2). Additionally, multiple scrollable regions can be visible simultaneously, and the user should be able to scroll any of them. To achieve this, the concept of focus must be introduced: the focus shows which element will currently be selected, opened, or scrolled, and it can be moved around the screen. The focus could be visible for example as a rectangle around the active block, as shown in Figure 7.1.

Figure 7.1 Example visualization of the focus and the scroll bar

Because it is the most complex case, the features of the user interface will be presented assuming the presentation does not fit the screen and scrolling of the whole area has been enabled. If the presentation fits or is resized to fit the screen, some details become simpler.

The focus is moved using the four scroll keys so that a key press in any of the four directions moves the focus to the next active block in that direction. If that block is not fully visible, the presentation will be scrolled so that it is, if possible, fully visible. If there are no active blocks visible in the key press direction, the presentation is scrolled to that direction. This functionality is exemplified in Figure 7.2, in which the balls at the cities in the map are the active blocks, maybe links to other states of the presentation. The scroll right key is pressed between the screens.

Figure 7.2 Moving the focus and scrolling the whole presentation – the scroll right key is used
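One way to choose the next active block for a key press is sketched below. It is only an illustration under stated assumptions: blocks are reduced to their center points, the viewport is ignored, the weighting of sideways offset is an arbitrary choice, and all names are hypothetical. A result of None stands for the case where the presentation is scrolled instead.

```python
DIRECTIONS = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}


def next_block(blocks, focused, direction):
    """blocks: dict of name -> (x, y) center point. Return the next block name."""
    dx, dy = DIRECTIONS[direction]
    fx, fy = blocks[focused]
    best, best_cost = None, None
    for name, (x, y) in blocks.items():
        along = (x - fx) * dx + (y - fy) * dy    # progress in the key direction
        if name == focused or along <= 0:
            continue                             # not in the key direction
        across = abs((x - fx) * dy) + abs((y - fy) * dx)   # sideways offset
        cost = along + 2 * across                # assumed weighting
        if best_cost is None or cost < best_cost:
            best, best_cost = name, cost
    return best


blocks = {"A": (20, 80), "B": (60, 90), "C": (40, 40)}
print(next_block(blocks, "A", "right"))  # prints B
print(next_block(blocks, "B", "right"))  # prints None
```

A real implementation would additionally need to check whether the chosen block is fully visible and scroll the presentation accordingly, as described above.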

The middle softkey is used for selecting the currently focused element, which means opening the link or firing the activateEvent of that media element.

So far, the actions have been quite clear, but there is also a need to enable scrolling of any of the scrollable regions. To achieve this, these elements must also be focusable. When a scrollable region is focused, like in the second screen shot in Figure 7.1, the middle softkey can be used to enable region scrolling. This enters region-scrolling mode, in which the scroll keys are used to scroll the focused element. To quit this mode, the user must press the middle softkey again.

One exception has to be covered – selecting an element that is both scrollable and a link or a source for events. This case is probably rarely relevant in practice. It can be handled so that the middle softkey enables region scrolling and the link or event activation is placed in the Options list. This must also somehow be shown to the user, probably so that the scroll bar for such an element also shows a hint that the element is also selectable.

Action 6, fast forwarding and rewinding the presentation, is somewhat troublesome. In the existing OMA MMS viewer the user can skip to the next or previous slide with scroll up and down. 3GPP SMIL does not include the concept of a slide, so this is no longer a useful action, and the scroll keys are fully occupied by other actions. Most PC SMIL players allow the user to scroll the timeline using the mouse, so that the user can easily jump to any part of a continuous presentation. A scroll bar visible on the screen could be a solution, made selectable with the focus like the active areas. It would however use up valuable screen space, and would not be that easily selectable when whole screen scrolling is enabled.

A better solution is to use some keys, for example 4 and 6, for seeking backwards and forwards in time. A first-timer will not easily notice the option, but it is conveniently available for an advanced user. This should be acceptable, as the function is not a vital one. A seek could be for example five seconds, or to the next element in the currently active sequential container, as exemplified in Figure 7.3 (on the next page). A good thing about the latter approach, which might seem a bit illogical with some presentations, is that it would work as changing from slide to slide when an OMA MMS compliant message is played, because each slide consists of a par-element, and those elements reside successively in a sequential container.

Figure 7.3 Seeking forwards in time to the next sequential element – the arrows show the element that is activated with seek forwards at a given time
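Seeking to the next or previous element of the active sequential container can be sketched with a sorted list of the children's start times. The function names are hypothetical, and the start times are assumed to be already resolved, which is not always possible when interactive timing is used.

```python
from bisect import bisect_left, bisect_right


def seek_forward(child_starts, now):
    """Start time of the next child of the active seq, or None at the end."""
    i = bisect_right(child_starts, now)
    return child_starts[i] if i < len(child_starts) else None


def seek_backward(child_starts, now):
    """Latest child start time strictly before now, or None at the beginning."""
    i = bisect_left(child_starts, now) - 1
    return child_starts[i] if i >= 0 else None


# Hypothetical message with three slides starting at 0, 5 and 10 seconds.
starts = [0.0, 5.0, 10.0]
print(seek_forward(starts, 7.0), seek_backward(starts, 7.0))
# prints 10.0 5.0
```

With a message of this shape, each slide being one child of the sequential container, the two functions behave like moving from slide to slide.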

To deal with the complexity of this user interface, the different actions of the middle softkey should be clearly visible and the user should always be aware of the state of the focus and the scrollbars. The softkey text for the middle softkey could be shown and changed according to the focused element and whether scrolling mode is enabled or not. If optimized screen space usage is wanted, the softkey texts should be hidden. This is achievable if the graphical elements for focus and scroll bars are easily recognizable so that they show the user what is going on and what is the current action of the scroll keys and the middle softkey.

The user interface proposed here enables horizontal and vertical scrolling of both individual regions and the whole presentation, opening links and selecting active media elements. It enables viewing of 3GPP MMS messages with even complex features. A downside is its complexity. A simpler user interface with slightly limited functionality is presented in the next sub-section.

7.2 Simplifying the user interface

To simplify the user interface, two limitations are added to the functionality: The whole presentation is not scrollable, meaning that only resized mode is available, and the regions can only be scrolled vertically. The scrolling is probably mostly used with text components, and the natural way to render text is to wrap rows so that its width fits the destination region’s width. Thus, the limitation to vertical only scrolling is sensible.

What made the previous interface complex was the fact that the middle softkey and the scroll keys had many different actions depending on what was focused and whether region-scrolling mode was enabled. The two limitations make the set of actions smaller and achievable without such complex behavior. The concept of focus is also somewhat complex, at least to a user who is not familiar with the functionality of the Series 40 XHTML browser, but it cannot be skipped, because active elements and links are a crucial part of 3GPP SMIL and there has to be a way to select them.

The functionality is otherwise similar to the previous interface, but with the following changes:

• The middle softkey always activates the currently focused media, i.e. opens the link or fires an event

• The scroll up and scroll down keys are only used to scroll the currently focused region up and down. If there is only one scrollable region, it does not need to be focusable at all – it is always scrolled when these keys are pressed

• The scroll left and scroll right keys are always used to select the previous and next active block, respectively. The concept of next and previous could be according to the horizontal order of the blocks, or according to some other order

With these changes, the user interface logic becomes simpler and easier to follow. One problem might be the sequencing of the active blocks, especially if there are plenty of those on the screen simultaneously. Additionally, if the places of the blocks only differ in the vertical direction, it is quite illogical that the focus is moved with the left and right scroll keys, not up and down.

If a presentation depends on horizontally scrolling a region, then this interface will not be able to deal with it, but this is a minor issue. The biggest problem might be that showing the presentation in its original size with scroll bars is not enabled – this might make some presentations designed for a bigger screen unusable. Nevertheless, as mentioned earlier, there is always the last resort of showing the content files separately, which partly solves this issue.

As the functionality of the middle softkey is more static in this version, the softkey texts are not that important. Still, in order to create a simple interface, they should be visible so that the user knows what the available options are. The text for the middle softkey might also change between something like “Activate” and “Open link”, depending on the focused element. The hiding of the softkey texts might be available via the options list.

As phone user interfaces should be as simple as possible, it might be reasonable to use this simpler interface. How relevant the limitations are depends on the content viewed: when showing messages from other Series 40 phones the limitations will be irrelevant, but SMIL presentations for PCs might cause problems.

7.3 An alternative to media scrolling

Scrolling is a good method on PCs, for example, but may not be the best for SMIL presentations on a Series 40 terminal. This sub-chapter presents another approach that can be used instead of scrolling.

Even though the SMIL standard has the region property fit="scroll", there are two clear disadvantages to implementing this as such on a mobile phone with a small screen. Firstly, especially for text media, the achieved usability is not very good. Let us use a standard MMS SMIL presentation (see Chapter 3.7.1) as an example. A slide will most probably have both an image and some text visible on the screen, so for example only a third of the screen size might be usable for viewing the text. Even with a small font only a few (three on Series 40) lines of text fit into the text region. If the text is for example a few sentences, the user will have to scroll down many screens to read the whole text, and may not see even one full sentence at a time.

Secondly, as seen in the previous two sub-chapters, scrolling of media adds complexity to the user interface, as it increases the number of actions the user has to be able to perform while viewing a presentation.

An alternative to scrolling in presentation mode is to let the user open the media in full screen mode, so that the presentation is paused in the background. Figure 7.4 illustrates this behavior. The media may still require scrolling, but the screen area is utilized much more effectively. Text in particular is much easier to comprehend when more of it, if not all, is visible.

Figure 7.4 An alternative to scrolling, exemplified with text media
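
The difference in scrolling effort can be estimated with a short calculation. The following sketch is only illustrative: the three-line text region follows the Series 40 figure mentioned earlier in this sub-chapter, whereas the line width of 19 characters, the nine lines assumed for the full screen view, and the sample text are assumptions made for the example, not measured values.

```python
import math
import textwrap


def screens_needed(text: str, chars_per_line: int, lines_per_screen: int) -> int:
    """Return how many screenfuls are needed to read the whole text."""
    lines = textwrap.wrap(text, width=chars_per_line)
    return math.ceil(len(lines) / lines_per_screen)


# Hypothetical message text: 60 four-letter words.
SAMPLE_TEXT = " ".join(["abcd"] * 60)

# Assumed layout values; only the three-line region comes from the thesis text.
CHARS_PER_LINE = 19
REGION_LINES = 3
FULL_SCREEN_LINES = 9

in_region = screens_needed(SAMPLE_TEXT, CHARS_PER_LINE, REGION_LINES)
full_screen = screens_needed(SAMPLE_TEXT, CHARS_PER_LINE, FULL_SCREEN_LINES)
print(f"text region: {in_region} screens, full screen: {full_screen} screens")
```

With these assumed values the sample text wraps to 15 lines, which takes five screenfuls in the text region but only two in the full screen view.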

The second advantage is that this approach makes the user interface more straightforward. For the advanced user interface from Chapter 7.1, the state change from viewing the presentation to the “scrolling” mode is more clearly visible. Selecting a scrollable item, or opening a link on it, could be done with the middle softkey in the full screen view mode, or as previously via the Options menu when viewing the presentation.

If this approach is used on the simplified user interface from the previous sub-chapter, all four scroll keys can be used for moving the focus, because dedicated scrolling keys are not needed. This is a clear improvement, because using only left/right for moving the focus is far from intuitive, especially if the focusable elements are in a vertical sequence. Selecting a scrollable text element becomes somewhat less usable, as in the standard scroll approach it is done with the middle softkey while viewing the presentation. That is, however, a much less frequently needed action than scrolling a media.

This design can also be used for scrolling images. The only difference from scrolling text is that images may also need to be scrolled horizontally, which does not cause any problems because the scrolling happens in a separate user interface state and can use all of the scroll keys.
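
The two user interface states described in this sub-chapter can be sketched as a small state machine. This is a toy model and not the design of an actual Series 40 viewer: the key names, the "back" key used for returning to the presentation, the linear focus order, and the unbounded scroll offset are all assumptions made for the example.

```python
from enum import Enum, auto


class Mode(Enum):
    PRESENTATION = auto()  # presentation is playing, focus can be moved
    FULL_SCREEN = auto()   # one media shown alone, presentation paused


class Viewer:
    """Toy model of the two user interface states; key names are hypothetical."""

    SCROLL_STEPS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self, focusable):
        self.focusable = focusable  # list of (name, is_scrollable) pairs
        self.focus = 0              # index of the focused element
        self.mode = Mode.PRESENTATION
        self.paused = False
        self.offset = [0, 0]        # scroll position (x, y) in full screen mode

    def press(self, key):
        if self.mode is Mode.PRESENTATION:
            # All four scroll keys move the focus (assumed linear order).
            if key in ("up", "left"):
                self.focus = (self.focus - 1) % len(self.focusable)
            elif key in ("down", "right"):
                self.focus = (self.focus + 1) % len(self.focusable)
            elif key == "middle" and self.focusable[self.focus][1]:
                # Open the focused scrollable media and pause the presentation.
                self.mode = Mode.FULL_SCREEN
                self.paused = True
                self.offset = [0, 0]
        elif key in self.SCROLL_STEPS:
            # In full screen mode the same keys scroll in both directions.
            dx, dy = self.SCROLL_STEPS[key]
            self.offset[0] = max(0, self.offset[0] + dx)
            self.offset[1] = max(0, self.offset[1] + dy)
        elif key == "back":
            # Return to the presentation and resume playback.
            self.mode = Mode.PRESENTATION
            self.paused = False
```

Because scrolling only exists in the full screen state, the same four keys can serve two purposes without any ambiguity for the user.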

8 DISCUSSION

This chapter goes through the most important points of this thesis and discusses the results.

MMS is a strong competitor in the future of mobile messaging. Its capabilities are currently limited by the simple scene description language profile used, so there is a clear need for 3GPP SMIL. The advanced features will be most valuable for content created by professionals on a workstation, used for services and advertisements provided by companies to end-users. Person-to-person content based on templates can, however, also benefit from 3GPP SMIL.

Many features of SMIL make it a particularly good scene description language. Possibly the most important is its profiling mechanism that makes it easy to create different kinds of SMIL profiles for different usage. This makes the same language usable for describing the current, very simple multimedia messages, but also for describing complex presentations for workstation usage. The content control features allow content creators to make the content adapt to system limitations. Moreover, even the somewhat limited 3GPP SMIL is very powerful and has very few limitations as to what can be described.

SMIL syntax is still quite simple, allowing simple presentations to be authored using a standard text editor. It should nevertheless be noted that viewing complex 3GPP SMIL presentations on the current low-end mobile phones is not feasible, and thus the added features will decrease interoperability.

Besides the SMIL profile of MMS, the media types and formats also affect the complexity of a multimedia message viewer application. Fortunately, there are already standards for audio and video formats for mobile phones, and these formats are supported by most mobile phones supporting advanced media, so this should not be a problem for MMS.

The Nokia Series 40 platform was used as the target platform for this thesis. The platform is rich enough for utilizing 3GPP SMIL, but some features will cause problems. The biggest difficulties will be encountered with presentations that have been designed for workstation usage, because many platform features, especially screen size, are at a different level. Besides this, video is a problematic feature, and the following limitations are probably needed for video: nothing can be shown on top of it (z-index is ignored), transitions are not supported, simultaneous instances are not supported, and possibly scaling is not supported either. Additional troublesome features are synchronous audio clips, rapid transitions, and other rapid events. Nevertheless, the resulting user experience can be very rich even with these limitations, particularly compared to that of current MMS SMIL.

The physical user interface of a mobile phone is quite simple and limited, especially compared to that of a PC. It was shown that it is still possible to support 3GPP SMIL features with the Series 40 user interface. However, in order to make the user experience as good as possible, it needs to be evaluated which of the features presented are really needed and what the priorities for using those features are. The advanced user interface presented is probably too complex for a mass-market product.

As a whole, 3GPP SMIL is a good next step for MMS. It gives possibilities for richer user experiences, but is still feasible to implement on at least the Series 40 platform. The current lowest-end platforms will presumably not be rich enough to support all of the needed features.

Some issues may still become problematic, mostly regarding interoperability. There are phones with different kinds of screen sizes and other features, and PC-to-phone messaging is increasing as well. 3GPP SMIL does have mechanisms to deal with this diversity, so that the presentation can for example use a simpler layout on a smaller screen, or leave out some media depending on the platform characteristics. Much of the burden is on the content creators, that is, the people designing the presentations or the templates to be used for phone-to-phone messaging.
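
As an illustration of such a mechanism, the sketch below mimics how a viewer might evaluate a SMIL switch element, in which the first acceptable child is selected. It assumes the SMIL 2.0 systemScreenSize test attribute with a value of the form height "X" width; the file names, the screen sizes, and the helper functions are hypothetical, and a real viewer would evaluate other test attributes as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical content: a large image, a small image, and a text fallback.
SWITCH_XML = """
<switch>
  <img src="large.jpg" systemScreenSize="480X640"/>
  <img src="small.jpg" systemScreenSize="128X128"/>
  <text src="fallback.txt"/>
</switch>
"""


def screen_is_large_enough(required: str, screen_h: int, screen_w: int) -> bool:
    """Check a 'heightXwidth' requirement against the terminal's screen."""
    height, width = (int(value) for value in required.upper().split("X"))
    return screen_h >= height and screen_w >= width


def select_child(switch, screen_h: int, screen_w: int):
    """Return the first child whose screen size requirement is met."""
    for child in switch:
        required = child.get("systemScreenSize")
        if required is None or screen_is_large_enough(required, screen_h, screen_w):
            return child
    return None


switch = ET.fromstring(SWITCH_XML.strip())
chosen = select_child(switch, screen_h=128, screen_w=128)
print(chosen.get("src"))
```

The order of the alternatives matters: the content creator lists the richest alternative first and ends with one that carries no requirement, so that every terminal finds something to show.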

It is also clear that because the content is richer, there will be more use cases, and thus more diversity in how different phones will handle some special cases. Therefore, the content creators will most probably not be able to use all of the available features in content targeted to multiple platforms, or at least the content should not be dependent on advanced features like SMIL events or showing something on top of video. These kinds of guidelines will most probably be listed in future MMS documents.

9 SUMMARY

This thesis studies viewing multimedia presentations according to the 3GPP SMIL standard on a size driven mobile phone. The Nokia Series 40 platform is used as the target.

The thesis shows that 3GPP SMIL is a feasible standard, and can be utilized on a Series 40 mobile phone. The 3GPP SMIL profile is much richer than the SMIL profile currently used in the Multimedia Messaging Service (MMS), and can thus provide a better user experience.

Interoperability will be an issue, especially for messages sent from a PC. The small screen size of the terminal is the biggest problem, but the smaller processor and memory capacity may also cause problems. Phone-to-phone messaging should not be affected as much by these matters.

User interface design for a 3GPP SMIL viewer is non-trivial, because there are so many actions the user should be able to take. The thesis includes a user interface design that proves, though it suffers from complexity, that such a design supporting all of the features can be made.

10 REFERENCES

[3GPP22.140] 3GPP (June 2003), Technical Specification 22.140 V6.2.0: Multimedia Messaging Service (MMS); Stage 1

[3GPP23.040] 3GPP (June 2003), Technical Specification 23.040 V6.1.0: Technical realization of the Short Message Service (SMS)

[3GPP23.140] 3GPP (June 2003), Technical Specification 23.140 V6.2.0: Multimedia Messaging Service (MMS); Functional description; Stage 2

[3GPP26.140] 3GPP (December 2002), Technical Specification 26.140 V5.2.0: Multimedia Messaging Service (MMS); Media formats and codecs

[3GPP26.234] 3GPP (June 2003), Technical Specification 26.234 V5.5.0: Transparent end-to-end Packet-switched Streaming Service (PSS); Protocols and codecs

[ITU-E.164] International Telecommunication Union (May 1997), E.164: The international public telecommunication numbering plan

[Le Bodic] Le Bodic, Gwenaël (December 2002), Mobile Messaging technologies and services: SMS, EMS and MMS, John Wiley & Sons Ltd

[NOK-S40UI] Nokia Mobile Phones (January 2003), Nokia Series 40 UI Style Guide v1.0, available from http://forum.nokia.com

[OMA-MMSArc] OMA (November 2002), Multimedia Messaging Service: Architecture Overview, Version 1.1 (OMA-WAP-MMS-ARCH-v1_1-20021101-C)

[OMA-MMSCTr] OMA (October 2002), Multimedia Messaging Service: Client Transactions, Version 1.1 (OMA-WAP-MMS-CTR-v1_1-20021031-C)

[OMA-MMSCon] OMA (February 2002), MMS Conformance Document, Version 2.0.0 (OMA-IOP-MMSCONF-2_0_0-20020206C)

[OMA-MMSEnc] OMA (October 2002), Multimedia Messaging Service: Encapsulation Protocol, Version 1.1 (OMA-MMS-ENC-v1_1-20021030-C)

[RFC-2045] Freed, N. and Borenstein, N. (November 1996), Request for Comments 2045: Multipurpose Internet Mail Extensions: Part One: Format of Internet Message Bodies

[RFC-2046] Freed, N. and Borenstein, N. (November 1996), Request for Comments 2046: Multipurpose Internet Mail Extensions: Part Two: Media Types

[RFC-2047] Moore, K. (November 1996), Request for Comments 2047: Multipurpose Internet Mail Extensions: Part Three: Message Header Extensions for Non-ASCII Text

[RFC-2387] Levinson, E. (August 1998), Request for Comments 2387: The MIME Multipart/Related Content-type, The Internet Society

[RFC-2397] Masinter, L. (August 1998), Request for Comments 2397: The "data" URL scheme, The Internet Society

[RFC-2822] Resnick, P. (editor) (April 2001), Request for Comments 2822: Internet Message Format, The Internet Society

[SNE-EMS-Dg] Sony Ericsson (August 2003), Enhanced Messaging Service (EMS): Developers Guidelines, Fourth edition

[W3C-SMIL1] W3C (June 1998), Recommendation, Synchronized Multimedia Integration Language (SMIL) 1.0, http://www.w3.org/TR/REC-smil/

[W3C-SMIL2] W3C (August 2001), Recommendation, Synchronized Multimedia Integration Language (SMIL 2.0), http://www.w3.org/TR/smil20

[W3C-XML] W3C (October 2000), Recommendation, Extensible Markup Language (XML) 1.0 (Second Edition), http://www.w3.org/TR/REC-xml

[WAP-230WSP] Wireless Application Protocol Forum (July 2001), Wireless Application Protocol: Wireless Session Protocol Specification (WAP-230-WSP-20010705-a)

A SWEDISH SUMMARY

A.1 Introduction

Mobile messaging has become an ever larger business all around the world. The richness of multimedia messages is limited by the language used for describing the interaction between and the synchronization of the media. Synchronized Multimedia Integration Language (SMIL) has become the industry standard for this purpose, and Third Generation Partnership Project (3GPP) SMIL is a proposal for the next version of SMIL for the Multimedia Messaging Service (MMS).

This thesis studies the use of 3GPP SMIL for describing messages. Nokia Series 40 is used as the target platform.

MMS, SMIL and the Series 40 platform are described in sufficient detail, after which the problems that may arise and their solutions are discussed.

A.2 Multimedia Messaging Service

A.2.1 From SMS to MMS

The text messaging service (“Short Messaging Service”, SMS) was introduced commercially for GSM in 1992. Nowadays hundreds of millions of text messages are sent every year, also between different kinds of networks. A text message can contain only 140 octets of data, that is, 160 characters with seven-bit encoding. The system is nevertheless used for many different purposes, e.g. news, e-mail and weather services.
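
The relation between the 140-octet payload and the 160-character limit is plain bit arithmetic, which the following short sketch makes explicit. The function name is hypothetical, and the 8-bit and 16-bit cases are added only for comparison; they are not discussed in this summary.

```python
def max_characters(payload_octets: int = 140, bits_per_character: int = 7) -> int:
    """Number of whole characters that fit into the message payload."""
    return (payload_octets * 8) // bits_per_character


print(max_characters())                       # seven-bit encoding
print(max_characters(bits_per_character=8))   # eight-bit data, for comparison
print(max_characters(bits_per_character=16))  # 16-bit characters, for comparison
```

The payload holds 140 × 8 = 1120 bits, and 1120 / 7 gives exactly 160 characters.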

A.2.2 Features of MMS

As the name says, MMS supports the transfer of true multimedia content. To the user the service may look quite similar to SMS, but the technology behind it is very different. MMS has been designed not to depend on any particular transport technology: the same service works in third generation (3G) networks as well as in standard GSM networks.

A multimedia message can contain many files, which are combined according to [RFC-2822]. The files can contain anything, usually different types of media such as text, images, audio or video. For the message to form a presentation, it can contain a file that describes the graphical layout and the temporal relationships of the media, a so-called scene description.

Multimedia messages can be sent to e-mail addresses, have several recipients and carry priorities. In current systems the message size is limited to 100 kilobytes, but this will change in the future.

MMS is defined by two different organizations, the Open Mobile Alliance (OMA) and the Third Generation Partnership Project (3GPP). The most important documents are listed in Table A.1.

Table A.1. The documents that define MMS

By      Document
3GPP    • Multimedia Messaging Service (MMS); Stage 1 (TS 22.140) [3GPP22.140]
        • Multimedia Messaging Service (MMS); Functional description; Stage 2 (TS 23.140) [3GPP23.140]
OMA     • Multimedia Messaging Service: Architecture Overview [OMA-MMSArc]
        • Multimedia Messaging Service: Client Transactions [OMA-MMSCTr]
        • Multimedia Messaging Service: Encapsulation Protocol [OMA-MMSEnc]

A.3 SMIL

The abbreviation SMIL stands for “Synchronized Multimedia Integration Language”. It is a language based on XML [W3C-XML] that is used for describing multimedia presentations. The language was created for describing multimedia on the World Wide Web, but it has also been adopted for multimedia messages. The newest version, SMIL 2.0, is defined in [W3C-SMIL2].

A.3.1 Syntax and structure

SMIL is easy to understand for anyone who is familiar with elementary HTML. The language is presented here briefly with an example that contains the most commonly used features. Figure A.1 shows the code of a working SMIL presentation, and the following paragraphs describe the different tags and their functions.

Figure A.1 A simple SMIL document

<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head>
    <layout>
      <root-layout width="170" height="208" />
      <region id="Text" width="100%" height="25%" left="0%" top="75%" fit="scroll" />
      <region id="Bild" width="100%" height="75%" left="0%" top="0%" fit="slice" />
    </layout>
  </head>
  <body>
    <seq>
      <audio src="fem_sekunder.mid"/>
      <par dur="5s">
        <img src="lappland.jpg" region="Bild" dur="4s"/>
        <text src="lappland.txt" region="Text" begin="1s"/>
      </par>
      <text src="slutet.txt" region="Text" dur="3s"/>
    </seq>
  </body>
</smil>

A SMIL document always begins with the smil tag, which can contain the xmlns attribute for defining the version of the language. The root element contains two different elements with different purposes:

• The head element contains definitions that affect the whole presentation
• The body element defines how the different media behave in time

In the example the head element contains only a layout element, which determines the visual layout of the presentation. Its first child element, root-layout, gives the size of the presentation in pixels. The two following child elements each define a region, a rectangular area to which visual media can be bound. A region has the following attributes:

• id, the name of the region, which can be used for binding media to it
• width, height, left and top, which determine the size and position of the region, in this example relative to the size of the whole presentation
• fit, which defines how visible media are displayed in the region

The body element specifies the different media and their temporal behavior.

The media are defined with the tags audio, img and text, all of which have the src attribute giving the address of the media source, here as a file name. For visible media a region is also specified.

The example additionally contains two different timing attributes for media, dur and begin. The former defines the duration of the media, that is, how long the media is active, and the latter defines the time at which the media is activated. All times in the example are given in seconds.

In addition to media definitions, the body element contains two different so-called time containers, namely seq and par. These can be nested within each other in any combination. The seq element causes its child elements, which can be media or time containers, to be played as a sequence. The other element, par, causes its child elements to be played in parallel. The result of these elements is illustrated in Figure A.2.

Figure A.2 Synchronization of the media in the SMIL example (Figure A.1)
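
The timing of the example can also be worked out with a small sketch of how seq and par containers place their children on a timeline. It is a strong simplification of the SMIL timing model and rests on two assumptions: the audio clip is taken to last five seconds (as its file name suggests), and a media without a dur attribute is taken to stay active until its enclosing par ends. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Media:
    name: str
    begin: float = 0.0           # offset from the start given by the container
    dur: Optional[float] = None  # None: active until the enclosing par ends


@dataclass
class Container:
    kind: str                    # "seq" or "par"
    children: List[Union["Media", "Container"]]
    dur: Optional[float] = None  # only used for par in this sketch


def schedule(node, start: float, out: List[Tuple[str, float, float]],
             limit: Optional[float] = None) -> float:
    """Append (name, begin, end) for every media under node; return its end time."""
    if isinstance(node, Media):
        begin = start + node.begin
        if node.dur is not None:
            end = begin + node.dur
        elif limit is not None:
            end = limit
        else:
            raise ValueError(f"cannot determine the duration of {node.name}")
        if limit is not None:
            end = min(end, limit)  # a par with a dur cuts its children off
        out.append((node.name, begin, end))
        return end
    if node.kind == "seq":
        t = start
        for child in node.children:  # each child starts when the previous ends
            t = schedule(child, t, out, limit)
        return t
    # par: all children share the same start time
    par_end = start + node.dur if node.dur is not None else limit
    ends = [schedule(child, start, out, par_end) for child in node.children]
    return par_end if par_end is not None else max(ends)


# The body of Figure A.1; the 5 s audio duration is an assumption.
body = Container("seq", [
    Media("fem_sekunder.mid", dur=5),
    Container("par", [
        Media("lappland.jpg", dur=4),
        Media("lappland.txt", begin=1),
    ], dur=5),
    Media("slutet.txt", dur=3),
])

timeline: List[Tuple[str, float, float]] = []
total = schedule(body, 0.0, timeline)
for name, begin, end in timeline:
    print(f"{name}: {begin:g} s to {end:g} s")
```

Under these assumptions the audio plays from 0 to 5 s, the image from 5 to 9 s, the first text from 6 to 10 s, and the closing text from 10 to 13 s.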

Besides what was presented in the example, SMIL can for instance define media that depend on user input, links to different times within the presentation or to web addresses, and visual transitions between media.

A.4 3GPP multimedia message

Different multimedia message versions differ from each other not only in their SMIL features but also in which types of media are supported. The target version for this work is the 3GPP multimedia messaging documents, version 5.2.0. These define that the SMIL player shall support audio, video, image and text media, as well as the 3GPP PSS SMIL profile. The exact formats can be found in the documents.

A.5 Series 40 platform

The target platform for this work is Nokia's Series 40, and the Nokia 6230 (Figure A.3) was chosen as the exact terminal. The phone has 8 Mbyte of working memory and two processors: a central processor and a digital signal processor. The screen is 128x128 pixels, and the keypad consists of a power key, volume keys, a five-way scroll key, two softkeys, a call key, an end key and number keys.

Figure A.3 Keypad of the Nokia 6230

The user interface of the terminal is built on the principle that applications can freely use the scroll key and the left and right softkeys, while the other keys are used less often. The scroll key also works as a third, middle, softkey. The volume keys naturally keep their function in the SMIL player as well.

A.6 3GPP SMIL on a Series 40 terminal

3GPP SMIL is considerably less limited than the language currently supported by Series 40 terminals. The biggest problems are caused by SMIL presentations that have been made to be shown on a personal computer. In that case the original target terminal has a much larger screen, more processing power and more memory. The biggest problem is the screen size.

When the presentation is larger than the screen, there are two alternative solutions: scrolling the whole presentation, or shrinking the presentation so that it can be shown all at once. Scrolling seldom gives good results, because then only a part of the presentation, which may contain many different active elements, is visible. In addition, it increases the complexity of the user interface. When the presentation is shrunk to the actual screen size, the whole presentation is at least visible, even though small details may disappear. Usability should be given priority.
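
Shrinking a presentation to the screen can be sketched as a uniform scaling that preserves the aspect ratio. The example below uses the root-layout size of Figure A.1 (170x208) and the 128x128 screen of the Nokia 6230; the function name and the choice to round down to whole pixels are assumptions made for the example.

```python
from fractions import Fraction
from typing import Tuple


def scale_to_fit(pres_w: int, pres_h: int, screen_w: int, screen_h: int) -> Tuple[int, int]:
    """Return the presentation size after uniform shrinking to fit the screen."""
    # Exact fractions avoid rounding surprises; presentations are never enlarged.
    scale = min(Fraction(screen_w, pres_w), Fraction(screen_h, pres_h), Fraction(1))
    return int(pres_w * scale), int(pres_h * scale)


width, height = scale_to_fit(170, 208, 128, 128)
print(f"scaled presentation: {width}x{height} pixels")
```

Here the height is the limiting dimension, so the presentation is shown at 104x128 pixels, roughly 62 % of its original size, which indicates how much detail may be lost.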

As a last resort for showing a problematic presentation, all the media it contains can be listed so that the user can open each of them separately. The user can then at least make use of the information in the message.

A.7 User interface design

The interface design presented here is based on the following choices, made to simplify the interface: the presentation as a whole cannot be scrolled, and media can be scrolled only vertically. The latter is natural, because scrolling is mostly needed for text media, which can be divided into lines of suitable length.

The right softkey is used for exiting the application, which is standard on the Series 40 platform. The left softkey opens a menu containing features that are needed less often.

Series 40 phones have no touch screen or mouse pointer. The concept of focus is used for selecting links, and the focus is visualized with a rectangle drawn around the media. All media that have links or are scrollable can be focused. The middle softkey is used for opening the link defined for the focused media.

The right and left scroll keys are used for moving the focus. The up and down keys scroll the focused media, if it is scrollable.

A.7.1 An alternative to media scrolling

As an alternative to scrolling media within a presentation, the following could be used: scrollable media are activated with the middle softkey, and when a media is selected the presentation stops and only the selected media is opened in full screen. The user can then scroll the media to see all of it, and return to the presentation with the right key. With this alternative the screen is used more effectively, for example when there is a longer text. Moreover, scrolling does not have to be limited to the vertical direction.

A.8 Discussion

This work studied the presentation of multimedia messages according to the 3GPP standard on a Series 40 mobile phone. The conclusion is that 3GPP SMIL is a feasible standard. The 3GPP SMIL profile is much more extensive than the one currently used in multimedia messages, and provides a richer user experience.

Interoperability between different kinds of terminals will cause problems, especially with messages sent from a personal computer, because the receiving terminal then has a much smaller screen and less processor and memory capacity.