EMOTITONES: A USER INTERFACE FOR MUSICAL COMMUNICATION
Shefali Kumar Friesen
Submitted in partial fulfillment of the requirements for the Master of Music in Music Technology
in the Department of Music and Performing Arts Professions in The Steinhardt School
New York University Advisors: Dr. Kenneth Peacock, Dr. Agnieszka Roginska
06/10/2011
Table of Contents
I. Background
   A. Foundations of musical communication 4
   B. Game-changing technology 5
II. Limitations and Motivations
   A. Limitations of current technology 7
   B. Motivations for designing a new interface 8
III. Factors influencing design of the user interface (UI)
   A. Emotion-based navigation and content categorization 10
   B. Media-rich messaging 16
   C. One-to-One Communication 21
   D. Mobile delivery 28
IV. Development
   A. Interface and message flow 34
   B. Application deployment 39
Application Screenshots 42
V. Discussion and Conclusions
   A. Summary 47
   B. Expanding features 48
   C. Future scope 50
   D. Predictions 51
   E. Limitations 52
   F. Emotitones as a tool 52
Appendix 55
References 56
I.
FOUNDATIONS OF MUSICAL COMMUNICATION
“Music is a fundamental channel of communication: it provides a means by which
people can share emotions, intentions, and meanings.”
-Hargreaves, MacDonald, and Miell
A. A few words on communication models
The word communication has taken on many definitions over time. While earlier
definitions specified valid communication channels, today we accept communication to
be the “imparting or exchange of information, ideas, or feelings”, or, simplified further,
“an act or instance of transmitting” (Merriam-Webster, 2011). Despite (or perhaps
because of) this broad definition, and the progressive connotation of the concept,
researchers in music continue to wrestle with demonstrating that music is a valid form
of communication.
Toward this end, many attempts have been made to prove that music is a language
and thus a form of communication. Researchers have tried to apply linguistic principles
such as semantics to music, claiming that for music to qualify as a language, it would
have to follow the same rules (Hargreaves et al., 2005). As scholars discovered, however,
music lends itself to structural semantics (notes, chords, etc.) but lacks definable
meaning (Kuhl, 2008). This “fluidity” of meaning in music has made the ‘language’ label
a difficult theory to support: while some characteristics, such as tempo and harmonic
progression, are measurable and definable, others, such as emotional response, are
personal and cultural, making them very difficult to measure by any means (Kuhl, 2008).
While the formal structure of music, together with its tendency to communicate
meaning, makes for an elegant metaphor between music and language, later
communication models, influenced by semiotics and cognitive science, placed emphasis
on the usage of language rather than on language as a system (Kuhl, 2008). In other
words, rather than proving that music is a language, it is more productive to examine
the parallels between the system of music and the system of language. As discussed in
section III, this approach supports music as a transmission of meaning, emotion, and
understanding.
From a historical perspective, music and language have fundamentally existed in all
human societies (Mithen, 2006). Some archeologists assert that music and language even
existed in prehistoric societies (Blacking, 1973; Sloboda, 1985). With research in favor of
these findings, musical communication today is simply a continuation of historic
behavior.
B. Information technology and music technology
“Information technologies have profoundly affected communication processes by
simplifying queries, liberating energy, reorganizing semantic axes and points of view,
and reorienting the relationship between the production of meaning and our socio-
technical environment” (Tanzi, 1999).
It has always been the role of communication technology to enable more meaningful
human connections and to increase both the efficiency and frequency of information
exchange. Regardless of the innovator’s motivation, communication technologies alter
cultural behaviors and typically make the world more accessible.
Just as communication has been influenced by information technologies, musical
communication has also been influenced by music technologies. One example is the
advent of the radio broadcast and a behavior it produced: the music dedication. This
cultural practice began with the Long Distance Dedication on the American Top 40
radio show in 1970, in which the host, Casey Kasem, would play a mailed-in record
while reading the accompanying letter dedicated to a listener. The tradition has
continued for over four decades on the radio, and through mix-tapes, mix-CDs, and
digitally shared playlists.
Another game-changing music technology innovation is the mp3. The Moving
Picture Experts Group made a breakthrough with their implementation of lossy data
compression and perceptual coding (http://mpeg.chiariglione.org/). As a result, music
files could be relatively small without compromising auditory quality. Music is now
everywhere, embedded in both online and offline communication behaviors (discussed
in section III).
The ringback, while not quite a game-changer, facilitated a more intimate and
direct form of musical communication. In 2001, engineers found a way to manipulate
what was heard on the caller’s side of a ringing phone. For the first time, a mobile
phone owner could designate a unique song that each contact would listen to instead of
the typical ring. Each song selected would communicate the dynamic of the relationship
in some way.
Other examples of game-changers are the digital recorder, digital audio
workstations (DAWs), mp3 players, and mobile music such as ringtones, all of which
either changed or introduced new ways of communicating with music.
II.
LIMITATIONS AND MOTIVATIONS
A. Limitations of current technology
As technology advances, so does communication, particularly in the age of social
networking. Short Message Service (SMS) texts are the most prevalent form of
communication worldwide, remarkable in their ability to reach anyone, anywhere,
anytime. However, as revealed by case studies (shared in coming sections), text-based
technologies often lead to misinterpretations of tone and meaning. Whether the absence
of intonation in an email leads to errors in sizing up a relationship with a colleague, or a
teenager misinterprets a “to the point” communication style to mean her loved one is
laconic and uninterested, it is clear that today’s digital communication lacks the clarity
of vocal communication, or better yet, face-to-face interaction. “It is too easy to
misunderstand something if you can’t read the body language and the faces - a single
word can easily be misunderstood.” (Stald, 2008).
Aside from the lack of emotional cues in today’s communication channels, the
number of words allowed to convey an idea is often limited to 140-160 characters.
Even an eloquent communicator would have trouble conveying his or her mood
accurately.
Another limitation, explored in the next section, is that free and novel
communication platforms tend to favor the broadcasting communicator - the
contributor to the live “feed” - over the thoughtful, expressive communicator. The
rapid rate of information exchange has made communication fleeting in some ways.
The result is many users who have a voice, but do not know what to say, or whom to
say it to.
One coping strategy for these emotional limitations is the emoticon: a face made
from punctuation and/or letters to convey mood. Emoticons date back to the 1800s as a
concise way of representing emotions in theater (Bierce, 1912), but digitally they have
surfaced in a number of ways. For example, when text-based virtual realities called
MUDs (multi-user domains) began appearing in the 90s, they gave rise to an exclusive
and universally understood language made up of “idioms, acronyms, and iconic
emoticons,” which were created not only “to economize keystrokes, but also to help
define the contexts for conversations, establishing responsiveness and attentiveness,
communicating understanding, initiating play, describing actions in real life and
conveying mood, feeling and emotion” (Cherny, 1995).
Today, the prevalence of emoticons is significant. In instant messenger applications
from AOL IM to Blackberry’s BBM, emoticon menus are embedded and quick to call
upon to establish a mood, clarify context, and convey other shared meanings.
B. Motivations for designing a new interface
The Emotitones name derives from the concept of emoticons. Emotitones are, in
some ways, auditory emoticons, which, although longer and more complex, aim to
provide an enriching, non-text-based tool for expression (see Expanding Features in the
Discussion). The logo for Emotitones is an emoticon with an eighth note for a mouth.
The motivation of providing an emotionally rich channel for communication is
supplemented by the motivation to help the music industry, a business which has been
wounded and dysfunctional for over a decade. The past decade has also seen the
beginning of a comeback of sorts for the industry; even so, the model is still broken,
and solutions are needed to restore public opinion about the general value of music.
From advertising and sync licensing to ringtones and radio, music has fulfilled many
commercial needs by tapping into the emotions of the end-user. In each case, however,
music is consumed passively. To demonstrate the greater value inherent in a song, the
listener must be actively engaged by music, or by a musical experience, and this
engagement should result in an action (or transaction) of some kind.
From an artistic perspective, lyrics are a powerful but overlooked communication
tool. They convey common emotions and situations relatable to any given person at
any given time. This is evident in dedications, musical greetings, mix-tapes, and music
sharing, and it was a motivation for constructing the Emotitones platform.
As mentioned, musical interaction and communication are not novel concepts.
Emotitones aims to focus on the shared meaning between sender and receiver, instead
of performer and listener. This will facilitate a shift from passive listening to active
communication.
Given this motivation and the limitations of emotional expressivity in today’s digital
communication, this thesis proposes a user interface for musical communication, called
Emotitones, and describes the elements contributing to the design of the UI.
III.
FACTORS INFLUENCING THE DESIGN OF THE USER INTERFACE (UI)
After committing to creating a solution for musical communication, several decisions
went into the design of the Emotitones user interface. While prototyping the user
experience felt relatively natural, significant research went into each design element.
The first decision was to make the navigation and categorization emotion-based.
A. Emotion-based navigation and content categorization
Navigation is the backbone of user experience in many applications. It is a
prediction of user mindset, and must address the following questions: what will the
user be thinking from page to page, and what will motivate or guide them to complete
the desired action?
With musical communication as the end goal, it was crucial to focus on the
message; not text-based information, but rather audio-based emotion. To choose
emotion-based navigation, it was necessary to find confirmation that emotional
expressivity in music exists; in other words, adequate research suggesting that music is
effective at conveying emotion (or concise ideas/sentiments) was required.
The most compelling research addressing the emotional expressivity of music
comes from studies in cognitive neuroscience within the last five years. Researchers
know that it is possible to prime a stimulus for an expected result; in other words, by
presenting one concept to a subject, the expectation of another concept can be created.
The former concept is called a prime, and is more or less effective based on how
related it is to the latter concept. This phenomenon has been used to test several
theories of cognitive processing. Before highlighting these studies, some brief comments
on priming and the N400 are necessary.
The brain has at least two systems: the first is implicit and responds quickly and
automatically; the other is an explicit system designed to interpret signals from the
implicit system (Kubovy, 2008). Priming, an important factor in recent experiments
measuring the cognitive processing of music, is an effect of implicit memory, in
which exposure to a stimulus influences the response to a later stimulus (Kolb &
Whishaw, 2003). Table 1.1 shows an example of priming described by Michael Kubovy
in the Library of Congress lecture series Music and the Brain:
      subject 1    subject 2
      CARPET       CARPET
      COUCH        DISK
      SOFA         SOFA
      HAMMER       TRUCK
      CAR          CAR
      PICTURE      PICTURE
                 table 1.1

In table 1.1, the words being compared are SOFA and SOFA. For subject 1, because
SOFA is preceded by the word COUCH, it is said to have been primed for an expected
result, whereas for subject 2, SOFA preceded by DISK has not been primed for the
expected result. Similarly, CAR has been primed by TRUCK, but CAR preceded by
HAMMER has not. When a stimulus has been primed effectively, it can be accessed
more readily than the same unprimed stimulus, leading to more streamlined cognitive
processing (Kolb & Whishaw, 2003). To measure such claims,
electroencephalography (EEG) was used to record the faint electromagnetic fields
emitted by the brain - more specifically, waves below 100 Hz indicating brain activity
(Niedermeyer & Silva, 2004). The voltages were picked up by electrodes, amplified, and
recorded into a computer for analysis.
Before measuring activity related to unexpected stimuli in experiments on
cognitive processing, it is important to first measure responses to expected stimuli.
Kubovy again illustrates an example: the word “dog” was displayed on a screen
repeatedly while the subject’s brain waves were recorded. The recorded responses were
aligned to the onset of each stimulus presentation and averaged across trials, resulting
in the event-related potential, or ERP. This is, in a sense, an extraction of how the brain
responds to a stimulus (Kubovy & Shatin, 2009). Unexpected stimuli produce negative
voltage events at various time intervals after the stimulus is introduced. For example, in
the sentence “I like my coffee with cream and dog,” the unrelated and unexpected
presence of “dog” prompts a reaction in the brain during processing. This ERP occurs
400 milliseconds after the unexpected stimulus, and is thus called the N400. The N400 is
present in many of the studies at hand (Kubovy, 2006).
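The trial-averaging procedure behind the ERP can be sketched in a few lines. The sketch below is illustrative and not taken from any of the studies cited here: it assumes a single EEG channel stored as a NumPy array together with the sample indices of each stimulus onset, extracts an epoch around every onset, and averages across trials. The helper then reads off the most negative deflection in the 300-500 ms window where an N400 would appear.

```python
import numpy as np

def compute_erp(eeg, onsets, fs, pre=0.1, post=0.8):
    """Average EEG epochs time-locked to stimulus onsets.

    eeg      : 1-D array of continuous single-channel EEG samples
    onsets   : sample indices at which the stimulus was presented
    fs       : sampling rate in Hz
    pre/post : seconds of signal kept before/after each onset
    """
    n_pre, n_post = int(pre * fs), int(post * fs)
    # one row per trial, time-locked to the stimulus
    epochs = np.stack([eeg[t - n_pre : t + n_post] for t in onsets])
    erp = epochs.mean(axis=0)  # averaging cancels activity unrelated to the stimulus
    times = np.arange(-n_pre, n_post) / fs  # seconds relative to stimulus onset
    return times, erp

def n400_peak(times, erp, window=(0.3, 0.5)):
    """Latency and amplitude of the most negative deflection 300-500 ms post-stimulus."""
    mask = (times >= window[0]) & (times <= window[1])
    i = np.argmin(erp[mask])
    return times[mask][i], erp[mask][i]
```

The key design point is that the deflection of interest is far smaller than the background EEG on any single trial; only averaging many time-locked trials makes it visible.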
Koelsch and his team achieved breakthroughs in making studies of cognitive
processing applicable to the processing of music. To do this, experiments used music
and language as two separate approaches to priming the same word, and the results
were compared.
In one language priming study, for example, the word “wide” or “wideness”
(translated from German) was primed by one sentence - “the gaze wandered off into
the distance” - while a second sentence was constructed with no association to the
word wide: “The manacles (handcuffs) allow only a little movement.” The result was as
expected: a higher N400 for wide/wideness following the second (unrelated) sentence.
To test this in the musical realm, Koelsch et al. used a musical excerpt by Richard
Strauss (Salome, Op. 54) to evoke the feeling of the word “wide” or “wideness”. The
second musical excerpt was from a more dissonant and closed-feeling piece by Valpola
(an E-minor piece for accordion). Just as the related prime in the language experiment
yielded a lower N400 than the unrelated prime, the musical prime by Strauss resulted
in a lower N400 than the musical prime by Valpola. This followed from the character of
each piece: the Strauss excerpt is multi-instrumental, consonant, and seemingly grand
and expansive in sonic quality, evoking a feeling of wideness, while the Valpola piece
is dissonant, sparsely orchestrated, and seemingly closed. A smaller N400 peak
indicates less cognitive processing of a concept.
The illustration of the N400 responses (the peak on each chart as indicated on the
C2 chart) in figure 1.2 shows the cognitive responses to the concept of wideness using
language-based and music-based primes. The dotted line represents the ERPs (event
related potentials) responding to the unrelated primes, and the solid lines represent the
ERPs in response to the related primes. In examining the two charts comparing the
unrelated and related N400s for both music and language, it can be concluded that the
musical primes are just as effective at conveying the idea of wideness as the language
primes are. This is indicated specifically by the differences in N400 peak values
between the unrelated and related data points. The space between the two ERPs, for
both music- and language-primed stimuli, shows the readiness of the brain to receive
effectively primed concepts.
[Figure 1.2: Event-related potentials (N400) in response to related (solid) and unrelated
(dotted) primes, for priming using the Strauss piece and priming using the Valpola piece]

Koelsch and Steinbeis did a similar study recently, but instead of using excerpts of
related and unrelated music to prime words, they used musical chords - consonant and
dissonant, in major and minor keys, and of varying timbres - as primes for emotionally
congruous and incongruous words. N400s were again measured, and the emotionally
congruous words yielded a lower N400 in both musically trained and untrained
subjects (Steinbeis & Koelsch, 2010). This study of “affective priming effects of musical
sounds on the processing of word meaning” was a powerful contribution to the
discussion of musical communication. In a similar study on duration, it was shown that
this type of meaning can be conveyed within 250 milliseconds of hearing music (Bigand
et al., 2005), and in other studies, meaning is conveyed by the change of a single
semitone or timbral characteristic (Sloboda et al., 2007). The findings suggest that
“musical mode can affect the processing of language on an affective level.”
On the subject of ideas or affect conveyed by music, it is worth mentioning that all
the musical structures used in these studies represent concepts in one of three ways:
1) by imitation, 2) by association, or 3) by a sense of embodiment (Kubovy & Shatin,
2009). For example, when trying to prime a subject to choose a circle over a square, a
researcher may play a short musical excerpt that sounds “smooth” versus something
“angular”. Because the brain recognizes the embodiment of a circle, it can be primed by
characteristics embodying the same concept. This phenomenon of mixed metaphor has
been studied in great detail by experts in synesthesia, who assert that sounds and
music can represent a concept so powerfully that they convey meaning in other senses
entirely (Cytowic, 2009). The notion that music is representative of concepts and
emotions suggests that allowing a user to choose his or her own music to convey an
idea, based on emotion and categorization, would be effective.
Many musical representations in today’s society are most likely a result of cultural
learning (such as a listener’s association of fanfare with royalty). “By communicating
an emotion, however basic, music can refer to a variety of different affective states,
which are more or less, unanimously understood by listeners familiar with the musical
idiom” (Juslin, 2003). Representation by association is likely more culturally dependent
than other musical representations, and depends on the listener drawing on allusions
(Kubovy, 2006). With a rich cultural memory, listeners have implicit knowledge of the
music of their culture.
Even beyond cultural associations, recent studies have demonstrated that certain
emotions in music are recognized by both Western and non-Western listeners: subjects
were able to classify Western pieces as happy, sad, or scary without any familiarity
with the pieces (Fritz et al., 2009). This idea of universal emotions also supports
emotion-based navigation.
Unlike the past decade, in which the emotional expressiveness of music was limited
to theories and proofs dealing with structural comparisons between music and
language, there is now clearly enough support to base navigation and content
categorization on emotion. A fuller picture of what this means is explored in the
development section of this paper.
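As one concrete reading of this design decision, emotion-based content categorization can be sketched as a simple index from emotion tags to musical excerpts. The class and method names below are hypothetical, for illustration only, and are not drawn from the actual Emotitones implementation:

```python
from collections import defaultdict

class EmotionCatalog:
    """Index musical excerpts under emotion tags so a user can browse by feeling
    rather than by artist or genre. Illustrative sketch, not a real Emotitones API."""

    def __init__(self):
        self._index = defaultdict(list)

    def add(self, emotion, track):
        """File a track (e.g. a song-excerpt identifier) under an emotion tag."""
        self._index[emotion.strip().lower()].append(track)

    def browse(self, emotion):
        """Return every excerpt filed under the given emotion, in insertion order."""
        return list(self._index.get(emotion.strip().lower(), []))

    def emotions(self):
        """The navigation menu: all emotion tags that currently have content."""
        return sorted(self._index)
```

Normalizing the tag on both insertion and lookup is what lets navigation stay emotion-first: “Joy” and “joy” land in the same bucket, and the menu the user sees is simply the set of non-empty buckets.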
B. Media-rich messaging
The second UI decision to explore is the enabling of media-rich messaging - in this case,
attaching a musical excerpt to the message being communicated, or making the musical
excerpt the message itself. Research on this subject is led by the communication
sciences, but is becoming more cross-disciplinary due to the rise of social-networking
technologies. Research by Weber & Mitchell provides a good starting point for this
discussion of the effect multimedia has on communication efficiency.
Observably, digital users adopt multi-media into their communication on a regular
basis. Whether the media comes in the form of an embedded video link, or photo
attachments, the content becomes a crucial part of the message. Taken a step further,
with ever-evolving social platforms and enabling technologies, users (particularly
young users) find the means to modify these media artifacts to make them their own.
“Young people’s own digital productions facilitate a blending of media, genres,
experimentations, modifications, and reiterations, which Mizuko ”Mimi” Ito describes
as a media-mix” (Weber & Mitchell, 2008). This customization of media can be seen in
remixed audio and video files pervasive around the internet. Common examples
include film footage that has been overdubbed with audio created by the user or taken
from another source; in the many renditions of unlicensed cover songs; and in the
personalized computer animations used as digital greetings. This consumable nature of
‘media-mix’ is described by Henry Jenkins as being “production”; not production for
commercial purposes, but for “interactive consumption,” in which a user consumes
media including images, audio, and video, to create their own media productions
(Weber & Mitchell, 2008). Best paraphrased:
“users merge digital technologies with commercial media narratives in the context
of specific communities, in effect fusing and remaking both the narrative and the tool.
From early scrap-booking practices in Studio-era Hollywood to the audio mix tapes of
the 1970s, to the fan fiction and textual poaching explored by cultural studies
researchers, we know that viewers and readers have long “re-mixed” or poached
commercial culture” (McPherson, 2008).
The limitations of plain text have become apparent to digital culture, leaving
media-less messages best suited to purely informational purposes. While the production
aspect of communication is predominantly exercised by youth demographics, the
interactive consumption aspect of today’s digital communication has reached ubiquity.
This can be observed in social-networking sites such as Facebook and Twitter.
Historically, multimedia implementations have led to new styles of communication.
MUDs, as mentioned earlier, led to the creation of an exclusive language made of
idioms, acronyms, and emoticons. Another, more complex example is machinima
culture.
Machinima is the result of creating animated movies in real-time through video
game technology. Or more elaborately, visual narratives “created by recording events
and performances (filmmaking) with artistically created characters moved over time
(animation) within an adjustable virtual environment (3D game technology platform or
engine)” (Lowood, 2005). While the example seems to impose an esoteric knowledge
requirement on the user, the digital youth of today have a similarly fluent and complex
handle on multimedia, and, like machinima users, view interactive capabilities with
peers as equally valuable. Machinima users were able to exploit a technology platform
to express themselves while simultaneously creating a subculture. This aspect of
subculture is also important in supporting specifically music-rich messaging.
Media-rich messaging also supports the school of thought that multi-sensory
messages lead to more emotionally rich communication and experiences. “When
modeling a communication experience, designers tend to limit user interaction to visual
cues, occasionally accompanied by sound. But reality is actually multi-sensory and
packed with an array of complex emotional cues...“ (Metros, 1999), and “...the more
modalities a medium uses, for example images and sounds, the more senses are
activated and the more effective is the feeling of presence” (Stald, 2008). Support for the
assertion that delivering information through more than one sensory channel is
effective is found in the creation and implementation of earcons, auditory tools that
have been used to convey information for decades.
Lemmens et al define earcons as “audio messages used in human-computer
interfaces to provide information and feedback.” While they are typically short (often
less than 500 ms), they create strong associations, acting as cues for specific tasks that a
user carries out. Both Windows and Apple computers have a history of using earcons
which tell a user when they have carried out specific functions such as booting up their
computer; opening files; saving files; and putting files in the trash. Earcons can confirm
that a task has been carried out successfully; inform when an error or something
unexpected has occurred; warn when something is failing, or needs attention, and
occasionally act as bells and whistles to an otherwise mundane task. A surprising
number of studies have been done on earcons, including methods of creating them; how
musical elements contribute to their efficiency; the resulting associations formed by
users; the psychological impact of positive and negative earcons; and how they relate to
their visual counterpart. Most compelling in the discussion of music being used to
create emotionally-rich communication through multi-sensory experiences, are the
elements that earcon designers consider when approaching each audio cue. For
example, in the study by Lemmens et al., it was asserted that “the difference in affective
appreciation of the major and minor modes can be incorporated in the set of
transformations for earcons. The major/minor transformation can then be used
specifically to create affectively-charged earcons for use in affective human-computer
interfaces” (p. 2018). On the flip side, a 2010 study examined the potential hazard of
dissonant warnings for technical errors possibly creating too strong a negative visceral
response in the end-user.
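The major/minor transformation described above can be illustrated with a minimal synthesis sketch. This is an illustrative example, not code from any earcon study: it assumes simple additive sine synthesis of a triad, where the mode parameter controls whether the third sits four semitones (major) or three semitones (minor) above the root.

```python
import numpy as np

def triad_earcon(root_hz=440.0, mode="major", dur=0.5, fs=44100):
    """Synthesize a short triad earcon as a float array in [-1, 1].

    The major/minor transformation moves the third down one semitone,
    changing the cue's affective character while keeping everything else fixed.
    """
    third = 4 if mode == "major" else 3                # semitones above the root
    ratios = [1.0, 2 ** (third / 12), 2 ** (7 / 12)]   # root, third, perfect fifth
    t = np.arange(int(dur * fs)) / fs
    tone = sum(np.sin(2 * np.pi * root_hz * r * t) for r in ratios)
    tone *= np.exp(-4 * t)                             # decay envelope, avoids clicks
    return tone / np.max(np.abs(tone))                 # normalize
```

A confirmation cue might use the major variant and a warning the minor one; the point of the transformation is that a single controlled parameter carries the affective difference.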
In another study, called “Designing Earcons with Musical Grammars”, Hankinson
and Edwards recall early earcon designers who stayed away from compositions of
more than four notes so as to avoid musical associations and affect. Their study, on the
contrary, asserts that, used correctly, musical gestures and their associated grammars
applied to earcons can provide the user with rich information. This notion is confirmed
by other researchers who have pinpointed the capability of conveying affect through
pitch, rhythm, and timbre.
The concise nature and observable impact of earcons have made them an
intriguing subject for examining audio-visual tools. Methods in cognitive processing
(similar to those mentioned earlier) allow researchers to observe how earcons of
varying sonic quality affect the brain’s ability to process information, as well as how
they form affective responses. Congruency - how closely an earcon matches the
concept it is trying to convey - also plays a role. Researchers use stimulus-response
compatibility (SRC) to describe efficient implementations which result in improved user
performance (strong stimulus-response mappings) (Lemmens et al.). This is part of
affective computing research.
Despite the wide range of research, all are in agreement that earcons effectively convey
information, thus improving human-computer interaction. Best said in the context of
Emotitones: “Earcons could be used, in any program employing emoticons, to more
easily differentiate between positively and negatively valenced emotions” (Lemmens et
al., p. 2024).
In a culture where digital users receive information, form impressions, and share
perspectives through the consumption of multimedia, Emotitones facilitates this
communication behavior by allowing users to express themselves through the
multimedia content itself - more specifically, through musical content. Decisions
regarding the transmission of these messages begin with consideration of who the
receiving audience is.
C. One-to-One Communication
This section of the user interface discussion concerns the communication channel
specifically, and the decision to enable a one-to-one channel versus a broadcast.
The strength of research favoring peer-to-peer communication over the type of
communication demonstrated in blogging, status-update, and tweeting cultures rests
in the assumption that a communicator is more invested in a message directed at one
user than in a broadcast to an undefined group of people.
The main difference between the two approaches is that in direct communication, the
presence of the receiver/listener is crucial, and must be considered by the
sender/communicator. While communication theories defining these roles are
progressing as technology evolves, they stem from traditional models of
communication, and are adapted as needed.
The information transmission model by Shannon and Weaver was widely favored
in the mid-20th century. In this model, the communicator chooses a specific channel to
deliver a message to a targeted receiver (Hargreaves et al., 2005). Many musical
communication researchers consider this an oversimplification, arguing that
communication (musical or otherwise) involves creativity and interaction between the
performer (sender) and listener (receiver): the communication is "much more interactive
and re-creative than is suggested by the idea of information being passed from one
person (e.g. the performer) to another (the listener)” (Hargreaves et al., 2005). The
listener, they assert, has a role in defining or interpreting the message (or piece of
music), and therefore cannot be considered a passive recipient. Modern theories of
musical communication address this shortcoming; however, they lack consensus on
what roles the communicator and receiver play, and on whether or not musical
messages have coded meanings (Kendall & Carterette, 1990).
The distinction between composer and performer was accepted, however, and added
as an extra step in the communication chain. This means that the performer must first
decode, then interpret musical meaning, then re-encode the message before sending it
to the listener, where “each of these processes is dependent on the shared implicit and
explicit knowledge of all three participants in the chain, and is influenced by the
context and environment within which the process takes place” (Hargreaves et al.,
2005).
Figure 1.3: Juslin’s model of musical communication
Figure 1.3 shows the complex but elegant musical communication model by Juslin, who addressed the uncertainty between the listener’s perception and his or her affective response, and defined the composer’s role as a “causal” influence on the listener (Juslin, 2003). His studies also examined the translation of intention (the composer’s and the performer’s), and the resulting affective response in the listener. Because the composer’s intention is translated by the performer’s intention, the performance takes on acoustic features that affect and shape the listener’s perception. The patterns that the listener then recognizes and internalizes formulate a response, possibly emotional, and thus lead to a new mental state or experience (Hargreaves et al, 2005). In the case of Emotitones, it is reasonable to say that the sender is, in effect, a second performer, interpreting the original message of the composer and singer, and again encoding the piece of music just before sending it to a receiver, who will be influenced by four considerations: the original composition/writing, the original performance, the sender’s added comments/impressions, and finally, the receiver’s own associations with the piece.
Like Juslin, other scholars have been integral in addressing some of the subtleties
of communication chains. Speaking to the importance of the receiver, Johnson-Laird’s model asserts that when a communicator codes a message to the receiver, the message becomes symbolic: a representation of what the sender wishes to convey. The receiver must then decode the message, and therefore must have a mutual understanding of what the symbolic coding means (Hargreaves et al, 2005). While a performer on stage may opt for artistic liberties over direct, clear communication of a specific concept or idea, a user wishing to communicate an idea or emotion to another person will be unsuccessful should he/she opt to send a vague, coded message with no regard for mutual understanding. This is the difference between expression and communication. It is expected that designing the Emotitones UI around one-to-one communication (or communication within a small group) will prompt the sender to consider mutual understanding, and thus result in a more successful and fulfilling communication exchange.
On a side note, it has been suggested (astutely) by Tanzi, that the first
communication decision belongs to the composer: “The composer must decide whether
to hold on to sonic memories” or to let “algorithms dispose him of them. Music is thus
ultimately cognitive and anthropological, not merely musical” (Tanzi, 1999).
While progress in neuroscience has put musical communication models in the context of information processing in the cognitive system (discussed in III), and other work has modeled musical communication on language, using semantics and semiotics (also covered in III), still other studies focus on communication models as influenced by digital technologies in the age of social networking, and on the highly expressive nature of the communication that results.
The concept of expression as a form of everyday communication is a new
phenomenon, and one that indicates that a one-to-one channel for expressive
communication is a logical next step for a new technology interface. Today, the logistics
of message delivery (such as email and SMS platforms) are taken for granted, as users are surrounded by ubiquitous, technologically driven communication channels. In other words, if a user sends a text message, he or she does not feel uncertain about whether the message will reach the recipient. Studies today focus on other layers of complexity; for instance, instead of thought being spent on how an intended message gets from sender to receiver, users must consider the construction of the message itself, and what channel to use for its delivery. These are elements of new communication behaviors surfacing in the digital realm, and are being studied by researchers in several disciplines.
With the tools for media-rich communication readily available, and the wide
choice in channels for message delivery, individual expressivity plays a much greater
role. Communication exchange cannot happen without a series of individual decisions,
each a part of the communicator’s preference and identity. The following are studies
speaking to the influence of expressivity and identity in communication.
As new behaviors in the digital age emerge, theories and observations regarding self-perception and the formation of identity are arrived at by applying traditional schools of thought to modern, practical situations. Erving Goffman is still quoted and studied today by digital theory researchers.
His “impression management” speaks to the tendency for individuals to monitor and
guide others’ impressions by altering their own settings, physical appearance, and
manners (Goffman, 1959). In today’s context, the performance of self “applies not only
to face-to-face interaction, but also to asynchronous and real-time interaction on the
internet. While Goffman could not have predicted the dynamics of computer-mediated
interaction, his model works because users, socialized in face-to-face interaction are
often conscious of applying the rules of such interaction to the cyber world” (Westlake,
2008). This is reminiscent of facebook user behaviors. Posting, tagging, and updating status are actions typically broadcast to all other “friends” on a user’s profile, with each
post carefully deliberated. Goffman labelled social interaction as being “dramaturgical”
in that it is like a theater performance. His metaphorical “front” stage and “back” stage
distinguished between people acting or conforming to social rituals at gatherings, and people behaving naturally when not playing a role, free to be themselves, respectively (Buckingham, 2008).
stage’ performance are absent in the computer-mediated interaction (visual cues such as
clothing and facial expression and aural cues such as tone), they are replaced in chat
and on websites by more “staged” elements such as font, photographs, music, and
graphics” (Westlake, 2008). These staged elements become the characteristics of a digital
individual, who can “tell stories of sorts (often non-linear and multi-voiced) and leave a
digital trail, fingerprint, or photograph” (Weber & Mitchell, 2008).
The “production” and “interactive consumption” discussed earlier are also
identity forming. Weber and Mitchell credit reflexivity as one explanation of how
consumption and production contribute to identity formation: “Firstly their own media
production (both through its processes and its outcomes) forces young people to look at
themselves, sometimes through new eyes, providing feedback for further modification
of their self-representations. Secondly, the source materials and modes of young
people’s media production are often evident or transparent; the choices and processes
that they use reveal and identify them in ways that they themselves might not even
realize” (Weber & Mitchell, 2008).
Digital artifacts used in remixing and in expression over social media channels
range in media type, duration, and format. Music-based examples have the most
relevance in consideration of a musical communication tool.
“Music is one of the most widespread and significant cultural objects that enhance
dimensions of people’s everyday life, and thus has become a significant component in
the domains of cognitive, emotional, and social functionality” (Hargreaves & North,
1999).
The concept of music as a one-to-one interaction already exists through music sharing. Aside from those mentioned briefly in the introduction, peer-to-peer sharing applications, mp3 websites, and social-networking sites allowing profile music all enable music sharing. While in some cases these websites are used to display music in the public domain, music preferences or choices are often shared between peers. “Music represents a remarkable meeting point of the private and public realms, providing encounters of self-identity with collective identity” (Hesmondhalgh, 2008). This sophisticated, methodical approach to sharing music in a meaningful way is what Valcheva calls “Playlistism.”
In making playlists, people characterize themselves and express their personality while capturing the emotional state they are in (Dijik, 2006). Ebane et al go so far as to say playlists are a “reliable personality barometer and a locus for negotiations of meaning, identity, and online presence” (Ebane, Slaney, & White, 2004). Anecdotally, most would say this is true: music preferences have strong associations with subculture.
Frith has done many studies on this phenomenon and concluded that music functions as a “badge” for social beings. This badge-like quality of music “is claimed to communicate value, attitude, and opinion to others and thus a means of identity representation and self-expression” (Valcheva, 2009). Frith’s findings also assert that an individual’s musical selection highlights some of that person’s unconscious personality traits.
Several studies have examined the effects and functionality of music sharing
technologies (using playlists) including: iTunes (Voida et al, 2005), Napster (Brown et al,
2001), last.fm (Fitzpatrick 2008), Webjay (acquired and shut down by Yahoo), Push!
Music, and TunA (Bassoli et al, 2006) (Valcheva, 2009). In the case of last.fm, the platform allows users to share playlists, construct visualizations of musical taste, and express their identity through musical subculture. While TunA allows users to stream other users’ playlists in an “eavesdropping” manner, Push!Music is a novel system which allows users to “push” songs while mobile, in an effort to share music preferences and make personal recommendations. This peer-to-peer interaction increases the value of musical interaction by placing importance on the receiver; if a song is being sent as a recommendation, the sender has taken the receiver into consideration.
Making emotitones deliverable to individuals goes one step further; the sender must consider whether the message in the song itself is what should be communicated, not just the receiver’s potential affective response to that style of music.
D. Mobile delivery
Thus far, current research supports a user interface which hosts emotion-based navigation and content classification, media-rich messaging, and peer-to-peer communication. The next UI element to consider is the method of delivery. After surveying the current reigning information technologies, it was clear that Emotitones would have to consider delivery over the mobile platform.
“... seen in this very broad evolutionary perspective, the significance of the mobile
phone lies in empowering people to engage in communication, which is at the same
time free from the constraints of physical proximity and spatial immobility” (Geser,
2008).
One simple but powerful aspect favoring mobile devices is their worldwide dominance: their ubiquity. The economic research illustrating this worldwide dominance of mobile devices and mobile internet within the last two years alone is more than enough to justify this delivery method (there are over 4.6 billion mobile users in the world); however, the design of the Emotitones user interface is based on neurological, technological, and sociological analyses, not on economics.
On the subject of the ever-present nature of mobile phones is Stald’s account: “it is
ubiquitous in youth cultural contexts as a medium for constant updating, coordinating,
information access, and documentation. At the same time, the mobile is an important
medium for social networking, the enhancing of group and group identity, and for the
exchange between friends which is needed in the reflexive process of identity
construction.” The mobile is “the ideal tool to deal with the pace of information
exchange, the management of countless loose, close or intimate relations, the
coordination of ever-changing daily activities, and the insecurity of every day
life” (Stald, 2008). Stald’s findings were based on quantitative and qualitative studies of fifteen Danes, aged from their teens to mid-twenties, and their mobile habits.
The mobile phone is first and foremost a communicative device; however, due to the increasing number of capabilities and functions it is responsible for (e.g., email, GPS, entertainment, news/reference, timekeeping), it is becoming an object of necessity, one that is crucial for functioning in today’s society. Rich Ling asserts that mobile devices change the way in which daily life is organized and coordinated (Ling, 2004). With time traditionally serving as the meter for the coordination of daily life, Ling suggests: “Instead of relying on a mediating system, mobile telephony allows for
direct contact that is in many cases more interactive and more flexible than time-based
coordination” (Ling, 2004).
Aside from its urgent and necessary functions, the phone is also viewed as a personal log for day-to-day experiences (Stald, 2008). Media-capturing functionalities
allow users to document experiences through photos, notes, calendars and sound
samples/voice memos. As Stald found, the memories created and shared on mobile
devices inevitably lead to emotional connections felt with the phone’s backlog of digital files.
The emotive nature of the mobile phone, in its ability to connect loved ones, to function as a personal log, and to capture moments of communication and experience, evokes the imagery of Marshall McLuhan’s “extension of man.” Mobile users, particularly youth, have found several ways to personalize their devices as a medium, indicating further that there is an unarticulated emotional attachment between device and user. Some of these personalizations include background screen images, cell phone cases, ringtones, alarm tones, gaming, photo IDs, and so on; “through its basic
appearance, the decorative adaptations, the choice of ringtones, and other alerts, and
through screen background, the mobile itself provides signals about the user’s identity
or at least their self-perception. The use of language, spelling, their actual way of
interacting in dialogues, and the use of additional communicative elements and services
also reveal things about the user’s personal settings” (Stald, 2008).
The emotional accounts of young mobile users across studies range from keeping in touch through MMS messages and sharing moods and everyday events, to taking video of crowning moments and engaging in full conversations over instant messenger. These activities inevitably strengthen relationships and identity. This kind of emotional
expressivity of mobile devices supports the case for mobile delivery, but perhaps a more
compelling case is the emergence of phatic communication.
The mobile phone (via social media) has enabled communication functions which
traditionally were only present in verbal communication. As observed in interpersonal
communication and linguistics, phatic communication, commonly referred to as “small talk,” occurs when an exchange exists merely to confirm that a channel exists and is functional. As formalized by Russian linguist Roman Jakobson, this type of communication is not meant to convey any specific information or meaning, but instead acts merely to utilize a channel, to check that the channel is working, or to make a comment about that channel (Jakobson, 1959). These exchanges
carry understood meanings that depend not on the words themselves, but rather on the delivery and intention of the phraseology. As pointed out by Zegarac and Clark, despite the meaningless nature of the words comprising a phatic message, the interpretation of these messages has social effects (Nicolle & Clark, 1998). While there are many studies
in linguistics and communication sciences examining the content and intent of phatic
messages, Wang et al go further to define “phatic technologies” whose primary purpose
is to “establish, develop, and maintain human relationships”.
While much of phatic communication can seem thoughtless, Ling describes “grooming” messages (a type of phatic communication), which occur when one communicator lets another know that they are “there” for them and actively listening; this exchange serves to nurture the relationship.
The constant messages sent in youth culture for the purpose of “being thoughtful” (regardless of the lack of information in the message) have been compared to phatic communication in linguistics. The behaviors of SMS users frequently follow
phatic communication patterns, enabling small talk more so than conveying meaningful
information (Ling, 2004). Behaviors such as “poking” on facebook, or pinging through
instant messenger also demonstrate the digital application of phatic communication.
Additional research on the subject has been on the rise within the last decade as technology forms new communication behaviors, making devices such as the mobile phone crucial to understand. As it relates to the mobile phone, phatic communication is observed (as previously mentioned) as a social and emotive interaction that conveys no specific information, such as with the text “hey how are you?” or “what’s
observed in European countries, as well as in Africa, North America, Latin America,
and India (this is not an exhaustive list), and utilizes the ringing feature on mobile
devices or other sonic alerts to communicate a shared meaning with another user,
instead of the typical voice or text used to communicate (Kasesniemi et al, 2003).
In the study on Danish youth behavior, mobile users were observed exhibiting what is called “pilaris”, using the number of times a phone would ring to convey specific meaning (Stald, 2008). Mobile users observed in Donner’s study in Rwanda used “beeping”, via SMS/text messaging and missed calls, to communicate specific, previously determined meanings. According to the observations, there were three kinds of beeps
used: callback, pre-negotiated instrumental, and relational (Donner, 2007). Examples
given for “pre-negotiated instrumental” include “I’m thinking of you” or “Come pick
me up”. The behavior has spread so much so that an application was prototyped to
“support phatic communication in the hybrid space” (Bilandzic et al, 2009).
This behavior of using sonic alerts to communicate (only a small deviation from the idea of communicating through musical clips), the emotional connection mobile users feel to their devices, and the ubiquity and necessity of mobile devices make the case for incorporating mobile delivery in the Emotitones user interface.
IV.
DEVELOPMENT
A. Interface and message flow
The primary purpose of Emotitones is to enable an emotionally rich platform for communication. Observing that the effects of music are highly visceral in most cases (especially when dealing with affect), and with adequate and current supporting research (discussed in III), emotion-based navigation was implemented.
Emotion-based navigation revolves around the motivations of the expected Emotitones user. The premise for sending an emotitone is that a user desires a form of expression beyond simple text communication, which typically constrains emotional expressivity. This user has a predetermined emotion or sentiment in mind when visiting the platform; thus Emotitones navigation should reflect these emotional motivations, informing the user on how best to express a given emotion. In other words, from the moment a user logs in to the time they send an emotitone, they will be prompted to make functional decisions based on their emotions.
Part of these navigational decisions is making it easy for the user to find a suitable
piece of media content to represent their sentiment. In the UI, this is facilitated by
content categorization (database tagging) upon song clip ingestion, and a multi-
parametric search.
When dealing with content ingestion, or uploading content to the Emotitones database, song clips are chosen based on their ability to convey succinct ideas or emotions. Ordinarily, this happens through eloquent songwriting, in which the writer creates relatable, empathic lyrics; or through effective composition, in which the composer creates music evoking highly visceral responses in listeners. The beta catalog of emotitones includes primarily vocal music, in which it is requisite that the lyrics are concise, enunciated, and well-articulated. While the eventual database will include all
types of music and sound conveying various emotions and ideas, the initial collection of
song clips are somewhat literal for the purpose of developing a successful proof of
concept. Once selected for the database, each emotitone is categorized and tagged
according to the emotion or sentiment describing the over-arching theme being
conveyed. This is to enable effective search for an appropriate emotitone.
The emotional categories for the Emotitones beta were chosen as what is thought to be the most inclusive set of categories describing common, universal human sentiments. They are:
romance / encouragement / controversy / friendship / humor / spiritual / occasions / musings / all
The key difference between these sentimental categories and ones typically used in studies on music and emotion (such as happy, sad, angry, and scared) is that the sentiments must take into account shared meaning with the receiver, a consideration that is absent in many studies which focus on the emotional reaction of only one listener. For example, if happy were used instead of romance, it would be very difficult to find music and lyrics appropriate for the relationship dynamic at hand. Conversely, it is hard to think of a romantic song clip that would not be appropriate for a sender wishing to be romantic with the receiver (setting aside surface-level characteristics such as gender, and other subtleties to be discussed later). Other than the occasional browsing, it is hypothesized that users will send emotitones with a specific purpose and person in mind.
To facilitate finding appropriate emotitones, a three-parameter search was implemented, with the emotion-based “sentiment/occasion” category described above being the first parameter.
The second parameter for the sender to decide on is the genre of music, which is also part of the emotion-based search for an appropriate emotitone. It is hypothesized that the sender will have a genre preference based on his or her own musical tastes. As explored earlier, these musical preferences stem directly from affect, from the visceral effects of listening to a specific genre of music over time. The beta phase genres include:
rock / pop / hip hop / country / classics / world / other / all
While resources were consulted (charts, mp3 stores, etc.), these genre categories were chosen based on their inclusive, encompassing nature (of sub-genres), and on the strong presence of associated subcultures.
The third emotion-based search parameter that the sender must decide on is gender, the choices being:
male / female / all
This parameter was implemented with the anticipation that senders will have specific messages in mind for specific people, and thus a preference in gender for the first-person voice. This is an emotional consideration, with the hypothesis being that a song sent in the first-person voice of the sender’s own gender is more emotionally effective than one communicated in the opposite gender. Support for this could be found by surveying ringtone users as to which gender is preferred. This, of course, is only a starting point.
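The three-parameter search described above can be sketched as follows. This is a minimal, hypothetical illustration: SQLite stands in for the production MySQL database, and the table, columns, and sample clips are invented for the example, not taken from the actual Emotitones catalog.

```python
import sqlite3

# Hypothetical catalog; the real Emotitones table and column names are
# not specified in the text, and SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE emotitones (
    id INTEGER PRIMARY KEY,
    title TEXT,
    sentiment TEXT,   -- romance / encouragement / ... (beta categories)
    genre TEXT,       -- rock / pop / hip hop / ... (beta genres)
    gender TEXT       -- male / female
)""")
conn.executemany(
    "INSERT INTO emotitones (title, sentiment, genre, gender) VALUES (?, ?, ?, ?)",
    [("Clip A", "romance", "pop", "female"),
     ("Clip B", "romance", "rock", "male"),
     ("Clip C", "humor", "pop", "male")])

def search(sentiment="all", genre="all", gender="all"):
    """Three-parameter search; choosing 'all' disables a filter, as in the UI."""
    query = "SELECT title FROM emotitones WHERE 1=1"
    params = []
    for column, value in (("sentiment", sentiment),
                          ("genre", genre),
                          ("gender", gender)):
        if value != "all":
            query += f" AND {column} = ?"
            params.append(value)
    return [row[0] for row in conn.execute(query + " ORDER BY id", params)]

print(search(sentiment="romance", genre="pop"))  # ['Clip A']
print(search(sentiment="romance"))               # ['Clip A', 'Clip B']
```

The key property mirrored from the UI is that each parameter narrows the result set independently, with “all” acting as a wildcard.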
Continuing with the navigation flow, after the user goes through the multi-parametric search, they are invited to preview the resulting clips if desired, or they can proceed to sending the clip (or buying the full-length song). On the send clip page, the sender is given the opportunity to customize their message by adding text (and in the future, photos or video). This is the last part of the implemented emotion-based navigation.
The second design element of the UI discussed in III is media-rich messaging. While music was always intended to be the content through which users could communicate, decisions had to be made on catalog, duration, and file type.
The Emotitones beta is limited in terms of its categories and content. Conceptually, the catalog will house audio and visual clips representing the largest catalogs in the world. Only by giving users exhaustive options will they be able to communicate fully using the platform.
In terms of clip length, a decision was made to cap duration at 30 seconds. Full-length clips were not considered, as they are computationally expensive in terms of file size and delivery time, and in consideration of the ever-decreasing attention spans of digital users.
Fifteen to thirty seconds is the length of most song choruses. The ringtone edit of a song is typically this length, and is most often the most emotive part of a song, as well as the most concise in terms of idea or concept. Logistically, asking content providers for ringtone edits is easier to accommodate, as no further editing is required. In cases where new edits need to be made, this is handled in-house using Audacity.
The desired file type for music clips is mp3 at 128 kbps. In the application, the smaller the file size the better, and since the output speakers are likely to be of low quality, any higher-quality encoding would be undetectable.
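As a quick back-of-the-envelope check (a sketch added here, not from the original text), a 30-second clip at 128 kbps works out to roughly half a megabyte, which is comfortably within typical MMS size limits:

```python
# File-size estimate for a 30-second mp3 encoded at 128 kbps.
bitrate_kbps = 128          # kilobits per second
duration_s = 30             # clip length capped at 30 seconds
size_kb = bitrate_kbps * duration_s / 8   # kilobits -> kilobytes
print(size_kb)              # 480.0 KB, i.e. roughly half a megabyte
```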
The conclusion that one-to-one communication was desired over broadcast (such as Twitter) led to the design of the “send tone” interface, the last page in the emotion-based navigation. As mentioned before, once a sender has selected a clip, he or she is given the opportunity to enter the recipient’s mobile number along with a customized text. While in beta the mobile number entry is manual, later versions of the application will interface with the user’s native contact list. There is also a prompted entry for the recipient’s email, so that the receiver is notified that they have been sent an emotitone and asked to check their device settings if they do not receive it.
To encourage a dialog between sender and receiver, the receiver is given the
opportunity to ‘reply with an emotitone’. The hope is that in the app version of the
platform, a musical dialog can take place.
The last element of the interface discussed in III was the decision to integrate mobile delivery. The app version of Emotitones will be mobile-based and self-contained within the app, but the beta exists as a web-to-mobile platform. It is apparent why mobile delivery makes sense (discussed previously); however, the decision to make the sender’s experience web-based was an issue of ease of use and adaptability. In other words, browser-based search and navigation is easier, and most likely will lead to more time spent on the site and more users.
In terms of file type, emotitones are delivered as MMS messages. MMS was the only option for mobile-specific, media-rich delivery that starts from the web and is not self-contained within an app.
B. Application deployment
The development of the Emotitones platform revolved around three core issues: 1) Where will the content be stored? 2) How will it be accessed? and 3) How will it be delivered?
The first part of the storage issue refers to hosting. All sites must have a hosting solution, and in the last few years, many have migrated to cloud-based computing. The Emotitones demo was originally hosted on Amazon EC2; however, due to better support and more flexibility, Rackspace Cloud was chosen for beta, with the Emotitones server running on a Debian Linux box.
Storage within the application is another issue, relating to database development. The Emotitones database has to handle several functions: storage of mp3 files and corresponding tags/metadata; multi-user access; multi-parametric search capabilities; and sending, retrieval, and editing of data and files. The selected database system for Emotitones is MySQL (used by Facebook, Google, and Wikipedia), as it can handle these requirements, plus large-scale content ingestion.
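The database functions listed above might be organized as in the following sketch. The schema is purely illustrative (SQLite standing in for MySQL; every table and column name below is an assumption, not the actual Emotitones schema):

```python
import sqlite3

# Illustrative schema only; actual Emotitones table names and columns
# are not given in the text. sqlite3 stands in for MySQL here.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE clips (              -- mp3 files and their metadata
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,       -- path to the 30-second mp3
    sentiment TEXT, genre TEXT, gender TEXT
);
CREATE TABLE clip_keywords (      -- tags backing the keyword search
    clip_id INTEGER REFERENCES clips(id),
    keyword TEXT
);
CREATE TABLE users (              -- login functionality
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE, email TEXT
);
CREATE TABLE sends (              -- back-end log of sent emotitones
    id INTEGER PRIMARY KEY,
    clip_id INTEGER REFERENCES clips(id),
    sender_id INTEGER REFERENCES users(id),
    recipient_number TEXT,
    sent_at TEXT DEFAULT CURRENT_TIMESTAMP
);
""")
# A clip can now be ingested with its tags, and later retrieved or edited.
db.execute("INSERT INTO clips (filename, sentiment, genre, gender) VALUES (?, ?, ?, ?)",
           ("clip_0001.mp3", "romance", "pop", "female"))
print(db.execute("SELECT filename FROM clips").fetchone()[0])  # clip_0001.mp3
```

The separate `sends` table is one way to support the daily log analysis mentioned later, while `clip_keywords` keeps tag storage independent of the clip metadata.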
Access to content is enabled through the post-login online interface, which communicates with the Emotitones MySQL database. Because the application is a browser-based interface with dynamic content, JavaScript was selected as the development tool, with AJAX used to integrate with MySQL. JavaScript is well established for browser-based applications, and AJAX is a powerful technique for server integration.
In cases where user access involves uploading content through forms, XML, a widely used tool for data transmission, is used for ingesting information in machine-readable form, while AJAX accesses the database repository. Emotitones has several of these user-upload forms, some of which deal with music files, while others handle simple text. The safekeeping of tags, metadata, and other information is dependent on the XML coding.
The delivery of emotitones relies upon integration with a third-party API allowing for MMS delivery over all major carriers in North America. The Hook Mobile API uses M.A.X. 2.0, a Mobile API EXtension mobile utility platform. M.A.X. runs on a REST (Representational State Transfer) interface, an architecture running over HTTP. The delivery of content from database to mobile phone also involves short codes, which give access to carrier delivery over the SMS and MMS platforms.
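A request to such an MMS-delivery API might be constructed as below. The actual Hook Mobile / M.A.X. 2.0 endpoint, field names, and authentication are not documented here, so every field in this sketch is a hypothetical assumption, shown only to illustrate the general shape of a REST-based MMS send:

```python
import json

# Hypothetical request body for an MMS-delivery REST API. The field
# names ("to", "subject", "text", "media_url") are assumptions, NOT the
# documented Hook Mobile / M.A.X. 2.0 interface.
def build_mms_request(sender_username, recipient_number, clip_url, message):
    payload = {
        "to": recipient_number,   # receiver's mobile number
        "subject": f"{sender_username} has sent you an Emotitone.",
        "text": message,          # sender's customized text
        "media_url": clip_url,    # forward-locked 30-second mp3 clip
    }
    return json.dumps(payload)

body = build_mms_request("JohnDoe", "+12125550100",
                         "https://example.com/clips/42.mp3",
                         "Thinking of you!")
print(json.loads(body)["subject"])  # JohnDoe has sent you an Emotitone.
```

In a real deployment this JSON body would be POSTed over HTTP to the provider's endpoint, with the short code and carrier routing handled on the provider's side.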
Provided that a receiver’s phone is MMS-enabled, emotitones can reach any user via this API integration. In the receiver’s MMS inbox, the subject displays “John Doe (username) has sent you an Emotitone.” After clicking on the MMS itself, the receiver can view the customized text and press the play button to hear the emotitone. The receiver is then given the option to reply with an emotitone, in which case they are directed to the web-based interface. For security, the previews and emotitones are forward-locked.
The architecture in review includes a remote server, a requesting source, a receiving source, and a database repository. The server houses a database of music clips which have been edited, meta-tagged, and categorized by sentiment and/or occasion, genre, and gender. Emotitones integrates with an API allowing for successful delivery and receipt of MMS messages. Any web-enabled device is able to send emotitones, and any North American MMS-enabled device is able to receive them. The Emotitones beta allows a user to do the following: create a login, browse clips, preview clips, select and customize a chosen clip with text, and send the clip via MMS to a receiver’s mobile phone.
Other functions of the site include:
1) A daily analysis of logs in the database repository, to display information such as a “Top 20 Emotitones” chart and “Today’s Top 5”; a back-end log of when emotitones have been sent; and safekeeping of user information for login functionality.
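The daily chart analysis could be sketched as a simple aggregation over the send log. The log format and clip titles below are invented for illustration; the actual log fields are not specified in the text:

```python
from collections import Counter

# Hypothetical send log of (clip_title, date) entries; the real back-end
# log schema is not given in the text.
send_log = [
    ("Clip A", "2011-06-10"), ("Clip B", "2011-06-10"),
    ("Clip A", "2011-06-10"), ("Clip C", "2011-06-09"),
    ("Clip A", "2011-06-09"), ("Clip B", "2011-06-09"),
]

def top_n(log, n, date=None):
    """Count sends per clip, optionally restricted to a single day."""
    counts = Counter(title for title, day in log
                     if date is None or day == date)
    return [title for title, _ in counts.most_common(n)]

print(top_n(send_log, 20))               # overall chart
print(top_n(send_log, 5, "2011-06-10"))  # today's top clips
```

The same aggregation, run once a day over the full log, yields the “Top 20 Emotitones” chart; restricting it to the current date yields “Today’s Top 5.”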
2) A submissions form to allow users (artists or labels with copyright permissions) to upload edited emotitones to the database, pending approval. The users are prompted to tag and classify each clip so that the emotitone will display in the results of the multi-parametric search. While full-length uploads are accepted, they are edited before being added to the database.
3) A suggestions page. Any user who would like to request a song for the Emotitones service can fill out the online suggestions form. They are prompted for categorization information, but not permitted to upload the file itself.
4) Aside from the multi-parametric search, users can search the emotitones database
using a keyword search. Each song clip has been tagged with 12 keywords. In most
cases the keywords include song title, artist name, chorus/hook phrasing, mood,
corresponding emotion, genre, and gender of vocalist.
5) Buy links. In most places where a user can listen to an emotitone, he or she also has
the option of purchasing the full-length download. This was implemented from an
emotional perspective. For example, if a receiver feels moved by the message in an
emotitone that someone has sent, he or she may want the full-length version of the
song, which has a new meaning attached to it.
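The daily chart analysis described in item 1 reduces to counting sends per clip over a time window. A minimal sketch, assuming a hypothetical log format of (date, clip_id) pairs rather than the actual back-end schema:

```python
from collections import Counter
from datetime import date

# Hypothetical send log: (date_sent, clip_id) pairs, as the back-end
# log of sent emotitones might record them.
send_log = [
    (date(2011, 6, 9), 101), (date(2011, 6, 9), 102), (date(2011, 6, 9), 101),
    (date(2011, 6, 9), 103), (date(2011, 6, 9), 101), (date(2011, 6, 9), 102),
]

def top_n(log, day, n):
    """Count sends per clip on the given day and return the n most-sent
    clips as (clip_id, send_count) pairs, most popular first."""
    counts = Counter(clip_id for sent_on, clip_id in log if sent_on == day)
    return counts.most_common(n)

chart = top_n(send_log, date(2011, 6, 9), 2)  # "Today's Top 2" for the sample log
```

The same aggregation over a 24-hour window, run once daily, would populate both the “Today’s Top 5” and, with a longer window, the “Top 20 Emotitones” chart.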
V.
DISCUSSION AND CONCLUSIONS
A. Summary
Many novel systems, especially in the information-technology and social-networking
spaces, pre-date research that fully supports the concepts they exhibit.
However, when approaching the Emotitones application not as a whole, but as a series of
UI decisions, it was evident that multi-disciplinary support existed. From a practical
standpoint, behaviors demonstrated by users of social, mobile, and communication
technologies already include the integration of multimedia for emotive purposes. While
platforms focusing on this phenomenon are in the early stages of emergence, digital
culture and its “interactive consumption” have existed for over two decades, and the
mechanisms by which a communicator can express him- or herself through digital media
are already integrated in existing platforms such as Facebook, MySpace, Twitter, and
Foursquare.
The development of Emotitones has been an uphill battle of cross-platform
development, licensing negotiations, and issues with multi-territory delivery (as well as
development-cost considerations). These battles are worth fighting for the promise of
a new communication platform: one that empowers users with multimedia content,
namely music, to supply the emotional expressivity lacking in so many other
platforms.
The main difference between these platforms and Emotitones, as articulated
previously, is the added value placed on the listener, or end user. In a society where
shameless plugs, spam, mass marketing, and junk mail are easily transmitted over
every platform, the listener, or receiver, is taken for granted. Even peer-to-peer textual
interaction lacks the empathy that face-to-face interaction between two strangers
requires. A person can get away with terse, laconic text-based communication, while
successful face-to-face communication must follow the rules of interpersonal
communication. When a user sends an emotitone, he or she must have the receiver in
mind. The main value of Emotitones is in the communication exchange.
Because it had less bearing on the interface design and development, the subject of
how Emotitones can help artists has not been discussed. Yet the artist perspective has
always been a motivation for Emotitones. Music, after all, is only possible with artistic
effort and follow-through. While the platform enables emotionally rich communication,
it is also a tool for artists to share and promote excerpts of work. A new release can be
sent as an emotitone, with adjoining text such as “I wrote this song for my father,
who has just passed,” and a link to the full-length version of the song. Provided the
artist does not communicate in a way that is construed as spam, this could be an effective
way to reach fans on a more direct and visceral level than typical release promotion.
B. Expanding features
The beta phase of Emotitones is only a small representation of the features that
will make it a powerful platform for communication. Here are additions for the next
phase:
1) Photo and video attachment capabilities: research supporting media-rich messaging
suggests that a multi-sensory experience is much stronger at evoking an emotional
response.
2) Worldwide territories: limitations of the Hook Mobile API do not allow for
delivery outside North America; however, some of the strongest mobile markets are
international, such as Japan, China, Korea, Brazil, and the UK.
3) Karaoke (customizable voice option): in countries such as Korea, the ability to sing
over instrumental versions of songs is prevalent. Emotitones aims to give users the
option to record their own voice over a song clip and send it. This may prove to be
extremely powerful in emotional connectivity.
4) Sound sampling: while enabling users to upload any audio file could result in
copyright infringement, users will be allowed to upload and send sounds recorded on
their mobile or PC recorder.
5) Foreign-language song selection: making emotitones delivery available to countries
outside North America is more valuable once the database contains song clips local to
each region.
6) Editing tool: when artists and labels submit content, songs must be pre-edited to the
correct length. Implementing a dragging tool for editing duration would make this
easier and more manageable.
7) Exhaustive emotional categories and genres: beta-phase development allowed for
fewer than a dozen emotional categories and genres. Many more emotion-based
categories and genres will be added in order to properly tag and classify music.
8) Community and user identity: Emotitones can spark many conversations on subjects such
as song meaning, artists’ careers, and song feedback in general. It is important to equip
the community with comment platforms, and to allow for more information identifying
who each user is. The research on subculture and identity suggests that musical identity
is very important to digital users.
9) Commerce: the business model for Emotitones has not been discussed here, but it exists
and revolves around premium content, royalties from full-length music and other products,
as well as virtual gifting. This model will be implemented in the next phase.
10) Unlocking song selection: gaming has grown enormously in the age of social
networking; in certain applications it is a camouflage for reward programs. In
Emotitones, the ability to “unlock” song selection will be treated as a game, rewarding
users for loyalty or for their musical interest and knowledge.
11) “Short-hand” emotitones: as mentioned in the opening section, the word emotitone
derives from emoticon. It is possible to make a “short-hand” version of
audio excerpts such that they convey mood without lyrics or lengthy passages. A short-
hand emotitone would be one second in duration or less, and added as a menu in
instant-messenger applications and the Emotitones community chat. The research
available concerning earcons, as well as the cognitive processing of musical gestures
of similar duration, suggests that short-hand emotitones will be effective at
communicating affect in the context of social interaction.
C. Future scope
While the current focus of Emotitones is to enable musical communication, the
vision extends to multi-media communication in general. The future scope of the
platform includes being able to send any digital artifact that communicates an idea,
emotion or sentiment, whether it be a political speech fragment, a strong literary quote
coupled with a related painting, or a humorous video clip from a movie. Because people
today are digital consumers and, in most cases, digital producers (sometimes
unknowingly), they must be enabled to share in a way that gives credit to the communication
embedded in each piece of media they have collected or created. These multi-media
pieces are, in most cases, not meant to be consumed passively: they were created to
convey meaning, and thus represent meaning.
Such a platform revolves around the database itself, namely content ingestion
(having as much to choose from as possible) and optimized search functionality
(making it easy for users to find what they want). The search engine required is a
significant undertaking, and is part of the future scope of Emotitones.
As an improvement to musical communication, a lyrics database is needed. Users
will want the choice of sending the lyrics of an emotitone along with the audio file.
Whatever can be done to facilitate bricolage (see Appendix) will make the service more
compelling.
D. Predictions
As with any newly developed platform, there is always a possibility that technology
will be used in a way other than that for which it was created. This happened with the
application Chatroulette, which became a tool for sexual exploitation; its
intention was to facilitate worldwide impromptu video conversations, with the
motivation of making the world more accessible to people. Typically this kind
of malfunction happens when a service adds some kind of user-generated functionality.
It is not foreseeable how Emotitones could be misused; however, the user-generated
aspects of the platform will not be enabled for the beta.
It is predicted that user growth will rely on the growth of the content database. If a
user tries the platform but is unable to find a music clip suited to the emotion or
sentiment desired, he or she will most likely not return until there is a wider selection. Some
users, however, will send an emotitone regardless, because of its novelty. It is exciting to
receive an emotitone even if the words are not exactly right. This is akin to greeting
cards, which are historically vague and generic; it is up to the person giving the card to
“customize” it with a personal message.
E. Limitations
The Emotitones beta is limited in many ways. As mentioned, content selection must
reach critical mass before the platform truly enables musical communication. In
addition, some users could see MMS delivery as a downside. While SMS and MMS
have achieved relative ubiquity in today’s mobile market, the group of users who are
charged to receive MMS may find it frustrating to receive emotitones. Senders are
warned several times, however, about standard messaging fees, and savvy (or just
literate) mobile device owners know how to disable MMS delivery to their phones. This
should not be a significant limitation, as there is no difference in cost between receiving
an emotitone and receiving a photo via MMS.
Another major limitation is that login and navigation are web-based only. While
the site is optimized for most mobile browsers, sending an emotitone from a mobile device is not
a satisfying user experience. Mobile apps were created as solutions to this problem, and
until Emotitones exists in app form, the proper user experience of finding and sending
emotitones will be limited to computer-based web browsers.
F. Using Emotitones as a research tool
In the effort to support the emotional expressiveness of music as compared to
language, Emotitones can be used as a research tool. The Emotitones database logs a
great deal of information, including which emotitones people send most frequently;
which emotional categories, genres, and genders are being sent, at what time of
day, and in which region; and how often an emotitone is reciprocated with another
emotitone. As the Emotitones user base grows, the behavioral tendencies of users will be
valuable, possibly informing researchers of communication patterns used by today’s
digital users, more specifically digital music users.
As for specific experiments, controls would have to be implemented. An
example experiment might compare communication efficacy between emotitones
and text messages.
One way of doing this is to find 20 subjects (10 pairs who have a relationship of
some kind), separate them into adjacent rooms, and give each an MMS-enabled
mobile device. Subject 1 of each pair would be instructed to search the Emotitones
database and find five musical clips that best express the emotions or sentiments that he
or she wishes to communicate to Subject 2. Subject 1 would then be asked to compose
five text messages, one for each selected musical clip, corresponding to the same
emotions or sentiments.
Subject 2 would be sent the musical clips (as emotitones) as well as the five text
messages, one by one, in random order. After receiving each clip or text, Subject 2 would
be asked to write down an interpretation of what Subject 1 intended to communicate.
The interpretations would be presented back to Subject 1 in pairs, without indication of
whether each interpretation was based on the text version or the music version of the
emotion. Subject 1 would be asked to select the interpretation best matching the
intended emotion or sentiment. If the music-based interpretations more accurately
convey the intended emotions than the text-based interpretations, it can be suggested
that the musical clips were more effective at communicating emotion.
Other studies could be done on genre preferences, communication patterns,
ethnomusicology, and gender as related to musical communication.
Appendix
A. Patent filing
A thorough prior-art search was conducted both by me and by my patent attorneys at Russ
Weinzimmer and Associates. Emotitones is patent-pending, and the application can be
viewed publicly on the USPTO website.
Provisions were recently added to increase coverage and functionality, as well as claims to
foreign territories.
B. Licensing and the Public Domain
One of the biggest obstacles to growing the Emotitones catalog at a faster rate is the
licensing process. Because of the state of the music industry, the four major labels are
very protective of their digital assets. In the meantime, while that case is made,
Emotitones is negotiating with independent content providers.
C. Bricolage
The concept of bricolage has been used to describe the way youth play with technology
and digital files without real knowledge of what is being done. This experimentation
results in new creations and, in turn, new behaviors, interactions, and subcultures.
D. Getting to the next phase
After beta launch, Emotitones will go into fundraising mode in order to facilitate new
features and help with growth.
References
Abrams, D. (2009). Social Identity on a National Scale: Optimal Distinctiveness and Young People’s Self-Expression Through Musical Preference, Group Processes & Intergroup Relations vol. 12(3) 303-317, University of Kent.
Bierce, A. (1912). For Brevity and Clarity. Collected Works. New York & Washington.
Bigand, E., Vieillard, S., Madurell, F., Marozeau, J., & Dacquet, A. (2005a). Multidimensional scaling of emotional responses to music: The effect of musical expertise and of the duration of the excerpts. Cognition and Emotion, 19, 1113-1139.
Bilandzic, M.; Filonik, D.; Gross, M.; Hackel, A.; Mangesius, H.; Krcmar, H. (2009). A Mobile Application to Support Phatic Communication in the Hybrid Space. Information Technology: New Generations. ITNG ’09.
Blacking, J. (1973). How Musical Is Man? Seattle: University of Washington Press.
Blattner, M.M., Sumikawa, D.A., and Greenberg, R.M. (1989). Earcons and Icons: Their Structure and Common Design Principles. Human-Computer Interaction, Vol. 4 pp. 11-44. California: Lawrence Erlbaum Associates.
Buckingham, D. (2008). Introducing Identity. Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.
Cherny, L. (1995). The Modal Complexity of Speech Events in a Social MUD. Electronic Journal of Communications 5, No 4. (accessible at http://bhasha.stanford.edu/~cherny/papers.html)
Daltrozzo, J. and Schon, D. (2008). Conceptual Processing in Music as Revealed by N400: Effects on Words and Musical Targets. Journal of Cognitive Neuroscience, 21:10, pp. 1882-1892, Massachusetts Institute of Technology.
Dijk, E.V., & Zeelenberg, M. (2006). The dampening effect of uncertainty on positive and negative emotions. Journal of Behavioral Decision Making, 19, 171-176.
Donner, J. (2007). The rules of beeping: Exchanging messages via intentional "missed calls" on mobile phones. Journal of Computer-Mediated Communication, 13(1), article 1.
Durkee, R. (1999). American Top 40: The Countdown of the Century. New York City: Schirmer Books.
Ebane, S. (2004). Digital music and subcultures: Sharing files, sharing styles. Vol. 9, No. 2.
Fitzpatrick, C. (2008). Scrobbling Identity: Impression Management on Last.fm. Technomusicology: A Sandbox Journal, Vol. 1, No. 2.
Garzonis, S., Jones, S., Jay, T., and O’Neill, E. (2009). Auditory Icon and Earcon Mobile Service Notifications: Intuitiveness, Learnability, Memorability and Preference. Boston: CHI.
Hankinson, J.C.K., and Edwards, A.D.N., (1999). Designing Earcons with Musical Grammars. ACM SIGCAPH No. 65. York, England: University of York.
Juslin, P.N. (2003). Communication emotion in music performance: Review and theoretical framework. In Music and Emotion: Theory and Research (pg 309-337). Oxford: Oxford University Press.
Kasesniemi, E.L. (2003) Mobile Messages: Young People and a New Communication Culture Tampere, Finland: Tampere University Press.
Koelsch, S. (2005). Investigating Emotion with Music: Neuroscientific Approaches. Leipzig, Germany: Max Planck Institute for Human Cognitive and Brain Sciences.
Koelsch, S., Gunter, T.C., Wittfoth, M., and Sammler, D. (2005). Interaction between Syntax Processing in Language and in Music: An ERP Study. Journal of Cognitive Neuroscience 17:10, pp. 1565-1577, Massachusetts Institute of Technology.
Kolb, B. and Whishaw, I.Q. (2003). Fundamentals of Human Neuropsychology. London: Worth Publishers.
Kubovy, M. and Shatin, J. (2009). Music and the Brain Series. Washington D.C.: Library of Congress.
Kuhl, O. (2008). Musical Semantics. New York: Peter Lang.
Lemmens, P.M.C., De Haan, A., Van Galen, G.P. and Meulenbroek, R.G.J. (2007). Emotionally charged earcons reveal affective congruency effects. Ergonomics, Vol. 50, No. 12, 2017-2025. The Netherlands: Taylor & Francis.
Levitin, D.J. (2006). This is Your Brain on Music: the science of a human obsession. New York, NY: Dutton.
Levitin, D. J. (2008). The World in Six Songs: How the Musical Brain Created Human Nature. New York, NY: Dutton.
Ling, R. (2004). The Mobile Connection: The Cell Phone's Impact on Society. Kindle Edition.
McDermott, M. Goldman, S., and Booker, A. (2008). Mixing the Digital, Social, and Cultural: Learning, Identity, and Agency in Youth participation. Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.
McNamara, T. P. (2005). Semantic priming: Perspectives from memory and word recognition. New York: Psychology Press.
McPherson, T. (2008). A Rule Set for the Future. Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.
Metros, S.E. (1999). Making Connections: A Model for On-line Interaction. Leonardo, Vol. 32, No 4, pp. 281-291. Milan, Italy.
Miell, D., MacDonald, R., and Hargreaves, D. (2005). Musical Communication. Oxford: Oxford University Press.
Mithen, S. (2006). The Singing Neanderthals: The Origins of Music, Language, Mind, and Body. Boston, Massachusetts: Harvard University Press.
Modlitba, P. and Hoglind, D. (2005). Report in Musical Communication and Music Technology: Emotional expressions in dance.
Mustonen, M.S., (2007). Introducing Timbre to Design of Semi-Abstract Earcons. Masters Thesis, Information System Science. University of Jyväskylä, Department of Computer Science and Information Systems.
Nicolle, S. and Clark, B. (1998). Phatic Interpretations: Standardization and Conventionalisation, Revista Alicantina de Estudios Ingleses 11: 183-191. Middlesex University.
Niedermeyer, E. and Da Silva, F.L. (2004). Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. London: Lippincott Williams & Wilkins.
Nussbaum, C.O. (2007). The Musical Representation: Meaning, Ontology, and Emotion. Cambridge, Mass, The MIT Press.
Peretz, I., and Zatorre, R. J. (2003). The Cognitive Neuroscience of Music. Oxford: Oxford University Press.
Sandvig, C. (2008). Wireless Play and Unexpected Innovation. Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.
Sloboda, J.A. (1985). The Musical Mind. The Cognitive Psychology of Music. Oxford: Clarendon Press.
Sloboda, J.A. (2007)
Stald, G. (2008). Mobile Identity: Youth, Identity, and Mobile Communication Media. Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.
Steinbeis, N. and Koelsch, S. (2010). Affective Priming Effects of Musical Sounds on the Processing of Word Meaning. Journal of Cognitive Neuroscience 23:3, pp. 604-621. Massachusetts Institute of Technology.
Tanzi, D. (1999). The Cultural Role and Communicative Properties of Scientifically Derived Compositional Theories. Leonardo Music Journal, Vol 9, pp. 103-106, Milan, Italy.
Wang, Victoria, Tucker, J.V., and Haines, K.R. (2009). Phatic Technology and Modernity. Center for Criminal Justice and Criminology & Department of Computer Sciences, School of Human Sciences & School of Physical Sciences, Singleton Park: Swansea University.
Weber, S. and Mitchell, C. (2008). Imaging, Keyboarding, and Posting Identities: Young People and New Media Technologies. Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.
Westlake, E.J. (2008). Friend Me if You Facebook: Generation Y and Performative Surveillance. The Drama Review 52:4 (T200), New York University and the Massachusetts Institute of Technology.
Williams, J. P. (2003). The Straightedge Subculture on the Internet: A Case Study of Style-Display Online. Australia: Media International Australia incorporating Culture and Policy.
Valcheva, M. (2009). Playlistism: a means of identity expression and self-representation. The Mediatized Stories, The University of Oslo.