EMOTITONES: A USER INTERFACE FOR MUSICAL COMMUNICATION
Shefali Kumar Friesen
Submitted in partial fulfillment of the requirements for the Master of Music in Music Technology
in the Department of Music and Performing Arts Professions in The Steinhardt School
New York University Advisors: Dr. Kenneth Peacock, Dr. Agnieszka Roginska
06/10/2011
Table of Contents
I. Background
   A. Foundations of musical communication 4
   B. Game-changing technology 5
II. Limitations and Motivations
   A. Limitations of current technology 7
   B. Motivations for designing a new interface 8
III. Factors influencing design of the user interface (UI)
   A. Emotion-based navigation and content categorization 10
   B. Media-rich messaging 16
   C. One-to-One Communication 21
   D. Mobile delivery 28
IV. Development
   A. Interface and message flow 34
   B. Application deployment 39
Application Screenshots 42
V. Discussion and Conclusions
   A. Summary 47
   B. Expanding features 48
   C. Future scope 50
   D. Predictions 51
   E. Limitations 52
   F. Emotitones as a tool 52
Appendix 55
References 56
I.
FOUNDATIONS OF MUSICAL COMMUNICATION
“Music is a fundamental channel of communication: it provides a means by which
people can share emotions, intentions, and meanings.”
-Hargreaves, MacDonald, and Miell
A. A few words on communication models
The word communication has taken on many definitions over time. While earlier
definitions specified valid communication channels, today we accept communication to
be the “imparting or exchange of information, ideas, or feelings”, or, simplified further,
“an act or instance of transmitting” (Merriam-Webster, 2011). Despite (or perhaps
because of) this broad definition, and the progressive connotation of the concept,
researchers in music continue to wrestle with demonstrating that music is a valid form
of communication.
Toward this end, many attempts have been made to prove that music is a language
and thus a form of communication. Researchers have tried to apply linguistic principles
such as semantics to music, claiming that for music to qualify as a language, it would
have to follow the same rules (Hargreaves et al., 2005). As scholars discovered, however,
music lends itself to structural semantics (notes, chords, etc.) but lacks definable
meaning (Kuhl, 2008). This “fluidity” of meaning in music has made the ‘language’ label
a difficult theory to support: while some characteristics, such as tempo and harmonic
progression, are measurable and definable, others, such as emotional response, are
personal and cultural, making them very difficult to measure by any means (Kuhl, 2008).
While the formal structure of music, together with its tendency to communicate
meaning, makes for an elegant metaphor between music and language, later
communication models, influenced by semiotics and cognitive science, placed emphasis
on the usage of language rather than on language as a system (Kuhl, 2008). In other
words, rather than proving that music is a language, it is more productive to examine
the parallels between the system of music and the system of language. As discussed in
section III, this approach supports music as a transmission of meaning, emotion, and
understanding.
From a historical perspective, music and language have fundamentally existed in all
human societies (Mithen, 2006). Some archeologists assert that music and language even
existed in prehistoric societies (Blacking, 1973; Sloboda, 1985). With research in favor of
these findings, musical communication today is simply a continuation of historic
behavior.
B. Information technology and music technology
“Information technologies have profoundly affected communication processes by
simplifying queries, liberating energy, reorganizing semantic axes and points of view,
and reorienting the relationship between the production of meaning and our socio-
technical environment” (Tanzi, 1999).
It has always been the role of communication technology to enable more meaningful
human connections and to increase both the efficiency and frequency of information
exchange. Regardless of the innovator’s motivation, communication technologies alter
cultural behaviors and typically make the world more accessible.
Just as communication has been influenced by information technologies, musical
communication has also been influenced by music technologies. One example is the
advent of the radio broadcast and a behavior it produced: the music dedication. This
cultural practice began with the Long Distance Dedication on the American Top 40
radio show in 1970, in which the host, Casey Kasem, would play a mailed-in record
while reading the accompanying letter dedicated to a listener. The tradition has
continued for over four decades on the radio, and through mix-tapes, mix-CDs, and
digitally shared playlists.
Another game-changing music technology innovation is the mp3. The Moving
Picture Experts Group made a breakthrough with their implementation of lossy data
compression and perceptual coding (http://mpeg.chiariglione.org/). As a result, music
files could be relatively small without compromising auditory quality. Music is now
everywhere, embedded in both online and offline communication behaviors (discussed
in section III).
The ringback, while not quite a game-changer, facilitated a more intimate and
direct form of musical communication. In 2001, engineers found a way to manipulate
what was heard on the caller’s side of a ringing phone. For the first time, a mobile
phone owner could designate a unique song that each contact would listen to instead of
the typical ring. Each song selected would communicate the dynamic of the relationship
in some way.
Other examples of game-changers are the digital recorder, digital audio
workstations (DAWs), mp3 players, and mobile music such as ringtones, all of which
either changed or introduced new ways of communicating with music.
II.
LIMITATIONS AND MOTIVATIONS
A. Limitations of current technology
As technology advances, so does communication, particularly in the age of social
networking. Short Message Service (SMS) texts are the most prevalent form of
communication worldwide, remarkable in their ability to reach anyone, anywhere,
anytime. However, as revealed by case studies (shared in coming sections), text-based
technologies often lead to misinterpretations of tone and meaning. Whether the absence
of intonation in an email leads to errors in sizing up a relationship with a colleague, or a
teenager misinterprets a “to the point” communication style to mean her loved one is
laconic and uninterested, it is clear that today’s digital communication lacks the clarity
of vocal communication, or better yet, face-to-face interaction. “It is too easy to
misunderstand something if you can’t read the body language and the faces - a single
word can easily be misunderstood.” (Stald, 2008).
Aside from the lack of emotional cues in today’s communication channels, the
number of words allowed to convey an idea is often limited to 140-160 characters.
Even an eloquent communicator would have trouble conveying his or her mood
accurately.
Another limitation, explored in the next section, is that free and novel
communication platforms tend to favor the broadcasting communicator - the
contributor to the live “feed” - over the thoughtful, expressive communicator. The
rapid rate of information exchange has made communication fleeting in some ways.
The result is many users who have a voice, but do not know what to say, or whom to
say it to.
One coping strategy for these emotional limitations is the emoticon: a face made
from punctuation and/or letters to convey mood. Emoticons date back to the 1800s as a
concise way of representing emotions in theater (Bierce, 1912), but digitally they have
surfaced in a number of ways. For example, when text-based virtual realities called
MUDs (multi-user domains) began appearing in the 90s, they gave rise to an exclusive
and universally understood language made up of “idioms, acronyms, and iconic
emoticons,” which were created not only “to economize keystrokes, but also to help
define the contexts for conversations, establishing responsiveness and attentiveness,
communicating understanding, initiating play, describing actions in real life and
conveying mood, feeling and emotion” (Cherny, 1995).
Today, the prevalence of emoticons is significant. In instant messenger applications
from AOL IM to Blackberry’s BBM, emoticon menus are embedded and quick to call
upon to establish a mood, clarify context, and convey other shared meanings.
B. Motivations for designing a new interface
The Emotitones name derives from the concept of emoticons. Emotitones are, in
some ways, auditory emoticons, which, although longer and more complex, aim to
provide an enriching, non-text-based tool for expression (see Expanding Features in the
Discussion). The logo for Emotitones is an emoticon with an eighth note for a mouth.
The motivation of providing an emotionally rich channel for communication is
supplemented by the motivation to help the music industry, a business which has been
wounded and dysfunctional for over a decade. The past decade has also seen the
beginning of a comeback of sorts for the industry; even so, the model is still broken,
and solutions are needed to restore public opinion about the general value of music.
From advertising and sync licensing to ringtones and radio, music has fulfilled many
commercial needs by tapping into the emotions of the end-user. In each case, however,
music is consumed passively. To demonstrate the greater value inherent in a song, the
listener must be actively engaged by music, or by a musical experience, and this
engagement should result in an action (or transaction) of some kind.
From an artistic perspective, lyrics are a powerful but overlooked communication
tool. They convey common emotions and situations relatable to any given person at
any given time. This is evident in dedications, musical greetings, mix-tapes, and music
sharing, and it was a motivation for constructing the Emotitones platform.
As mentioned, musical interaction and communication are not novel concepts.
Emotitones aims to focus on the shared meaning between sender and receiver, instead
of performer and listener. This will facilitate a shift from passive listening to active
communication.
Given this motivation and the limitations of emotional expressivity in today’s digital
communication, this thesis proposes a user interface for musical communication, called
Emotitones, and describes the elements contributing to the design of the UI.
III.
FACTORS INFLUENCING THE DESIGN OF THE USER INTERFACE (UI)
After committing to creating a solution for musical communication, several decisions
went into the design of the Emotitones user interface. While prototyping the user
experience felt relatively natural, significant research went into each design element.
The first decision was to make the navigation and categorization emotion-based.
A. Emotion-based navigation and content categorization
Navigation is the backbone of user experience in many applications. It is a
prediction of user mindset, and must address the following questions: what will the
user be thinking from page to page, and what will motivate or guide them to complete
the desired action?
With musical communication as the end goal, it was crucial to focus on the
message; not text-based information, but rather audio-based emotion. To choose
emotion-based navigation, it was necessary to find confirmation that emotional
expressivity in music exists; in other words, adequate research suggesting that music is
effective at conveying emotion (or concise ideas/sentiments) was required.
The most compelling research addressing the emotional expressivity of music
comes from studies in cognitive neuroscience within the last five years. Researchers
know that it is possible to prime a stimulus for an expected result; in other words, by
presenting one concept to a subject, the expectation of another concept can be created.
The former concept is called a prime, and is more or less effective based on how
related it is to the latter concept. This phenomenon has been used to test several
theories of cognitive processing. Before highlighting these studies, some brief comments
on priming and the N400 are necessary.
The brain has at least two systems: the first is implicit and responds quickly and
automatically; the other is an explicit system designed to interpret signals from the
implicit system (Kubovy, 2008). Priming, an important factor in recent experiments
measuring the cognitive processing of music, is an effect of implicit memory, in
which exposure to a stimulus influences the response to a later stimulus (Kolb &
Whishaw, 2003). Table 1.1 shows an example of priming described by Michael Kubovy
in the Library of Congress lecture series Music and the Brain:
      subject 1    subject 2
      CARPET       CARPET
      COUCH        DISK
      SOFA         SOFA
      HAMMER       TRUCK
      CAR          CAR
      PICTURE      PICTURE
                 table 1.1

In table 1.1, the words being compared are SOFA and SOFA. For subject 1, because
SOFA is preceded by the word COUCH, it is said to have been primed for an expected
result, whereas for subject 2, SOFA preceded by DISK has not been primed for the
expected result. Similarly, CAR has been primed by TRUCK, but CAR preceded by
HAMMER has not. When a stimulus has been primed effectively, it can be accessed
more readily than the same unprimed stimulus, leading to more streamlined cognitive
processing (Kolb & Whishaw, 2003). To measure such claims,
electroencephalography (EEG) was used to record the faint electromagnetic fields
emitted by the brain - more specifically, waves below 100 Hz indicating brain activity
(Niedermeyer & Silva, 2004). The voltages were picked up by electrodes, amplified, and
recorded into a computer for analysis.
Before measuring activity related to unexpected stimuli in experiments on
cognitive processing, it is important to first measure responses to expected stimuli.
Kubovy again illustrates an example: the word “dog” was displayed on a screen
repeatedly while the subject’s brain waves were recorded. The recorded responses were
aligned to the onset of each stimulus presentation and averaged across trials, resulting
in the event-related potential, or ERP. This is, in a sense, an extraction of how the brain
responds to a stimulus (Kubovy & Shatin, 2009). Unexpected stimuli produce negative
voltage events at various time intervals after the stimulus is introduced. For example, in
the sentence “I like my coffee with cream and dog,” the unrelated and unexpected
presence of “dog” prompts a reaction in the brain during processing. This ERP occurs
400 milliseconds after the unexpected stimulus, and is thus called the N400. The N400 is
present in many of the studies at hand (Kubovy, 2006).
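The trial-averaging procedure behind the ERP can be sketched in a few lines. The sketch below is illustrative and not taken from any of the studies cited here: it assumes a single EEG channel stored as a NumPy array together with the sample indices of each stimulus onset, extracts an epoch around every onset, and averages across trials. The helper then reads off the most negative deflection in the 300-500 ms window where an N400 would appear.

```python
import numpy as np

def compute_erp(eeg, onsets, fs, pre=0.1, post=0.8):
    """Average EEG epochs time-locked to stimulus onsets.

    eeg      : 1-D array of continuous single-channel EEG samples
    onsets   : sample indices at which the stimulus was presented
    fs       : sampling rate in Hz
    pre/post : seconds of signal kept before/after each onset
    """
    n_pre, n_post = int(pre * fs), int(post * fs)
    # one row per trial, time-locked to the stimulus
    epochs = np.stack([eeg[t - n_pre : t + n_post] for t in onsets])
    erp = epochs.mean(axis=0)  # averaging cancels activity unrelated to the stimulus
    times = np.arange(-n_pre, n_post) / fs  # seconds relative to stimulus onset
    return times, erp

def n400_peak(times, erp, window=(0.3, 0.5)):
    """Latency and amplitude of the most negative deflection 300-500 ms post-stimulus."""
    mask = (times >= window[0]) & (times <= window[1])
    i = np.argmin(erp[mask])
    return times[mask][i], erp[mask][i]
```

The key design point is that the deflection of interest is far smaller than the background EEG on any single trial; only averaging many time-locked trials makes it visible.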
Koelsch and his team achieved breakthroughs in making studies of cognitive
processing applicable to the processing of music. To do this, experiments used music
and language as two separate approaches to priming the same word, and the results
were compared.
In one language priming study, for example, the word “wide” or “wideness”
(translated from German) was primed by one sentence - “the gaze wandered off into
the distance” - while a second sentence was constructed with no association to the
word wide: “The manacles (handcuffs) allow only a little movement.” The result was as
expected: a higher N400 for wide/wideness following the second (unrelated) sentence.
To test this in the musical realm, Koelsch et al. used a musical excerpt by Richard
Strauss (Salome, Op. 54) to evoke the feeling of the word “wide” or “wideness”. The
second musical excerpt was from a more dissonant and closed-feeling piece by Valpola
(an E-minor piece for accordion). Just as the related prime in the language experiment
yielded a lower N400 than the unrelated prime, the musical prime by Strauss resulted
in a lower N400 than the musical prime by Valpola. This followed from the character of
each piece: the Strauss excerpt is multi-instrumental, consonant, and seemingly grand
and expansive in sonic quality, evoking a feeling of wideness, while the Valpola piece
is dissonant, sparsely orchestrated, and seemingly closed. A smaller N400 peak
indicates less cognitive processing of a concept.
The illustration of the N400 responses (the peak on each chart as indicated on the
C2 chart) in figure 1.2 shows the cognitive responses to the concept of wideness using
language-based and music-based primes. The dotted line represents the ERPs (event
related potentials) responding to the unrelated primes, and the solid lines represent the
ERPs in response to the related primes. In examining the two charts comparing the
unrelated and related N400s for both music and language, it can be concluded that the
musical primes are just as effective at conveying the idea of wideness as the language
primes are. This is indicated specifically by the differences in N400 peak values
between the unrelated and related data points. The space between the two ERPs, for
both music- and language-primed stimuli, shows the readiness of the brain to receive
effectively primed concepts.
[Figure 1.2: Event-related potentials (N400) in response to related (solid) and unrelated
(dotted) primes, for priming using the Strauss piece and priming using the Valpola piece]

Koelsch and Steinbeis did a similar study recently, but instead of using excerpts of
related and unrelated music to prime words, they used musical chords - consonant and
dissonant, in major and minor keys, and of varying timbres - as primes for emotionally
congruous and incongruous words. N400s were again measured, and the emotionally
congruous words yielded a lower N400 in both musically trained and untrained
subjects (Steinbeis & Koelsch, 2010). This study of “affective priming effects of musical
sounds on the processing of word meaning” was a powerful contribution to the
discussion of musical communication. In a similar study on duration, it was shown that
this type of meaning can be conveyed within 250 milliseconds of hearing music (Bigand
et al., 2005), and in other studies, meaning is conveyed by the change of a single
semitone or timbral characteristic (Sloboda et al., 2007). The findings suggest that
“musical mode can affect the processing of language on an affective level.”
On the subject of ideas or affect conveyed by music, it is worth mentioning that all
the musical structures used in these studies represent concepts in one of three ways:
1) by imitation, 2) by association, or 3) by a sense of embodiment (Kubovy & Shatin,
2009). For example, when trying to prime a subject to choose a circle over a square, a
researcher may play a short musical excerpt that sounds “smooth” versus something
“angular”. Because the brain recognizes the embodiment of a circle, it can be primed by
characteristics embodying the same concept. This phenomenon of mixed metaphor has
been studied in great detail by experts in synesthesia, who assert that sounds and
music can represent a concept so powerfully that they convey meaning in other senses
entirely (Cytowic, 2009). The notion that music is representative of concepts and
emotions suggests that allowing a user to choose his or her own music to convey an
idea, based on emotion and categorization, would be effective.
Many musical representations in today’s society are most likely a result of cultural
learning (such as a listener’s association of fanfare with royalty). “By communicating
an emotion, however basic, music can refer to a variety of different affective states,
which are more or less, unanimously understood by listeners familiar with the musical
idiom” (Juslin, 2003). Representation by association is likely more culturally dependent
than other musical representations, and depends on the listener drawing on allusions
(Kubovy, 2006). With a rich cultural memory, listeners have implicit knowledge of the
music of their culture.
Even beyond cultural associations, recent studies have demonstrated that certain
emotions in music are recognized by both Western and non-Western listeners: subjects
were able to classify Western pieces as happy, sad, or scary without any familiarity
with the pieces (Fritz et al., 2009). This idea of universal emotions also supports
emotion-based navigation.
Unlike the past decade, in which the emotional expressiveness of music was limited
to theories and proofs dealing with structural comparisons between music and
language, there is now clearly enough support to base navigation and content
categorization on emotion. A fuller picture of what this means is explored in the
development section of this paper.
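As one concrete reading of this design decision, emotion-based content categorization can be sketched as a simple index from emotion tags to musical excerpts. The class and method names below are hypothetical, for illustration only, and are not drawn from the actual Emotitones implementation:

```python
from collections import defaultdict

class EmotionCatalog:
    """Index musical excerpts under emotion tags so a user can browse by feeling
    rather than by artist or genre. Illustrative sketch, not a real Emotitones API."""

    def __init__(self):
        self._index = defaultdict(list)

    def add(self, emotion, track):
        """File a track (e.g. a song-excerpt identifier) under an emotion tag."""
        self._index[emotion.strip().lower()].append(track)

    def browse(self, emotion):
        """Return every excerpt filed under the given emotion, in insertion order."""
        return list(self._index.get(emotion.strip().lower(), []))

    def emotions(self):
        """The navigation menu: all emotion tags that currently have content."""
        return sorted(self._index)
```

Normalizing the tag on both insertion and lookup is what lets navigation stay emotion-first: “Joy” and “joy” land in the same bucket, and the menu the user sees is simply the set of non-empty buckets.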
B. Media-rich messaging
The second UI decision to explore is the enabling of media-rich messaging - in this case,
attaching a musical excerpt to the message being communicated, or making the musical
excerpt the message itself. Research on this subject is led by the communication
sciences, but is becoming more cross-disciplinary due to the rise of social-networking
technologies. Research by Weber & Mitchell provides a good starting point for this
discussion of the effect multimedia has on communication efficiency.
Observably, digital users adopt multi-media into their communication on a regular
basis. Whether the media comes in the form of an embedded video link, or photo
attachments, the content becomes a crucial part of the message. Taken a step further,
with ever-evolving social platforms and enabling technologies, users (particularly
young users) find the means to modify these media artifacts to make them their own.
“Young people’s own digital productions facilitate a blending of media, genres,
experimentations, modifications, and reiterations, which Mizuko ”Mimi” Ito describes
as a media-mix” (Weber & Mitchell, 2008). This customization of media can be seen in
remixed audio and video files pervasive around the internet. Common examples
include film footage that has been overdubbed with audio created by the user or taken
from another source; in the many renditions of unlicensed cover songs; and in the
personalized computer animations used as digital greetings. This consumable nature of
‘media-mix’ is described by Henry Jenkins as being “production”; not production for
commercial purposes, but for “interactive consumption,” in which a user consumes
media including images, audio, and video, to create their own media productions
(Weber & Mitchell, 2008). Best paraphrased:
“users merge digital technologies with commercial media narratives in the context
of specific communities, in effect fusing and remaking both the narrative and the tool.
From early scrap-booking practices in Studio-era Hollywood to the audio mix tapes of
the 1970s, to the fan fiction and textual poaching explored by cultural studies
researchers, we know that viewers and readers have long “re-mixed” or poached
commercial culture” (McPherson, 2008).
The limitations of plain text have become apparent to digital culture, leaving
media-less messages best suited to purely informational purposes. While the production
aspect of communication is predominantly exercised by youth demographics, the
interactive consumption aspect of today’s digital communication has reached ubiquity.
This can be observed in social-networking sites such as Facebook and Twitter.
Historically, multimedia implementations have led to new styles of communication.
MUDs, as mentioned earlier, led to the creation of an exclusive language made of
idioms, acronyms, and emoticons. Another, more complex example is machinima
culture.
Machinima is the result of creating animated movies in real-time through video
game technology. Or more elaborately, visual narratives “created by recording events
and performances (filmmaking) with artistically created characters moved over time
(animation) within an adjustable virtual environment (3D game technology platform or
engine)” (Lowood, 2005). While the example seems to impose an esoteric knowledge
requirement on the user, the digital youth of today have a similarly fluent and complex
handle on multimedia, and, like machinima users, view interactive capabilities with
peers as equally valuable. Machinima users were able to exploit a technology platform
to express themselves while simultaneously creating a subculture. This aspect of
subculture is also important in supporting specifically music-rich messaging.
Media-rich messaging also supports the school of thought that multi-sensory
messages lead to more emotionally rich communication and experiences. “When
modeling a communication experience, designers tend to limit user interaction to visual
cues, occasionally accompanied by sound. But reality is actually multi-sensory and
packed with an array of complex emotional cues...“ (Metros, 1999), and “...the more
modalities a medium uses, for example images and sounds, the more senses are
activated and the more effective is the feeling of presence” (Stald, 2008). Support for the
assertion that delivering information through more than one sensory channel is
effective is found in the creation and implementation of earcons, auditory tools that
have been used to convey information for decades.
Lemmens et al define earcons as “audio messages used in human-computer
interfaces to provide information and feedback.” While they are typically short (often
less than 500 ms), they create strong associations, acting as cues for specific tasks that a
user carries out. Both Windows and Apple computers have a history of using earcons
which tell a user when they have carried out specific functions such as booting up their
computer; opening files; saving files; and putting files in the trash. Earcons can confirm
that a task has been carried out successfully; inform when an error or something
unexpected has occurred; warn when something is failing, or needs attention, and
occasionally act as bells and whistles to an otherwise mundane task. A surprising
number of studies have been done on earcons, including methods of creating them; how
musical elements contribute to their efficiency; the resulting associations formed by
users; the psychological impact of positive and negative earcons; and how they relate to
their visual counterpart. Most compelling in the discussion of music being used to
create emotionally-rich communication through multi-sensory experiences, are the
elements that earcon designers consider when approaching each audio cue. For
example, in the study by Lemmens et al., it was asserted that “the difference in affective
appreciation of the major and minor modes can be incorporated in the set of
transformations for earcons. The major/minor transformation can then be used
specifically to create affectively-charged earcons for use in affective human-computer
interfaces” (p. 2018). On the flip side, a 2010 study examined the potential hazard of
dissonant warnings for technical errors possibly creating too strong a negative visceral
response in the end-user.
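The major/minor transformation described above can be illustrated with a minimal synthesis sketch. This is an illustrative example, not code from any earcon study: it assumes simple additive sine synthesis of a triad, where the mode parameter controls whether the third sits four semitones (major) or three semitones (minor) above the root.

```python
import numpy as np

def triad_earcon(root_hz=440.0, mode="major", dur=0.5, fs=44100):
    """Synthesize a short triad earcon as a float array in [-1, 1].

    The major/minor transformation moves the third down one semitone,
    changing the cue's affective character while keeping everything else fixed.
    """
    third = 4 if mode == "major" else 3                # semitones above the root
    ratios = [1.0, 2 ** (third / 12), 2 ** (7 / 12)]   # root, third, perfect fifth
    t = np.arange(int(dur * fs)) / fs
    tone = sum(np.sin(2 * np.pi * root_hz * r * t) for r in ratios)
    tone *= np.exp(-4 * t)                             # decay envelope, avoids clicks
    return tone / np.max(np.abs(tone))                 # normalize
```

A confirmation cue might use the major variant and a warning the minor one; the point of the transformation is that a single controlled parameter carries the affective difference.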
In another study, called “Designing Earcons with Musical Grammars”, Hankinson
and Edwards recall early earcon designers who stayed away from compositions of
more than four notes so as to avoid musical associations and affect. Their study, on the
contrary, asserts that, used correctly, musical gestures and their associated grammars
applied to earcons can provide the user with rich information. This notion is confirmed
by other researchers who have pinpointed the capability of conveying affect through
pitch, rhythm, and timbre.
The concise nature and observable impact of earcons have made them an
intriguing subject for examining audio-visual tools. Methods in cognitive processing
(similar to those mentioned earlier) allow researchers to observe how earcons of
varying sonic quality affect the brain’s ability to process information, as well as how
they form affective responses. Congruency - how closely an earcon matches the
concept it is trying to convey - also plays a role. Researchers use stimulus-response
compatibility (SRC) to describe efficient implementations which result in improved user
performance (strong stimulus-response mappings) (Lemmens et al.). This is part of
affective computing research.
Despite the wide range of research, all are in agreement that earcons effectively convey
information, thus improving human-computer interaction. Best said in the context of
Emotitones: “Earcons could be used, in any program employing emoticons, to more
easily differentiate between positively and negatively valenced emotions” (Lemmens et
al., p. 2024).
In a culture where digital users receive information, form impressions, and share
perspectives through the consumption of multimedia, Emotitones facilitates this
communication behavior by allowing users to express themselves through the
multimedia content itself - more specifically, through musical content. Decisions
regarding the transmission of these messages begin with consideration of who the
receiving audience is.
C. One-to-One Communication
This section of the user interface discussion concerns the communication channel
specifically, and the decision to enable a one-to-one channel versus a broadcast.
The strength of research favoring peer-to-peer communication over the type of
communication demonstrated in blogging, status-update, and tweeting cultures rests
in the assumption that a communicator is more invested in a message directed at one
user than in a broadcast to an undefined group of people.
The main difference between the two approaches is that in direct communication, the
presence of the receiver/listener is crucial, and must be considered by the
sender/communicator. While communication theories defining these roles are
progressing as technology evolves, they stem from traditional models of
communication, and are adapted as needed.
The information transmission model by Shannon and Weaver was widely favored
in the mid-20th century. In this model, the communicator chooses a specific channel to
deliver a message to a targeted receiver (Hargreaves et al., 2005). Many musical
communication researchers consider this an oversimplification, arguing that
communication (musical or otherwise) involves creativity and interaction between the
performer (sender) and listener (receiver): the communication is "much more interactive
and re-creative than is suggested by the idea of information being passed from one
person (e.g. the performer) to another (the listener)” (Hargreaves et al., 2005). The
listener, they assert, has a role in defining or interpreting the message (or piece of
music), and therefore cannot be considered a passive recipient. Modern theories of
musical communication address this shortcoming; however, they lack consensus on
what roles the communicator and receiver play, and on whether or not musical
messages have coded meanings (Kendall & Carterette, 1990).
The distinction between composer and performer was accepted, however, and added
as an extra step in the communication chain. This means that the performer must first
decode, then interpret musical meaning, then re-encode the message before sending it
to the listener, where “each of these processes is dependent on the shared implicit and
explicit knowledge of all three participants in the chain, and is influenced by the
context and environment within which the process takes place” (Hargreaves et al.,
2005).
Figure 1.3: Juslin’s model of musical communication
Figure 1.3 shows the complex but elegant musical communication model by Juslin, who addressed the uncertainty between the listener’s perception and his or her affective response, and defined the composer’s role as a “causal” influence on the listener (Juslin, 2003). His studies also examined the translation of intention (the composer’s and the performer’s), and the resulting affective response in the listener. Because the composer’s intention is translated by the performer’s intention, the performance takes on acoustic features that affect and shape the listener’s perception. The patterns that the listener then recognizes and internalizes formulate a response, possibly emotional, and thus lead to a new mental state or experience (Hargreaves et al, 2005). In the case of Emotitones, it is reasonable to say that the sender is, in effect, a second performer, interpreting the original message of the composer and singer, and again encoding the piece of music just before sending it to a receiver, who will be influenced by four considerations: the original composition/writing, the original performance, the sender’s added comments/impressions, and finally, the receiver’s own associations with the piece.
Like Juslin, other scholars have been integral in addressing some of the subtleties
of communication chains. Speaking to the importance of the receiver, Johnson-Laird’s model asserts that when a communicator codes a message to the receiver, the message becomes symbolic: a representation of what the sender wishes to convey. The receiver must then decode the message, and therefore must have a mutual understanding of what the symbolic coding means (Hargreaves et al, 2005). While a performer on stage may opt for artistic liberties over direct, clear communication of a specific concept or idea, a user wishing to communicate an idea or emotion to another person will be unsuccessful should he/she opt to send a vague, coded message with no regard for mutual understanding. This is the difference between expression and communication. It is expected that designing the Emotitones UI around one-to-one communication (or communication within a small group) will prompt the sender to consider mutual understanding, and thus result in a more successful and fulfilling communication exchange.
On a side note, it has been suggested (astutely) by Tanzi, that the first
communication decision belongs to the composer: “The composer must decide whether
to hold on to sonic memories” or to let “algorithms dispose him of them. Music is thus
ultimately cognitive and anthropological, not merely musical” (Tanzi, 1999).
While progress in neuroscience has put musical communication models in the context of information processing in the cognitive system (discussed in III), and other work has modeled musical communication on language, using semantics and semiotics (also covered in III), still other studies focus on communication models as influenced by digital technologies in the age of social networking, and on the highly expressive nature of the communication that results.
The concept of expression as a form of everyday communication is a new
phenomenon, and one that indicates that a one-to-one channel for expressive
communication is a logical next step for a new technology interface. Today, the logistics
of message delivery (such as email and SMS platforms) are taken for granted, as users are surrounded by ubiquitous, technologically driven communication channels. In other words, if a user sends a text message, he or she does not feel uncertain about whether the message will reach the recipient. Studies today focus on other layers of complexity; for instance, instead of thought being spent on how an intended message gets from sender to receiver, users must consider the construction of the message itself, and what channel to use for its delivery. These are elements of new communication behaviors surfacing in the digital realm, and are being studied by researchers in several disciplines.
With the tools for media-rich communication readily available, and the wide
choice in channels for message delivery, individual expressivity plays a much greater
role. Communication exchange cannot happen without a series of individual decisions,
each a part of the communicator’s preference and identity. The following are studies
speaking to the influence of expressivity and identity in communication.
As new behaviors in the digital age emerge, theories and observations regarding self-perception and the formation of identity are arrived at by applying traditional schools of thought to modern, practical situations. Erving Goffman is still quoted and studied today by digital theory researchers.
His “impression management” speaks to the tendency for individuals to monitor and
guide others’ impressions by altering their own settings, physical appearance, and
manners (Goffman, 1959). In today’s context, the performance of self “applies not only
to face-to-face interaction, but also to asynchronous and real-time interaction on the
internet. While Goffman could not have predicted the dynamics of computer-mediated
interaction, his model works because users, socialized in face-to-face interaction are
often conscious of applying the rules of such interaction to the cyber world” (Westlake,
2008). This is reminiscent of facebook user behaviors. Posting, tagging, and updating status are actions typically broadcast to all other “friends” on a user’s profile, with each
post carefully deliberated. Goffman labelled social interaction as being “dramaturgical”
in that it is like a theater performance. His metaphorical “front” stage and “back” stage
distinguished between people acting or conforming to social rituals at gatherings, and people behaving naturally when not playing a role, free to be themselves, respectively (Buckingham, 2008).
stage’ performance are absent in the computer-mediated interaction (visual cues such as
clothing and facial expression and aural cues such as tone), they are replaced in chat
and on websites by more “staged” elements such as font, photographs, music, and
graphics” (Westlake, 2008). These staged elements become the characteristics of a digital
individual, who can “tell stories of sorts (often non-linear and multi-voiced) and leave a
digital trail, fingerprint, or photograph” (Weber & Mitchell, 2008).
The “production” and “interactive consumption” discussed earlier are also
identity forming. Weber and Mitchell credit reflexivity as one explanation of how
consumption and production contribute to identity formation: “Firstly their own media
production (both through its processes and its outcomes) forces young people to look at
themselves, sometimes through new eyes, providing feedback for further modification
of their self-representations. Secondly, the source materials and modes of young
people’s media production are often evident or transparent; the choices and processes
that they use reveal and identify them in ways that they themselves might not even
realize” (Weber & Mitchell, 2008).
Digital artifacts used in remixing and in expression over social media channels
range in media type, duration, and format. Music-based examples have the most
relevance in consideration of a musical communication tool.
“Music is one of the most widespread and significant cultural objects that enhance
dimensions of people’s everyday life, and thus has become a significant component in
the domains of cognitive, emotional, and social functionality” (Hargreaves & North,
1999).
The concept of music as a one-to-one interaction already exists through music sharing. Aside from those mentioned briefly in the introduction, peer-to-peer sharing applications, mp3 websites, and social-networking sites allowing profile music all enable music sharing. While in some cases these websites are used to display music in the public domain, music preferences or choices are often shared between peers. “Music represents a remarkable meeting point of the private and public realms, providing encounters of self-identity with collective identity” (Hesmondhalgh, 2008). This sophisticated, methodical approach to sharing music in a meaningful way is what Valcheva calls “Playlistism.”
In making playlists, people characterize themselves and express their personality while capturing the emotional state they are in (Dijik, 2006). Ebane et al go so far as to say playlists are a “reliable personality barometer and a locus for negotiations of meaning, identity, and online presence” (Ebane, Slaney, & White, 2004). Anecdotally, most would say this is true: music preferences have strong associations with subculture.
Frith has done many studies on this phenomenon and concluded that music functions as a “badge” for social beings. This badge-like quality of music “is claimed to communicate value, attitude, and opinion to others and thus a means of identity representation and self-expression” (Valcheva, 2009). Frith’s findings also assert that an individual’s musical selection highlights some of that person’s unconscious personality traits.
Several studies have examined the effects and functionality of music sharing
technologies (using playlists) including: iTunes (Voida et al, 2005), Napster (Brown et al,
2001), last.fm (Fitzpatrick 2008), Webjay (acquired and shut down by Yahoo), Push!
Music, and TunA (Bassoli et al, 2006) (Valcheva, 2009). In the case of last.fm, the platform allows users to share playlists, construct visualizations of musical taste, and express their identity through musical subculture. While TunA allows users to stream other users’ playlists in an “eavesdropping” manner, Push!Music is a novel system which allows users to “push” songs while mobile, in an effort to share music preferences and make personal recommendations. This peer-to-peer interaction increases the value of musical interaction by placing importance on the receiver; if a song is being sent as a recommendation, the sender has taken the receiver into consideration.
Making emotitones deliverable to individuals goes one step further; the sender must consider whether the message in the song itself is what should be communicated, not just the receiver’s potential affective response to that style of music.
D. Mobile delivery
Thus far, current research supports a user interface which hosts emotion-based navigation and content classification, media-rich messaging, and peer-to-peer communication. The next UI element to consider is the method of delivery. After surveying the current reigning information technologies, it was clear that Emotitones would have to consider delivery over the mobile platform.
“... seen in this very broad evolutionary perspective, the significance of the mobile
phone lies in empowering people to engage in communication, which is at the same
time free from the constraints of physical proximity and spatial immobility” (Geser,
2008).
One simple but powerful aspect favoring mobile devices is their worldwide dominance: their ubiquity. The economic research illustrating this worldwide dominance of mobile devices and mobile internet within the last two years alone is more than enough to justify this delivery method (there are over 4.6 billion mobile users in the world); however, the design of the Emotitones user interface is based on neurological, technological, and sociological analyses, not on economics.
On the subject of the ever-present nature of mobile phones is Stald’s account: “it is
ubiquitous in youth cultural contexts as a medium for constant updating, coordinating,
information access, and documentation. At the same time, the mobile is an important
medium for social networking, the enhancing of group and group identity, and for the
exchange between friends which is needed in the reflexive process of identity
construction.” The mobile is “the ideal tool to deal with the pace of information
exchange, the management of countless loose, close or intimate relations, the
coordination of ever-changing daily activities, and the insecurity of every day
life” (Stald, 2008). Stald’s findings were based on quantitative and qualitative studies of fifteen Danes, aged from their teens to mid-twenties, and their mobile habits.
The mobile phone is first and foremost a communicative device; however, due to the increasing number of capabilities and functions it is responsible for (e.g., email, GPS, entertainment, news/reference, timekeeping), it is becoming an object of necessity, one that is crucial for functioning in today’s society. Rich Ling asserts that mobile devices change the way in which daily life is organized and coordinated (Ling, 2004). With time traditionally serving as the meter for the coordination of daily life, Ling suggests: “Instead of relying on a mediating system, mobile telephony allows for
direct contact that is in many cases more interactive and more flexible than time-based
coordination” (Ling, 2004).
Aside from its urgent and necessary functions, the phone is also viewed as a personal log for day-to-day experiences (Stald, 2008). Media-capturing functionalities
allow users to document experiences through photos, notes, calendars and sound
samples/voice memos. As Stald found, the memories created and shared on mobile
devices inevitably lead to emotional connections felt with the phone’s backlog of digital files.
The emotive nature of the mobile phone, in its ability to connect loved ones, to function as a personal log, and to capture moments of communication and experience, evokes the imagery of Marshall McLuhan’s “extension of man.” Mobile users, particularly youth, have found several ways to personalize their devices as a medium, indicating further that there is an unarticulated emotional attachment between device and user. Some of these personalizations include background screen images, cell phone cases, ringtones, alarm tones, gaming, photo IDs, and so on; “through its basic
appearance, the decorative adaptations, the choice of ringtones, and other alerts, and
through screen background, the mobile itself provides signals about the user’s identity
or at least their self-perception. The use of language, spelling, their actual way of
interacting in dialogues, and the use of additional communicative elements and services
also reveal things about the user’s personal settings” (Stald, 2008).
The emotional accounts of young mobile users across studies range from keeping in touch through MMS messages and sharing moods and everyday events, to taking video of crowning moments and engaging in full conversations over instant messenger. These activities inevitably strengthen relationships and identity. This kind of emotional
expressivity of mobile devices supports the case for mobile delivery, but perhaps a more
compelling case is the emergence of phatic communication.
The mobile phone (via social media) has enabled communication functions which
traditionally were only present in verbal communication. As observed in interpersonal
communication and linguistics, phatic communication, commonly referred to as “small talk,” occurs when an exchange exists merely to confirm that a channel exists and is functional. As formalized by Russian linguist Roman Jakobson, this type of communication is not meant to convey any specific information or meaning, but instead acts merely to utilize a channel, to check that the channel is working, or to make a comment about that channel (Jakobson, 1959). These exchanges
carry understood meanings that depend not on the words themselves, but rather on the delivery and intention of the phraseology. As pointed out by Zegarac and Clark, despite the meaningless nature of the words comprising a phatic message, the interpretation of these messages has social effects (Nicolle & Clark, 1998). While there are many studies
in linguistics and communication sciences examining the content and intent of phatic
messages, Wang et al go further to define “phatic technologies” whose primary purpose
is to “establish, develop, and maintain human relationships”.
While much of phatic communication can seem thoughtless, Ling describes “grooming” messages (a type of phatic communication), which occur when one communicator lets another know that they are “there” for them and actively listening; this exchange serves to nurture the relationship.
The constant messages sent in youth culture for the purpose of “being thoughtful” (regardless of the lack of information in the message) have been compared to phatic communication in linguistics. The behaviors of SMS users frequently follow
phatic communication patterns, enabling small talk more so than conveying meaningful
information (Ling, 2004). Behaviors such as “poking” on facebook, or pinging through
instant messenger also demonstrate the digital application of phatic communication.
Additional research on the subject has been on the rise within the last decade as technology forms new communication behaviors, making devices such as the mobile phone crucial to understand. As it relates to the mobile phone, phatic communication is observed (as previously mentioned) as a social and emotive interaction that conveys no specific information, such as with the text “hey how are you?” or “what’s
observed in European countries, as well as in Africa, North America, Latin America,
and India (this is not an exhaustive list), and utilizes the ringing feature on mobile
devices or other sonic alerts to communicate a shared meaning with another user,
instead of the typical voice or text used to communicate (Kasesniemi et al, 2003).
In the study on Danish youth behavior, mobile users were observed exhibiting what is called “pilaris”, using the number of times a phone would ring to convey specific meaning (Stald, 2008). Mobile users observed in Donner’s study in Rwanda used “beeping”, via SMS/text messaging and missed calls, to communicate specific, previously determined meanings. According to the observations, there were three kinds of beeps
used: callback, pre-negotiated instrumental, and relational (Donner, 2007). Examples
given for “pre-negotiated instrumental” include “I’m thinking of you” or “Come pick
me up”. The behavior has spread so much so that an application was prototyped to
“support phatic communication in the hybrid space” (Bilandzic et al, 2009).
This behavior of using sonic alerts to communicate (only a small deviation from the idea of communicating through musical clips), the emotional connection mobile users feel to their devices, and the ubiquity and necessity of mobile devices make the case for incorporating mobile delivery in the Emotitones user interface.
IV.
DEVELOPMENT
A. Interface and message flow
The primary purpose of Emotitones is to enable an emotionally rich platform for communication. Observing that the effects of music are highly visceral in most cases (especially when dealing with affect), and with adequate and current supporting research (discussed in III), emotion-based navigation was implemented.
Emotion-based navigation revolves around the motivations of the expected Emotitones user. The premise for sending an emotitone is that a user desires a form of expression beyond simple text communication, which typically constrains emotional expressivity. This user has a predetermined emotion or sentiment in mind when visiting the platform; thus Emotitones navigation should reflect these emotional motivations, informing the user on how best to express a given emotion. In other words, from the moment a user logs in to the time they send an emotitone, they will be prompted to make functional decisions based on their emotions.
Part of these navigational decisions is making it easy for the user to find a suitable
piece of media content to represent their sentiment. In the UI, this is facilitated by
content categorization (database tagging) upon song clip ingestion, and a multi-
parametric search.
When dealing with content ingestion, or uploading content to the Emotitones database, song clips are chosen based on their ability to convey succinct ideas or emotions. Ordinarily, this happens through eloquent songwriting, in which the writer creates relatable, empathic lyrics; or through effective composition, in which the composer creates music evoking highly visceral responses in listeners. The beta catalog of emotitones includes primarily vocal music, in which it is requisite that the lyrics are concise, enunciated, and well-articulated. While the eventual database will include all
types of music and sound conveying various emotions and ideas, the initial collection of
song clips are somewhat literal for the purpose of developing a successful proof of
concept. Once selected for the database, each emotitone is categorized and tagged
according to the emotion or sentiment describing the over-arching theme being
conveyed. This is to enable effective search for an appropriate emotitone.
The emotional categories for the Emotitones beta were chosen as what is thought to be the most inclusive set of categories describing common, universal human sentiments. They are:
romance / encouragement / controversy / friendship / humor / spiritual / occasions / musings / all
The key difference between these sentimental categories and ones typically used in studies on music and emotion (such as happy, sad, angry, and scared) is that the sentiments must take into account shared meaning with the receiver, a consideration that is absent in many studies which focus on the emotional reaction of only one listener. For example, if happy were used instead of romance, it would be very difficult to find music and lyrics appropriate for the relationship dynamic at hand. Conversely, it is hard to think of a romantic song clip that would not be appropriate for a sender wishing to be romantic with the receiver (setting aside surface-level characteristics such as gender, and other subtleties to be discussed later). Other than the occasional browsing, it is hypothesized that users will send emotitones with a specific purpose and person in mind.
To facilitate finding appropriate emotitones, a three-parameter search was implemented, with the emotion-based “sentiment/occasion” category described above being the first parameter.
The second parameter for the sender to decide on is the genre of music, which is also part of the emotion-based search for an appropriate emotitone. It is hypothesized that the sender will have a genre preference based on his or her own musical tastes. As explored earlier, these musical preferences stem directly from affect, from the visceral effects of listening to a specific genre of music over time. The beta phase genres include:
rock / pop / hip hop / country / classics / world / other / all
While resources were consulted (charts, mp3 stores, etc.), these genre categories were chosen based on their inclusive, encompassing nature (of sub-genres), and on the strong presence of associated subcultures.
The third emotion-based search parameter that the sender must decide on is gender, the choices being:
male / female / all
This parameter was implemented with the anticipation that senders will have specific messages in mind for specific people, and thus a preference in gender for the first-person voice. This is an emotional consideration, with the hypothesis being that a song sent in the first-person voice of the sender’s own gender is more emotionally effective than one communicated in the opposite gender. Support for this could be found by surveying ringtone users as to which gender is preferred. This, of course, is only a starting point.
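The three-parameter search described above can be sketched as follows. This is a minimal, hypothetical illustration: SQLite stands in for the production MySQL database, and the table, columns, and sample clips are invented for the example, not taken from the actual Emotitones catalog.

```python
import sqlite3

# Hypothetical catalog; the real Emotitones table and column names are
# not specified in the text, and SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE emotitones (
    id INTEGER PRIMARY KEY,
    title TEXT,
    sentiment TEXT,   -- romance / encouragement / ... (beta categories)
    genre TEXT,       -- rock / pop / hip hop / ... (beta genres)
    gender TEXT       -- male / female
)""")
conn.executemany(
    "INSERT INTO emotitones (title, sentiment, genre, gender) VALUES (?, ?, ?, ?)",
    [("Clip A", "romance", "pop", "female"),
     ("Clip B", "romance", "rock", "male"),
     ("Clip C", "humor", "pop", "male")])

def search(sentiment="all", genre="all", gender="all"):
    """Three-parameter search; choosing 'all' disables a filter, as in the UI."""
    query = "SELECT title FROM emotitones WHERE 1=1"
    params = []
    for column, value in (("sentiment", sentiment),
                          ("genre", genre),
                          ("gender", gender)):
        if value != "all":
            query += f" AND {column} = ?"
            params.append(value)
    return [row[0] for row in conn.execute(query + " ORDER BY id", params)]

print(search(sentiment="romance", genre="pop"))  # ['Clip A']
print(search(sentiment="romance"))               # ['Clip A', 'Clip B']
```

The key property mirrored from the UI is that each parameter narrows the result set independently, with “all” acting as a wildcard.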
Continuing with the navigation flow, after the user goes through the multi-parametric search, they are invited to preview the resulting clips if desired, or they can proceed to sending the clip (or buying the full-length song). On the send clip page, the sender is given the opportunity to customize their message by adding text (and in the future, photos or video). This is the last part of the implemented emotion-based navigation.
The second design element of the UI discussed in III is media-rich messaging. While music was always intended to be the content through which users could communicate, decisions had to be made on catalog, duration, and file type.
The Emotitones beta is limited in terms of its categories and content. Conceptually, the catalog will house audio and visual clips representing the largest catalogs in the world. Only by giving users exhaustive options will they be able to communicate fully using the platform.
In terms of clip length, a decision was made to cap duration at 30 seconds. Full-length clips were not considered, as they are computationally expensive in terms of file size and delivery time, and in consideration of the ever-decreasing attention spans of digital users.
Fifteen to thirty seconds is the length of most song choruses. The ringtone edit of a song is typically this length, and is most often the most emotive part of a song, as well as the most concise in terms of idea or concept. Logistically, asking content providers for ringtone edits is easier to accommodate, as no further editing is required. In cases where new edits need to be made, this is handled in-house using Audacity.
The desired file type for music clips is mp3 at 128 kbps. In the application, the smaller the file size the better, and since the output speakers are likely to be of low quality, any higher-quality encoding would be undetectable.
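As a quick back-of-the-envelope check (a sketch added here, not from the original text), a 30-second clip at 128 kbps works out to roughly half a megabyte, which is comfortably within typical MMS size limits:

```python
# File-size estimate for a 30-second mp3 encoded at 128 kbps.
bitrate_kbps = 128          # kilobits per second
duration_s = 30             # clip length capped at 30 seconds
size_kb = bitrate_kbps * duration_s / 8   # kilobits -> kilobytes
print(size_kb)              # 480.0 KB, i.e. roughly half a megabyte
```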
The conclusion that one-to-one communication was desired over broadcast (such as Twitter) led to the design of the “send tone” interface, the last page in the emotion-based navigation. As mentioned before, once a sender has selected a clip, he or she is given the opportunity to enter the recipient’s mobile number along with a customized text. While in beta the mobile number entry is manual, later versions of the application will interface with the user’s native contact list. There is also a prompted entry for the recipient’s email, so that the receiver is notified that they have been sent an emotitone and asked to check their device settings if they do not receive it.
To encourage a dialog between sender and receiver, the receiver is given the
opportunity to ‘reply with an emotitone’. The hope is that in the app version of the
platform, a musical dialog can take place.
The last element of the interface discussed in III was the decision to integrate mobile delivery. The app version of Emotitones will be mobile-based and self-contained within the app, but the beta exists as a web-to-mobile platform. It is apparent why mobile delivery makes sense (discussed previously); however, the decision to make the sender’s experience web-based was an issue of ease of use and adaptability. In other words, browser-based search and navigation is easier, and most likely will lead to more time spent on the site and more users.
In terms of file type, emotitones are delivered as MMS messages. MMS was the only option for mobile-specific, media-rich delivery that starts from the web and is not self-contained within an app.
B. Application deployment
The development of the Emotitones platform revolved around three core issues: 1) Where will the content be stored? 2) How will it be accessed? and 3) How will it be delivered?
The first part of the storage issue refers to hosting. All sites must have a hosting solution, and in the last few years, many have migrated to cloud-based computing. The Emotitones demo was originally hosted on Amazon EC2; however, due to better support and more flexibility, Rackspace Cloud was chosen for beta, with the Emotitones server running on a Debian Linux box.
Storage within the application is another issue, relating to database development. The Emotitones database has to handle several functions: storage of mp3 files and corresponding tags/metadata; multi-user access; multi-parametric search capabilities; and sending, retrieval, and editing of data and files. The selected database system for Emotitones is MySQL (used by Facebook, Google, and Wikipedia), as it can handle these requirements, plus large-scale content ingestion.
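The database functions listed above might be organized as in the following sketch. The schema is purely illustrative (SQLite standing in for MySQL; every table and column name below is an assumption, not the actual Emotitones schema):

```python
import sqlite3

# Illustrative schema only; actual Emotitones table names and columns
# are not given in the text. sqlite3 stands in for MySQL here.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE clips (              -- mp3 files and their metadata
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,       -- path to the 30-second mp3
    sentiment TEXT, genre TEXT, gender TEXT
);
CREATE TABLE clip_keywords (      -- tags backing the keyword search
    clip_id INTEGER REFERENCES clips(id),
    keyword TEXT
);
CREATE TABLE users (              -- login functionality
    id INTEGER PRIMARY KEY,
    username TEXT UNIQUE, email TEXT
);
CREATE TABLE sends (              -- back-end log of sent emotitones
    id INTEGER PRIMARY KEY,
    clip_id INTEGER REFERENCES clips(id),
    sender_id INTEGER REFERENCES users(id),
    recipient_number TEXT,
    sent_at TEXT DEFAULT CURRENT_TIMESTAMP
);
""")
# A clip can now be ingested with its tags, and later retrieved or edited.
db.execute("INSERT INTO clips (filename, sentiment, genre, gender) VALUES (?, ?, ?, ?)",
           ("clip_0001.mp3", "romance", "pop", "female"))
print(db.execute("SELECT filename FROM clips").fetchone()[0])  # clip_0001.mp3
```

The separate `sends` table is one way to support the daily log analysis mentioned later, while `clip_keywords` keeps tag storage independent of the clip metadata.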
Access to content is enabled through the post-login online interface, which communicates with the Emotitones MySQL database. Because the application is a browser-based interface with dynamic content, JavaScript was selected as the development tool, with AJAX used to integrate with MySQL. JavaScript is well established for browser-based applications, and AJAX is a powerful technique for server integration.
In cases where user access involves uploading content through forms, XML, a widely used tool for data transmission, is used for ingesting information in machine-readable form, while AJAX accesses the database repository. Emotitones has several of these user-upload forms, some of which deal with music files, while others handle simple text. The safekeeping of tags, metadata, and other information is dependent on the XML coding.
The delivery of emotitones relies upon integration with a third-party API allowing for MMS delivery over all major carriers in North America. The Hook Mobile API uses M.A.X. 2.0, a Mobile API EXtension mobile utility platform. M.A.X. runs on a REST (Representational State Transfer) interface, an architecture running over HTTP. The delivery of content from database to mobile phone also involves short codes, which give access to carrier delivery over the SMS and MMS platforms.
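A request to such an MMS-delivery API might be constructed as below. The actual Hook Mobile / M.A.X. 2.0 endpoint, field names, and authentication are not documented here, so every field in this sketch is a hypothetical assumption, shown only to illustrate the general shape of a REST-based MMS send:

```python
import json

# Hypothetical request body for an MMS-delivery REST API. The field
# names ("to", "subject", "text", "media_url") are assumptions, NOT the
# documented Hook Mobile / M.A.X. 2.0 interface.
def build_mms_request(sender_username, recipient_number, clip_url, message):
    payload = {
        "to": recipient_number,   # receiver's mobile number
        "subject": f"{sender_username} has sent you an Emotitone.",
        "text": message,          # sender's customized text
        "media_url": clip_url,    # forward-locked 30-second mp3 clip
    }
    return json.dumps(payload)

body = build_mms_request("JohnDoe", "+12125550100",
                         "https://example.com/clips/42.mp3",
                         "Thinking of you!")
print(json.loads(body)["subject"])  # JohnDoe has sent you an Emotitone.
```

In a real deployment this JSON body would be POSTed over HTTP to the provider's endpoint, with the short code and carrier routing handled on the provider's side.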
Provided that a receiver’s phone is MMS-enabled, emotitones can reach any user via this API integration. In the receiver’s MMS inbox, the subject displays “John Doe (username) has sent you an Emotitone.” After clicking on the MMS itself, the receiver can view the customized text and press the play button to hear the emotitone. The receiver is then given the option to reply with an emotitone, in which case they are directed to the web-based interface. For security, the previews and emotitones are forward-locked.
The architecture in review includes a remote server, a requesting source, a receiving source, and a database repository. The server houses a database of music clips which have been edited, meta-tagged, and categorized by sentiment and/or occasion, genre, and gender. Emotitones integrates with an API allowing for successful delivery and receipt of MMS messages. Any web-enabled device is able to send emotitones, and any North American MMS-enabled device is able to receive them. The Emotitones beta allows a user to do the following: create a login, browse clips, preview clips, select and customize a chosen clip with text, and send the clip via MMS to a receiver’s mobile phone.
Other functions of the site include:
1) A daily analysis of logs in the database repository, to display information such as a “Top 20 Emotitones” chart and “Today’s Top 5”; a back-end log of when emotitones have been sent; and safekeeping of user information for login functionality.
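The daily chart analysis could be sketched as a simple aggregation over the send log. The log format and clip titles below are invented for illustration; the actual log fields are not specified in the text:

```python
from collections import Counter

# Hypothetical send log of (clip_title, date) entries; the real back-end
# log schema is not given in the text.
send_log = [
    ("Clip A", "2011-06-10"), ("Clip B", "2011-06-10"),
    ("Clip A", "2011-06-10"), ("Clip C", "2011-06-09"),
    ("Clip A", "2011-06-09"), ("Clip B", "2011-06-09"),
]

def top_n(log, n, date=None):
    """Count sends per clip, optionally restricted to a single day."""
    counts = Counter(title for title, day in log
                     if date is None or day == date)
    return [title for title, _ in counts.most_common(n)]

print(top_n(send_log, 20))               # overall chart
print(top_n(send_log, 5, "2011-06-10"))  # today's top clips
```

The same aggregation, run once a day over the full log, yields the “Top 20 Emotitones” chart; restricting it to the current date yields “Today’s Top 5.”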
2) A submissions form to allow users (artists or labels with copyright permissions) to upload edited emotitones to the database, pending approval. The users are prompted to tag and classify each clip so that the emotitone will display in the results of the multi-parametric search. While full-length uploads are accepted, they are edited before being added to the database.
3) A suggestions page. Any user who would like to request a song for the Emotitones service can fill out the online suggestions form. They are prompted for categorization information, but not permitted to upload the file itself.
4) Aside from the multi-parametric search, users can search the emotitones database
using a keyword search. Each song clip has been tagged with 12 keywords. In most
cases the keywords include song title, artist name, chorus/hook phrasing, mood,
corresponding emotion, genre, and gender of vocalist.
5) Buy links. In most places where a user can listen to an emotitone, he or she also has
the option of purchasing the full-length download. This was implemented from an
emotional perspective. For example, if a receiver feels moved by the message in an
emotitone that someone has sent, he or she may want the full-length version of the
song, which has a new meaning attached to it.
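The daily chart analysis described in item 1 reduces to counting sends per clip over a time window. A minimal sketch, assuming a hypothetical log format of (date, clip_id) pairs rather than the actual back-end schema:

```python
from collections import Counter
from datetime import date

# Hypothetical send log: (date_sent, clip_id) pairs, as the back-end
# log of sent emotitones might record them.
send_log = [
    (date(2011, 6, 9), 101), (date(2011, 6, 9), 102), (date(2011, 6, 9), 101),
    (date(2011, 6, 9), 103), (date(2011, 6, 9), 101), (date(2011, 6, 9), 102),
]

def top_n(log, day, n):
    """Count sends per clip on the given day and return the n most-sent
    clips as (clip_id, send_count) pairs, most popular first."""
    counts = Counter(clip_id for sent_on, clip_id in log if sent_on == day)
    return counts.most_common(n)

chart = top_n(send_log, date(2011, 6, 9), 2)  # "Today's Top 2" for the sample log
```

The same aggregation over a 24-hour window, run once daily, would populate both the “Today’s Top 5” and, with a longer window, the “Top 20 Emotitones” chart.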
V.
DISCUSSION AND CONCLUSIONS
A. Summary
Many novel systems, especially in the information-technology and social-networking
spaces, pre-date research that fully supports the concepts they exhibit.
However, when approaching the Emotitones application not as a whole, but as a series of
UI decisions, it was evident that multi-disciplinary support existed. From a practical
standpoint, behaviors demonstrated by users of social, mobile, and communication
technologies already include the integration of multimedia for emotive purposes. While
platforms focusing on this phenomenon are in the early stages of emergence, digital
culture and its “interactive consumption” have existed for over two decades, and the
mechanisms by which a communicator can express him- or herself through digital media
are already integrated in existing platforms such as Facebook, MySpace, Twitter, and
Foursquare.
The development of Emotitones has been an uphill battle of cross-platform
development, licensing negotiations, and issues with multi-territory delivery (as well as
development-cost considerations). These battles are worth fighting for the promise of
a new communication platform: one that empowers users with multimedia content,
namely music, to supply the emotional expressivity lacking in so many other
platforms.
The main difference between these platforms and Emotitones, as articulated
previously, is the added value placed on the listener, or end user. In a society where
shameless plugs, spam, mass marketing, and junk mail are easily transmitted over
every platform, the listener, or receiver, is taken for granted. Even peer-to-peer textual
interaction lacks the empathy that face-to-face interaction between two strangers
requires. A person can get away with terse, laconic text-based communication, while
successful face-to-face communication must follow the rules of interpersonal
communication. When a user sends an emotitone, he or she must have the receiver in
mind. The main value of Emotitones is in the communication exchange.
Because it had less bearing on the interface design and development, the subject of
how Emotitones can help artists has not been discussed. Yet the artist perspective has
always been a motivation for Emotitones. Music, after all, is only possible with artistic
effort and follow-through. While the platform enables emotionally rich communication,
it is also a tool for artists to share and promote excerpts of work. A new release can be
sent as an emotitone, with adjoining text such as “I wrote this song for my father,
who has just passed,” and a link to the full-length version of the song. Provided the
artist does not communicate in a way that is construed as spam, this could be an effective
way to reach fans on a more direct and visceral level than typical release promotion.
B. Expanding features
The beta phase of Emotitones is only a small representation of the features that
will make it a powerful platform for communication. Here are additions for the next
phase:
1) Photo and video attachment capabilities: research supporting media-rich messaging
suggests that a multi-sensory experience is much stronger at evoking an emotional
response.
2) Worldwide territories: limitations of the Hook Mobile API do not allow for
delivery outside North America; however, some of the strongest mobile markets are
international, such as Japan, China, Korea, Brazil, and the UK.
3) Karaoke (customizable voice option): in countries such as Korea, the ability to sing
over instrumental versions of songs is prevalent. Emotitones aims to give users the
option to record their own voice over a song clip and send it. This may prove to be
extremely powerful in emotional connectivity.
4) Sound sampling: while enabling users to upload any audio file could result in
copyright infringement, users will be allowed to upload and send sounds recorded on
their mobile or PC recorder.
5) Foreign-language song selection: making emotitones delivery available to countries
outside North America is more valuable once the database contains song clips local to
each region.
6) Editing tool: when artists and labels submit content, songs must be pre-edited to the
correct length. Implementing a dragging tool for editing duration would make this
easier and more manageable.
7) Exhaustive emotional categories and genres: beta-phase development allowed for
fewer than a dozen emotional categories and genres. Many more emotion-based
categories and genres will be added in order to properly tag and classify music.
8) Community and user identity: Emotitones can spark many conversations on subjects such
as song meaning, artists’ careers, and song feedback in general. It is important to equip
the community with comment platforms, and to allow for more information identifying
who each user is. The research on subculture and identity suggests that musical identity
is very important to digital users.
9) Commerce: the business model for Emotitones has not been discussed here, but it exists
and revolves around premium content, royalties from full-length music and other products,
as well as virtual gifting. This model will be implemented in the next phase.
10) Unlocking song selection: gaming has grown enormously in the age of social
networking; in certain applications it is a camouflage for reward programs. In
Emotitones, the ability to “unlock” song selection will be treated as a game, rewarding
users for loyalty or for their musical interest and knowledge.
11) “Short-hand” emotitones: as mentioned in the opening section, the word emotitone
derives from emoticon. It is possible to make a “short-hand” version of
audio excerpts such that they convey mood without lyrics or lengthy passages. A short-
hand emotitone would be one second in duration or less, and added as a menu in
instant-messenger applications and the Emotitones community chat. The research
available concerning earcons, as well as the cognitive processing of musical gestures
of similar duration, suggests that short-hand emotitones will be effective at
communicating affect in the context of social interaction.
C. Future scope
While the current focus of Emotitones is to enable musical communication, the
vision extends to multi-media communication in general. The future scope of the
platform includes being able to send any digital artifact that communicates an idea,
emotion or sentiment, whether it be a political speech fragment, a strong literary quote
coupled with a related painting, or a humorous video clip from a movie. Because people
today are digital consumers and, in most cases, digital producers (sometimes
unknowingly), they must be enabled to share in a way that gives credit to the communication
embedded in each piece of media they have collected or created. These multi-media
pieces are, in most cases, not meant to be consumed passively: they were created to
convey meaning, and thus represent meaning.
Such a platform revolves around the database itself, namely content ingestion
(having as much to choose from as possible) and optimized search functionality
(making it easy for users to find what they want). The search engine required is a
significant undertaking, and is part of the future scope of Emotitones.
As an improvement to musical communication, a lyrics database is needed. Users
will want the choice of sending the lyrics of an emotitone along with the audio file.
Whatever can be done to facilitate bricolage (see Appendix) will make the service more
compelling.
D. Predictions
As with any newly developed platform, there is always a possibility that technology
will be used in a way other than that for which it was created. This happened with the
application Chatroulette, which became a tool for sexual exploitation; its
intention was to facilitate worldwide impromptu video conversations, with the
motivation of making the world more accessible to people. Typically this kind
of malfunction happens when a service adds some kind of user-generated functionality.
It is not foreseeable how Emotitones could be misused; however, the user-generated
aspects of the platform will not be enabled for the beta.
It is predicted that user growth will rely on the growth of the content database. If a
user tries the platform but is unable to find a music clip suited to the emotion or
sentiment desired, he or she will most likely not return until there is a wider selection. Some
users, however, will send an emotitone regardless, because of its novelty. It is exciting to
receive an emotitone even if the words are not exactly right. This is akin to greeting
cards, which are historically vague and generic; it is up to the person giving the card to
“customize” it with a personal message.
E. Limitations
The Emotitones beta is limited in many ways. As mentioned, content selection must
reach critical mass before the platform truly enables musical communication. In
addition, some users could see MMS delivery as a downside. While SMS and MMS
have achieved relative ubiquity in today’s mobile market, the group of users who are
charged to receive MMS may find it frustrating to receive emotitones. Senders are
warned several times, however, about standard messaging fees, and savvy (or just
literate) mobile device owners know how to disable MMS delivery to their phones. This
should not be a significant limitation, as there is no difference in cost between receiving
an emotitone and receiving a photo via MMS.
Another major limitation is that login and navigation are web-based only. While
the site is optimized for most mobile browsers, sending an emotitone from a mobile device is not
a satisfying user experience. Mobile apps were created as solutions to this problem, and
until Emotitones exists in app form, the proper user experience of finding and sending
emotitones will be limited to computer-based web browsers.
F. Using Emotitones as a research tool
In the effort to support the emotional expressiveness of music as compared to
language, Emotitones can be used as a research tool. The Emotitones database logs a
great deal of information, including which emotitones people send most frequently;
which emotional categories, genres, and genders are being sent, at what time of
day, and in which region; and how often an emotitone is reciprocated with another
emotitone. As the Emotitones user base grows, the behavioral tendencies of users will be
valuable, possibly informing researchers of communication patterns used by today’s
digital users, more specifically digital music users.
As for specific experiments, controls would have to be implemented. An
example experiment might compare communication efficacy between emotitones
and text messages.
One way of doing this is to find 20 subjects (10 pairs who have a relationship of
some kind), separate them into adjacent rooms, and give each an MMS-enabled
mobile device. Subject 1 of each pair would be instructed to search the Emotitones
database and find five musical clips that best express the emotions or sentiments that he
or she wishes to communicate to Subject 2. Subject 1 would then be asked to compose
five text messages, one for each selected musical clip, corresponding to the same
emotions or sentiments.
Subject 2 would be sent the musical clips (as emotitones) as well as the five text
messages, one by one, in random order. After receiving each clip or text, Subject 2 would
be asked to write down an interpretation of what Subject 1 intended to communicate.
The interpretations would be presented back to Subject 1 in pairs, without indication of
whether each interpretation was based on the text version or the music version of the
emotion. Subject 1 would be asked to select the interpretation best matching the
intended emotion or sentiment. If the music-based interpretations more accurately
convey the intended emotions than the text-based interpretations, it can be suggested
that the musical clips were more effective at communicating emotion.
Other studies could be done on genre preferences, communication patterns,
ethnomusicology, and gender as related to musical communication.
Appendix
A. Patent filing
A thorough prior-art search was conducted both by me and by my patent attorneys at Russ
Weinzimmer and Associates. Emotitones is patent-pending, and the application can be
viewed publicly on the USPTO website.
Provisions were recently added to increase coverage and functionality, as well as claims to
foreign territories.
B. Licensing and the Public Domain
One of the biggest obstacles to growing the Emotitones catalog at a faster rate is the
licensing process. Because of the state of the music industry, the four major labels are
very protective of their digital assets. In the meantime, while that case is made,
Emotitones is negotiating with independent content providers.
C. Bricolage
The concept of bricolage has been used to describe the way youth play with technology
and digital files without real knowledge of what is being done. This experimentation
results in new creations and, in turn, new behaviors, interactions, and subcultures.
D. Getting to the next phase
After beta launch, Emotitones will go into fundraising mode in order to facilitate new
features and help with growth.
References
Abrams, D. (2009). Social Identity on a National Scale: Optimal Distinctiveness and Young People’s Self-Expression Through Musical Preference, Group Processes & Intergroup Relations vol. 12(3) 303-317, University of Kent.
Bierce, A. (1912). For Brevity and Clarity. Collected Works. New York & Washington.
Bigand, E., Vieillard, S., Madurell, F., Marozeau, J., & Dacquet, A. (2005a). Multidimensional scaling of emotional responses to music: The effect of musical expertise and of the duration of the excerpts. Cognition and Emotion, 19, 1113-1139.
Bilandzic, M.; Filonik, D.; Gross, M.; Hackel, A.; Mangesius, H.; Krcmar, H. (2009). A Mobile Application to Support Phatic Communication in the Hybrid Space. Information Technology: New Generations. ITNG ’09.
Blacking, J. (1973). How Musical Is Man? Seattle: University of Washington Press.
Blattner, M.M., Sumikawa, D.A., and Greenberg, R.M. (1989). Earcons and Icons: Their Structure and Common Design Principles. Human-Computer Interaction, Vol. 4 pp. 11-44. California: Lawrence Erlbaum Associates.
Buckingham, D. (2008). Introducing Identity. Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.
Cherny, L. (1995). The Modal Complexity of Speech Events in a Social MUD. Electronic Journal of Communications 5, No 4. (accessible at http://bhasha.stanford.edu/~cherny/papers.html)
Daltrozzo, J. and Schon, D. (2008). Conceptual Processing in Music as Revealed by N400: Effects on Words and Musical Targets. Journal of Cognitive Neuroscience, 21:10, pp. 1882-1892, Massachusetts Institute of Technology.
Dijk, E.V., & Zeelenberg, M. (2006). The dampening effect of uncertainty on positive and negative emotions. Journal of Behavioral Decision Making, 19, 171-176.
Donner, J. (2007). The rules of beeping: Exchanging messages via intentional "missed calls" on mobile phones. Journal of Computer-Mediated Communication, 13(1), article 1.
Durkee, R. (1999). American Top 40: The Countdown of the Century. New York City: Schirmer Books.
Ebane, S. (2004). Digital music and subcultures: Sharing files, sharing styles. Vol. 9, No. 2.
Fitzpatrick, C. (2008). Scrobbling Identity: Impression Management on Last.fm. Technomusicology: A Sandbox Journal, Vol. 1, No. 2.
Garzonis, S., Jones, S., Jay, T., and O’Neill, E. (2009). Auditory Icon and Earcon Mobile Service Notifications: Intuitiveness, Learnability, Memorability and Preference. Boston: CHI.
Hankinson, J.C.K., and Edwards, A.D.N., (1999). Designing Earcons with Musical Grammars. ACM SIGCAPH No. 65. York, England: University of York.
Juslin, P.N. (2003). Communication emotion in music performance: Review and theoretical framework. In Music and Emotion: Theory and Research (pg 309-337). Oxford: Oxford University Press.
Kasesniemi, E.L. (2003) Mobile Messages: Young People and a New Communication Culture Tampere, Finland: Tampere University Press.
Koelsch, S. (2005). Investigating Emotion with Music: Neuroscientific Approaches. Leipzig, Germany: Max Planck Institute for Human Cognitive and Brain Sciences.
Koelsch, S., Gunter, T.C., Wittfoth, M., and Sammler, D. (2005). Interaction between Syntax Processing in Language and in Music: An ERP Study. Journal of Cognitive Neuroscience 17:10, pp. 1565-1577, Massachusetts Institute of Technology.
Kolb, B. and Whishaw, I.Q. (2003). Fundamentals of Human Neuropsychology. London: Worth Publishers.
Kubovy, M. and Shatin, J. (2009). Music and the Brain Series. Washington D.C.: Library of Congress.
Kuhl, O. (2008). Musical Semantics. New York: Peter Lang.
Lemmens, P.M.C., De Haan, A., Van Galen, G.P. and Meulenbroek, R.G.J. (2007). Emotionally charged earcons reveal affective congruency effects. Ergonomics, Vol. 50, No. 12, 2017-2025. The Netherlands: Taylor & Francis.
Levitin, D.J. (2006). This is Your Brain on Music: the science of a human obsession. New York, NY: Dutton.
Levitin, D. J. (2008). The World in Six Songs: How the Musical Brain Created Human Nature. New York, NY: Dutton.
Ling, R. (2004). The Mobile Connection: The Cell Phone's Impact on Society. Kindle Edition.
McDermott, M. Goldman, S., and Booker, A. (2008). Mixing the Digital, Social, and Cultural: Learning, Identity, and Agency in Youth participation. Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.
McNamara, T. P. (2005). Semantic priming: Perspectives from memory and word recognition. New York: Psychology Press.
McPherson, T. (2008). A Rule Set for the Future. Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.
Metros, S.E. (1999). Making Connections: A Model for On-line Interaction. Leonardo, Vol. 32, No 4, pp. 281-291. Milan, Italy.
Miell, D., MacDonald, R., and Hargreaves, D. (2005). Musical Communication. Oxford: Oxford University Press.
Mithen, S. (2006). The Singing Neanderthals: The Origins of Music, Language, Mind, and Body. Boston, Massachusetts: Harvard University Press.
Modlitba, P. and Hoglind, D. (2005). Report in Musical Communication and Music Technology: Emotional expressions in dance.
Mustonen, M.S., (2007). Introducing Timbre to Design of Semi-Abstract Earcons. Masters Thesis, Information System Science. University of Jyväskylä, Department of Computer Science and Information Systems.
Nicolle, S. and Clark, B. (1998). Phatic Interpretations: Standardization and Conventionalisation, Revista Alicantina de Estudios Ingleses 11: 183-191. Middlesex University.
Niedermeyer, E. and Da Silva, F.L. (2004). Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. London: Lippincott Williams & Wilkins.
Nussbaum, C.O. (2007). The Musical Representation: Meaning, Ontology, and Emotion. Cambridge, Mass, The MIT Press.
Peretz, I., and Zatorre, R. J. (2003). The Cognitive Neuroscience of Music. Oxford: Oxford University Press.
Sandvig, C. (2008). Wireless Play and Unexpected Innovation. Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.
Sloboda, J.A. (1985). The Musical Mind. The Cognitive Psychology of Music. Oxford: Clarendon Press.
Sloboda, J.A. (2007)
Stald, G. (2008). Mobile Identity: Youth, Identity, and Mobile Communication Media. Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.
Steinbeis, N. and Koelsch, S. (2010). Affective Priming Effects of Musical Sounds on the Processing of Word Meaning. Journal of Cognitive Neuroscience 23:3, pp. 604-621. Massachusetts Institute of Technology.
Tanzi, D. (1999). The Cultural Role and Communicative Properties of Scientifically Derived Compositional Theories. Leonardo Music Journal, Vol 9, pp. 103-106, Milan, Italy.
Wang, Victoria, Tucker, J.V., and Haines, K.R. (2009). Phatic Technology and Modernity. Center for Criminal Justice and Criminology & Department of Computer Sciences, School of Human Sciences & School of Physical Sciences, Singleton Park: Swansea University.
Weber, S. and Mitchell, C. (2008). Imaging, Keyboarding, and Posting Identities: Young People and New Media Technologies. Digital Youth, Innovation, and the Unexpected. Cambridge, MA: The MIT Press.
Westlake, E.J. (2008). Friend Me if You Facebook: Generation Y and Performative Surveillance. The Drama Review 52:4 (T200), New York University and the Massachusetts Institute of Technology.
Williams, J. P. (2003). The Straightedge Subculture on the Internet: A Case Study of Style-Display Online. Australia: Media International Australia incorporating Culture and Policy.
Valcheva, M. (2009). Playlistism: a means of identity expression and self-representation. The Mediatized Stories, The University of Oslo.