Towards Assessing the Emotional Impact of Music Encoded with Various Digital Audio Coding Systems

Page 1

Towards Assessing the Emotional Impact of Music Encoded with Various Digital Audio Coding Systems

Zhiguang Eric Zhang

Advanced Musical Acoustics - Summer 2014

Dr Braxton Boren

Page 2

Perceptual evaluation of codec quality has been largely based on transparency, annoyance of artifacts, psychoacoustic models, and cognitive models

As music is a subjective, evocative experience, my aim was to investigate the psychological and emotional impact of coded musical audio

1. Is the artistic intent impacted?

2. Are there psychological or emotional differences?

Page 3

Psychology of music

discrete vs dimensional

basic emotions

sadness, happiness, fear, anger, disgust, tenderness

valence, activity, and tension factors

perceived vs felt

Page 4

2-dimensional vs 3-dimensional models

Page 5

Psychoacoustics in audio coding

Non-linear Bark scale, critical band rate, excitation patterns

Exploits simultaneous and temporal auditory masking

Quantization of spectrum via global masking threshold and signal-to-mask ratio

$z_{\mathrm{Bark}} = 13 \arctan\left(0.76 \, f / 1000\right) + 3.5 \arctan\left(\left(f / 7500\right)^{2}\right)$, with $f$ in Hz
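The Hz-to-Bark mapping above can be evaluated directly. The following is a minimal sketch that uses only the formula on this slide; the function name `hz_to_bark` and the sample frequencies are illustrative choices, not part of the original deck.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Critical band rate in Bark for a frequency given in Hz."""
    return (13.0 * math.atan(0.76 * f_hz / 1000.0)
            + 3.5 * math.atan((f_hz / 7500.0) ** 2))


if __name__ == "__main__":
    # Illustrative frequencies only; any value in the audible range works.
    for f in (100, 1000, 4000, 16000):
        print(f"{f:>6} Hz -> {hz_to_bark(f):5.2f} Bark")
```

Under this formula 1 kHz maps to roughly 8.5 Bark, and the curve flattens towards high frequencies, which is the non-linearity the slide refers to.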

Page 6

MP3 - 32-band subband filterbank -> MDCT -> scale factors -> quantization -> Huffman coding

AAC - pure MDCT -> scale factors -> quantization -> Huffman coding

Ogg Vorbis - MDCT -> piecewise linear approximation of the spectral envelope (floor) -> residual -> vector quantization -> Huffman coding

FLAC - Rice coding of the linear predictive coding residual, with run-length coding of constant blocks (lossless)

*Perceptual models are an important part of the lossy encoding process
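To illustrate why the FLAC path in the list above is lossless, here is a minimal sketch of predictive coding with a fixed second-order predictor, where each sample is predicted as 2*x[n-1] - x[n-2] and only the prediction error is kept. This is a simplified stand-in: a real encoder chooses its predictor per block and entropy-codes the residual, and the function names and sample values below are hypothetical.

```python
def residual_order2(samples):
    """Prediction residual for a fixed second-order linear predictor."""
    res = []
    for n, x in enumerate(samples):
        if n < 2:
            res.append(x)  # warm-up samples are stored verbatim
        else:
            prediction = 2 * samples[n - 1] - samples[n - 2]
            res.append(x - prediction)
    return res


def reconstruct_order2(res):
    """Invert residual_order2 exactly (integer arithmetic, no loss)."""
    out = []
    for n, e in enumerate(res):
        if n < 2:
            out.append(e)
        else:
            out.append(e + 2 * out[n - 1] - out[n - 2])
    return out


if __name__ == "__main__":
    pcm = [0, 3, 7, 12, 18, 25, 31, 36]  # hypothetical PCM samples
    res = residual_order2(pcm)
    print(res)  # [0, 3, 1, 1, 1, 1, -1, -1]
    assert reconstruct_order2(res) == pcm
```

The residual values are much smaller than the samples themselves, so they need fewer bits once entropy-coded, and the decoder recovers the input bit for bit. The lossy codecs in the list instead discard information during quantization.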

Page 7

Stimuli

Classic rock

Pop / rock

Hip-hop

Electronic

Downtempo

Jazz

Page 8

Formats

128 kbps MP3 CBR (SoundCloud streaming, baseline)

320 kbps MP3 CBR (Google Play Music maximum quality)

~255 kbps AAC VBR (iTunes Radio?)

~320 kbps Ogg Vorbis VBR (Spotify premium)

FLAC (Android, HDTracks, Pono, Qobuz)
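For a sense of scale, the nominal bitrates above can be compared with uncompressed PCM. The sketch below assumes a CD-quality source (44.1 kHz, 16-bit, stereo), which the slides do not state, and treats the approximate VBR figures as if they were exact averages.

```python
# Assumption: CD-quality source, 44.1 kHz x 16 bit x 2 channels.
CD_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps

# Nominal bitrates taken from the format list; VBR values are approximate.
NOMINAL_KBPS = {
    "MP3 CBR 128": 128,
    "MP3 CBR 320": 320,
    "AAC VBR ~255": 255,
    "Ogg Vorbis VBR ~320": 320,
}


def compression_ratio(kbps: float) -> float:
    """Ratio of the assumed PCM bitrate to the coded bitrate."""
    return CD_KBPS / kbps


if __name__ == "__main__":
    for name, kbps in NOMINAL_KBPS.items():
        print(f"{name:<22} {compression_ratio(kbps):5.2f} : 1")
```

FLAC is left out because its bitrate depends on the material rather than on a target setting.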

Page 9

Audio-Technica ATH-M50x headphones

Page 10

Survey questions per excerpt-version

What did you feel, perceive, or experience?

How strong or intense was it?

What was the ‘quality’ of your experience?

Page 11

MIRtoolbox v1.3

MIRemotion

Basic emotion prediction

Activity, valence, and tension prediction

Each rating based on 4-5 audio features

Page 12

Subjective results

49.25% of responses, chosen from 16 perceived or felt affect options, match the emotions predicted by the MIRemotion 3-dimensional analysis

Differences in ‘intensity’ are not statistically significant; either the scale is not sensitive enough or the null hypothesis is true

128 kbps MP3s have the highest proportional ‘quality’ score

This is unexpected, since 128 kbps MP3 is supposed to be the baseline codec

The largest proportion of stimuli falls within the ‘satisfying’ affect quality for every codec; again, either the scale is not sensitive enough or the null hypothesis is true

Page 13

MIRemotion analysis

128 kbps MP3s have the highest valence (pleasure) for all excerpts (features: SD RMS + max fluctuation) => what are the resulting dynamics?

320 kbps MP3s have the highest activity (energetic arousal) in 5/6 excerpts (features: RMS + spectral centroid + max fluctuation)

This is actually in good agreement with the subjective data

AAC (4 excerpts) and Ogg Vorbis (2 excerpts) have the highest tension (tense arousal) across all excerpts (contributing factors unknown)

FLAC’s lossless characteristic manifests itself in RMS, SD RMS, spectral centroid, and spectral spread
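Several of the features named on this slide are simple to compute. The following is a rough sketch of frame-wise RMS, its standard deviation (taken here to be what "SD RMS" means, which is an assumption), and the spectral centroid. It is not MIRtoolbox's implementation: framing, windowing, and normalization there may differ, and the function names and test tone are hypothetical.

```python
import cmath
import math
import statistics


def frame_rms(signal, frame_len):
    """RMS energy of consecutive non-overlapping frames."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    return [math.sqrt(sum(x * x for x in signal[s:s + frame_len]) / frame_len)
            for s in starts]


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency in Hz, via a naive DFT."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        mags.append(abs(acc))
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


if __name__ == "__main__":
    sr = 8000  # hypothetical sample rate for a short test tone
    tone = [math.sin(2 * math.pi * 1000 * i / sr) for i in range(256)]
    rms = frame_rms(tone, 64)
    print("mean RMS:", statistics.mean(rms))
    print("SD RMS:  ", statistics.pstdev(rms))
    print("centroid:", spectral_centroid(tone, sr), "Hz")
```

A steady sine gives a near-zero SD RMS and a centroid at its own frequency; differences in these values between encoded versions of an excerpt are the kind of variation the analysis above compares.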

Page 14

MP3s have highest maximum summarized fluctuation

Contributes to valence and activity

Measure of rhythmic periodicity (0-10Hz)

Highly correlated with low frequencies

MP3 artifact arising from quantization of low frequencies (linearly spaced subband filters and dense critical bands)?

Effect magnified by asymmetric headphone frequency response?

Page 15

Heuristic

Remove all of a participant's subjective data for an excerpt when their responses are identical across every version of that excerpt

Rationale: the composition drives the participant across an emotional threshold past which they are unable to reflect upon or distinguish the nuances of the experience

After filtering, the largest proportion of 320 kbps MP3 affect quality data shifts from ‘satisfying’ to ‘powerful’

MP3 is resilient and relevant, and still sounds great!
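The filtering heuristic on this slide can be sketched as a group-and-drop step. The record layout below (keys `participant`, `excerpt`, `codec`, `affect`, `intensity`, `quality`) is hypothetical, and treating "identical" as meaning all three answers match across versions is an assumption, since the slides do not spell it out.

```python
from collections import defaultdict


def drop_undifferentiated(responses):
    """Drop a participant's rows for an excerpt if every version got the same answers."""
    groups = defaultdict(list)
    for row in responses:
        groups[(row["participant"], row["excerpt"])].append(row)

    kept = []
    for rows in groups.values():
        # Assumption: compare all three survey answers together.
        answers = {(r["affect"], r["intensity"], r["quality"]) for r in rows}
        if len(answers) > 1:
            kept.extend(rows)
    return kept


if __name__ == "__main__":
    # Hypothetical survey rows, for illustration only.
    data = [
        {"participant": "p1", "excerpt": "jazz", "codec": "mp3_128",
         "affect": "calm", "intensity": 3, "quality": "satisfying"},
        {"participant": "p1", "excerpt": "jazz", "codec": "flac",
         "affect": "calm", "intensity": 3, "quality": "satisfying"},
        {"participant": "p2", "excerpt": "jazz", "codec": "mp3_128",
         "affect": "calm", "intensity": 3, "quality": "satisfying"},
        {"participant": "p2", "excerpt": "jazz", "codec": "flac",
         "affect": "calm", "intensity": 4, "quality": "powerful"},
    ]
    for row in drop_undifferentiated(data):
        print(row["participant"], row["codec"], row["quality"])
```

Here p1's rows are removed because both versions received the same answers, while p2's rows are kept.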

Page 16

Affect quality pre- and post- heuristic

Page 17

Future work

Investigate MP3 maximum summarized fluctuation phenomenon and dynamics

Investigate Ogg Vorbis and AAC tense arousal factors

Believed to involve loudness and dynamics

Find ways of gathering more sensitive subjective data

Evaluate statistical significance of MIRemotion data

Evaluate statistical significance of combined objective and subjective data

Compare VBR / ABR / CBR encoding modes

Account for headphone frequency response

How to develop codecs that behave more like FLAC?

Investigate 320 kbps MP3 spectral spread constraint