CGMB 324: Multimedia System design

CGMB 324: MULTIMEDIA SYSTEM DESIGN Chapter 05: Multimedia Element III – Audio


Transcript of CGMB 324: Multimedia System design

Page 1: CGMB 324: Multimedia System design

CGMB 324: MULTIMEDIA SYSTEM DESIGN

Chapter 05: Multimedia Element III – Audio

Page 2: CGMB 324: Multimedia System design

Objectives

Upon completing this chapter, you should be able to:

Understand how audio data is represented in computers

Understand the MIDI file format

Understand how to apply audio in multimedia systems

Page 3: CGMB 324: Multimedia System design

Digitization Of Sound

Page 4: CGMB 324: Multimedia System design

Digitization Of Sound

Facts about sound:

Sound is a continuous wave that travels through the air at about 1,235 km/h (343 m/s at room temperature). Without air there is no sound (as in space). The wave is made up of pressure differences, so sound is detected by measuring the pressure level at a given location.

Sound waves have the normal wave properties:

Reflection (bouncing)

Refraction (change of angle when entering a medium with different density)

Diffraction (bending around an obstacle)

These properties make the design of "surround sound" possible.

Page 5: CGMB 324: Multimedia System design

Digitization Of Sound

Human ears can hear frequencies from about 20 Hz (a deep rumble) up to about 20 kHz, at intensities from roughly –10 dB to 25 dB.

This changes with age, as the sensitivity of our hearing usually declines.

The intensity of sound can be measured in terms of Sound Pressure Level (SPL) in decibels (dB).

Page 6: CGMB 324: Multimedia System design

Digitization Of Sound

A turbojet engine might be as loud as 165 dB.

A car driving on the highway, about 100 dB.

And, a whisper, averaging around 35 dB.

We cannot hear frequencies above or below this range.

Page 7: CGMB 324: Multimedia System design

Digitization Of Sound

Sounds around us happen in a wide range of frequency and intensity.

For example, leaves rustling are very soft, or very low intensity (amplitude) sounds, but they have high frequencies.

A jet engine is a very high amplitude sound and it is also in the high frequency area.

A cargo truck is very loud (high intensity), but within the low frequency area.

Our speech sounds spread across many frequencies and vary in intensity.

Page 8: CGMB 324: Multimedia System design

Digitization Of Sound

Digitization in general: microphones and video cameras produce analog signals (continuous-valued voltages).

Page 9: CGMB 324: Multimedia System design

Digitization Of Sound

In audio digitization, two components form the composite sinusoidal signal of the actual sound – the fundamental sine wave and the harmonics.

Amplitude = intensity; frequency = pitch.

The sine wave pattern derives from the y = sin x graph.

Page 10: CGMB 324: Multimedia System design

[Figure: Analog-to-Digital Converter (ADC) — a voice enters as a sine wave and leaves the ADC as a bit stream, e.g. 10010010000100]

Page 11: CGMB 324: Multimedia System design

Digitization Of Sound

To get audio or video into a computer, we must digitize it (convert it into a string of numbers)

So, we have to understand discrete sampling.

Sampling divides the horizontal axis (the time dimension) into discrete pieces. Uniform sampling, in which samples are taken at equally spaced instants, is used almost everywhere.

Page 12: CGMB 324: Multimedia System design

Digitization Of Sound

Quantization (sampling along the amplitude axis) divides the vertical axis (signal strength) into pieces. Sometimes a non-linear function is applied.

8-bit quantization divides the vertical axis into 256 levels; 16-bit quantization gives you 65,536 levels.

Digital audio is thus a representation of sound in the form of bits and bytes.
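The level counts above follow directly from the bit depth, since n bits can distinguish 2^n values. A one-line sketch (the function name is illustrative, not from the slides):

```python
def quantization_levels(bits: int) -> int:
    """Number of discrete amplitude levels a given bit depth can represent."""
    return 2 ** bits

print(quantization_levels(8))   # 256
print(quantization_levels(16))  # 65536
```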

Page 13: CGMB 324: Multimedia System design

Quantization & Sampling

[Figure: a waveform with amplitude on the vertical axis, divided into quantization levels, and time t (seconds) on the horizontal axis, divided into sampling intervals]

Page 14: CGMB 324: Multimedia System design

Quantization & Sampling

Question:

Given a 16-bit, CD-quality musical piece sampled at 44.1 kHz for 10 minutes, what is the size of the file in mono and in stereo? (Use ×1024 conversion.)

Answer:

44.1 × 10³ samples/s × 16 bits × (10 × 60) s ÷ 8 bits/byte = 50.468 MBytes (mono)

For stereo (left/right) channels, the amount is doubled:

50.468 MBytes × 2 = 100.936 MBytes

Page 15: CGMB 324: Multimedia System design

44.1 × 10³ × 16 bits × (10 × 60) = 423,360,000 bits

(÷ 8) = 52,920,000 bytes (mono)

(÷ 1024) = 51,679.688 KB

(÷ 1024) = 50.468 MB

Stereo (mono × 2) = 50.468 × 2 = 100.936 MB

That’s why we need COMPRESSION!

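The worked example above generalizes to any sample rate, bit depth, duration and channel count. A small sketch (function name and signature are illustrative):

```python
def audio_file_size_mb(sample_rate_hz: float, bits_per_sample: int,
                       seconds: float, channels: int = 1) -> float:
    """Uncompressed PCM size in MB, using the x1024 convention (1 MB = 1024 * 1024 bytes)."""
    total_bits = sample_rate_hz * bits_per_sample * seconds * channels
    return total_bits / 8 / 1024 / 1024

# The CD-quality example from the slide: 44.1 kHz, 16-bit, 10 minutes
mono = audio_file_size_mb(44_100, 16, 10 * 60, channels=1)
stereo = audio_file_size_mb(44_100, 16, 10 * 60, channels=2)
print(f"{mono:.3f} MB")    # 50.468 MB
print(f"{stereo:.3f} MB")  # 100.937 MB
```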

Page 16: CGMB 324: Multimedia System design

Digitizing Audio

Questions for producing digital audio (analog-to-digital conversion):

1. How often do you need to sample the signal?

2. How good is the signal?

3. How is audio data formatted?

Page 17: CGMB 324: Multimedia System design

Nyquist Theorem

Suppose we are sampling a waveform. How often do we need to sample it to figure out its frequency?

The Nyquist Theorem, also known as the sampling theorem, is a principle that engineers follow in the digitization of analog signals.

For analog-to-digital conversion (ADC) to result in a faithful reproduction of the signal, slices of the analog waveform, called samples, must be taken frequently.

The number of samples per second is called the sampling rate or sampling frequency.

Page 18: CGMB 324: Multimedia System design

Nyquist Theorem

If we sample only once per cycle (blue area), we may think the signal is a constant.

Page 19: CGMB 324: Multimedia System design

Nyquist Theorem

If we sample at another low rate, e.g., 1.5 times per cycle, we may mistake the signal for a lower-frequency waveform.

Page 20: CGMB 324: Multimedia System design

Nyquist Theorem

Nyquist rate -- It can be proven that a bandwidth-limited signal can be fully reconstructed from its samples, if the sampling rate is at least twice the highest frequency in the signal.

The highest frequency component, in hertz, for a given analog signal is fmax.

According to the Nyquist Theorem, the sampling rate must be at least 2 × fmax, i.e., twice the highest analog frequency component.
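The aliasing in the previous slides can be shown numerically: sampling an 8 kHz tone at only 10 kHz (below its Nyquist rate of 16 kHz) yields exactly the same samples as a 2 kHz tone of inverted phase. A standard-library sketch (names are illustrative):

```python
import math

def sample_tone(freq_hz: float, rate_hz: float, n: int) -> list[float]:
    """Take n samples of a sine tone of the given frequency at the given rate."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

rate = 10_000                          # 10 kHz sampling -> Nyquist limit is 5 kHz
high = sample_tone(8_000, rate, 50)    # 8 kHz tone: above the Nyquist limit
alias = sample_tone(-2_000, rate, 50)  # a 2 kHz tone with inverted phase

# Sample for sample, the undersampled 8 kHz tone is indistinguishable from its alias:
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, alias)))  # True
```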

Page 21: CGMB 324: Multimedia System design

Typical Audio Formats

Popular audio file formats include .au (Unix workstations), .aiff (Mac) and .wav (PC), among others.

A simple and widely used audio compression method is Adaptive Differential Pulse Code Modulation (ADPCM). Based on past samples, it predicts the next sample and encodes the difference between the actual value and the predicted value.
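A much-simplified sketch of the "encode the difference" idea (real ADPCM also adapts its quantization step size and uses a predictor; this toy delta coder omits both):

```python
def delta_encode(samples: list[int]) -> list[int]:
    """Store each sample as the difference from its predecessor."""
    out, prev = [], 0
    for s in samples:
        out.append(s - prev)   # small numbers when the signal changes slowly
        prev = s
    return out

def delta_decode(deltas: list[int]) -> list[int]:
    """Rebuild the original samples by accumulating the differences."""
    out, prev = [], 0
    for d in deltas:
        prev += d
        out.append(prev)
    return out

samples = [100, 102, 105, 104, 100, 95]
deltas = delta_encode(samples)
print(deltas)                           # [100, 2, 3, -1, -4, -5]
print(delta_decode(deltas) == samples)  # True
```

The differences are typically much smaller than the raw samples, so they can be stored in fewer bits — which is where the compression comes from.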

Page 22: CGMB 324: Multimedia System design

Audio Quality vs. Data Rate

Quality      Sample Rate (kHz)   Bits per Sample   Mono/Stereo   Data Rate (uncompressed)   Frequency Band
Telephone    8                   8                 Mono          8 KBytes/sec               200-3,400 Hz
AM Radio     11.025              8                 Mono          11.0 KBytes/sec            -
FM Radio     22.050              16                Stereo        88.2 KBytes/sec            -
CD           44.1                16                Stereo        176.4 KBytes/sec           20-20,000 Hz
DAT          48                  16                Stereo        192.0 KBytes/sec           20-20,000 Hz
DVD Audio    192                 24                Stereo        1,152.0 KBytes/sec         20-20,000 Hz

Page 23: CGMB 324: Multimedia System design

Audio Quality vs. Data Rate

For 44.1 kHz, CD-quality recording, one sample is taken every 1 / (44.1 × 10³) s = 22.675 µs (microseconds) for a single channel.

For DVD-quality recording, one sample is taken every 1 / 192,000 s = 5.208 µs for a single channel.

You can expect it to have about 4 times the accuracy and fidelity of a CD!

Page 24: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

When using audio in a multimedia system, you have to consider things like:

The format of the original file (e.g., WAV)

The overall amplitude (is it loud enough?)

Trimming (how long do you want it to be?)

Fade-in and fade-out

Time stretching

Frequency and channels (e.g., 44.1 kHz, stereo)

Effects

File size

Page 25: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

The format we usually work with is WAV. It is uncompressed, which lets us preserve the highest quality.

If the waveform isn't loud enough, you can 'normalize' it, which raises the amplitude as high as it can go without clipping (pops).

If it still isn't loud enough, you might need dynamics processing, which can, for example, bring vocals up over the instruments.

This is useful for making heavy-metal songs (which often swing between whispered passages and loud guitar solos) more palatable to the audience of such a MM system.
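Peak normalization can be sketched in a few lines (plain Python lists stand in for real sample buffers; the names are illustrative):

```python
def normalize(samples: list[float], peak: float = 1.0) -> list[float]:
    """Scale the whole clip so its loudest sample hits `peak`, without clipping."""
    loudest = max(abs(s) for s in samples)
    if loudest == 0:
        return samples[:]          # pure silence: nothing to scale
    gain = peak / loudest
    return [s * gain for s in samples]

quiet = [0.1, -0.25, 0.2, -0.05]   # loudest sample is 0.25, so gain = 4
loud = normalize(quiet)
print(loud)                        # [0.4, -1.0, 0.8, -0.2]
```

Note the gain is uniform: normalization makes the whole clip louder but does not change the balance between quiet and loud passages — that is what dynamics processing is for.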

Page 26: CGMB 324: Multimedia System design

Normalization

[Figure: the same waveform before and after normalization]

Page 27: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

Sometimes we only want a portion of the music file, so we need to 'trim' it.

This is done by selecting the portion you want to keep and then executing the 'trim' command.

Usually, after trimming, we're left with a sample that 'just starts' and 'suddenly ends'.

To make it more pleasing to the ear, fade-ins and fade-outs are performed.

These are usually applied to the first and last 5 seconds of the waveform (depending on how long your sample is), to make it seem as if the clip starts and ends properly.
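A linear fade is just a gain ramp multiplied into the first (or last) samples of the clip. A sketch (a real editor would fade over seconds' worth of samples, not eight; names are illustrative):

```python
def fade(samples: list[float], fade_len: int, fade_in: bool = True) -> list[float]:
    """Apply a linear fade-in (or fade-out) over `fade_len` samples."""
    out = samples[:]
    n = min(fade_len, len(out))
    for i in range(n):
        gain = i / n                     # ramps 0.0 -> 1.0
        if fade_in:
            out[i] *= gain               # quiet start, full volume by sample n
        else:
            out[len(out) - 1 - i] *= gain  # full volume until the end, then quiet
    return out

clip = [1.0] * 8
print(fade(clip, 4))                 # [0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0]
print(fade(clip, 4, fade_in=False))  # [1.0, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
```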

Page 28: CGMB 324: Multimedia System design

Trimming

A portion of the waveform is selected.

Then, the 'trimming' function removes everything else, leaving just the part we require.

Page 29: CGMB 324: Multimedia System design

Fades

The front portion (usually first 5 seconds) of the trimmed clip is selected

Then, the ‘fade in’ function is executed

The resulting clip


Page 31: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

Some sound files end abruptly with a simple drum beat or scream. If the ending is too short, you can always 'time-stretch' that portion.

This powerful function usually distorts the waveform a little. However, for a sharp, loud shriek at the end of a song, for example, stretching it to last perhaps 0.25 seconds longer might actually make it sound better.

Time stretching is also useful when your audio stream doesn't quite match the video stream.

It can then be used to equalize their lengths for synchronization. The better way, of course, is to adjust the video frame rate or delete some scenes, but that is another story.

Page 32: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

You also need to decide what sample frequency to keep the audio in.

If it's a song, CD quality (known as Red Book, standardized as IEC 60908) is standard.

If it's merely narration, a lower frequency of about 11.025 kHz and a single channel (mono) is sufficient.

You can do the same for simple music, e.g., reducing the frequency to 22.05 kHz, which still sounds good.

Page 33: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

Another technique is reducing the bit depth (sometimes loosely called resampling).

This involves going from 16 bits per sample to 8, or from 24 to 16, and so forth.

Reducing the bit depth usually has a greater effect on overall sound quality (fewer bits per sample means poorer quality).

So it's often best to retain the bit depth but reduce the sample frequency.

All of this is done to save precious space and memory.
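Both space-saving operations can be sketched naively (a real resampler must low-pass filter before dropping samples to avoid aliasing, and a real bit-depth reduction would dither; these toy versions, with illustrative names, omit both):

```python
def requantize(samples: list[int], from_bits: int, to_bits: int) -> list[int]:
    """Drop low-order bits, e.g. 16-bit -> 8-bit: coarser levels, half the bytes."""
    shift = from_bits - to_bits
    return [s >> shift for s in samples]

def downsample(samples: list[int], factor: int) -> list[int]:
    """Keep every `factor`-th sample, e.g. 44.1 kHz -> 22.05 kHz with factor=2."""
    return samples[::factor]

pcm16 = [0, 8192, 16384, 24576, 32767, 24576, 16384, 8192]
print(requantize(pcm16, 16, 8))  # [0, 32, 64, 96, 127, 96, 64, 32]
print(downsample(pcm16, 2))      # [0, 16384, 32767, 16384]
```

Either way the file shrinks; the slide's point is that halving the sample rate usually sounds better than halving the bit depth.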

Page 34: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

There are times when you need to add certain effects to your audio.

Unlike the things we can do with images, audio effects are not always so obvious.

For example, we can mimic stereo from an inherently mono signal by duplicating the waveform (creating two channels from the single one) and then slightly delaying one of them (left or right) to create a pseudo-stereo effect.

Other effects, like reverb, let you make a studio recording sound like a live performance, or simply make your own voice recording sound a little better than it actually is.
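The pseudo-stereo trick described above can be sketched directly: duplicate the mono buffer and shift one copy by a few samples (at 44.1 kHz, a 441-sample delay is 10 ms; the tiny values here are purely illustrative):

```python
def pseudo_stereo(mono: list[float], delay: int) -> tuple[list[float], list[float]]:
    """Duplicate a mono signal into two channels, delaying one by `delay` samples."""
    left = mono[:]
    # Pad the delayed channel with silence and trim it to the same length.
    right = [0.0] * delay + mono[:len(mono) - delay]
    return left, right

mono = [0.1, 0.2, 0.3, 0.4, 0.5]
left, right = pseudo_stereo(mono, delay=2)
print(left)   # [0.1, 0.2, 0.3, 0.4, 0.5]
print(right)  # [0.0, 0.0, 0.1, 0.2, 0.3]
```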

Page 35: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

Finally, the file size is important. If you have saved as much memory as you can by using a lower frequency, a shorter clip and a lower bit depth, you can still save more by using a good compression scheme like MP3, which gives you a ratio of about 12:1.

Other audio compression codecs are often available as well.

Remember, though, that compression has its price: the user must be able to decode that particular codec, and playing the file usually requires more processing power.

It may also cause synchronization problems with video. All things considered, developers usually still compress their audio.

Page 36: CGMB 324: Multimedia System design

Introduction To MIDI

Page 37: CGMB 324: Multimedia System design

MIDI

What is MIDI? MIDI is an acronym for Musical Instrument Digital Interface.

Definition of MIDI: a protocol that enables computers, synthesizers, keyboards, and other musical devices to communicate with each other.

It is a set of instructions that tells a computer how to play musical instruments.

Page 38: CGMB 324: Multimedia System design

History of MIDI

MIDI is a standard method for electronic musical equipment to pass messages to each other.

These messages can be as simple as "play middle C until I tell you to stop" or more complex, like "adjust the VCA bias on oscillator 6 to match oscillator 1".

MIDI was developed in the early 1980s and proceeded to completely change the musical instrument market and the course of music.

Its growth exceeded the wildest dreams of its inventors, and today, MIDI is entrenched to the point that you cannot buy a professional electric instrument without MIDI capabilities.

Page 39: CGMB 324: Multimedia System design

Terminologies:

Synthesizer: a sound generator (of varying pitch, loudness, etc.). A good (musician's) synthesizer often has a microprocessor, keyboard, control panels, memory, etc.

Page 40: CGMB 324: Multimedia System design

Terminologies:

Sequencer: can be a stand-alone unit or a software program for a personal computer. (It used to be a storage server for MIDI data; nowadays it is more a software music editor on the computer.)

It has one or more MIDI INs and MIDI OUTs.

Page 41: CGMB 324: Multimedia System design

Terminologies:

Track: a track in the sequencer is used to organize the recordings. Tracks can be turned on or off during recording or playback.

To illustrate, one might record an oboe melody line on Track Two, then record a bowed bass line on Track Three. When played back, the sounds can be simultaneous.

Most MIDI software now accommodates 64 tracks of music, enough for a rich orchestral sound.

Important: tracks are purely for convenience; channels are required.

Page 42: CGMB 324: Multimedia System design

Terminologies:

Channel: MIDI channels are used to separate information in a MIDI system. There are 16 MIDI channels on one cable. Each channel addresses one MIDI instrument. Channel numbers are coded into each MIDI message.

Page 43: CGMB 324: Multimedia System design

Terminologies:

Timbre: the quality of the sound, e.g., a flute sound, a cello sound, etc.

Multi-timbral: capable of playing many different sounds at the same time (e.g., piano, brass, drums).

Pitch: the musical note that the instrument plays.

Page 44: CGMB 324: Multimedia System design

Terminologies:

Voice: the portion of the synthesizer that produces sound. Synthesizers can have many voices (16, 20, 24, 32, 64, etc.). Each voice works independently and simultaneously to produce sounds of different timbre and pitch.

Patch: the control settings that define a particular timbre.

Page 45: CGMB 324: Multimedia System design

General MIDI

General MIDI = MIDI + Instrument Patch Map + Percussion Key Map, so that a piece of MIDI music (usually) sounds the same anywhere it is played.

The instrument patch map is a standard program list consisting of 128 patch types.

The percussion map specifies 47 percussion sounds. Key-based percussion is always transmitted on MIDI channel 10.

Page 46: CGMB 324: Multimedia System design

General MIDI

Requirements for General MIDI compatibility:

Support all 16 channels.

Each channel can play a different instrument/program (multi-timbral).

Each channel can play many voices (polyphony).

A minimum of 24 fully dynamically allocated voices.

Page 47: CGMB 324: Multimedia System design

General MIDI

Playback of a MIDI file will only sound exactly as intended if the playback device is identical to the one used for production.

Even with the General MIDI standard, the sound of a MIDI instrument varies with the electronics of the playback device and the sound-generation method it uses.

MIDI is also unsuitable for spoken dialogue.

Using MIDI usually requires a certain amount of knowledge of music theory.

Page 48: CGMB 324: Multimedia System design

Application Of MIDI

Is MIDI suitable for MM systems? Sometimes it is, sometimes not.

A webpage is a valid MM system. If the choice of music is simple and repetitive, MIDI is ideal for playing in the background, because it is small and widely compatible.

Otherwise, most MM systems work with digital audio, such as WAV, MP3, etc.

Page 49: CGMB 324: Multimedia System design

MIDI IN-OUT

Page 50: CGMB 324: Multimedia System design

Example

A musician pushes down (and holds down) the middle C key on a keyboard.

This causes a MIDI Note-On message to be sent out of the keyboard's MIDI OUT jack.

That message is received by the second instrument, which sounds its own middle C in unison.
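On the wire, that Note-On message is just three bytes: a status byte (0x9n, where n is the channel number 0-15), the note number (60 = middle C), and the velocity (how hard the key was struck). A sketch of how such a message could be assembled (the helper name is illustrative):

```python
NOTE_ON = 0x90   # status nibble for Note-On; low nibble carries the channel

def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Build a 3-byte MIDI Note-On message."""
    assert 0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127
    return bytes([NOTE_ON | channel, note, velocity])

# Middle C, struck fairly hard, on the first MIDI channel (channel byte 0)
msg = note_on(channel=0, note=60, velocity=100)
print(msg.hex())  # 903c64
```

Releasing the key sends a corresponding Note-Off (status 0x8n), which is why the instrument keeps sounding "until I tell you to stop".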

Page 51: CGMB 324: Multimedia System design

MIDI Application – Cakewalk (sequencer)

Page 52: CGMB 324: Multimedia System design

Instrument Assignment (Multi-Timbral)