CGMB 324: Multimedia System design

CGMB 324: MULTIMEDIA SYSTEM DESIGN Chapter 05: Multimedia Element III – Audio


Transcript of CGMB 324: Multimedia System design

Page 1: CGMB 324: Multimedia System design

CGMB 324: MULTIMEDIA SYSTEM DESIGN

Chapter 05: Multimedia Element III – Audio

Page 2: CGMB 324: Multimedia System design

Objectives

Upon completing this chapter, you should be able to:

Understand how audio data is represented in computers

Understand the MIDI file format

Understand how to apply audio in multimedia systems

Page 3: CGMB 324: Multimedia System design

Digitization Of Sound

Page 4: CGMB 324: Multimedia System design

Digitization Of Sound

Facts about sound:

Sound is a continuous wave that travels through the air at about 1,235 km/h (343 m/s at room temperature). Without air there is no sound (as in space). The wave is made up of pressure differences, so sound is detected by measuring the pressure level at a given location.

Sound waves have the normal wave properties:

Reflection (bouncing)

Refraction (change of angle when entering a medium with different density)

Diffraction (bending around an obstacle)

These properties make the design of "surround sound" possible.

Page 5: CGMB 324: Multimedia System design

Digitization Of Sound

Human ears can hear frequencies from about 20 Hz (a deep rumble) up to about 20 kHz, at intensities from roughly –10 dB to 25 dB.

This changes with age, as the sensitivity of our hearing usually declines.

The intensity of sound can be measured in terms of Sound Pressure Level (SPL) in decibels (dB).

Page 6: CGMB 324: Multimedia System design

Digitization Of Sound

A turbojet engine might be as loud as 165 dB.

A car driving on the highway, about 100 dB.

And, a whisper, averaging around 35 dB.

We cannot hear frequencies above or below this range.

Page 7: CGMB 324: Multimedia System design

Digitization Of Sound

Sounds around us happen in a wide range of frequency and intensity.

For example, leaves rustling are very soft, or very low intensity (amplitude) sounds, but they have high frequencies.

A jet engine is a very high amplitude sound and it is also in the high frequency area.

A cargo truck is very loud (high intensity), but within the low frequency area.

Our speech sounds spread across many frequencies and vary in intensity.

Page 8: CGMB 324: Multimedia System design

Digitization Of Sound

Digitization in general: microphones and video cameras produce analog signals (continuous-valued voltages).

Page 9: CGMB 324: Multimedia System design

Digitization Of Sound

In audio digitization, two components form the composite sinusoidal signal of the actual sound – the fundamental sine wave and the harmonics.

Amplitude = intensity; frequency = pitch.

The sine wave pattern derives from the y = sin x graph.

Page 10: CGMB 324: Multimedia System design

[Figure: Analog-to-Digital Converter (ADC) — a voice enters as a sine wave and leaves the ADC as a bit stream, e.g. 10010010000100]

Page 11: CGMB 324: Multimedia System design

Digitization Of Sound

To get audio or video into a computer, we must digitize it (convert it into a string of numbers)

So, we have to understand discrete sampling.

Sampling divides the horizontal axis (the time dimension) into discrete pieces. Uniform sampling, in which samples are taken at equally spaced instants, is used almost everywhere.

Page 12: CGMB 324: Multimedia System design

Digitization Of Sound

Quantization (sampling along the amplitude axis) divides the vertical axis (signal strength) into pieces. Sometimes a non-linear function is applied.

8-bit quantization divides the vertical axis into 256 levels; 16-bit quantization gives you 65,536 levels.

Digital audio is thus a representation of sound in the form of bits and bytes.
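The level counts above follow directly from the bit depth, since n bits can distinguish 2^n values. A one-line sketch (the function name is illustrative, not from the slides):

```python
def quantization_levels(bits: int) -> int:
    """Number of discrete amplitude levels a given bit depth can represent."""
    return 2 ** bits

print(quantization_levels(8))   # 256
print(quantization_levels(16))  # 65536
```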

Page 13: CGMB 324: Multimedia System design

Quantization & Sampling

[Figure: a waveform with amplitude on the vertical axis, divided into quantization levels, and time t (seconds) on the horizontal axis, divided into sampling intervals]

Page 14: CGMB 324: Multimedia System design

Quantization & Sampling

Question:

Given a 16-bit, CD-quality musical piece sampled at 44.1 kHz for 10 minutes, what is the size of the file in mono and in stereo? (Use ×1024 conversion.)

Answer:

44.1 × 10³ samples/s × 16 bits × (10 × 60) s ÷ 8 bits/byte = 50.468 MBytes (mono)

For stereo (left/right) channels, the amount is doubled:

50.468 MBytes × 2 = 100.936 MBytes

Page 15: CGMB 324: Multimedia System design

44.1 × 10³ × 16 bits × (10 × 60) = 423,360,000 bits

(÷ 8) = 52,920,000 bytes (mono)

(÷ 1024) = 51,679.688 KB

(÷ 1024) = 50.468 MB

Stereo (mono × 2) = 50.468 × 2 = 100.936 MB

That’s why we need COMPRESSION!

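The worked example above generalizes to any sample rate, bit depth, duration and channel count. A small sketch (function name and signature are illustrative):

```python
def audio_file_size_mb(sample_rate_hz: float, bits_per_sample: int,
                       seconds: float, channels: int = 1) -> float:
    """Uncompressed PCM size in MB, using the x1024 convention (1 MB = 1024 * 1024 bytes)."""
    total_bits = sample_rate_hz * bits_per_sample * seconds * channels
    return total_bits / 8 / 1024 / 1024

# The CD-quality example from the slide: 44.1 kHz, 16-bit, 10 minutes
mono = audio_file_size_mb(44_100, 16, 10 * 60, channels=1)
stereo = audio_file_size_mb(44_100, 16, 10 * 60, channels=2)
print(f"{mono:.3f} MB")    # 50.468 MB
print(f"{stereo:.3f} MB")  # 100.937 MB
```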

Page 16: CGMB 324: Multimedia System design

Digitizing Audio

Questions for producing digital audio (analog-to-digital conversion):

1. How often do you need to sample the signal?

2. How good is the signal?

3. How is audio data formatted?

Page 17: CGMB 324: Multimedia System design

Nyquist Theorem

Suppose we are sampling a waveform. How often do we need to sample it to figure out its frequency?

The Nyquist Theorem, also known as the sampling theorem, is a principle that engineers follow in the digitization of analog signals.

For analog-to-digital conversion (ADC) to result in a faithful reproduction of the signal, slices of the analog waveform, called samples, must be taken frequently.

The number of samples per second is called the sampling rate or sampling frequency.

Page 18: CGMB 324: Multimedia System design

Nyquist Theorem

If we sample only once per cycle (blue area), we may think the signal is a constant.

Page 19: CGMB 324: Multimedia System design

Nyquist Theorem

If we sample at another low rate, e.g., 1.5 times per cycle, we may mistake the signal for a lower-frequency waveform.

Page 20: CGMB 324: Multimedia System design

Nyquist Theorem

Nyquist rate -- It can be proven that a bandwidth-limited signal can be fully reconstructed from its samples, if the sampling rate is at least twice the highest frequency in the signal.

The highest frequency component, in hertz, for a given analog signal is fmax.

According to the Nyquist Theorem, the sampling rate must be at least 2 × fmax, i.e., twice the highest analog frequency component.
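The aliasing in the previous slides can be shown numerically: sampling an 8 kHz tone at only 10 kHz (below its Nyquist rate of 16 kHz) yields exactly the same samples as a 2 kHz tone of inverted phase. A standard-library sketch (names are illustrative):

```python
import math

def sample_tone(freq_hz: float, rate_hz: float, n: int) -> list[float]:
    """Take n samples of a sine tone of the given frequency at the given rate."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

rate = 10_000                          # 10 kHz sampling -> Nyquist limit is 5 kHz
high = sample_tone(8_000, rate, 50)    # 8 kHz tone: above the Nyquist limit
alias = sample_tone(-2_000, rate, 50)  # a 2 kHz tone with inverted phase

# Sample for sample, the undersampled 8 kHz tone is indistinguishable from its alias:
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(high, alias)))  # True
```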

Page 21: CGMB 324: Multimedia System design

Typical Audio Formats

Popular audio file formats include .au (Unix workstations), .aiff (Mac) and .wav (PC), among others.

A simple and widely used audio compression method is Adaptive Differential Pulse Code Modulation (ADPCM). Based on past samples, it predicts the next sample and encodes the difference between the actual value and the predicted value.
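A much-simplified sketch of the "encode the difference" idea (real ADPCM also adapts its quantization step size and uses a predictor; this toy delta coder omits both):

```python
def delta_encode(samples: list[int]) -> list[int]:
    """Store each sample as the difference from its predecessor."""
    out, prev = [], 0
    for s in samples:
        out.append(s - prev)   # small numbers when the signal changes slowly
        prev = s
    return out

def delta_decode(deltas: list[int]) -> list[int]:
    """Rebuild the original samples by accumulating the differences."""
    out, prev = [], 0
    for d in deltas:
        prev += d
        out.append(prev)
    return out

samples = [100, 102, 105, 104, 100, 95]
deltas = delta_encode(samples)
print(deltas)                           # [100, 2, 3, -1, -4, -5]
print(delta_decode(deltas) == samples)  # True
```

The differences are typically much smaller than the raw samples, so they can be stored in fewer bits — which is where the compression comes from.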

Page 22: CGMB 324: Multimedia System design

Audio Quality vs. Data Rate

Quality      Sample Rate (kHz)   Bits per Sample   Mono/Stereo   Data Rate (uncompressed)   Frequency Band
Telephone    8                   8                 Mono          8 KBytes/sec               200-3,400 Hz
AM Radio     11.025              8                 Mono          11.0 KBytes/sec            -
FM Radio     22.050              16                Stereo        88.2 KBytes/sec            -
CD           44.1                16                Stereo        176.4 KBytes/sec           20-20,000 Hz
DAT          48                  16                Stereo        192.0 KBytes/sec           20-20,000 Hz
DVD Audio    192                 24                Stereo        1,152.0 KBytes/sec         20-20,000 Hz

Page 23: CGMB 324: Multimedia System design

Audio Quality vs. Data Rate

For 44.1 kHz, CD-quality recording, one sample is taken every 1 / (44.1 × 10³) s = 22.675 µs (microseconds) for a single channel.

For DVD-quality recording, one sample is taken every 1 / 192,000 s = 5.208 µs for a single channel.

You can expect it to have about 4 times the accuracy and fidelity of a CD!

Page 24: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

When using audio in a multimedia system, you have to consider things like:

The format of the original file (e.g., WAV)

The overall amplitude (is it loud enough?)

Trimming (how long do you want it to be?)

Fade-in and fade-out

Time stretching

Frequency and channels (e.g., 44.1 kHz, stereo)

Effects

File size

Page 25: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

The format we usually work with is WAV. It is uncompressed, which lets us preserve the highest quality.

If the waveform isn't loud enough, you can 'normalize' it, which raises the amplitude as high as it can go without clipping (pops).

If it still isn't loud enough, you might need dynamics processing, which can, for example, bring vocals up over the instruments.

This is useful for making heavy-metal songs (which often swing between whispered passages and loud guitar solos) more palatable to the audience of such a MM system.
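Peak normalization can be sketched in a few lines (plain Python lists stand in for real sample buffers; the names are illustrative):

```python
def normalize(samples: list[float], peak: float = 1.0) -> list[float]:
    """Scale the whole clip so its loudest sample hits `peak`, without clipping."""
    loudest = max(abs(s) for s in samples)
    if loudest == 0:
        return samples[:]          # pure silence: nothing to scale
    gain = peak / loudest
    return [s * gain for s in samples]

quiet = [0.1, -0.25, 0.2, -0.05]   # loudest sample is 0.25, so gain = 4
loud = normalize(quiet)
print(loud)                        # [0.4, -1.0, 0.8, -0.2]
```

Note the gain is uniform: normalization makes the whole clip louder but does not change the balance between quiet and loud passages — that is what dynamics processing is for.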

Page 26: CGMB 324: Multimedia System design

Normalization

[Figure: the same waveform before and after normalization]

Page 27: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

Sometimes we only want a portion of the music file, so we need to 'trim' it.

This is done by selecting the portion you want to keep and then executing the 'trim' command.

Usually, after trimming, we're left with a sample that 'just starts' and 'suddenly ends'.

To make it more pleasing to the ear, fade-ins and fade-outs are performed.

These are usually applied to the first and last 5 seconds of the waveform (depending on how long your sample is), to make it seem as if the clip starts and ends properly.
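A linear fade is just a gain ramp multiplied into the first (or last) samples of the clip. A sketch (a real editor would fade over seconds' worth of samples, not eight; names are illustrative):

```python
def fade(samples: list[float], fade_len: int, fade_in: bool = True) -> list[float]:
    """Apply a linear fade-in (or fade-out) over `fade_len` samples."""
    out = samples[:]
    n = min(fade_len, len(out))
    for i in range(n):
        gain = i / n                     # ramps 0.0 -> 1.0
        if fade_in:
            out[i] *= gain               # quiet start, full volume by sample n
        else:
            out[len(out) - 1 - i] *= gain  # full volume until the end, then quiet
    return out

clip = [1.0] * 8
print(fade(clip, 4))                 # [0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0, 1.0]
print(fade(clip, 4, fade_in=False))  # [1.0, 1.0, 1.0, 1.0, 0.75, 0.5, 0.25, 0.0]
```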

Page 28: CGMB 324: Multimedia System design

Trimming

A portion of the waveform is selected.

Then, the 'trimming' function removes everything else, leaving just the part we require.

Page 29: CGMB 324: Multimedia System design

Fades

The front portion (usually first 5 seconds) of the trimmed clip is selected

Then, the ‘fade in’ function is executed

The resulting clip


Page 31: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

Some sound files end abruptly with a simple drum beat or scream. If the ending is too short, you can always 'time-stretch' that portion.

This powerful function usually distorts the waveform a little. However, for a sharp, loud shriek at the end of a song, for example, stretching it to last perhaps 0.25 seconds longer might actually make it sound better.

Time stretching is also useful when your audio stream doesn't quite match the video stream.

It can then be used to equalize their lengths for synchronization. The better way, of course, is to adjust the video frame rate or delete some scenes, but that is another story.

Page 32: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

You also need to decide what sample frequency to keep the audio in.

If it's a song, CD quality (known as Red Book, standardized as IEC 60908) is standard.

If it's merely narration, a lower frequency of about 11.025 kHz and a single channel (mono) is sufficient.

You can do the same for simple music, e.g., reducing the frequency to 22.05 kHz, which still sounds good.

Page 33: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

Another technique is reducing the bit depth (sometimes loosely called resampling).

This involves going from 16 bits per sample to 8, or from 24 to 16, and so forth.

Reducing the bit depth usually has a greater effect on overall sound quality (fewer bits per sample means poorer quality).

So it's often best to retain the bit depth but reduce the sample frequency.

All of this is done to save precious space and memory.
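Both space-saving operations can be sketched naively (a real resampler must low-pass filter before dropping samples to avoid aliasing, and a real bit-depth reduction would dither; these toy versions, with illustrative names, omit both):

```python
def requantize(samples: list[int], from_bits: int, to_bits: int) -> list[int]:
    """Drop low-order bits, e.g. 16-bit -> 8-bit: coarser levels, half the bytes."""
    shift = from_bits - to_bits
    return [s >> shift for s in samples]

def downsample(samples: list[int], factor: int) -> list[int]:
    """Keep every `factor`-th sample, e.g. 44.1 kHz -> 22.05 kHz with factor=2."""
    return samples[::factor]

pcm16 = [0, 8192, 16384, 24576, 32767, 24576, 16384, 8192]
print(requantize(pcm16, 16, 8))  # [0, 32, 64, 96, 127, 96, 64, 32]
print(downsample(pcm16, 2))      # [0, 16384, 32767, 16384]
```

Either way the file shrinks; the slide's point is that halving the sample rate usually sounds better than halving the bit depth.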

Page 34: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

There are times when you need to add certain effects to your audio.

Unlike the things we can do with images, audio effects are not always so obvious.

For example, we can mimic stereo from an inherently mono signal by duplicating the waveform (creating two channels from the single one) and then slightly delaying one of them (left or right) to create a pseudo-stereo effect.

Other effects, like reverb, let you make a studio recording sound like a live performance, or simply make your own voice recording sound a little better than it actually is.
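The pseudo-stereo trick described above can be sketched directly: duplicate the mono buffer and shift one copy by a few samples (at 44.1 kHz, a 441-sample delay is 10 ms; the tiny values here are purely illustrative):

```python
def pseudo_stereo(mono: list[float], delay: int) -> tuple[list[float], list[float]]:
    """Duplicate a mono signal into two channels, delaying one by `delay` samples."""
    left = mono[:]
    # Pad the delayed channel with silence and trim it to the same length.
    right = [0.0] * delay + mono[:len(mono) - delay]
    return left, right

mono = [0.1, 0.2, 0.3, 0.4, 0.5]
left, right = pseudo_stereo(mono, delay=2)
print(left)   # [0.1, 0.2, 0.3, 0.4, 0.5]
print(right)  # [0.0, 0.0, 0.1, 0.2, 0.3]
```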

Page 35: CGMB 324: Multimedia System design

Applying Digital Audio In MM Systems

Finally, the file size is important. If you have saved as much memory as you can by using a lower frequency, a shorter clip and a lower bit depth, you can still save more by using a good compression scheme like MP3, which gives you a ratio of about 12:1.

Other audio compression codecs are often available as well.

Remember, though, that compression has its price: the user must be able to decode that particular codec, and playing the file usually requires more processing power.

It may also cause synchronization problems with video. All things considered, developers usually still compress their audio.

Page 36: CGMB 324: Multimedia System design

Introduction To MIDI

Page 37: CGMB 324: Multimedia System design

MIDI

What is MIDI? MIDI is an acronym for Musical Instrument Digital Interface.

Definition of MIDI: a protocol that enables computers, synthesizers, keyboards, and other musical devices to communicate with each other.

It is a set of instructions that tells a computer how to play musical instruments.

Page 38: CGMB 324: Multimedia System design

History of MIDI

MIDI is a standard method for electronic musical equipment to pass messages to each other.

These messages can be as simple as "play middle C until I tell you to stop" or more complex, like "adjust the VCA bias on oscillator 6 to match oscillator 1".

MIDI was developed in the early 1980s and proceeded to completely change the musical instrument market and the course of music.

Its growth exceeded the wildest dreams of its inventors, and today, MIDI is entrenched to the point that you cannot buy a professional electric instrument without MIDI capabilities.

Page 39: CGMB 324: Multimedia System design

Terminologies:

Synthesizer: a sound generator (of varying pitch, loudness, etc.). A good (musician's) synthesizer often has a microprocessor, keyboard, control panels, memory, etc.

Page 40: CGMB 324: Multimedia System design

Terminologies:

Sequencer: can be a stand-alone unit or a software program for a personal computer. (It used to be a storage server for MIDI data; nowadays it is more a software music editor on the computer.)

It has one or more MIDI INs and MIDI OUTs.

Page 41: CGMB 324: Multimedia System design

Terminologies:

Track: a track in the sequencer is used to organize the recordings. Tracks can be turned on or off during recording or playback.

To illustrate, one might record an oboe melody line on Track Two, then record a bowed bass line on Track Three. When played back, the sounds can be simultaneous.

Most MIDI software now accommodates 64 tracks of music, enough for a rich orchestral sound.

Important: tracks are purely for convenience; channels are required.

Page 42: CGMB 324: Multimedia System design

Terminologies:

Channel: MIDI channels are used to separate information in a MIDI system. There are 16 MIDI channels on one cable. Each channel addresses one MIDI instrument. Channel numbers are coded into each MIDI message.

Page 43: CGMB 324: Multimedia System design

Terminologies:

Timbre: the quality of the sound, e.g., a flute sound, a cello sound, etc.

Multi-timbral: capable of playing many different sounds at the same time (e.g., piano, brass, drums).

Pitch: the musical note that the instrument plays.

Page 44: CGMB 324: Multimedia System design

Terminologies:

Voice: the portion of the synthesizer that produces sound. Synthesizers can have many voices (16, 20, 24, 32, 64, etc.). Each voice works independently and simultaneously to produce sounds of different timbre and pitch.

Patch: the control settings that define a particular timbre.

Page 45: CGMB 324: Multimedia System design

General MIDI

General MIDI = MIDI + Instrument Patch Map + Percussion Key Map, so that a piece of MIDI music (usually) sounds the same anywhere it is played.

The instrument patch map is a standard program list consisting of 128 patch types.

The percussion map specifies 47 percussion sounds. Key-based percussion is always transmitted on MIDI channel 10.

Page 46: CGMB 324: Multimedia System design

General MIDI

Requirements for General MIDI compatibility:

Support all 16 channels.

Each channel can play a different instrument/program (multi-timbral).

Each channel can play many voices (polyphony).

A minimum of 24 fully dynamically allocated voices.

Page 47: CGMB 324: Multimedia System design

General MIDI

Playback of a MIDI file will only sound exactly as intended if the playback device is identical to the one used for production.

Even with the General MIDI standard, the sound of a MIDI instrument varies with the electronics of the playback device and the sound-generation method it uses.

MIDI is also unsuitable for spoken dialogue.

Using MIDI usually requires a certain amount of knowledge of music theory.

Page 48: CGMB 324: Multimedia System design

Application Of MIDI

Is MIDI suitable for MM systems? Sometimes it is, sometimes not.

A webpage is a valid MM system. If the choice of music is simple and repetitive, MIDI is ideal for playing in the background, because it is small and widely compatible.

Otherwise, most MM systems work with digital audio, such as WAV, MP3, etc.

Page 49: CGMB 324: Multimedia System design

MIDI IN-OUT

Page 50: CGMB 324: Multimedia System design

Example

A musician pushes down (and holds down) the middle C key on a keyboard.

This causes a MIDI Note-On message to be sent out of the keyboard's MIDI OUT jack.

That message is received by the second instrument, which sounds its own middle C in unison.
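On the wire, that Note-On message is just three bytes: a status byte (0x9n, where n is the channel number 0-15), the note number (60 = middle C), and the velocity (how hard the key was struck). A sketch of how such a message could be assembled (the helper name is illustrative):

```python
NOTE_ON = 0x90   # status nibble for Note-On; low nibble carries the channel

def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Build a 3-byte MIDI Note-On message."""
    assert 0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127
    return bytes([NOTE_ON | channel, note, velocity])

# Middle C, struck fairly hard, on the first MIDI channel (channel byte 0)
msg = note_on(channel=0, note=60, velocity=100)
print(msg.hex())  # 903c64
```

Releasing the key sends a corresponding Note-Off (status 0x8n), which is why the instrument keeps sounding "until I tell you to stop".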

Page 51: CGMB 324: Multimedia System design

MIDI Application – Cakewalk (sequencer)

Page 52: CGMB 324: Multimedia System design

Instrument Assignment (Multi-Timbral)