ICT 514 Multimedia Systems - Murdoch...



Page 1: ICT 514 Multimedia Systems - Murdoch Universityftp.it.murdoch.edu.au/Units/ICT514/Lectures/ICT514_06_04_Slides.pdf– Timbre (quality, determined by the waveform of the sound). •

school of information technology
master of science in information technology

ICT 514 Multimedia Systems
Topic 4: Audio

Ref: Chapman N. & Chapman J., “Digital Multimedia” 2nd Edition Chapter 9

Lance Fung


ICT 514_06_Lect 4 2 school of information technology

Overview
• Objectives of this lecture
• Sound
• Hearing
• What is sound?
• Digital sound
• Sampling
• Resolution
• Storing
• MIDI
• Streaming
• Editing
• Playback


Objectives

• At the end of this lecture, you will be able to:
– describe what sound is and how we hear it
– outline what is meant by sampling sound
– compare WAV and MP3 in terms of file size and the use of compression
– understand what MIDI is and how MIDI files are created
– discuss the advantages/disadvantages of streaming sound technology


Sound
• The theory of sound is complex, and digital sound is even more complex.
• Sound is one of the most engaging and influential elements of multimedia, be it music, special effects or spoken words.

• Sound, particularly music, can stimulate emotions and associations.

– How often do you hear a particular song and immediately remember an event, person or experience associated with it?

• Just as colours evoke different emotions (see the lecture on design principles), sound can make you feel dreamy, romantic, annoyed, frightened, shocked, etc.


Sound
• Physical disturbance: Sound is an organised movement of molecules caused by a vibrating body in some medium.

• Sensation (psychological): Sound is the auditory sensation produced through the ear by changes in air pressure.


Sound

• How is sound heard?
• Sound originates when a body moves back and forth rapidly enough to send a wave through the medium in which it is vibrating.

• Sound is also produced when two or more objects collide, releasing a wave of energy which in turn forces changes in the surrounding air pressure.
– For example, clapping your hands.
– Sound waves move in all directions from the disturbance, like ripples produced when a stone is dropped into a pond.


Sound
• The vibrations travel as waves through some medium, such as air, and reach our ears.
• This analogue wave causes our eardrum to vibrate, and the vibrations are eventually converted into nerve signals that travel to the brain.
• We talk of digital sound, but what we actually hear is analogue (sound travelling through the air).
• The ear is made of 3 parts:
– Outer Ear
– Middle Ear
– Inner Ear


[Figure: diagram of the ear, labelled Outer ear, Middle ear and Inner ear]


Hearing – Outer Ear

The Outer Ear

• The outer ear consists of:

– auricle (or pinna)

– ear canal


Hearing – Outer Ear
Ear Canal
• Each ear leads into an ear canal: an irregular cylinder about 2.5 cm long, with an average diameter of less than 0.8 cm.

• The ear canal is open at the outer end which is surrounded by the pinna (or auricle). The pinna plays an important spatial focusing role in hearing.

• The canal narrows slightly and widens towards its inner end, which is sealed off by the eardrum (tympanic membrane).

• The canal therefore behaves like an organ pipe: a shaped tube enclosing a resonating column of air, open at one end and closed at the other.


Hearing – Outer Ear
• The ear canal supports (resonates with, or enhances) sound vibrations best at the frequencies to which the human ear is most sensitive.

• This resonance amplifies the variations of air pressure that make up sound waves, placing a peak pressure directly at the eardrum.

• For frequencies between approximately 2 kHz and 5.5 kHz, the sound pressure at the eardrum is approximately 10 times the sound pressure at the auricle.
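The 2 kHz to 5.5 kHz band is consistent with treating the canal as a pipe closed at one end (by the eardrum), whose lowest resonance occurs when the canal is a quarter of a wavelength long, i.e. f = c / (4L). The sketch below assumes a speed of sound of 343 m/s (air at roughly 20 °C) and the 2.5 cm canal length quoted above; the function name is hypothetical, and a real ear canal is not a perfect cylinder.

```python
def closed_pipe_resonance(length_m: float, speed_of_sound: float = 343.0) -> float:
    """Lowest resonant frequency (Hz) of a tube open at one end, closed at the other.

    Such a tube resonates when its length is a quarter of a wavelength,
    so f = c / (4 * L). 343 m/s is an assumed speed of sound in air.
    """
    return speed_of_sound / (4 * length_m)

# Ear canal about 2.5 cm long, as on the slide.
f = closed_pipe_resonance(0.025)
print(f"{f:.0f} Hz")  # roughly 3.4 kHz, inside the 2-5.5 kHz band
```

Because the resonance is broad rather than a single sharp peak, neighbouring frequencies are also boosted, which is why the slide gives a band rather than one frequency.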


Hearing – Outer Ear

Eardrum
• The eardrum is the interface between the outer and middle ear.
• Airborne sound waves reach only as far as the eardrum. Here they are converted into mechanical vibrations in the solid materials of the middle ear.

• Sounds (air pressure waves) first set up sympathetic vibrations in the taut membrane of the eardrum, just as they do in the diaphragm of some types of microphone.

• The eardrum passes these vibrations on to the middle ear structure.


Hearing – Middle Ear

The Middle Ear


Hearing – Middle Ear
The Middle Ear
• The middle ear contains three small bones known as:
– Malleus (hammer)
– Incus (anvil)
– Stapes (stirrup)

• These bones form a system of levers which are linked together and driven by the eardrum.

• The malleus pushes the incus, the incus pushes the stapes.


Hearing – Middle Ear
• Working together as a lever system, these bones amplify the force of sound vibrations.
• In combination, these bones (which are the smallest bones in the human body) double or triple the force of the vibrations at the eardrum.

• The muscles of the middle ear (the tiniest muscles in the human body) modify the performance of these bones as an amplifying unit.

• These muscles also act as safety devices to protect the ear against excessively large vibrations from very loud sounds - a sort of automatic volume control.

• This is also the part that sometimes gets stuffed up when we have a cold.


Hearing – Middle Ear
From Outer to Inner Ear
• Within the 4 cm or so occupied by the outer and middle ears, three distinct physical principles operate to magnify weak vibrations in air so that they can establish pressure waves in a liquid:
– The organ pipe resonance of the ear canal may increase the air pressure force 10 times.
– The mechanical advantage of the bone system may nearly triple it.
– Funnelling the vibrations from the large eardrum onto the much smaller oval window may provide another thirtyfold increase.
• The result of these three mechanisms may be an amplification of a sound wave by more than 800 times before it sets the liquid of the inner ear in motion.
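The "more than 800 times" figure comes from multiplying the three stage gains, not adding them. A minimal sketch of that arithmetic, assuming "nearly triple" means a factor of 2.7 (an illustrative value, not given on the slide):

```python
# Stage gains from the slide; 2.7 for "nearly triple" is an assumption.
gains = {
    "ear canal resonance": 10.0,
    "middle-ear bone levers": 2.7,
    "eardrum-to-oval-window funnelling": 30.0,
}

total = 1.0
for stage, gain in gains.items():
    total *= gain  # each stage amplifies the output of the previous one
    print(f"after {stage}: x{total:.0f}")
```

With a full factor of 3 for the bones, the product would be 900, so the slide's "more than 800" sits between the two readings of "nearly triple".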


Hearing – Inner Ear

The Inner Ear

• Semi-circular canals (for balance)

• Cochlea (for sensing sound)


Hearing

• The semicircular canals are the body's balance mechanism and are thought to play no part in hearing.

• The cochlea includes:
– vestibular canal
– tympanic canal
– cochlear duct


Hearing – Inner Ear
• The fibres (hair cells) in the cochlea detect sound. The fibres are long at one end of the cochlea and gradually become shorter at the other.
– The length of the fibres seems to determine the frequency of sound to which each fibre is sensitive.

• High-frequency tones are detected at the base of the cochlea, and lower frequencies towards the apex.


Hearing – Inner Ear

• Signals carried by the cochlear branch of the auditory nerve are interpreted in the brain in terms of:
– Pitch (determined by the frequency of the sound)
– Loudness (determined by the amplitude of the sound)
– Timbre (quality, determined by the waveform of the sound).

• It is in the brain that we actually hear.
• Human beings are sensitive to sound frequencies from about 16 Hz to 20 kHz.
– For comparison, dogs can hear frequencies above 25 kHz, and bats up to about 100 kHz.


Sound properties
• Pitch
– Analogous to colour in light (wavelength or frequency).
– Sound pitch depends on the frequency.
• Intensity
– Intensity of sound is the energy per second (power) carried through a unit area. Relative intensities are measured in decibels.
• Loudness
– Loudness is a sensation and is therefore difficult to measure. Usually, the greater the intensity, the greater the loudness.
• Quality or Timbre
– If the same note is played on a violin and a piano, you can tell the difference. Each instrument produces different overtones (additional frequencies above the fundamental note).
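Decibels express a ratio on a logarithmic scale. The sketch below uses the standard definitions, 10 log10 of an intensity (power) ratio and 20 log10 of a pressure (amplitude) ratio; the function names are hypothetical helpers for illustration.

```python
import math

def intensity_ratio_db(i: float, i_ref: float) -> float:
    """Relative intensity (power) level in decibels: 10 * log10(I / I_ref)."""
    return 10 * math.log10(i / i_ref)

def pressure_ratio_db(p: float, p_ref: float) -> float:
    """Relative pressure (amplitude) level in decibels: 20 * log10(p / p_ref).

    Intensity is proportional to pressure squared, hence 20 rather than 10.
    """
    return 20 * math.log10(p / p_ref)

print(f"{intensity_ratio_db(2, 1):.1f} dB")     # doubling the intensity: 3.0 dB
print(f"{intensity_ratio_db(1000, 1):.1f} dB")  # a thousandfold intensity: 30.0 dB
print(f"{pressure_ratio_db(10, 1):.1f} dB")     # a tenfold pressure gain: 20.0 dB
```

The last line corresponds to the ear canal's roughly tenfold pressure gain described earlier: about 20 dB.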


What is sound?
• Sound, in its analogue form, has a waveform, which has 2 characteristics:
• 1. Frequency
– Frequency determines the pitch of a sound.
– It is a measurement of how many complete cycles of the waveform occur in a given period of time. The more cycles there are, one after another, the higher the pitch of the sound.
– Frequencies are measured in hertz (Hz), the number of oscillations or waves per second, i.e. 1 Hz = 1 cycle per second.
• 2. Amplitude
– The amplitude of a sound wave is the “height” of the sound wave at a particular time.
– This corresponds to the loudness or volume of the sound.
– Simplistically, the higher the wave, the louder the sound.
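A pure tone can be modelled as A·sin(2πft), where f is the frequency and A the amplitude. This sketch (with a hypothetical `pure_tone` helper and an arbitrary amplitude of 0.8) shows how the two characteristics appear: the period, 1/f, shrinks as the frequency rises, while the peak value equals the amplitude. The 300 Hz and 3 kHz tones are the examples used in this lecture.

```python
import math

def pure_tone(t: float, frequency_hz: float, amplitude: float = 1.0) -> float:
    """Instantaneous value of a pure tone A*sin(2*pi*f*t) at time t (seconds)."""
    return amplitude * math.sin(2 * math.pi * frequency_hz * t)

for f in (300.0, 3000.0):
    period = 1.0 / f                                 # seconds per cycle
    peak = pure_tone(period / 4, f, amplitude=0.8)   # a quarter-cycle in: the crest
    print(f"{f:>6.0f} Hz: period {period * 1000:.3f} ms, peak {peak:.2f}")
```

Raising the frequency tenfold divides the period by ten but leaves the peak unchanged: pitch and loudness are independent parameters of the wave.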


[Figure: Analogue waveform (amplitude against time, with one cycle marking the frequency)]


High Pitch: this wave corresponds to a pure tone of 3,000 Hz (3 kHz)

Low Pitch: this wave corresponds to a pure tone of 300 Hz.


Digital sound

• We’ll look at different processes in creating a digital sound file:
– Sampling
– Storing
– File formats (WAVE, MP3, VQF)
– Streaming
– MIDI
– Editing
– Playback


Sampling
• Natural sound sources are analogue, but sound on the Web and the Internet is digital because a computer is used to record or play back the sound.
– To record sound, an Analogue to Digital Converter (ADC) is needed.
– To play back sound, a Digital to Analogue Converter (DAC) is needed.

• ADCs convert analogue to digital by sampling the analogue signal. Sampling a sound wave consists of determining the amplitude of the sound at a number of discrete times within a given time frame.


Sampling

• To record an analogue sound on your computer, you need to take a snapshot of the sound wave every so often and then “join the dots” to reconstruct an image of the original wave.


• This process is called sampling.
– Digitised sound is sampled sound.
– Every fraction of a second, a sample of sound is taken and stored as digital information in bits.
– The number of samples taken per second is called the sampling rate or sampling frequency.
– The sampling rate is measured in hertz (Hz) or kilohertz (kHz, thousands of samples per second).
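Sampling means evaluating the wave only at the discrete times n / sampling rate. A minimal sketch, assuming a unit-amplitude 440 Hz tone and the hypothetical helper name `sample_tone`; it takes 10 ms of the tone at two of the rates mentioned later in this lecture.

```python
import math

def sample_tone(frequency_hz: float, sample_rate_hz: float, duration_s: float) -> list[float]:
    """Sample a unit-amplitude pure tone at the discrete times n / sample_rate."""
    n_samples = round(sample_rate_hz * duration_s)
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(n_samples)]

for rate in (8_000, 44_100):
    samples = sample_tone(440.0, rate, 0.010)
    print(f"{rate} Hz sampling: {len(samples)} samples in 10 ms")
```

At 8,000 samples/s each cycle of the 440 Hz tone gets only about 18 samples; at 44,100 samples/s it gets about 100, which is why higher rates follow the original wave more closely.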


[Figure, two panels of sample amplitude against time: "Sampling the waveform" and "Low sampling frequency"]


Sampling

• Obviously, the more often a sample is taken, the better the representation of the original, continuous sound.
– If a low sampling rate is used to sample high-frequency sound, then much of the sound information is lost.
– What sampling rate should be used?


• In the 1920s, H. Nyquist showed that a signal of bandwidth H could be completely reconstructed by making only 2H samples per second.
– Simply, this means that the sampling rate must be at least twice the highest frequency that is to be recorded.
– A bandwidth is the range of frequencies needed to transmit a signal. The larger the bandwidth, the larger the information carrying capacity. (Rather like the size of a pipe that is needed for a certain amount of liquid to flow.)
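Written as a formula, with f_s the sampling rate and f_max the highest frequency to be recorded (for sound whose frequencies start near 0 Hz, f_max plays the role of the bandwidth H above), the rule and the two cases used later in this lecture are:

```latex
f_s \ge 2 f_{\max}
\qquad
f_{\max} = 20\ \text{kHz} \;\Rightarrow\; f_s \ge 40\ \text{kHz},
\qquad
f_{\max} = 5\ \text{kHz} \;\Rightarrow\; f_s \ge 10\ \text{kHz}
```

The first case is why CD audio uses 44.1 kHz; the second is why about 10 kHz is enough for speech.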


How often to sample?

If we sample once per cycle, we’ll think it’s a constant.

If we sample 1.5 times per cycle, we’ll think it’s a low frequency.
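Both cases above are examples of aliasing: after sampling, a tone of frequency f cannot be told apart from f − k·f_s for any whole number k, so it appears at the nearest such copy. The sketch below (hypothetical helper names; the starting phase of 0.3 rad is an arbitrary assumption so the constant is not zero) reproduces the two cases for a 1 kHz tone.

```python
import math

def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency (Hz) that a pure tone f appears to have when sampled at fs.

    Sampling cannot distinguish f from f - k*fs for whole numbers k,
    so the tone is folded onto the nearest such copy.
    """
    return abs(f_hz - fs_hz * round(f_hz / fs_hz))

def sample(f_hz: float, fs_hz: float, count: int, phase: float = 0.3) -> list[float]:
    """Take `count` samples of sin(2*pi*f*t + phase) at rate fs."""
    return [math.sin(2 * math.pi * f_hz * n / fs_hz + phase) for n in range(count)]

tone = 1_000.0                          # a 1 kHz tone
print(alias_frequency(tone, 1_000.0))   # 1 sample per cycle: 0.0, looks constant
print(alias_frequency(tone, 1_500.0))   # 1.5 samples per cycle: 500.0, a lower tone
print(alias_frequency(tone, 44_100.0))  # far above 2 x 1 kHz: 1000.0, correct
print([round(s, 3) for s in sample(tone, 1_000.0, 4)])  # every sample is the same
```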


Sampling
• If the sampling rate is too low (less than twice the highest frequency in the sound), aliasing will occur.
• If the sampling frequency is too high, storage space is wasted.
• For sound that we can hear, the highest frequency is approximately 20 kHz.
– Therefore CD audio is sampled at 44.1 kHz, just over twice that frequency.

• If a human voice is to be digitised, a sampling rate of about 10 kHz would be enough, because speech rarely contains frequencies above 5 kHz.


Sampling

• Example sampling rates (samples/sec):
– Telephone: 8,000
– AM radio: 11,025
– FM radio: 22,050
– Compact disc: 44,100
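Applying the Nyquist rule in reverse shows the highest frequency each of these rates can capture: half the sampling rate. A minimal sketch using only the rates listed above:

```python
# Sampling rates from the slide (samples per second).
rates = {
    "Telephone": 8_000,
    "AM radio": 11_025,
    "FM radio": 22_050,
    "Compact disc": 44_100,
}

# Highest frequency each rate can represent (the Nyquist frequency).
nyquist = {name: rate / 2 for name, rate in rates.items()}

for name, rate in rates.items():
    print(f"{name:<13} {rate:>6,} samples/s -> up to {nyquist[name]:,.1f} Hz")
```

Only the CD rate reaches beyond the roughly 20 kHz limit of human hearing; the telephone rate tops out at 4 kHz, which is adequate for intelligible speech but not for music.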


Sampling
• There are three sampling frequencies most often used in multimedia:
– 11.025 kHz is sufficient for speech (it’s about as low as you can go and still get reasonable results).
– 22.05 kHz is OK, depending on other factors.
– 44.1 kHz is the CD-quality frequency for music.
– (Note that the first two are sub-multiples of the CD-quality rate: 44.1 kHz ÷ 4 and 44.1 kHz ÷ 2.)
– Obviously, the more samples per second, the larger the sound file.


Resolution
• Each sample of data is stored with a certain amount of information about the amplitude of the sound at that moment in time.
• The number of bits used to store the amplitude determines the resolution.
– If we use only 4 bits to store this amplitude value (i.e. 4 bits per sample), then only 16 possible values are available for the amplitude.
– 8 bits give 256 possible values.
– 16 bits give 65,536 different values.
• Again, the more values, the better the representation, but also the larger the file.
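Storing an amplitude in a fixed number of bits means rounding it to the nearest available level (quantisation). The sketch below assumes amplitudes normalised to the range −1.0 to 1.0 and signed integer codes, as in 16-bit PCM; the helper names and the example amplitude of 0.3 are hypothetical.

```python
def levels(bits: int) -> int:
    """Number of distinct amplitude values available with `bits` bits per sample."""
    return 2 ** bits

def quantize(x: float, bits: int) -> int:
    """Round amplitude x (nominally -1.0..1.0) to the nearest signed integer code.

    Values outside the representable range are clamped, i.e. clipped.
    """
    max_code = 2 ** (bits - 1) - 1   # 7 for 4 bits, 127 for 8 bits, 32767 for 16 bits
    min_code = -(2 ** (bits - 1))    # -8, -128, -32768
    return max(min_code, min(max_code, round(x * max_code)))

def dequantize(code: int, bits: int) -> float:
    """Convert an integer code back to an amplitude."""
    return code / (2 ** (bits - 1) - 1)

x = 0.3  # an arbitrary example amplitude
for bits in (4, 8, 16):
    error = abs(x - dequantize(quantize(x, bits), bits))
    print(f"{bits:>2} bits: {levels(bits):>6} levels, rounding error {error:.6f}")
```

Each extra bit doubles the number of levels and roughly halves the rounding error, at the cost of a larger file.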


Resolution

• In the process of saving the amplitude to a certain number of bits, some data may be lost.

• If the range of values available to record the amplitude is too small for the signal, the top and bottom of the wave may be clipped, which severely distorts the sound.


• The resolution at which a recording can be made is limited by your sound card:
– If your sound card can only record in 8-bit, then that is the best resolution you can get.
– If your sound card can record in 16-bit, then you can use the recording software to record in either 8-bit or 16-bit.


Resolution
• It is not necessarily a good idea to store the highest and lowest amplitudes in a sound signal (i.e. the full dynamic range).
– These high and low amplitudes do not occur frequently, so if values are allocated to them, the more common amplitudes get a sparser allocation of values.
– This means that the more common amplitudes are stored with lower accuracy.


• So sound can be recorded with a reduced amplitude range.
– By reducing the range, the highest and lowest amplitudes will be chopped off at the limit set. This is called clipping.

• 8-bit resolution is good enough for voice, but music requires 16-bit resolution for high fidelity.
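The trade-off described above can be seen by spreading the same number of levels over a narrower range. A minimal sketch, assuming a symmetric range of ±limit, a deliberately coarse 4 bits so the effect is visible, and the hypothetical helper name `quantize_with_limit`:

```python
def quantize_with_limit(x: float, bits: int, limit: float) -> float:
    """Quantise x with `bits` bits spread symmetrically over -limit..+limit.

    Anything beyond +/-limit is clipped to the limit before rounding.
    """
    max_code = 2 ** (bits - 1) - 1
    step = limit / max_code                 # a smaller limit gives finer steps
    clipped = max(-limit, min(limit, x))
    return round(clipped / step) * step

bits = 4
for limit in (1.0, 0.5):
    quiet = quantize_with_limit(0.1, bits, limit)   # a common, quiet amplitude
    loud = quantize_with_limit(0.8, bits, limit)    # a rare, loud amplitude
    print(f"limit {limit}: 0.1 -> {quiet:.3f}, 0.8 -> {loud:.3f}")
```

Halving the range stores the quiet value more accurately, but the loud value is clipped to 0.5.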


Resolution

• Finally, there is the option of recording in mono or stereo.
– We hear with two ears and catch different sounds in each ear.
– Stereophonic sound simulates this, so it sounds more lifelike.
– Mono recordings pick up everything, but sound dull and flat.


• Stereo is achieved by recording on two channels instead of one. This means, of course, that twice the information is stored for every sample that is taken.
– Obviously, at the same sampling rate and resolution, stereo files will be twice as large as mono files.


Resolution

CoolEdit


Resolution

For 1 second of 16-bit stereo on a CD:

44,100 x 16 x 2 = 1,411,200 bits (176,400 bytes)

For 1 hour:

44,100 x 16 x 2 x 3,600 = 5,080,320,000 bits ≈ 635 MB (megabytes)
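The same calculation as a reusable function: uncompressed size is sampling rate × bits per sample × channels × seconds, divided by 8 to get bytes. The function name is a hypothetical helper; the speech settings in the last line are an illustrative combination of the lower rate and resolution mentioned earlier.

```python
def pcm_size_bytes(sample_rate: int, bits_per_sample: int, channels: int, seconds: int) -> int:
    """Size of uncompressed audio in bytes: rate x resolution x channels x time / 8."""
    total_bits = sample_rate * bits_per_sample * channels * seconds
    return total_bits // 8

cd_second = pcm_size_bytes(44_100, 16, 2, 1)     # 176,400 bytes
cd_hour = pcm_size_bytes(44_100, 16, 2, 3_600)   # 635,040,000 bytes
print(f"1 s of CD audio: {cd_second:,} bytes")
print(f"1 h of CD audio: {cd_hour:,} bytes (about {cd_hour / 1_000_000:.0f} MB)")

# Lower rate, lower resolution and mono each shrink the file proportionally.
print(f"1 h of 8-bit mono at 11,025 Hz: {pcm_size_bytes(11_025, 8, 1, 3_600):,} bytes")
```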


Storing
• When the sampling has been done and the bits recorded for every sample, this data needs to be stored.
• Now you need to decide in which format to store the file:
– WAV
– MP3
– VQF
– Liquid-Audio
– a2B
– RealAudio
– Windows Audio 4.0

• We will look at some of these formats …


Storing : WAV

• Wave (.wav) is the most basic and oldest sound file format, and is the Windows sound system format.
– Windows sound effects are recorded as WAV files.

• WAV is to sound what BMP is to graphics.


• WAV files normally use no compression.
– They are constructed simply by taking samples of the sound and then recording those as digital data.
– The file is simply a recording of all the data from all the samples taken.

• Just as BMP is the largest of the graphic file formats, WAV files are the largest of the sound file formats.

• An 8-second recording of Mozart in WAV format is 1.3 MB ...
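Because a WAV file is essentially a short header followed by the raw samples, it is easy to produce one. This sketch uses Python's standard `wave` module to write one second of a 440 Hz tone as 16-bit mono PCM at 44,100 samples/s into memory; the tone, amplitude and the `make_wav` name are illustrative assumptions.

```python
import io
import math
import struct
import wave

def make_wav(freq_hz: float = 440.0, seconds: float = 1.0,
             rate: int = 44_100, amplitude: float = 0.5) -> bytes:
    """Return an uncompressed 16-bit mono WAV file containing a pure tone."""
    n_frames = round(rate * seconds)
    frames = bytearray()
    for n in range(n_frames):
        value = round(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * n / rate))
        frames += struct.pack("<h", value)   # one little-endian signed 16-bit sample
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(2)      # 2 bytes = 16 bits per sample
        w.setframerate(rate)   # samples per second
        w.writeframes(bytes(frames))
    return buf.getvalue()

data = make_wav()
print(f"{len(data):,} bytes: a small header plus 88,200 bytes of samples")
```

Almost all of the file is sample data, so its size follows directly from the rate × resolution × channels × time calculation.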


Storing : WAV

• The Mac OS sound format is AIFF.
• The UNIX sound format is AU.
• All have broadly similar capabilities now. Each platform supports compressed and uncompressed data, with a range of compressors.


Storing : MP3

• MP3 stands for MPEG-1 Audio Layer 3.
– Layer 3 is one of three coding schemes (layer 1, layer 2 and layer 3) for the compression of audio signals.
– Each successive layer uses a more complex encoding process.
– Layer 3 uses a psychoacoustic model to discard sounds that humans are unlikely to hear, such as frequencies outside the range of hearing and quiet sounds masked by louder ones.


Storing : MP3

• MP3 uses a lossy compression technique.
– This means some data is lost, but the loss in quality is usually not noticed.
– The compression is impressive: an MP3 file can be up to 12 times smaller than the corresponding WAV file.
– MP3 has become popular as a means of compressing audio, particularly music, for downloading over the Internet, both legally and illegally.
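The "up to 12 times" figure can be checked by comparing bit rates. Uncompressed CD audio runs at 44,100 × 16 × 2 = 1,411.2 kbit/s; the MP3 bit rates below are common encoder settings chosen for illustration, not values from the slide.

```python
CD_BITRATE_KBPS = 44_100 * 16 * 2 / 1000   # uncompressed CD audio: 1,411.2 kbit/s

for mp3_kbps in (128, 192, 320):           # assumed, commonly used MP3 bit rates
    ratio = CD_BITRATE_KBPS / mp3_kbps
    print(f"{mp3_kbps} kbit/s MP3: about {ratio:.1f} times smaller than the WAV")
```

At 128 kbit/s the file is about 11 times smaller; higher bit rates keep more detail and compress less.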


Storing : VQF

• VQF is another audio compression format. It is similar to MP3 in one regard: it compresses large sound files into very small ones.

• However, VQF is regarded as doing this better than MP3.


• There are three important aspects to sound encoding:

1. File size: VQF files are approximately 30-35% smaller than MP3 files.

2. Sound quality: The sound quality of VQFs is much better than that of MP3s.

3. CPU usage: Here VQFs are more demanding than MP3s, by design. When MP3 was developed, the Pentium was the leading processor. With Pentium III and later CPUs, and other multimedia-enhanced computers, most machines can handle the extra load. This extra processing is what allows VQF to pack as much (or more) sound data into a 30% smaller file.


Streaming

• Concerts, speeches, lectures, etc. are now recorded in a way that allows you to listen to them in real time.

• The technique used is called streaming.
– Streaming audio is not a file format, but a technique for playing back audio (and other) data transmitted over the Internet and other communications channels.
– Normally, without streaming, when you click on a sound file link, the whole sound file is downloaded to the computer’s memory, your player is opened and only then does the sound begin to play.


• With the streaming technique, the sound is compressed and then sent through the communication channels in a continuous stream, and playback begins as soon as enough data has arrived.
– This means that the user doesn’t have to wait for a large file to download.


Streaming

• Streaming uses lossy compression.
– It can reduce a file by a factor of up to 100.
– The quality is therefore not as good as MP3.

• Streaming audio must be encoded and played back with the right software, such as RealAudio.


• The problem with streaming data is that the speed at which data arrives over the Internet and the speed at which the player plays it are often not the same.
– The data may arrive too fast, and has to be buffered (stored temporarily) until the player catches up.
– The data may arrive too slowly because of congestion on the transmission channels, and the sound becomes very jerky.
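A toy simulation makes the trade-off concrete. The model below is hypothetical and much simpler than any real player: each tick is one second, `arrivals` says how many seconds of audio the network delivers in that tick, and the player waits until it has `prebuffer_s` seconds stored before starting. Data that arrives early waits in the buffer; when the buffer runs dry, playback stalls.

```python
def simulate_playback(arrivals: list[float], prebuffer_s: float, play_rate: float = 1.0):
    """Toy model of a streaming player's buffer (illustrative only).

    arrivals    -- seconds of audio received during each one-second tick
    prebuffer_s -- seconds of audio to collect before playback starts
    play_rate   -- seconds of audio the player consumes per tick
    Returns (tick at which playback started, number of ticks with a gap).
    """
    buffered = 0.0
    start_tick = None
    gaps = 0
    for tick, received in enumerate(arrivals):
        buffered += received                 # early data waits in the buffer
        if start_tick is None and buffered >= prebuffer_s:
            start_tick = tick
        if start_tick is not None:
            if buffered >= play_rate:
                buffered -= play_rate        # play one second of audio
            else:
                gaps += 1                    # data arrived too slowly: audible stall
    return start_tick, gaps

# Delivery drops to 0.2 s of audio per second for two seconds (congestion).
arrivals = [1.0, 1.0, 0.2, 0.2, 1.5, 1.5, 1.0, 1.0]
for prebuffer in (0.0, 3.0):
    start, gaps = simulate_playback(arrivals, prebuffer)
    print(f"pre-buffer {prebuffer} s: starts at tick {start}, {gaps} gap(s)")
```

Starting immediately gives two stalls during the congestion; waiting for 3 seconds of buffered audio delays the start but plays without gaps.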


MIDI

• MIDI stands for Musical Instrument Digital Interface.
– MIDI is a standard adopted by the electronic music industry for controlling computerised devices such as synthesisers.
– It is a standard language for recording what is played on electronic music hardware (which notes, when and how), rather than the sound itself.
– Because of this standard language, electronic instruments and computers can communicate with each other.


• Rather than sampling existing sound, MIDI is a technique and format for recording original music which is played into the computer.
– Normally this is done using a synthesiser (a computerised musical instrument which produces digital rather than analogue sound).

– But you can simulate the synthesiser with the computer keyboard.

MIDI

• MIDI files, then, do not record the sound of the music.
– MIDI files store a description of the music that is played by the synthesiser.
– The MIDI representation of a note always includes values for its pitch, length and volume.

– The file may also include additional characteristics of the music.
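A note described this way can be sketched as a small record. The `Note` class and its field names are hypothetical, for illustration only; this is not the MIDI file format itself. In MIDI, pitch is a note number from 0 to 127, volume is normally carried by key velocity (0 to 127), and length is the time between Note On and Note Off. The frequency conversion assumes the common equal-temperament convention that note 69 is A4 at 440 Hz.

```python
from dataclasses import dataclass

@dataclass
class Note:
    """Hypothetical description of one note: what MIDI stores instead of audio."""
    pitch: int       # MIDI note number, 0..127 (60 = middle C)
    length_ms: int   # time between Note On and Note Off
    velocity: int    # 0..127, normally used to control loudness

    def frequency_hz(self) -> float:
        # Equal temperament: 12 notes per octave, note 69 (A4) = 440 Hz.
        return 440.0 * 2 ** ((self.pitch - 69) / 12)

# A C major arpeggio: a few bytes of description, no sampled sound at all.
melody = [Note(60, 500, 90), Note(64, 500, 90), Note(67, 1000, 100)]
for n in melody:
    print(n.pitch, round(n.frequency_hz(), 2), n.length_ms, n.velocity)
```

The synthesiser that plays the file decides what each note actually sounds like, which is why MIDI playback varies between devices.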

• MIDI files are much smaller than WAV files.
– MIDI files are to sound what vector graphics are to images.
– A 2-minute .mid recording is only about 12 KB.
– Listen to the range of instrumental sounds that can be replicated with MIDI, but also notice that the music does sound synthetic, a computer-generated replica of music …
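To put the 12 KB figure in perspective, the sketch below computes the size of the same 2 minutes stored as uncompressed audio. It assumes CD-quality parameters (44.1 kHz, 16-bit, stereo), ignores the small WAV header, and assumes "12K" means 12 × 1024 bytes.

```python
def pcm_bytes(seconds, rate=44_100, bits=16, channels=2):
    """Size of uncompressed PCM audio in bytes (file header not included)."""
    return seconds * rate * (bits // 8) * channels

wav_size = pcm_bytes(120)   # 2 minutes of CD-quality stereo
midi_size = 12 * 1024       # the 12K .mid file quoted above (assumed KiB)

print(wav_size)                       # 21168000 bytes, about 21 MB
print(round(wav_size / midi_size))    # the WAV is roughly 1700 times larger
```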

MIDI and Computers

• A MIDI interface allows a computer to send MIDI data to instruments

• MIDI sequences can be stored in files, exchanged between computers and incorporated into multimedia

• A computer can synthesize sounds on a sound card, or play back samples from disk, in response to MIDI instructions

– The computer becomes a primitive musical instrument (its sound quality is inferior to that of dedicated instruments)

MIDI Messages

• MIDI messages are instructions that control some aspect of the performance of an instrument

• Status byte – indicates the type of message
• One or two data bytes – values of parameters

– e.g. Note On + note number (0..127) + key velocity

• Running status – the status byte can be omitted if it is the same as the preceding one
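The message layout and running status can be sketched in a few lines. This is a simplified illustration that handles only channel messages such as Note On; the helper names are hypothetical. A status byte always has its top bit set (0x80 to 0xFF), and data bytes lie in 0..127, which is what lets a receiver tell them apart when a status byte is omitted.

```python
NOTE_ON = 0x90  # Note On status; the low 4 bits carry the channel (0..15)

def note_on(channel, note, velocity):
    """One complete Note On message: status byte followed by 2 data bytes."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0..15; note and velocity 0..127")
    return bytes([NOTE_ON | channel, note, velocity])

def encode(messages, running_status=True):
    """Concatenate messages, omitting a status byte that repeats the previous one."""
    out = bytearray()
    last_status = None
    for msg in messages:
        status, data = msg[0], msg[1:]
        if not (running_status and status == last_status):
            out.append(status)
        out += data
        last_status = status
    return bytes(out)

chord = [note_on(0, 60, 100), note_on(0, 64, 100), note_on(0, 67, 100)]
print(encode(chord, running_status=False).hex(" "))  # 90 3c 64 90 40 64 90 43 64
print(encode(chord).hex(" "))                        # 90 3c 64 40 64 43 64
```

Running status saves a third of the bytes for a run of same-type messages on one channel.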

General MIDI

• Synths and samplers provide a variety of voices

• The MIDI Program Change message selects a new voice, but the mapping from values to voices is not defined in the MIDI standard

• General MIDI (an addendum to the standard) specifies 128 standard voices for Program Change values

– Strictly, GM specifies only voice names; there is no guarantee that identical sounds will be produced on different instruments
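A Program Change message carries a single data byte, the program number. The sketch below builds one for a few General MIDI voices; the small dictionary is an excerpt for illustration, and the function name is hypothetical. GM lists programs as 1 to 128, while the byte sent is 0 to 127, so the code subtracts one.

```python
PROGRAM_CHANGE = 0xC0  # status; the low 4 bits carry the channel (0..15)

# A few General MIDI voice names, keyed by GM program number (1..128).
GM_VOICES = {1: "Acoustic Grand Piano", 41: "Violin", 57: "Trumpet", 74: "Flute"}

def program_change(channel, gm_program):
    """Program Change message: status byte + one data byte (GM number minus 1)."""
    if not (0 <= channel <= 15 and 1 <= gm_program <= 128):
        raise ValueError("channel must be 0..15 and GM program 1..128")
    return bytes([PROGRAM_CHANGE | channel, gm_program - 1])

msg = program_change(0, 41)
print(msg.hex(" "), GM_VOICES[41])  # c0 28 Violin
```

The message only names a voice; what "Violin" actually sounds like still depends on the synthesiser receiving it.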

Editing

• As with a graphics file, you can manipulate a sound file.
– One obvious thing you might want to do is remove parts of the file that you don’t like (like cropping a photo) and add other parts, or reverse the sound (as on Paul Simon’s Graceland).

– Remember, though, that to edit sound files well you need a good knowledge of music and of digital sound files. But you can experiment; this may be what interests you most, and you will learn more as you go.

• CoolEdit will not edit MP3 or MIDI files.
– MP3 is a final (compressed) file format.
– You will need an original digital WAV file to edit.
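Cropping and reversing are easy on an uncompressed WAV file because each frame corresponds directly to a moment in time. The sketch uses Python’s standard `wave` module; the function `crop_and_reverse` and its parameters are hypothetical. It reverses whole frames, so the bytes within each sample and the channels within each frame stay in order.

```python
import io
import wave

def crop_and_reverse(wav_bytes: bytes, start_s: float, end_s: float) -> bytes:
    """Keep only start_s..end_s of a PCM WAV file and play it backwards."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        frame_size = src.getsampwidth() * src.getnchannels()
        src.setpos(int(start_s * rate))                 # crop: skip to the start
        data = src.readframes(int((end_s - start_s) * rate))
    # Reverse frame by frame, never byte by byte.
    frames = [data[i:i + frame_size] for i in range(0, len(data), frame_size)]
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)                  # frame count is patched on close
        dst.writeframes(b"".join(reversed(frames)))
    return out.getvalue()
```

For files on disk, something like `open("out.wav", "wb").write(crop_and_reverse(open("in.wav", "rb").read(), 1.0, 3.5))` would apply it, with hypothetical file names.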

Sound Editing

• Timeline divided into tracks
• Sound on each track displayed as a waveform
• 'Scrub' over part of a track, e.g. to find pauses
• Cut and paste, drag and drop
• May combine many tracks from different recordings (mix-down)
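A mix-down is, at its core, a sample-by-sample sum of the tracks. This minimal sketch assumes each track is a list of 16-bit integer samples at the same rate. The optional per-track `gains` are a hypothetical addition, and the result is clipped so that loud passages saturate instead of wrapping around.

```python
def mix_down(tracks, gains=None):
    """Mix several 16-bit tracks into one (simplified, hypothetical helper).

    Tracks that end early are treated as silence; the sum is clipped
    to the 16-bit range -32768..32767.
    """
    gains = gains or [1.0] * len(tracks)
    length = max(len(t) for t in tracks)
    mixed = []
    for i in range(length):
        total = sum(g * t[i] for t, g in zip(tracks, gains) if i < len(t))
        mixed.append(max(-32768, min(32767, round(total))))
    return mixed

print(mix_down([[1000, 2000], [500]]))  # [1500, 2000]
```

Editors usually lower the gains rather than rely on clipping, because clipped peaks sound distorted.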

Effects and Filters

• Noise gate
• Low pass and high pass filters
• Notch filter
• De-esser
• Click repairer
• Reverb
• Graphic equalizer
• Envelope shaping
• Pitch alteration and time stretching
• etc.
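The first effect in the list, a noise gate, silences the signal whenever its level falls below a threshold, for example to remove hiss between phrases. This is a crude sketch with hypothetical parameters: it measures the peak of fixed blocks of `window` samples rather than single samples, so that the quiet parts of each cycle of a loud waveform are not chopped out. Real gates also smooth the on/off transitions with attack and release times.

```python
def noise_gate(samples, threshold, window=64):
    """Silence each block of `window` samples whose peak is below threshold."""
    out = []
    for start in range(0, len(samples), window):
        block = samples[start:start + window]
        peak = max(abs(s) for s in block)           # crude level estimate
        out.extend(block if peak >= threshold else [0] * len(block))
    return out

print(noise_gate([10, -5, 3, 2000, -3000, 40, 5, -8], threshold=100, window=2))
# [0, 0, 3, 2000, -3000, 40, 0, 0]
```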

Compression

• In general, lossy methods are required because of the complex and unpredictable nature of audio data

• A CD-quality, stereo, 3-minute song requires over 25 Mbytes

– The data rate (about 1.4 Mbit/s) exceeds the bandwidth of a dial-up Internet connection

• The difference in the way we perceive sound and images means that a different approach from image compression is needed
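The figures above can be checked with a little arithmetic. The sketch assumes CD quality (44.1 kHz, 16-bit, stereo), no file header, and a nominal 56 kbit/s dial-up modem running at full speed.

```python
RATE, BYTES_PER_SAMPLE, CHANNELS = 44_100, 2, 2   # CD quality: 16-bit stereo

bit_rate = RATE * BYTES_PER_SAMPLE * 8 * CHANNELS      # bits per second
song_bytes = RATE * BYTES_PER_SAMPLE * CHANNELS * 180  # 3 minutes, no header
modem_bps = 56_000                                     # assumed dial-up speed

print(bit_rate)                          # 1411200 bits/s
print(song_bytes)                        # 31752000 bytes, well over 25 MB
print(bit_rate / modem_bps)              # about 25: compression needed just to keep up
print(song_bytes * 8 / modem_bps / 60)   # about 76 minutes to download one song
```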

Companding

• Non-linear quantization

• Higher quantization levels spaced further apart than lower ones

• Quiet sounds represented in greater detail than loud ones
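A well-known companding curve is μ-law, F(x) = sgn(x) · ln(1 + μ|x|) / ln(1 + μ) with μ = 255. The sketch below uses this continuous formula; telephone systems such as G.711 approximate the curve with piecewise-linear segments, which is not reproduced here. Samples are compressed, quantized uniformly to 8 bits, then expanded. The comparison shows quiet samples keeping far more detail than with plain 8-bit quantization, while loud samples lose a little.

```python
import math

MU = 255  # the mu-law parameter

def compress(x):
    """Map x in [-1, 1] through the mu-law curve (continuous form)."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y):
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quantize(v, bits=8):
    """Uniform quantizer: round v in [-1, 1] to the nearest of the available levels."""
    levels = 2 ** (bits - 1) - 1
    return round(v * levels) / levels

def companded(x, bits=8):
    """Compress, quantize uniformly, then expand: non-linear quantization overall."""
    return expand(quantize(compress(x), bits))

for x in (0.001, 0.01, 0.5, 0.9):
    print(x, "linear error:", abs(quantize(x) - x), "companded error:", abs(companded(x) - x))
```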

ADPCM

• Differential Pulse Code Modulation (DPCM)
– Similar to video inter-frame compression
– Compute a predicted value for the next sample, and store the difference between the prediction and the actual value

• Adaptive Differential Pulse Code Modulation (ADPCM)

– Dynamically vary the step size used to store the quantized differences
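The two ideas can be sketched together: a DPCM core that predicts each sample as the previous reconstructed value, plus an adaptive step size. This is a simplified illustration, not IMA or G.726 ADPCM. Those use their own step-size tables and predictors; here the doubling and halving rule and the starting `step` are hypothetical choices. The encoder tracks exactly what the decoder will reconstruct, so both adapt the step identically without sending it.

```python
def _adapt(step, code, qmax):
    """Grow the step after large differences, shrink it after small ones."""
    if abs(code) >= qmax - 1:
        return step * 2
    if abs(code) <= 1:
        return max(1, step // 2)
    return step

def adpcm_encode(samples, bits=4, step=16):
    qmax = 2 ** (bits - 1) - 1                  # codes lie in -qmax-1 .. qmax
    pred, codes = 0, []
    for s in samples:
        diff = s - pred                         # prediction error
        code = max(-qmax - 1, min(qmax, round(diff / step)))
        codes.append(code)
        pred += code * step                     # same reconstruction as the decoder
        step = _adapt(step, code, qmax)
    return codes

def adpcm_decode(codes, bits=4, step=16):
    qmax = 2 ** (bits - 1) - 1
    pred, out = 0, []
    for code in codes:
        pred += code * step
        out.append(pred)
        step = _adapt(step, code, qmax)
    return out

codes = adpcm_encode([100] * 10)
print(codes)                # [6, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(adpcm_decode(codes))  # settles on 100 after a few samples
```

Each 16-bit sample becomes a 4-bit code, a fixed 4:1 reduction.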

Perceptually-Based Compression

• Identify and discard data that doesn't affect the perception of the signal

– Needs a psycho-acoustical model, since the ear and brain do not respond to sound waves in a simple way

• Threshold of hearing – sounds too quiet to hear

• Masking – sound obscured by some other sound
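The threshold of hearing is often approximated in audio-coding texts by Terhardt’s formula. It gives the quietest audible level of a pure tone, in dB SPL, as a function of frequency. The sketch assumes this approximation; real encoders use their own tabulated models. It shows how a coder could decide that a quiet component is inaudible and can be discarded.

```python
import math

def threshold_of_hearing_db(freq_hz):
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's formula)."""
    f = freq_hz / 1000.0  # frequency in kHz
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4

def audible(freq_hz, level_db):
    """A tone below the threshold can be dropped without the listener noticing."""
    return level_db > threshold_of_hearing_db(freq_hz)

for f in (100, 1000, 4000, 15000):
    print(f, round(threshold_of_hearing_db(f), 1))
```

The curve dips lowest around 3 to 4 kHz, where the ear is most sensitive, and rises steeply at low and high frequencies.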

The Threshold of Hearing

Masking

Compression Algorithm

• Split the signal into bands of frequencies using filters
– 32 bands are commonly used

• Compute the masking level for each band, based on its average value and a psycho-acoustical model

– i.e. approximate the masking curve by a single value for each band

• Discard the signal in a band if it is below the masking level
• Otherwise, quantize using the minimum number of bits for which the quantization noise stays below the masking level
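The last two steps amount to a per-band bit allocation. The sketch assumes the band levels and masking levels (in dB) have already been computed by the filter bank and psycho-acoustical model. The inputs below are invented values, and `max_bits` is a hypothetical cap. It uses the rule of thumb that each extra bit lowers quantization noise by about 6 dB. Real MPEG encoders refine this iteratively under a fixed bit budget.

```python
import math

DB_PER_BIT = 6.02  # each extra bit lowers quantization noise by about 6 dB

def allocate_bits(band_levels_db, masking_levels_db, max_bits=15):
    """Simplified per-band allocation following the steps above.

    A band at or below its masking level is discarded (0 bits); otherwise
    it gets just enough bits to push quantization noise under the mask.
    """
    bits = []
    for level, mask in zip(band_levels_db, masking_levels_db):
        smr = level - mask                      # signal-to-mask ratio, dB
        bits.append(0 if smr <= 0 else min(max_bits, math.ceil(smr / DB_PER_BIT)))
    return bits

# Three of the (typically 32) bands, with made-up levels:
print(allocate_bits([60, 30, 45], [40, 35, 45]))  # [4, 0, 0]
```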

Incorporating sound into your web page

• You can choose whether you want your sound file to be foreground music or background music.

• Foreground music:
– For WAV or MID files, use an ordinary link (an <a href> tag), e.g.

• <a href="filename.mid">Mozart music</a>
• <a href="filename.wav">Yesterday music</a>

Playback

• Background music:
– More problematic, because Netscape and IE use different tags.
– Netscape uses the <embed> tag.
– IE uses the <bgsound> tag (a separate element in the page, not an attribute of the <body> tag).

– To make the sound play in both browsers, use the following HTML code:
<embed src="filename.mid" width="145" height="55"><noembed><bgsound src="filename.mid"></noembed>

– There are other attributes you can add to the embed tag, such as:
autostart="true" (play once)
autostart="true" loop="true" (play continuously)

• For more about HTML for sound, see Annabella’s site, listed in the reference section.
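If the markup has to be produced for many files, generating it helps avoid the curly-quote problem, since “ and ” are not valid HTML attribute delimiters. The hypothetical helper below reproduces the dual <embed>/<bgsound> snippet with straight, escaped quotes, and only adds the embed attributes listed above. Both tags are long obsolete; this simply mirrors the slide’s markup.

```python
from html import escape

def background_sound_html(src, width=145, height=55, loop=False):
    """Build the <embed>/<noembed><bgsound> snippet with properly quoted attributes."""
    s = escape(src, quote=True)                   # e.g. & becomes &amp;
    loop_attr = ' loop="true"' if loop else ""    # loop applies to <embed> only here
    return (f'<embed src="{s}" width="{width}" height="{height}" autostart="true"{loop_attr}>'
            f'<noembed><bgsound src="{s}"></noembed>')

print(background_sound_html("filename.mid", loop=True))
```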

Summary

• How do we hear?
• Where is the sound interpreted?
• What is sound?
• What are the two characteristics of a waveform?
• How do you digitise sound?
• What is the sampling rate?
• What sampling frequency would you use for speech? For CD?
• What is an uncompressed format of a sound file?
• What sort of compression does an MP3 file use?
• What is streaming?
• What sort of compression does streaming use?

References

• Beekman, Brent and Rathswohl, pp. 50-51, 57-59, 162-164.
• Play it again Sam, APC, August 1999.
• Nyquist, http://www.cs.sfu.ca/undergrad/CourseMaterials/CMPT365/material/notes/Chap3/Chap3.1/Chap3.1.html

URLs:
• Physiology of Hearing, http://online.anu.edu.au/ITA/ACAT/drw/PPofM/hearing/hearing1.html
• http://www.mcl.tulane.edu/departments/pathology/fermin/Hearing.html
• http://www.medimagery.net/Hearing-Animation.htm
• http://www.iurc.montp.inserm.fr/cric/audition/english/sound/fsound.htm
• http://hyperphysics.phy-astr.gsu.edu/hbase/sound/ear.html

• HTML Help for Music: http://www.annabella.net/music.html
• Beginner’s MP3 Player Setup: http://help.mp3.com/help/gettingstarted/guide.html
• Digitalising Sound: http://opus1.com/~violist/help/adc.html
• MIDI: http://www.harmony-central.com/MIDI/Doc/doc.html
• Downloads for audio/video: http://www.real.com
• Sound cards: http://www.pctechguide.com/11sound.htm