CGMB 324: Multimedia System design
Uploaded by: wesley-buckley
![Page 1: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/1.jpg)
CGMB 324: MULTIMEDIA SYSTEM DESIGN
Chapter 05: Multimedia Element III – Audio
![Page 2: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/2.jpg)
Objectives
Upon completing this chapter, you should be able to:
Understand how audio data is represented in computers
Understand the MIDI file format
Understand how to apply audio in multimedia systems
![Page 3: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/3.jpg)
Digitization Of Sound
![Page 4: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/4.jpg)
Digitization Of Sound
Facts about sound
Sound is a continuous wave that travels through the air at around 1,235 km/h (about 343 m/s at room temperature).
Without air, or some other medium, there is no sound (as in space).
The wave is made up of pressure differences.
Sound is detected by measuring the pressure level at a certain location.
Sound waves have normal wave properties:
Reflection (bouncing)
Refraction (change of angle when entering a medium with different density)
Diffraction (bending around an obstacle)
This makes the design of “surround sound” possible.
![Page 5: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/5.jpg)
Digitization Of Sound
Human ears can hear frequencies from about 20 Hz (a deep rumble) to about 20 kHz, with the softest audible sounds starting at roughly -10 dB to 25 dB.
This changes with age, as the sensitivity of our hearing usually reduces.
The intensity of sound can be measured in terms of Sound Pressure Level (SPL) in decibels (dB).
![Page 6: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/6.jpg)
Digitization Of Sound
A turbojet engine might be as loud as 165 dB.
A car driving on the highway, about 100 dB.
And, a whisper, averaging around 35 dB.
We cannot hear very high frequencies outside our hearing range and neither can we hear very low ones.
![Page 7: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/7.jpg)
Digitization Of Sound
Sounds around us happen in a wide range of frequency and intensity.
For example, leaves rustling are very soft, or very low intensity (amplitude) sounds, but they have high frequencies.
A jet engine is a very high amplitude sound and it is also in the high frequency area.
A cargo truck is very loud (high intensity), but within the low frequency area.
Our speech sounds spread across many frequencies and vary in intensity.
![Page 8: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/8.jpg)
Digitization Of Sound
Digitization In General
Microphones and video cameras produce analog signals (continuous-valued voltages).
![Page 9: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/9.jpg)
Digitization Of Sound
In audio digitization, two components form the composite sinusoidal signal of the actual sound: the fundamental sine wave and the harmonics.
Amplitude = intensity
Frequency = pitch
The sine wave pattern derives from the graph of y = sin x.
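To make the idea of a composite signal concrete, here is a minimal Python sketch that adds a fundamental sine wave to its harmonics. The function name `composite_sample`, the 220 Hz tone and the harmonic amplitudes are illustrative assumptions, not values from the slides.

```python
import math

def composite_sample(t, fundamental_hz, harmonic_amps):
    """Value at time t (seconds) of a fundamental sine wave plus its harmonics.

    harmonic_amps[k] is the amplitude of the (k+1)-th multiple of the
    fundamental, so harmonic_amps[0] belongs to the fundamental itself.
    """
    return sum(
        amp * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
        for k, amp in enumerate(harmonic_amps)
    )

# Hypothetical tone: 220 Hz fundamental plus two weaker harmonics (440 Hz, 660 Hz).
amps = [1.0, 0.5, 0.25]
# One second of the tone, evaluated at 8000 points per second.
wave = [composite_sample(n / 8000, 220.0, amps) for n in range(8000)]
```

Raising an amplitude makes that component louder (intensity); changing `fundamental_hz` changes the pitch.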
![Page 10: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/10.jpg)
[Figure: a voice signal (sine wave) passes through an Analog-to-Digital Converter (ADC) and comes out as a bit stream, e.g. 10010010000100]
![Page 11: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/11.jpg)
Digitization Of Sound
To get audio or video into a computer, we must digitize it (convert it into a string of numbers)
So, we have to understand discrete sampling
Sampling -- divide the horizontal axis (the time dimension) into discrete pieces. Uniform sampling (at equal time intervals) is ubiquitous.
![Page 12: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/12.jpg)
Digitization Of Sound
Quantization (sampling in the amplitude) -- divide the vertical axis (signal strength) into pieces. Sometimes, a non-linear function is applied.
8-bit quantization divides the vertical axis into 256 levels.
16-bit quantization gives you 65,536 levels.
Digital audio is a real representation of sound in the form of bits and bytes.
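Sampling and quantization can be sketched together in a few lines of Python. This is an illustrative sketch using a linear (uniform) quantizer; the function name `sample_and_quantize` and the 440 Hz test tone are assumptions made for the example.

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz, bits):
    """Uniformly sample signal(t) (expected range -1..1) and map each sample
    to an integer quantization level between 0 and 2**bits - 1."""
    levels = 2 ** bits                      # 8 bits -> 256, 16 bits -> 65536
    n_samples = round(duration_s * sample_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz              # sampling: discrete points on the time axis
        x = max(-1.0, min(1.0, signal(t)))  # clip anything outside the valid range
        codes.append(round((x + 1.0) / 2.0 * (levels - 1)))  # quantization: nearest level
    return codes

# Hypothetical input: 10 ms of a 440 Hz tone, sampled 8000 times per second at 8 bits.
tone = lambda t: math.sin(2 * math.pi * 440.0 * t)
codes_8bit = sample_and_quantize(tone, 0.01, 8000, 8)
```

Every value in `codes_8bit` is one of the 256 levels, so each sample fits in a single byte.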
![Page 13: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/13.jpg)
Quantization & Sampling
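[Figure: a waveform plotted as amplitude against time t (seconds), with sampling points marked along the time axis and quantization levels along the amplitude axis]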
![Page 14: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/14.jpg)
Quantization & Sampling
Question:
Given a 16-bit, CD-quality musical piece sampled at 44.1 kHz (44,100 samples per second) for 10 minutes, what is the size of the file in mono and in stereo? (Use 1024 as the conversion factor between bytes, KB and MB.)
Answer:
44.1 × 10^3 × 16 bits × (10 × 60) / 8 (bits per byte) = 52,920,000 bytes ≈ 50.468 MBytes
For stereo (left/right channels), the amount is doubled:
50.468 MBytes × 2 = 100.936 MBytes
![Page 15: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/15.jpg)
Quantization & Sampling
44.1 × 10^3 × 16 bits × (10 × 60) = 423,360,000 bits
(mono) 423,360,000 bits / 8 = 52,920,000 bytes
52,920,000 bytes / 1024 = 51,679.688 KB
51,679.688 KB / 1024 = 50.468 MB
Stereo (mono × 2) = 50.468 × 2 = 100.936 MB
That’s why we need COMPRESSION!
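The file-size arithmetic can be checked with a short Python sketch. The helper name `pcm_size_bytes` is an assumption made for illustration; the numbers are the ones from the question.

```python
def pcm_size_bytes(sample_rate_hz, bits_per_sample, seconds, channels=1):
    """Size in bytes of uncompressed audio: rate x bits x time x channels / 8."""
    return sample_rate_hz * bits_per_sample * seconds * channels // 8

mono = pcm_size_bytes(44_100, 16, 10 * 60)                # 10 minutes, one channel
stereo = pcm_size_bytes(44_100, 16, 10 * 60, channels=2)  # left + right

mono_mb = mono / 1024 / 1024      # bytes -> KB -> MB using the x1024 conversion
stereo_mb = stereo / 1024 / 1024
print(f"mono: {mono_mb:.3f} MB, stereo: {stereo_mb:.3f} MB")
```

Computed from the exact byte count, the stereo figure rounds to 100.937 MB; doubling the already rounded mono value of 50.468 MB gives 100.936 MB.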
![Page 16: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/16.jpg)
Digitizing Audio
Questions for producing digital audio (Analog-to-Digital Conversion):
1. How often do you need to sample the signal?
2. How good is the signal?
3. How is audio data formatted?
![Page 17: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/17.jpg)
Nyquist Theorem
Suppose we are sampling a waveform. How often do we need to sample it to figure out its frequency?
• The Nyquist Theorem, also known as the sampling theorem, is a principle that engineers follow in the digitization of analog signals.
• For analog-to-digital conversion (ADC) to result in a faithful reproduction of the signal, slices, called samples, of the analog waveform must be taken frequently.
• The number of samples per second is called the sampling rate or sampling frequency.
![Page 18: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/18.jpg)
Nyquist Theorem
If we sample only once per cycle (blue area), we may think the signal is a constant.
![Page 19: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/19.jpg)
Nyquist Theorem
If we sample at another low rate, e.g., 1.5 times per cycle, we may think it's a lower frequency waveform
![Page 20: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/20.jpg)
Nyquist Theorem
Nyquist rate -- It can be proven that a bandwidth-limited signal can be fully reconstructed from its samples, if the sampling rate is at least twice the highest frequency in the signal.
The highest frequency component, in hertz, for a given analog signal is fmax.
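According to the Nyquist Theorem, the sampling rate must be at least 2 × fmax, or twice the highest analog frequency component. For example, capturing audio up to 20 kHz requires a sampling rate of at least 40 kHz, which is why CD audio is sampled at 44.1 kHz.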
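The effect of sampling too slowly (aliasing) can be sketched numerically. The function below is an illustrative helper, not part of the slides: it folds the true frequency of a sine wave into the range that a given sampling rate can represent, which is the frequency the sampled data appears to have.

```python
def apparent_frequency(signal_hz, sample_rate_hz):
    """Frequency a sampled sine wave appears to have after sampling.

    Anything above sample_rate_hz / 2 folds back ("aliases") to a lower frequency.
    """
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

# A 1000 Hz tone sampled at different rates:
once_per_cycle = apparent_frequency(1000, 1000)   # 0 Hz: looks like a constant
one_and_a_half = apparent_frequency(1000, 1500)   # 500 Hz: looks like a lower frequency
above_nyquist = apparent_frequency(1000, 2500)    # 1000 Hz: captured correctly
```

Only the last case samples faster than twice the signal frequency, and only there does the apparent frequency match the real one.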
![Page 21: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/21.jpg)
Typical Audio Formats
Popular audio file formats include .au (Unix workstations), .aiff (Mac), .wav (PC), etc.
A simple and widely used audio compression method is Adaptive Differential Pulse Code Modulation (ADPCM).
Based on past samples, it predicts the next sample and encodes the difference between the actual value and the predicted value.
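The predict-and-encode-the-difference idea can be sketched as follows. This is a deliberately simplified, non-adaptive and lossless version (plain differential coding) in which the prediction is just the previous sample; real ADPCM codecs also adapt the quantizer step size, which is not shown here. The function names are assumptions made for the example.

```python
def dpcm_encode(samples):
    """Encode each sample as its difference from the predicted value.

    Assumed predictor: the next sample equals the previous one (0 at the start).
    """
    prediction = 0
    diffs = []
    for s in samples:
        diffs.append(s - prediction)
        prediction = s
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by adding each difference to the running prediction."""
    prediction = 0
    samples = []
    for d in diffs:
        prediction += d
        samples.append(prediction)
    return samples

pcm = [100, 102, 105, 104, 104]
encoded = dpcm_encode(pcm)        # [100, 2, 3, -1, 0]
decoded = dpcm_decode(encoded)    # back to the original samples
```

Because neighbouring audio samples are usually close in value, the differences are small numbers that need fewer bits than the samples themselves.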
![Page 22: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/22.jpg)
Audio Quality vs. Data Rate
| Quality | Sample Rate (kHz) | Bits per Sample | Mono / Stereo | Data Rate (if Uncompressed) | Frequency Band |
|---|---|---|---|---|---|
| Telephone | 8 | 8 | Mono | 8 KBytes/sec | 200-3,400 Hz |
| AM Radio | 11.025 | 8 | Mono | 11.0 KBytes/sec | - |
| FM Radio | 22.050 | 16 | Stereo | 88.2 KBytes/sec | - |
| CD | 44.1 | 16 | Stereo | 176.4 KBytes/sec | 20-20,000 Hz |
| DAT | 48 | 16 | Stereo | 192.0 KBytes/sec | 20-20,000 Hz |
| DVD Audio | 192 | 24 | Stereo | 1,152.0 KBytes/sec | 20-20,000 Hz |
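The data-rate column follows directly from the other columns: sample rate × bytes per sample × number of channels. A small Python sketch (the function name is an assumption, and 1 KByte is taken as 1,000 bytes, which is what the table's figures imply):

```python
def data_rate_kbytes_per_sec(sample_rate_khz, bits_per_sample, channels):
    """Uncompressed data rate in KBytes/sec (1 KByte = 1,000 bytes here)."""
    return sample_rate_khz * (bits_per_sample / 8) * channels

telephone = data_rate_kbytes_per_sec(8, 8, 1)        # mono
cd = data_rate_kbytes_per_sec(44.1, 16, 2)           # stereo
dvd_audio = data_rate_kbytes_per_sec(192, 24, 2)     # stereo
```

The same call with the FM Radio row (22.05 kHz, 16 bits, stereo) gives the 88.2 KBytes/sec shown in the table.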
![Page 23: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/23.jpg)
Audio Quality vs. Data Rate
For a 44.1 kHz, CD-quality recording, one sample is taken every 1 / (44.1 × 10^3) s ≈ 22.676 µs (microseconds) for a single channel.
For a DVD-quality recording, one sample is taken every 1 / 192,000 s ≈ 5.21 µs (microseconds) for a single channel.
That is more than 4 times as many samples per second as a CD, so you can expect noticeably better accuracy and fidelity!
![Page 24: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/24.jpg)
Applying Digital Audio In MM Systems
When using audio in a multimedia system, you have to consider some things, like:
The format of the original file (e.g. WAV)
The overall amplitude (is it loud enough?)
Trimming (how long do you want it to be?)
Fade In & Fade Out
Time stretching
Frequency and channels (e.g. 44.1 kHz, Stereo)
Effects
File size
![Page 25: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/25.jpg)
Applying Digital Audio In MM Systems
The format we usually work with is WAV.
It is uncompressed, so files are large, but it allows us to preserve the highest quality.
If the waveform isn’t loud enough, you can ‘normalize’ it, which raises the amplitude as high as it can go without clipping (which is heard as pops).
If it still isn’t loud enough, you might need to use dynamics processing, which can also enhance vocals over instruments.
For example, this is useful for making heavy metal songs (which usually have portions of whispering and loud guitar solos) more palatable to the audience using such an MM system.
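Peak normalization, as described above, amounts to finding the loudest sample and scaling everything by the same gain. A minimal sketch, assuming samples are floats in the range -1.0 to 1.0 (the function name and the `peak` parameter are illustrative, not taken from any particular audio editor):

```python
def normalize(samples, peak=1.0):
    """Scale all samples by one gain so the loudest sample just reaches `peak`.

    Values beyond +/-1.0 would clip, so peak=1.0 is the loudest safe setting.
    """
    current_peak = max((abs(s) for s in samples), default=0.0)
    if current_peak == 0:
        return list(samples)          # silence (or empty input): nothing to scale
    gain = peak / current_peak
    return [s * gain for s in samples]

quiet = [0.1, -0.25, 0.2]
louder = normalize(quiet)             # loudest sample now sits at -1.0
```

Because every sample gets the same gain, the shape of the waveform is unchanged; only its overall level rises.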
![Page 26: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/26.jpg)
Normalization
[Figure: the waveform before normalization and after normalization]
![Page 27: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/27.jpg)
Applying Digital Audio In MM Systems
Sometimes, we may only want a portion of the music file, so we need to ‘trim’ it.
This can be done by selecting the portion you want to keep and then executing the ‘trim’ command.
Usually, after trimming, we’re left with a sample that ‘just starts’ and ‘suddenly ends’.
To make it more pleasing to the ear, fade-ins and fade-outs are performed.
This is usually done to the first and last 5 seconds of the waveform (depending on how long your sample is), to make it seem as if the clip starts and ends properly.
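Trimming and fading can be sketched on a plain list of samples. This is an illustrative sketch with a simple linear fade; audio editors may use other fade curves, and the function names here are assumptions.

```python
def trim(samples, sample_rate_hz, start_s, end_s):
    """Keep only the samples between start_s and end_s (in seconds)."""
    return samples[int(start_s * sample_rate_hz):int(end_s * sample_rate_hz)]

def fade(samples, sample_rate_hz, fade_s=5.0):
    """Apply a linear fade-in to the first fade_s seconds and a fade-out to the last."""
    n = min(int(fade_s * sample_rate_hz), len(samples) // 2)
    out = list(samples)
    for i in range(n):
        gain = i / n                  # ramps from 0.0 towards 1.0
        out[i] *= gain                # fade-in at the front
        out[-1 - i] *= gain           # fade-out, mirrored at the back
    return out

# Hypothetical clip: 10 seconds of constant level at a toy rate of 10 samples/sec.
clip = [1.0] * 100
kept = trim(clip, 10, 2, 8)           # keep seconds 2 to 8
faded = fade(kept, 10, fade_s=2.0)
```

The fade length is capped at half the clip so that a short clip never has overlapping fade-in and fade-out regions.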
![Page 28: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/28.jpg)
Trimming
A portion of the waveform is selected.
Then, the ‘trimming’ function removes everything else, leaving just the part we require.
![Page 29: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/29.jpg)
Fades
The front portion (usually first 5 seconds) of the trimmed clip is selected
Then, the ‘fade in’ function is executed
The resulting clip
![Page 30: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/30.jpg)
Fades
The end portion (usually last 5 seconds) of the trimmed clip is selected
Then, the ‘fade out’ function is executed
The resulting clip
![Page 31: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/31.jpg)
Applying Digital Audio In MM Systems
Some sound files end abruptly with a simple drum beat or scream.
If it’s too short, you can always ‘time-stretch’ that portion.
This powerful function usually distorts the waveform a little.
However, for a sharp, loud shriek at the end (of a song, for example), stretching it to last maybe 0.25 seconds longer might actually make it sound better.
Time stretching is also useful when your audio stream doesn’t quite match the video stream.
It can then be used to equalize their lengths for synchronization.
The better way is, of course, to adjust the video frame rate or delete some scenes, but that is another story.
![Page 32: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/32.jpg)
Applying Digital Audio In MM Systems
You also need to decide what sampling frequency you wish to keep the audio sample in.
If it’s a song, CD quality (44.1 kHz, 16-bit stereo) is standard; this is specified in the Red Book (IEC 60908).
If it’s merely narration, a lower frequency of about 11.025 kHz and a single channel (mono) is sufficient.
You can also do the same for simple music, like reducing the frequency to 22.05 kHz, which still sounds good.
![Page 33: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/33.jpg)
Applying Digital Audio In MM Systems
Another technique is reducing the bit depth (requantization).
This involves reducing the number of bits per sample from 16 to 8, or 24 to 16, and so forth.
Reducing the bit depth usually has a greater effect on the overall sound quality (fewer bits per sample means poorer quality).
So, it’s often best to retain the bit depth, but reduce the sample frequency.
All of this is done to save precious space and memory.
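The two space-saving options (fewer bits per sample, or fewer samples per second) can be sketched as follows. Both functions are naive illustrations: production tools typically add dither when dropping bits and apply a low-pass filter before discarding samples to avoid aliasing, neither of which is shown. The function names are assumptions.

```python
def reduce_bit_depth_16_to_8(samples_16bit):
    """Map signed 16-bit samples (-32768..32767) to signed 8-bit (-128..127)
    by discarding the 8 least significant bits."""
    return [s >> 8 for s in samples_16bit]

def halve_sample_rate(samples):
    """Naive decimation: keep every second sample (e.g. 44.1 kHz -> 22.05 kHz)."""
    return samples[::2]

pcm16 = [-32768, -256, 0, 255, 256, 32767]
pcm8 = reduce_bit_depth_16_to_8(pcm16)     # [-128, -1, 0, 0, 1, 127]
half_rate = halve_sample_rate(pcm16)       # [-32768, 0, 256]
```

Either step halves the storage needed; the first trades away amplitude resolution, the second trades away high frequencies.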
![Page 34: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/34.jpg)
Applying Digital Audio In MM Systems
There are times when you need to add certain effects to your audio.
Unlike the things we can do with images, audio effects are not always so obvious.
For example, we can mimic the stereo effect of an inherently mono signal by duplicating the waveform (creating two channels from the single one), and then slightly delaying one of them (left or right) to create a pseudo-stereo effect.
Other effects or functions, like reverb, allow you to make a studio recording sound like a live performance or simply make your own voice recording sound a little better than it actually is.
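The pseudo-stereo trick described above can be sketched in a few lines: copy the mono signal into two channels and delay one of them. The function name and the 15 ms default delay are assumptions chosen for illustration.

```python
def pseudo_stereo(mono, sample_rate_hz, delay_ms=15.0):
    """Turn a mono signal into (left, right) frames by delaying the right channel.

    The left channel is padded with silence at the end and the right channel
    at the start, so both channels have the same length.
    """
    delay = int(sample_rate_hz * delay_ms / 1000)   # delay expressed in samples
    left = list(mono) + [0.0] * delay
    right = [0.0] * delay + list(mono)
    return list(zip(left, right))

# Toy example: 3 samples at 1000 samples/sec with a 2 ms (2-sample) delay.
frames = pseudo_stereo([1, 2, 3], 1000, delay_ms=2)
```

Each element of `frames` is one stereo frame; the right channel plays the same material two samples later than the left.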
![Page 35: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/35.jpg)
Applying Digital Audio In MM Systems
Finally, the file size is important.
Even if you have saved as much memory as you can by using a lower frequency, a shorter clip and a lower bit depth, you can still save more by using a good compression scheme like MP3, which gives you a ratio of about 12:1.
There are often other audio compression codecs available to you as well.
Remember though, that compression has its price: the user will need to be able to decode that particular codec and, usually, more processing power is required to play the file.
It may also cause synchronization problems with video.
All things considered, developers usually still compress their audio.
![Page 36: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/36.jpg)
Introduction To MIDI
![Page 37: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/37.jpg)
MIDI
What is MIDI?
MIDI is an acronym for Musical Instrument Digital Interface.
Definition of MIDI: a protocol that enables computers, synthesizers, keyboards, and other musical devices to communicate with each other.
It is a set of instructions telling a computer how to play musical instruments.
![Page 38: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/38.jpg)
History of MIDI
MIDI is a standard method for electronic musical equipment to pass messages to each other.
These messages can be as simple as “play middle C until I tell you to stop” or as complex as “adjust the VCA bias on oscillator 6 to match oscillator 1”.
MIDI was developed in the early 1980s and proceeded to completely change the musical instrument market and the course of music.
Its growth exceeded the wildest dreams of its inventors, and today MIDI is entrenched to the point that you cannot buy a professional electronic instrument without MIDI capabilities.
![Page 39: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/39.jpg)
Terminologies:
Synthesizer:
It is a sound generator (various pitch, loudness etc.).
A good (musician's) synthesizer often has a microprocessor, keyboard, control panels, memory, etc.
![Page 40: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/40.jpg)
Terminologies:
Sequencer:
It can be a stand-alone unit or a software program for a personal computer. (It used to be a storage server for MIDI data. Nowadays it is more of a software music editor on the computer.)
It has one or more MIDI INs and MIDI OUTs.
![Page 41: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/41.jpg)
Terminologies:
Track:
A track in the sequencer is used to organize the recordings.
Tracks can be turned on or off on recording or playback.
To illustrate, one might record an oboe melody line on Track Two, then record a bowed bass line on Track Three.
When played, the sounds can be simultaneous.
Most MIDI software now accommodates 64 tracks of music, enough for a rich orchestral sound.
Important: Tracks are purely for convenience; channels are required.
![Page 42: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/42.jpg)
Terminologies:
Channel:
MIDI channels are used to separate information in a MIDI system.
There are 16 MIDI channels in one cable.
Each channel addresses one MIDI instrument.
Channel numbers are coded into each MIDI message.
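How the channel number is coded into a message can be shown with the first byte (the status byte) of a MIDI channel message: the upper four bits give the message type and the lower four bits give the channel. A minimal sketch, with a hypothetical helper name:

```python
def parse_status_byte(status):
    """Split a MIDI channel-message status byte into (message type, channel).

    The upper 4 bits hold the message type (e.g. 0x90 = Note On) and the lower
    4 bits hold the channel, stored as 0..15 and reported here as 1..16.
    """
    if not 0x80 <= status <= 0xEF:
        raise ValueError("not a channel message status byte")
    message_type = status & 0xF0
    channel = (status & 0x0F) + 1
    return message_type, channel

msg_type, channel = parse_status_byte(0x99)   # Note On, channel 10
```

Four bits can hold 16 different values, which is why one MIDI cable carries exactly 16 channels.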
![Page 43: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/43.jpg)
Terminologies:
Timbre:
The quality of the sound, e.g., flute sound, cello sound, etc.
Multi-timbral -- capable of playing many different sounds at the same time (e.g., piano, brass, drums, etc.)
Pitch:
The musical note that the instrument plays
![Page 44: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/44.jpg)
Terminologies:
Voice:
A voice is the portion of the synthesizer that produces sound.
Synthesizers can have many (16, 20, 24, 32, 64, etc.) voices.
Each voice works independently and simultaneously to produce sounds of different timbre and pitch.
Patch:
The control settings that define a particular timbre.
![Page 45: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/45.jpg)
General MIDI
General MIDI = MIDI + Instrument Patch Map + Percussion Key Map
With General MIDI, a piece of MIDI music (usually) sounds the same anywhere it is played.
The instrument patch map is a standard program list consisting of 128 patch types.
The percussion map specifies 47 percussion sounds.
Key-based percussion is always transmitted on MIDI channel 10.
![Page 46: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/46.jpg)
General MIDI
Requirements for General MIDI Compatibility:
Support all 16 channels.
Each channel can play a different instrument/program (multi-timbral).
Each channel can play many voices (polyphony).
Minimum of 24 fully dynamically allocated voices.
![Page 47: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/47.jpg)
General MIDI
The playback on MIDI will only be accurate if the playback device is identical to the one used for production.
Even with the general MIDI standard, the sound of a MIDI instrument varies according to the electronics of the playback device and the sound generation method it uses.
MIDI is also unsuitable for spoken dialog.
MIDI usually requires a certain amount of knowledge in music theory.
![Page 48: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/48.jpg)
Application Of MIDI
Is MIDI suitable for MM Systems?
Sometimes it is, sometimes not.
A webpage is a valid MM system.
If the choice of music is simple and repetitive, then MIDI is ideal for playing in the background, because its files are small and widely compatible.
Otherwise, most MM Systems work with digital audio, such as WAV, MP3, etc.
![Page 49: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/49.jpg)
MIDI IN-OUT
![Page 50: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/50.jpg)
Example
A musician pushes down (and holds down) the middle C key on a keyboard.
This causes a MIDI Note-On message to be sent out of the keyboard's MIDI OUT jack.
That message is received by the second instrument which sounds its middle C in unison.
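The Note-On message in this example is only three bytes long: a status byte (message type plus channel), the note number, and the velocity (how hard the key was pressed). The sketch below builds those bytes. Middle C is note number 60; the helper names and the default velocity of 64 are assumptions made for illustration.

```python
NOTE_ON, NOTE_OFF = 0x90, 0x80   # message types (upper 4 bits of the status byte)
MIDDLE_C = 60                    # MIDI note number of middle C

def note_on(note, velocity=64, channel=1):
    """Build the 3 bytes of a Note-On message for channel 1..16."""
    return bytes([NOTE_ON | (channel - 1), note & 0x7F, velocity & 0x7F])

def note_off(note, channel=1):
    """Build the 3 bytes of a Note-Off message (sent when the key is released)."""
    return bytes([NOTE_OFF | (channel - 1), note & 0x7F, 0])

pressed = note_on(MIDDLE_C)      # sent from MIDI OUT when the key goes down
released = note_off(MIDDLE_C)    # sent when the musician lets go
```

The receiving instrument keeps sounding middle C until the matching Note-Off arrives, which is why the musician holding the key down matters.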
![Page 51: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/51.jpg)
MIDI Application – Cakewalk (sequencer)
![Page 52: CGMB 324: Multimedia System design](https://reader036.fdocuments.us/reader036/viewer/2022062408/568132a8550346895d994a2e/html5/thumbnails/52.jpg)
Instrument Assignment (Multi-Timbral)