Transcript of GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval Overview of MIR Systems...
- Slide 1
- Slide 2
- GCT731 Fall 2014 Topics in Music Technology - Music Information
Retrieval Overview of MIR Systems Audio and Music Representations
(Part 1) 1
- Slide 3
- Outlines Overview of MIR system Human Listening Machine
Listening Audio and Music representations Time-domain
representation Waveform Time-frequency domain representations
Sinusoids DFT STFT, Spectrogram 2
- Slide 4
- Human Listening L. Watts, Visualizing Complexity in the Brain,
2003 3 Ears Auditory Transduction http://www.youtube.com/
watch?v=PeTriGTENoc
- Slide 5
- Machine Listening Emulated the human auditory system? Well, it
might be better to understand the functionalities in a high level
and implement them in efficient ways for machines Basic
Functionalities Capture sounds and convert the air vibration to an
accessible form (by the machine) Transform the input to have a
better view of sounds Extract only necessary part Obtain desired
information from the extracted part 4
- Slide 6
- (Content-based) MIR System 5 Algorithms Feature Extraction
Sound Capture Representation Transform Block Diagram of MIR
system
- Slide 7
- Sound Capture 6 Microphone Mechanical vibration to electrical
signals A-D converter Sampling and Quantization Produce digital
waveforms Often, we store the waveforms as audio files If
necessary, the audio files are compressed (mp3, wma, ) Algorithms
Feature Extraction Sound Capture Representation Transform
- Slide 8
- Representation Transform Transform waveforms to have better
view of sounds Mostly using sinusoidal basis functions Types
Short-time Fourier Transform (STFT): Spectrogram Constant-Q
transform Auditory filter banks Remapped spectrogram (frequency or
amplitude) Auto-correlation 7 Algorithms Feature Extraction Sound
Capture Representation Transform
- Slide 9
- Feature Extraction and Algorithms Feature extraction Extract
only necessary variations in the data representation Algorithms
Determine categories or specific values through training Two
approaches in feature extraction and algorithms Heuristic approach:
make computational rules based on domain knowledge and
trial-and-error Learning-based approach: training the system using
labeled (or unlabeled) data The rest of this course is all about
this 8 Algorithms Feature Extraction Sound Capture Representation
Transform
- Slide 10
- Sound Capture 9 Microphone Mechanical vibration to electrical
signals Followed by pre-amplifiers Microphones and pre-amps have
characteristic frequency responses that colorize the input sound
A-D converter Sampling: continuous-to discrete-time signals
Quantization: finite numbers of amplitude steps Produce digital
waveforms Often, multiple input channels are used Stereo (2-ch) is
standard in music recordings Microphone arrays: good for sound
localization and spatial filtering (e.g. beam-forming )
- Slide 11
- Sampling Convert continuous signals to a series of discrete
numbers by uniformly picking up the signal values in time Sampling
theorem Sampling rate must be twice as high as the highest
frequency the continuous signals contain. Lowpass filter is applied
before sampling to avoid aliasing Human can hear up to 20kHz
Sampling rate of 40kHz or above Examples of sampling rates Speech:
8kHz, 16kHz Music: 22.05Hz, 44.1KHz, 48KHz Professional audio
gears: 48kHz, 96kHz 10
- Slide 12
- Quantization 11 Convert continuous level of values to a finite
set of steps in amplitude Create quantization error Can be regarded
as additive noise Sufficient number of quantization steps is
necessary to prevent the noise from being audible Examples of
quantization steps 8 bit: 48dB (dynamic range) 16 bit: 96dB 24 bit:
144dB Human ears: about 110 dB (depending on frequency)
- Slide 13
- (Digital) Waveform 12 The most basic audio representation that
computers can take x(n) = [a1, a2, a3,...] Good to view energy
change Overall dynamic range when zoomed out Fine-time note onset
when zoomed in But not very intuitive
- Slide 14
- Another View of Waveform Waveform can be seen as representing
signals with the following basis functions For example, the signal
x(n) is like: Can we find better basis functions? New basis
functions: 13
- Slide 15
- Sinusoids A periodic waveform drawn from a circle Why sinusoids
are important Fundamental in Physics Eigen-functions of linear
systems Human ears is a kind of spectrum analyzer 14 : Amplitude :
Angular Frequency : Initial Phase
- Slide 16
- Discrete Fourier Transform (DFT) Complex Sinusoid By Eulers
Identity: Discrete Fourier Transform Inner product with complex
sinusoid Inverse Discrete Fourier Transform 15
- Slide 17
- DFT Inverse DFT Basis Function View Practical Form of DFT
16
- Slide 18
- Matrix Multiplication View of DFT In fact, we dont compute this
directly. There is a more efficiently way, which is called Fast
Fourier Transform (FFT) Complexity reduction by FFT: O( N 2 ) O(
Nlog 2 N ) Practical Form of DFT 17
- Slide 19
- Practical Form of DFT DFT produces complex numbers! Magnitude
Correspond to energy at frequency k Phase Corresponds to phase at
frequency k 18
- Slide 20
- Examples of DFT 19 Sine waveform Drum Flute
- Slide 21
- Short-Time Fourier Transform (STFT) DFT assumes that the signal
is stationary It is not a good idea to apply DFT to long and
dynamically changing signals like music Instead, we segment the
signal and apply DFT separately Short-Time Fourier Transform
1.Segment a frame using a window function 2.Zero-padding if
necessary 3.Apply DFT to the zero-padded windowed waveform
4.Progress by hop size 5.Repeat step 1-4 This produces 2-D
time-frequency representations Get spectrogram from the magnitude
Parameters: window size, window type, FFT size, hop size 20
- Slide 22
- Windowing Types of window functions Rectangular, Triangle,
Hann, Hamming, Blackman-Harris Trade-off between the width of
main-lobe and the level of side-lobe 21 Main-lobe width Side-lobe
level
- Slide 23
- Zero-padding Adding zeros to a windowed frame in time domain
Corresponds to ideal interpolation in frequency domain In practice,
FFT size increases by the size of zero-padding 22
- Slide 24
- Example: Music 23
- Slide 25
- Example: Deep Note 24
- Slide 26
- Time-Frequency Resolutions in STFT Trade-off between
time-resolution and frequency-resolution Long window: high
frequency-resolution / low time-resolution short window: low
frequency-resolution / high time-resolution 25
- Slide 27
- References JOS DSP Books Mathematics of DFT
https://ccrma.stanford.edu/~jos/mdft/
https://ccrma.stanford.edu/~jos/mdft/ Spectral Audio Signal
Processing https://ccrma.stanford.edu/~jos/sasp/
https://ccrma.stanford.edu/~jos/sasp/ The Scientist and Engineers
Guide to Digital Signal Processing
http://www.dspguide.com/pdfbook.htm (See chapter 8-12)
http://www.dspguide.com/pdfbook.htm 26