8/11/2019 Vector Quantization lecture6new.pdf
1/37
Vector Quantization and Clustering

Introduction
K-means clustering
Clustering issues
Hierarchical clustering
  Divisive (top-down) clustering
  Agglomerative (bottom-up) clustering
Applications to speech recognition

6.345 Automatic Speech Recognition Vector Quantization & Clustering 1
Lecture # 6, Session 2003
Acoustic Modelling

Waveform → Signal Representation → Feature Vectors → Vector Quantization → Symbols

Signal representation produces a feature vector sequence
The multi-dimensional sequence can be processed by:
  Methods that directly model the continuous space
  Quantizing and modelling of discrete symbols
Main advantages and disadvantages of quantization:
  Reduced storage and computation costs
  Potential loss of information due to quantization
Vector Quantization (VQ)

Used in signal compression, speech and image coding
More efficient information transmission than scalar quantization (can achieve less than 1 bit/parameter)
Used for discrete acoustic modelling since the early 1980s
Based on standard clustering algorithms:
  Individual cluster centroids are called codewords
  The set of cluster centroids is called a codebook
Basic VQ is K-means clustering
Binary VQ is a form of top-down clustering (used for efficient quantization)
VQ & Clustering

Clustering is an example of unsupervised learning:
  The number and form of the classes {C_i} are unknown
  The available data samples {x_i} are unlabeled
Useful for discovering data structure before classification, or for tuning or adaptation of classifiers
Results depend strongly on the clustering algorithm
Acoustic Modelling Example

(Figure: spectrogram with aligned phone labels [I], [n], [t])
K-Means Clustering

Used to group data into K clusters, {C_1, ..., C_K}
Each cluster is represented by the mean of its assigned data
The iterative algorithm converges to a local optimum:
  Select K initial cluster means, {μ_1, ..., μ_K}
  Iterate until a stopping criterion is satisfied:
    1. Assign each data sample to the closest cluster
         x ∈ C_i  if  d(x, μ_i) ≤ d(x, μ_j), ∀ j ≠ i
    2. Update the K means from the assigned samples
         μ_i = E(x), x ∈ C_i, 1 ≤ i ≤ K
A nearest-neighbor quantizer is used for unseen data
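The two-step iteration can be sketched in a few lines of Python (an illustrative sketch, not the lecture's code; the toy data, and initializing from the first K samples rather than a random selection, are choices made for this example):

```python
def kmeans(samples, k, max_iter=100):
    """Cluster points into k groups; returns (means, assignments)."""
    means = list(samples[:k])               # initial means: first K samples
    assignments = [0] * len(samples)
    for _ in range(max_iter):
        # 1. assign each sample to the closest mean (squared Euclidean)
        new = [min(range(k),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(x, means[i])))
               for x in samples]
        if new == assignments:              # stop: no change in assignments
            break
        assignments = new
        # 2. re-estimate each mean as the average of its assigned samples
        for i in range(k):
            members = [x for x, c in zip(samples, assignments) if c == i]
            if members:
                means[i] = tuple(sum(d) / len(members) for d in zip(*members))
    return means, assignments

# Two well-separated blobs: the loop should recover them.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
        (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
means, labels = kmeans(data, k=2)
```

The returned means then act as the codebook, and the assignment rule in step 1 is exactly the nearest-neighbor quantizer applied to unseen data.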
K-Means Example: K = 3

Random selection of 3 data samples for initial means
Euclidean distance metric between means and samples

(Figure: six panels showing iterations 0 through 5)
K-Means Properties

Usually used with a Euclidean distance metric
  d(x, μ_i) = ||x − μ_i||² = (x − μ_i)ᵗ(x − μ_i)
The total distortion, D, is the sum of squared error
  D = Σ_{i=1}^{K} Σ_{x ∈ C_i} ||x − μ_i||²
D decreases between the n-th and (n+1)-st iterations
  D(n + 1) ≤ D(n)
Also known as the Isodata, or generalized Lloyd, algorithm
Similarities with the Expectation-Maximization (EM) algorithm for learning parameters from unlabeled data
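The total distortion D is easy to compute directly (function name and data are illustrative, not from the lecture):

```python
def total_distortion(samples, means, labels):
    """D = sum over all samples of squared distance to the assigned mean."""
    return sum(sum((a - b) ** 2 for a, b in zip(x, means[c]))
               for x, c in zip(samples, labels))

# Each of the four points lies 1 away (squared distance 1) from its mean.
D = total_distortion([(0.0,), (2.0,), (10.0,), (12.0,)],
                     [(1.0,), (11.0,)],
                     [0, 0, 1, 1])
```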
K-Means Clustering: Stopping Criterion

Many criteria can be used to terminate K-means:
  No changes in sample assignments
  Maximum number of iterations exceeded
  Change in total distortion, D, falls below a threshold:
    1 − D(n + 1)/D(n) < T
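The relative-change test might look like this (the threshold value is an arbitrary illustration):

```python
def converged(D_prev, D_curr, threshold=1e-3):
    """Stop when the relative drop in distortion falls below the threshold."""
    return 1.0 - D_curr / D_prev < threshold
```

For example, a drop from 100.0 to 99.99 (a 0.01% change) satisfies the criterion at T = 1e-3, while a drop from 100.0 to 50.0 does not.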
Clustering Issues: Number of Clusters

In general, the number of clusters is unknown

(Figure: the same data clustered with K = 2 and with K = 4)

Dependent on clustering criterion, space, computation or distortion requirements, or on recognition metric
Clustering Issues: Clustering Criterion

The criterion used to partition data into clusters plays a strong role in determining the final results
Clustering Issues: Distance Metrics

A distance metric usually has the properties:
  1. d(x, y) ≥ 0
  2. d(x, y) = 0 iff x = y
  3. d(x, y) = d(y, x)
  4. d(x, y) ≤ d(x, z) + d(y, z)
  5. d(x + z, y + z) = d(x, y) (translation invariant)
In practice, distance metrics may not obey some of these properties but are still a measure of dissimilarity
Clustering Issues: Distance Metrics

Distance metrics strongly influence cluster shapes:
  Normalized dot product: xᵗy / (||x|| ||y||)
  Euclidean: ||x − μ_i||² = (x − μ_i)ᵗ(x − μ_i)
  Weighted Euclidean: (x − μ_i)ᵗ W (x − μ_i) (e.g., W = Σ⁻¹)
  Minimum distance (chain): min d(x, x_i), x_i ∈ C_i
  Representation-specific ...
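These metrics can be sketched as plain functions (a sketch only; a diagonal weight vector stands in for the full matrix W, and the names are illustrative):

```python
import math

def euclidean_sq(x, mu):                 # ||x − mu||²
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def normalized_dot(x, y):                # x·y / (||x|| ||y||)
    norm = lambda v: math.sqrt(sum(a * a for a in v))
    return sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y))

def weighted_euclidean_sq(x, mu, w):     # diagonal W, e.g. inverse variances
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, mu))

def chain_distance(x, cluster):          # min distance to any cluster member
    return min(euclidean_sq(x, xi) for xi in cluster)
```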
Clustering Issues: Impact of Scaling

Scaling feature vector dimensions can significantly impact clustering results
Scaling can be used to normalize dimensions so that a simple distance metric is a reasonable criterion for similarity
Hierarchical Clustering

Clusters data into a hierarchical class structure
Top-down (divisive) or bottom-up (agglomerative)
Often based on a stepwise-optimal, or greedy, formulation
The hierarchical structure is useful for hypothesizing classes
Used to seed clustering algorithms such as K-means
Example of Non-Uniform Divisive Clustering

(Figure: clustering tree with levels 0 through 2)
Example of Uniform Divisive Clustering

(Figure: clustering tree with levels 0 through 2)
Divisive Clustering Issues

Initialization of new clusters
  Random selection from cluster samples
  Selection of member samples far from center
  Perturb dimension of maximum variance
  Perturb all dimensions slightly
Uniform or non-uniform tree structures
Cluster pruning (due to poor expansion)
Cluster assignment (distance metric)
Stopping criterion
  Rate of distortion decrease
  Cannot increase cluster size
Divisive Clustering Example: Binary VQ

Often used to create an M = 2^B size codebook (B-bit codebook, codebook size M)
Uniform binary divisive clustering is used
On each iteration, each cluster is divided in two
  μ_i⁺ = μ_i(1 + ε)
  μ_i⁻ = μ_i(1 − ε)
K-means is used to determine the cluster centroids
Also known as the LBG (Linde, Buzo, Gray) algorithm
A more efficient version does K-means only within each binary split, and retains the tree for efficient lookup
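The centroid-splitting step might be sketched as follows (ε here is an arbitrary small constant; per the LBG procedure above, K-means would be rerun after each split):

```python
def split_codebook(codebook, eps=0.01):
    """Perturb each centroid mu into mu(1+eps) and mu(1-eps),
    doubling the codebook size on every call."""
    doubled = []
    for mu in codebook:
        doubled.append(tuple(m * (1 + eps) for m in mu))
        doubled.append(tuple(m * (1 - eps) for m in mu))
    return doubled

# One split turns a 1-entry codebook into a 2-entry (1-bit) codebook.
book = split_codebook([(1.0, 2.0)])
```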
Agglomerative Clustering

Structures N samples or seed clusters into a hierarchy
On each iteration, the two most similar clusters are merged together to form a new cluster
After N − 1 iterations, the hierarchy is complete
The structure is displayed in the form of a dendrogram
By keeping track of the similarity score when new clusters are created, the dendrogram can often yield insights into the natural grouping of the data
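A minimal sketch of this bottom-up procedure, using mean-to-mean distance as the similarity (one of several possible choices; names and data are invented for the example):

```python
def agglomerate(points):
    """Merge the closest pair of clusters (by mean-to-mean distance) until
    one cluster remains; return the merge distances (dendrogram heights)."""
    mean = lambda c: tuple(sum(d) / len(c) for d in zip(*c))
    dist = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        # find the pair of clusters with the closest means
        d, i, j = min((dist(mean(clusters[i]), mean(clusters[j])), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        heights.append(d)
    return heights

# Two tight pairs: the final merge distance is much larger than the first
# two, which is the kind of jump that reveals a natural grouping.
h = agglomerate([(0.0,), (0.2,), (4.0,), (4.4,)])
```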
Agglomerative Clustering Issues

Measuring distances between clusters C_i and C_j with respective numbers of tokens n_i and n_j:
  Average distance: (1 / (n_i n_j)) Σ_{i,j} d(x_i, x_j)
  Maximum distance (compact): max_{i,j} d(x_i, x_j)
  Minimum distance (chain): min_{i,j} d(x_i, x_j)
  Distance between two representative vectors of each cluster, such as their means: d(μ_i, μ_j)
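The four cluster-to-cluster distances can be written directly (a sketch with plain Euclidean point distance; function names are illustrative):

```python
def dist(u, v):    # Euclidean distance between two points
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def d_avg(A, B):   # average of all pairwise member distances
    return sum(dist(a, b) for a in A for b in B) / (len(A) * len(B))

def d_max(A, B):   # maximum distance: favors compact clusters
    return max(dist(a, b) for a in A for b in B)

def d_min(A, B):   # minimum distance: allows chain-like clusters
    return min(dist(a, b) for a in A for b in B)

def d_mean(A, B):  # distance between the cluster means
    mean = lambda C: tuple(sum(d) / len(C) for d in zip(*C))
    return dist(mean(A), mean(B))
```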
Stepwise-Optimal Clustering

Common to minimize the increase in total distortion on each merging iteration: stepwise-optimal, or greedy
On each iteration, merge the two clusters which produce the smallest increase in distortion
The distance metric for minimizing the increase in distortion, D, is:
  d(C_i, C_j) = sqrt(n_i n_j / (n_i + n_j)) ||μ_i − μ_j||
Tends to combine small clusters with large clusters before merging clusters of similar sizes
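Assuming the standard stepwise-optimal (Ward-style) form of this distance, the merge cost and its size bias can be checked numerically (names are illustrative):

```python
import math

def merge_cost(n_i, mu_i, n_j, mu_j):
    """Merge distance whose minimization gives the smallest increase
    in total distortion when clusters i and j are combined."""
    gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_i, mu_j)))
    return math.sqrt(n_i * n_j / (n_i + n_j)) * gap

# For a fixed mean separation and total size, merging a small cluster into
# a large one costs less than merging two similar-sized clusters, which is
# the tendency noted above.
uneven = merge_cost(1, (0.0,), 9, (1.0,))   # sizes 1 and 9
even = merge_cost(5, (0.0,), 5, (1.0,))     # sizes 5 and 5
```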
Clustering for Segmentation
Speaker Clustering

23 female and 53 male speakers from the TIMIT corpus
Vector based on F1 and F2 averages for 9 vowels
Distance d(C_i, C_j) is the average of distances between members

(Figure: speaker dendrogram; merge-distance axis with ticks at 2, 4, 6)
Velar Stop Allophones
Acoustic-Phonetic Hierarchy
Clustering of phonetic distributions across 12 clusters
Word Clustering
(Figure: dendrogram of vocabulary words, e.g. A_M, AFTERNOON, AIRCRAFT, AIRPLANE, BOSTON, DALLAS, FARE, FARES, ROUND_TRIP, WASHINGTON; merge-distance axis with ticks at 0, 2, 4, 6)
References

Huang, Acero, and Hon, Spoken Language Processing, Prentice-Hall, 2001.
Duda, Hart, and Stork, Pattern Classification, John Wiley & Sons, 2001.
A. Gersho and R. Gray, Vector Quantization and Signal Compression, Kluwer Academic Press, 1992.
R. Gray, "Vector Quantization," IEEE ASSP Magazine, 1(2), 1984.
B. Juang, D. Wong, A. Gray, "Distortion Performance of Vector Quantization for LPC Voice Coding," IEEE Trans. ASSP, 30(2), 1982.
J. Makhoul, S. Roucos, H. Gish, "Vector Quantization in Speech Coding," Proc. IEEE, 73(11), 1985.
L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.