A Coarse Classification Scheme Based On Clustering and Distance Thresholds
A Coarse Classification Scheme Based On Clustering
and Distance Thresholds
Presented by Tienwei Tsai
July, 2005
Outline
- Introduction
- Feature Extraction
- Statistical Coarse Classification
- Experimental Results
- Conclusion
1 Introduction
Paper documents → computer codes: OCR (Optical Character Recognition)
Two subproblems in the design of classification systems:
- Feature extraction
- Classification

The classification process is usually divided into two stages:
- Coarse classification
- Fine classification
The purpose of this paper is to design a general coarse classification scheme with low dependence on domain-specific knowledge. To achieve this goal, we need:
- reliable and general features
- a general classification method
2 Feature Extraction

The characteristics of features used for coarse classification:
- Simple
- Reliable
- Low dimension

We are motivated to apply the Discrete Cosine Transform (DCT) to extract statistical features.
Discrete Cosine Transform (DCT)
- Widely used in compression
- A close relative of the discrete Fourier transform (DFT)
- Converts a signal into elementary frequency components
- Separates the image into parts of differing importance (with respect to the image's visual quality)
- Energy compaction property
The DCT coefficients of the character image of "佛".
Energy Compacting Property of DCT
For most images, much of the signal energy lies at low frequencies.
About 90% of the signal energy appears in the upper-left 10×10 corner of the 48×48 DCT coefficient matrix.
That is, about 90% of the signal energy appears in 4.34% of the DCT coefficients.
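This energy-compaction claim is easy to check numerically. A minimal sketch using SciPy's `scipy.fft.dctn`, with a smooth synthetic 48×48 image standing in for a real character bitmap (an assumption for illustration only):

```python
import numpy as np
from scipy.fft import dctn

# Smooth synthetic 48x48 image standing in for a character bitmap
# (illustrative only; the paper uses real character images).
i = np.arange(48)
g = np.exp(-((i - 24) / 12.0) ** 2)
img = np.outer(g, g)

coeffs = dctn(img, norm="ortho")  # 2-D DCT-II
energy = coeffs ** 2

# Fraction of total signal energy in the upper-left 10x10 block.
fraction = energy[:10, :10].sum() / energy.sum()
print(f"energy in the 10x10 low-frequency block: {fraction:.1%}")
```

For a smooth image like this the fraction is close to 100%; real character images are less smooth, which is why the slides report roughly 90%.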
Discrete Cosine Transform (DCT)
The DCT coefficients C(u, v) of an N×N image represented by x(i, j) can be defined as
$$C(u,v) = \frac{2}{N}\,\alpha(u)\,\alpha(v)\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} x(i,j)\,\cos\frac{(2i+1)u\pi}{2N}\,\cos\frac{(2j+1)v\pi}{2N},$$

where

$$\alpha(w) = \begin{cases} 1/\sqrt{2}, & w = 0 \\ 1, & \text{otherwise.} \end{cases}$$
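The definition can be transcribed directly into code. A sketch (the function name `dct2` is chosen here, and the matrix form is written for clarity rather than speed):

```python
import numpy as np

def dct2(x):
    """2-D DCT as defined above:
    C(u,v) = (2/N) a(u) a(v) sum_i sum_j x(i,j)
             * cos((2i+1)u*pi / 2N) * cos((2j+1)v*pi / 2N),
    with a(w) = 1/sqrt(2) for w = 0 and 1 otherwise."""
    N = x.shape[0]
    i = np.arange(N)
    u = np.arange(N)
    # Cosine basis: basis[u, i] = cos((2i+1) u pi / (2N)).
    basis = np.cos((2 * i[None, :] + 1) * u[:, None] * np.pi / (2 * N))
    a = np.ones(N)
    a[0] = 1 / np.sqrt(2)
    return (2 / N) * np.outer(a, a) * (basis @ x @ basis.T)

img = np.random.default_rng(0).random((8, 8))
C = dct2(img)
# Under this normalization the DC term reduces to (1/N) * sum of all pixels.
print(np.isclose(C[0, 0], img.sum() / 8))  # True
```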
3 Statistical Coarse Classification
Two philosophies of classification:
- Statistical: the measurements that describe an object are treated only formally as statistical variables, neglecting their "meaning".
- Structural: regards objects as compositions of structural units, usually called primitives.
The Classification Problem
The ultimate goal of classification is to classify an unknown pattern x into one of M possible classes (c1, c2, …, cM).
In the statistical approach, each pattern is represented by a set of D features, viewed as a D-dimensional feature vector.
The statistical classification system operates in two modes:
- Training (learning)
- Classification (testing)
Techniques to speed up the classification process
To speed up the classification process, two techniques can be applied:
A. Clustering
B. Pruning
A. Clustering
Clustering is the process of grouping the data objects into clusters so that objects within a cluster have high similarity in comparison to one another, but are very dissimilar to objects in other clusters.
Two main categories of existing clustering algorithms:
- hierarchical methods: agglomerative (bottom-up) and divisive (top-down)
- partitioning methods: k-means, k-medoids
Coarse Classification via Clustering
In the training mode:
- the feature vectors of the learning samples are first clustered based on a certain criterion.

In the classification mode:
- the distances between a test sample and every cluster are calculated,
- the clusters nearest to the test sample are chosen as candidate clusters, and
- the classes within those candidate clusters are selected as the candidates of the test sample.
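The candidate-selection step can be sketched as follows (a minimal illustration; the function name, the toy data, and the fixed number of nearest clusters `n_nearest` are assumptions, not taken from the paper):

```python
import numpy as np

def candidate_classes(x, centers, cluster_classes, n_nearest=2):
    """Rank clusters by distance to the test sample x and pool the
    classes of the n nearest clusters as the candidate list."""
    d = np.linalg.norm(centers - x, axis=1)      # distance to each cluster
    nearest = np.argsort(d)[:n_nearest]          # indices of nearest clusters
    cands = []
    for j in nearest:
        cands.extend(cluster_classes[j])
    return cands

# Toy data: three cluster centers, each holding some classes.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [20.0, 20.0]])
cluster_classes = [["a", "b"], ["c"], ["d", "e"]]
print(candidate_classes(np.array([1.0, 1.0]), centers, cluster_classes))
# ['a', 'b', 'c']  -- the far cluster's classes are not candidates
```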
B. Pruning
In the training mode:
- The template vector of a class is generated from the training samples of that class.

In the classification mode:
- When a test sample is being classified, the distances between the feature vector of the test sample and every template vector are calculated.
- The templates whose distance to the test sample is beyond a predefined threshold are pruned.
- Therefore, the size of the candidate list for the test sample is reduced.
Coarse classification scheme
Our coarse classification is accomplished by four main modules:
- Pre-processing
- Feature extraction
- Clustering
- Pruning
In the training mode, the training samples are first normalized to a certain size.
Then the most significant D features of each training sample are extracted by DCT.
For example, the 4 most important features of a pattern are the 4 DCT coefficients with the lowest frequencies, namely C(0,0), C(0,1), C(1,0) and C(1,1).
Then the average D features are obtained for each class.
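This feature-extraction step can be sketched with SciPy's `scipy.fft.dctn` (the `ortho` normalization and the function name are choices made for this sketch, not necessarily the paper's):

```python
import numpy as np
from scipy.fft import dctn

def low_freq_features(img, d=2):
    """Return the d*d lowest-frequency DCT coefficients of a normalized
    character image as a flat feature vector; d=2 gives
    [C(0,0), C(0,1), C(1,0), C(1,1)], d=3 gives the 9 coefficients
    used in the experiments."""
    C = dctn(img, norm="ortho")
    return C[:d, :d].ravel()

img = np.random.default_rng(1).random((48, 48))  # stand-in for a 48x48 bitmap
feats = low_freq_features(img, d=2)
print(feats.shape)  # (4,)
```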
k-means algorithm
- Choose k templates arbitrarily as the initial centers of the k clusters.
- Assign the remaining (M − k) templates to their nearest clusters one by one, according to a certain distance measure.
- Each cluster center is updated progressively as the average of all patterns within the cluster and can be viewed as a virtual reference pattern.
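The variant described above is a single sequential pass rather than iterated Lloyd-style k-means. A minimal sketch, assuming the first k templates stand in for the "arbitrarily" chosen seeds:

```python
import numpy as np

def sequential_kmeans(templates, k):
    """One-pass k-means as described above: the first k templates seed
    the clusters; each remaining template joins its nearest cluster and
    that center is updated to the running mean of its members."""
    centers = [templates[i].astype(float) for i in range(k)]
    members = [[i] for i in range(k)]
    for i in range(k, len(templates)):
        dists = [np.sum((templates[i] - c) ** 2) for c in centers]
        j = int(np.argmin(dists))
        members[j].append(i)
        # Progressive (running-mean) update of the cluster center.
        centers[j] += (templates[i] - centers[j]) / len(members[j])
    return np.array(centers), members

# Two well-separated groups, interleaved so each seeds one cluster.
pts = np.array([[0.0, 0.0], [10.0, 10.0]] * 5)
centers, members = sequential_kmeans(pts, k=2)
print([len(m) for m in members])  # [5, 5]
```

Each final center is the mean of its members, i.e. the "virtual reference pattern" of the slide.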
Distance Measure
Suppose Pi = [pi1, pi2,…, piD] represents the feature vector of pattern Pi.
Then the dissimilarity (distance) between patterns Pi and Pj is defined as
$$d(P_i, P_j) = \sum_{d=1}^{D} \left(p_{id} - p_{jd}\right)^2$$
In the classification mode, for an input pattern to be recognized, the pattern is first normalized to a certain size.
Then the features of the pattern are extracted by DCT.
After that the pattern is matched against the center of each cluster to obtain the distance to each cluster.
Finally, the clusters that do not satisfy the distance-based thresholds are excluded from the set of candidate clusters, and the members of the remaining clusters serve as the candidates for the input pattern.
3.2.1. Cluster pruning rules
To measure each cluster's distance as a fraction of the maximum distance, the relative distance dm(x, Ci) is defined as

$$d_m(x, C_i) = \frac{d(x, C_i)}{\max_{C_k} d(x, C_k)},$$

where d(x, Ci) is the distance between the test pattern x and cluster Ci, and $\max_{C_k} d(x, C_k)$ denotes the maximum distance between x and any cluster.
We also define the relative distance da(x, Ci) as

$$d_a(x, C_i) = \frac{d(x, C_i)}{\operatorname{Average}_{C_k} d(x, C_k)},$$

where $\operatorname{Average}_{C_k} d(x, C_k)$ denotes the average distance over all clusters.
Pruning Rules
R1: Pruning via absolute distance threshold d
- The distance d(x, Ci) is compared with a pre-determined distance threshold d.
- If d(x, Ci) is larger than d, then cluster Ci is excluded from the candidate cluster set of x.

R2: Pruning via cluster rank threshold r
- Filter out the farthest clusters by eliminating the clusters whose ranks are larger than r.
R3: Pruning via relative distance threshold m
- If dm(x, Ci) is larger than m, then cluster Ci is excluded from the candidate cluster set of x.

R4: Pruning via relative distance threshold a
- If da(x, Ci) is larger than a, then cluster Ci is excluded from the candidate cluster set of x.
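The four rules can be sketched as boolean masks over the vector of distances d(x, Ci). The threshold parameter names `t_d`, `t_r`, `t_m`, `t_a` are illustrative, not the paper's symbols:

```python
import numpy as np

def prune_clusters(dists, t_d=None, t_r=None, t_m=None, t_a=None):
    """Apply the pruning rules R1-R4 to a vector of distances d(x, C_i).
    A rule is skipped when its threshold is None.  Returns the boolean
    mask of surviving candidate clusters."""
    dists = np.asarray(dists, dtype=float)
    keep = np.ones(len(dists), dtype=bool)
    if t_d is not None:                        # R1: absolute distance
        keep &= dists <= t_d
    if t_r is not None:                        # R2: cluster rank
        ranks = dists.argsort().argsort() + 1  # 1 = nearest cluster
        keep &= ranks <= t_r
    if t_m is not None:                        # R3: distance / max distance
        keep &= dists / dists.max() <= t_m
    if t_a is not None:                        # R4: distance / mean distance
        keep &= dists / dists.mean() <= t_a
    return keep

d = [1.0, 2.0, 4.0, 8.0]
print(prune_clusters(d, t_r=3))    # keeps only the 3 nearest clusters
print(prune_clusters(d, t_a=1.0))  # keeps clusters closer than the average
```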
Each threshold value is obtained from the statistics of the training samples in the training mode, based on a certain accuracy requirement.
Rules R1 and R2 are straightforward, but they cannot measure the similarity between x and each cluster precisely.
Rules R3 and R4 take advantage of the discriminating ability of the relative distances.
For instance, da(x, Ci) = 0.5 means the distance between x and cluster Ci is about half of the average distance between x and all clusters.
4 Experimental Results

- 6000 samples (about 500 categories) are extracted from the Kin-Guan (金剛) bible.
- Each character image was transformed into a 48×48 bitmap.
- 5000 of the 6000 samples are used for training and the others are used for testing.

In our experiment:
- The number of clusters used in k-means clustering is set to 10.
- The number of DCT coefficients extracted from each sample is set to 9, i.e., C(i, j), where i, j = 0, 1, 2.
• A high reduction rate always results in a low accuracy rate.
• Rule R3 performs better than the other rules.
6 Conclusions

This paper presents a coarse classification scheme based on DCT and k-means clustering.

The advantages of our approach:
- The DCT features are simple, reliable and appropriate for coarse classification.
- The proposed coarse classification scheme is a general approach applicable to most vision-oriented applications.
Future works include the application of another well-known vision-oriented feature extraction method: the wavelet transform.
Since features of different types complement one another in classification performance, using features of different types simultaneously could further improve classification accuracy.