Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on Duy-Dinh Le National Institute...

Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on

Duy-Dinh Le
National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku
Tokyo, JAPAN 101-8430

Shin’ichi Satoh
National Institute of Informatics
2-1-2 Hitotsubashi, Chiyoda-ku
Tokyo, JAPAN 101-8430

Students: TU, Chien-Hsun (69821059); LIU, Yuan-Ming (69821039)



Outline
Introduction
Proposed Framework
  - Face Processing
  - Ranking by Local Density Score
  - Ranking by Bagging of SVM Classifiers
Experimental Results
Conclusion


Introduction

Large image and video databases are more available to users than ever before.

This trend has created a need for effective and efficient tools for indexing and retrieving images and videos based on their visual content.


Introduction

One way to improve retrieval performance is to take into account the visual information present in the retrieved faces.

Challenges
- Variations in facial appearance due to pose, illumination, and facial expression make face recognition difficult.
- The lack of labels makes supervised and semi-supervised learning methods inapplicable.


System Framework


Proposed Framework - Face Processing

We rank the faces and learn a model of person X as follows:

Step 1: Detect faces and eye positions, and then normalize the faces.
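Step 1 does not say how the faces are normalized. A common approach, assumed here, is to rotate, scale, and translate each face so that the detected eye centers land on fixed canonical positions. The sketch below computes that 2×3 similarity transform with NumPy; the canonical eye coordinates (chosen for a 64×64 crop) and the function names are hypothetical.

```python
import numpy as np

def eye_alignment_matrix(left_eye, right_eye,
                         canon_left=(20.0, 24.0), canon_right=(44.0, 24.0)):
    """Return a 2x3 similarity transform (rotation + uniform scale + translation)
    mapping detected eye centers onto hypothetical canonical positions."""
    src = np.array(right_eye, float) - np.array(left_eye, float)
    dst = np.array(canon_right, float) - np.array(canon_left, float)
    # Scale matches the inter-eye distances; angle turns the src eye line onto the dst eye line.
    scale = np.linalg.norm(dst) / np.linalg.norm(src)
    angle = np.arctan2(dst[1], dst[0]) - np.arctan2(src[1], src[0])
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    # Translation chosen so the left eye lands exactly on its canonical position.
    t = np.array(canon_left, float) - rot @ np.array(left_eye, float)
    return np.hstack([rot, t[:, None]])

def apply_affine(m, point):
    """Map a 2-D point (x, y) through a 2x3 affine matrix."""
    return m @ np.array([point[0], point[1], 1.0])
```

The resulting matrix could then be passed to an image-warping routine such as OpenCV's `cv2.warpAffine(image, m, (64, 64))` to produce the normalized face crop.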


Proposed Framework

Step 2: Compute an eigenface space and project the input faces into this subspace.
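Step 2 can be sketched as a principal component analysis of the normalized, flattened faces, computed with NumPy's SVD. This is a generic eigenface computation under our own assumptions (function names, number of components, placeholder data), not the authors' code.

```python
import numpy as np

def fit_eigenfaces(faces, n_components):
    """faces: (n_faces, n_pixels) array of flattened, normalized faces.
    Returns the mean face and the top `n_components` eigenfaces (one per row)."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Rows of vt are orthonormal principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(faces, mean, eigenfaces):
    """Coordinates of each face in the eigenface subspace."""
    return (faces - mean) @ eigenfaces.T

# Usage with placeholder data: 50 random "faces" of 64x64 pixels.
rng = np.random.default_rng(0)
faces = rng.random((50, 64 * 64))
mean, eigenfaces = fit_eigenfaces(faces, n_components=20)
coords = project(faces, mean, eigenfaces)  # shape (50, 20)
```

`n_components` must not exceed the number of faces; the projected coordinates are what the ranking steps operate on.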

Step 3: Estimate the ranked list of these faces using Rank-By-Local-Density-Score.

Step 4: Improve this ranked list using Rank-By-Bagging-ProbSVM.


Ranking by Local Density Score

Among the faces retrieved by text-based search engines for a query of person-X, the relevant faces usually look similar and form the largest cluster.


Ranking by Local Density Score

One approach to re-ranking these faces is to cluster them based on visual similarity.

Problem

Obtaining ideal clustering results is impossible because these faces are high-dimensional data and the clusters differ in shape, size, and density.


Ranking by Local Density Score

We use the idea of density-based clustering described by … to solve this problem.

We define the local density score (LDS) of a point p (i.e., a face) as the average distance between p and its k nearest neighbors.
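Written as a formula, with N_k(p) (our notation) denoting the set of the k nearest neighbors of p, the definition reads:

```latex
\mathrm{LDS}(p, k) = \frac{1}{k} \sum_{q \in N_k(p)} \mathrm{distance}(p, q)
```

Here distance(p, q) is the slides' own term; it is not the plain Euclidean distance between p and q in the feature space.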


Ranking by Local Density Score

We do not directly use the Euclidean distance between two points in this feature space for distance(p, q).


Ranking by Local Density Score

A high value of LDS(p, k) indicates a strong association between p and its neighbors. Therefore, we can use this local density score to rank faces.
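The transcript does not show which measure replaces the Euclidean distance. Because a high LDS must mean strong association, the sketch below assumes a shared-nearest-neighbor (SNN) similarity, a measure often used in density-based clustering of high-dimensional data: the "distance" between p and q is taken to be the number of k-nearest neighbors they have in common. Euclidean distance is used only to find the neighbors. Function names are hypothetical.

```python
import numpy as np

def knn_indices(x, k):
    """Indices of each row's k nearest neighbors (Euclidean), excluding the row itself."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]

def local_density_scores(x, k):
    """LDS(p, k): mean over p's k nearest neighbors q of |N_k(p) & N_k(q)|.
    ASSUMPTION: shared-neighbor count stands in for the slides' distance(p, q)."""
    nn = knn_indices(x, k).tolist()
    sets = [set(row) for row in nn]
    scores = np.empty(len(x))
    for p, neighbors in enumerate(nn):
        scores[p] = np.mean([len(sets[p] & sets[q]) for q in neighbors])
    return scores

def rank_by_lds(x, k):
    """Face indices ordered from highest to lowest local density score."""
    return np.argsort(-local_density_scores(x, k), kind="stable")
```

With small k the shared-neighbor counts are small integers, so ties are common; the stable sort keeps tied faces in their input order.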


Ranking by Bagging of SVM Classifiers

Problem

One limitation of ranking based on the local density score is that it cannot handle faces of another person that are strongly associated within the k-neighbor set (for example, many duplicates).

The main idea is to use a probabilistic model to measure the relevance of a face to person-X, P(person-X | face).
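Step 4's name, Rank-By-Bagging-ProbSVM, suggests SVMs with probability outputs, but the slides do not show how P(person-X | face) is obtained. One standard option, assumed here for illustration, is Platt scaling, which maps an SVM decision value f(x) for face x to a probability through a sigmoid whose parameters A and B are fitted on held-out data:

```latex
P(\text{person-X} \mid x) \approx \frac{1}{1 + \exp\big(A\, f(x) + B\big)}
```

With A < 0, faces lying farther on the person-X side of the SVM margin receive probabilities closer to 1.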


Ranking by Bagging of SVM Classifiers

The input ranked list is improved by combining weak classifiers trained on subsets annotated by that ranked list.

We set p = 20%.

The maximum Kendall tau distance is set to 0.05.
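The slides give the idea and two parameter values but not the procedure. The sketch below shows one bagging round under stated assumptions: the top fraction p of the current ranked list is pseudo-labeled person-X and the bottom fraction p non-person-X (our reading of p = 20%); each weak classifier is trained on a bootstrap sample of these pseudo-labeled faces; and the averaged probability outputs re-rank every face. To stay NumPy-only, a small logistic-regression classifier is swapped in for the probabilistic SVM; `n_bags`, the learning rate, and the epoch count are hypothetical.

```python
import numpy as np

def train_logistic(x, y, lr=0.1, epochs=200):
    """Weak probabilistic classifier standing in for ProbSVM (ASSUMPTION):
    logistic regression fitted by batch gradient descent on the log-loss."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        prob = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        grad = prob - y  # derivative of the log-loss w.r.t. each logit
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return lambda z: 1.0 / (1.0 + np.exp(-(z @ w + b)))

def bagging_rerank(features, ranked, p=0.2, n_bags=10, seed=0):
    """One round: pseudo-label the top/bottom fraction p of `ranked`, train one weak
    classifier per bootstrap sample, average P(person-X | face) over the classifiers,
    and return (new ranking, averaged probabilities)."""
    rng = np.random.default_rng(seed)
    ranked = np.asarray(ranked)
    m = max(1, int(round(p * len(ranked))))
    pos, neg = ranked[:m], ranked[-m:]
    probs = np.zeros(len(features))
    for _ in range(n_bags):
        # Bootstrap sample (with replacement) from each pseudo-labeled group.
        idx = np.concatenate([rng.choice(pos, size=m, replace=True),
                              rng.choice(neg, size=m, replace=True)])
        y = np.concatenate([np.ones(m), np.zeros(m)])
        probs += train_logistic(features[idx], y)(features)
    probs /= n_bags
    return np.argsort(-probs, kind="stable"), probs
```

A closer implementation would replace `train_logistic` with an SVM that outputs probabilities (for example, scikit-learn's `SVC(probability=True)`) while keeping the same loop structure.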


Ranking by Bagging of SVM Classifiers

The iterations significantly improve the final ranked list.
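The slides do not spell out when the iterations stop. One plausible reading, assumed here, is that rounds repeat until the normalized Kendall tau distance between consecutive ranked lists is at most 0.05, with a cap on the number of iterations. The sketch computes the distance directly over all item pairs (quadratic, adequate for a few hundred faces) and takes the re-ranking step as a hypothetical `rerank` callable, for instance one bagging round.

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Fraction of item pairs ordered differently by two rankings of the same items
    (0 = identical order, 1 = fully reversed)."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    pairs = list(combinations(rank_a, 2))
    if not pairs:
        return 0.0
    discordant = sum(
        (pos_a[u] - pos_a[v]) * (pos_b[u] - pos_b[v]) < 0 for u, v in pairs
    )
    return discordant / len(pairs)

def iterate_until_stable(initial_rank, rerank, max_dist=0.05, max_iter=20):
    """Apply a hypothetical `rerank(ranked_list) -> ranked_list` step until two
    consecutive rankings differ by at most `max_dist`; return (ranking, rounds)."""
    current = list(initial_rank)
    for it in range(1, max_iter + 1):
        updated = list(rerank(current))
        if kendall_tau_distance(current, updated) <= max_dist:
            return updated, it
        current = updated
    return current, max_iter
```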


Experimental Results


Experimental Results

We compared our proposed method with other existing approaches. The methods evaluated are:

- Text-Based Baseline (TBL)
- Distance-Based Outlier (DBO)
- Densest Sub-Graph based Method (DSG)
- Local Density Score (LDS)
- Unsupervised Ensemble Learning Using Local Density Score (UEL-LDS)
- Supervised Learning (SVM-SUP)


Experimental Results

Performance comparison of methods.


Experimental Results

Distribution of retrieved faces and relevant faces for the 16 individuals used in the experiments.


Experimental Results - Evaluation Criteria

Let Nret be the total number of faces returned, Nrel the number of relevant faces among those returned, and Nhit the total number of relevant faces.
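In terms of these counts, the standard definitions are precision = Nrel / Nret and recall = Nrel / Nhit. The slide's own formulas are not in the transcript, so the sketch below simply applies these standard definitions to a ranked list cut off after its first Nret faces; the function name and the boolean-list input are illustrative.

```python
def precision_recall_at(relevance, n_ret, n_hit):
    """relevance: booleans for the ranked faces (True = a face of person-X).
    n_ret: number of top-ranked faces returned (Nret); n_hit: total relevant faces (Nhit)."""
    n_rel = sum(relevance[:n_ret])  # Nrel: relevant faces among those returned
    precision = n_rel / n_ret if n_ret else 0.0
    recall = n_rel / n_hit if n_hit else 0.0
    return precision, recall
```

Sweeping Nret from 1 to the length of the ranked list traces out a precision-recall curve for one query.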


Experimental Results


Conclusion

Our approach works fairly well for well-known people, for whom the main assumption, that text-based search engines return a large fraction of relevant images, is satisfied.

In future work, we aim to study how to improve the quality of the training sets used in this iterative process (bagging of SVM classifiers).