Detection and Extraction of Artificial Text for Semantic Indexing
1/25
Detection and Extraction of Artificial Text for Semantic Indexing
Christian Wolf and Jean-Michel Jolion
Laboratoire Reconnaissance de Formes et Vision
Bât. Jules Verne, INSA de Lyon
69621 Villeurbanne cedex, France
January 9th, 2002
Dagstuhl Seminar on Content-Based Image and Video Retrieval
This presentation can be downloaded from: http://rfv.insa-lyon.fr/~wolf/presentations
2/25
Plan of the presentation
Introduction (6 slides)
Detection and tracking (3 slides)
Enhancement and binarization of the text boxes (4 slides)
Experiments and results (2 slides)
Open problems (9 slides)
Conclusion and Outlook (1 slide)
Total: 25 slides
This work resulted in a patent submitted by France Télécom on May 23rd, 2001, under the reference FR 01 06776.
Enh/Binarization Exp.Results Open problems ConclusionIntroduction Detection
3/25
Content-based image retrieval
[Diagram labels: Example image, Indexing phase, Similarity function, Result]
4/25
Similarity measures
[Figure: image pairs labelled "similar", "similar" and "not similar"]
5/25
Indexing using Text
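[Diagram labels: Key word, Keyword-based search, Indexing phase, Result]
Key word: Patrick Mayhew
Extracted text: "Patrick Mayhew", "Min. chargé de l'irlande de Nord" (minister in charge of Northern Ireland), "ISRAEL", "Jerusalem", "montage", "T.Nouel", ...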
6/25
Video properties
[Figure annotations: 80 px, 12 px, 8 px]
7/25
Text extraction: general scheme
Video
Detection of the text in single frames
Tracking
Image enhancement: multiple frame integration
Segmentation/Binarisation
OCR
Example output: "EVENEMENT", "ACTU", "SPELEOS", "Gouffre Berger (Isére)", "aujourd'hui", "France 3 Alpes", "un spéléologue sauveteur"
8/25
Text detection by accumulation of horizontal gradients (LeBourgeois, 1997).
Justification: Text forms a regular texture containing vertical edges which are aligned horizontally.
Post processing by mathematical morphology.
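The idea on this slide can be sketched in a few lines of NumPy. This is only an illustration, not the authors' implementation: the function names, the window half-width `half_w`, the threshold at half of the maximum response, and the sizes of the structuring elements are all assumptions chosen for the example.

```python
import numpy as np

def box_sum(a, half_w, axis):
    """Sum over a centred window of 2*half_w+1 samples along `axis` (zero padded)."""
    a = np.moveaxis(a, axis, -1)
    pad = [(0, 0)] * (a.ndim - 1) + [(half_w + 1, half_w)]
    c = np.cumsum(np.pad(a, pad), axis=-1)
    k = 2 * half_w + 1
    return np.moveaxis(c[..., k:] - c[..., :-k], -1, axis)

def dilate(mask, r, axis):
    """Binary dilation with a 1-D structuring element of length 2*r+1."""
    return box_sum(mask.astype(int), r, axis) > 0

def erode(mask, r, axis):
    """Binary erosion with a 1-D structuring element of length 2*r+1."""
    return box_sum(mask.astype(int), r, axis) == 2 * r + 1

def detect_text(gray, half_w=8, thresh=None, close_r=3, open_r=1):
    """Return a boolean mask of candidate text pixels (all parameters hypothetical)."""
    gray = gray.astype(float)
    # Horizontal gradient: responds to vertical edges such as character strokes.
    gx = np.zeros_like(gray)
    gx[:, 1:-1] = np.abs(gray[:, 2:] - gray[:, :-2])
    # Accumulate the gradient along each row: text yields many aligned edges.
    acc = box_sum(gx, half_w, axis=1)
    if thresh is None:
        thresh = 0.5 * acc.max()  # assumed threshold, not taken from the slides
    mask = acc > thresh
    # Post-processing: horizontal closing bridges gaps between characters,
    # vertical opening removes responses that are only a row or two high.
    mask = erode(dilate(mask, close_r, axis=1), close_r, axis=1)
    mask = dilate(erode(mask, open_r, axis=0), open_r, axis=0)
    return mask
```

Bounding rectangles would then be obtained from the connected components of the mask; that step is omitted here.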
9/25
Detection in video sequences
Detection per single frame: list of rectangles per frame
Tracking: keeping track of text occurrences
Suppression of false alarms
Image enhancement: multiple frame integration
[Figure: text occurrences plotted against frame number (time)]
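A minimal sketch of the tracking and false-alarm suppression steps, assuming that rectangles are matched between consecutive frames by their overlap and that occurrences lasting only a few frames are discarded. The overlap measure (intersection over union), the thresholds `min_overlap` and `min_length`, and the `Occurrence` class are hypothetical choices for illustration; the slides do not specify them.

```python
from dataclasses import dataclass

def overlap(a, b):
    """Intersection over union of two rectangles given as (x0, y0, x1, y1)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

@dataclass
class Occurrence:
    first_frame: int
    last_frame: int
    rect: tuple  # most recent rectangle of this text occurrence

def track(detections, min_overlap=0.5, min_length=3):
    """detections[t] is the list of rectangles detected in frame t."""
    active, finished = [], []
    for t, rects in enumerate(detections):
        unmatched = list(rects)
        still_active = []
        for occ in active:
            best = max(unmatched, key=lambda r: overlap(occ.rect, r), default=None)
            if best is not None and overlap(occ.rect, best) >= min_overlap:
                occ.rect, occ.last_frame = best, t   # occurrence continues
                unmatched.remove(best)
                still_active.append(occ)
            else:
                finished.append(occ)                 # occurrence has ended
        # Every rectangle without a match starts a new occurrence.
        still_active += [Occurrence(t, t, r) for r in unmatched]
        active = still_active
    finished += active
    # Suppress false alarms: keep only occurrences that are stable over time.
    return [o for o in finished if o.last_frame - o.first_frame + 1 >= min_length]
```

This sketch ends an occurrence as soon as one frame has no matching rectangle; a practical tracker would probably tolerate a few missed detections.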
10/25
Image enhancement
Multiple frame integration: averaging
Super-resolution (interpolation)
Integration of multiple frames to create a single image of higher quality.
[Figure labels: M1, M2, M3, M4]
An additional weight is included in the interpolation scheme; it reduces the contribution of pixels that are temporal outliers.
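The effect of such a weight can be illustrated on the averaging step alone. The sketch below assumes the text boxes are already registered and uses a hypothetical weight that decays with a pixel's deviation from the temporal median; the exact weighting function used by the authors is not given on the slide, and the interpolation (super-resolution) part is left out.

```python
import numpy as np

def integrate_frames(frames, robust=True):
    """Combine registered frames of shape (T, H, W) into one image.

    With robust=True each sample is weighted by an assumed function of its
    deviation from the temporal median, so temporal outliers count less.
    """
    frames = np.asarray(frames, dtype=float)
    if not robust:
        return frames.mean(axis=0)           # plain temporal average
    median = np.median(frames, axis=0)
    spread = frames.std(axis=0)
    # Hypothetical weight: 1 for samples at the median, smaller for outliers.
    weights = 1.0 / (1.0 + np.abs(frames - median) / (1.0 + spread))
    return (weights * frames).sum(axis=0) / weights.sum(axis=0)
```

When one frame is corrupted at a pixel (for example by a moving background object behind the text), the weighted result stays closer to the stable value than the plain mean does.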
11/25
Binarization
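Niblack:
$$T = m + k \cdot s$$
Sauvola et al.:
$$T = m \cdot \left(1 + k \cdot \left(\frac{s}{R} - 1\right)\right)$$
Criterion on the contrasts and corresponding threshold:
$$I:\; C_L > a\,(C_{max} - C_F)$$
$$T = (1 - a)\,m + a\,M + a\,\frac{s}{R}\,(m - M)$$
m: mean of the window
s: standard deviation of the window
k: parameter
R: dynamics of the gray values of the image
M: minimum gray value of the image
$C_L = \frac{m - I}{s}$: contrast in the center of the image
$C_{max} = \frac{m - M}{s}$: the maximum local contrast
$C_F = \frac{m - M}{R}$: the contrast of the window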
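The thresholds on this slide can be tried out with a short NumPy sketch. It assumes dark text on a brighter background (a pixel is text if its gray value is below T), a square window, and default parameter values that are illustrative only; the helper names `local_stats`, `niblack`, `sauvola` and `contrast_threshold` are hypothetical. Requires NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_stats(img, win=15):
    """Mean m and standard deviation s in a win x win window (win odd)."""
    pad = win // 2
    padded = np.pad(img.astype(float), pad, mode="reflect")
    windows = sliding_window_view(padded, (win, win))
    return windows.mean(axis=(-2, -1)), windows.std(axis=(-2, -1))

def niblack(m, s, k=-0.2):
    """T = m + k*s (a negative k is assumed here for dark text)."""
    return m + k * s

def sauvola(m, s, k=0.5, R=128.0):
    """T = m * (1 + k*(s/R - 1)) with a fixed dynamic range R."""
    return m * (1.0 + k * (s / R - 1.0))

def contrast_threshold(m, s, M, a=0.5, R=None):
    """T = (1-a)*m + a*M + a*(s/R)*(m - M).

    M is the minimum gray value of the image. If R is not given, the
    maximum local standard deviation is used (an assumption of this sketch).
    """
    R = s.max() if R is None else R
    return (1 - a) * m + a * M + a * (s / R) * (m - M)

def binarize(img, win=15, a=0.5):
    """Boolean mask of text pixels using the contrast-based threshold."""
    m, s = local_stats(img, win)
    return img < contrast_threshold(m, s, img.min(), a)
```

Rearranging I < T with the definitions of C_L, C_max and C_F gives C_L > a (C_max - C_F) wherever s > 0, so the threshold and the contrast criterion select the same pixels.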
12/25
Binarization methods: examples
Original image
Fisher
Fisher (windowed)
Yanowitz B.
Niblack
Sauvola et al.
Our method
13/25
Binarization using a priori knowledge
Bayesian MAP estimation using prior knowledge on the spatial relationships in the image, modeled as a Markov random field.
(In collaboration with David Doermann from the Language and Media Processing Laboratory of the University of Maryland)
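To make the idea of MAP binarization with a Markov random field prior concrete, here is a generic sketch. It is not the model from the talk, which the slide does not detail: it assumes a Gaussian observation model with known class means, a simple Ising prior on the 4-neighbourhood with strength `beta`, and iterated conditional modes (ICM) as the optimizer. All names and parameter values are hypothetical.

```python
import numpy as np

def map_binarize_icm(img, mu_text, mu_bg, sigma, beta=1.5, n_iter=5):
    """Approximate MAP labelling (True = text) under an assumed Ising prior."""
    img = np.asarray(img, dtype=float)
    # Data term: negative Gaussian log-likelihood of each label (up to a constant).
    cost_text = (img - mu_text) ** 2 / (2 * sigma ** 2)
    cost_bg = (img - mu_bg) ** 2 / (2 * sigma ** 2)
    labels = (cost_text < cost_bg).astype(int)   # maximum-likelihood start
    yy, xx = np.indices(img.shape)
    for _ in range(n_iter):
        # Checkerboard sweeps: pixels of one parity have no 4-neighbours of
        # the same parity, so updating them together is still ICM.
        for parity in (0, 1):
            p = np.pad(labels, 1, mode="edge")
            n_text = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
            # Prior term: beta for every neighbour carrying the other label.
            e_text = cost_text + beta * (4 - n_text)
            e_bg = cost_bg + beta * n_text
            update = (yy + xx) % 2 == parity
            labels = np.where(update, (e_text < e_bg).astype(int), labels)
    return labels.astype(bool)
```

On noisy input the prior removes most isolated misclassified pixels that a per-pixel decision would leave behind, at the price of slightly rounded corners.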
14/25
5 different MPEG-1 videos of resolution 384x288.
62 minutes
93000 frames
413 text appearances
15/25
Detection and OCR results
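| Detection | Count | % |
|---|---|---|
| Pred. Text | 301 | 93.5 |
| Pred. Non-Text | 21 | |
| Total | 322 | |

| Category | Count | % |
|---|---|---|
| Positives | 350 | |
| False alarms | 947 | |
| Logos | 75 | |
| Scene text | 72 | |
| Pos+Log+Scene | 497 | 34.4 |
| Total | 1444 | |

Detection results

| Input | Bin. method | Recall (%) | Precision (%) | Cost |
|---|---|---|---|---|
| AIM2 | Niblack | 67.4 | 87.5 | 499 |
| AIM2 | Sauvola R=128 | 53.8 | 87.6 | 616.5 |
| AIM2 | R=ad | 75.0 | 87.8 | 384.5 |
| AIM2 | R=ad, shift | 78.4 | 90.4 | 344.5 |
| AIM3 | Niblack | 92.5 | 78.1 | 196 |
| AIM3 | Sauvola R=128 | 69.9 | 89.6 | 206 |
| AIM3 | R=ad | 85.3 | 92.5 | 110 |
| AIM3 | R=ad, shift | 96.2 | 95.3 | 51.00 |
| AIM4 | Niblack | 78.5 | 92.0 | 252.00 |
| AIM4 | Sauvola R=128 | 48.6 | 87.7 | 490.50 |
| AIM4 | R=ad | 69.8 | 84.8 | 360.50 |
| AIM4 | R=ad, shift | 80.1 | 90.4 | 211.50 |
| AIM5 | Niblack | 62.1 | 71.4 | 501.50 |
| AIM5 | Sauvola R=128 | 66.7 | 89.3 | 324.50 |
| AIM5 | R=ad | 64.8 | 90.1 | 328.00 |
| AIM5 | R=ad, shift | 69.0 | 91.0 | 294.50 |
| Total | Niblack | 73.1 | 82.6 | 1448.5 |
| Total | Sauvola R=128 | 58.4 | 88.5 | 1637.5 |
| Total | R=ad | 73.0 | 88.4 | 1183 |
| Total | R=ad, shift | 79.6 | 91.5 | 901.5 |

OCR results, classified by binarization method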
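The percentages and cost totals reported above can be checked with simple arithmetic. Reading 93.5 % as the share of text occurrences that were detected and 34.4 % as the share of detected rectangles that contain some kind of text is an interpretation of the flattened tables, not something the slide states explicitly; the variable names are hypothetical.

```python
def percentage(part, total):
    """Share of `part` in `total`, in percent."""
    return 100.0 * part / total

# Detection counts as given on the slide.
detected_share = percentage(301, 322)            # Pred. Text / Total
text_share = percentage(350 + 75 + 72, 1444)     # Pos+Log+Scene / Total

# Per-video OCR costs (AIM2..AIM5) for each binarization method.
costs = {
    "Niblack": [499, 196, 252, 501.5],
    "Sauvola R=128": [616.5, 206, 490.5, 324.5],
    "R=ad": [384.5, 110, 360.5, 328],
    "R=ad, shift": [344.5, 51, 211.5, 294.5],
}
total_cost = {method: sum(values) for method, values in costs.items()}

print(f"{detected_share:.1f} % of text occurrences detected")
print(f"{text_share:.1f} % of detected rectangles contain text")
for method, total in total_cost.items():
    print(f"{method}: total cost {total}")
```

The computed totals agree with the "Total" rows of the OCR table, which supports the reconstruction of the table rows.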
[Figure legend: true positives, false positives, true negatives, false negatives]
16/25
Open questions
Scene text (general orientations, deformations)
Moving text
17/25
What is scene text?
[Diagram: the set of video frames, with the subsets "Frames containing scene text" and "Frames containing artificial text"]
We do not have enough information about the importance of text in the destination domain. How many frames contain text, and how many of them contain scene text?
18/25
Detection: from artificial text to scene text
Several constraints have to be relaxed when moving from artificial text to scene text:
The constraints on temporal stability need to be abandoned or at least softened (no initial frame integration).
Text can appear in any orientation (creation of an oriented feature in multiple directions, similar to invariant features).
Contrast may be lower because scene text is not designed to be read easily (is detection of unreadable text necessary?).
19/25
Text models
Simple models: sets of edges or vertical strokes...
+ Generalize well, respond to many kinds of text
- Many false alarms
Complex models: templates, probabilistic models (MRF)...
+ Powerful, fewer false alarms
- Do not generalize well
Assumptions are necessary (on the font, size, style, contrast, color, length, etc.) but not sufficient.
Main problem: Distinction between characters and structures similar to text according to the chosen model.
20/25
Sven Dickinson: evolution of models
21/25
What is text?
Whatever model we choose, we cannot detect/recognize all kinds of text without solving the general image understanding problem. The best thing we can do is to include richer features into the detection process: a composite model for text.
Structural analysis (e.g. detection and recognition of characters by strokes). Very hard and very unlikely to work in the case of noisy images, low resolutions and difficult fonts.
Statistical modeling of text features (e.g. by learning techniques). Problem: robust detection requires large neighborhoods, which leads to a combinatorial explosion.
E.g.: Texture based methods for small text and segmentation + perceptual grouping, structural methods for big text.
22/25
Learning techniques: pros and cons
Bibliography:
Learning directly the gray levels of the input image (Jung 2001)
Learning features, i.e. coefficients of the Haar wavelet (Li and Doermann 2000) or edge strength (Lienhart 2000)
+ Learning is an easy way to handle the complexity of text.
- Text can appear in videos in many different fonts, sizes, styles, colors, orientations etc. Learning all the different forms may not be feasible.
23/25
Color processing for detection?
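[Figure panels: Original image; Sobel on grayscale image; Sobel on L*u*v* image]
Sobel kernel applied to the grayscale image:
$$\begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix}$$
On the L*u*v* image, the horizontal difference is replaced by a Euclidean distance between the left and right neighbours, combined with the vertical weights:
$$\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \qquad D_{euclid}\left(I_{x-1,0},\; I_{x+1,0}\right)$$
Saturating distance or non-saturating distance?
Reflection processing?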
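The difference between the two edge responses can be sketched as follows. The code assumes the color image is already given as an (H, W, 3) array in L*u*v* coordinates (no color conversion is performed) and that the color variant sums the Euclidean distances between left and right neighbours with vertical weights 1, 2, 1, which is one reading of the slide. Function names are hypothetical.

```python
import numpy as np

def sobel_gray(gray):
    """Horizontal Sobel response: signed differences I(x+1) - I(x-1),
    summed over three rows with weights 1, 2, 1."""
    g = np.pad(np.asarray(gray, dtype=float), 1, mode="edge")
    diff = g[:, 2:] - g[:, :-2]
    return diff[:-2] + 2 * diff[1:-1] + diff[2:]

def sobel_color(img):
    """Color variant: the signed difference is replaced by the Euclidean
    distance between the color vectors of the left and right neighbours."""
    c = np.pad(np.asarray(img, dtype=float), ((1, 1), (1, 1), (0, 0)), mode="edge")
    dist = np.linalg.norm(c[:, 2:] - c[:, :-2], axis=-1)
    return dist[:-2] + 2 * dist[1:-1] + dist[2:]
```

An edge between two colors of equal lightness gives no response in the lightness channel but a clear response with the color distance, which is the motivation for the question raised on this slide. Because a distance is never negative, the color response carries no sign, unlike the grayscale one.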
24/25
Tracking of moving scene text
Do we detect the text in single frames (as for artificial text), or do we treat the video stream as a whole?
Single frames: Multiple frame integration of moving text needs robust registration of the text boxes in different frames (e.g. rough segmentation into text and background pixels before the registration of the text pixels only). Robust methods that can track objects in clutter are needed.
Detection of moving objects, e.g. by optical flow, spatio-temporal methods.
Mosaicing techniques can be employed for image enhancement.
25/25
Conclusion and Outlook
We developed a system for detection, tracking, enhancement and binarization of artificial text in videos.
The total recognition rate for artificial text is surprisingly high, given the quality of the text, but not yet good enough for indexing purposes.
The remaining problems in text extraction seem to be typical of applications in visual information management: we went as far as we could with low-level features, but we cannot take the necessary step to semantic information. What is text? Possible definition: text is whatever a human or an OCR system can recognize as text.
We have to include as much a priori knowledge as possible into the process.