CTAACS'12 Zemouri Chibani Brik


ET-Tahir ZEMOURI, Youcef CHIBANI and Youcef BRIK
{tzemouri; ychibani; ybrik}@usthb.dz

Faculty of Electronic and Computer Science
University of Science and Technology Houari Boumediene, Algiers, Algeria

Combined Binarization Approach for Historical Arabic Document Image

1st Conference on Theoretical and Applicative Aspects of Computer Science (CTAACS'12)
November 25-26, 2012

Outline

Introduction
Binarization
Proposed method
Binarization
Experimental results
Conclusion

Context

In recent years, the document analysis and recognition community has shown increasing interest in the processing of historical documents. These old documents often have historical and cultural significance, and the aim is to scan them and create digital libraries.

The challenge is to create automatic search engines that allow users to find and retrieve only the documents with relevant content from the entire collection.

Binarization?

Fig. Gray-level image and its binarization with thresholds T=60, T=128 and T=200

Related works

List of references
[1] Otsu, N.: A threshold selection method from gray level histogram. IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66 (1979)
[2] Ridler, T.W., Calvard, S.: Picture thresholding using an iterative selection method. IEEE Transactions on Systems, Man, and Cybernetics, vol. 8, no. 8, pp. 630-632 (1978)

Otsu's method [1] calculates T so as to minimize the within-class variance of the two distributions (equivalently, to maximize the variance between the two classes).
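As an illustration of how such a global threshold can be computed, here is a minimal NumPy sketch of Otsu's criterion. It assumes an 8-bit gray-level image stored as a `uint8` array; the function names `otsu_threshold` and `binarize` are hypothetical and are not taken from the slides.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level T that maximizes the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega0 = np.cumsum(prob)           # probability of the class "level <= T"
    omega1 = 1.0 - omega0              # probability of the class "level > T"
    mu = np.cumsum(prob * levels)      # cumulative first moment up to T
    mu_total = mu[-1]                  # global mean gray level
    # Between-class variance for every candidate T; it is undefined
    # (0/0) when one class is empty, so those entries are set to 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega0 - mu) ** 2 / (omega0 * omega1)
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(sigma_b))

def binarize(gray, T):
    """Map pixels brighter than T to white (255) and the rest to black (0)."""
    return np.where(gray > T, 255, 0).astype(np.uint8)

# Toy bimodal image: dark "ink" at level 50, bright "paper" at level 200.
image = np.array([[50, 50, 200, 200],
                  [50, 200, 200, 50]], dtype=np.uint8)
T = otsu_threshold(image)
binary = binarize(image, T)
```

Maximizing the between-class variance selects the same T as minimizing the within-class variance, because the two always sum to the total variance of the image.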

Isodata [2] thresholds by iteratively separating the gray-level histogram into two classes:
1- Divide the interval of non-null values into two equidistant parts.
2- Take m1 and m2 as the arithmetic means of each class.
3- Repeat until convergence: compute the threshold T as the closest integer to (m1+m2)/2, then update the two means m1 and m2.

Related works

List of references
Niblack, W.: An Introduction to Digital Image Processing. Prentice-Hall, Englewood Cliffs, New Jersey (1986)
Sauvola, J., Pietikainen, M.: Adaptive document image binarization. Pattern Recognition, vol. 33, no. 2, pp. 225-236 (2000)
Khurshid, K., Siddiqi, I., Faure, C., Vincent, N.: Comparison of Niblack inspired Binarization methods for ancient documents. 16th International Conference on Document Recognition and Retrieval, vol. 7247, pp. 0U1-0U9 (2009)
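The three Isodata steps listed earlier on this slide can be sketched as follows. This is only an illustrative reading of those steps, assuming an 8-bit `uint8` image; the name `isodata_threshold`, the `max_iter` safeguard and the clamping of T are assumptions added so the sketch always terminates with two non-empty classes.

```python
import numpy as np

def isodata_threshold(gray, max_iter=256):
    """Iterative two-class threshold following the three steps above."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)
    nonzero = np.nonzero(hist)[0]
    lo, hi = int(nonzero[0]), int(nonzero[-1])
    if lo == hi:                       # single gray level: nothing to split
        return lo
    # Step 1: split the interval of non-null values into two equal parts.
    T = (lo + hi) // 2
    for _ in range(max_iter):
        # Step 2: arithmetic mean of each class (levels <= T and levels > T).
        low, high = hist[:T + 1], hist[T + 1:]
        m1 = (low * levels[:T + 1]).sum() / low.sum()
        m2 = (high * levels[T + 1:]).sum() / high.sum()
        # Step 3: new threshold is the closest integer to (m1 + m2) / 2.
        # Clamping to [lo, hi - 1] keeps both classes non-empty (assumption).
        new_T = min(max(int(round((m1 + m2) / 2)), lo), hi - 1)
        if new_T == T:                 # converged
            break
        T = new_T
    return T

# Toy bimodal image: the two class means are 50 and 200, so T = 125.
image = np.array([[50, 50, 200, 200],
                  [50, 200, 200, 50]], dtype=np.uint8)
T = isodata_threshold(image)
flat_T = isodata_threshold(np.full((3, 3), 77, dtype=np.uint8))
```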

Local thresholding (Sauvola)
Fig. Binarized images (two panels)

The proposed method

Fig. Histogram of the document image, with the thresholds extracted by Otsu's method, Isodata's method and the average value of the pixels.

Test and results
Datasets
Degraded samples from the National Bibliotheca (BN Algiers):
- book, 1842
- 116 Arabic printed pages

Fig. Representative sample of database

Test and results
Evaluation system
Preprocessing, Database, Page separation, Binarisation, Deskew, Border removal, Segmentation
Feature Generation: projection profile, upper profile, lower profile, number of vertical pixel transitions (white/black)
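The four column-wise features named in the feature generation step could be computed along the following lines. The slides do not give exact definitions, so everything here is an assumption: the word image is a 2-D boolean array with True for ink, profiles are measured as row indices from the top, and empty columns get the sentinel values `h` (upper) and `-1` (lower). The function name `word_features` is hypothetical.

```python
import numpy as np

def word_features(ink):
    """Return an array of shape (width, 4): one feature vector per column.

    Columns of the result: projection profile, upper profile,
    lower profile, number of white-to-black vertical transitions.
    """
    ink = np.asarray(ink, dtype=bool)
    h, w = ink.shape
    has_ink = ink.any(axis=0)
    # Projection profile: number of ink pixels in each column.
    projection = ink.sum(axis=0)
    # Upper profile: row index of the first ink pixel from the top.
    upper = np.where(has_ink, np.argmax(ink, axis=0), h)
    # Lower profile: row index of the last ink pixel.
    lower = np.where(has_ink, h - 1 - np.argmax(ink[::-1], axis=0), -1)
    # Transitions: count background -> ink changes scanning top to bottom;
    # a background row is prepended so ink in row 0 counts as a transition.
    padded = np.vstack([np.zeros((1, w), dtype=bool), ink])
    transitions = (~padded[:-1] & padded[1:]).sum(axis=0)
    feats = np.stack([projection, upper, lower, transitions], axis=1)
    return feats.astype(np.float64)

# Tiny 4x3 word image (1 = ink); the last column is empty.
word = np.array([[0, 1, 0],
                 [1, 0, 0],
                 [1, 1, 0],
                 [0, 1, 0]])
features = word_features(word)
```

Each row of `features` describes one image column, so a word image of width w becomes a sequence of w four-dimensional vectors.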

Fig. An original word image and features used in word image matching: (a) original word image, (b) vertical projection, (c) upper profile, (d) lower profile, and the number of vertical white/black pixel transitions

Test and results
Evaluation system
DTW over the two feature sets (each: projection profile, upper word profile, lower word profile, number of vertical pixel transitions white/black)
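For readers unfamiliar with DTW (dynamic time warping), the sketch below shows the textbook dynamic-programming recurrence for comparing two feature sequences of different lengths. It is not necessarily the exact variant used in the talk: the squared Euclidean local cost, the absence of path-length normalization and the name `dtw_distance` are all assumptions.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.ndim == 1:                    # treat scalars as 1-D feature vectors
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    # cost[i, j] = best cumulative cost aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.sum((a[i - 1] - b[j - 1]) ** 2)   # local cost (assumed)
            cost[i, j] = d + min(cost[i - 1, j],      # skip a column of a
                                 cost[i, j - 1],      # skip a column of b
                                 cost[i - 1, j - 1])  # match both columns
    return float(cost[n, m])

# A stretched copy of a sequence aligns at zero cost.
same = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Because the warping path may repeat elements, two word images of different widths can be compared column by column without resizing them first.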


Fig. Binarization results of document image: (a) Original image, (b) Otsu, (c) Isodata, (d) Niblack, (e) Sauvola, (f) NICK, (g) Proposed

Table. The Evaluation Measures

Objective
Discrimination between machine-printed and handwritten text

Results
Encouraging results by combining Radon energy and statistical features using SVM classifiers with the RBF kernel

Future works
Distinguish machine-printed from handwritten text in Arabic and Latin scripts

Thank you