Automatic Ground Truth Generation of Camera Captured Documents Using Document Image Retrieval
Sheraz Ahmed, Koichi Kise, Masakazu Iwamura, Marcus Liwicki, and Andreas Dengel

Problem to be tackled
OCR for camera-captured documents
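Convenient
Useful
Poor OCR performance

OCR results
OCR response for camera-captured words
[Table: camera-captured word images with columns GroundTruth, Tesseract, GOCR, OCRopus. Each row below gives the ground truth, then the combined Tesseract / GOCR / OCRopus output.]
otherwise: utharvdlulee=e
recognises: T,-ee=
Legislative: \LR iild1K4A
Percent: pauznx_______e=
construction: ummuciwwns ione=w==s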
Suffer from blur, perspective distortion, illumination change and so on

Quantity improves quality
A large quantity of data improves quality of recognition
[Figure: recognition rate versus dataset size]
Large-scale datasets are demanded
Wider variety of fonts and distortions

Existing datasets on camera-captured text
Document
  IUPR Dataset: 100 pages; word-level groundtruth is unavailable
Scene
  Street View House Numbers: 630,000 numerals
  NEOCR: 5,238 words
  Chars74k: 74,107 characters
Not usable for OCR training

Limitations of existing datasets
Only numerals
Too small
Different tendencies from text in document images

Purpose
To develop a method to easily create a large dataset
Successfully groundtruthed one million word images with 99.98% accuracy!

A way to create a dataset
[Figure: captured image, cropped word images ("This", "is", "National"), groundtruthing marked as problematic]

Groundtruthing is problematic
Automatic groundtruthing is not reliable
Manual groundtruthing is laborious and costly
Goal: reliable automatic groundtruthing

Idea
Use text information embedded in PDF files
[Figure: PDF file, printed document and captured document image; arrows labeled Capture, Text info. and Groundtruthing]
How do we fit the text information into the captured document image?

Fitting text information into captured document image
For scanned document image: similarity transformation [Beusekom, DAS2008]
  Not applicable to camera-captured case
For camera-captured document image: perspective transformation, or affine transformation (approximately)
  No method exists

Locally Likely Arrangement Hashing (LLAH)
Find the region corresponding to the captured one from 20M pages in real time
DB: 20M pages
Time: 49 ms/query
Accuracy: 99.2%
Pose is estimated simultaneously
[Figure: captured image (query) and search result, showing the corresponding page and the corresponding region]
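
The slide says that the pose is estimated together with the retrieval result. As a minimal sketch, assuming the pose is available as a 3x3 perspective (homography) matrix, the corresponding region could be obtained by projecting the corners of the captured image into page coordinates. The function names and the example matrix are hypothetical and do not come from the slides.

```python
def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)


def corresponding_region(H, width, height):
    """Project the four corners of a width x height query image into page coordinates."""
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    return [apply_homography(H, c) for c in corners]


# Hypothetical pose: scale by 0.5 and shift by (100, 200), no perspective terms.
H_example = [
    [0.5, 0.0, 100.0],
    [0.0, 0.5, 200.0],
    [0.0, 0.0, 1.0],
]
region = corresponding_region(H_example, 640, 480)
```

With a real capture the bottom row of the matrix would carry non-zero perspective terms, so the projected region would be a general quadrilateral rather than a rectangle.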

Proposed procedure (1): Document level matching
Features based on LLAH
[Figure: captured image (query), DB of digital doc. images, features]
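
LLAH-style matching describes local arrangements of feature points by geometric invariants and looks them up in a hash table. The following is a heavily simplified sketch of that idea, not the authors' implementation: it uses the ratio of two triangle areas (an affine invariant of four coplanar points), quantises it, and lets each matching key vote for a page. The grouping of points, the quantisation step and all names are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def triangle_area(p, q, r):
    """Unsigned area of the triangle p, q, r."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])) / 2.0


def descriptor(points, step=0.1):
    """Quantised area ratios over every 4-point subset, in the given point order."""
    values = []
    for a, b, c, d in combinations(points, 4):
        denom = triangle_area(a, b, c)
        if denom == 0:
            return None  # degenerate (collinear) arrangement, skip it
        values.append(int(round(triangle_area(a, c, d) / denom / step)))
    return tuple(values)


def build_index(pages):
    """Hash table from descriptor to the ids of pages that contain it."""
    index = defaultdict(list)
    for page_id, groups in pages.items():
        for group in groups:
            key = descriptor(group)
            if key is not None:
                index[key].append(page_id)
    return index


def retrieve(index, query_groups):
    """Return the page id with the most votes, or None if nothing matched."""
    votes = Counter()
    for group in query_groups:
        key = descriptor(group)
        if key is not None:
            votes.update(index.get(key, []))
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical database: one group of five feature points per page.
pages = {
    "page_A": [[(0, 0), (4, 0), (4, 3), (0, 2), (2, 5)]],
    "page_B": [[(0, 0), (5, 1), (3, 4), (1, 3), (6, 6)]],
}
index = build_index(pages)


def affine(p):
    """Hypothetical camera distortion, modelled here as an affine map."""
    return (2 * p[0] + p[1] + 5, -p[0] + 3 * p[1] - 2)


query = [[affine(p) for p in pages["page_A"][0]]]
best_page = retrieve(index, query)
```

Because area ratios are unchanged by affine maps, the distorted query produces the same hash key as the stored page. A real system would derive the point groups from nearest neighbours of each feature point and would have to cope with missing or spurious points.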

Proposed procedure (2): Part level processing
[Figure: cropped retrieved image, transformed captured image, and the overlapped image]
This is not the end of the procedure
Displacement of text

Proposed procedure (3): Word level processing
[Figure: cropped retrieved image, transformed captured image, and overlapped bounding boxes]
Find the closest bounding boxes and select perfectly aligned ones only
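
The word-level step can be sketched as follows. This is an illustration under stated assumptions, not the method from the slides: boxes are `(x0, y0, x1, y1)` tuples, "closest" is taken to mean the smallest distance between box centres, and "perfectly aligned" is approximated by an intersection-over-union threshold (`min_iou`, a hypothetical parameter).

```python
import math


def center(box):
    """Centre point of an (x0, y0, x1, y1) box."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def label_words(pdf_words, captured_boxes, min_iou=0.9):
    """Attach PDF text to captured boxes, keeping only well-aligned pairs."""
    labeled = []
    for text, pdf_box in pdf_words:
        if not captured_boxes:
            break
        cx, cy = center(pdf_box)
        # Closest captured box by centre distance.
        nearest = min(
            captured_boxes,
            key=lambda b: math.hypot(center(b)[0] - cx, center(b)[1] - cy),
        )
        # Discard the pair unless the two boxes overlap almost completely.
        if iou(pdf_box, nearest) >= min_iou:
            labeled.append((nearest, text))
    return labeled


# Hypothetical boxes: the third captured box is displaced and gets rejected.
pdf_words = [
    ("This", (10, 10, 50, 30)),
    ("is", (60, 10, 80, 30)),
    ("National", (90, 10, 170, 30)),
]
captured_boxes = [(10, 10, 50, 30), (60, 11, 80, 30), (100, 14, 180, 34)]
labeled = label_words(pdf_words, captured_boxes)
```

Rejecting doubtful pairs trades dataset size for label accuracy, which suits a setting where the supply of captured pages is large.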

Dataset creation
Document images were captured with a few different cameras
Documents include proceedings, books, magazines and articles
Word and character images were automatically groundtruthed
[Figure: obtained degraded word images; obtained character images]

Evaluation
50,000 word images were randomly selected from one million images
Manual counting revealed that the accuracy was 99.98%
The errors were mainly caused by wrong alignment of bounding boxes

Contribution
A fully automatic groundtruthing method for word and character images in camera-captured documents is proposed
One million word images were groundtruthed
Accuracy: 99.98%
Amazingly high for a fully automated method
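
To put the reported evaluation figure in context, the arithmetic can be worked through as below. The slides give only the sample size (50,000) and the accuracy (99.98%); the error count is derived from that rounded percentage, and the Wilson score interval is an added illustration that the slides do not report.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


sample_size = 50_000          # from the slides
reported_accuracy = 0.9998    # from the slides
# Implied counts, assuming the reported percentage is not heavily rounded.
correct = round(sample_size * reported_accuracy)
errors = sample_size - correct
low, high = wilson_interval(correct, sample_size)
print(f"implied errors: {errors}, approx. 95% interval: {low:.4%} to {high:.4%}")
```

The interval describes sampling uncertainty only; it says nothing about whether the 50,000 sampled words are representative of other cameras or document types.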

Workaround of groundtruthing
Synthetic approach with degradation models [Ishida, ICDAR2005] [Tsuji, KJPR2008]
Questionable to say this represents real degradation

Degradation
Words at border
Partially missing
Can increase confusion between characters: marked with special flag