
On the Design and Performance of ELC and APB Algorithms for the Reconstruction of Shredded Documents

R. Lotus¹, Justin Varghese²

¹ Centre for Information Technology & Engineering, Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India

² College of Computer Science, King Khalid University, Abha, Saudi Arabia

Abstract: Reconstruction of hand-torn paper documents is a challenging task in forensic and investigative sciences. In this paper, the design aspects of two important reconstruction algorithms for hand-torn documents, proposed by Edson Justino, Luiz S. Oliveira and Cinthia Freitas (ELC) and by Arindam Biswas, Partha Bhowmick and Bhargab B. Bhattacharya (APB), are analysed, their reconstruction results are compared, and the merits of the two algorithms are discussed.

I. INTRODUCTION

Documents store, organize and elucidate information for the education, enlightenment and enrichment of civilization. Reconstruction of torn documents is essential for recovering the information they contain, and it has wide application in forensic sciences, art conservation and archaeology [10]. Documents deteriorate due to insects, moisture, temperature, humidity, constant handling, obliteration and shredding. Shredding can be performed by machine or by hand. Manual reconstruction of a shredded document is a time-consuming job that needs the hard work of experienced personnel. Digitization makes the job easier, and automating the reconstruction through image processing algorithms yields an effective solution.

Wolfson [1] proposed an efficient two-curve matching algorithm for puzzle solving, in which boundaries are represented by shape feature strings obtained by polygonal approximation. Kong and Kimia [2] resampled the boundaries using polygonal approximation to reduce the complexity of curve matching, and used dynamic programming to align fragments. Recently, fields such as art conservation and archaeology have adopted jigsaw puzzle solving techniques to reconstruct wall paintings of ancient buildings [12], [6] and pottery fragments [11], [7]. Justino et al. [5] proposed an algorithm to reconstruct hand-shredded paper documents, in which features extracted from a simplified polygon determine the matching pieces. A. Biswas et al. [8] proposed a method that uses the chain code of the contours of the fragments and its Minkowski sum for reconstruction. A. Pimenta et al. [9] proposed an algorithm to reconstruct hand-shredded paper documents in which the features extracted from a simplified polygon are fed into a longest common subsequence (LCS) dynamic programming algorithm; the LCS scores are then used in a modified Prim's algorithm to determine the matching fragments. L. Zhu et al. [14] proposed a global approach to the consistent reconstruction of ripped-up documents.

The proposed work analyses the design aspects of the important hand-torn document reconstruction algorithms proposed by Edson Justino, Luiz S. Oliveira and Cinthia Freitas [5] (ELC algorithm) and by Arindam Biswas, Partha Bhowmick and Bhargab B. Bhattacharya [8] (APB algorithm), compares their reconstruction results and describes the merits of the two algorithms.

This paper is organised into five sections. Section II explains the general methodology for the reconstruction of hand-torn documents. Section III describes the design aspects of the ELC and APB reconstruction algorithms. Experimental results and a comparative analysis are provided in Section IV. Section V concludes the paper.

II. GENERAL METHODOLOGY FOR THE RECONSTRUCTION OF SHREDDED DOCUMENTS

A general methodology for the reconstruction of shredded documents is illustrated in Fig. 1. It consists of pre-processing, feature extraction and reconstruction stages. Pre-processing the scanned images of the torn fragments is essential for extracting adequate and effective features. Typical pre-processing operations are contour tracing [8] and contour simplification [5]. The features to be extracted from the pre-processed fragment images, such as the distances, angles and colours at the vertices, depend on the reconstruction principle of the particular algorithm. Fragments that satisfy the matching criteria are merged, and the merging is repeated until the original document is reconstructed.

Fig. 1. General methodology for document reconstruction. Flowchart stages: scanned images of input fragments; pre-processing; feature extraction; matching criteria (Yes/No decision); merge the two fragments and form a new fragment; reconstructed document.


III. ELC ALGORITHM

The ELC algorithm was proposed by Edson Justino, Luiz S. Oliveira and Cinthia Freitas [5] for the reconstruction of hand-shredded documents. The stages of the algorithm are explained in the following steps:

Step 1: Pre-processing. The scanned images of hand-shredded document fragments have irregular boundaries. The Douglas-Peucker (DP) polyline simplification algorithm [3] is applied to the fragments' contours to obtain a well-defined, simplified polyline boundary with reduced irregularities. The simplification reduces the number of vertices in a fragment's contour and produces a simplified polygon that approximates the original contour shape. DP uses the closeness of the contour vertices to an edge of the polygon. The simplification starts with an initial edge segment guessed between the initial vertex and the last vertex of the contour, and the intermediate vertices are checked for closeness to that edge segment. Contour vertices whose distance from the edge segment exceeds the specified tolerance are retained and form edges of the simplified polygon, while vertices that lie closer than the tolerance are discarded. The procedure is repeated on the resulting edge segments until all the contour vertices fall within the specified tolerance. Finally, the chosen vertices form a polygon which approximates the original contour shape.
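The simplification in Step 1 can be illustrated with a minimal Python sketch of the Douglas-Peucker recursion. The function names, the use of point-to-segment (rather than point-to-line) distance, and the treatment of the contour as an open polyline are assumptions made for illustration; they are not taken from [5].

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # a and b coincide
        return math.hypot(p[0] - a[0], p[1] - a[1])
    # Projection parameter of p on the segment, clamped to [0, 1].
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def douglas_peucker(points: List[Point], tolerance: float) -> List[Point]:
    """Simplify an open polyline; vertices within `tolerance` are dropped."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the intermediate vertex farthest from the edge first-last.
    far_index, far_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > far_dist:
            far_index, far_dist = i, d
    if far_dist <= tolerance:
        # Every intermediate vertex is close enough: keep only the edge.
        return [first, last]
    # Otherwise keep the farthest vertex and simplify both halves.
    left = douglas_peucker(points[: far_index + 1], tolerance)
    right = douglas_peucker(points[far_index:], tolerance)
    return left[:-1] + right  # drop the duplicated split vertex


contour = [(0, 0), (1, 0.05), (2, 0), (3, 2), (4, 0)]
simplified = douglas_peucker(contour, tolerance=0.5)
```

For a closed fragment contour, one possible approach is to split the contour at two far-apart vertices and simplify each half separately; the source does not say how this is handled.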

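Step 2: Feature extraction. The vertices of the pre-processed, simplified polygon are subjected to the feature extraction process. For each vertex $v_i = (x_i, y_i)$ of the simplified polygon, where $i = 1, \dots, N$ and $N$ is the number of vertices, the feature extraction process computes the Euclidean distance $d_i^{\mathrm{prev}}$ between the vertex and the previous vertex, the Euclidean distance $d_i^{\mathrm{next}}$ between the vertex and the next vertex, and the angle $\alpha_i$ formed at the vertex by these two sides, such that

$$d_i^{\mathrm{prev}} = \sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2} \qquad (1)$$

$$d_i^{\mathrm{next}} = \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2} \qquad (2)$$

$$\alpha_i = \cos^{-1}\left(\frac{(x_{i-1} - x_i)(x_{i+1} - x_i) + (y_{i-1} - y_i)(y_{i+1} - y_i)}{d_i^{\mathrm{prev}} \, d_i^{\mathrm{next}}}\right) \qquad (3)$$

where $(x_i, y_i)$ are the co-ordinates of the current vertex, $(x_{i-1}, y_{i-1})$ are the co-ordinates of the previous vertex and $(x_{i+1}, y_{i+1})$ are the co-ordinates of the next vertex. Thus, for every vertex, the feature list contains the vertex coordinate position, the distances to the previous and next neighbours, and the angle, as shown in Table 1.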
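The per-vertex features of Step 2 can be computed as in the following Python sketch. The names `VertexFeature` and `vertex_features` are hypothetical. Since Step 3 requires matching angles to sum to 360 degrees, the sketch assumes that the interior angle is reported, i.e. that concave vertices receive 360 degrees minus the angle between the two sides; the source does not spell out this detail. The polygon is assumed to have no repeated vertices.

```python
import math
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class VertexFeature(NamedTuple):
    x: float
    y: float
    d_prev: float  # distance to the previous vertex
    d_next: float  # distance to the next vertex
    angle: float   # interior angle in degrees (assumption: 0 < angle < 360)


def signed_area(polygon: List[Point]) -> float:
    """Shoelace formula; positive for counter-clockwise vertex order."""
    n = len(polygon)
    return 0.5 * sum(
        polygon[i][0] * polygon[(i + 1) % n][1]
        - polygon[(i + 1) % n][0] * polygon[i][1]
        for i in range(n)
    )


def vertex_features(polygon: List[Point]) -> List[VertexFeature]:
    n = len(polygon)
    ccw = signed_area(polygon) > 0
    features = []
    for i in range(n):
        x, y = polygon[i]
        px, py = polygon[i - 1]        # previous vertex (wraps around)
        nx, ny = polygon[(i + 1) % n]  # next vertex (wraps around)
        d_prev = math.hypot(x - px, y - py)
        d_next = math.hypot(nx - x, ny - y)
        # Angle between the two sides meeting at the vertex.
        dot = (px - x) * (nx - x) + (py - y) * (ny - y)
        cos_a = max(-1.0, min(1.0, dot / (d_prev * d_next)))
        angle = math.degrees(math.acos(cos_a))
        # Turn direction from the incoming to the outgoing side decides
        # whether the vertex is concave for the given orientation.
        cross = (x - px) * (ny - y) - (y - py) * (nx - x)
        concave = cross < 0 if ccw else cross > 0
        if concave:
            angle = 360.0 - angle
        features.append(VertexFeature(x, y, d_prev, d_next, angle))
    return features


# L-shaped polygon: the vertex (1, 1) is concave.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
l_features = vertex_features(l_shape)
```

Each returned tuple corresponds to one row of the feature list: coordinate position, previous distance, next distance and angle.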

Step 3: Matching. The vertex features extracted in the previous step determine how likely it is that any two fragments being compared match. The matching criterion requires the angles at the compared vertices of the two polygons to be complementary, i.e., the angle at a vertex of the polygon of fragment i and the angle at a vertex of the polygon of fragment j must sum to 360°. If the angles sum to 360°, a matching parameter is set to 1. The previous and next distances of the vertex of fragment i are then compared with the previous and next distances of the vertex of fragment j.

The fragments considered for matching are allotted a matching degree such that

(4)

The matching degree is increased if the polygons of fragments i and j under consideration satisfy certain degrees of matching, as follows:

(5)
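The following Python sketch shows one way the angle and distance tests of Step 3 could be combined into a matching degree. The scoring weights (1 for complementary angles plus 1 for each agreeing side), the tolerances, and the function names are illustrative assumptions and are not the values used in [5]. The sides are compared in the same order (previous with previous, next with next), as the text describes; whether a crossed comparison is needed depends on the traversal orientation of the two polygons.

```python
from typing import Sequence, Tuple

# One feature triple per vertex: (d_prev, d_next, angle in degrees).
Feature = Tuple[float, float, float]


def vertex_pair_score(
    a: Feature, b: Feature, angle_tol: float = 2.0, dist_tol: float = 2.0
) -> int:
    """Score one vertex of fragment i against one vertex of fragment j."""
    # The angles must be complementary: they must sum to 360 degrees.
    if abs(a[2] + b[2] - 360.0) > angle_tol:
        return 0
    score = 1  # matching parameter set to 1 for complementary angles
    if abs(a[0] - b[0]) <= dist_tol:  # previous distances agree
        score += 1
    if abs(a[1] - b[1]) <= dist_tol:  # next distances agree
        score += 1
    return score


def matching_degree(
    frag_i: Sequence[Feature],
    frag_j: Sequence[Feature],
    angle_tol: float = 2.0,
    dist_tol: float = 2.0,
) -> int:
    """Accumulate the vertex-pair scores over all vertex pairs."""
    return sum(
        vertex_pair_score(a, b, angle_tol, dist_tol)
        for a in frag_i
        for b in frag_j
    )


corner = (10.0, 20.0, 90.0)
notch = (10.0, 20.0, 270.0)        # complementary angle, both sides agree
loose_notch = (10.0, 25.0, 270.0)  # complementary angle, one side agrees
degree = matching_degree([corner], [notch, loose_notch])
```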

Step 4: Reconstruction. Once the metric for finding matching fragments has been determined, the process enters the reconstruction phase. The algorithm compares each fragment with all other fragments and takes as the best match the pair that maximizes the matching degree. The two fragments with the maximum matching degree are merged to form a new fragment. The features of the new fragment are added to the feature list, the merged vertices are removed, and the matching process continues; i.e., if fragments i and j are merged, the polygon of the newly formed fragment is added to the fragments list and the whole reconstruction procedure is repeated from the first step for the remaining fragments.
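The greedy merge loop of Step 4 can be sketched in Python as follows. The function `greedy_reconstruct` and its parameters are hypothetical; the scoring and merging operations are passed in as callables because their details depend on the polygon representation. The `min_degree` stopping threshold is an assumption, since the source does not state when merging stops.

```python
from itertools import combinations
from typing import Callable, List, Optional, Tuple, TypeVar

Fragment = TypeVar("Fragment")


def greedy_reconstruct(
    fragments: List[Fragment],
    matching_degree: Callable[[Fragment, Fragment], float],
    merge: Callable[[Fragment, Fragment], Fragment],
    min_degree: float,
) -> List[Fragment]:
    """Repeatedly merge the pair of fragments with the highest degree."""
    pieces = list(fragments)
    while len(pieces) > 1:
        best: Optional[Tuple[float, int, int]] = None
        # Compare each fragment with all other fragments.
        for i, j in combinations(range(len(pieces)), 2):
            degree = matching_degree(pieces[i], pieces[j])
            if degree >= min_degree and (best is None or degree > best[0]):
                best = (degree, i, j)
        if best is None:
            break  # no remaining pair satisfies the matching criteria
        _, i, j = best
        merged = merge(pieces[i], pieces[j])
        # Remove the higher index first so the lower one stays valid.
        pieces.pop(j)
        pieces.pop(i)
        pieces.append(merged)  # the new fragment joins the fragments list
    return pieces


# Toy example: fragments are 1-D intervals that match when they touch.
def toy_degree(a, b):
    return 1.0 if a[1] == b[0] or b[1] == a[0] else 0.0


def toy_merge(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]))


result = greedy_reconstruct([(0, 2), (5, 9), (2, 5)], toy_degree, toy_merge, 1.0)
```

Recomputing every pairwise degree in each pass keeps the sketch short; a practical implementation would likely cache the degrees of fragments that did not change.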

IV. APB ALGORITHM

The APB algorithm was proposed by Arindam Biswas, Partha Bhowmick and Bhargab B. Bhattacharya [8] for the reconstruction of hand-shredded documents. The stages of the algorithm are explained in the following steps:


Step 1: Pre-processing. The contours of the scanned images of the f torn fragments are extracted using differential operators [4]. The corners (vertices) of each contour are detected using the bending values [13] of the discrete points constituting the contour.

Table 1. Feature list for ELC algorithm (vertex, coordinate position, previous distance, next distance, angle).

Step 2: Feature extraction. After pre-processing, the feature list is generated for all contours. The feature list for the APB algorithm includes the distance between consecutive corners, determined in the clockwise direction. The distance between consecutive corners $c_k = (x_k, y_k)$ and $c_{k+1} = (x_{k+1}, y_{k+1})$ of an individual contour is calculated in the clockwise direction as

$$d_k = \sqrt{(x_{k+1} - x_k)^2 + (y_{k+1} - y_k)^2} \qquad (6)$$

The chain code of the edge segment between $c_k$ and $c_{k+1}$ of each fragment's contour is determined. The edge identification number and the length of the chain code of the edge are also noted. With all these features, a single height-balanced AVL tree is constructed. The Euclidean distances of all edges in all contours are stored as primary keys, and the edge numbers and their respective feature lists are stored as auxiliary keys.
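The edge feature list of Step 2 can be sketched in Python as below. A sorted list stands in for the AVL tree, which is a deliberate simplification: both keep the edge distances ordered, but the list is rebuilt rather than rebalanced. The names `EdgeRecord`, `edge_records` and `build_index`, the Freeman 8-direction code table, and the assumption that the contour pixels are already listed in clockwise order are all illustrative.

```python
import math
from typing import List, NamedTuple, Tuple

Pixel = Tuple[int, int]

# Freeman 8-direction codes keyed by the (dx, dy) step between pixels.
FREEMAN = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(pixels: List[Pixel]) -> List[int]:
    """Chain code of a run of 8-connected pixels."""
    return [
        FREEMAN[(x1 - x0, y1 - y0)]
        for (x0, y0), (x1, y1) in zip(pixels, pixels[1:])
    ]


class EdgeRecord(NamedTuple):
    distance: float   # primary key: Euclidean distance between corners
    contour_id: int   # auxiliary data
    edge_id: int
    code_length: int  # length of the chain code of the edge


def edge_records(
    contour_id: int, pixels: List[Pixel], corner_indices: List[int]
) -> List[EdgeRecord]:
    """One record per edge between consecutive corners of a closed contour."""
    records = []
    k = len(corner_indices)
    for e in range(k):
        start = corner_indices[e]
        end = corner_indices[(e + 1) % k]
        # Pixel run from one corner to the next, wrapping at the end.
        if end > start:
            run = pixels[start : end + 1]
        else:
            run = pixels[start:] + pixels[: end + 1]
        (x0, y0), (x1, y1) = pixels[start], pixels[end]
        distance = math.hypot(x1 - x0, y1 - y0)
        records.append(EdgeRecord(distance, contour_id, e, len(chain_code(run))))
    return records


def build_index(all_records: List[EdgeRecord]) -> List[EdgeRecord]:
    """Sorted list keyed on distance (stand-in for the AVL tree)."""
    return sorted(all_records)


# Boundary pixels of a small square with corners at indices 0, 2, 4, 6.
square_pixels = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
index = build_index(edge_records(0, square_pixels, [0, 2, 4, 6]))
```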

Step 3: Matching. After the AVL tree is constructed, for each edge-segment distance $d$ over all contours, a search is performed in the tree for distances that fall within the range $[d - \varepsilon, d + \varepsilon]$, where the tolerance $\varepsilon$ is specified in pixels. The distances found are arranged in ascending order of their difference from the distance being compared.
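The range search of Step 3 can be sketched with Python's `bisect` module on a sorted list, again used here as a stand-in for the AVL tree. The function name `candidate_edges`, the tuple layout and the tolerance value in the example are assumptions; skipping edges that belong to the query's own contour is also an assumption, since the source does not say how such hits are treated.

```python
import bisect
from typing import List, Tuple

# (distance, contour_id, edge_id), kept sorted by distance.
Entry = Tuple[float, int, int]


def candidate_edges(index: List[Entry], query: Entry, eps: float) -> List[Entry]:
    """Edges whose distance lies in [d - eps, d + eps], closest first."""
    d = query[0]
    # A one-element tuple sorts before any entry with the same distance,
    # so entries exactly at d - eps are included.
    lo = bisect.bisect_left(index, (d - eps,))
    # Infinite ids sort after any entry with distance d + eps.
    hi = bisect.bisect_right(index, (d + eps, float("inf"), float("inf")))
    hits = [
        entry
        for entry in index[lo:hi]
        if entry[1] != query[1]  # assumption: ignore the query's own contour
    ]
    # Ascending order of the difference from the query distance.
    return sorted(hits, key=lambda entry: abs(entry[0] - d))


index = sorted([
    (10.0, 0, 0), (11.9, 0, 1),
    (10.5, 1, 0), (12.1, 1, 1),
    (9.2, 2, 0), (30.0, 2, 1),
])
matches = candidate_edges(index, (10.0, 0, 0), eps=2.0)
```

The first element of the returned list is the candidate that Step 4 would test first.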

Step 4: Reconstruction. In the reconstruction phase, the best match for an edge $e_a$ of contour $C_a$ is the edge $e_b$ of contour $C_b$ that corresponds to the first distance in the ascending array of distances. The Minkowski sum of the edge $e_a$ with a circular disc $D$ of radius $r$ is formed, which defines an envelope around $e_a$. The required transformations, such as translation and rotation, are performed on the contours. If the edge $e_b$ is contained inside the envelope, the fragments corresponding to the edges $e_a$ and $e_b$ are matching pieces. Otherwise, the next candidate edge from the distance array is checked against $e_a$. If a match is found, a new fragment is formed which replaces the matching fragments $C_a$ and $C_b$. The newly formed contour is added to the fragments list during the second iteration of reconstruction. The first iteration continues until the above steps have been carried out on all the remaining contours.
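The envelope test of Step 4 can be sketched in Python using the fact that the Minkowski sum of a polyline with a disc of radius r is the set of points lying within distance r of the polyline. The function names are hypothetical. The sketch assumes that the candidate edge has already been translated and rotated onto the reference edge, and it only tests the sampled points of the candidate edge, not the segments between them.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def inside_envelope(
    candidate: List[Point], reference: List[Point], radius: float
) -> bool:
    """True if every candidate point lies inside the envelope.

    The envelope is the Minkowski sum of the reference polyline
    (at least two points) and a disc of the given radius.
    """
    segments = list(zip(reference, reference[1:]))
    return all(
        min(point_segment_distance(p, a, b) for a, b in segments) <= radius
        for p in candidate
    )


reference_edge = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
close_edge = [(0.0, 0.5), (5.0, -0.8), (10.0, 0.3)]
far_edge = [(0.0, 0.5), (5.0, 2.0), (10.0, 0.3)]
is_match = inside_envelope(close_edge, reference_edge, radius=1.0)
is_mismatch = not inside_envelope(far_edge, reference_edge, radius=1.0)
```

A larger radius tolerates more acquisition noise but also accepts more false matches; the radius used in [8] is not given in the text.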

IV. EXPERIMENTAL RESULTS

The experiments on the reconstruction of shredded documents were implemented in MATLAB on an Intel(R) Core(TM) 2.53 GHz machine. The manually shredded test documents used for the ELC and APB algorithms are shown in Fig. 2 and Fig. 3. The shredded documents were reconstructed by both the ELC and APB algorithms; the results are shown in Fig. 4 and Fig. 5.

Both the ELC and APB algorithms are designed to reconstruct hand-torn documents. Fragments that have jigsaw-like edge segments [Fig. 6(a)] satisfy the matching criteria of the ELC algorithm. In this algorithm, the angles at the two vertices under consideration must sum to 360°, and the respective previous and next distances of the two fragments must be equal. Natural shredding does not always produce fragments of the type that can be reconstructed with the matching criteria of the ELC algorithm. The fragments of test image 2, although they are matching fragments, are therefore not merged by the ELC algorithm, since they do not satisfy its matching criteria. The degrees of freedom allowed in matching the angles are also not stated precisely. With the normal human tearing style, the probability of obtaining hand-torn fragments of the kind shown in Fig. 6(a) is low. The ELC algorithm therefore cannot produce a reconstruction result for the shredded document of test image 2.

The APB algorithm uses the distances between the edge segments of different fragments in the matching phase. Fragments whose edge lengths between vertices differ by no more than a predefined threshold are taken as matching fragments by the APB algorithm and are merged. Matching fragments suffer from a variance of one or two pixels due to acquisition defects or pre-processing techniques. Although the pre-processing steps of the APB algorithm introduce such one- or two-pixel variances, the distance matching criterion with the predefined threshold overcomes this limitation. Since the reconstruction stage of the APB algorithm deals only with distance, it works with any type of hand-torn fragments.

Fig. 6(b). Fragments that do not satisfy the matching criteria of the ELC algorithm.

Comparing the two algorithms, the ELC algorithm, although it passes through improved pre-processing stages, is limited to the reconstruction of specific fragments such as those illustrated in Fig. 6(a), because of its strict angle-matching criterion. The APB algorithm, in contrast, reconstructs all kinds of hand-shredded documents, as shown in Fig. 6(a) and (b). The paper therefore suggests the APB algorithm as a general-purpose algorithm for the reconstruction of hand-torn document fragments.

V. CONCLUSION

This work analyses the design aspects and the reconstruction results of the ELC and APB algorithms for hand-torn fragments. The ELC algorithm reconstructs only fragments of a specific tearing style, whereas the APB algorithm reconstructs hand-torn fragments of all tearing styles. Future work will focus on a novel technique for the reconstruction of hand-torn fragments with lower time complexity.

REFERENCES

1. H. Wolfson, “On curve matching,” IEEE Trans. Pattern Anal. and Machine Intell., vol. 12, pp. 483–489, 1990.

2. W. Kong and B. Kimia, “On solving 2D and 3D puzzles under curve matching,” in CVPR, 2001, pp. 583–590.

3. D. Douglas and T. Peucker, “Algorithms for the reduction of the number of points required to represent a digitized line or its caricature,” The Canadian Cartographer, vol. 10, pp. 112–122, 1973.

4. R. C. Gonzalez and R. E. Woods, Digital Image Processing, Addison-Wesley Pub. Co., 1993.

5. E. Justino, L. S. Oliveira, and C. Freitas, “Reconstructing shredded documents through feature matching,” Forensic Science Intern., vol. 160, 2005.

6. C. Papaodysseus, T. Panagopoulos, M. Exarhos, C. Triantafillou, D. Fragoulis, and C. Doumas, “Contour-shape based reconstruction of fragmented, 1600 B.C. wall paintings,” IEEE Signal Processing, vol. 50, pp. 1277–1288, 2002.

7. H. C. G. Leitão and J. Stolfi, “A multiscale method for the reassembly of two-dimensional fragmented objects,” IEEE Trans. PAMI, vol. 24, pp. 1239–1251, 2002.

8. A. Biswas, P. Bhowmick, and B. B. Bhattacharya, “Reconstruction of torn documents using contour maps,” in IEEE Int. Conf. on Image Processing, 2005.

9. A. Pimenta, E. Justino, L. S. Oliveira, and R. Sabourin, “Document Reconstruction using Dynamic Programming,” IEEE Acoustics, Speech and Signal Processing, 2009.

10. F. Kleber and R. Sablatnig, “A Survey of Techniques for Document and Archaeology Artefact Reconstruction,” IEEE Document Analysis and Recognition, 2009.

11. M. Kampel and R. Sablatnig, “On 3D mosaicing of rotationally symmetric ceramic fragments,” IEEE, 2004.

12. C. Papaodysseus, M. Exarhos, M. Panagopoulos, P. Rousopoulos, C. Triantafillou, and T. Panagopoulos, “Image and pattern analysis of 1650 B.C. wall paintings and reconstruction,” IEEE Trans. Systems, Man and Cybernetics, vol. 38, no. 4, July 2008.

13. M.-J. J. Wang, W.-Y. Wu, L.-K. Huang, and D.-M. Wang, “Corner detection using bending value,” Patt. Rec. Letters, 1995, pp. 575–583.

14. L. Zhu, Z. Zhou, and D. Hu, “Globally Consistent Reconstruction of Ripped-Up Documents,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 1, January 2008.

Fig. 2. Test image 1

Fig. 3. Test image 2

Fig. 4. Reconstruction result of ELC algorithm

Test image 1

Test image 2

Fig. 5. Reconstruction results of APB algorithm

Test image 1

Fig. 6(a). Fragments that satisfy the matching criteria of the ELC algorithm.