MASTER’S THESIS

WATERMARK SYNCHRONIZATION IN CAMERA PHONES AND SCANNING DEVICES

Pramila A. (2007) Watermark synchronization in camera phones and scanning devices. Department of Electrical and Information Engineering, University of Oulu, Oulu, Finland. Master’s Thesis, 83p.

ABSTRACT

The development of the Internet and of numerous hardware and software applications has created a need for copy and copyright protection of content. One possible way to meet this need is watermarking. The idea of digital image watermarking is to hide information in an image so that the human eye cannot detect the changes made to the image, but a computer can read the hidden information. This work focuses on the use of watermarking in value-adding services instead of copy and copyright protection applications. Value-adding services offer extra services to the user. The user therefore does not want to remove the embedded information but prefers to read it and gain access to the hidden data. In this work, two watermarking methods for digital images were developed and analysed. The first method is robust against a print-scan attack, where the image is printed and then scanned before extraction of the watermark. The second method is robust against a print-cam attack, where the image is first printed and then captured with a camera phone.

In both methods, multiple watermarking is applied, that is, several watermarks are embedded in the same image. Here, two or three watermarks are embedded in the image, one of which is a so-called message watermark that contains a service message such as a link to a website or extra information about the image. The other watermarks are embedded in order to correct the geometrical distortions inflicted on the image by the scanning phase.

The results obtained were promising and the success ratios were high for both methods. Both methods were tested against various unintentional attacks, and it was concluded that both are robust against geometrical distortions and JPEG compression.

Key words: multiple watermarking, value-adding services, digital image watermarking, watermarking methods

Pramila A. (2007) Vesileiman synkronointi kamerapuhelimissa ja skannereissa. Oulun yliopisto, sähkö- ja tietotekniikan osasto. Diplomityö, 83 s.

TIIVISTELMÄ Internetin kehittyminen ja lukuisat laitteisto- ja ohjelmistosovellukset ovat luoneet tarpeen sisällön kopio- ja tekijänoikeuksien suojaamiselle. Yksi mahdollinen ratkaisu ongelmaan on vesileimauksen käyttäminen. Digitaalisen kuvan vesileimauksen idea on piilottaa informaatiota kuvaan siten, että ihmissilmä ei pysty erottamaan muutoksia, joita kuvaan on tehty, mutta tietokone pystyy lukemaan piilotetut tiedot. Tässä työssä keskitytään kopio- ja tekijänoikeuksien suojaussovellusten sijaan vesileimauksen käyttämiseen lisäarvopalveluissa. Lisäarvopalvelut tarjoavat käyttäjälle ylimääräisiä palveluita ja siten käyttäjä ei halua poistaa upotettua informaatiota vaan lukea sen ja päästä käsiksi piilotettuun aineistoon. Työssä kehitetään ja analysoidaan kaksi vesileimausmenetelmää digitaalisille kuville. Ensimmäinen vesileimausmenetelmä kestää kuvan tulostus- ja lukuhyökkäyksen, jossa kuva ensin tulostetaan ja sitten luetaan skannerilla ennen vesileiman irrottamista. Toinen menetelmä kestää tulostus- ja kuvanottohyökkäyksen, jossa kuva ensin tulostetaan ja tulostetusta kuvasta otetaan kuva kamerapuhelimella.

Molemmissa menetelmissä käytetään monivesileimausmenetelmiä, joissa kuvaan upotetaan useita vesileimoja. Tässä tapauksessa kuviin upotetaan kahdesta kolmeen vesileimaa, joista yksi on niin sanottu viestivesileima, joka sisältää palveluviestin kuten linkin nettisivulle tai ylimääräistä tietoa kuvasta. Muut vesileimat upotetaan jotta kuvaan lukuvaiheessa tulleet geometriset virheet saadaan korjattua.

Saadut tulokset olivat lupaavia ja onnistumissuhde suuri molemmissa menetelmissä. Molempia menetelmiä testattiin useilla tahattomilla hyökkäyksillä ja lopuksi voitiin päätellä, että molemmat menetelmät kestävät geometrisiä hyökkäyksiä sekä JPEG-pakkauksen. Avainsanat: monivesileimaus, lisäarvopalvelut, digitaalisen kuvan vesileimaus, vesileimausmenetelmät

TABLE OF CONTENTS

ABSTRACT
TIIVISTELMÄ
TABLE OF CONTENTS
FOREWORD
LIST OF ABBREVIATIONS AND SYMBOLS
1. INTRODUCTION
    1.1. History of watermarking
    1.2. Applications and scenarios
    1.3. Research problem
2. TRADEOFFS IN WATERMARKING
    2.1. Performance considerations
    2.2. Imperceptibility
        2.2.1 Visible and invisible watermarks
        2.2.2 JND
    2.3. Robustness
        2.3.1 Robust and fragile watermarks
        2.3.2 Printed images
        2.3.3 Print-scan
        2.3.4 Mobile phone with a camera
        2.3.5 Geometrical attacks
        2.3.6 JPEG transform
3. WATERMARKING METHODS
    3.1. Generic watermarking scheme
    3.2. Domains
        3.2.1 Spatial
        3.2.2 Fourier domain based methods
        3.2.3 Wavelet domain based methods
        3.2.4 Methods in other domains
    3.3. Multiple watermarking
4. COMMERCIAL DEVELOPMENT
    4.1. Digimarc
    4.2. CyberSquash
    4.3. Bar codes
        4.3.1 Short history
        4.3.2 Operation
    4.4. Mobot
    4.5. Sanyo
5. PRINT-SCAN RESILIENT WATERMARKING
    5.1. Frequency domain template
        5.1.1 Embedding
        5.1.2 Extracting
    5.2. Spatial domain template
        5.2.1 Embedding
        5.2.2 Extracting
    5.3. Wavelet domain multibit message
        5.3.1 Embedding
        5.3.2 Extracting
    5.4. Experiments and results
    5.5. Discussion
6. PRINT-CAM RESILIENT WATERMARKING
    6.1. Frame detection method
        6.1.1 Embedding
        6.1.2 Extracting
    6.2. Experiments and results
    6.3. Discussion
7. DISCUSSION
8. CONCLUSION
9. REFERENCE
10. APPENDIXES

FOREWORD

This work was done in the MediaTeam Oulu research group at the Information Processing Laboratory, University of Oulu, Finland. The work was part of the Zirion project, which focuses on the use of watermarking in value-adding services and on reading watermarks with camera phones. I would like to sincerely thank all the people who have helped me with this work. In particular, I would like to thank Professor Tapio Seppänen for all the great advice and instructions during the work and M.Sc. Anja Keskinarkaus for all the ideas and help. A huge thank you also belongs to my friends and fellow students with whom I was able to share my thoughts and fears about the work. I would also like to thank my parents for always being there for me and my brother Jani for cheering up my days.

Oulu 8.2.2007
Anu Pramila

LIST OF ABBREVIATIONS AND SYMBOLS

AC        Alternating Current
BCH       Bose-Chaudhuri-Hocquenghem. Error correction code
BER       Bit Error Ratio
CCITT     International Telegraph and Telephone Consultative Committee
CMOS      Complementary Metal Oxide Semiconductor
COP       Centre of Projection
DA/AD     Digital to Analog / Analog to Digital transform
DC        Direct Current
DCT       Discrete Cosine Transform
DFT       Discrete Fourier Transform
DRM       Digital Rights Management
DS-SS     Direct Sequence Spread Spectrum, modulation method
DWT       Discrete Wavelet Transform
EAN.UCC   International-Uniform Code Council
GF        Galois Field
GS        Guided Scrambling
HP        Hewlett-Packard
HSI       Hue, Saturation, Intensity. A colour space.
HVS       Human Visual System
IEC       International Electrotechnical Commission
ISO       International Organization for Standardization
ITU       International Telecommunication Union
JND       Just Noticeable Difference
JPEG      Joint Photographic Experts Group
MSE       Mean Squared Error
NTT       Nippon Telegraph and Telephone Corporation
PC        Personal Computer
PSNR      Peak Signal to Noise Ratio
PSPNR     Peak Signal to Perceptible Noise Ratio
UPC       Universal Product Code
URL       Uniform Resource Locator

1. INTRODUCTION

The history of data hiding is long and the range of applications wide. This chapter takes a brief look at the history of data hiding and watermarking and at the applications invented. The research problem is then presented and some scenarios are considered.

1.1. History of watermarking

Entering the digital era has brought along many problems to be solved and questions to be asked. Distributing content, such as music, text, images or video, is now easier than ever before. With a few clicks, content is transferred from one user to another, but content providers are troubled: with those few clicks, a great deal of data and content that should not move also moves, and coins that should drop into the cash machine fail to do so. Some time ago, watermarking was proposed as a way to keep content from moving too freely. The idea was that by hiding information in the content, piracy and illegal copying could be stopped, or at least the people behind them could be caught. It has been said that one picture is worth a thousand words, but few people know that a picture may truly contain a thousand words, hidden beneath the surface.

Since ancient times the idea of data hiding has been used in several different applications, some of which are still in use. Hiding information from enemies has always been vital in wars, and finding the enemy's secret messages has been even more important for changing the course of a battle. Little did the warriors back in the time of the Roman Empire know that their efforts in hiding and uncovering messages would lead to the development of early watermarking technologies.

The first ‘real’ paper watermarks were introduced in the late 1200s in Italy, though the watermarking technology had been invented a thousand years earlier in China. By the eighteenth century paper watermarks were widely used in trademarking and to record manufacturing dates. At that time, watermarks also began to be used in money and other documents for copy protection. [1] The first digital watermarking proposals were made around the late 1970s and the early 1980s, but the term ‘digital watermarking’ itself was not introduced until the late 1980s [1]. Since then, hundreds of different digital watermarking methods have been proposed for hiding information in digital content such as audio, images and video.

Not long ago, digital watermarking was seen as a solution to many copy and copyright protection problems. There are, however, many critics of digital watermarking who argue that watermarks decrease the quality of the content, are easily destroyed and bring more harm to ordinary consumers than is acceptable. They forget, however, that copyright protection is but one of many kinds of watermarking applications and that even the copyright protection algorithms are still under development.

1.2. Applications and scenarios

Through the ages, most watermarking technologies have been developed to protect content from illegal and unauthorized use. This is still the most important application area, but watermarking is capable of much more. Some service providers have realized this, and a new generation of watermarking technologies is rising. In addition to copy and copyright protection, watermarking can be used in many ways. It can be employed in fraud detection to tell whether the content has been changed. It can be used for embedding metadata in the content to tell, for example, who made the image or who composed the song. It can also be employed for commercial purposes in value-adding services, where the embedded information is beneficial to the user. This thesis concerns the last-mentioned type of application and asks what kind of watermarking is required in a mobile phone environment in order to produce value-adding services.

MediaTeam Oulu research group has launched a project called Zirion, where the embedded information is so-called caption data, which offers the user more information about the host media content. The information can be a link to further information, a link to another service, confidential textual data, data of another media type such as an audio file, or even a functional command. The embedded information is therefore beneficial to the user, and the main presumption is that the user does not want to destroy it, so intentional attacks are not expected.

Although intentional attacks may not appear, some unintentional attacks must be considered. For example, when the watermark is read with a camera phone, digital-to-analog-to-digital (DA/AD) conversion, JPEG compression and some geometrical distortions are almost bound to happen. DA/AD conversion and JPEG compression add noise to the signal, and geometrical attacks change the positions of the image pixels, which makes detecting the watermark difficult. These kinds of attacks define the requirements for the robustness of the watermarking method used.

A possible scenario is shown in Figure 1. In the scenario, a user flips through a catalogue and, when she sees an interesting piece of merchandise, orders it by taking a picture of it with her camera phone. The program in the phone processes the image and extracts the watermark, which contains the product information. Soon, the user receives a parcel by post with the merchandise in it.

Another scenario could be linked with periodicals. A watermark can be embedded in an image of a famous rock band in a magazine. The watermark may contain a link to a website for ordering tickets to a concert or a ring tone from the album. The future scenarios are limited only by imagination.

1.3. Research problem

In print-cam robust watermarking, the watermark should be readable with a camera phone or digital camera by taking a picture from the watermarked image and then processing it. The aim of the research is focused on finding a method that could recover from geometrical distortions such as rotation, scaling and translation in an environment with high level of distortions.

The research done in the field of value-adding watermarks is limited, and only a few papers have been published on reading watermarks with camera phones. The aim of this work is to invent new methods for value-adding watermarking.

Figure 1. An example of an application for reading watermarks with a camera phone.

For simplicity, a watermarking algorithm was first created for the print-scan environment, where most of the attacks are similar to those of the mobile phone environment. The biggest difference between the print-scan process and taking a picture with a mobile phone camera is that the print-scan process results in a two-dimensional problem, whereas taking a picture with a camera phone is clearly a three-dimensional problem.
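The difference between the two settings can be illustrated with coordinate transforms. The following sketch is not taken from the thesis. It assumes that the print-scan distortion can be modelled as a two-dimensional rotation, scaling and translation, and that the print-cam distortion can be modelled as a 3x3 projective mapping, as happens when a flat print is photographed at an angle. All function names and parameter values are hypothetical.

```python
import math

def rst_transform(point, angle_deg, scale, tx, ty):
    """Rotate, scale and translate a 2D point (assumed print-scan model)."""
    x, y = point
    a = math.radians(angle_deg)
    xr = scale * (x * math.cos(a) - y * math.sin(a)) + tx
    yr = scale * (x * math.sin(a) + y * math.cos(a)) + ty
    return (xr, yr)

def projective_transform(point, h):
    """Apply a 3x3 projective matrix h, given as a list of rows
    (assumed print-cam model)."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)

def side_length(p, q):
    """Euclidean distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

# Corners of a 100 x 100 image, listed clockwise from the origin.
corners = [(0, 0), (100, 0), (100, 100), (0, 100)]

# Print-scan: opposite sides keep equal length, only pose and size change.
scanned = [rst_transform(c, 90, 2.0, 10, 0) for c in corners]

# Print-cam: a hypothetical tilt makes the right edge shorter than the left.
H_TILT = [[1, 0, 0], [0, 1, 0], [0.002, 0, 1]]
photographed = [projective_transform(c, H_TILT) for c in corners]
```

Under the first model a square stays a square, so three parameters (angle, scale, shift) describe the distortion. Under the second model the side lengths of the photographed square differ, which is why the camera phone case needs more synchronization information.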

The research work was done in a PC environment using Matlab®. The watermarking algorithm was tested with a camera phone by sending the picture of a watermarked image to the PC and processing it there. Some restrictions were necessary, and consequently the watermarking methods were not tested under different lighting conditions, paper qualities or image resolutions. Also, the area around the image was left blank to simplify the extraction process.

In this work the term ‘print-cam process’ is used of a process where the watermarked image is printed on a paper and then read with a camera phone by taking a picture of the watermarked image. In some previous papers, ‘print-capture process’ has been used, but the term is somewhat misleading: ‘capturing’ is not specific to cameras but may mean many kinds of image reading, such as filming.

The contents of this thesis are as follows: Chapter 2 explains some properties of watermarks and describes some of the attacks that could be encountered in this kind of application. Chapter 3 gives an introduction to different watermarking technologies in a form of a literature survey. Chapter 4 extends the information in Chapter 3 and gives some examples of commercial applications that are already in use. Chapter 5 proposes a method for value-adding watermarks that is robust against the print-scan process and Chapter 6 proposes another method that is robust against the print-cam process.

2. TRADEOFFS IN WATERMARKING

All digital watermarks have three properties: imperceptibility, that is, how visible the watermark is; robustness, which means how well the watermark resists attacks; and capacity, which tells how much information can be embedded in the image with the watermarking technology in question. These properties are, however, conflicting, and no watermark can have high levels in each at the same time. Here I shall explain the tradeoffs involved and each property in more detail.

2.1. Performance considerations

The idea of watermarking is to embed information in the host data by applying minor changes in such a manner that the human eye cannot perceive the embedded data. The watermark information can be extracted afterwards from the host data by detecting these modifications. There is a wide range of possible modifications that can be made to the host data. The host data can also be transformed to another domain, such as the Fourier, the wavelet, the DCT (discrete cosine transform), or even some fractal domain, where the properties of the transform domain can be exploited. [2] Transform domains will be discussed in more detail in Chapter 3.

Each watermark should have three commonly acknowledged properties: imperceptibility, capacity and robustness. All of these are important requirements for a good watermark, but no watermark can have high levels in all of them at the same time, and therefore some trade-off is made between the three. The trade-off is illustrated in Figure 2, where the properties are arranged in a triangle. The yellow point shows the chosen trade-off for an application where imperceptibility is not highly important, but where capacity and robustness are valued highly.

Figure 2. Triangle of tradeoffs while choosing watermark properties. The corners of the triangle are imperceptibility, capacity and robustness.

A watermark is described as robust if it cannot be removed from the signal where it is embedded without destroying the signal [3]. Watermarks are usually divided into fragile and robust watermarks, and the fragile watermarks are meant to break when the content is tampered with. In value-adding watermarking, the watermarks are generally considered robust but only against unintentional attacks such as geometrical distortions, lossy compression and AD/DA transforms.

Imperceptibility means how visible the watermark is. Usually imperceptibility is evaluated with respect to the human visual system (HVS), but special Just Noticeable Difference (JND) methods can also be applied [3]. In JND methods, instead of embedding the watermark in the image with a constant intensity, the strength at which the watermark is embedded is chosen for each pixel separately.

A third required property of a watermark is capacity. Capacity is the amount of data that can be included in the image by using the specified watermarking method. The capacity requirements vary between different watermarking applications and in value-adding watermarking a small capacity can be sufficient. Out of the three properties, imperceptibility and robustness are explained in more detail in the following sections.

2.2. Imperceptibility

Whether the watermark can be seen, or is only known to be there, distinguishes visible watermarking from invisible watermarking. This section first describes these two ways of using watermarks and then the JND method.

2.2.1 Visible and invisible watermarks

Usually watermarks are divided into visible and invisible watermarks, where the idea of a visible watermark in digital image processing is the same as in physical paper watermarking. The visible watermark is usually a transparent logo or the name of the copyright owner, and it is placed on top of the image. It is a simple way to show who owns the image, but it can be easily removed by cropping it off or by using some image processing tool. Therefore it should be placed somewhere on the image where it is difficult to remove without destroying the quality of the image. Invisible watermarks cannot be perceived by the human eye when comparing the original and watermarked images, but a computer with a suitable program can read them. This is possible by employing the properties of the HVS and embedding the watermark weakly enough to remain unseen but strongly enough to be robust against certain attacks.
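As a rough illustration of the two kinds of watermark, the sketch below blends a logo pixel over an image pixel for the visible case and adds a small signed offset for the invisible case. This is a minimal sketch under assumed conventions (8-bit intensities, a fixed opacity and a fixed strength). The function names and default values are hypothetical and are not the methods developed in this thesis.

```python
def embed_visible(pixel, logo_pixel, opacity=0.3):
    """Blend a semi-transparent logo pixel over an image pixel."""
    return (1.0 - opacity) * pixel + opacity * logo_pixel

def embed_invisible(pixel, bit, strength=2.0):
    """Add a small signed offset that encodes one bit, clipped to 0..255."""
    value = pixel + strength * (1 if bit else -1)
    return min(255.0, max(0.0, value))

# A mid-grey pixel under a white logo becomes clearly brighter,
# while the invisible variant changes the value by only two levels.
visible_value = embed_visible(100.0, 255.0)
hidden_one = embed_invisible(100.0, 1)
hidden_zero = embed_invisible(100.0, 0)
```

The fixed strength used here is the simplest choice. Choosing it per pixel is the subject of the JND method discussed next.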

2.2.2 JND

To be able to embed the messages as robustly as possible while keeping the watermark invisible, the properties of the HVS must be studied. The most common way to embed watermarks is to first choose a scaling factor for the strength of the watermark and to use this scaling factor throughout the entire image. This is simple and computationally efficient, but not very effective. A human does not see all colours and intensities similarly, and therefore embedding a watermark with only one coefficient does not always work as expected.

Three properties of the HVS are usually presented: frequency sensitivity, luminance sensitivity and contrast masking. Frequency sensitivity, here spatial frequency sensitivity, means that the high frequencies, that is, fine picture details, are less visible. Luminance sensitivity means that if the background luminance is high, a luminance increase is not perceived. Contrast masking means that an image detail may be difficult to detect in the presence of another detail. [4] These properties of the HVS make it possible to use watermarking technologies and to embed the information robustly in images. In particular, the watermark can be embedded more strongly in image areas where there is some texture and high variation in luminance values, while the plain image areas are left unwatermarked.

The methods where the embedding strength is selected for each pixel individually, so that the pixel value changes only by a certain amount without making the change perceptible, are called JND methods. One JND model for images with eight bit intensity levels is proposed by Chou and Li [4]. The scaling factor for each pixel is derived from the following equations:

$$JND_{fb}(x,y) = \max\left\{ f_1\big(bg(x,y), mg(x,y)\big),\ f_2\big(bg(x,y)\big) \right\} \tag{1}$$

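$$f_1\big(bg(x,y), mg(x,y)\big) = mg(x,y)\,\alpha\big(bg(x,y)\big) + \beta\big(bg(x,y)\big) \tag{2}$$

$$f_2\big(bg(x,y)\big) = \begin{cases} T_0\left(1 - \big(bg(x,y)/127\big)^{1/2}\right) + 3, & \text{for } bg(x,y) \le 127 \\ \gamma\,\big(bg(x,y) - 127\big) + 3, & \text{for } bg(x,y) > 127 \end{cases} \tag{3}$$

$$\alpha\big(bg(x,y)\big) = bg(x,y) \cdot 0.0001 + 0.115 \tag{4}$$

$$\beta\big(bg(x,y)\big) = \lambda - bg(x,y) \cdot 0.01, \quad \text{for } 0 \le x < H,\ 0 \le y < W, \tag{5}$$

where H and W are the height and width of the image, respectively. bg(x, y) is the average background luminance and mg(x, y) is the maximum weighted average of luminance differences around the pixel at (x, y).

$$mg(x,y) = \max_{k=1,2,3,4} \left\{ \left| grad_k(x,y) \right| \right\} \tag{6}$$

$$grad_k(x,y) = \frac{1}{16} \sum_{i=1}^{5} \sum_{j=1}^{5} p(x-3+i,\ y-3+j)\, G_k(i,j), \quad \text{for } 0 \le x < H,\ 0 \le y < W \tag{7}$$

$$bg(x,y) = \frac{1}{32} \sum_{i=1}^{5} \sum_{j=1}^{5} p(x-3+i,\ y-3+j)\, B(i,j) \tag{8}$$

$$
G_1 = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 1 & 3 & 8 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ -1 & -3 & -8 & -3 & -1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},\quad
G_2 = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 8 & 3 & 0 & 0 \\ 1 & 3 & 0 & -3 & -1 \\ 0 & 0 & -3 & -8 & 0 \\ 0 & 0 & -1 & 0 & 0 \end{bmatrix},\quad
G_3 = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 3 & 8 & 0 \\ -1 & -3 & 0 & 3 & 1 \\ 0 & -8 & -3 & 0 & 0 \\ 0 & 0 & -1 & 0 & 0 \end{bmatrix},
$$

$$
G_4 = \begin{bmatrix} 0 & 1 & 0 & -1 & 0 \\ 0 & 3 & 0 & -3 & 0 \\ 0 & 8 & 0 & -8 & 0 \\ 0 & 3 & 0 & -3 & 0 \\ 0 & 1 & 0 & -1 & 0 \end{bmatrix},\quad
B = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 2 & 2 & 1 \\ 1 & 2 & 0 & 2 & 1 \\ 1 & 2 & 2 & 2 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} \tag{9}
$$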

The function f1 models the spatial masking effect, which means that values near edges in an image can be changed much more than values in areas of constant intensity. The function f2 defines the visibility threshold due to background luminance. The values T0, γ and λ are chosen to be 17, 3/128 and ½, respectively. [4]
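Equations (1) to (9) can be sketched in a few lines of code. The sketch below is an illustrative reading of the model as quoted above and not the implementation used in this work. It assumes that the image is a list of rows of intensities and that pixels outside the image are replaced by the nearest border pixel, which the equations do not specify. The helper `embed_with_jnd` is hypothetical and only shows how a per-pixel JND value could scale a ±1 watermark pattern.

```python
import math

# Constants T0, gamma and lambda as quoted from Chou and Li [4].
T0, GAMMA, LAMBDA = 17.0, 3.0 / 128.0, 0.5

# Directional operators G1..G4 and the low-pass operator B of equation (9).
G = [
    [[0, 0, 0, 0, 0], [1, 3, 8, 3, 1], [0, 0, 0, 0, 0],
     [-1, -3, -8, -3, -1], [0, 0, 0, 0, 0]],
    [[0, 0, 1, 0, 0], [0, 8, 3, 0, 0], [1, 3, 0, -3, -1],
     [0, 0, -3, -8, 0], [0, 0, -1, 0, 0]],
    [[0, 0, 1, 0, 0], [0, 0, 3, 8, 0], [-1, -3, 0, 3, 1],
     [0, -8, -3, 0, 0], [0, 0, -1, 0, 0]],
    [[0, 1, 0, -1, 0], [0, 3, 0, -3, 0], [0, 8, 0, -8, 0],
     [0, 3, 0, -3, 0], [0, 1, 0, -1, 0]],
]
B = [[1, 1, 1, 1, 1], [1, 2, 2, 2, 1], [1, 2, 0, 2, 1],
     [1, 2, 2, 2, 1], [1, 1, 1, 1, 1]]

def window_sum(img, x, y, kernel):
    """Weighted sum over the 5x5 neighbourhood of (x, y).
    Border pixels are replicated (an assumption of this sketch)."""
    h, w = len(img), len(img[0])
    total = 0.0
    for i in range(5):
        for j in range(5):
            xi = min(max(x - 2 + i, 0), h - 1)
            yj = min(max(y - 2 + j, 0), w - 1)
            total += img[xi][yj] * kernel[i][j]
    return total

def jnd_profile(img):
    """Per-pixel JND threshold following equations (1) to (8)."""
    h, w = len(img), len(img[0])
    jnd = [[0.0] * w for _ in range(h)]
    for x in range(h):
        for y in range(w):
            bg = window_sum(img, x, y, B) / 32.0                   # eq. (8)
            mg = max(abs(window_sum(img, x, y, g)) / 16.0
                     for g in G)                                   # eqs. (6), (7)
            alpha = bg * 0.0001 + 0.115                            # eq. (4)
            beta = LAMBDA - bg * 0.01                              # eq. (5)
            f1 = mg * alpha + beta                                 # eq. (2)
            if bg <= 127:                                          # eq. (3)
                f2 = T0 * (1.0 - math.sqrt(bg / 127.0)) + 3.0
            else:
                f2 = GAMMA * (bg - 127.0) + 3.0
            jnd[x][y] = max(f1, f2)                                # eq. (1)
    return jnd

def embed_with_jnd(img, pattern, jnd):
    """Hypothetical embedding: add a +1/-1 pattern scaled by the JND."""
    return [[min(255.0, max(0.0, p + s * j))
             for p, s, j in zip(row_p, row_s, row_j)]
            for row_p, row_s, row_j in zip(img, pattern, jnd)]
```

For a flat mid-grey area the threshold comes from f2 alone and stays small, whereas next to a strong edge the gradient term mg dominates and allows a larger change, which matches the masking behaviour described above.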

To test how well the JND calculations work, Chou and Li developed a so-called PSPNR (Peak Signal to Perceptible Noise Ratio) value. Often, the PSNR (Peak Signal to Noise Ratio) value is calculated with

$$PSNR = 20 \log_{10} \frac{255}{\sqrt{MSE}}, \tag{10}$$

where MSE is the mean squared error. Unfortunately, the PSNR cannot accurately describe the perceptual quality of the image, and therefore other measures are needed [4]. The PSPNR value measures the perceptible distortion energy and it is defined as

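$$
\begin{gathered}
PSPNR = 20 \log_{10} \frac{255}{\sqrt{E\left\{ \left[\, \left| p(x,y) - \hat{p}(x,y) \right| - JND_{fb}(x,y) \right]^2 \cdot \delta(x,y) \right\}}} \\
\delta(x,y) = \begin{cases} 1, & \text{if } \left| p(x,y) - \hat{p}(x,y) \right| > JND_{fb}(x,y) \\ 0, & \text{if } \left| p(x,y) - \hat{p}(x,y) \right| \le JND_{fb}(x,y) \end{cases} \quad \text{for } 0 \le x < H,\ 0 \le y < W,
\end{gathered} \tag{11}
$$

where $\hat{p}(x,y)$ denotes the reconstructed pixel at (x, y) and $JND_{fb}(x,y)$ the original JND profile. [4]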
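The difference between the two measures can be seen in a short sketch. The code below follows equations (10) and (11) as written above. It assumes that images and the JND profile are lists of rows of equal size and that the JND profile has been computed beforehand. The function names are hypothetical.

```python
import math

def psnr(original, reconstructed):
    """Peak signal to noise ratio of equation (10), in decibels."""
    squared = [(p - q) ** 2
               for row_o, row_r in zip(original, reconstructed)
               for p, q in zip(row_o, row_r)]
    mse = sum(squared) / len(squared)
    if mse == 0:
        return math.inf
    return 20.0 * math.log10(255.0 / math.sqrt(mse))

def pspnr(original, reconstructed, jnd):
    """Peak signal to perceptible noise ratio of equation (11).
    Only the part of each error that exceeds the JND is counted."""
    total, count = 0.0, 0
    for row_o, row_r, row_j in zip(original, reconstructed, jnd):
        for p, q, j in zip(row_o, row_r, row_j):
            err = abs(p - q)
            if err > j:          # delta(x, y) = 1
                total += (err - j) ** 2
            count += 1
    energy = total / count
    if energy == 0:
        return math.inf
    return 20.0 * math.log10(255.0 / math.sqrt(energy))
```

For example, if every pixel is changed by 10 levels and the JND is 4 everywhere, only the excess of 6 levels counts as perceptible noise, so the PSPNR is higher than the PSNR. If the JND were 10 or more, no perceptible noise would remain and the sketch returns infinity.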

2.3. Robustness

A watermark that cannot endure even the most common data processing, such as scaling, is useless. Even fragile watermarks that are meant to break should ideally not break by accident. This section presents some of the expected attacks, including JPEG compression and geometrical attacks. The print-scan process and picture taking with a mobile phone are considered separately.

2.3.1 Robust and fragile watermarks

Most of the watermarks are designed so that they resist many kinds of attacks meant to destroy the embedded information, but some watermarks are made to get broken. These watermarks are called fragile watermarks and their purpose is to detect tampering of the image. If the image is purposefully changed, the watermark will get destroyed.

Robust watermarks are required to survive almost all attacks, but designing such watermarks is very difficult. It is practically impossible to design a watermarking system that would resist all kinds of attacks. With a careful analysis of system requirements, however, it is possible to design a watermarking system that is robust against the most probable attacks in the required environment. One way to deal with different kinds of attacks is to use multiple watermarking. That is, a few watermarks are embedded in the image, each of which is designed to recover from a different kind of attack. Multiple watermarking is explained in more detail in section 3.3.

2.3.2 Printed images

Most of the time, when talking about the print-scan process, only the scanning process is considered and the printing process is neglected. However, the printing process also inflicts attacks on the watermark. It is generally acknowledged that printing quality varies between different printers. Perry et al. [5] experimented with different printers and concluded that the end products of different printers vary across different manufacturers and even between identical models from the same manufacturer. Paper quality obviously affects the quality of the printed image, and Perry et al. pointed out that ink density also has an effect on the result [5]. These results show that the printing process should not be neglected but carefully considered when reading watermarks from printed images.

2.3.3 Print-scan

The print-scan process has many similarities with photographing, and it has therefore been used as a starting point when defining a watermarking system for camera phones. In the process, the watermarked image is first printed and then scanned, and as a consequence the watermark should be robust against various kinds of attacks. Figure 3 shows the user interface of the Epson GT-15000 scanner, where the user defines the scanning area with a dashed quadrilateral. A large portion of the background is included in the scanned area along with the image, and the watermark is no longer in the centre of the scanned image. The watermark should survive geometrical transformations, such as rotation, scaling and translation, but it should also be readable after the DA/AD transform and noise addition, and it should not be broken by slight cropping of the edges. Some research has been done recently on the properties of the print-scan process, but the problem is complex, because the distortions during the print-scan process are printer/scanner-dependent and time-variant even for the same printer/scanner [6, 7]. While trying to find properties that are invariant to the print-scan process, Solanki et al. [7] studied the print-scan properties of the discrete Fourier transform (DFT) magnitudes and concluded that:

1. The low and mid frequency coefficients are preserved much better than the high frequency ones.

2. In the low and mid frequency bands, the coefficients with low magnitudes see a much higher noise than their neighbours with high magnitudes.

3. Coefficients with higher magnitudes see gain of roughly unity.

4. Slight modifications to the selected high magnitude low frequency coefficients do not cause significant perceptual distortion to the image.

Figure 3. The user interface of the Epson GT-15000 scanner and scanning area selected by a user.

These properties were further studied by He and Sun [6], who introduced three more properties including:

5. Most textures can be preserved, or, most relationships between DFT coefficients are preserved though individual DFT magnitude may vary.

6. The dynamic range of intensity values is reduced, that is, the original range between 0-255 becomes 70-250 after the print-scan process.

7. The distribution of pixel values after the print-scan process looks roughly like a spindle, as in Figure 4.

Figure 4. The intensity distribution of the print-scan process. X-axis represents the original intensity while the Y-axis represents the print-scanned intensity.

These results give some guidelines to design a working print-scan robust watermarking system.


2.3.4 Mobile phone with a camera

While the print-scan process is clearly a two dimensional problem, taking a picture with a camera phone, that is, the print-cam process, is a three dimensional one. All attacks that occur in the print-scan process also occur in the print-cam process. This is not by any means the end of the story: photographing with a camera phone introduces an abundance of additional attacks on watermarking systems. Some of the attacks explained here are due to the properties of the mobile phone camera and some are interlinked with the camera lens.

The camera phone itself presents some technological constraints that need to be considered. One of the biggest problems is the low processing power of camera phones, which sets new requirements for the watermarking system. The application must be lightweight and its memory consumption must not exceed certain limits if the watermark processing is done in the phone. The watermarking system should also be robust against JPEG compression, because in most camera phones the captured image is automatically compressed before saving. JPEG compression is explained in more detail in section 2.3.6.

The cameras in mobile phones are not of high quality, and although their quality is approaching that of digital cameras, they are still far behind. At present, the best cameras in mobile phones have a resolution of two megapixels or more, but such camera phones are still rare. It must be remembered that the number of megapixels is not the whole truth: the quality of the optics also has a huge impact on the quality of a photographed image. Even high quality optics will not entirely save the image from the pincushion and barrel distortions shown in Figure 5. In barrel distortion, straight lines in the real world bow outward in images, whereas in pincushion distortion straight lines bow inward. In both distortions, the amount of distortion is larger close to the image edges. Fortunately, this type of distortion can be corrected easily, because in every camera the properties of the lens stay the same, and therefore the parameters that define the amount of distortion can be determined beforehand for each camera lens.
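The effect of a fixed lens parameter can be illustrated with a minimal sketch. It assumes a one-parameter radial polynomial model, which is a common simplification and not the generic camera model of the toolbox used in this work; the function names, the parameter `k1` and the test points are hypothetical.

```python
import numpy as np

def radial_distort(points, k1):
    """Apply a one-parameter radial distortion about the image centre.

    points: (N, 2) array of normalized coordinates with the centre at (0, 0).
    In this model k1 < 0 pulls points toward the centre (barrel) and
    k1 > 0 pushes them outward (pincushion).
    """
    points = np.asarray(points, dtype=float)
    r2 = np.sum(points ** 2, axis=1, keepdims=True)  # squared radius per point
    return points * (1.0 + k1 * r2)

def radial_undistort(points, k1, iterations=20):
    """Invert radial_distort by fixed-point iteration (assumes mild distortion)."""
    distorted = np.asarray(points, dtype=float)
    undistorted = distorted.copy()
    for _ in range(iterations):
        r2 = np.sum(undistorted ** 2, axis=1, keepdims=True)
        undistorted = distorted / (1.0 + k1 * r2)
    return undistorted

# Hypothetical corner points of a reference pattern.
corners = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.0, 1.0]])
barrel = radial_distort(corners, k1=-0.1)      # points far from the centre move most
restored = radial_undistort(barrel, k1=-0.1)   # same k1 inverts every picture
```

Because `k1` belongs to the lens and not to the picture, it only needs to be estimated once, after which the same inverse mapping can be reused.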

Figure 5. Chess patterned reference image a) original image b) barrel distorted image c) pincushion distorted image.

To correct barrel distortion, only the properties of the camera lens need to be known. These properties can be found by taking one or more pictures of a reference image and analysing the pictures. A reference image can be, for example, a chess board image, where black and white squares alternate as in Figure 5. The properties of the lens stay the same in every picture taken, and once the properties have been found, the barrel distortion can always be inverted. In this work, separate software is used for correcting barrel distortions. Camera Calibration Toolbox for Generic Lenses is a Matlab toolbox made by Kannala which is freely available on the Internet [8]. The toolbox is based on the generic camera model and enables correction of barrel distortions as well as several other corrections [9]. The calibration of the camera was done by using the calibration cube shown in Figure 6.

Figure 6. Reference cube for the Calibration Toolbox for Generic Lenses.

Other kinds of distortions that should be corrected are the effects of the three dimensional world, that is, perspective distortions. It is practically impossible to set the camera so that it is exactly perpendicular to the image, and therefore the picture taken will be slanted.

2.3.5 Geometrical attacks

Robustness against geometrical attacks is necessary in designing a print-scan or print-cam robust watermarking system. In this research, mostly rotation, translation and scaling are studied, but also barrel distortion and perspective transformations are paid attention to.

Figure 7 shows an example photograph of a watermarked image taken with a Nokia N90 camera phone. As seen, there is a visible barrel distortion in the image and the perspective has also changed somewhat: the right side of the image is slightly narrower than the left side. These distortions make the reading of the watermark difficult, and without a proper watermarking technology all the information embedded in the image could be lost.

The previously proposed methods for reading the watermark from distorted images can be divided roughly in two main categories. The first one is to find out the geometrical transformations that the image has gone through and then apply an inverse transform [10]. The other way is to embed the watermark in a transformation invariant domain, such as the Fourier-Mellin domain [11].

While taking a picture with a camera, it is practically impossible to keep the camera perfectly straight and perpendicular to the object, as shown in Figure 8. Therefore some perspective transformation will occur. A perspective transform is a result of projecting a three dimensional scene on the two dimensional image plane. Usually a perspective transformation is well approximated by an affine transformation, which is equivalent to the composed effects of translation, rotation, scaling and shear. Here, homogeneous coordinates are utilized to define the transformation matrices because all affine transformations can be represented as matrix multiplications in homogeneous coordinates [12]. Homogeneous coordinates are explained in more detail in Appendix 1.

Figure 7. An image taken with an N90 camera phone where some distortions have occurred.

Figure 8. When the camera is not perpendicular to the object, perspective transforms will happen. Labels in the figure: COP, image plane.

Translation is defined as an operation that displaces image points by a fixed distance in a given direction. It is possible to describe translation of point P to point P’ by specifying a displacement vector d by

$$P' = P + d, \tag{12}$$

for all points P on the object. The homogeneous coordinate forms of these points and the vector are

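$$P = \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \quad P' = \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}, \quad d = \begin{bmatrix} \alpha_x \\ \alpha_y \\ 0 \end{bmatrix}, \tag{13}$$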

from where we can see that

$$\begin{aligned} x' &= x + \alpha_x, \\ y' &= y + \alpha_y. \end{aligned} \tag{14}$$

This result can be represented as a matrix multiplication [12]:

$$P' = TP, \tag{15}$$

where

$$T = \begin{bmatrix} 1 & 0 & \alpha_x \\ 0 & 1 & \alpha_y \\ 0 & 0 & 1 \end{bmatrix}. \tag{16}$$

For scaling where a fixed point, that is, the point that is unchanged by the transformation, is at the origin, the two corresponding equations to (12) are

$$\begin{aligned} x' &= \beta_x x, \\ y' &= \beta_y y, \end{aligned} \tag{17}$$

where βx and βy are the scaling coefficients for x and y dimensions, respectively [12]. These equations can be combined as

$$P' = SP, \tag{18}$$

where the transformation matrix is

$$S = \begin{bmatrix} \beta_x & 0 & 0 \\ 0 & \beta_y & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{19}$$

The third basic transformation matrix, rotation, can be derived similarly. The fixed point is again set at the origin and the equations for rotation are

$$\begin{aligned} x' &= x\cos\Theta - y\sin\Theta, \\ y' &= x\sin\Theta + y\cos\Theta, \end{aligned} \tag{20}$$

where Θ is the rotation angle counter clockwise about the origin. [12] The matrix form is

$$P' = RP, \tag{21}$$

where

$$R = \begin{bmatrix} \cos\Theta & -\sin\Theta & 0 \\ \sin\Theta & \cos\Theta & 0 \\ 0 & 0 & 1 \end{bmatrix}. \tag{22}$$

The motivation for using transformation matrices to represent transformations is that transformations can be combined and inverted. This can be done by using matrix multiplication [12]. For example, if T is translation matrix and R rotation matrix, we get

$$b = Ca = RTa, \tag{23}$$

where a is some vector, C is the new combined transformation matrix of R and T, and b is the resulting translated and rotated vector. The order of the transformation matrices is important, because RT is not the same as TR. Here RT means that the vector a has first been translated to some location and then rotated around the origin. If the product had been TR, the vector a would first have been rotated, and the rotated vector would then have been translated.
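The matrices (16), (19) and (22) and the effect of the multiplication order can be checked numerically. The following is a minimal sketch using NumPy; the helper names and the example point are illustrative assumptions and not part of the thesis software.

```python
import numpy as np

def translation(ax, ay):
    """Translation matrix T of equation (16)."""
    return np.array([[1.0, 0.0, ax],
                     [0.0, 1.0, ay],
                     [0.0, 0.0, 1.0]])

def scaling(bx, by):
    """Scaling matrix S of equation (19), fixed point at the origin."""
    return np.array([[bx, 0.0, 0.0],
                     [0.0, by, 0.0],
                     [0.0, 0.0, 1.0]])

def rotation(theta):
    """Rotation matrix R of equation (22), counter clockwise about the origin."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])

a = np.array([1.0, 0.0, 1.0])   # the point (1, 0) in homogeneous coordinates
T = translation(2.0, 0.0)
R = rotation(np.pi / 2)

b_rt = R @ T @ a   # translate first, then rotate: (1, 0) -> (3, 0) -> (0, 3)
b_tr = T @ R @ a   # rotate first, then translate: (1, 0) -> (0, 1) -> (2, 1)

# The combined matrix can be inverted to undo both steps at once.
C = R @ T
a_back = np.linalg.inv(C) @ b_rt
```

The two results differ, which is the reason the order of the matrices must be tracked when an estimated distortion is inverted.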

2.3.6 JPEG transform

In many kinds of image processing applications, compression algorithms play an important role. Reducing the file size of an image is necessary for storage and transmission. Especially in the mobile phone environment, where storage space is scarce, heavy compression is used. One of the most well known and widely used algorithms is JPEG (Joint Photographic Experts Group), which is usually defined as a lossy compression algorithm. This means that some of the information the image contains is lost during compression. The losses are not usually noticeable to the human eye, but they affect the quality of watermark extraction. Most of the time, when JPEG is talked about, the image compression standard is meant. Actually, the abbreviation JPEG refers to a joint ISO/CCITT (International Organization for Standardization / International Telegraph and Telephone Consultative Committee) committee that has published several standards, including ISO/IEC IS 10918-1 | ITU-T Recommendation T.81, which is often referred to as ‘JPEG’. The standard was approved by ISO and CCITT, which is now called ITU-T (International Telecommunication Union, Telecommunication Standardization Sector), in 1994. [13]

JPEG is designed to be an efficient coding scheme for continuous tone (multilevel) still images, and it was intended to become the first international digital compression standard for still images. It has four encoding modes under which various coding algorithms are defined:

1. Sequential encoding

2. Progressive encoding

3. Lossless encoding

4. Hierarchical encoding

The implementations are not required to cover all of these, but the baseline system is based on sequential coding. [14, 15] Coding algorithms are mainly based on two dimensional DCT (discrete cosine transforms) except the lossless encoding scheme that employs predictive processes. In the lossless encoding, predictive coding and entropy coding are used. The resulting compression ratio is only about 2:1 but because no information is lost, the decoded image is an exact replica of the original image unlike in DCT coding schemes where some of the information is always lost in quantization. [14]

In the DCT based encoding process, samples of an image are grouped into 8x8 blocks, each of which is transformed with the DCT into a set of 64 coefficients. Each of the coefficients is then quantized by its own uniform quantizer, where the step sizes, collected in a 64-element quantization matrix, are based on visibility thresholds. The standard does not specify default values for quantization tables but lets the applications specify values for their particular task. [14, 15] After quantization, entropy coding is applied. The DC (Direct Current) coefficient is differentially encoded by using the previous quantized DC coefficient to predict the current DC coefficient. The 63 AC (Alternating Current) coefficients are transformed into a one dimensional sequence with the zigzag scan shown in Figure 9. The one dimensional sequence is then entropy coded by using either Huffman or arithmetic coding. For the baseline system, only Huffman coding is used. [15]
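The steps before entropy coding can be sketched for a single block as follows. This is a minimal illustration and not a JPEG codec: the flat quantization table and the ramp test block are hypothetical, the level shift by 128 assumes 8-bit samples, and Huffman coding is left out.

```python
import numpy as np

N = 8

def dct_matrix(n=N):
    """Orthonormal DCT-II matrix D, so that D @ block @ D.T is the 2-D DCT."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    x = np.arange(n).reshape(1, -1)   # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def zigzag_indices(n=N):
    """(row, col) pairs of an n x n block in zigzag order, DC term first."""
    order = []
    for s in range(2 * n - 1):        # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diagonal = [(r, s - r) for r in rows]
        # Even diagonals run from bottom-left to top-right, odd ones the other way.
        order.extend(diagonal[::-1] if s % 2 == 0 else diagonal)
    return order

def encode_block(block, qtable, prev_dc=0):
    """Level shift, 2-D DCT, uniform quantization, zigzag scan, DC difference."""
    d = dct_matrix()
    coeffs = d @ (block - 128.0) @ d.T
    quantized = np.rint(coeffs / qtable).astype(int)
    scanned = [quantized[r, c] for r, c in zigzag_indices()]
    dc_diff = scanned[0] - prev_dc    # DC is predicted from the previous block
    return dc_diff, scanned[1:], quantized

def decode_block(quantized, qtable):
    """Dequantize and apply the inverse DCT."""
    d = dct_matrix()
    return d.T @ (quantized * qtable) @ d + 128.0

qtable = np.full((N, N), 16.0)                                  # hypothetical flat table
block = np.add.outer(np.arange(N), np.arange(N)) * 8.0 + 64.0  # smooth ramp block
dc_diff, ac, quantized = encode_block(block, qtable)
reconstructed = decode_block(quantized, qtable)
```

The difference between `block` and `reconstructed` is the quantization loss, which is the part of the process a watermark has to survive.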

JPEG is for the moment the most commonly used image compression standard. However, the situation will change in the near future when the new JPEG2000 (ISO/IEC 15444) standard gains ground. The JPEG2000 standard uses wavelet transforms instead of the DCT, and it is claimed to be able to compress images up to 200 times with no appreciable degradation in quality. [16]

Figure 9. A zigzag scan of quantized DCT coefficients.

3. WATERMARKING METHODS

Methods to embed watermarks are not limited to one domain, but the watermark can be embedded in almost any transformation domain available. There can even be multiple watermarks embedded in the image: some in the same domain and some in different domains. Explaining all the different watermarking methods proposed is practically impossible, and therefore only some of the most important methods concerning our application are explained. The first section explains the basic watermarking scheme, the second section is a brief literature survey of the previously proposed methods in different domains, and the third section is about multiple watermarking.

3.1. Generic watermarking scheme

A watermark can be embedded in an image in many ways. Some researchers exploit the properties of transform domains, others create transform domains of their own with the properties they need. Here I shall present some methods used in the different domains, focusing mostly on blind watermarking methods that are resilient to the print-scan attack. In blind watermarking methods, the original image is not needed in the extraction process, whereas in the non-blind ones the original image is required. Before embedding the watermark, the pixels of an image are usually divided into luminance and chrominance components. It is possible to embed the watermark in some colour information, but the most common way is to use luminance information. [2]

The watermark itself is usually a pseudorandom noise signal consisting of the integers {-1, 0, 1}, and the amplitude of the signal is low compared to the image amplitude to prevent the watermark from being visible. The only constraints are that the watermark signal should not correlate with the image content and the energy in the pseudorandom signal should be uniformly distributed. The most straight-forward way to embed a watermark is thus to add the pseudorandom signal with a suitable gain factor to the luminance values of the pixels of an image. [2]

The basic watermark embedding process is illustrated in Figure 10, where the watermarked image IW(x, y) is obtained by adding the pseudorandom sequence W(x, y) to the original image I(x, y). The corresponding formula is

$$I_W(x,y) = I(x,y) + kW(x,y), \tag{24}$$

where the pseudorandom sequence W(x, y) is multiplied by a small gain factor k. [2] The previously embedded watermark can be detected by calculating the cross-correlation

$$R_{I'_W,W}(i,j) = \sum_{m=0}^{M-1}\sum_{n=0}^{N-1} I'_W(m,n)\,W^{*}(m+i,\,n+j) \tag{25}$$

between the possibly watermarked image I’W(x, y) and the complex conjugate of the pseudorandom sequence W(x, y). If the result of the correlation exceeds some predefined threshold, the watermark is detected.

Figure 10. Generic watermark embedding procedure.
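The additive embedding and the correlation detector described above can be sketched in a few lines. The sketch makes several simplifying assumptions that are not taken from the thesis: a random array stands in for the luminance channel, the watermark takes only the values -1 and 1, the correlation is evaluated at zero shift only, the image mean is removed before correlating, and the gain and threshold values are arbitrary.

```python
import numpy as np

def embed(image, watermark, k):
    """Additive embedding: image plus the watermark scaled by the gain factor k."""
    return image + k * watermark

def detect(image, watermark, threshold):
    """Zero-shift correlation between the mean-removed image and the watermark."""
    centred = image - image.mean()
    score = np.sum(centred * watermark) / watermark.size
    return score, score > threshold

rng = np.random.default_rng(seed=1)
image = rng.uniform(0.0, 255.0, size=(256, 256))        # stand-in for luminance values
watermark = rng.choice([-1.0, 1.0], size=image.shape)   # real valued, so W* equals W

k = 3.0
marked = embed(image, watermark, k)
score_marked, found_marked = detect(marked, watermark, threshold=k / 2)
score_plain, found_plain = detect(image, watermark, threshold=k / 2)
```

For the marked image the score is close to k, while for the unmarked image it stays near zero, so a threshold between the two separates the cases. Moving the threshold trades false positives against false negatives.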

The detector can make two kinds of errors. It may detect a watermark even if there is none, an error known as a false positive, or it may fail to detect the watermark even if there is one, an error called a false negative. Generally, false positives are considered worse than false negatives: if an existing watermark is not found, the image can be checked again and again, whereas a false positive cannot be corrected, because the watermark is assumed to be detected even though it is not there. [2] By using the aforementioned method, only one bit can be embedded. To increase the payload, the image can, for example, be divided into several blocks or sub-images, and one bit of a string of information can be embedded in each of these sub-images, as did Smith and Comiskey [17]. Figure 11 [2] illustrates a similar method.
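The block-wise idea for increasing the payload can be sketched as follows. This is an illustrative assumption of how one bit per block could be coded (the sign of the added pattern), not the method of Smith and Comiskey; the block size, gain and message are hypothetical.

```python
import numpy as np

BLOCK = 64  # hypothetical block size in pixels

def embed_bits(image, bits, pattern, k):
    """Add +k*pattern to a block for bit 1 and -k*pattern for bit 0."""
    marked = image.astype(float).copy()
    blocks_per_row = image.shape[1] // BLOCK
    for index, bit in enumerate(bits):
        r = (index // blocks_per_row) * BLOCK
        c = (index % blocks_per_row) * BLOCK
        sign = 1.0 if bit else -1.0
        marked[r:r + BLOCK, c:c + BLOCK] += sign * k * pattern
    return marked

def extract_bits(image, n_bits, pattern):
    """Read each bit from the sign of the block's correlation with the pattern."""
    bits = []
    blocks_per_row = image.shape[1] // BLOCK
    for index in range(n_bits):
        r = (index // blocks_per_row) * BLOCK
        c = (index % blocks_per_row) * BLOCK
        block = image[r:r + BLOCK, c:c + BLOCK]
        correlation = np.sum((block - block.mean()) * pattern)
        bits.append(1 if correlation > 0 else 0)
    return bits

rng = np.random.default_rng(seed=7)
image = rng.uniform(0.0, 255.0, size=(256, 256))         # stand-in for luminance values
pattern = rng.choice([-1.0, 1.0], size=(BLOCK, BLOCK))   # shared pseudorandom pattern
message = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed_bits(image, message, pattern, k=8.0)
recovered = extract_bits(marked, len(message), pattern)
```

Smaller blocks carry more bits per image but give each correlation fewer samples, so the gain has to grow or the error rate rises.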

3.2. Domains

This section focuses on some of the most common watermarking domains. First, some methods that embed the watermark in the spatial domain are dealt with; then, methods working in the Fourier domain and the wavelet domain are discussed in a similar way. The last section covers methods working in other domains.


Figure 11. Embedding watermark in blocks.

3.2.1 Spatial

Nowadays, a robust watermark is required to survive many kinds of attacks, out of which geometrical attacks are considered the most difficult ones to recover from. Kostopoulos et al. [18] tried to solve this problem by embedding multiple cross-shaped patterns in the image. Their method seemed to improve the robustness of a watermarked image against small amounts of rotation, translation and scaling, but it was very vulnerable to noise and more sophisticated attacks. Methods that rely on a synchronization template, like the method by Kostopoulos et al., are not generally valued highly, because it is easy for an attacker to remove the template, after which the watermark cannot be read. Template embedding methods are, nevertheless, a very robust way to recover from geometric distortions, and when designing value-adding services only unintentional attacks must be considered, and consequently template removal attacks are not expected.

A large number of template embedding methods have been proposed and studied for their great ability to recover from geometrical distortions. Kutter [19] proposed a method for recovering from general geometric transformations. The idea of this method was to embed the same watermark multiple times at shifted locations in the image. The method can be seen as a sort of spread spectrum watermarking, except that he used an extra step to predict the embedded watermark and thus increase the performance of the detector. The watermarks could then be predicted and correlated to determine the geometric transformation.


Deguillaume et al. [20], too, proposed a method based on repetition. In the method a periodic pattern was embedded in an image in order to get a high number of peaks after autocorrelation from the magnitude spectrum of Fourier transform. After autocorrelation Hough or Radon transform could be applied to determine the regular grid shaped template. With the orientation of the grid, it was possible to determine the parameters of the general affine transform applied to the image.
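Why a periodic pattern produces a regular grid of autocorrelation peaks can be shown with a small experiment. The sketch below is a simplified assumption and not the method of Deguillaume et al.: it computes a circular autocorrelation directly from a synthetic signal, uses Gaussian noise as a stand-in for image content, and omits the Hough or Radon step.

```python
import numpy as np

def circular_autocorrelation(signal):
    """Circular autocorrelation computed as the inverse FFT of the power spectrum."""
    spectrum = np.fft.fft2(signal)
    return np.real(np.fft.ifft2(spectrum * np.conj(spectrum)))

rng = np.random.default_rng(seed=3)
period = 16
tile = rng.choice([-1.0, 1.0], size=(period, period))
pattern = np.tile(tile, (8, 8))                    # 128 x 128 periodic template

host = rng.normal(0.0, 1.0, size=pattern.shape)    # stand-in for image content
ac = circular_autocorrelation(host + pattern)

# Peaks are expected wherever both shifts are multiples of the period.
on_grid = ac[::period, ::period]
off_grid_mask = np.ones(ac.shape, dtype=bool)
off_grid_mask[::period, ::period] = False
off_grid_max = ac[off_grid_mask].max()
```

An affine distortion of the image moves these peaks to a skewed grid, and the parameters of the distortion can then be read from the grid orientation and spacing.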

3.2.2 Fourier domain based methods

The Fourier transform is one of the most famous transforms used in signal processing. It was named after Joseph Fourier, a French mathematician and physicist who lived during Napoleon’s time and was the first to suggest that any function of a variable can be expanded in a series of sines of multiples of the variable. This is not true in general, but the suggestion that it might be true, even partially, was a breakthrough [21]. The two dimensional discrete Fourier transform (2D-DFT) of f(i,k) is defined as

$$F(m,n) = \sum_{i=0}^{M-1}\sum_{k=0}^{N-1} f(i,k)\, e^{-j2\pi\left(\frac{im}{M}+\frac{kn}{N}\right)}, \tag{26}$$

where f(i,k) is an M-by-N array and j² = -1. The result F(m,n) is a complex signal, with real and imaginary parts, out of which the magnitude and phase of the Fourier transform can be determined. The magnitude and phase of a Fourier transform are described respectively as

$$|F(m,n)| = \sqrt{F_{Re}^{2}(m,n) + F_{Im}^{2}(m,n)}, \tag{27}$$

$$\theta(m,n) = \tan^{-1}\!\left(\frac{F_{Im}(m,n)}{F_{Re}(m,n)}\right), \tag{28}$$

where FRe is the real part of the transform and FIm is the imaginary part. [22]

The inverse transform is defined as

$$f(i,k) = \frac{1}{NM}\sum_{n=0}^{N-1}\sum_{m=0}^{M-1} F(m,n)\, e^{j2\pi\left(\frac{mi}{M}+\frac{nk}{N}\right)}. \tag{29}$$

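From these equations it can be seen that a rotation in the spatial domain corresponds to the same rotation in the frequency domain, that is,

$$f(x\cos\theta + y\sin\theta,\; -x\sin\theta + y\cos\theta) \Leftrightarrow F(u\cos\theta + v\sin\theta,\; -u\sin\theta + v\cos\theta), \tag{30}$$

and scaling in the spatial domain corresponds to inverse scaling in the frequency domain, that is,

$$f(ax, by) \Leftrightarrow \frac{1}{|ab|}\, F\!\left(\frac{u}{a}, \frac{v}{b}\right). \tag{31}$$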

Because of the properties of the Fourier domain presented above, the Fourier transform is a powerful tool in watermarking. One of the most difficult and crucial phases in detecting a watermark is finding the watermark in the distorted image. When using the Fourier domain, the problem becomes significantly easier, because the magnitudes of the Fourier transform are invariant to translations in the spatial domain, although not to rotation and scaling [22]. A translation in the spatial domain is a phase shift in the frequency domain.
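The translation property can be verified numerically with NumPy's FFT. The sketch assumes a circular translation (pixels shifted out on one side re-enter on the other), which is the case where the invariance holds exactly; a real camera translation also changes the image content at the borders. The random test image and the shift values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
image = rng.uniform(0.0, 255.0, size=(64, 64))   # stand-in for an image

spectrum = np.fft.fft2(image)
magnitude = np.abs(spectrum)     # magnitude of the transform
phase = np.angle(spectrum)       # phase, computed with a four-quadrant arctangent

# Circular translation by (5, 12) pixels; np.roll wraps pixels around the border.
shifted = np.roll(image, shift=(5, 12), axis=(0, 1))
shifted_spectrum = np.fft.fft2(shifted)

same_magnitude = np.allclose(np.abs(shifted_spectrum), magnitude)
same_phase = np.allclose(np.angle(shifted_spectrum), phase)

# The translation appears only as a linear phase ramp.
u = np.fft.fftfreq(64).reshape(-1, 1)
v = np.fft.fftfreq(64).reshape(1, -1)
ramp = np.exp(-2j * np.pi * (5 * u + 12 * v))
ramp_matches = np.allclose(shifted_spectrum, spectrum * ramp)
```

Here `same_magnitude` is true and `same_phase` is false, which is why a synchronization template is usually placed in the magnitudes.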

Some research has been done on the watermark templates in the frequency domain. Pereira and Pun [10] embedded a template in the middle frequencies of the Fourier domain magnitudes. The template they used did not contain any information but was merely used for detecting the transformations the image had gone through. The template consisted of approximately 14 points embedded in the magnitudes of Fourier domain. The points were embedded uniformly along two lines at different angles and by finding these lines from transformed and watermarked image, the amount of rotation and scaling could be determined. The actual message was embedded by using spread-spectrum methods into the Fourier domain between two radii occupying a mid-frequency range. Their method has been criticised because the template is easy to remove and thus the actual watermark is also lost. The method is nevertheless quite robust against rotation, scaling and noise. [10]

Figure 12. Composing the circular template.

Another similar method was proposed by Lee and Kim [23]. They embedded a pseudorandom sequence into the middle frequencies of the input image as in Figure 12 and used cross-correlation at different radii to find the sequence, as illustrated in Figure 13. Since the sequence was pseudorandom, they could derive the amount of rotation by finding the position of the cross-correlation peak. The drawback of this method was that the rotation angle could only be calculated at the precision of 1° and the amount of translation could not be found. On the other hand, the method is fairly fast and relatively simple to use.
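The way a correlation peak reveals the rotation angle can be illustrated with a one dimensional sketch. It is a simplified assumption and not the implementation of Lee and Kim: the template is sampled once per degree along a circle, a rotation is modelled as a circular shift of that sequence, and the noise level is arbitrary.

```python
import numpy as np

def estimate_rotation(reference, observed):
    """Return the circular shift, in samples, that best aligns observed with reference."""
    # Circular cross-correlation computed through the FFT.
    product = np.fft.fft(observed) * np.conj(np.fft.fft(reference))
    correlation = np.real(np.fft.ifft(product))
    return int(np.argmax(correlation))

rng = np.random.default_rng(seed=5)
template = rng.choice([-1.0, 1.0], size=360)   # one sample per degree

true_angle = 37
observed = np.roll(template, true_angle) + rng.normal(0.0, 0.5, size=360)
estimated_angle = estimate_rotation(template, observed)
```

With one sample per degree the estimate can only be an integer number of degrees, which corresponds to the 1° precision limit mentioned above.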

Figure 13. a) The spectrum of the watermarked image. b) Searching of the template.

The magnitudes of the Fourier domain are generally used for their invariance to translation in spatial domain. In some papers the idea of the invariance in Fourier magnitude domain has been developed further and domains that are invariant to translations, rotations and scalings have been researched. O’Ruanaidh and Pun used Fourier-Mellin transform based invariants for watermarking [11]. There is one drawback in this method, however: it works only against rotation, scaling and translation distortions and not against aspect ratio changes or shear, for example. For more information about watermarking in spatial and frequency domains, see paper by Hartung and Kutter [24].

3.2.3 Wavelet domain based methods

From the Fourier transform, we know that most signals can be expressed as a series of sines and cosines. The Fourier transform is an efficient way to analyse a signal, but even if we get to know all the frequencies in a signal, we would not know when the frequencies are present. One solution is to divide the signal into small segments and analyse them separately. After that, we have some knowledge of when and where the frequencies appeared. However, when dividing the signal we come up against Heisenberg’s uncertainty principle, which states that it is not possible to determine the exact frequency and the exact time of occurrence of that frequency in a signal simultaneously. The problem seems to be unsolvable, but the wavelet transform offers a possible solution. The wavelet transform employs a fully scalable modulated window, which is shifted along the signal, and the spectrum is calculated for every position. The process is then repeated multiple times with a slightly different length of the window. The final result is a collection of time-frequency representations of the signal with different resolutions, the so-called multiresolution analysis. [25]

The discrete wavelet transform of f(m) is usually defined as

$$\gamma(s,\tau) = \sum_{m=0}^{N-1} f(m)\,\psi^{*}_{s,\tau}(m), \tag{32}$$

where the * corresponds to a complex conjugation. The formula describes how the function f(m) is decomposed into a set of basis functions ψ_{s,τ}(x), called the wavelets. A set of wavelet basis functions, {ψ_{s,τ}(x)}, can be generated by translating and scaling the basis wavelet ψ(x) as

$$\psi_{s,\tau}(x) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{x-\tau}{s}\right), \tag{33}$$

where s is a scale factor, τ is the translation factor and the single basic wavelet ψ(x) is the so-called mother wavelet. [25]

One of the first wavelet transforms and probably the most applied transform is the Haar transform which was invented before the term wavelet. It is the simplest possible wavelet and its wavelet function is of the form [25]

$$\psi(x) = \begin{cases} 1, & 0 \le x < 0.5, \\ -1, & 0.5 \le x < 1, \\ 0, & \text{at other times}. \end{cases} \tag{34}$$
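The Haar wavelet and the scaling and translation rule can be written out directly. The following is a minimal sketch; the function names and the sampling grid are assumptions made for illustration, and the sums only approximate the corresponding integrals.

```python
import numpy as np

def haar(x):
    """Haar mother wavelet: 1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0.0) & (x < 0.5), 1.0,
                    np.where((x >= 0.5) & (x < 1.0), -1.0, 0.0))

def haar_family(x, s, tau):
    """Wavelet scaled by s and translated by tau, with 1/sqrt(s) normalization."""
    x = np.asarray(x, dtype=float)
    return haar((x - tau) / s) / np.sqrt(s)

# Midpoints of a fine grid on [0, 4), used to approximate integrals by sums.
dx = 1.0 / 1024.0
x = (np.arange(4096) + 0.5) * dx

energy_mother = np.sum(haar(x) ** 2) * dx
energy_scaled = np.sum(haar_family(x, s=2.0, tau=1.0) ** 2) * dx
inner_product = np.sum(haar(x) * haar_family(x, s=0.5, tau=0.0)) * dx
```

The factor 1/sqrt(s) keeps the energy of every scaled wavelet equal to that of the mother wavelet, and Haar wavelets at dyadic scales are orthogonal, so the inner product above is zero.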

Wavelet domain is neither translation nor rotation invariant, but it is often used because of the many advantages it has compared to other domains. In Fourier domain, the transform applies sinusoidal waves as basis functions and thus the Fourier transform is only localized in frequency. By contrast, wavelets are described as waves with a limited duration and therefore are localized in both time and frequency. This space-frequency representation is good at localizing image features, such as edges and textured areas which might be neglected while working in the Fourier domain. [22, 26] Another main advantage is wavelet domain’s superior HVS modelling capabilities compared with other domains. A reason for that are the similarities of the wavelet transforms to the multiple channel models of the HVS. The frequency decomposition of the wavelet transform resembles the signal processing of the HVS, so that both of them divide the image into frequency channels that respond to an explicit spatial location, a limited band of frequencies and a limited range of orientations. [26]

The wavelet transform of an image is also usually fast to calculate. This is due to its low, linear complexity O(n), compared for example with the DCT (Discrete Cosine Transform) applied over an entire image, which has a complexity of O(n log n). Transmitting a transformed image is also fast due to the multi-resolution representation of the image, because hierarchical processing can be done in a straightforward way. [26]

Normally, wavelet watermarking methods are categorized by the wavelet coefficients in which the watermark is embedded, and especially by whether these are the approximation coefficients, which contain the low-frequency information, or the other coefficients, that is, the detail sub-bands that represent the high-frequency information in horizontal, vertical and diagonal orientation. These detail sub-bands are shown in Figure 14. [26]
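The split into one approximation band and three detail sub-bands can be sketched with a one-level Haar decomposition. This is an illustrative NumPy version and not the Matlab routine used for Figure 14. The sub-band names follow one common convention (first letter: filter along the rows, second letter: filter along the columns), which is an assumption, since naming differs between texts.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(image):
    """One-level 2-D Haar decomposition of an image with even height and width."""
    image = np.asarray(image, dtype=float)
    # Along each row: sums (low-pass) and differences (high-pass) of pixel pairs.
    low = (image[:, 0::2] + image[:, 1::2]) / SQRT2
    high = (image[:, 0::2] - image[:, 1::2]) / SQRT2
    # The same step along each column of both intermediate results.
    ll = (low[0::2, :] + low[1::2, :]) / SQRT2     # approximation
    lh = (low[0::2, :] - low[1::2, :]) / SQRT2     # detail
    hl = (high[0::2, :] + high[1::2, :]) / SQRT2   # detail
    hh = (high[0::2, :] - high[1::2, :]) / SQRT2   # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2."""
    low = np.empty((ll.shape[0] * 2, ll.shape[1]))
    high = np.empty_like(low)
    low[0::2, :] = (ll + lh) / SQRT2
    low[1::2, :] = (ll - lh) / SQRT2
    high[0::2, :] = (hl + hh) / SQRT2
    high[1::2, :] = (hl - hh) / SQRT2
    image = np.empty((low.shape[0], low.shape[1] * 2))
    image[:, 0::2] = (low + high) / SQRT2
    image[:, 1::2] = (low - high) / SQRT2
    return image

image = np.arange(64, dtype=float).reshape(8, 8)   # small test image
ll, lh, hl, hh = haar_dwt2(image)
restored = haar_idwt2(ll, lh, hl, hh)
```

Every pixel is touched a fixed number of times, which reflects the linear complexity mentioned above. Applying `haar_dwt2` again to `ll` gives the next resolution level.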

Figure 14. a) Wavelet coefficients calculated with Haar function in Matlab. b) Structure of the wavelet coefficients in the image a).

Barni et al. [27] embedded a binary pseudorandom sequence in the DWT (Discrete Wavelet Transform) coefficients of the three largest detail sub-bands of the image, that is, vertical detail (LH1), horizontal detail (HL1) and diagonal detail (HH1), by using visual masking so that the watermark could be embedded with maximum energy. The watermark was detected by using a correlation between the marked wavelet transform and the watermarking sequence. The detection results obtained were very good when dealing with image cropping: owing to the similarity of the DWT decomposition to the models of the HVS, the watermark energy could be kept as high as possible, and therefore even small portions of the image were sufficient to correctly guess the embedded code.

Watermarking in wavelet domain resembles watermarking in spatial domain. Many of the basic techniques and methods used in spatial domain can be also employed in the wavelet domain. Another aspect to be considered when designing a watermarking system in wavelet domain is the upcoming JPEG2000 standard, which works in wavelet domain. That is, however, only one of the reasons why the wavelet domain appears to be so attractive right now.

3.2.4 Methods in other domains

Spatial, Fourier and wavelet transforms are not the only transformation domains that have been used in the field of digital watermarking. Many other domains have been researched and their qualities investigated and exploited. Most of the other domains, however, are variations or extensions of the well-known Fourier or wavelet domains. The Hadamard transform is an example of a generalized class of Fourier transforms. The difference between these two transforms is that the basis functions of the Hadamard transform are variations of a square wave rather than a sinusoid. The Hadamard transform has only 1 and -1 as elements in its kernel matrix, and the simplicity of the Hadamard transform is a significant advantage in processing time over some other transforms.
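The structure of the Hadamard kernel can be shown with the Sylvester construction. The sketch below is an illustrative assumption about how such a matrix may be built and applied to an 8x8 block; the test block is arbitrary.

```python
import numpy as np

def hadamard(order):
    """Sylvester construction of a 2**order by 2**order Hadamard matrix."""
    h = np.array([[1]])
    for _ in range(order):
        h = np.block([[h, h], [h, -h]])
    return h

H = hadamard(3)                                   # 8 x 8 kernel with entries 1 and -1
block = np.arange(64, dtype=float).reshape(8, 8)  # arbitrary test block

# Forward 2-D transform: since the kernel holds only 1 and -1,
# it can be evaluated with additions and subtractions alone.
coefficients = H @ block @ H.T

# H @ H.T equals 8 times the identity, hence the factor 1/64 in the inverse.
restored = (H.T @ coefficients @ H) / 64.0
```

The rows of `H` are sampled square waves, which is the counterpart of the sinusoidal basis functions of the Fourier transform.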

Quite a few studies have been published concerning the Hadamard transform in watermarking, and one of them is a method proposed by Gilani and Skodras [28]. In their technique, the watermark is embedded in the perceptually most significant spectral component of an image. The image is first Haar wavelet transformed and then the lowest frequency band is Hadamard transformed. The result is then zigzag scanned and the watermark is embedded in those coefficients. The extraction method is fairly similar, but the zigzag scanned coefficients are cross-correlated with the watermark generated by a secret key.

Some research has also been done in the Gabor [29] and Fresnel [30] transform domains, but the research in these domains has not evolved a great deal. The methods developed are not robust enough against print-scan attacks or geometrical distortions. Another promising domain is the fractional Fourier domain, which is a generalization of the classical Fourier transform. Not much watermarking research has been done in this domain, because the idea of it is similar to that of the wavelet domain, and so it remains to be seen if it is a suitable domain for watermarking.

Many Discrete Cosine Transform (DCT) based watermarking algorithms have also been proposed. Some of them use block based algorithms, where the image is divided into blocks and the watermark is embedded in those blocks. These methods, however, are not generally robust against geometric transformations and are not examined here.

There exist various transform domains, of which only some have been studied in watermarking. All of them have good or even superior qualities but also some side effects. Therefore, if a transform domain is to be used, it must be selected carefully, and the properties of the environment where the watermark is used must be kept in mind.

3.3. Multiple watermarking

Multiple watermarking means in short embedding more than one watermark in the image. Lähetkangas [32] studied the problem of multiple watermarking in her Master of Science thesis. She studied cases where there were multiple watermarks and multiple users who wanted to embed information in digital images. To analyse different multiple watermarking techniques, she developed a new classification system for the multiple watermarking methods. The previous method classification system was developed by Sheppard et al. [33]. They divided the multiple watermarking methods into three classes: re-watermarking, segmented watermarking and composite watermarking.


In re-watermarking, the multiple watermarks are embedded one by one on top of each other. This method is fast and simple, but it can also act as an attack in some circumstances: the last embedded watermark can destroy the previously embedded ones, so the watermarking methods must be chosen with care. Another drawback is that every embedded watermark decreases the quality of the watermarked image, and consequently the PSNR value also drops. [33]

Another way to embed multiple watermarks in an image is to divide the image into segments and embed each watermark in its own segment. This is called segmented watermarking, and it does not degrade the image more than embedding only one watermark. It has its limits, however: as the number of segments rises, their size decreases, and embedding a watermark in smaller segments becomes harder. [33]

A third way is composite watermarking, in which a composite watermark is built from a collection of watermarks. The watermarks can be, for example, pseudorandom sequences that are combined and then embedded in the image as usual. The composite watermark is separable if the different watermarks are orthogonal or, as in the case of pseudorandom sequences, uncorrelated. [33]

Lähetkangas motivated her work with value-adding watermarks in addition to DRM (Digital Rights Management) problems and discussed multiple watermarking hiding methods from various points of view. She, too, divides the multiple watermarking methods into three classes, but this time the classes are the basic algorithm, methods to divide the watermarking space and multiple watermarking hiding techniques. [32]

The basic algorithm is a watermarking algorithm that is applied to hide one watermark once. The basic algorithms used in multiple watermarking form the basis and set the limits for performance. It is possible to take advantage of the properties of multiple basic algorithms in multiple watermarking. [32]

Instead of classifying re-watermarking and segmented watermarking separately [33], Lähetkangas [32] combines them under methods to divide the watermarking space. According to her, these methods define whether the multiple watermarks are embedded in the content over each other or in parallel with each other.

The third class of Lähetkangas’ classification system is the multiple watermarking hiding techniques. They define the order in which the watermarks are embedded and who embeds them. In some applications there might be several users who want to embed watermarks to prove ownership and rights of use. For example, the creator of the image may want to embed creator information in the image, whereas the distributor of the content may wish to embed copyright information. Some information might be protected, and therefore not everyone can be allowed to access the embedded information as is. [32]

Most commonly, multiple watermarking algorithms are applied to enhance the robustness of a watermark, but the development of the digital world has brought new application areas. In the digital world, media should be playable on different platforms and devices and with different programs. Multiple watermarking techniques can help adapt the content to the various environments by embedding several watermarks in the content, each of which can contain information about the functionality of the content, the settings required and the programs needed [32].
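The separability of a composite watermark built from uncorrelated pseudorandom sequences, as described for composite watermarking above, can be illustrated with a minimal sketch. It is not taken from [32] or [33]: the ±1 sequences, the additive embedding, the strength value and all function names are assumptions made only for illustration, and a one-dimensional array stands in for the host image samples.

```python
import numpy as np

def make_watermarks(count, length, seed=0):
    """Draw `count` pseudorandom +/-1 sequences (the seed plays the role of a key)."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(count, length))

def embed_composite(host, watermarks, strength=2.0):
    """Build the composite watermark as a sum and add it to the host samples."""
    return host + strength * watermarks.sum(axis=0)

def detect(signal, watermark):
    """Correlate the mean-removed signal with one watermark.

    The score is close to the embedding strength when the watermark is
    present and close to zero when it is not.
    """
    centred = signal - signal.mean()
    return float(np.dot(centred, watermark) / watermark.size)

rng = np.random.default_rng(1)
host = rng.normal(128.0, 10.0, size=10_000)   # stand-in for host image samples
marks = make_watermarks(3, host.size)
marked = embed_composite(host, marks[:2])     # only the first two are embedded
scores = [detect(marked, w) for w in marks]   # third score should stay near zero
```

Because the sequences are nearly uncorrelated, each embedded watermark can be detected on its own even though only their sum was added to the host.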


4. COMMERCIAL DEVELOPMENT

Although digital watermarking has been around for only a little while, some commercial applications have been initiated, Digimarc being probably the most famous one. Here some of the commercial applications in value-adding watermarking are introduced.

4.1. Digimarc

Digimarc Corporation is based in Beaverton, Oregon, and also has international offices in London and Mexico. Digimarc develops digital watermarking solutions and offers security and brand protection solutions to global corporations and government entities. Although Digimarc has focused mainly on digital rights management issues, it has launched a different kind of initiative based on watermarking to enhance mobile computing and commerce. [34, 35]

The goal of Digimarc’s initiative is to provide a service that lets camera phone users navigate from printed materials to the URL of a website with one click. That is, the printed material contains a watermark that can be read with a camera phone. The phone then recognizes the image and sends it to Digimarc’s registry to determine what to do with it: whether to direct the user to some website or to an e-commerce application. The registry contains information about the user, for example how he wants to pay for his purchases. To be able to use the service, the user must have downloaded Digimarc’s client to his phone. [35]

Digimarc has acknowledged the problem that, unlike a barcode, an invisible watermark is not apparent to the naked eye, which makes it difficult to let consumers know that the materials contain a watermark. Digimarc’s solution is, at least initially, to partner with an e-commerce or catalogue company. The users of a catalogue are assumed to be comfortable using a device to select catalogue items. [35]

Digimarc’s initiative has roused interest at least in Japan, where MediaGrid has licensed Digimarc’s technology. On July 25, 2006, Digimarc announced the launch of a digital watermarking pilot in Japan in the “Amusement Café Maid in Japan” café. The idea of the pilot is to offer customers with a camera phone the possibility to interact with digitally watermarked print materials. The materials may contain links to online content such as a theme-oriented city guide or a mobile phone wallpaper featuring favourite characters. The pilot was rolled out by MediaGrid and Success Corporation, a leading developer of games and video games in Japan. [34]

4.2. CyberSquash

CyberSquash is developed by NTT (Nippon Telegraph and Telephone Corporation) Cyber Solutions Laboratories. It is defined as an Internet Access Platform that makes use of watermarking technologies. In this system, a watermark indicating a URL for a desired homepage is embedded in a printed image, which can be read with a web camera or a mobile phone with an i-appli digital camera. The image is then processed and the user is directed to the specified homepage. [36]

There are two types of CyberSquash software for reading the watermarks: an Active-X version and an i-appli version. The Active-X version is developed to read watermarks with a web camera on a PC, and the i-appli version is developed to read watermarks with a mobile phone equipped with a digital camera. The i-appli version works only in NTT DoCoMo’s i-mode mobile phones and is written in the Java programming language. [36]

In CyberSquash the watermark is embedded in the image in four phases. First, error correction coding is applied and the resulting code is modulated by using Direct Sequence Spread Spectrum (DS-SS) modulation. The modulated code is then permuted with a pseudorandom sequence to reduce the imbalance of robustness among bits. In the third step, the modulated and interleaved code is embedded in the image by applying two-dimensional pattern modulation in small blocks, as illustrated in Figure 15, where the patterns are two two-dimensional sine curves with 90° rotational symmetry. Finally, the actual embedding is done by multiplying the watermark pattern with an embedding strength factor and superposing it on the original image. Adaptive pattern superposition can also be employed to improve the balance between the image quality and the robustness of the watermark. [37]

Figure 15. 2D Pattern modulation in CyberSquash application.

The method presented above is not robust against geometric distortions, and therefore the authors placed a frame around the watermarked image to recover synchronization. The frame also works as an indicator showing that a watermark has been embedded. After the image has been read with a camera, the frame is recognized and the four corner points are located. From these locations, the parameters of the affine transform and scale can be determined. The parameters are approximations rather than exact values, and thus the corrected image may contain small geometric distortions. The embedded watermark is designed to be robust against such small distortions. [37]


In the watermark detection, the scaled and geometrically corrected image is filtered with a pre-processor to increase the robustness of the watermark. After filtering, the image is divided into small individual blocks and the energy of the frequencies corresponding to the two sine curves is calculated on each block. By calculating differences between two energy levels the sign of the embedded sequence is obtained. When the embedded sequence is determined, the sequence is de-scrambled with pseudorandom permutation and demodulated. The last step is to put the sequence through an error correction process. [37]
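The block-wise pattern modulation and the energy-based detection described above can be illustrated with a much simplified sketch. It assumes an 8×8 block size, two sinusoidal patterns whose frequency vectors are rotated by 90° relative to each other, and a fixed embedding strength; none of these values come from [37]. The error correction, spread spectrum modulation, permutation and pre-filtering stages are left out, and all names are hypothetical.

```python
import numpy as np

BLOCK = 8          # block size in pixels (assumed)
FREQ_A = (1, 2)    # (u, v) cycles per block, used for bit 1 (assumed)
FREQ_B = (-2, 1)   # FREQ_A rotated by 90 degrees, used for bit 0

def pattern(freq):
    """Two-dimensional sine pattern for one block."""
    u, v = freq
    y, x = np.mgrid[0:BLOCK, 0:BLOCK]
    return np.sin(2 * np.pi * (u * x + v * y) / BLOCK)

def embed_bits(image, bits, strength=6.0):
    """Superpose one scaled sine pattern per block, chosen by the bit value."""
    out = image.astype(float).copy()
    blocks_per_row = image.shape[1] // BLOCK
    for k, bit in enumerate(bits):
        r, c = divmod(k, blocks_per_row)
        rows = slice(r * BLOCK, (r + 1) * BLOCK)
        cols = slice(c * BLOCK, (c + 1) * BLOCK)
        out[rows, cols] += strength * pattern(FREQ_A if bit else FREQ_B)
    return out

def energy(block, freq):
    """Energy of the block at the DFT bin matching the pattern frequency."""
    u, v = freq
    spectrum = np.fft.fft2(block)
    return np.abs(spectrum[v % BLOCK, u % BLOCK]) ** 2

def detect_bits(image, count):
    """Decide each bit from the sign of the energy difference in its block."""
    blocks_per_row = image.shape[1] // BLOCK
    bits = []
    for k in range(count):
        r, c = divmod(k, blocks_per_row)
        block = image[r * BLOCK:(r + 1) * BLOCK, c * BLOCK:(c + 1) * BLOCK]
        bits.append(1 if energy(block, FREQ_A) > energy(block, FREQ_B) else 0)
    return bits
```

Using the energy difference rather than a single energy value means that the decision does not depend on an absolute threshold, which is one plausible reason for pairing two patterns per block.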

The CyberSquash trial was initiated in 2003 and was planned to run for six months. However, after the trial, the CyberSquash application disappeared from the news headlines, and it seems that its development has stopped.

4.3. Bar codes

Bar codes are not really watermarks, but the application areas of the bar codes are so similar to those of watermarking that it is necessary to introduce them. The first section contains a short history of bar coding and some description about the applications where they are used. The second section tells how the bar codes work.

4.3.1 Short history

Bar codes are usually thought of as a rival of watermarking in the field of value-adding services, because they can be used in similar applications. There are, however, many applications where one or the other is clearly more suitable. For example, in catalogues, where space is scarce, watermarking is clearly a better solution than bar codes. On the other hand, in advertising, where the user needs to be informed separately about the extra data included in the image, a bar code might be more suitable than watermarking.

Bar codes were invented in 1949 when a young graduate student, Joseph Woodland, idly drew some dots and dashes in the sand. He was trying to figure out how to read information about a product automatically, and he knew that Morse code was the key to solving the problem. While lying on the beach he finally understood how it should be done, and so the idea of bar codes was created. [38]

Joseph Woodland and his partner, Bernard Silver, received a patent on bar codes in 1952 (US Patent 2,612,994), but it was not a rapid commercial success. Although the idea was ready for the commercial world, the technologies needed in bar code scanners were expensive or yet to be invented. It took fifteen years before the first commercial use of bar codes, and it was not until the mid-seventies that bar codes finally came into the stores. This was enabled by the invention and development of lasers and integrated circuits, which became affordable in the 1960s and made bar code scanners simple and profitable. [39]

One of the first standards created was the UPC (Universal Product Code), now officially EAN.UCC-12 (International-Uniform Code Council), which is still in use in the USA and Canada. In the early 1970s, the US grocery industry was trying to find a way to reduce costs. They reasoned that automating the grocery checkout process could do this, and after two years’ effort they announced the UPC and the UPC bar code symbol on April 1, 1973. The first item bought using this system was a package of Wrigley's gum sold in Marsh's Supermarket in Troy, Ohio, on June 26, 1974. [38]

Nowadays, bar codes are used in many ways, and even programs that read bar codes have been published for camera phones. Critics who favour watermarking over bar codes claim that bar codes are ugly and require extra space. However, at present bar codes can contain more information than watermarks and they are more robust in mobile applications. Technology has surely advanced since the days when the first pack of gum was sold with a bar code.

4.3.2 Operation

Bar codes are often described as a machine-readable representation of information printed on some surface. The traditional bar code consists of bars and spaces of alternating diffuse reflectivity, usually black and white parallel stripes, as illustrated in Figure 16. The bar codes in the figure were generated with a bar code online demo [40]. The information in the bar codes is encoded to the bars and spaces along one dimension, horizontal, and therefore the vertical height of the bar code has no specific meaning. It only makes the reading of the bar code easier. [41]

Figure 16. Examples of UPC-A bar codes.

There are two main ways to encode information in bar codes. The first is to divide the piece of code into 1’s and 0’s and then paint the 1’s black as bars and leave the 0’s as spaces, as in the UPC-A codes of Figure 16. The second way is width coding, that is, assigning each bit to a bar or a space and making that element wide if the bit is 1 and narrow if the bit is 0. For example, the bar code standard Code 39 is a width code. [41]

The encoded information can be read with various technologies. The most common ones are cameras and lasers, and although the technologies differ, the idea is the same: when the scanner reads a bar code, it detects only reflections of light. The black stripes will not reflect any light, whereas the white stripes reflect most of the light back. [41]

As the technologies evolved, it was realized that the traditional one-dimensional bar codes were not good enough. Data Matrix is a 2-dimensional bar code standard consisting of black and white dots, as in Figure 17. The Data Matrix code includes four basic elements: two solid-line locators, two synchronization lines, a data area and a quiet zone. The data area contains the encoded binary data, whereas the quiet zone is the empty narrow area around the data matrix. The two solid-line locators are solid perpendicular lines that indicate the data area boundaries and the orientation of the data matrix. The two synchronization lines opposite the solid-line locators indicate the sample modules. [42]


Figure 17. Data Matrix bar code encoding “MediaTeam”.

Bar codes have been used in practically any imaginable application. They have been embedded into groceries, aeroplanes, cars, images and even into fashion and tattoos. They have been used for monitoring movement and merchandise and for tracking objects. Bar codes have been around for a long time, but their story is far from over.

4.4. Mobot

Mobot does not use watermarking technologies for offering value-adding services, but it is worth mentioning because of the wide publicity it has received in the USA. Mobot does not require any kind of barcode, logo or special symbol, nor does it need any kind of client software in the mobile device [35]. Instead of watermarks, Mobot’s solution is based on image recovery, pattern recognition and image matching capabilities. This enables Mobot to support all camera phones on the market regardless of camera accuracy. [43]

The user only needs to snap a photo of the interesting ad with his/her camera phone and send it to the Mobot server, which analyses the image and in turn sends the user whatever the advertiser wants him/her to receive. The data the user receives could be, for example, a coupon, a giveaway or additional information about the product, but to actually receive the giveaways and offers from the advertisers, consumers must first register with the company. All this is already in use in Jane, a magazine for young women, which has launched the promotion “Jane talks back”. [35]

4.5. Sanyo

A Japanese electronics company, Sanyo, has also done some research on watermarking. Takeuchi et al. from Sanyo Electric Co., Ltd. proposed a method for reading watermarks from printed images with a camera phone. In their method, the actual watermark is embedded with guided scrambling (GS) techniques. Unlike many other watermarking methods that are tested with cameras, the method by Takeuchi et al. also compensates for radial distortions. The coefficients of the correction model are calculated beforehand by using a chessboard calibration pattern. The paper assumed that the coefficients would not change between phones of the same model, and a database of the coefficients was created, indexed by the product name of the camera phone and the focal length during photo acquisition. The perspective distortion was compensated by locating the four corners of the image. Unfortunately, the more detailed publications on this method are written in Japanese, and the information is thus not available to the international audience. [44]


5. PRINT-SCAN RESILIENT WATERMARKING

Printing and scanning an image inflicts a set of distortions on the watermarked image, as explained in section 2.3.3. In this chapter, a watermarking method that is resilient to print-scan attacks is proposed. The block diagram of the watermarking system is shown in Figure 18. The proposed method consists of three separate watermarks. The first watermark is embedded in the frequency domain to recover the image from rotation and scaling attacks. The second watermark is embedded in the spatial domain to recover from translation attacks, and the third watermark is the multibit message, which is embedded in the wavelet domain. The last two sections of this chapter discuss the experiments done and the results achieved, respectively, to validate the use of the method.

Every embedded watermark can be considered an attack against the formerly embedded watermarks. The embedding order is therefore chosen carefully, although other orders would also be possible. Here, the multibit message is the most fragile of the three watermarks and is therefore embedded last.

Figure 18. Block diagram of the proposed print-scan robust method.

5.1. Frequency domain template

The Fourier domain has a property that is both an advantage and a drawback for watermarking: the magnitudes of the transform are invariant to translations. This property is used here for determining the amount of rotation and scaling, because it is much easier to find the watermark in the Fourier domain when translation need not be considered. On the other hand, the translation invariance forces us to find a different method for determining the amount of translation, which is introduced in section 5.2. The next section explains the embedding process and the one after it the extraction process.

5.1.1 Embedding

The template watermark is embedded in the magnitudes of the Fourier domain. Before embedding, the luminance values of the image are transformed to the Fourier domain, which results in two Fourier images, the real and imaginary parts. In the Fourier domain representation, the low frequencies are located at the corners of the transformed image. Before processing, the low frequencies are moved to the centre and then the magnitudes of the transform are calculated.
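The preparation step described above can be sketched with NumPy as follows. The function names are hypothetical, and the thesis does not state which software was used; the sketch only shows the transform, the move of the low frequencies to the centre, and the split into magnitude and phase, so that modified magnitudes can later be recombined with the untouched phase.

```python
import numpy as np

def centred_spectrum(luminance):
    """Return magnitude and phase of the 2-D DFT with low frequencies in the centre."""
    spectrum = np.fft.fftshift(np.fft.fft2(luminance))
    return np.abs(spectrum), np.angle(spectrum)

def luminance_from_spectrum(magnitude, phase):
    """Invert centred_spectrum: undo the shift and transform back to pixels."""
    spectrum = np.fft.ifftshift(magnitude * np.exp(1j * phase))
    # The imaginary part is only rounding noise as long as the spectrum
    # keeps the symmetry of a real image.
    return np.real(np.fft.ifft2(spectrum))
```

For an image with an even number of rows and columns, the zero-frequency term ends up at index (rows // 2, cols // 2) after the shift.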


Template

After the magnitudes of the Fourier transform have been determined, the template is embedded. To recover from rotation and scaling distortions, the template is embedded in the magnitudes of the Fourier transform of the image in a manner somewhat similar to the method by Lee and Kim [23], where a pseudorandom template sequence of length 180 bits is embedded in the middle frequencies of the magnitudes of the Fourier transform. Here, the template, a pseudorandom sequence of 1’s and 0’s, is embedded in the middle frequencies of the image in the form of a sparse circle. This process is illustrated in Figure 19. The points in the figure are exaggerated to make them visible to the eye in printed material.

Figure 19. Embedding a pseudorandom sequence in the Fourier domain of an image.

The template is symmetrical around the origin because the magnitude component of the Fourier transform of a real image is symmetric about the origin. The points on the circle are added to the Fourier domain at angular intervals of π/20. The value π/20 is chosen for convenience, but it could be different. The values of the pseudorandom sequence that differ from 0 form peaks in the Fourier domain when embedded. Therefore not all the points of the pseudorandom sequence are visible; the 0’s appear as gaps in the circle. The strength at which the values are embedded varies with the local mean and standard deviation. This is because the embedding strength must be larger close to the low frequencies, where the values of the Fourier transform are, in general, highest.
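A minimal sketch of such a circular template embedding is given below. Several details are assumptions rather than the thesis’ actual choices: the template is taken to hold one bit per π/20 step over a half circle (20 bits), the embedding strength is the local mean plus `gain` times the local standard deviation in a small window, and the names `template_offsets` and `embed_template` are hypothetical.

```python
import numpy as np

def template_offsets(count, radius, step=np.pi / 20):
    """Integer (dy, dx) offsets from the spectrum centre, one per template bit."""
    angles = step * np.arange(count)
    dy = np.rint(radius * np.sin(angles)).astype(int)
    dx = np.rint(radius * np.cos(angles)).astype(int)
    return dy, dx

def embed_template(luminance, bits, radius, gain=3.0, window=2):
    """Add magnitude peaks on a sparse circle and return the marked image."""
    spectrum = np.fft.fftshift(np.fft.fft2(luminance))
    magnitude, phase = np.abs(spectrum), np.angle(spectrum)
    reference = magnitude.copy()          # local statistics use the unmarked spectrum
    cy, cx = luminance.shape[0] // 2, luminance.shape[1] // 2
    dys, dxs = template_offsets(len(bits), radius)
    for bit, dy, dx in zip(bits, dys, dxs):
        if not bit:
            continue                      # 0's are left as gaps in the circle
        y, x = cy + dy, cx + dx
        patch = reference[y - window:y + window + 1, x - window:x + window + 1]
        strength = patch.mean() + gain * patch.std()
        magnitude[y, x] += strength
        magnitude[cy - dy, cx - dx] += strength   # mirrored point keeps the symmetry
    marked = np.fft.ifftshift(magnitude * np.exp(1j * phase))
    return np.real(np.fft.ifft2(marked))
```

Writing each peak together with its mirror image keeps the spectrum consistent with a real-valued image, so taking the real part after the inverse transform discards only rounding noise.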

The decision to embed values in the middle frequencies is a compromise. The low frequencies of a Fourier transform contain most of the energy of an image. Therefore all changes made to the low frequencies are highly visible in the image, especially because the watermark signal would have to be embedded very strongly there in order not to be overwhelmed by the energy of the image. On the other hand, the high frequencies are very vulnerable to various kinds of attacks, for example to JPEG compression.

The result of the watermark embedding process can be seen in Figure 20, where a small magnified piece of the image is shown in the upper left corner of each image. The magnified portion shows the effect of watermark embedding more clearly than the image itself. Some of the variation in the quality of the image will be flattened during the printing process, and thus it is possible to embed the watermark more strongly than would be possible when distributing the image in digital form.

Figure 20. a) Original image b) original image after embedding the watermark.

When the watermark peaks are embedded in the magnitudes of a Fourier transform, the watermark spreads over the entire image. This makes the watermark robust against slight cropping. Cropping, on the other hand, inflicts noise in the Fourier domain, but if the template is embedded robustly enough, it will hold through it.

5.1.2 Extracting

Figure 21 shows the image after the print-scan process which has rotated and scaled the image. To find the embedded template from the scanned image, the image is first padded with zeros to its original geometry, a square. If the image is not padded with zeros beforehand, the template circle would be stretched to an oval and the extraction process would be more difficult.

The extraction of the template from the Fourier transform domain is mainly about locating the peaks. From here on we can think of the image itself as noise and the watermark as the information to be preserved. To find the hidden information, we must first filter out the noise, that is, the image.


Figure 21. The watermarked image after print and scan process in which some distortions have occurred.

Wiener filtering

The first thing to do after calculating the magnitudes of the Fourier transform is to apply Wiener filtering. The Wiener filtering removes some of the noise and helps in finding the peaks. To find the peaks, the Fourier transform of the image is Wiener filtered and the filtered transform image is subtracted from the distorted transform image. The Wiener filter minimizes the mean square error between an estimate $\hat{f}$ and the original image $f$

$$e^{2} = E\left\{ (f - \hat{f})^{2} \right\}. \qquad (35)$$

The Wiener filter is usually defined in the frequency domain with the formula

$$G(u,v) = \frac{H^{*}(u,v)}{|H(u,v)|^{2} + P_n(u,v)/P_f(u,v)}, \qquad (36)$$

where $H(u,v)$ is the degradation function, $H^{*}(u,v)$ its complex conjugate, and $P_f(u,v)$ and $P_n(u,v)$ are the power spectra of the original image and the noise, respectively. In the formula, $P_n(u,v)/P_f(u,v)$ can be replaced with a constant, which can be approximated roughly beforehand. [22]

Finding peaks by cross-correlation

The Wiener filtering helps in finding the peaks of the template in the noisy environment, and so the template can be found by using cross-correlation. To reduce the noise, the Wiener filtered image of the Fourier transform magnitude domain is further thresholded before applying cross-correlation. The thresholding is applied so that a point is set to one if the local mean around the point exceeds a certain predefined limit.

There are two things that we know about the template: the pseudorandom sequence and that the template is shaped like a circle around origin. What we want to know are the value of radius and the angle of rotation. The searching of the rotation and scale factors is processed in two phases. In the first stage, a rough estimation of the rotation angle and scale factor is determined and in the second, finer results are achieved. Since the Fourier transform magnitudes are invariant to shifts in spatial domain, it is enough to search for the template circle around the origin. To find the circle, every possible radius must be searched. This could very easily lead to an exhaustive search, but because the image is in digitized form, it is enough to search first only integer valued radii and find out the exact value later in the second stage.

It is not necessary to examine all the radii, because at the low frequencies the ‘noise’ from the image is overwhelming. When calculating the cross-correlation of a pseudorandom sequence and a highly noisy signal, the result may show a high correlation between the two signals even if there is none. Therefore some of the low frequency radii can be discarded, and the search area resembles an annulus between two predefined frequencies f1 and f2, as in the paper by Pereira and Pun [10].

The detection of the template circle is performed as follows: first a radius is selected and a one-dimensional sequence corresponding to that radius in the Fourier transform is extracted, as in Figure 13 [23]. The sequence is cross-correlated with the pseudorandom sequence by using a cross-covariance function, which is related to cross-correlation. The cross-covariance can be defined as the cross-correlation of mean-removed sequences

$$c_{xy}(m) = \begin{cases} \displaystyle\sum_{n=0}^{N-m-1} \left( x(n+m) - \frac{1}{N}\sum_{i=0}^{N-1} x_i \right) \left( y_n^{*} - \frac{1}{N}\sum_{i=0}^{N-1} y_i^{*} \right), & m \ge 0, \\ c_{yx}^{*}(-m), & m < 0, \end{cases} \qquad (37)$$

where x is the sequence of the image at some radius, with length N, and y is the pseudorandom sequence interpolated to the length N. The maximum of the resulting cross-covariance is saved to a vector. After the integer radii between frequencies f1 and f2 have been examined, the maximum is selected from the vector containing the maxima of the cross-covariances, which is shown in Figure 22.

When a rough estimate of the radius of the template circle has been found, the locations of the template peaks are extracted. The peaks are searched for in a band extending ±2 pixels around the previously found radius. Every point in this band is examined by calculating a local mean around the point and deciding whether the point is a high enough peak. A point is selected as a peak if its value is 3 times larger than the local mean and if it is a maximum in that area.
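The radius search can be sketched as follows. The sketch deviates from equation (37) in two labelled ways: it uses a circular cross-covariance computed through the FFT, which seems natural because a ring of samples is periodic, and it samples the ring at exactly as many angles as the template has points instead of interpolating the template to the ring length. The function names and the synthetic usage are assumptions for illustration only.

```python
import numpy as np

def ring_samples(magnitude, radius, count):
    """Sample a centred magnitude spectrum at `count` equally spaced angles."""
    cy, cx = magnitude.shape[0] // 2, magnitude.shape[1] // 2
    angles = 2 * np.pi * np.arange(count) / count
    ys = np.rint(cy + radius * np.sin(angles)).astype(int)
    xs = np.rint(cx + radius * np.cos(angles)).astype(int)
    return magnitude[ys, xs]

def circular_xcov(x, y):
    """Cross-covariance of mean-removed sequences over all circular shifts."""
    x0, y0 = x - x.mean(), y - y.mean()
    return np.real(np.fft.ifft(np.fft.fft(x0) * np.conj(np.fft.fft(y0))))

def find_radius(magnitude, template, r_min, r_max):
    """Return the integer radius with the largest cross-covariance maximum."""
    scores = {}
    for radius in range(r_min, r_max + 1):
        ring = ring_samples(magnitude, radius, template.size)
        scores[radius] = circular_xcov(ring, template).max()
    best = max(scores, key=scores.get)
    return best, scores
```

The `scores` dictionary plays the role of the vector of maxima described above; the index of the maximum within the winning cross-covariance would give the rough rotation estimate.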

The difficulty of finding the peaks is obvious when looking at the magnitudes of the Fourier transform in Figure 23. The points are sharp and clear in the original image but stretched and spread in the distorted image. The low frequencies are strongly visible in the distorted image because some of the white scanned background of the image is still included in the calculations, whereas the original image contains only the image and no scanned background.


Figure 22. The vector containing maximums of the cross-correlations.

Figure 23. Magnitudes of the Fourier transform of a) distorted image b) original image.

Determining rotation and scaling parameters

After the peaks are found, they are transformed into polar coordinates and divided into π/20 segments according to their angle. Extra points inside these segments can be discarded, because we know that the points should be at an angle of approximately π/20 from each other. The resulting piece of signal is then cross-correlated with the embedded pseudorandom signal, and the maximum of the cross-correlation signal shows the amount of rotation as a multiple of π/20. For example, if the cross-correlation peak is at a, the amount of rotation is roughly a·π/20.

After the rotation has been found as a multiple of π/20, a more accurate value can be determined. The original angles of the embedded pseudorandom sequence are subtracted from the angles of the peaks, and the value for the rotation is obtained by taking the median of the resulting values.

The scale factor is calculated by taking a trimmed mean of the radii of the peaks and dividing the value by the original radius of the embedded pseudorandom sequence. When the scaling and rotation parameters have been found, the distortions can be inverted with the matrix operations explained in section 2.3.5. The image after the correction of rotation and scaling is shown in Figure 24.

Figure 24. The image after correction of rotation angle and scaling.

The algorithm for the extraction method is as follows:

1. Pad the image with zeros to its original geometry
2. Calculate the Fourier domain magnitudes
3. Apply Wiener filtering to remove noise
4. Find out a rough estimate of the rotation angle and scale factor
   4.1. Apply thresholding
   4.2. Search through integer radii between frequencies f1 and f2 with cross-correlation between a radius and the pseudorandom sequence used in embedding
   4.3. Select the radius with the maximum cross-correlation value as the radius of the circle
   4.4. Calculate cross-correlation between the sequence at selected radius and pseudorandom sequence
   4.5. Select the location of the cross-correlation peak as a rough estimate of the rotation angle
5. Refine the estimate
   5.1. Calculate the exact locations of the template peaks with the rough estimates
   5.2. Take a median from the angles of the peaks to get the angle of rotation
   5.3. Take a trimmed mean from the radii of the peaks to get the amount of scale
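The refinement in step 5 can be sketched as below. The sketch assumes that each detected peak has already been matched to its template point with the help of the rough estimates, so that `peak_angles[i]` corresponds to `template_angles[i]`. The trimming fraction and the function name are hypothetical, since the thesis does not state how much is trimmed.

```python
import numpy as np

def refine_rotation_and_scale(peak_angles, peak_radii, template_angles,
                              template_radius, trim=0.2):
    """Median of angle differences and trimmed mean of radii (steps 5.2 and 5.3)."""
    differences = np.asarray(peak_angles) - np.asarray(template_angles)
    # Wrap the differences into (-pi, pi] so that values near 2*pi do not skew the median.
    differences = np.angle(np.exp(1j * differences))
    rotation = float(np.median(differences))

    radii = np.sort(np.asarray(peak_radii, dtype=float))
    cut = int(trim * radii.size)              # samples dropped at each end
    trimmed = radii[cut:radii.size - cut]
    scale = float(trimmed.mean() / template_radius)
    return rotation, scale
```

Both the median and the trimmed mean are insensitive to a few falsely detected peaks, which is presumably why they are preferred here over a plain mean.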


5.2. Spatial domain template

Translation, that is, how far the watermark has shifted from its original place, describes the location of the watermark in the image. Locating the watermark is not as easy a task as it may appear at first sight. The watermark has probably been rotated and scaled, and those transforms must be inverted before the watermark can be located accurately. Also, there is no point in searching through the whole image for the starting point of the watermark; we should be able to restrict the search somehow. Here a separate watermark has been embedded in the spatial domain to serve as a template. Locating the watermark is now faster than with an exhaustive search, but the template has its own impact on the imperceptibility of the watermark load.

5.2.1 Embedding

The watermarking system should be robust against translation attacks, but because the Fourier transform magnitudes are invariant to translations, the frequency domain template cannot reveal them. Therefore, another watermark is needed to recover the image from a translation attack.

Template

The template watermark for recovering from a translation attack is embedded in the spatial domain, and its shape is shown in Figure 25. The template consists of two similar parts, one for horizontal translations and the other for vertical translations. Each template part, horizontal or vertical, is built from a short pseudorandom sequence of length 127. The sequence is an m-sequence, and its length is a compromise: a longer sequence would be sensitive to small rotations but, on the other hand, a shorter sequence would be difficult to find with cross-correlation.
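An m-sequence of length 127 can be generated with a 7-stage linear feedback shift register, as in the sketch below. The feedback taps (stages 7 and 6) and the seed are assumptions, since the thesis does not state which generator polynomial was used; any maximal-length tap configuration for seven stages would give a sequence with the same length and correlation properties.

```python
import numpy as np

def m_sequence(taps=(7, 6), stages=7, seed=1):
    """Generate one period of a maximal-length sequence with a Fibonacci LFSR.

    `taps` are 1-based stage numbers whose contents are XORed to form the
    feedback bit; `seed` must be non-zero.
    """
    state = [(seed >> i) & 1 for i in range(stages)]   # state[0] is stage 1
    output = []
    for _ in range(2 ** stages - 1):
        output.append(state[-1])          # the last stage is the output
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]   # shift and feed back into stage 1
    return np.array(output)

sequence = m_sequence()
bipolar = 1 - 2 * sequence                # map 0/1 to +1/-1 for correlation
```

The bipolar form has a circular autocorrelation of 127 at zero shift and −1 at every other shift, which is what makes the sequence easy to locate with cross-correlation.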

Figure 25. The template embedded in the image in order to recover the message watermark from a translation attack.


The m-sequence is repeated across part of the image, separately in the horizontal and vertical directions. The horizontal pattern is formed as follows. The first line is embedded in a suitable row so that the final pattern is similar to that in Figure 25. The second line is an exact copy of the first line, but the third and fourth lines are skipped. This is done because the pattern should not be visible to the human eye in the final image, and if all the lines are used, the periodic pattern of the template shows. The fifth line is otherwise similar to the first line, but the m-sequence is shifted to the right by 2 pixels, which makes the pattern oblique. The vertical pattern is formed in the same way, but with columns instead of rows.
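The construction of the horizontal template can be sketched as follows. The generator polynomial of the m-sequence is not stated in the text, so x^7 + x + 1 is assumed here, and the function names and the layout of the rows in a list of lists are illustrative only.

```python
def m_sequence_127():
    """127-bit m-sequence from the recurrence a[n] = a[n-7] XOR a[n-6].

    The recurrence corresponds to the primitive polynomial x^7 + x + 1,
    which is an assumption made for this sketch.
    """
    bits = [1] * 7
    while len(bits) < 127:
        bits.append(bits[-7] ^ bits[-6])
    return bits

def horizontal_template(chips, groups, shift=2):
    """Rows 4g and 4g+1 carry the sequence offset by shift*g; rows 4g+2 and 4g+3 stay empty."""
    width = len(chips) + shift * (groups - 1)
    rows = [[0] * width for _ in range(4 * groups)]
    for g in range(groups):
        for r in (4 * g, 4 * g + 1):
            rows[r][shift * g:shift * g + len(chips)] = chips
    return rows

seq = m_sequence_127()
chips = [2 * b - 1 for b in seq]          # map {0, 1} to {-1, +1}
pattern = horizontal_template(chips, groups=4)
```

The two-valued periodic autocorrelation of the m-sequence (127 at zero shift, -1 elsewhere) is the property that makes the template easy to find with cross-correlation.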

5.2.2 Extracting

Cross-correlation is used here to extract the watermark from the spatial domain, but the problem is accuracy.

Quarter-pixel interpolation

Finding the template pattern in an image will not always succeed as expected because of the geometrical distortions discussed in section 2.3.3. The main idea here is to calculate the cross-correlation between the embedded m-sequence and every other row or column, shift each result by a suitable amount and add all the results together. The reason for this is that averaging the cross-correlation results diminishes noise.

From the two resulting sequences, one for the rows and one for the columns, it is possible to see how much the image has been translated in each direction. The problem is that the amount of translation can be determined only with an accuracy of ±1 pixel, whereas in the real world the image may be shifted by only ⅛ pixel, for example. An error of ½ pixel may very well destroy the reading of a watermark. Therefore interpolation is applied to achieve better precision. Here quarter-pixel interpolation is performed by applying half-pixel interpolation twice, and bilinear interpolation is used to determine the values at the midpoints between the pixels. The bilinear interpolation of a point is calculated from the closest 2x2 neighbourhood of known pixel values surrounding the unknown pixel value. The unknown pixel value is a weighted average of the four surrounding pixel values, as illustrated in Figure 26. If the distances from all the known pixel locations to the unknown pixel are equal, the interpolated value is simply the sum of the known pixel values divided by four.

Figure 26. Bilinear interpolation of an unknown pixel value.

[Figure 26 labels: known pixels (x1, y1), (x2, y1), (x1, y2) and (x2, y2); unknown pixel (x, y)]
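A minimal sketch of the quarter-pixel interpolation is given below. It assumes that the image is stored as a list of rows and that only the samples between existing pixels are generated, so that an image of H x W pixels grows to (4H - 3) x (4W - 3) samples; the function names are hypothetical.

```python
def upsample2(img):
    """Half-pixel interpolation: insert bilinear midpoints between known pixels."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for r in range(2 * h - 1):
        for c in range(2 * w - 1):
            r0, c0 = r // 2, c // 2
            r1, c1 = r0 + r % 2, c0 + c % 2
            # On original pixels all four terms coincide; at midpoints the
            # neighbours are equally distant, so a plain average is enough.
            out[r][c] = (img[r0][c0] + img[r0][c1] + img[r1][c0] + img[r1][c1]) / 4
    return out

def quarter_pixel(img):
    """Quarter-pixel grid obtained by applying half-pixel interpolation twice."""
    return upsample2(upsample2(img))

grid = quarter_pixel([[0, 4], [8, 12]])
```

For this 2x2 example the result is a 5x5 grid whose centre value is the mean of the four known pixels.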


Determining translation parameters

The determination of the translation parameters requires some processing power. The rotation- and scale-corrected image is interpolated to four times its size by determining the three values at equal distances between known pixels. The embedded m-sequence is likewise interpolated to four times its length for the cross-correlation. The extraction of the watermark is done in two phases, separately for the horizontal and vertical translations. Unfortunately, finding the translation parameters is not straightforward: the determined values must be suitably combined before the horizontal and vertical translations are found. The cross-correlation with the embedded m-sequence is calculated for every other row, and the results are shifted by two and added up into a vector so that the possible cross-correlation peak is strengthened. The amount of translation can be calculated from the location of the peak because the size of the original image and the place where the translation template should be are known. Figure 27 shows a filtered plot of the resulting cross-correlation sequence.

Figure 27. Cross-correlation image of the horizontal template used for determining the amount of the translation in the image.

Figure 28 shows an example of the template extraction process, where x1, x2, y1 and y2 are the unknowns. The original size of the image was 512x512 and the template watermark was embedded between lines 64 and 192 as explained in the preceding sections. After the print-scan process, the image is distorted and there are borders around it, resulting from inexact cropping while scanning.

The image is somewhat larger than the original because of the background included in the crop. In Figure 28 the rotation and scale have already been corrected, and it can be assumed that the distorted image now contains the watermarked image in its original size; only its exact location is unknown.

To be able to determine the location of the image, the unknowns in Figure 28 should be solved. The image is first interpolated to achieve an accuracy of 0.25 pixels. After the interpolation, the amount of translation can be determined by calculating the cross-correlation between the embedded, interpolated m-sequence and every other line. There is no need to calculate the cross-correlation with every line because in the interpolated image the template is always similar in eight succeeding lines. In the original image, the template is similar in two succeeding lines. The cross-correlations are added together by first shifting each line by one more than the previous line and then summing it to the previous results. The shifting is done so that the possible peaks that indicate the location of a template line in the cross-correlation sequences are aligned and thus strengthen each other. The same process is repeated for the vertical dimension, resulting in two cross-correlation sequences and two peaks.

Figure 28. The translation template after spatial shift.

The locations of the peaks do not give the translation parameters right away because of all the shifts and additions. To find the translation parameters, we must remove the effect of the image from the locations of the cross-correlation peaks. In this example, it means that we take the locations of the peaks in the two cross-correlation sequences and subtract the known location of the translation template in the original image. That is, we add up 192, the location of the template on the first template row, 127, the length of the template, and 32, the number of shifts before the first template line (32 because the cross-correlations are calculated with only every other line). The sum is multiplied by four, because the image has been interpolated to four times its size, giving (192 + 127 + 32) · 4 = 1404, and the result is subtracted from the locations of the two peaks. This way, we find two values that include the translations.

The two values are not enough to find four unknowns, however. Therefore we calculate two more values from the other end of the cross-correlation sequences. That is, we take the length of the sequences and subtract the locations of the peaks from those. These new values can then be processed as above and two more values are received.

The four unknowns can be solved from the following equations.


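\[
val_k = x_i + y_j, \qquad k = 1, \ldots, 4, \quad i, j \in \{1, 2\}, \tag{38}
\]

where each of the four equations pairs one of the horizontal unknowns x_1, x_2 with one of the vertical unknowns y_1, y_2, and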

where val1 to val4 are the values extracted above and x1, x2, y1 and y2 are the unknown translation parameters. The final image after the correction of translation is shown in Figure 29. The image can now be extracted from the background and the actual value-adding watermark can be read.

Figure 29. Watermarked and print-scanned image after correction of translation, rotation and scaling.

A pseudo code representation of the extraction method is as follows:
1. Apply quarter-pixel interpolation to the image
2. Process horizontal part of the template
   2.1. Calculate cross-correlation with every other row of the image
   2.2. Shift every cross-correlation result with one more than the result of the previous row so that the peaks are in a same line
   2.3. Add all the results together
   2.4. Find the maximum peak and calculate the distance from both ends of the sequence
   2.5. Remove the location of the template in the original image
3. Process vertical part of the template
   3.1. Same as above but columns as rows
4. Solve the amount of translation from the received results
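Steps 2.1 to 2.4 can be illustrated with the following sketch for the horizontal part of the template. It is a simplification under several assumptions: the interpolation of step 1 is skipped, so the search works at integer-pixel accuracy; the m-sequence polynomial x^7 + x + 1, the synthetic test image and all names are hypothetical; and the shift is done circularly to keep the code short.

```python
import random

def m_sequence_127():
    """127-chip m-sequence from a[n] = a[n-7] XOR a[n-6] (polynomial assumed)."""
    bits = [1] * 7
    while len(bits) < 127:
        bits.append(bits[-7] ^ bits[-6])
    return [2 * b - 1 for b in bits]          # chips in {-1, +1}

def shift_and_add(image, chips):
    """Correlate every other row with the chips, shift row k back by k, and sum."""
    lags = len(image[0]) - len(chips) + 1
    acc = [0.0] * lags
    for k, row in enumerate(image[::2]):
        for j in range(lags):
            corr = sum(row[j + i] * c for i, c in enumerate(chips))
            acc[(j - k) % lags] += corr       # one more shift for each sampled row
    return acc

# Synthetic image: Gaussian noise plus an oblique template at row 20, column 40.
random.seed(1)
rows, cols, row0, col0, groups = 96, 256, 20, 40, 16
chips = m_sequence_127()
image = [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
for g in range(groups):
    for r in (row0 + 4 * g, row0 + 4 * g + 1):    # two identical lines, next two skipped
        for i, c in enumerate(chips):
            image[r][col0 + 2 * g + i] += c       # each group shifted right by 2 pixels

acc = shift_and_add(image, chips)
peak = acc.index(max(acc))
```

In this setup the summed peak lands at col0 - row0 / 2, which illustrates why the peak location mixes the horizontal and vertical offsets and why the known template position has to be removed before the translation can be solved.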


5.3. Wavelet domain multibit message

The wavelet domain is very sensitive to small geometrical distortions and all the geometrical distortions must be removed before the message watermark can be read from the wavelet domain. That is the reason why the template watermarks are used to remove the effects of rotation, scale and translation from the image. This section explains how the message watermark is embedded with a spread spectrum technique and extracted with a method based on a thresholded correlation receiver.

5.3.1 Embedding

Error-correction coding

Before embedding, the message was protected with an error-correcting code. (15, 7) BCH (Bose-Chaudhuri-Hocquenghem) codes were chosen for their widespread use and simplicity. BCH codes are multilevel, cyclic, variable-length codes that can correct multiple random errors, and the BCH (15, 7) code in particular can correct two errors per code word. The BCH codes are based on the idea of adding parity bits to the code word to check whether any changes have occurred. [45]

BCH codes are calculated over Galois fields. Galois fields are also called finite fields because they contain only a finite number of elements. For example, a Galois field GF(q) is a field with q elements, where q is a finite number. Every Galois field has at least one primitive element α such that every field element except zero can be expressed as a power of α. [45] The BCH design rule requires that there are twice as many powers of α as the error correction capacity t. If q = 2^m, where m is any integer ≥ 3, the elements of the field can be represented by polynomials whose coefficients are elements of the field GF(2), that is, 0 and 1. The block length of such a code is n = 2^m - 1 and the error correction capacity of the code is t < (2^m - 1)/2. [45]

The final code word consists of two parts: the message part and the remainder for checking the message. The remainder part, that is, the part that contains the parity bits, is calculated with the generator polynomial. Here, for BCH (15, 7), the generator polynomial is x^8 + x^7 + x^6 + x^4 + 1, which was calculated with Matlab. The length of the code word is then 15 and the length of the message is 7. The code word can be generated by multiplying the polynomial corresponding to the message word with the generator polynomial.
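As an illustration, the sketch below encodes 7-bit message words with the generator polynomial given above. It assumes the systematic form, in which the parity bits are the remainder of the shifted message divided by the generator polynomial; the function names are hypothetical and decoding is not shown.

```python
GENERATOR = 0b111010001        # x^8 + x^7 + x^6 + x^4 + 1

def poly_mod(value, generator=GENERATOR):
    """Remainder of polynomial division over GF(2); the bits of an int are the coefficients."""
    while value.bit_length() >= generator.bit_length():
        value ^= generator << (value.bit_length() - generator.bit_length())
    return value

def bch_15_7_encode(message):
    """Systematic encoding: 7 message bits followed by 8 parity bits."""
    shifted = message << 8
    return shifted | poly_mod(shifted)

def encode_bits(bits):
    """Encode a list of bits in 7-bit blocks, each producing a 15-bit code word."""
    out = []
    for i in range(0, len(bits), 7):
        word = int("".join(map(str, bits[i:i + 7])), 2)
        out.extend(int(b) for b in format(bch_15_7_encode(word), "015b"))
    return out

codeword = bch_15_7_encode(0b1010011)
coded = encode_bits([1, 0] * 31 + [1])    # 63 message bits in nine blocks
```

Every code word produced this way is divisible by the generator polynomial, and nine 7-bit blocks yield 9 · 15 = 135 coded bits.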

Spread Spectrum Technique

The technique used here is to some extent similar to that by Keskinarkaus et al. in [46]. The message bits protected with the BCH code are embedded in the image in the wavelet domain. The image is decomposed to the first-level sub-bands using Haar wavelets, and the watermark is embedded in the detail coefficients while the approximation coefficients are left unmodified. In [46] the watermark was embedded in the approximation coefficients to gain better robustness, but here the detail coefficients are used because they offer better imperceptibility. In particular, the horizontal detail coefficients are used. As in [46], the watermark is embedded with

\[
Y^{**}_{l,f}(n) =
\begin{cases}
Y^{*}_{l,f}(n) + \beta \cdot m(k), & messagebit = 1 \\
Y^{*}_{l,f}(n) - \beta \cdot m(k), & messagebit = 0
\end{cases}
\tag{39}
\]

where Y* is an image which has already been watermarked with the templates in the Fourier and spatial domains, Y*_{l,f}(n) is the sub-band of Y* in the lth resolution level and fth frequency orientation, and Y**_{l,f}(n) is the new watermarked sub-band, where ** means that multiple watermarking has been applied. β is a scaling coefficient that controls the embedding strength, and m(k) is the m-sequence, the length of which controls the chip rate for spreading. After the message has been embedded, the inverse wavelet transform is applied to the image. The amount of distortion and noise introduced by the multiple watermarking is evaluated visually and with the PSNR value. The results of the evaluation are presented in the upcoming sections.

5.3.2 Extracting

All the geometrical distortions must be corrected before the watermark can be read from the wavelet domain. After correcting the distortions the wavelet transform is applied to the watermarked image and the detail coefficients are divided into small segments of the same size as the m-sequence used for embedding the message. The message watermark is extracted by calculating a mean removed cross-correlation between the coefficient segment and the m-sequence. The result of the correlation is analyzed and the message bit is chosen to be 1 if the correlation value is above a certain threshold value and 0 otherwise.
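The embedding rule of equation (39) and the correlation receiver described above can be sketched together as follows. Several simplifications are assumed: the list `host` stands in for the horizontal detail coefficients, so the wavelet transform itself is not shown; the spreading sequence is a 127-chip m-sequence from the assumed polynomial x^7 + x + 1; the threshold is set to zero; and all names are hypothetical.

```python
import random

def m_sequence_127():
    """127-chip m-sequence from a[n] = a[n-7] XOR a[n-6] (polynomial assumed)."""
    bits = [1] * 7
    while len(bits) < 127:
        bits.append(bits[-7] ^ bits[-6])
    return [2 * b - 1 for b in bits]

def embed(coeffs, bits, chips, beta):
    """Add +beta*m(k) for a 1 bit and -beta*m(k) for a 0 bit, one segment per bit."""
    out = list(coeffs)
    for b, bit in enumerate(bits):
        sign = 1 if bit else -1
        for k, chip in enumerate(chips):
            out[b * len(chips) + k] += sign * beta * chip
    return out

def extract(coeffs, n_bits, chips, threshold=0.0):
    """Mean-removed correlation of each segment with the chips, then thresholding."""
    n = len(chips)
    chip_mean = sum(chips) / n
    bits = []
    for b in range(n_bits):
        seg = coeffs[b * n:(b + 1) * n]
        seg_mean = sum(seg) / n
        corr = sum((s - seg_mean) * (c - chip_mean) for s, c in zip(seg, chips))
        bits.append(1 if corr > threshold else 0)
    return bits

random.seed(7)
chips = m_sequence_127()
message = [1, 0, 1, 1, 0, 0, 1, 0]
host = [random.gauss(0.0, 1.0) for _ in range(len(message) * len(chips))]
marked = embed(host, message, chips, beta=1.0)
recovered = extract(marked, len(message), chips)
```

A longer spreading sequence raises the correlation peak relative to the host coefficients, at the cost of fewer message bits per sub-band.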

5.4. Experiments and results

The image used in the experiments was the well-known Lena image of size 512x512 pixels. The message was embedded in the image with spread spectrum techniques, and before embedding the message was coded with the (15, 7) BCH code, which can correct 2 bit errors per code word. After error-correction coding the length of the message was 135 bits. After embedding the message and the template watermarks, the image was JPEG compressed with different compression ratios. The compression ratios examined here were 100, 80, 60 and 40. The images used in the testing are included at the end of this work as Appendix 3.

It was noticed that different printers give different printing qualities, and thus two printers were used in the experiments. Most of the work was done with a Hewlett-Packard Color LaserJet 5500 DTN printer, but one image was printed with a Hewlett-Packard Color LaserJet 4500 DN printer. The result of the latter printer was significantly darker than the result of the 5500 DTN printer, as can be seen in Figure 30. All the images had a physical size of 10.3cm x 10.3cm.

The scanner used was an Epson GT-15000, and every image was scanned 50 times at 300dpi and then 50 times at 150dpi and saved in uncompressed TIFF format. The image was rotated randomly between the scans, with the rotation angle varying between -45 and 45 degrees. The scanned image area, that is, how much white was left around the scanned image, was also varied.


Figure 30. Lena image printed with different printers and then scanned. a) printed with HP LaserJet 5500 DTN printer, b) printed with HP LaserJet 4500 DN printer.

The quality of the image was measured with PSNR and PSPNR values after embedding the watermarks and compressing the image with JPEG. The results are collected in Table 1. The large values in the table show that the embedding method works well and that the image quality stays high through the embedding process. It must be noted, however, that the printing process somewhat flattens the pixel values, so the watermark becomes even more difficult to perceive. This works in favour of the imperceptibility of the watermark but against its robustness.

Table 1. PSNR and PSPNR after compression

JPEG compression ratio   PSNR (dB)   PSPNR (dB)
100                      39.5        57.6
80                       37.3        49.9
60                       36.5        47.7
40                       35.9        46.0
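For reference, the PSNR of an 8-bit grey-level image can be computed as in the sketch below. The function name and the tiny example images are hypothetical, and the PSPNR, which additionally weights the error by its perceptibility, is not shown.

```python
import math

def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grey-level images."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(original, distorted)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

original = [[100, 120], [140, 160]]
distorted = [[101, 119], [141, 159]]     # every pixel is off by one grey level
value = psnr(original, distorted)
```

An error of one grey level in every pixel gives a mean squared error of 1 and thus a PSNR of about 48.1 dB, which puts the values of Table 1 into perspective.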

The images were scanned with two different scanning resolutions, but before extracting the watermarks, the image areas scanned at 300dpi were scaled to 25%. This was done to reduce the computational complexity and processing times. The results were gathered in two tables, Table 2 and Table 3.

The reliability of the extraction process is shown in Table 2. The table gives the success ratios, that is, the percentage of times the message was extracted correctly. The rows correspond to the different compression ratios applied to the images before printing, and the columns show the results for the two scanning resolutions.

Table 3 contains the average BER (Bit Error Ratio) values for the different images and is organized in the same way as Table 2. The BER was calculated from the received message before error correction by comparing the received bits to the embedded bits. Thus the BER value here represents the quality of the channel, that is, how many bit errors occur between printing and scanning. The value in parentheses indicates the average BER over the cases where the extraction failed.

Table 2. Success ratio with different compression ratios and scanning settings

JPEG compression ratio   300dpi   150dpi   averaged
(uncompressed)           86.0%    90.0%    88.0%
100                      94.0%    94.0%    94.0%
80                       92.0%    92.0%    92.0%
60                       78.0%    82.4%    80.2%
40                       62.0%    58.0%    60.0%
100 (4500 DN)            14.0%    21.6%    17.8%

Table 3. Average BER with different compression ratios and scanning settings

JPEG compression ratio   300dpi           150dpi           averaged
(uncompressed)           6.5% (33.5%)     4.5% (32.7%)     5.5%
100                      4.6% (50.2%)     3.6% (30.2%)     4.1%
80                       3.5% (20.8%)     4.0% (31.3%)     3.8%
60                       9.1% (33.2%)     6.8% (27.9%)     8.0%
40                       14.9% (29.5%)    11.4% (23.0%)    13.2%
100 (4500 DN)            19.6% (21.6%)    21.5% (22.9%)    20.6%
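The relation between the two tables can be illustrated with a short sketch. The BER is counted before error correction, whereas a message can be recovered only if no 15-bit block contains more errors than the code can correct. The sketch assumes a bounded-distance decoder that corrects up to two errors per block; the function names and the error positions are hypothetical.

```python
def bit_error_ratio(sent, received):
    """Fraction of received bits that differ from the embedded bits."""
    return sum(s != r for s, r in zip(sent, received)) / len(sent)

def decodable(sent, received, block=15, t=2):
    """True if every block has at most t bit errors (bounded-distance decoding assumed)."""
    for i in range(0, len(sent), block):
        errors = sum(s != r for s, r in zip(sent[i:i + block], received[i:i + block]))
        if errors > t:
            return False
    return True

sent = [0] * 135
spread = list(sent)
for pos in (0, 1, 20):           # two errors in the first block, one in the second
    spread[pos] = 1
burst = list(sent)
for pos in (0, 1, 2):            # three errors in the same block
    burst[pos] = 1

ber_spread = bit_error_ratio(sent, spread)
ber_burst = bit_error_ratio(sent, burst)
```

Both received messages have the same BER of 3/135, but only the first one stays within the correction capability of every block, so a low average BER does not by itself guarantee a successful extraction.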

5.5. Discussion

After combining the results in the two tables it is clear that the method is fairly robust against rotation, scale and translation attacks. The method is also robust against some JPEG compression, but more work should be done to improve the reliability of the method.

Table 2 shows that the success ratio decreases when the quality of the JPEG compression decreases. This was expected, but it was surprising how large the impact of the printer is. The importance of the printer becomes visible by comparing the row for compression ratio 100 with the last row of Table 2. The former shows the results when a JPEG compressed image with a compression ratio of 100 is printed with the HP LaserJet 5500 DTN, whereas the last row shows the results for a similar image printed with the HP LaserJet 4500 DN. The same scanner was used for both images. The results show a remarkable difference in the watermark reading reliability: 94.0% against 17.8% on average. Therefore the printer quality, and not solely the scanner quality, should be considered when designing a watermarking system.

In this method the value-adding watermark was embedded in the detail coefficients of the wavelet transform. However, JPEG compression degrades the details of an image most severely, and so the high-frequency wavelet detail coefficients may not work very well. It would be interesting to study whether embedding the watermark in the approximation coefficients instead of the detail coefficients would increase the reliability.

In the method of Keskinarkaus et al. [46], the watermark was embedded in the approximation coefficients with a method fairly similar to the one used here. In their work the robustness was tested only with rotations between -15 and 15 degrees, whereas here the rotation angle was varied between -45 and 45 degrees and the method was found robust against the rotations. The method of Keskinarkaus et al. was, however, more robust against JPEG compression: success ratios of even 100% were reported with a compression ratio of 80%.

The results in Table 3 show that JPEG compression strongly affects the BER when the compression ratio is 60% or lower. With compression ratios of 60% or higher, the number of bit errors is still manageable. If the compression ratio is lower than 60%, extracting the watermark and correcting the bit errors becomes difficult.

A comparison with previous methods shows that the method works very well. For example, the method performs better than the block-based method by He and Sun [6]. In their experiments, they obtained BER values of 15%, while in the method described here the BER values were most of the time under 5%. Not until the images were compressed beforehand with JPEG compression ratios under 60 did the BER values get worse. One reason for the large differences between the methods is that the capacity is significantly higher in the method of He and Sun, which weakens the robustness.

The exact BER values are included at the end of this work as graphs in Appendix 2. From the graphs it is possible to see that in most cases the message is not totally lost but is buried under noise. Such messages could be saved with stronger error correction coding. On the other hand, sometimes the BER approaches 0.5 and then the message cannot be read anymore. In this kind of situation the correction of the geometrical distortions has probably failed. It was noted that the bit error rate was acceptable when the JPEG compression ratio was above 60%, and so quite a lot of improvement could be achieved by using better error correcting codes. Unfortunately the capacity of the method is not high, and therefore the error correction codes should be carefully chosen in future work so that the information rate is optimal for the task.


6. PRINT-CAM RESILIENT WATERMARKING

Print-scan robustness is a good requirement to begin with in watermarking systems, but the number of applications is limited. The print-cam process would have a great deal more applications, because many people carry a camera phone in their pockets, but the attacks are more severe. The print-scan process introduces many of the attacks that are present in the print-cam process, but in a simplified form. For example, in the print-scan process the image may be translated in the horizontal and vertical directions in the scanned image, whereas in the print-cam process the distance between the image and the camera also varies. This three-dimensionality of the problem makes the extraction of the watermark more difficult than in the print-scan process, and different synchronization methods are required.

Here a frame is added around the image and a method for finding the corner points of the frame is proposed. With the corner points, the affine transformation parameters are determined to approximate and invert perspective transformations. The block diagram of the proposed system is shown in Figure 31. The method for embedding and extracting the multibit message is the same as in section 5.3. Unfortunately, the multibit message watermark is very sensitive to even small distortions, and although the rotation, scale and translation are inverted with the affine transform, the inversion is not accurate enough. Thus a more specific method is needed to find the exact amount of translation, and the same method as in section 5.2 is used here. Before extracting any of the watermarks, the barrel distortions are inverted with the Camera Calibration Toolbox [8]. Not all pictures taken with a camera contain lens distortions, but in some cameras they are so severe that their correction cannot be neglected. Lens distortions such as barrel distortions are caused by the properties of the lens, and therefore the parameters of the correction transform need to be calculated only once. The parameters can be determined beforehand with a reference image as explained in section 2.3.4.

Figure 31. Block diagram of the proposed print-cam robust method.

[Figure 31 block labels: Host image; Embed multibit message; Embed translation watermark; Embed visible frame; Taking a picture; Inverting barrel distortions; Process frame information; Determine the amount of translation; Extract the multibit message]

6.1. Frame detection method

Not much research has been done on reading watermarks with a camera phone, which is no wonder, because camera phones have been around only since the year 2000, when the Sharp Corporation announced the first camera phone. Only during the last few years have camera resolutions grown high enough for watermark detection, and the first commercial applications have been introduced, as explained in chapter 4.

The problem of reading watermarks with camera phones is somewhat similar to that of the print-scan process, but the biggest difference is the extra dimension, the effects of which need to be considered. Whereas in the two-dimensional problem a planar surface is examined from the level of the surface, in the three-dimensional problem the surface is examined from somewhere above it. In the simplest case of the three-dimensional problem, the optical axis is perpendicular to the plane and the resulting picture can be analysed in the same way as in the two-dimensional case. If the plane is tilted relative to the optical axis, reading the watermark becomes more difficult because the relative distances between the points on the surface plane and the camera have changed.

As there is no way to know the amount of tilting, the only acknowledged solution is to use an affine transformation as an approximation to invert the effects of the perspective distortion. The method used here is a modified version of that by Katayama et al. [47], where a frame was added around the image and the corner points were calculated to determine the affine transform parameters.

6.1.1 Embedding

The frame embedded here is identical to that by Katayama et al. [47]. A frame is added along the outside of the image as in Figure 32. To separate the frame from the image, the frame is added at a small distance from the border of the image. The distance is related to the width of the frame so that it is possible later to determine the exact location of the image. The colour of the frame was chosen to be blue, but it could be any other with an intensity level different enough from the background. The frame width and the width of the gap between the image and the frame were decided to be 5 pixels.

Figure 32. Framed image.


6.1.2 Extracting

In the method by Katayama et al. [47], the extraction of the frame was performed with frame detection filters and thresholds. A point was judged to be part of the frame if the result of the frame detection filter at that point was larger than a predefined threshold value. The correct threshold value varies from image to image, and even within the same image with the lighting, and therefore thresholding was not used here but a different kind of method was developed.

The beginning of this method is similar to that of Katayama et al. [47]. As in their method, the captured picture is divided with a crosswise line into upper and lower sections. It can be assumed that the watermarked image lies somewhere around the centre of the captured image, and so the crosswise line is assumed to cross the left and right sides of the frame. The frame sides can thus be found by searching along that line. At this point of the calculations, the scale of the picture and hence the width of the frame are not known. However, the width can be estimated by differentiating the pixel intensity in the crosswise direction and calculating the width from the positions of the maximum and minimum values. The process of frame detection is illustrated in Figure 33.

Figure 33. The frame is found by searching along a crosswise line and advancing up and down the found side of the frame.
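The width estimate can be sketched as follows for a short piece of the crosswise line around one side of the frame. The function name and the intensity values are hypothetical, and the sketch assumes that the strongest falling and rising edges in the examined piece belong to the frame.

```python
def frame_width(scanline):
    """Distance between the strongest falling and rising edge of the intensity profile."""
    diffs = [scanline[i + 1] - scanline[i] for i in range(len(scanline) - 1)]
    falling = diffs.index(min(diffs))    # bright background to dark frame
    rising = diffs.index(max(diffs))     # dark frame back to bright gap
    return abs(rising - falling)

# Hypothetical grey levels: bright paper, a 5-pixel dark frame, bright gap.
scanline = [200] * 10 + [50] * 5 + [200] * 10
width = frame_width(scanline)
```

In a real photograph the line also crosses the image content, so the search would have to be restricted to the region where the frame is expected.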

When the width of the frame is known, the information can be used in the frame detection filter. The frame detection filter matrix is of size 3xn, where n is two pixels more than the width of the frame. Along both sides of the matrix there are values of n-2 and the middle of the matrix is filled with -2s. For example, for a frame width of 5 pixels the frame detection filter is of the form

\[
F = \begin{bmatrix}
-2 & 5 & 5 & 5 & 5 & 5 & -2 \\
-2 & 5 & 5 & 5 & 5 & 5 & -2 \\
-2 & 5 & 5 & 5 & 5 & 5 & -2
\end{bmatrix}, \tag{40}
\]

For each point to be examined a convolution value is calculated with

\[
FrameValue = \sum_{i=1}^{3} \sum_{j=1}^{n} I_{ij} F_{ij}, \tag{41}
\]

where I is a small part of the image centred about the point to be examined. The algorithm begins from the midpoint of the left edge of the captured image. From there the calculations advance to the right searching for the left side of the frame. After the location of the left side of the frame has been found it can be traced up- and downwards to find the side of the frame, as shown with red coloured lines in Figure 33. The side can be slanted and therefore one pixel to the left and right from the current side position should also be examined instead of examining only the pixel directly above or below the current position.

Unlike in the method by Katayama et al. [47], where the corners and sides of the frame were determined with a threshold value, the approach chosen here does not require thresholding. The pixels are examined with the frame detection filter described earlier. From the three pixels examined on every row, the one with the maximum filter value is chosen to be part of the frame. At some point the calculations pass the point where the side of the frame ends, but the calculations are continued nevertheless. The values after that point are not correct, but the length of the incorrect segment is assumed to be small compared with the length of the frame side. Therefore all the frame points just calculated can be approximated with a straight line. The same procedure is repeated for all the sides of the quadrilateral frame. After the straight lines have been fitted, the corners of the quadrilateral can be approximated from the intersections of the lines.
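The tracing and line fitting can be sketched as follows. The sketch assumes that a grid of frame detection filter responses is already available and that the straight lines are fitted with ordinary least squares, which the text does not specify; all names and the test data are hypothetical.

```python
def trace_side(response, row, col):
    """Follow a frame side up and down from (row, col), allowing one pixel of drift per row."""
    points = [(row, col)]
    for step in (-1, 1):
        r, c = row + step, col
        while 0 <= r < len(response):
            candidates = [x for x in (c - 1, c, c + 1) if 0 <= x < len(response[r])]
            c = max(candidates, key=lambda x: response[r][x])   # maximum of the three
            points.append((r, c))
            r += step
    return sorted(points)

def fit_line(ts, vs):
    """Least-squares fit v = a*t + b."""
    n = len(ts)
    mt, mv = sum(ts) / n, sum(vs) / n
    a = sum((t - mt) * (v - mv) for t, v in zip(ts, vs)) / sum((t - mt) ** 2 for t in ts)
    return a, mv - a * mt

def intersect(side, edge):
    """Intersect x = a*y + b (near-vertical side) with y = c*x + d (near-horizontal edge)."""
    a, b = side
    c, d = edge
    x = (a * d + b) / (1 - a * c)
    return x, c * x + d

# Hypothetical filter responses with a slanted side at column 5 + row // 3.
response = [[1.0 if c == 5 + r // 3 else 0.0 for c in range(12)] for r in range(12)]
side_points = trace_side(response, 6, 7)
side = fit_line([r for r, _ in side_points], [c for _, c in side_points])
top = fit_line([0, 100, 200], [20, 15, 10])       # hypothetical top edge y = c*x + d
corner = intersect(side, top)
```

Writing the near-vertical sides as x = a*y + b and the near-horizontal sides as y = c*x + d avoids infinite slopes when the frame is almost upright.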

The corner estimates obtained from the intersections of the straight lines are not entirely accurate. Therefore a small area around each intersection is selected, and the exact location of the corner point is determined by correlating a small corner image with that area. The corners of the quadrilateral frame are thus found, as shown in Figure 34.

Figure 34. The found corners of the frame.


The correction of the perspective distortion is done with the following equations [47].

\[
x' = \frac{a_1 x + b_1 y + c_1}{a_0 x + b_0 y + 1}, \qquad
y' = \frac{a_2 x + b_2 y + c_2}{a_0 x + b_0 y + 1}, \tag{42}
\]

where (x, y) is the original picture position and (x', y') is the camera picture position.
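The eight parameters of equation (42) can be determined from the four corner correspondences, because multiplying both equations by the denominator gives two linear equations per corner. The sketch below solves the resulting 8x8 system with Gauss-Jordan elimination; the function names, the parameter ordering and the example values are hypothetical.

```python
def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_projective(src, dst):
    """Parameters [a1, b1, c1, a2, b2, c2, a0, b0] from four point pairs."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.append(yp)
    return solve(A, b)

def project(p, x, y):
    """Map an original picture position to a camera picture position."""
    a1, b1, c1, a2, b2, c2, a0, b0 = p
    w = a0 * x + b0 * y + 1
    return (a1 * x + b1 * y + c1) / w, (a2 * x + b2 * y + c2) / w

# Hypothetical distortion applied to the corners of a 512x512 image.
true = [0.9, 0.1, 30.0, -0.05, 1.1, 20.0, 1e-4, 2e-4]
corners = [(0, 0), (511, 0), (511, 511), (0, 511)]
photo = [project(true, x, y) for x, y in corners]
estimated = fit_projective(corners, photo)
```

With the parameters known, every pixel of the original grid can be mapped to the camera picture and sampled there, which inverts the distortion.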

The algorithm for the extraction process is as follows:
1. Find the width of the frame
   1.1. Divide the image with a crosswise line into upper and lower sections
   1.2. Differentiate the pixel intensities in the crosswise direction
   1.3. Select the width from the positions of the maximum and minimum values
2. Determine the frame detection filter
3. Locate the corner points with the frame detection filter
   3.1. Start from the middle of the left side of the image
   3.2. Find all the sides of the frame
      3.2.1. Advance to the right until a side of the frame has been found
      3.2.2. Trace the frame up and downwards and examine also the points one pixel to the left and right of the current side position
      3.2.3. Select the maximum of the three points to be part of the frame
      3.2.4. Rotate the image to find the other sides and go back to 3.2.1 until all the sides of the frame have been found
   3.3. Approximate the points with straight lines
   3.4. Calculate the intersections of the lines
4. Refine corner locations with correlation
5. Correct perspective distortions

6.2. Experiments and results

The experiments were done with the 512x512 Lena image and (15,7) BCH coded message of the length 135 bits as in the print-scan method described earlier. The message watermark was embedded in the image as in section 5.3. and because the frame cannot correct translation attack accurately enough, a template watermark was embedded in the spatial domain as in section 5.2. The frame was attached around the image to recover the image from geometrical and perspective transforms.

The research was done with four images: one uncompressed image and three JPEG compressed images, which are shown in Appendix 5. The compression ratios examined were 100, 80 and 60, and the images were printed with a Hewlett-Packard Color LaserJet 5500 DTN printer. Before printing, the PSNR and PSPNR values were calculated for each of the images. The values are gathered in Table 4, and they show that the quality of the images stayed high even after the embedding process.


Table 4. PSNR and PSPNR after embedding process

JPEG Compression Ratio    PSNR    PSPNR
(uncompressed)            39.2    59.5
100                       39.0    58.5
80                        37.1    48.6
60                        36.6    47.1
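For reference, the PSNR values of Table 4 are of the kind produced by the following sketch. It assumes the standard PSNR definition for 8-bit grayscale images, and the function name `psnr` and the list-of-rows image format are illustrative choices. PSPNR additionally relies on a just-noticeable-distortion profile (cf. [4]) and is not sketched here.

```python
import math

def psnr(original, distorted, peak=255.0):
    """PSNR in dB between two equally sized grayscale images, each given as
    a list of rows of pixel values."""
    count = 0
    squared_error = 0.0
    for row_o, row_d in zip(original, distorted):
        for o, d in zip(row_o, row_d):
            squared_error += (o - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf            # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Tiny hypothetical example: one pixel out of four changed by one level.
original = [[100, 120], [140, 160]]
watermarked = [[100, 121], [140, 160]]
value_db = psnr(original, watermarked)
```

The larger the PSNR, the smaller the mean squared difference between the original and the watermarked image.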

Every image was photographed 100 times with resolution 800x600 and 100 times with resolution 1600x1200. Before photographing, the image was pinned against a wall to keep it flat, but no special arrangements were made to prevent the camera from moving: the pictures were taken as perpendicularly as possible to the image on the wall, but freehand.

The camera phone used was a Nokia N90 with a 2 megapixel CMOS (Complementary Metal Oxide Semiconductor) camera with a focal length of 5.5 mm. This information is useful for determining the camera parameters when correcting the barrel distortions with the Camera Calibration Toolbox. The available image resolutions in the camera were 640x480, 800x600 and 1600x1200, but the lowest resolution was found to be too low for watermark extraction.

The resulting success ratios of the method are displayed in Table 5. The rows show the images with different compression ratios, whereas the columns show the results with different resolution settings of the camera. In the experiments, the images taken with resolution 1600x1200 were scaled to 25% prior to estimating the parameters. This was done to reduce the computational complexity and thus the processing time and memory consumption.

Table 5. Average success ratio with different compression ratios and capturing settings

JPEG Compression Ratio    resolution 800x600    resolution 1600x1200    averaged
(uncompressed)            75.0%                 96.0%                   85.5%
100                       90.0%                 90.0%                   90.0%
80                        82.0%                 92.0%                   87.0%
60                        31.0%                 69.0%                   50.5%

Table 6 shows the average BER values of the experiments. The arrangement of the rows is similar to that in the previous table: the rows show BER values for images with various compression ratios whereas the columns show results for different resolution values. The BER values for the table were calculated before error correction because calculations were done to examine how many errors would be expected in the process, not how well the error correction coding performs. The value in parentheses indicates the average BER when the extraction process was not successful.


Table 6. Average BER with different compression ratios and capturing settings

JPEG Compression Ratio    resolution 800x600    resolution 1600x1200    averaged
(uncompressed)            5.6% (11.3%)          3.1% (20.0%)            4.4%
100                       4.6% (15.8%)          3.1% (9.3%)             3.9%
80                        4.8% (8.9%)           3.6% (5.5%)             4.2%
60                        10.7% (11.8%)         6.5% (11.9%)            8.6%
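The BER values in Table 6 are measured on the coded bits before error correction. A minimal sketch of such a measurement is given below. The function name `bit_error_rate` and the example bit pattern are hypothetical, and only the length of 135 coded bits is taken from the experiments.

```python
def bit_error_rate(sent, received):
    """Fraction of coded bits that differ between the embedded and the
    extracted sequence, measured before any error correction."""
    if len(sent) != len(received) or not sent:
        raise ValueError("bit sequences must be non-empty and of equal length")
    errors = sum(1 for s, r in zip(sent, received) if s != r)
    return errors / len(sent)

# Hypothetical example: 135 coded bits, five of which are flipped.
embedded = [i % 2 for i in range(135)]
extracted = embedded[:]
for position in (3, 40, 77, 90, 134):
    extracted[position] ^= 1
ber = bit_error_rate(embedded, extracted)
```

In this example the BER is 5/135, or about 3.7%, which is of the same order as the averages reported for the 1600x1200 resolution.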

6.3. Discussion

The results in Table 5 show that the method is very promising and works well in the test case. Unfortunately, the method is fairly sensitive to distortions, and some restrictions were necessary to make it work: for example, the picture was taken as perpendicularly as possible to the watermarked image. This is due to the wavelet domain multibit message watermark, which requires nearly perfect correction of geometrical distortions. The multibit watermark is especially fragile against tilting of the optical axis. If the optical axis is tilted, some parts of the image appear closer than others. In the camera image, the parts that are close are captured with high resolution, but the parts that are further away are captured with lower resolution. In some cases the resolution is too low, the correction algorithm cannot correct the distortions accurately enough, and the message is not extracted correctly. The multibit message watermark in the wavelet domain requires high resolution, so it is destroyed if the tilt of the optical axis is too large.

The choice of resolution at which the picture is taken is important. The success ratios were better at the resolution of 1600x1200 in all cases except the compression ratio of 100, where both resolutions gave 90%. From the results it can be deduced that a 2 megapixel camera seems to be sufficient for reading a watermark correctly from a printed image, although a higher resolution would obviously be better. This is not a problem, as the cameras in mobile phones evolve rapidly, and even while this work is being documented phones with better camera capabilities are being released.

Even with this camera, the BER values of the method are acceptable. The values in Table 6 suggest that the method would work better with stronger error correction coding. The values in parentheses are the BER when the method was not successful, that is, when the error correction failed. The values are evidently below 0.5, which is the limit around which error correction is not possible with any coding technique. More detailed information about the calculated BER values is included in Appendix 4. The figures show that the BER values usually stay at the same low level, but now and then there are peaks to the level of 0.5. This indicates that the extraction of the watermark has failed completely. In these cases it can be assumed that the synchronization process has somehow failed.

Besides the tilting of the optical axis, one possible reason why the message is too erroneous to be read correctly is the compression of the image beforehand. Table 5 shows that the method is robust against slight JPEG compression but deteriorates rapidly as the amount of compression increases, that is, as the compression ratio value decreases. This was expected, but even though JPEG compression worsens the results, the success ratio is still above 90% at a compression ratio of 80 when the resolution of 1600x1200 is used.
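The role of the error correction coding can be made concrete with a small calculation. The sketch below assumes that bit errors are independent and that the decoder corrects at most two errors per (15,7) BCH codeword, which is the correction capability of that code. The independence assumption is a simplification introduced here: the BER peaks caused by failed synchronization are not independent errors, so the numbers are indicative only and are not expected to reproduce Table 5. All function names are illustrative.

```python
from math import comb

N, K, T = 15, 7, 2      # (15,7) BCH code, corrects up to T = 2 errors per codeword
CODED_BITS = 135        # length of the coded message used in the experiments

def message_bits(coded_bits=CODED_BITS, n=N, k=K):
    """Number of message bits carried by the coded sequence."""
    blocks, rest = divmod(coded_bits, n)
    if rest:
        raise ValueError("coded length must be a multiple of the codeword length")
    return blocks * k

def block_success(p, n=N, t=T):
    """Probability that a codeword has at most t bit errors when every bit
    is flipped independently with probability p (assumed error model)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(t + 1))

def message_success(p, coded_bits=CODED_BITS, n=N, t=T):
    """Probability that every codeword of the message is decodable."""
    return block_success(p, n, t) ** (coded_bits // n)

payload = message_bits()        # 9 codewords x 7 bits
low_ber = message_success(0.031)
high_ber = message_success(0.107)
```

The payload evaluates to 63 message bits. Under the assumed error model the probability of decoding all nine codewords drops quickly as the BER grows, which is in line with the observation that stronger error correction coding would improve the success ratios.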

It must be noted, however, that the compression ratios reported here tell only the amount of compression applied to the image before printing. More compression occurs when the picture is taken with the camera phone, which compresses the image before saving it to memory. From this point of view, the watermarking method is even more robust against JPEG compression than the reported ratios indicate.

Comparing the method developed here with other similar methods is difficult because only a few watermarking methods have been proposed for camera phones. The method by Nakamura et al. [37] also used a frame around the watermarked image to correct perspective distortions, but the way they handled the results was different from the one used here. They reported success ratios as high as 100% when the picture was taken straight above the watermarked image, but this value cannot be compared directly with the method proposed here because they used much stronger error correction coding and different camera phones. Also, the capacity of their method was lower and a smaller message was embedded in the image: only 16 message bits were embedded against the 63 bits embedded here. This, too, enhanced the robustness in their favour, and it is claimed that the method proposed here would compete well with theirs under similar settings.

The experiments here were done in a noise-free environment and many distortions were neglected. For example, the impact of lighting was not considered, and the light around the image was stable throughout the experiments. In the future it is important to do research under different and variable lighting conditions, as reflections of light from the image will affect the extraction of the watermark. Another difficult issue to be examined is the distortions around the image. In real life, the image is rarely placed alone on the page; often it is surrounded by other images and text. The method must be improved to handle these kinds of distortions. Nevertheless, the method is promising and a great deal of information was gained in the research for future use.


7. DISCUSSION

No one knows what the future brings, but we can always make good guesses. Even now, more and more content moves wirelessly between portable devices: cell phones, laptops, PDAs and so on. A growing number of people carry cell phones in their pockets along with mp3 players and digital cameras, but even the boundaries between devices are disappearing. A cell phone may now contain a media player and a video camera and still be available to consumers at a reasonable price. As the properties of devices blend together into one device, so do different media formats. With watermarking, music files can be included in image files, and links to websites can be embedded in both of them.

In this work, two watermarking methods were proposed for value-adding watermarking. The first method was robust against print-scan attacks, and this robustness was considered a prerequisite for the second, print-cam method.

The print-scan process works in a two-dimensional world in which the user must own a scanner to be able to read the watermark. From the user's point of view it would be easier if the watermark could be read anytime and anywhere by taking a picture of the watermarked image with a camera phone.

A motivation for this work was the lack of publications discussing the reading of watermarks with digital cameras or camera phones. Only a few papers were found, and almost all of them were developed for commercial purposes. This indicates that there is a demand for print-cam robust watermarking systems.

The methods presented here were fairly similar in spite of the different environments they were required to work in. In the print-scan robust method, the focus was on inverting some geometrical distortions, that is, rotation, scale and translation. Along with that, the print-cam robust method focused on correcting the effects of perspective distortions. In both of the methods, multiple watermarking methods were employed, where the multibit watermark was embedded in the wavelet domain and one or two template watermarks were embedded to recover from geometrical distortions. The parameters for inverting translation were determined with the same watermark in both methods, but the rotation and scale were calculated with different kinds of watermarks: with a template in Fourier domain in the print-scan robust method and with a visible frame in the print-cam method. Both methods seemed to work very well, and, with a stronger error correction coding, the results would have been even better.

Compression algorithms play a major role in storing media for future use. Right now the most popular image compression format is JPEG, and every watermarking system should be robust against it. Here it has been shown that both of the methods are robust against JPEG compression with a compression ratio of 60. Compression ratios lower than that are rarely used, because below a compression ratio of 60 the compression starts to decrease the quality of the image.

In future work, the reliability of the methods will be improved and new synchronization methods will be developed. The focus will move to print-cam robust methods, and print-scan robustness will be only the first step towards print-cam robust systems. The next generation of camera phones will be released soon and the resolutions of the cameras will increase. Soon the quality of camera phones will exceed the quality of present digital cameras, and thus research in the near future will be done with digital cameras instead of existing camera phones.


8. CONCLUSION

The aim of this work was to find a method to read a watermark from a printed image with a camera phone. As a prerequisite, a print-scan robust watermarking method was developed and examined. Based on the results achieved with the print-scan robust watermarking method, a print-cam robust method was proposed. In both methods multiple watermarking was applied successfully and the results obtained were promising.

In the print-scan robust method, three watermarks were embedded in the image: two template watermarks were embedded in order to recover the watermark from rotation, scale and translation attacks, and the third watermark was the multibit watermark containing the actual message. One of the template watermarks was a pseudorandom sequence embedded in the magnitudes of the Fourier domain in the form of a circle around the centre of the magnitude coefficients. With this watermark, the rotation and scaling of the image that occurred in the scanning process could be inverted. Since the magnitudes of the Fourier domain are invariant to translation, a second template watermark was required and embedded in the spatial domain. Finally, the multibit message watermark was embedded in the wavelet domain.

In the print-cam robust method, two invisible watermarks were embedded in the image and a visible frame was added around the image. The visible frame was necessary so that the perspective distortions could be approximated and inverted with affine transformation. The two invisible watermarks were the same as in the print-scan robust method. One was a template watermark, embedded in order to recover the watermark from translation attack, whereas the other was the multibit message watermark.

The methods were tested by capturing a watermarked image multiple times with a scanner and a camera phone. The success ratios and BER values were calculated for both methods at various resolution levels. In the print-scan robust method, the resolution level made no significant difference, but in the print-cam robust method the difference was obvious: the higher resolution level gave better results regardless of the compression level of the image used.

The methods were also tested by compressing the test image beforehand with different JPEG compression ratios. The results were as expected: the success ratio decreased and the BER increased as the compression ratio decreased. However, the results were acceptable until the compression ratio went below 60, and thus it can be concluded that both methods are robust against JPEG compression. The success ratios do not reach 100%, but with better error correction coding that value could be approached.

Future work includes improving the print-cam method and moving to digital cameras instead of camera phones. This is due to the fact that the quality of cameras in cell phones is improving and will soon reach that of modern digital cameras.


9. REFERENCES

[1] Cox, I.J., Miller, M.L. & Bloom, J.A. (2002) Digital watermarking. Morgan Kaufmann Publishers, Academic Press, USA, 542 p. ISBN 1-55860-714-5
[2] Hanjalic, A., Langelaar, G.C., van Roosmalen, P.M.B., Biemond, J. & Lagendijk, R.L. (2000) Image and Video Databases: Restoration, Watermarking and Retrieval. Elsevier Science B.V., Amsterdam, Netherlands, 445 p. ISBN 0-444-50502-4
[3] Mäkelä, K. (2000) Digital Watermarking and Steganography. Diploma Thesis. University of Oulu, Department of Electrical Engineering, Oulu, Finland.
[4] Chou, C-H. & Li, Y-C. (1995) A Perceptually Tuned Subband Image Coder Based on the Measure of Just-Noticeable-Distortion Profile. In: IEEE Transactions on Circuits and Systems for Video Technology, Dec 1995, Vol. 5, Issue 6, pp. 467-476.
[5] Perry, B., MacIntosh, B. & Cushman, D. (2002) Digimarc MediaBridge – The birth of a consumer product, from concept to commercial application. In: Proceedings of SPIE Security and Watermarking of Multimedia Contents IV, Jan 21-24, San Jose, California, USA, Vol. 4675, pp. 118-123.
[6] He, D. & Sun, Q. (2005) A Practical Print-scan Resilient Watermarking Scheme. In: IEEE International Conference on Image Processing (ICIP), Sept. 11-14, 2005, Vol. 1, pp. I-257-60.
[7] Solanki, K., Madhow, U., Manjunath, B.S. & Chandrasekaran, S. (2004) Estimating and Undoing Rotation for Print-scan Resilient Data Hiding. In: IEEE International Conference on Image Processing (ICIP), Oct. 24-26, Vol. 1, pp. 39-42.
[8] Camera Calibration Toolbox for Generic Lenses. (read 27.8.2006) URL: http://www.ee.oulu.fi/mvg/mvg.php?page=calibration. Matlab® version 6.5 or later with the Image Processing Toolbox and Optimization Toolbox is required.
[9] Kannala, J. & Brandt, S.S. (2006) A Generic Camera Model and Calibration Method for Conventional, Wide-Angle, and Fish-Eye Lenses. In: IEEE Transactions on Pattern Analysis and Machine Intelligence, Aug 2006, Vol. 28, No. 8, pp. 1335-1340.
[10] Pereira, S. & Pun, T. (2000) Robust Template Matching for Affine Resistant Image Watermarks. In: IEEE Transactions on Image Processing, June 2000, Vol. 9, Issue 6, pp. 1123-1129.
[11] O’Ruanaidh, J.J.K. & Pun, T. (1997) Rotation, scale and translation invariant digital image watermarking. In: IEEE Proceedings of International Conference on Image Processing, Oct. 26-29, 1997, Santa Barbara, California, USA, Vol. 1, pp. 536-539.
[12] Angel, E. (2006) Interactive Computer Graphics: a top-down approach with OpenGL, 4th edition. Addison Wesley, Pearson Education Inc., USA. 784 p. ISBN 0-321-32137-5
[13] JPEG Home Page (read 27.6.2006) URL: http://www.jpeg.org/jpeg/index.html
[14] Rao, K.R. & Hwang, J.J. (1996) Techniques and Standards for Image, Video and Audio Coding. Prentice Hall PTR, New Jersey, USA. 563 p.
[15] ISO/IEC JTC1 10918-1 | ITU-T Recommendation T.81. (1992) Information technology - Digital Compression and Coding of Continuous-tone Still Images: Requirements and Guidelines. Terminal Equipment and Protocols for Telematic Services. CCITT.
[16] Johnson, R.C. (read 27.6.2006) JPEG2000 wavelet compression spec approved. URL: http://www.eetimes.com/story/OEG19991228S0028 EETimes, Dec. 28, 1999
[17] Smith, J.R. & Comiskey, B.O. (1996) Modulation and Information Hiding in Images. In: Proceedings of the First Information Hiding Workshop, May 30 - June 1, Isaac Newton Institute, University of Cambridge, UK, Vol. 1174, pp. 207-226.
[18] Kostopoulos, V., Skodras, A.N. & Christodoulakis, D. (2000) Digital Image Watermarking: On the Enhancement of Detector Capabilities. In: Proceedings of Fifth International Conference on Mathematics in Signal Processing, Warwick, Dec. 18-20, 2000.
[19] Kutter, M. (1998) Watermarking resisting to translation, rotation and scaling. In: Proceedings of SPIE Multimedia Systems and Applications, Boston, MA 1998, Vol. 3528, pp. 423-431.
[20] Deguillaume, F., Voloshynovskiy, S. & Pun, T. (2002) Method for the Estimation and Recovering from General Affine Transforms. In: Proceedings of SPIE, Electronic Imaging 2002, Security and Watermarking of Multimedia Contents IV, Vol. 4675, pp. 313-322.
[21] Joseph Fourier (read 2.8.2006) Wikipedia, the free encyclopedia. URL: http://en.wikipedia.org/wiki/Joseph_Fourier
[22] Castleman, K.R. (1996) Digital Image Processing. Prentice-Hall, New Jersey, USA, 1996. 667 p. ISBN 0-13-211467-4
[23] Lee, J.-S. & Kim, W.-Y. (2004) A Robust Image Watermarking Scheme to Geometrical Attacks for Embedment of Multibit Information. In: Proceedings of Advances in Multimedia Information Processing - PCM 2004: 5th Pacific Rim Conference on Multimedia, Nov 30 - Dec 3, Tokyo, Japan, Part 3, pp. 348-355.
[24] Hartung, F. & Kutter, M. (1999) Multimedia Watermarking Techniques. In: Proceedings of the IEEE, Jul, Vol. 87, Issue, pp. 1079-1107.
[25] Digitaalinen kuvankäsittely (2005) Lecture notes based on the book: Gonzalez, R.C., Woods, R.E.: Digital Image Processing, Prentice Hall, 2002. 793 p. ISBN: 0-20-118075-8
[26] Meerwald, P. & Uhl, A. (2001) A survey of wavelet-domain watermarking algorithms. In: Proceedings of SPIE, Electronic Imaging, Security and Watermarking of Multimedia Contents, Jan 20-26, San Jose, California, USA, Vol. 4314, pp. 505-516.
[27] Barni, M., Bartolini, F., Capellini, V., Lippi, A. & Piva, A. (1999) A DWT-based technique for spatio-frequency masking of digital signatures. In: Proceedings of the SPIE/IS&T International Conference on Security and Watermarking of Multimedia Contents, Jan 25-25, San Jose, California, USA, Vol. 3657, pp. 31-39.
[28] Gilani, S.A.M. & Skodras, A.N. (2001) Watermarking By Multiresolution Hadamard Transform. In: Proceedings of the European Conference on Electronic Imaging & Visual Arts (EVA2001), Mar 26-30, Florence, Italy, pp. 73-77.
[29] Fotopoulos, V., Krommydas, S. & Skodras, A.N. (2001) Gabor Transform Domain Watermarking. In: Proceedings of International Conference on Image Processing, Oct. 7-10, Vol. 2, pp. 510-513.
[30] Kang, S. & Aoki, Y. (1999) Image Data Embedding System for Watermarking Using Fresnel Transform. In: IEEE International Conference on Multimedia Computing and Systems, Jun. 7-11, Vol. 1, pp. 885-889.
[31] Bailey, D.H. & Swarztrauber, P.N. (1991) The Fractional Fourier Transform and Applications. In: SIAM Review, Vol. 33, Issue 3, pp. 389-404.
[32] Lähetkangas, E. (2005) Digitaalisen kuvan vesileimaus. Diploma Thesis, University of Oulu, Department of Electrical and Information Engineering, Finland.
[33] Sheppard, N.P., Safavi-Naini, R. & Ogunbona, P. (2001) On Multiple Watermarking. In: Workshop on Security and Multimedia at ACM Multimedia 2001, Ottawa, Canada, pp. 3-6.
[34] Digimarc (read 4.10.2006) Digimarc Mobile E-Commerce Pilot Debuts at Popular Tokyo-based “Maid in Japan” Café. Press Release on 25.7.2006. URL: http://www.digimarc.com/media/release.asp?newsID=478
[35] Marek, S. (read 9.5.2006) Camera Phones Click With Marketers. URL: http://www.wirelessweek.com/article/CA457747.html?text=mobot&stt=001 Wireless Week, Oct. 1, 2004
[36] NTT Corporation (read 6.6.2006) NTT Develops "CyberSquash" Internet Access Platform using Electronic Watermarks. URL: http://www.ntt.co.jp/news/news03e/0307/030707.html News release on 7.7.2003
[37] Nakamura, T., Katayama, A., Yamamuro, M. & Sonehara, N. (2004) Fast Watermark Detection Scheme for Camera-equipped Cellular Phone. In: ACM International Conference Proceeding Series, Vol. 83. Proceedings of the 3rd International Conference on Mobile and Ubiquitous Multimedia, College Park, Maryland. URL: http://portal.acm.org/citation.cfm?id=1052395
[38] Brown, S.A. (read 8.6.2006) A History of the Bar Code. URL: http://eh.net/encyclopedia/article/brown.bar_code EH.net Encyclopedia.
[39] Seideman, T. (read 8.6.2006) The history of barcodes. URL: http://www.basics.ie/History.htm. Article appeared in American Heritage of Invention and Technology, Forbes Publication
[40] TEC-IT, Team for Engineering and Consulting in Information Technologies, Bar Code Online Demo, TBarCode OCX. (Used 19.7.2006) URL: http://www.tec-it.com/asp/main/startfr.asp?mainmenu=Software&sbmenu=Online&redirect=demo/playground.asp&LN=1
[41] Pavlidis, T., Swartz, J. & Wang, Y.P. (1990) Fundamentals of bar code information theory. In: IEEE Computer Society, Apr., Vol. 23, Issue 4, pp. 74-86
[42] The 2D Data Matrix barcode (2005) In: Computing & Control Engineering Journal, Dec. 2005-Jan. 2006, Vol. 16, Issue 6, pp. 39.
[43] Reiter (read 20.7.2006) Reiter’s Camera Phone Report. URL: http://www.wirelessmoment.com/barcodes_and_scanning_camera_phones/index.html
[44] Takeuchi, S., Kunisa, A., Tsujita, K. & Inoue, Y. (2005) Geometric Distortion Compensation of Printed Images Containing Imperceptible Watermarks. In: International Conference on Consumer Electronics, 2005. ICCE. 2005 Digest of Technical Papers, 8-12 Jan. 2005, pp. 411-412.
[45] Peterson, W.W. & Weldon, E.J. (1972) Error-correcting codes, 2nd edition. Colonial Press, Inc., USA. 560 p. ISBN 0-262-16-039-0
[46] Keskinarkaus, A., Pramila, A., Seppänen, T. & Sauvola, J. (2006) A Wavelet Domain Print-scan and JPEG Resilient Data Hiding Method. In: Proceedings of 5th International Workshop on Digital Watermarking, Lecture Notes on Computer Science, Nov. 8-11, Jeju Island, Korea, Vol. 4283, pp. 82-95
[47] Katayama, A., Nakamura, T., Yamamuro, M. & Sonehara, N. (2004) New High-speed Frame Detection Method: Side Trace Algorithm (STA) for i-appli on Cellular Phones to Detect Watermarks. In: Proceedings of the 3rd International Conference on Mobile and Ubiquitous Multimedia, ACM International Conference Proceeding Series, Oct 27-29, 2004, Vol. 83, pp. 109-116


10. APPENDIXES

Appendix 1 Homogeneous coordinates
Appendix 2 BER figures for the print-scan method
Appendix 3 Images used in the print-scan process for testing
Appendix 4 BER figures for the print-cam method
Appendix 5 Images used in the print-cam process for testing

Appendix 1 Homogeneous coordinates

HOMOGENEOUS COORDINATES [12]

Homogeneous coordinates are generally used in the field of computer graphics to simplify transformations and projections. In affine geometry they have a special role because every affine transform can be represented as a matrix multiplication. In a homogeneous coordinate system the transformation of an n-dimensional vector is done in (n+1)-dimensional space.

Let us consider a point $P$ located in three-dimensional space defined by a point $P_0$ and the vectors $\nu_1$, $\nu_2$ and $\nu_3$. Usually the point $P$ located at $(x, y, z)$ is represented with a column matrix

\[ \mathbf{p} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \tag{1} \]

where $x$, $y$ and $z$ are the components of $P$ along the basis vectors, so that

\[ P = P_0 + x\nu_1 + y\nu_2 + z\nu_3. \tag{2} \]

However, this representation is not very good, because it can be confused with a representation of a vector

\[ W = x\nu_1 + y\nu_2 + z\nu_3, \tag{3} \]

which does not have any starting point and can be placed anywhere in the space. Homogeneous coordinates offer a solution for this problem by introducing an extra dimension. Then, point P can be written uniquely as

\[ \mathbf{p} = \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}, \tag{4} \]

because from equation (2)

\[ P = \begin{bmatrix} x & y & z & 1 \end{bmatrix} \begin{bmatrix} \nu_1 \\ \nu_2 \\ \nu_3 \\ P_0 \end{bmatrix}. \tag{5} \]

Similarly, the vector w can be written in column matrix representation as


\[ \mathbf{w} = \begin{bmatrix} x \\ y \\ z \\ 0 \end{bmatrix}, \tag{6} \]

and

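\[ W = x\nu_1 + y\nu_2 + z\nu_3 = \begin{bmatrix} x & y & z & 0 \end{bmatrix} \begin{bmatrix} \nu_1 \\ \nu_2 \\ \nu_3 \\ P_0 \end{bmatrix}. \tag{7} \]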

We can see that the representations of a point and a vector are now different and cannot be confused anymore. By the same derivation, an arbitrary transformation matrix

\[ \mathbf{s} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}, \tag{8} \]

can be represented in homogeneous coordinates as

\[ \mathbf{s} = \begin{bmatrix} a & b & c & 0 \\ d & e & f & 0 \\ g & h & i & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \tag{9} \]

Although the transformations are now done in four dimensional space to solve a three dimensional case when the homogeneous coordinates are used, less arithmetic work is required. The uniform representation of affine transformations makes composing of successive transformations far easier and faster than in three dimensions. In addition, modern computers are able to use parallelism to speed up homogeneous coordinate operations.
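The difference between points and vectors, and the ease of composing transformations, can be illustrated with a small sketch. The helper names (`matmul`, `apply`, `embed`, `translation`) and the numeric values are illustrative assumptions. `embed` follows equation (9), and the translation matrix is the usual homogeneous form, which is not written out in this appendix.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, column):
    """Multiply a matrix with a homogeneous column given as a flat list."""
    return [sum(m[i][k] * column[k] for k in range(len(column)))
            for i in range(len(m))]

def embed(s):
    """Extend a 3x3 transformation matrix to homogeneous form, equation (9)."""
    return [row[:] + [0] for row in s] + [[0, 0, 0, 1]]

def translation(tx, ty, tz):
    """Translation expressed as a 4x4 matrix multiplication."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

point = [2, 3, 4, 1]     # last component 1: a point, equation (4)
vector = [2, 3, 4, 0]    # last component 0: a vector, equation (6)

scale = embed([[2, 0, 0], [0, 2, 0], [0, 0, 2]])
move = translation(10, 0, 0)
combined = matmul(move, scale)   # scale first, then translate

moved_point = apply(combined, point)     # [14, 6, 8, 1]
moved_vector = apply(combined, vector)   # [4, 6, 8, 0]
```

The point is both scaled and translated, whereas the vector is only scaled, because its last component is zero. The two successive transformations are applied with a single matrix.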

Appendix 2 BER figures for print-scan method

The BER figures presented here are extracted from the tests of the print-scan method proposed earlier. In every figure, the y-axis shows the BER value and the x-axis gives the number of the image examined. CR means JPEG compression ratio, and the last number in the figure caption gives the resolution used in scanning.

Figure A.2.1 BER (CR=uncomp, 300dpi).
Figure A.2.2 BER (CR=uncomp, 150dpi).
Figure A.2.3 BER (CR=100, 300dpi).
Figure A.2.4 BER (CR=100, 150dpi).





Figure A.2.9 BER (CR=100, 300dpi) (HP LaserJet 4500 DN).
Figure A.2.10 BER (CR=100, 150dpi) (HP LaserJet 4500 DN).

Appendix 3 Images used in print-scan process for testing

Figure A.3.1 Original image before watermarking.

Figure A.3.2 Image after watermark embedding and JPEG compression with compression ratio of 100.

Appendix 4 BER in print-cam method

The BER figures presented here are extracted from the tests of the print-cam method proposed earlier. In every figure, the y-axis shows the BER value and the x-axis gives the number of the image examined. CR means JPEG compression ratio, and the last number in the figure caption gives the resolution used when the picture was taken.

Figure A.4.1 BER (uncomp., 800x600).
Figure A.4.2 BER (uncomp., 1600x1200).
Figure A.4.3 BER (CR=100, 800x600).
Figure A.4.4 BER (CR=100, 1600x1200).




Appendix 5 Images used in print-cam process for testing

Figure A.5.1 Original image before watermarking.

Figure A.5.2 Image after watermark embedding before JPEG compression.


Figure A.3.3 Image after watermark embedding (JPEG CR = 100).

