
Secure Data Hiding in Wavelet Compressed Fingerprint Images

Nalini K. Ratha, IBM T. J. Watson Research
30 Saw Mill River Road, Hawthorne, NY 10532
ratha@us.ibm.com

Jonathan H. Connell, IBM T. J. Watson Research
30 Saw Mill River Road, Hawthorne, NY 10532
jconnell@us.ibm.com

Ruud M. Bolle, IBM T. J. Watson Research
30 Saw Mill River Road, Hawthorne, NY 10532
bolle@us.ibm.com

ABSTRACT

With the rapid growth of the Internet, electronic commerce revenue now amounts to several billion US dollars. To avoid fraud and misuse, buyers and sellers desire more secure methods of authentication than today's userid and password combinations. Automated biometrics technology in general, and fingerprints in particular, provide an accurate and reliable authentication method. However, fingerprint-based authentication requires accessing fingerprint images scanned remotely at the user's workstation, a potentially weak point in the security system. Stored or synthetic fingerprint images might be fraudulently transmitted, even if the communication channel itself is encrypted. In this paper we describe an algorithm for secure data hiding in wavelet compressed fingerprint images to alleviate this problem. Assuming the image capture device is secure, only the decompressor on the server can locate the embedded message and thereby validate the submitted image.

Keywords Authentication, biometrics, fingerprints, WSQ compression, watermarking, data hiding

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM Multimedia Workshop Marina Del Rey CA USA Copyright ACM 2000 1-58113-311-1/00/11 ...$5.00

1. INTRODUCTION

The past few years have seen an explosive growth of B2C (business-to-customer) activities over the Internet. The total dollar value of these web-based transactions is now over several billion US dollars. At present, buyers are authenticated by service providers using at most a combination of userid and password. The critical information about the transaction, such as the credit card number and the amount, is sent over the web using secure encryption methods. However, current systems are not capable of assuring that the transaction was initiated by the rightful owner of the credit card. As Internet revenues grow, credit card owners and credit card issuers are likely to be increasingly concerned with the reliability and security of transactions.

One way this can be enhanced is with the help of automated biometric authentication. Biometrics is a rapidly expanding area and focuses on identifying people based on innate physiological or behavioral characteristics. Examples of biometrics include fingerprint, face, iris and voice. All automated biometrics-based person authentication systems operate by first acquiring a biometrics signal from the user, either locally or remotely. The signal is then analyzed to extract invariant features, and finally matched against a previously registered template.

Fingerprint-based authentication systems are the most advanced and accepted of the biometrics technologies. They have been used for more than a century in law-enforcement agencies and have been progressively automated over the last three decades. Recent developments in fingerprint sensing technology that allow a fingerprint to be acquired without using the traditional ink and paper method have enabled the use of fingerprints in many non-criminal applications. As these sensors become cheaper, fingerprints will be an obvious choice for authentication in wide-ranging applications because of their mature technology and legal standing.

However, in both Web-based and other on-line transaction processing systems, it is undesirable to send uncompressed fingerprint images to the server. A typical fingerprint image is on the order of 512 x 512 pixels with 256 gray levels, resulting in an image size of 256 Kbytes. Unfortunately, many standard image compression methods have a tendency to distort the high-frequency spatial structural ridge features of a fingerprint image. This has led to several research proposals regarding domain-specific compression methods. As a result, an open wavelet-based image compression scheme (WSQ) proposed by the FBI [1] has become the de facto standard in the industry because of its low distortion even at very high compression ratios.

Typically, the compressed image is transmitted over a standard encrypted channel as a replacement for (or in addition to) the user's PIN. Yet because of the open compression standard, transmitting a WSQ compressed image over the Internet is not particularly secure. If a compressed fingerprint image bit-stream can be freely intercepted (and decrypted), it can be decompressed using readily available software. This potentially allows the signal to be saved and fraudulently reused.

One way to enhance security is to use data-hiding techniques to embed additional information directly in compressed fingerprint images. For instance, assuming that the embedding algorithm remains inviolate, the service provider can look for the appropriate watermark to check that the submitted image was indeed generated by a trusted machine. Several techniques have been proposed in the literature for hiding digital watermarks in images. Bender et al. [4] and Swanson et al. [7] have presented excellent surveys of data-hiding techniques. Petitcolas et al. [8] provide a nice survey and taxonomy of information hiding techniques. Hsu and Wu [3] describe a method for hiding watermarks in JPEG compressed images. Most of the research, however, addresses issues involved in resolving piracy or copyright issues, not authentication.

Our approach is motivated by the desire to create on-line fingerprint authentication systems for commercial transactions that are secure against replay attacks in particular. To achieve this, the service provider would issue a different verification string for each transaction. The string would be mixed in with the fingerprint image before transmission. When the provider receives the image back he can then decompress and check for the presence of the correct verification string. This guards against resubmission of stored images. The method proposed here hides such messages with minimal impact on the decompressed appearance of the image. Moreover, the message is not hidden in a fixed location (which would make it more vulnerable to discovery) but is, instead, deposited in different places based on the structure of the image itself. Although our approach is presented in the framework of fingerprint image compression, it can be easily extended to other biometrics.
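The per-transaction verification string described above can be illustrated with a minimal sketch of the provider-side bookkeeping. Everything named here (the `VerificationServer` class, its methods, the 16-byte string length) is a hypothetical choice made for illustration; the paper does not specify how strings are generated or stored.

```python
import hmac
import secrets


class VerificationServer:
    """Toy model of the provider's per-transaction check (hypothetical API)."""

    def __init__(self):
        # transaction id -> verification string that has not been used yet
        self._pending = {}

    def issue(self, transaction_id: str) -> bytes:
        # A fresh random string per transaction; 16 bytes is an arbitrary choice.
        token = secrets.token_bytes(16)
        self._pending[transaction_id] = token
        return token

    def verify(self, transaction_id: str, extracted: bytes) -> bool:
        # pop() makes every string single-use, so an image recorded during an
        # earlier transaction fails when it is submitted again.
        expected = self._pending.pop(transaction_id, None)
        if expected is None:
            return False
        return hmac.compare_digest(expected, extracted)
```

Here `extracted` stands for the message the server-side decoder recovers from the submitted image. Discarding each string after one check is what makes a stored image useless for a later transaction.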

The following sections detail the original WSQ compression scheme and our message embedding extensions. We describe the FBI standard WSQ fingerprint compression algorithm in Section 2. Our data-hiding algorithm is presented in Section 3. Section 4 gives results of our algorithm. Conclusions and future work are presented in Section 5.

2. WSQ FINGERPRINT COMPRESSION

Here we give a short review of the FBI standard WSQ fingerprint compression. More details are available in [1]. Block diagrams of the WSQ encoder and decoder are shown in Fig. 1(a).

In the first step, the input image is decomposed into 64 spatial frequency subbands using perfect reconstruction multirate filter banks. The filters are implemented as a pair of separable 1D filters. The two filters specified for Encoder 1 of the FBI standard are plotted in Fig. 1(b) and (c). The sub-bands are the filter outputs obtained after a desired level of cascading of the filters as described in the standard (see Fig. 2(b)). For example, sub-band 25 corresponds to the cascading path of '00,10,00,11' through the filter bank. The first digit in each binary pair represents the row operation index and the second digit the column operation index. A zero specifies lowpass filtering on the row (column) while a one specifies highpass filtering on the row (column).
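As a small illustration of the path notation, the sketch below decodes a cascading path into the filter applied at each level. The helper name `parse_subband_path` is hypothetical, and reading the second digit of each pair as the column operation is an assumption based on the row/column wording above.

```python
def parse_subband_path(path: str):
    """Turn a path such as '00,10,00,11' into per-level (row, column) filters."""
    names = {"0": "lowpass", "1": "highpass"}
    steps = []
    for pair in path.split(","):
        pair = pair.strip()
        if len(pair) != 2 or any(digit not in names for digit in pair):
            raise ValueError(f"expected a pair of binary digits, got {pair!r}")
        row_digit, col_digit = pair  # first digit: rows; second: columns (assumed)
        steps.append((names[row_digit], names[col_digit]))
    return steps


# Path quoted in the text for sub-band 25.
for level, (rows, cols) in enumerate(parse_subband_path("00,10,00,11"), start=1):
    print(f"level {level}: {rows} on rows, {cols} on columns")
```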

An interesting aspect of the WSQ algorithm is the way it handles the image at the boundary. Instead of simply periodizing the image at the boundaries in both dimensions, the standard specifies symmetric extension transforms (SET), which essentially mirror the image across the boundaries. By extrapolating the signal in this way, the discrete wavelet transform results in the same number of coefficients as there were pixels in the original image. The details of the SET are available in [1] and also in [2].

There are two more stages to WSQ encoding. The second stage is a quantization process where the discrete wavelet transform (DWT) coefficients are transformed to integers with a small number of discrete values. This is accomplished by uniform scalar quantization for each sub-band. There are two characteristics for each band: zero of the band (Zk) and width of the bins (Qk). These parameters must be chosen carefully to achieve a good compression ratio without introducing significant information loss. The Zk and Qk for each band are transmitted directly to the decoder. The final stage is Huffman coding of the integer indices for the DWT coefficients. For this purpose the bands are grouped into three blocks. In each block, the integer coefficients are remapped to numbers between 0-255 as per a translation table described in the standard. This translation table encodes run lengths of zeros and large values. Negative coefficients are also translated in a similar way by this table.
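The second-stage quantizer can be pictured with a generic uniform scalar quantizer that has a separate zero bin. This is only a sketch: it assumes Zk is the width of the zero bin and Qk the width of every other bin, and the reconstruction fraction `C` is a hypothetical parameter. The exact formulas and constants are defined in the standard [1].

```python
import math


def quantize(a: float, Z: float, Q: float) -> int:
    """Map a DWT coefficient to an integer index (zero bin width Z, bin width Q)."""
    half = Z / 2.0
    if a > half:
        return math.floor((a - half) / Q) + 1
    if a < -half:
        return math.ceil((a + half) / Q) - 1
    return 0  # everything inside the zero bin collapses to index 0


def dequantize(p: int, Z: float, Q: float, C: float = 0.5) -> float:
    """Approximate inverse: reconstruct a fraction C of the way into bin p."""
    if p > 0:
        return (p - 1 + C) * Q + Z / 2.0
    if p < 0:
        return (p + 1 - C) * Q - Z / 2.0
    return 0.0
```

With `C = 0.5` the reconstruction sits at the bin midpoint, so the error for any nonzero index is at most `Q / 2`.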

Our data-hiding algorithm works on the quantized indices before this final translation (i.e. between stages 2 and 3). Note that we assume the message size is very small compared to the image size (or, equivalently, the number of DWT coefficients). The Huffman coding characteristics and tables are not changed; the tables are computed as for the original coefficients, not after the coefficient altering steps described in the next section.

3. DATA-HIDING ALGORITHM

Our method is intended for messages which are very small (in terms of bits) compared to the number of pixels in the image. To hide such a message during the image encoding process, there are three basic steps as described below.

• Site selection set S: Given the partially converted quantized integer indices, the role of this stage is to collect the indices of all possible coefficient sites where a change in the least significant bit is tolerable. Typically we start by excluding all sites in the low frequency bands. Even small changes here can affect large regions of the image. Next we pick as candidates only those coefficient sites having large magnitudes. This leads to relatively small percentage changes in the values and hence minimal degradation of the image. Note that among the quantizer indices there are special codes to represent run lengths of zeroes and large integer values, as well as other control sequences. We avoid all coefficient sites incorporated into these values. In our implementation, we only select sites with translated indices ranging from 107 to 254 but excluding 180 (an invalid code).

• Random number seeding: We then select the sites from candidate set S which will be modified in a pseudo-random fashion. To retain predictability in encoder and decoder, we choose the seed for our random number generator based on the sub-bands that are not considered for alteration. For example, in the selection process we leave the contents of sub-bands 0-6 unchanged in order to minimize distortion. We typically choose values at fixed sites within these bands, although in principle we could choose any statistic from these bands. Selecting the seed in this way ensures both that the message is embedded at varying locations (but based on the image content), and that the embedded message can only be read if the proper seed selection algorithm is known by the decoder.

Figure 1: WSQ algorithm. (a) overview; (b) and (c) analysis filters. [Block diagram (a): the WSQ encoder chains a wavelet transform, a quantizer and a Huffman encoder, using filters, quantization tables and Huffman tables, to produce compressed data; the WSQ decoder chains a Huffman decoder, a quantization decoder and an inverse wavelet transform, using Huffman tables, quantization tables and filters, to recover the image.]

• Bit setting: The message to be hidden is translated to a sequence of bits. Each bit will be stuffed into a site chosen semi-randomly from the list of suitable sites. That is, for each bit we choose a site from the set S based on the seeded pseudo-random number generator. If the selected site has already been used, the next randomly generated site is chosen instead. We then change the low order bit of the value at the selected site to be identical to the current message bit. Half the time this results in no change at all to the coefficient value.

• Bit saving: Optionally, we can save all the original low order bits and append them to the compressed bit stream as a user comment field (an appendix). The appended bits are random samples in general, and hence are uncorrelated with the hidden message.
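The encoder steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the translated indices of the alterable bands are modelled as a flat list, run-length and escape payloads are not modelled, the seed is a SHA-256 hash of four hypothetical fixed positions in the untouched low bands, and Python's `random.Random` stands in for an unspecified generator. The pair test in `eligible` is also an assumption, added so that the candidate set does not depend on the hidden bits.

```python
import hashlib
import random

LOW, HIGH, INVALID = 107, 254, 180  # translated-index range quoted in the text


def eligible(v: int) -> bool:
    """A site qualifies only if both values reachable by setting its low bit
    are allowed (assumption: keeps S identical before and after embedding)."""
    return all(LOW <= x <= HIGH and x != INVALID for x in (v & ~1, v | 1))


def select_sites(indices):
    """Positions of the candidate coefficients (the set S), in scan order."""
    return [i for i, v in enumerate(indices) if eligible(v)]


def seed_from_protected(protected, positions=(0, 1, 2, 3)):
    """Derive the seed from fixed positions in the unaltered low bands."""
    material = ",".join(str(protected[p]) for p in positions).encode()
    return int.from_bytes(hashlib.sha256(material).digest()[:8], "big")


def embed(indices, protected, bits):
    """Return (modified indices, saved original low bits) for the message bits."""
    sites = select_sites(indices)
    if len(bits) > len(sites):
        raise ValueError("message longer than the number of candidate sites")
    rng = random.Random(seed_from_protected(protected))
    out, used, saved = list(indices), set(), []
    for bit in bits:
        site = sites[rng.randrange(len(sites))]
        while site in used:  # already used: take the next random site
            site = sites[rng.randrange(len(sites))]
        used.add(site)
        saved.append(out[site] & 1)  # optional restoration appendix
        out[site] = (out[site] & ~1) | bit  # set the low-order bit
    return out, saved
```

Because only low-order bits change and the seed comes from bands that are never altered, a decoder holding the same rules can rebuild both S and the site order from the received data alone.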

For the decoder, there are also three steps. The first two steps are the same as the first two steps described in the encoder: constructing the set S and selecting the seed for the random number generator. The third step is to run the pseudo-random number generator to select specific sites in a particular order. The least significant bits of the values at these sites are then concatenated to recover the original message.

If a restoration appendix is included, the decoder can optionally restore the original low order bits as it goes. This allows perfect reconstruction of the image (up to the original compression) despite the embedded message. Because we were careful in selecting modification sites, the restored decompressed image will be nearly the same as the decompressed image with the message still embedded. In practice, the error due to the embedded message is not perceptually significant, and does not affect subsequent processing and authentication.
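The decoder side, including the optional restoration, can be sketched in the same spirit. As before, the flat list of translated indices, the hash-based seed, the fixed seed positions, and the use of `random.Random` are all assumptions made for illustration. The only firm requirement is that they match whatever the encoder used.

```python
import hashlib
import random


def candidate_sites(indices, low=107, high=254, invalid=180):
    """Rebuild the set S with the same (hypothetical) rule as the encoder."""
    def ok(x):
        return low <= x <= high and x != invalid
    return [i for i, v in enumerate(indices) if ok(v & ~1) and ok(v | 1)]


def site_order(indices, protected, count, positions=(0, 1, 2, 3)):
    """Replay the pseudo-random walk over S and return `count` distinct sites."""
    sites = candidate_sites(indices)
    if count > len(sites):
        raise ValueError("more bits requested than candidate sites")
    material = ",".join(str(protected[p]) for p in positions).encode()
    seed = int.from_bytes(hashlib.sha256(material).digest()[:8], "big")
    rng = random.Random(seed)
    order, used = [], set()
    while len(order) < count:
        site = sites[rng.randrange(len(sites))]
        if site not in used:  # skip sites that were already used
            used.add(site)
            order.append(site)
    return order


def extract(indices, protected, n_bits):
    """Concatenate the low-order bits found at the replayed sites."""
    return [indices[s] & 1 for s in site_order(indices, protected, n_bits)]


def restore(indices, protected, saved_bits):
    """Put back the original low-order bits carried in the appendix."""
    out = list(indices)
    order = site_order(indices, protected, len(saved_bits))
    for site, bit in zip(order, saved_bits):
        out[site] = (out[site] & ~1) | bit
    return out
```

A decoder that derives a different seed or builds a different candidate set visits other sites and reads unrelated bits, which is what lets the server reject an image whose message does not match.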

Using this process only the right decoder can locate and extract the message from the compressed image during the decoding process. This message might be a fixed authentication stamp, or personal ID information which must match some other part of the record (which might have been sent in the clear). Thus, if an uncoded or improperly coded bit stream is given to the special decoder, it will fail to extract the expected message and hence can reject the image.

4. RESULTS

A grayscale fingerprint image and its 64 bands are shown in Fig. 2. The input image shown in Fig. 2(a) was acquired using a live-scan optical fingerprint scanner. It was then compressed to 0.75 bits per pixel (10.7 : 1 compression) to yield a representation consisting of the quantized and translated bands comprising 13490 bytes. In this representation our algorithm found 1287 sites where message bits can be hidden. We randomly modified all of these sites. After decompression (but not restoration) we obtain the image shown in Fig. 2(c). Note that any standard decoder would yield this image; it just could not extract the encoded message. As can be seen, we can hide a message of considerable length in the image, probably sufficient for e-commerce transactions, without substantially affecting the image quality.
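The figures quoted above can be checked with a few lines of arithmetic. The sketch assumes 8 bits per pixel for the uncompressed image (256 gray levels) and one hidden bit per candidate site; the variable names are illustrative only.

```python
RAW_BITS_PER_PIXEL = 8            # 256 gray levels
COMPRESSED_BITS_PER_PIXEL = 0.75  # rate quoted in the text
CANDIDATE_SITES = 1287            # one hidden bit per site
COMPRESSED_BYTES = 13490          # quantized and translated bands

ratio = RAW_BITS_PER_PIXEL / COMPRESSED_BITS_PER_PIXEL
capacity_bytes = CANDIDATE_SITES // 8
share = CANDIDATE_SITES / (COMPRESSED_BYTES * 8)

print(f"compression ratio: {ratio:.1f} : 1")   # prints 10.7 : 1
print(f"message capacity: {CANDIDATE_SITES} bits = {capacity_bytes} whole bytes")
print(f"share of representation bits that can carry the message: {share:.2%}")
```

So the hidden message can be up to 160 bytes while touching only about one percent of the bits in the representation.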

5. CONCLUSIONS

In this paper, we have described a robust data-hiding algorithm in the wavelet-compressed domain for fingerprint images. The proposed algorithm is simple and can be easily implemented in hardware. We have tested the system on fingerprint images using the FBI-standard WSQ compression scheme.

The basic algorithm can be easily extended to other compressed image domains such as medical images, satellite images, and other classes of biometrics images such as faces. The key is to select sites for value-modification that have low visual impact, and guarantee that some sites are left unmodified for pseudo-random number seed generation.

Figure 2: WSQ results. (a) fingerprint image; (b) its 64 subbands; (c) reconstructed image with embedded message.

Furthermore, many versions of the same algorithm are possible by using different random number generators or partial seeds. This means it is possible to make every implementation unique without much effort; the output of one encoder need not be compatible with another version of the decoder. This has the advantage that cracking one version will not necessarily compromise another version.

Currently, we are examining the possibility of recovering the original data bits at the message sites without using an appendix in order to get back the fully correct decompressed image without any loss due to data hiding.

6. REFERENCES

[1] "WSQ Gray-scale Fingerprint Image Compression Specification", U.S. Federal Bureau of Investigation, 1993.

[2] C. M. Brislawn, J. N. Bradley, R. J. Onyshczak, and T. Hopper, "The FBI compression standard for digitized fingerprint images", in Proc. of SPIE, Vol. 2847, Denver, Aug. 1996, pp. 344-355.

[3] C. T. Hsu and J. L. Wu, "Hidden digital watermarks in images", IEEE Trans. on image processing, Vol. 8, No. 1, Jan. 1999, pp. 58-68.

[4] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding", IBM Systems Journal, Vol. 35, No. 3 & 4, 1996, pp. 313-335.

[5] N. Memon and P. W. Wong, "Protecting digital media content", Communications of the ACM, Vol. 41, No. 7, July 1998, pp. 35-43.

[6] S. Mallat, "Wavelets for vision", Proc. of the IEEE, Vol. 84, No. 4, April 1996, pp. 604-614.

[7] M. D. Swanson, M. Kobayashi and A. H. Tewfik, "Multi-media data embedding and watermarking technologies", Proc. of the IEEE, Vol. 86, No. 6, June 1998, pp. 1064-1087.

[8] F. A. Petitcolas, R. J. Anderson, and M. G. Kuhn, "Information Hiding - A survey", Proc. of the IEEE, Vol. 87, No. 7, July 1999, pp. 1062-1078.
