Page 1: [IEEE 2012 IX International Symposium on Telecommunications (BIHTEL) - Sarajevo, Bosnia and Herzegovina (2012.10.25-2012.10.27)] 2012 IX International Symposium on Telecommunications

2012 IX International Symposium on Telecommunications (BIHTEL), October 25-27, 2012, Sarajevo, Bosnia and Herzegovina

978-1-4673-4876-8/12/$31.00 ©2012 IEEE

A new approach to relatively short message steganography

Angel Sanchez (1), Aura Conci (2), Ensar Zeljkovic (3), Narcis Behlilovic (3), Vedran Karahodzic (3)

(1) Departamento de Ciencias de la Computation (Univ. Rey Juan Carlos), 28 933 Mostoles (Madrid), Spain

(2) Instituto de Computação (Universidade Federal Fluminense), 24210-240, Niterói (Rio de Janeiro), Brazil

(3) Faculty of Electrical Eng., University of Sarajevo, Sarajevo, Bosnia and Herzegovina [email protected]

Abstract – The rapid growth in the number of Internet users and the widening range of data types exchanged over this network (video, audio, text messages...) emphasize the security problems inherent in this way of communicating. The flood of multimedia content in transmitted data has made the appearance of images on the network quite normal. This has revived the use of steganography (hiding data within images) to conceal information and prevent unauthorized access.

A widely used technique for this purpose, LSB (Least Significant Bit) substitution, still leads to detectable changes in the original image chosen as the message carrier. These differences give a cryptanalyst grounds to doubt the authenticity (independence) of the picture itself. By using a GA (Genetic Algorithm), the differences between the original image and the image embedded with secret data can be reduced. Nevertheless, some difference still remains, and the improvement is paid for with increased computational complexity.

Naturally, a question arises: can information be embedded in an image in such a way that the image does not undergo any changes? The quick answer would be that this is not possible. This paper shows that it is in fact possible.

Keywords - Steganography; LSB Substitution; Image Encryption; Genetic Algorithm; Path Relinking; Information Security; Data Protection

I. INTRODUCTION

The expansion of data transmission over the Internet has made improving data protection inevitable. Broadband Internet access has enabled a larger flow of video content through this network, and the path for steganography is once again wide open. When data protection is based only on steganography, it usually relies on the LSB technique. The difference between the original picture and the one with data embedded in it remains evident, and reducing this difference is desirable because it is a signal to hackers and cryptanalysts. Some improvements have been achieved by altering the modified (embedded) picture with a substitution matrix [7-12, 17, 18]. The unwanted side effect, however, is the increased complexity of the computational procedures. Strengthening the proposed solutions [17, 18] with an existing cryptographic algorithm, AES (Advanced Encryption Standard), can partially offset this unwanted effect. Because AES has K-level security from the aspect of cryptography (all known cryptanalytic attacks have an equal chance of compromising the algorithm), the increase in data-transfer security offered by combining the AES algorithm with steganography [17, 18] can be seen as excessive, given the scientific community's lack of knowledge about the degree of commercial implementation of quantum computers. Besides the inevitable difference between the original image and the one with protected content [5-12, 17, 18], most similar research has the disadvantage of focusing only on the protecting and transmitting stages of the process. The process of receiving the message and removing the protection is, however, equally important. From the aspect of complexity and implementation, these processes are usually considered symmetric.

It is known that, for AES, the process of decryption is considerably more demanding than the process of encryption. What about the cancellation of the substitution matrix on the receiving end? Does this process have the same level of complexity as generating that matrix on the transmitting end, where it is produced to reduce the distortion of the original picture?

The authors of the noted solutions [11-18] analyze problems only at the application layer of the OSI model. What if some bits change during transfer through the network due to interference and noise? How robust are the offered solutions under these conditions? Is it possible to introduce improvements that make the system more robust in this sense?

Is it possible to find a way of injecting data into a picture so that the picture is not deformed? Does the LSB region always have to be the region where the transmitted information is placed?

Section II gives a brief summary of the enhancements obtained in [17, 18], alongside the open questions listed above. Section III then elaborates the idea of injecting information into an image without deforming the image.

A comparative analysis of the experimental results for the approaches described in Sections II and III is given in Section IV. The concluding remarks in Section V comment on the achieved experimental results and indicate future directions of research.

II. LSB SUBSTITUTION – A HYBRID OF GENETIC AND PATH RELINKING ALGORITHMS FOR STEGANOGRAPHY

A substitution matrix is used to reduce the observed differences between the original image and the same image after data have been embedded in it. However, LSB substitution in every variant leaves recognizable statistical clues.

If, in accordance with the LSB method, the k least significant bits of the original image are replaced, it is possible to design as many as (2^k)! different substitution matrices. That creates a new problem: how to decide which of these substitution matrices is the best solution?
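To give a feeling for how quickly this search space grows, the following minimal sketch counts the candidates. It assumes, in the spirit of optimal LSB substitution [7], that a substitution matrix can be stored as a permutation of the 2^k possible k-bit values; the helper names are hypothetical and do not come from [17, 18].

```python
import math
import random


def count_substitutions(k):
    """Number of distinct substitution matrices for k replaced bits: (2^k)!."""
    return math.factorial(2 ** k)


def random_substitution(k, seed=0):
    """Hypothetical helper: one substitution, stored as a permutation of 0..2^k-1."""
    perm = list(range(2 ** k))
    random.Random(seed).shuffle(perm)
    return perm


def invert(perm):
    """Inverse permutation, which the receiver needs to undo the substitution."""
    inverse = [0] * len(perm)
    for source, target in enumerate(perm):
        inverse[target] = source
    return inverse
```

For k = 1, 2 and 3 the count is 2, 24 and 40320 respectively, so exhaustive search already becomes impractical for k = 4, which is why heuristic selection is attractive.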

In papers [17, 18], the authors presented some possibilities for choosing a favorable substitution matrix. When the matrix is designed with a hybrid heuristic approach, based on a combination of the Genetic Algorithm and Path Relinking approaches, additional reductions of the observed discrepancies are possible.

Each of the proposed solutions still leads to modification of the original LSB bits. If the LSB bits of the modified image are extracted, certain irregularities (deformations) can be registered. Such information is enough of a signal for steganalysts to pay extra attention to that image.
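The way plain LSB substitution alters the carrier can be sketched as follows. This is an illustration only, assuming a grayscale image flattened into a list of 0-255 pixel values and a message written sequentially from the first pixel, with no substitution matrix; the function names are hypothetical.

```python
def text_to_bits(text):
    """ASCII text as a list of bits, most significant bit first."""
    return [(byte >> shift) & 1
            for byte in text.encode("ascii")
            for shift in range(7, -1, -1)]


def lsb_embed(pixels, bits, k=1):
    """Replace the k least significant bits of successive pixels with message bits."""
    n_carriers = -(-len(bits) // k)  # ceiling division
    if n_carriers > len(pixels):
        raise ValueError("cover too small for message")
    mask = (1 << k) - 1
    stego = list(pixels)
    for pos in range(n_carriers):
        chunk = bits[pos * k:(pos + 1) * k]
        chunk = chunk + [0] * (k - len(chunk))  # pad the last chunk with zeros
        value = 0
        for bit in chunk:
            value = (value << 1) | bit
        stego[pos] = (stego[pos] & ~mask) | value
    return stego


def lsb_extract(stego, n_bits, k=1):
    """Read n_bits message bits back from the k least significant bits."""
    bits = []
    for pixel in stego:
        value = pixel & ((1 << k) - 1)
        bits.extend((value >> shift) & 1 for shift in range(k - 1, -1, -1))
        if len(bits) >= n_bits:
            break
    return bits[:n_bits]


def count_changed(original, stego):
    """Number of pixels whose value differs between the two images."""
    return sum(1 for a, b in zip(original, stego) if a != b)
```

Any pixel whose low bits did not already match the message is changed, so `count_changed` is generally non-zero; this is the kind of residue that steganalysis looks for.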

III. PROPOSED APPROACH – METHOD OF LINEAR ALGEBRA (MLA)

Every digital picture can be interpreted in 2D as an array of pixels. Eight bits are usually reserved for every pixel of a grayscale image; for color images, 24 bits per pixel are typically used. In this context, the symbols associated with individual pixels can be arranged in a rectangular matrix, where larger matrix dimensions indicate better image quality (e.g. HD television uses the 1280 × 720 and 1920 × 1080 pixel resolutions). Images sent in electronic communications usually do not have such high resolutions (a resolution of 512 × 512 pixels is more than enough). The idea of embedding secret data into an image was initially based on the LSB method. The differences between the original image and the one formed after substituting the LSB bits (with the data bits intended to be kept secret) are easily detectable with steganalysis algorithms. Therefore, procedures for reducing these differences were developed in [7-11, 18]. Using a GA (Genetic Algorithm), or a combination of GA and PR (Path Relinking) algorithms, the level of difference between the original and stego image can be significantly reduced.

Regardless of the evident improvements in all these newly created solutions, a difference between the original and modified image still exists. In these circumstances, one question naturally emerges: is it possible to hide a message in an image in such a way that no changes appear in the image?

With the current results of known steganography techniques, most respondents would, without much hesitation, answer: No, it is not possible.

But it is worth asking whether the answer could be yes.

The pixels of the original image are associated with a matrix, formally denoted S, while A is a submatrix that can be located in any part of S (the elements of submatrix A are also elements of the original matrix S). This freedom significantly increases the complexity of steganalysis compared to solutions based primarily on the LSB method (under the LSB method, submatrix A must be located in a precisely defined part of matrix S).

Let the message to be embedded in the original picture be represented by the elements of a matrix P. To keep the following procedures simple, let matrices A and P have identical dimensions, although, as will be shown later, this is not a requirement of the algorithm. From matrices A and P, matrix X is easily formed on the transmitting side using the matrix equation in Figure 1. Similarly, as shown in Figure 1, matrix P can be calculated on the receiving side from the received matrix X and matrix A, so the initial message is recovered provided X is known on the receiving side.

Note that, alongside the original image, matrix X and the location (and dimensions) of submatrix A within matrix S must be forwarded from the transmitting to the receiving side. The position of submatrix A inside matrix S must remain secret during transport and must therefore be hidden, i.e. disguised. On the other hand, matrix X does not need special protection because, to any potential interceptor, its data are disjointed symbols without special meaning.

Figure 1. Base matrix equations of the MLA method

A close examination of the structure of Figure 1 makes clear why no modifications to the image are expected with the MLA method: one part of the image is used to form matrix X, and that is the only role of the image in the method. Since the image is not modified, it will not look suspicious to a cryptanalyst. Even with appropriate computational methods, it is very hard to determine that the image is being used to carry a message.

An additional advantage of the MLA method is the freedom in selecting submatrix A, namely the bits of the image that will be used to calculate matrix X.

Potential problems of the proposed MLA method, when superficially analyzed, can be:

a) the appearance of negative symbols while solving the presented matrix equation

b) the unambiguity of matrix X

Therefore, in the continuation of this paper, these topics are particularly elaborated.

A. How to eliminate “negative bits” in the MLA method

When an image is considered through its pixels and its matrix interpretation, all symbols within it have numerical values between 0 and 255. Matrices [A] and [P] should have the same dimensions. If they do not, the task of a potential attacker is further complicated; however, the exchange of data between sender and receiver also becomes more complex.

Therefore, assuming no additional restrictions, matrix [X] will be unambiguously determined by the equations in Figure 1.

The following equation is valid for its individual elements:

$$x_{i,j} = p_{i,j} - a_{i,j} \qquad (1)$$

In equation (1), i and j correspond respectively to the row and column positions in the corresponding matrix. Since matrix P is the message matrix, under the hypothesis that the message is encoded according to the ASCII table, the values of its symbols remain within the same pre-defined set of values as those of matrix A.

Certain elements of matrix X (which is transmitted from the “transmitter” to the “receiver”) could, as a result of this equation, take values outside the specified interval [0, 255]. Specifically, if there is a position (i, j) where

$$p_{i,j} < a_{i,j} \qquad (2)$$

then

$$x_{i,j} < 0 \qquad (3)$$

However, this situation can easily be avoided by using the binary representation of the matrix elements. All symbols in matrices A and P are then represented by the binary values 0 and 1. In modulo-2 arithmetic, the following rules for addition and subtraction hold:

$$0 + 0 = 0; \quad 0 + 1 = 1; \quad 1 + 0 = 1; \quad 1 + 1 = 0$$

$$0 - 0 = 0; \quad 0 - 1 = 1; \quad 1 - 0 = 1; \quad 1 - 1 = 0 \qquad (4)$$

Therefore, matrix X (regardless of the structure of matrices P and A) can also contain values 0 and 1 only, just like the initial matrices A and P.
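The contrast between integer and modulo-2 arithmetic can be illustrated with a short sketch. It assumes that X is formed element by element as the difference of P and A, which is how the surrounding text describes Figure 1; the pixel values used for A in the usage note are made up for the example.

```python
def integer_difference(p_values, a_values):
    """Element-wise P - A on 0..255 values; results may fall below zero."""
    return [p - a for p, a in zip(p_values, a_values)]


def to_bits(values):
    """Expand 0..255 values into bits, most significant bit first."""
    return [(v >> shift) & 1 for v in values for shift in range(7, -1, -1)]


def mod2_difference(p_bits, a_bits):
    """Element-wise (P - A) mod 2 on bits; the result is always 0 or 1."""
    return [(p - a) % 2 for p, a in zip(p_bits, a_bits)]
```

For example, with P holding the ASCII codes of "Hi" (72, 105) and hypothetical pixel values A = (200, 17), `integer_difference` yields a negative first element, whereas `mod2_difference` applied to the bit expansions yields only zeros and ones. On single bits, subtraction modulo 2 coincides with XOR.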

B. The unambiguity of the MLA method

One feature of the MLA method is that it gives equal consideration to the transmitting and the receiving end when handling information that should be hidden from the public. A question therefore naturally arises: how can it be shown that the person on the receiving side can, in accordance with the MLA method, unambiguously decrypt the transferred message?

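At an arbitrary position (i, j) on the transmitting side, according to the MLA method, the following operation is performed:

$$x_{i,j} = (p_{i,j} - a_{i,j}) \bmod 2 \qquad (5)$$

On the receiving side (where the original message is reconstructed according to Figure 1), each individual element is obtained as:

$$\hat{p}_{i,j} = (x_{i,j} + a_{i,j}) \bmod 2 \qquad (6)$$

where $\hat{p}_{i,j}$ denotes the reconstructed message bit. This, according to equation (5), is:

$$\hat{p}_{i,j} = \big((p_{i,j} - a_{i,j}) \bmod 2 + a_{i,j}\big) \bmod 2 \qquad (7)$$

If

$$p_{i,j} - a_{i,j} \in \{0, 1\} \qquad (8)$$

then equation (7) becomes:

$$\hat{p}_{i,j} = (p_{i,j} - a_{i,j} + a_{i,j}) \bmod 2 = p_{i,j} \bmod 2 = p_{i,j} \qquad (9)$$

On the other hand, if

$$p_{i,j} - a_{i,j} \notin \{0, 1\} \qquad (10)$$

which is an isolated case ($p_{i,j} = 0$, $a_{i,j} = 1$), because only here does the value obtained by ordinary subtraction differ from the value obtained when the operation mod 2 is applied, equation (7) becomes:

$$\hat{p}_{i,j} = \big((0 - 1) \bmod 2 + 1\big) \bmod 2 = (1 + 1) \bmod 2 = 0 = p_{i,j} \qquad (11)$$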

Therefore, in every case, the bits obtained on the receiver side are the same as those of the original message, which shows that the message is reproduced successfully.
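Because each position involves only one message bit and one image bit, the argument above can also be checked exhaustively. The sketch below enumerates all four combinations, assuming the transmit and receive rules are subtraction and addition modulo 2 as described in this section; the function names are illustrative.

```python
from itertools import product


def transmit_bit(p, a):
    """Transmitting side: x = (p - a) mod 2."""
    return (p - a) % 2


def receive_bit(x, a):
    """Receiving side: reconstructed p = (x + a) mod 2."""
    return (x + a) % 2


def check_all_cases():
    """Return (p, a, x, reconstructed p) for every combination of bits."""
    rows = []
    for p, a in product((0, 1), repeat=2):
        x = transmit_bit(p, a)
        rows.append((p, a, x, receive_bit(x, a)))
    return rows
```

In every row returned by `check_all_cases()` the last entry equals the first, including the isolated case p = 0, a = 1, where ordinary subtraction would give a negative value.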

The accuracy of the above conclusion was tested on several simple messages, like:

(1) “This is message.”; (2) “Faculty of Electrical Engineering”; (3) “Cryptography and system security”, with the help of MATLAB.
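A comparable round-trip check can be sketched in Python. The sketch assumes a grayscale image flattened into a list of 0-255 values, a submatrix A taken as a contiguous run of pixels starting at a secret offset, and ASCII text with 8 bits per character; these layout choices and all names are assumptions made for illustration and are not taken from the authors' MATLAB code.

```python
def to_bits(values):
    """Expand 0..255 values into bits, most significant bit first."""
    return [(v >> shift) & 1 for v in values for shift in range(7, -1, -1)]


def from_bits(bits):
    """Pack groups of 8 bits back into 0..255 values."""
    values = []
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        values.append(byte)
    return values


def mla_hide(cover, offset, message):
    """Form X from message P and cover region A; the cover is only read."""
    if offset + len(message) > len(cover):
        raise ValueError("submatrix A does not fit inside the cover")
    p_bits = to_bits(message.encode("ascii"))
    a_bits = to_bits(cover[offset:offset + len(message)])
    return [(p - a) % 2 for p, a in zip(p_bits, a_bits)]


def mla_recover(cover, offset, x_bits):
    """Rebuild the message from X and the same cover region A."""
    a_bits = to_bits(cover[offset:offset + len(x_bits) // 8])
    p_bits = [(x + a) % 2 for x, a in zip(x_bits, a_bits)]
    return bytes(from_bits(p_bits)).decode("ascii")
```

Calling `mla_hide` followed by `mla_recover` with the same cover and offset returns the original text for each of the three messages, and the cover list is identical before and after, since it is never written to.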

IV. EXPERIMENTAL RESULTS

The experimental results were obtained by embedding the three messages into the original picture shown in Figure 2. The embedding of each message was analyzed in four variants. In the first three, the message was embedded in the original image using the LSB method, substituting first one bit, then two, and finally three bits. The fourth variant embeds the selected message using the advocated MLA approach.

Figure 2. The original image in which messages will be written

After the four described tests were carried out for each message, the number of pixels that differ between the original and the embedded image was analyzed. The results (in terms of pixel changes) are summarized in Table I and Table II.

TABLE I. DIFFERENT VARIANTS OF INSERTING THE MESSAGES INTO THE PICTURE AND THE NUMBER OF CHANGED PIXELS IN THE IMAGE

Inserted message    Number of changed pixels
                    LSB 1 bit    LSB 2 bits    LSB 3 bits    MLA
Message 1           34           29            20            0
Message 2           72           52            33            0
Message 3           79           46            33            0
Average             61.67        42.33         28.67         0

TABLE II. COMPARISON OF DIFFERENT VARIANTS BY THE PERCENTAGE OF PIXELS THAT ARE CHANGED

Inserted message    Percentage of changed pixels (%)
                    LSB 1 bit    LSB 2 bits    LSB 3 bits    MLA
Message 1           26.56        22.65         15.63         0
Message 2           27.27        19.69         12.5          0
Message 3           30.86        17.97         12.89         0
Average             28.23        20.1          13.67         0

The results presented in tables I and II confirm that the proposed MLA approach does not lead to any change to the original image, while making the embedded information available to the recipient.
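The relation between the two tables can be checked with a few lines of code. The text does not state which base the percentages use; the sketch below assumes the base is the message length in bits (8 bits per ASCII character), an inference that reproduces the published values to within rounding.

```python
# Message texts from Section III and changed-pixel counts from Table I,
# in the column order LSB 1 bit, LSB 2 bits, LSB 3 bits, MLA.
MESSAGES = {
    "Message 1": "This is message.",
    "Message 2": "Faculty of Electrical Engineering",
    "Message 3": "Cryptography and system security",
}
CHANGED = {
    "Message 1": (34, 29, 20, 0),
    "Message 2": (72, 52, 33, 0),
    "Message 3": (79, 46, 33, 0),
}


def changed_percentages(message, counts):
    """Percentage of changed pixels, assuming the base is 8 bits per character."""
    n_bits = 8 * len(message)
    return [100.0 * count / n_bits for count in counts]


def table_two():
    """Recompute every row of Table II from Table I under that assumption."""
    return {name: changed_percentages(MESSAGES[name], CHANGED[name])
            for name in MESSAGES}
```

Under this assumption, Message 1 (16 characters, 128 bits) gives 34 / 128, which matches the 26.56 % reported for one-bit LSB substitution.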

V. CONCLUSION

A new approach to injecting a relatively short message into an image was presented in this paper. Without a substantial increase in computational complexity, the proposed approach conceals data in an image without causing any changes to the original image. In this respect, the new approach improves on the results achieved with the LSB method or with the advanced LSB method incorporating the genetic algorithm and path relinking. Unlike similar previous work, this paper treats the sending (hiding) of the message and the receiving (reconstruction) of the original message equally.

In future studies, it would be useful to focus more attention on:

a) how to improve the security of sending the X matrix;

b) how to forward the information about the position of matrix A in the S matrix, which is needed to reconstruct the data matrix P from A and X;

c) the effect that the size of the data matrix P has on the efficiency of the MLA approach.

REFERENCES

[1] J.K. Jan, Y.M. Tseng, “On the security of image encryption method”, Inform. Process Lett 60, 1996, pp. 261-265.

[2] N. Bourbakis, C. Alexopoulos, “Picture data encryption using scan patterns”, Pattern Recognition 25, 1992, pp. 567-581.

[3] H.J. Highland, “Data encryption: a non-mathematical approach”, Comput. Security 16, 1997, pp. 369-386.

[4] N. Behlilovic, M. Hadzialic, A. Mustafic, “Application of Advanced Encryption Standard in Virtual Private Networks Implementation”, International Scientific and Professional Meeting “The Life and Work of Nikola Tesla”, Zagreb, Croatia, June 2006.

[5] W. Bender, D. Gruhl, N. Morimoto, A. Lu, “Techniques for data hiding”, IBM Systems J 35 (3, 4), 1996, pp. 313-336.

[6] Z. Duric, M. Jacobs, S. Jajodia, “Information hiding: Steganography and steganalysis”, Handbook of Statistics 24, 2005, pp. 171-187.

[7] Ran-Zan Wang, Chi-Fang Lin, Ja-Chen Lin, “Image hiding by optimal LSB substitution and genetic algorithm”, Pattern Recognition 34 (3), 2001, pp. 671-683.

[8] J. K. Hao, E. Lutton, E. Ronald, M. Schoenauer, D. Snyers, “A Template for Scatter Search and Path Relinking”, Lecture Notes in Computer Science, 1363, 1997, pp. 13-54. Updated and extended in February, 1998 by Fred Glover.

[9] M. Laguna, “Scatter search and path relinking: methodology and applications”. Available at http://leeds-faculty.colorado.edu/laguna/presentations/puebla.ppt.

[10] Y. Rochat, É. D. Taillard, “Probabilistic diversification and intensification in local search for vehicle routing”, Journal of Heuristics 1, 1995, pp. 147-167. Available at: http://ina2.eivd.ch/collaborateurs/etd/articles.dir/crt95_13.pdf

[11] C. R. Reeves, “Genetic algorithms, path relinking and the flowshop sequencing problem”, Evolutionary Computation Journal (MIT Press) 6 (1), 1998, pp. 230-234.

[12] Chin-Chen Chang, Ju-Yuan Hsiao and Chi-Shiang Chan, “Finding optimal least-significant-bit substitution in image hiding by dynamic programming strategy”, Pattern Recognition 36 (7), 2003, pp. 1583-1595.

[13] Yu-Chen Hu, “High-capacity image hiding scheme based on vector quantization”, Pattern Recognition 39, 2006, pp. 1715-1724.

[14] Shiuh-Jeng Wang, “Steganography of capacity required using modulo operator for embedding secret image”, Applied Mathematics and Computation 164 (1), 2005, pp. 99-116.

[15] Chih-Ching Thien and Ja-Chen Lin, “A simple and high-hiding capacity method for hiding digit-by-digit data in images based on modulus function”, Pattern Recognition 36 (12), 2003, pp. 2875-2881.

[16] G. Boato, V. Conotter, F. G. B. De Natale, and C. Fontanari, “A joint asymmetric watermarking and image encryption scheme”, SPIE 2008.

[17] A. Brazil, Path Relinking and AES Cryptography in Color Image Steganography, M.Sc. Dissertation, Computer Institute, UFF (available at http://www.ic.uff.br/PosGraduacao/lista_dissertacao.php?ano=2008).

[18] A. L. Brazil, A. Sanchez, A. Conci, N. Behlilovic, “An Hybrid of Genetic and Path Relinking Algorithms for Steganography”, Proceedings of the 53rd International Symposium ELMAR 2011, p. 285, Zadar, Croatia.