
Transcript of REPORT.pdf

  • CONTENTS

    CHAPTER 1 INTRODUCTION

    CHAPTER 2 LITERATURE REVIEW

    2.1 Introduction

    2.2 Modern Cryptosystem

    2.3 Information Theoretic Approach

    CHAPTER 3 SPURIOUS KEYS AND UNICITY DISTANCE

    3.1 Introduction

    3.2 Spurious Keys Analysis

    3.3 Spurious Keys Analysis Using Natural Language Model

    3.3.1 Introduction

    3.3.2 Code-point Mapping

    CHAPTER 4 TEXT-SPACE BOUNDARY IN STREAM CIPHERS

    4.1 Introduction

    4.2 Mathematical Modeling

    4.3 Code-point Multimapping

    4.3.1 Analysis Using Natural Language Model

    4.4 Index Mapping

    4.5 Model of Encryption for Numeric-Strings

    CHAPTER 5 CONCLUSIONS

  • CHAPTER 1

    INTRODUCTION

  • INTRODUCTION

    Cryptography is used everywhere for secure communication. It implements algorithms that encrypt and decrypt data using a secret key. The difficulty of retrieving the original message from its corresponding cryptogram without knowing the secret key determines the level of security offered by the system. The strength of cryptography can be measured by estimating the computational effort required to break the algorithm, and via the information theoretic approach. The complexity of the algorithm and the size of the key are the two factors determining the strength of cryptography, but both trade off against the performance of the system. The information theoretic approach uses different statistical parameters to measure the strength of cryptography on the basis of the properties of the text-space.

    The characteristics of the encrypted and decrypted texts are a matter of concern in modern cryptosystems. A plain text encrypted with a modern cryptographic algorithm produces a cipher text which has no language property or readability and can therefore easily be marked. Similarly, when a cipher text is decrypted using an incorrect key, the decrypted output simply does not look like text. This characteristic assists adversaries in marking the secret text and comfortably proceeding towards the unique solution by eliminating incorrect keys. There exists a certain set of keys within the key-space which yields text-like-text as decrypted output; these keys are spurious keys. Spurious keys cannot easily be eliminated by cryptanalysis: the attacker requires background information or other means to deal with them. The set of spurious keys associated with a security transaction uplifts the level of secrecy in a secure communication.

    The number of spurious keys is larger for small message texts. The set of spurious keys gradually narrows as the size of the text increases, and there is a text size at which the probability of spurious keys is zero. This size, obtained by a probabilistic approach, is defined as the unicity distance. The unicity distance gives the size of the text for which there is likely to be a single intelligible plain text decryption when a brute-force attack is attempted. If the unicity distance tends to infinity, the cipher is practically unbreakable by brute force. The unicity distance is inversely proportional to the redundancy of the text-space. The redundancy of a text-space consists of those elements which, when eliminated from the space, do not affect the informational value of the texts in that space. For example, for alphanumeric texts, all 1-byte values other than those associated with alphanumeric characters are considered redundant. Redundancy also arises from the characteristics of the language: some content, when filtered out of a message, still allows the complete information of the message to be retained.

    The number of spurious keys remains a strong factor in strengthening cryptography for short messages. According to Shannon's theory, if the cipher text is equiprobable over all message texts then the system is a perfect system. This means that if the number of spurious keys is fairly large, the system tends towards perfection. The use of natural language models to strengthen cryptography has been discussed in many papers. Encrypting a natural language using the UTF-8 encoding standard may lead to a depreciation of spurious keys because of the 3-byte weight of the character code-points. A fairer implementation of the language model is the code-point mapping technique. A code-point is a unique number assigned to an atomic character of the script. The number carries information about the language and about the specific character. The part of the information dealing with the script, the language tag, can be extracted out, since it remains common to all characters of the text; the part which deals with the specificity of the character is mapped to a 1-byte value, which is encrypted. The cipher text is obtained by inserting the language tag into the encrypted data, and the process is reversed for decryption. This approach allows a fair analysis of the natural language model and traces out the strength of cryptography. The unicity distance is comparatively larger for such natural language models than for English.

    The plain text space is generally limited to the characters associated with the language, but as far as modern cryptosystems are concerned the encrypted and decrypted texts can float through any 1-byte value. This leads to a depreciation of spurious keys. If a boundary could be set on the encrypted-decrypted text space, the possibility of obtaining text-like-texts under random encryption and decryption would fairly increase. Cryptanalysis of the one-time pad using a number of plaintext-ciphertext pairs suggests a way to shrink the boundary of the encrypted-decrypted text space for stream ciphers, yielding a fairly longer unicity distance for the cryptosystem. The possibility of bounding text spaces can uplift modern cryptosystems to a level beyond the cipher-text-only attack.

  • CHAPTER 2

    LITERATURE REVIEW

  • 2. LITERATURE REVIEW

    2.1 INTRODUCTION

    Several issues are growing regarding the secrecy of the information that travels in huge mass over the internet throughout the world. Security of information has become a public demand following the issues related to mass surveillance and attacks that have been revealed to the world. Cryptography is a major tool for security over the internet, and it exists everywhere as far as secrecy is concerned.

    There has been explosive growth in unclassified research on strengthening cryptography. Different approaches are available to measure the strength of block ciphers and stream ciphers. Cryptanalysis techniques are used to estimate the strength of an algorithm. Many cryptosystems that were thought to be secure have been broken, and a variety of tools useful in cryptanalysis have been developed. The language used to describe security systems relies on discrete probability.

    2.2 MODERN CRYPTOSYSTEM

    Most of the ciphers that have been examined are not really cryptographically secure, although they may have been adequate prior to the widespread availability of computers. A cryptosystem is called secure if a good cryptanalyst, armed with knowledge of the details of the cipher (but not the key used), would require a prohibitively large amount of computation to decipher a plaintext. This idea (that the potential cryptanalyst knows everything but the key) is called Kerckhoff's Law, or sometimes Shannon's Maxim.

    Kerckhoff's Law at first seems an unreasonably strong rule; given a mass of encrypted data, how is the cryptanalyst to know by what means it was encrypted? Today, most encryption is done by software or hardware that the user did not produce himself. One can reverse engineer a piece of software (or hardware) and make the cryptographic algorithm apparent, so we cannot rely on the secrecy of the method alone as a good measure of security. For example, when you make a purchase over the internet, the encryption method used between your browser and the seller's web server is public knowledge. The security of the transaction depends on the security of the keys.

    There are several widely used ciphers which are believed to be fairly secure. Probably the most commonly used cipher was DES (the Data Encryption Standard), which was developed in the 1970s and was adopted as a standard by the US government. The standard implementation of DES operates on 64-bit blocks (that is, it uses an alphabet of size 2^64: each "character" is 8 bytes long) and uses a 56-bit key. Unfortunately, the 56-bit key means that DES, while secure enough to thwart the casual cryptanalyst, is attackable with special hardware by governments, major corporations, and probably well-heeled criminal organizations: one merely needs to try a large number of the 2^56 possible keys. This is a formidable, but not insurmountable, computational effort. A common variation of DES, called Triple-DES, uses three rounds of regular DES with three different keys (so the key length is effectively 168 bits) and is considerably more secure. DES was originally expected to be used for only a few years when it was first designed; however, due to its surprising resistance to attack, it remained generally unassailable for nearly 25 years. The powerful methods of linear and differential cryptanalysis were developed to attack block ciphers like DES.

    In January 1997, the National Institute of Standards and Technology (NIST) issued a call for a new encryption standard to be developed, called AES (the Advanced Encryption Standard). The requirements were that the candidates operate on 128-bit blocks and support key sizes of 128, 192, and 256 bits. Five ciphers advanced to the second round: MARS, RC6, Rijndael, Serpent, and Twofish. All five were stated to have "adequate security"; Rijndael was adopted as the standard in October 2000.

    Other commonly used ciphers are IDEA (the International Data Encryption Algorithm, developed at ETH Zurich in Switzerland), which uses 128-bit keys, and Blowfish (developed by Bruce Schneier), which uses variable-length keys of up to 448 bits. Both of these are currently believed to be secure. Another common cipher, RC4 (developed by RSA Data Security), can use variable-length keys and, with sufficiently long keys, is believed to be secure. Some versions of the Netscape web browser used RC4 with 40-bit keys for secure communications. A single encrypted session was broken in early 1995 in about 8 days using 112 networked computers; later in the same year a second session was broken in under 32 hours. Given the speed increases in computing since then, it is reasonable to believe that a 40-bit key can be cracked in a few hours. Notice, however, that both of these attacks were essentially brute force, trying a large fraction of the 2^40 possible keys. Increasing the key size resolves that problem. Nearly all browsers these days use at least 128-bit keys.

    2.3 INFORMATION THEORETIC APPROACH

    A cryptosystem has perfect secrecy if for any message x and any encipherment y, p(x|y) = p(x). This implies that for any message-cipher pair there must be at least one key that connects them. According to Shannon's theory, suppose a cryptosystem has |K| = |C| = |P|. The cryptosystem has perfect secrecy if and only if:

    each key is used with equal probability 1/|K|, and

    for every plaintext x and ciphertext y there is a unique key k such that e_k(x) = y.
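    The two conditions can be checked numerically on a toy one-bit one-time pad. The sketch below is not from the source; the skewed message distribution is an arbitrary assumption chosen for illustration.

```python
from fractions import Fraction

# One-bit one-time pad: y = x XOR k, keys equiprobable (condition 1),
# and for each (x, y) exactly one key k with x ^ k = y (condition 2).
p_x = {0: Fraction(3, 4), 1: Fraction(1, 4)}   # arbitrary message distribution
p_k = {0: Fraction(1, 2), 1: Fraction(1, 2)}   # each key used with probability 1/|K|

# Joint distribution p(x, y); each (x, y) pair is reached by a unique key.
p_xy = {(x, x ^ k): p_x[x] * p_k[k] for x in p_x for k in p_k}
p_y = {y: sum(pr for (_, y2), pr in p_xy.items() if y2 == y) for y in (0, 1)}

# Perfect secrecy: p(x|y) = p(x) for every message x and encipherment y.
for x in p_x:
    for y in (0, 1):
        assert p_xy[(x, y)] / p_y[y] == p_x[x]
```

    Note that the check succeeds whatever message distribution is chosen, which is exactly what the definition demands.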

    There are certain issues which have to be addressed theoretically concerning the strength of a cryptosystem, mentioned in the points below:

    the immunity of a system to cryptanalysis when the cryptanalyst has unlimited time and manpower available for the analysis of cryptograms;

    whether a cryptogram has a unique solution (even though it may require an impractical amount of work to find it);

    how much text in a given system must be intercepted before the solution becomes unique;

    whether there are systems for which no information whatever could be extracted by the enemy, no matter how much text is intercepted.

    In the analysis of these problems, the concepts of entropy, redundancy, unicity distance and the like developed in A Mathematical Theory of Communication are used.

    Shannon's entropy represents the amount of information the experimenter lacks prior to learning the outcome of a probabilistic process. According to Shannon's formula, a message's entropy is maximized when the occurrence of each of its individual parts is equally probable. The entropy of a natural language is a statistical parameter that measures how much information is produced on average for every letter of a text in the language. The amount of information in a message is the minimum number of bits required to encode all of its possible meanings.

    In cryptography, the unicity distance is the length of ciphertext needed to break the cipher by reducing the number of possible spurious keys to zero in a brute-force attack. That is, after trying every possible key, there should be just one decipherment that makes sense; it is the expected amount of ciphertext needed to determine the key completely, assuming the underlying message has redundancy.

  • CHAPTER 3

    SPURIOUS KEYS

    &

    UNICITY DISTANCE

  • 3. SPURIOUS KEYS AND UNICITY DISTANCE

    3.1 INTRODUCTION

    Shannon proposed the information theoretic approach in his paper Communication Theory of Secrecy Systems, which explains different parameters related to the strength of cryptography. The resistance to a brute-force or cipher-text-only attack is increased if there exists a large number of text-like-texts corresponding to a cipher text. The keys which give text-like-text are spurious keys. A modern cryptosystem provides a fair number of spurious keys for short texts of fewer than about 10 characters, which indicates that systems are more resistant to cipher-text-only attacks on short messages. The probability of meeting a spurious key during random decryptions depends on the distribution of text-like-texts in the text-space. Text-like-texts are limited by the number of characters associated with the language script: an English text has alphanumeric characters with some special characters, but an encrypted/decrypted text may map a character to any 1-byte value, resulting in transformations which do not look like text. The set of spurious keys filters out invalid decryptions and leaves only text-like-texts from all possible decryptions. The difficulty of eliminating keys and sorting out the unique solution lies with the spurious keys, which cannot be resolved by simple attack mechanisms.

    The number of spurious keys gradually decreases with greater text size and eventually becomes negligible. The threshold at which the probability of having spurious keys is negligible is the unicity distance. Consider a random 1-byte key (k ∈ K, 256 possible keys) which encrypts (XOR) one letter from the 26-letter alphabet. The probability that the decrypted text is valid (a letter) is 26/256. The number of spurious keys becomes negligible once the probability of a valid decryption falls below 1/256, which first happens at a text size of 3, i.e. (26/256)^3 < 1/256. This implies that if the size of the text is 3, essentially none of the 256 keys would give a text of letters as decrypted output; this is the unicity distance. The 1-byte values in the text-space other than the 26 letters are the redundancy of the text-space considered above. This redundancy is inversely related to the unicity distance: had there been no redundancy in the text space, the unicity distance would be infinite.
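    The arithmetic of this worked example is easy to verify; a minimal check, not from the source:

```python
# Smallest text size n at which (26/256)**n drops below 1/256, i.e. the point
# where essentially none of the 256 keys yields an all-letter decryption.
n = 1
while (26 / 256) ** n >= 1 / 256:
    n += 1
assert n == 3   # the unicity distance derived in the text
```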

    3.2 SPURIOUS KEYS ANALYSIS

    The analysis of spurious keys proposed in this thesis relies on discrete probability. The observation of random decryptions applying random keys, and of brute-forcing using numeric keys, holds the Proof of Concept for these statistics. The discrete probability approach is applied assuming a uniform distribution, with definitions of the text spaces and of the desirable events. The plain texts are limited by the number of characters used. Let us consider the English alphabet (n_A = 26) as the allowed set of characters for plain text, while the encrypted/decrypted text can have any 1-byte value (n_U = |{0,1}^8| = 256). The event is that a decrypted text is text-like-text. The probability of the event for a 1-character text is simply 26/256, as there are 26 favourable outcomes out of 256 possibilities. The probability of the event for different text sizes is presented in Table 3.2.a.

    S.No.   Text-Size   P.E
    1       1           0.1016
    2       2           0.0103
    3       4           1.06*10^-4
    4       8           1.13*10^-8
    5       16          1.28*10^-16

    Table 3.2.a: Probability of Event for different English text sizes

    The probability of the event occurring during random decryption is fairly large for smaller text sizes and decreases gradually with size, as illustrated in Table 3.2.a. For text sizes of about 25 characters or more, the probability of the event falls below 2^-80, a negligible value that will not occur in a lifetime; this point gives the unicity distance. The event discussed here is directly associated with the probability of meeting spurious keys during random decryptions.
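    The probabilities in Table 3.2.a and the negligibility threshold can be reproduced in a few lines (a sketch, not the author's code; the function name is illustrative):

```python
def prob_event(n: int) -> float:
    """Probability that a random decryption of n bytes is all letters."""
    return (26 / 256) ** n

assert abs(prob_event(4) - 1.06e-4) < 1e-6    # matches Table 3.2.a, size 4
assert abs(prob_event(8) - 1.13e-8) < 1e-10   # matches Table 3.2.a, size 8

# Smallest size whose event probability is below the negligible 2**-80 bound.
n = 1
while prob_event(n) >= 2 ** -80:
    n += 1
assert n == 25
```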

    The Proof of Concept is based on millions of decryptions performed with different modern cryptographic algorithms. The decryptions followed two approaches: random decryption for intervals of time using random keys (alphabet keys, numeric keys, alphanumeric keys), and brute-forcing using all possible numeric keys. The observations stand as a proof of concept for the proposed statistics, with minor variation within a tolerable range.

    The Proof of Concept (PoC) based on the observation of spurious keys with the random decryption approach on the ARC4 algorithm for different key sizes is illustrated in Table 3.2.b. The observation is based on plain texts over a text space of 62 alphanumeric characters. Decryption was carried out on a number of random alphanumeric texts, using random alphanumeric keys, random alphabet keys and random numeric keys of sizes 8 bytes and 16 bytes. A similar Proof of Concept is presented in Table 3.2.c, based on observations of 8-byte block ciphers with a text size of 8 bytes, performed using random decryption and brute-forcing over all possible numeric keys.

    S.No.   Text-Size   P.E          8-byte key   16-byte key
    1       1           0.2421       0.2423       0.2426
    2       4           3.44*10^-3   3.46*10^-3   3.47*10^-3
    3       8           1.18*10^-5   1.09*10^-5   1.06*10^-5

    Table 3.2.b: Proof of Concept based on analysis of the ARC4 algorithm

    S.No.   Algorithm            P.E (x*10^-5)
    1       Event Distribution   1.18
    2       ARC4                 1.08
    3       DES                  1.12
    4       DES3                 0.99
    5       Blowfish             1.20

    Table 3.2.c: PoC based on analysis of different algorithms for 8-byte text size

  • Figure 3.2.a: Variance of the analysis of different algorithms for 8-byte text size

    3.3. SPURIOUS KEYS ANALYSIS USING NATURAL LANGUAGE MODELS

    3.3.1 INTRODUCTION

    Natural languages need an encoding standard before they can be fed to a modern cryptosystem for encryption. The UTF-8 encoding standard is widely popular, as it accommodates more than one million characters and leaves the existing ASCII characters unchanged. The character code-points of Indian languages have a weight of 3 bytes under UTF-8. This heavier weight gives a larger universe for the encrypted-decrypted texts and thus a lower probability of the event occurring. The probability of the event for different text sizes of the Devanagari script is shown in Table 3.3.a. It is clear that the unicity distance for Devanagari text is approximately 5, far less than that of English text.


    S.No.   Text-Size   P.E
    1       1           7.56*10^-6
    2       2           5.73*10^-11
    3       4           3.28*10^-21
    4       8           1.08*10^-41

    Table 3.3.a: Probability of Event for Devanagari script implementing UTF-8 encoding

    The code-points associated with the characters can be mapped to a 1-byte value for encryption and decryption, as discussed in sub-topic 3.3.2.

    3.3.2 CODE-POINT MAPPING

    Code-point mapping is a method proposed in this thesis for a fair implementation of natural language models. The technique works for languages which have at most 256 character code-points associated with the script. The code-point of a character is a number, written in hexadecimal, that carries two pieces of information: the identity of the language and the specificity of the character. For example, the code-point value of the character 'ka' (क) is 0x0915. Here the '0x09' part of the number remains the same throughout the script, while '15' specifies the character. The character part ('15') of the number can therefore be mapped to a 1-byte value ('\x15'). The mapped value is encrypted, and the extracted part ('0x09') is appended back to the encrypted data to obtain the cipher. The procedure of encryption and decryption is illustrated in figure 3.3.2a.

  • Figure 3.3.2a Implementation of Code-point Mapping for Encryption and Decryption
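    A minimal sketch of the technique (illustrative names, and a plain XOR keystream standing in for the actual cipher; the language tag is assumed common to the whole text, as the technique requires):

```python
def cp_encrypt(text: str, keystream: bytes) -> str:
    """Strip the language tag, XOR-encrypt the 1-byte part, re-attach the tag."""
    tag = ord(text[0]) >> 8                       # e.g. 'क' (0x0915) -> 0x09
    low = bytes(ord(c) & 0xFF for c in text)      # '15' part, mapped to 1 byte
    enc = bytes(b ^ k for b, k in zip(low, keystream))
    return ''.join(chr((tag << 8) | b) for b in enc)  # insert the tag back

def cp_decrypt(cipher: str, keystream: bytes) -> str:
    return cp_encrypt(cipher, keystream)          # XOR makes it symmetric

msg = '\u0915\u0916\u0917'                        # three Devanagari letters
ks = b'\x21\x07\x5e'                              # stand-in keystream bytes
assert cp_decrypt(cp_encrypt(msg, ks), ks) == msg
```

    The cipher text stays inside the script's code-point block, while only one byte per character is actually encrypted.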

    The probability of the event for Devanagari text after the implementation of code-point mapping is fairly larger than for English. Table 3.3.2a provides the proof of concept for the probability of the event for the Devanagari script. The unicity distance at which the probability of the event becomes negligible (less than 2^-80) is 82, which is fairly large compared to English. Table 3.3.2b shows the comparison of the number of spurious keys for Devanagari, Bengali and English (alphanumeric text) with respect to different text sizes. Figure 3.3.2.a is a 3-D bar-graph representation of the comparison, which provides two observations: (a) the probability of having spurious keys at random decryption is fairly high for the Devanagari script, and (b) the probability of the event decreases gradually with the text size.

    S.No.   Text-Size   P.E           ARC4          DES
    1       4           0.0606        0.0603        X
    2       8           3.7*10^-3     3.6*10^-3     3.69*10^-3
    3       16          1.34*10^-5    1.3*10^-5     1.62*10^-5
    4       32          1.81*10^-10   __            __

    Table 3.3.2a: PoC of P.E for Devanagari texts based on analysis using ARC4 and DES algorithms

    S.No.   Text-Size   Devanagari    Bengali       English
    1       4           0.0606        0.0167        3.4*10^-3
    2       8           3.7*10^-3     2.8*10^-4     1.2*10^-5
    3       16          1.3*10^-5     7.7*10^-8     1.4*10^-10
    4       32          1.8*10^-10    6.0*10^-15    3.3*10^-20

    Table 3.3.2.b: Comparative analysis of spurious keys for Devanagari, Bengali and English (alphanumeric)

  • Figure 3.3.2.a Comparative analysis of spurious keys between Devanagari and English


  • CHAPTER 4

    TEXT-SPACE BOUNDARY

    IN

    STREAM CIPHERS

  • 4. TEXT-SPACE BOUNDARY IN STREAM CIPHERS

    4.1 INTRODUCTION

    A stream cipher is the practical application of the one-time pad. It consists of a Pseudo-Random Generator (PRG) which takes the supplied key as a seed and generates a bit sequence equal in length to the text to be encrypted. The encryption is simply the one-time pad (XOR) of the text with the generated bit stream. The one-time pad holds a property with respect to the text space which can be illustrated simply by considering a text space with only two elements, '\x00' and '\x01', i.e. U = {\x00,\x01}^n where n is the size of the text. The encrypted and decrypted texts can be bounded to the same space by applying a MOD-2 operation to the code-points after the XOR. This scenario is shown, with a comparative illustration of the existing model, in figures 4.1.a and 4.1.b: for the proposed model the event is likely to occur at every decryption irrespective of text size. The property holds for any text space whose element set has size 2^x, x = 1 to 8, where MOD 2^x is applied to bound the encrypted-decrypted texts.

    Figure 4.1.a: Simple block diagram of a stream cipher for the {\x00,\x01}^n text-space

    Figure 4.1.b: Stream cipher with a MOD-2 operation block for the {\x00,\x01}^n text-space

    4.2 MATHEMATICAL MODELING:

    Let us define a simple encryption and decryption model for a stream cipher by equations (1) and (2), where m, c, d, k are respectively the message, cipher, decrypted text and key, and E(), D() are the encryption and decryption functions, each taking two arguments:

    c = E(m, k)    (1)

    d = D(c, k)    (2)

    For a stream cipher, E(m, k) and D(c, k) may be defined as

    E(m, k) = m_b ⊕ K_b    (3)

    where K_b = G(k), G() is a Pseudo-Random Generator which takes the key k and generates the keystream K_b, m_b is the byte stream of m, and ⊕ denotes the one-time-pad (XOR) operation, and

    D(c, k) = c_b ⊕ K_b    (4)


    This implies

    D(E(m, k), k) = m    (5)

    A model is proposed in this thesis which is mathematically defined with a boundary limiting the text space. Let a plain text of size n bytes be m, where m ∈ U = {\x00,\x01,…}^n and the size of the set {\x00,\x01,…} is 2^p, p ∈ {1,2,…,8}. Then

    c = E'(m, k) = [ x MOD 2^p : x = byte value of each byte in E(m, k) ]    (6)

    D'(c, k) = [ x MOD 2^p : x = byte value of each byte in D(c, k) ]    (7)

    Case 1: for p = 8,

    E'(m, k) = E(m, k)    (8)

    D'(c, k) = D(c, k)    (9)

    and hence the model replicates the original stream cipher system.

    Case 2: for p = 0,

    E'(m, k) = D'(m, k)    (10)

    and hence this case does not follow the general rules of cryptography.

    The probability of collision increases with smaller values of p, while for larger text sizes (n) the probability of collision is depreciated. Thus, with suitable tradeoffs, this model can be implemented to attain a large number of spurious keys and a large unicity distance.
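    Equations (6) and (7) can be sketched directly. Because XOR acts bitwise and MOD 2^p simply keeps the low p bits, decryption recovers any message already inside the bounded space (illustrative code, with an arbitrary stand-in keystream in place of G(k)):

```python
def bounded_xor(data: bytes, keystream: bytes, p: int) -> bytes:
    """XOR with the keystream, then reduce each byte MOD 2**p (equations 6-7)."""
    mask = (1 << p) - 1                 # x MOD 2**p == x & mask
    return bytes((d ^ k) & mask for d, k in zip(data, keystream))

m = bytes([0, 1, 1, 0, 1])              # m in {\x00,\x01}^5, i.e. p = 1
ks = b'\x9f\x02\x77\x10\xe4'            # stand-in for K_b = G(k)
c = bounded_xor(m, ks, p=1)
assert all(b < 2 for b in c)            # cipher stays inside the bounded space
assert bounded_xor(c, ks, p=1) == m     # D'(E'(m, k), k) = m for bounded m
```

    The collision noted in the text is visible here too: any two keystreams agreeing on their low p bits produce the same cipher.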

    4.3 CODE-POINT MULTIMAPPING:

    The code-point mapping approach, used to implement natural language models, was described in chapter 3. Combining it with a bounded text space in stream ciphers gives code-point multimapping. Code-point multimapping is effective for language models with fewer than 128 characters and so is applicable to English and Indic languages. The language characters map either to the upper half or to the lower half of the 256 1-byte values. The idea of code-point multimapping is therefore to map the code-points twice, to both halves symmetrically, and then apply encryption with the text-space boundary limited to one half by a MOD-128 operation on the stream cipher, as explained in topic 4.2. The universe shrinks to half, leveraging the probability of having more text-like-texts at random decryptions. The effect on the text space of the encryption and decryption models of code-point multimapping is illustrated in figures 4.3.a and 4.3.b respectively.

    Figure 4.3.a Encryption Model with Code-point Multimapping

  • Figure 4.3.b Decryption model with code-point Multimapping

    Table 4.3.a compares the probability of the event for a general ARC4 stream cipher and for ARC4 with the boundary limited by the MOD-128 operation. It is noticed that the number of spurious keys increases as the boundary of the encrypted-decrypted text space shrinks to half.

    S.No.   Text-Size   S.C. without MOD-128   S.C. with MOD-128
    1       8           1.2*10^-5              3.0*10^-3
    2       16          1.4*10^-10             9.2*10^-6
    3       32          1.96*10^-20            8.4*10^-11
    4       64          __                     7.1*10^-21

    Table 4.3.a: Comparative analysis of Probability of Event in a stream cipher with and without the MOD-128 operation

  • 4.3.1 ANALYSIS USING NATURAL LANGUAGE MODEL:

    The implementation of code-point multimapping in stream ciphers results in a fairly larger unicity distance. The probability of the event is observed to remain fair even for larger text sizes such as 256 and 512 characters. Table 4.3.1.a presents a comparative analysis of the probability of the event for code-point mapping and code-point multimapping in a stream cipher. The table is illustrated as a 3-D bar graph in figure 4.3.1.b, where the leverage in the probability of the event in the case of code-point multimapping is easily seen.

    S.No.   Text-Size   Code-point Mapping (ARC4)   Code-point Multimapping (ARC4)
    1       8           3.7*10^-3                   0.94
    2       16          1.3*10^-5                   0.88
    3       32          1.8*10^-10                  0.78
    4       64          3.3*10^-20                  0.60
    5       128         __                          0.37
    6       256         __                          0.13

    Table 4.3.1.a: Comparative Analysis between encryption models implementing code-point mapping and code-point multimapping for Devanagari text

  • Figure 4.3.1.b: Comparative analysis between code-point mapping and code-point multimapping for Devanagari text (stream cipher with and without MOD-128)

    4.4 INDEX MAPPING

    Index mapping is an approach to implementing a stream cipher with a bounded text space. The boundary allows only the first 2^p byte values, p ∈ {1,2,…,8}, to appear in the text space. For example, for p = 2, only the first 2^2 bytes, i.e. '\x00', '\x01', '\x02', '\x03', are allowed elements of the text space. These bytes can be mapped to the indices of an equal-sized set of desirable characters, and encryption implemented on the mapped values. An example for p = 3 is presented here.

    Let the set of desirable characters be A = ['1','2','3','4','5','6','7','8']; the index mapping of these


  • elements to the first 2³ = 8 bytes bounded to the text-space is shown below:

    '1' → '\x00' , '2' → '\x01' , '3' → '\x02' , '4' → '\x03' ,

    '5' → '\x04' , '6' → '\x05' , '7' → '\x06' , '8' → '\x07'
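As an illustrative sketch (not the report's own implementation), the p = 3 index mapping above can be combined with an ARC4 keystream reduced modulo 2³ = 8: addition MOD 8 replaces the usual XOR, so cipher-text bytes never leave the bounded text-space. The key and message below are arbitrary examples.

```python
# Sketch of index mapping (p = 3) over an ARC4/RC4 keystream; the MOD-8
# addition bounds every cipher-text byte to the 8-element text-space.

def rc4_keystream(key: bytes):
    """Standard RC4 key scheduling followed by the PRGA byte stream."""
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    while True:
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        yield S[(S[i] + S[j]) % 256]

A = ['1', '2', '3', '4', '5', '6', '7', '8']   # desirable character set

def encrypt(text: str, key: bytes) -> str:
    ks = rc4_keystream(key)
    # char -> index ('1' -> 0, ..., '8' -> 7), add a keystream byte MOD 8,
    # then reverse-map the bounded byte to a character of A.
    return ''.join(A[(A.index(c) + next(ks)) % 8] for c in text)

def decrypt(text: str, key: bytes) -> str:
    ks = rc4_keystream(key)
    return ''.join(A[(A.index(c) - next(ks)) % 8] for c in text)

cipher = encrypt('12345678', b'example-key')
plain = decrypt(cipher, b'example-key')
```

Both the cipher text and the decrypted text are guaranteed to consist of characters of A, which is exactly the boundary property described in this section.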

    Now, the mapped byte-values are encrypted and decrypted using the stream cipher with the

    boundary on the text-space, and the cipher texts and decrypted texts are simply the characters

    corresponding to the indices retrieved by reverse mapping of the bytes obtained after encryption and

    decryption with the MOD operation. The implementation of the same is illustrated in figure 4.4.a. In

    figure 4.4.b, the implementation of index mapping with a MOD 32 operation and a 32-character set is

    presented.

    Figure 4.4.a Index Mapping with text space of 8 characters

  • Figure 4.4.b: Index Mapping with text space of 32 characters

    The index mapping provides a boundary to the encrypted and decrypted text-space, so that

    every cipher text or decrypted text has elements belonging to the character space. This

    increases the possibility of multiple meaningful messages mapping to one cipher text, as illustrated

    in figure 4.4.c.

    Figure 4.4.c: Spurious Keys Analysis with Index Mapping of 32 Characters

  • 4.5 MODEL OF ENCRYPTION FOR NUMERIC-STRINGS

    A numeric string, or numeric text, is composed of the 10 characters of the set

    string.digits, i.e. ['0','1','2','3','4','5','6','7','8','9']. An encryption scheme with an 8-numeric-character set

    and a boundary on the text space was described in the previous section. There are in total ¹⁰C₈ combinations of

    8-numeric-character sets, so the text space string.digits can be realized as ¹⁰C₈ = 45 unique text-spaces

    of 8 numeric characters each. Implementing 45 encryptions bounded to those 45 text-

    spaces results in an equivalent encryption for numeric strings, with the encrypted and decrypted texts

    bounded to string.digits. Figure 4.5.a shows the implementation of numeric string encryption with

    boundary. The plain text applied to the encryption is a list of phone numbers reported in March 2014.

    Figure 4.5.a: Numeric String Encryption with Text Space Boundary
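The combinatorial claim above (¹⁰C₈ = 45 text-spaces) can be checked directly; the snippet below simply enumerates the 8-character subsets of string.digits and is an illustration, not the report's implementation.

```python
import string
from itertools import combinations

# Every 8-character subset of string.digits is one bounded text-space
# for the numeric-string encryption model.
text_spaces = list(combinations(string.digits, 8))
print(len(text_spaces))   # 45 unique text-spaces
```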

    Now, the encryption model described is very difficult to break by brute force, as each

    decryption leads to text-like-text as output. Partial encryption of texts can also help in

    leveraging confusion, which is illustrated in figure 4.5.b and figure 4.5.c.

  • Figure 4.5.b: Partial Encryption of text using Numeric-String Encryption Model

    (i) (ii) (iii)

    figure 4.5.c Data Frame Encryption using Numeric-String Encryption Model (i) Plain Text,

    (ii) Cipher Text, (iii) Decrypted Text

  • CHAPTER 5

    CONCLUSIONS

  • 5. CONCLUSIONS

    The analysis of spurious keys provides a vision of strengthening cryptography by leading

    cryptosystems beyond the brute-force bound. In this research work, observations were drawn from

    statistics obtained through random and brute-forced decryptions as a proof of concept.

    The analysis was done on modern cryptographic algorithms, including both block ciphers (DES,

    Blowfish) and a stream cipher (ARC4).

    The probability of encountering spurious keys during random decryptions depends on the text-size and

    the text-space (the character set associated with the language used).

    Considering as spurious keys those keys whose decrypted texts consist only of elements within the

    character set of the plain-text space, the probability of a spurious key occurring at random decryption

    for an 8-byte text with a text space of the 26 English alphabets is 10⁻⁸, whereas for an 8-syllable

    Devanagari-script text it is 3×10⁻³ when code-point mapping is implemented for encryption.

    The probability of the event (text-like-text at random decryption) decreases gradually with text size

    and becomes negligible at a certain point, which gives the unicity distance. Considering 2⁻⁸⁰ as a negligible

    probability, the unicity distance for 26-alphabet text is approximately 27 characters, while that for

    Devanagari text rounds off to 81 characters.
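The Devanagari figure of roughly 81 characters can be reproduced from the code-point mapping column of Table 4.3.1.a: fitting a straight line to log₁₀ P against text size and solving for the 2⁻⁸⁰ threshold gives a unicity distance close to 80 characters. The sketch below is an illustration of that estimate, not the report's own computation.

```python
import math

# Probability of events for code-point mapping (ARC4), from Table 4.3.1.a.
sizes = [8, 16, 32, 64]
probs = [3.7e-3, 1.3e-5, 1.8e-10, 3.3e-20]

# Least-squares line: log10(P) = a + b * n.
ys = [math.log10(p) for p in probs]
n_mean = sum(sizes) / len(sizes)
y_mean = sum(ys) / len(ys)
b = sum((n - n_mean) * (y - y_mean) for n, y in zip(sizes, ys)) \
    / sum((n - n_mean) ** 2 for n in sizes)
a = y_mean - b * n_mean

# Unicity distance: smallest n with P(n) below the 2^-80 threshold.
threshold = -80 * math.log10(2)
unicity = (threshold - a) / b
print(round(unicity))   # close to the ~81 characters quoted above
```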

    The implementation of code-point multimapping proves very effective for Devanagari script,

    providing a fair probability of spurious keys even for longer texts of size 256, 512 and more characters.

    The cryptanalysis of the One Time Pad using multiple plain-text/cipher-text pairs indicated that

    it is possible to bound the encrypted and decrypted texts of stream ciphers to a fixed character

    space. Based on this property of the OTP, an encryption model was designed for

    numeric strings for which the unicity distance tends to infinity, with the limitation that only the 10 numeric

    characters are valid for encryption.