CHAPTER 3
POLYALPHABETIC CIPHERS
3.1 INTRODUCTION
In a polyalphabetic cipher, multiple cipher alphabets are used. To
facilitate encryption, all the alphabets are usually written out in a large table,
traditionally called a tableau. Usually the tableau is 26 × 26, so that 26 full
ciphertext alphabets are available. The method of filling the tableau, and of
choosing which alphabet to use next, defines the particular polyalphabetic
cipher. All such ciphers are easier to break than was once believed, since the
substitution alphabets repeat for sufficiently long plaintexts. One of the
most popular is the Vigenere cipher.
A simple substitution cipher involves a single mapping of the
plaintext alphabet onto ciphertext characters (Menezes et al 1997). A more
complex alternative is to use different substitution mappings (called multiple
alphabets) on various portions of the plaintext. This results in so-called
polyalphabetic substitution. In the simplest case, the different alphabets are
used sequentially and then repeated, so the position of each plaintext character
in the source string determines which mapping is applied to it. Under different
alphabets, the same plaintext character is thus encrypted to different
ciphertext characters, precluding simple frequency analysis as used against
mono-alphabetic substitution. The simple Vigenere cipher is a polyalphabetic
substitution cipher; its formal definition is given in Section 3.3.1.
3.2 ADVANTAGES OF POLYALPHABETIC CIPHERS
The advantage of polyalphabetic ciphers is that they make
frequency analysis more difficult. Frequency analysis is the practice of
decrypting a message by counting the frequency of ciphertext letters and
matching those counts to the letter frequencies of normal text. For instance, if P occurred
most often in a ciphertext whose plaintext is in English, one could suspect that P
corresponded to E, because E is the most frequently used letter in English.
With the Vigenere cipher, E can be enciphered as any of several letters,
depending on which key letter it falls under, thus defeating simple frequency analysis
(www.experiencefestival.com/vigenre_cipher).
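The counting step of frequency analysis can be sketched in a few lines of Python. This is an illustration only: the function name `letter_frequencies` and the sample ciphertext (a shift-by-11 encipherment of "MEET ME AT THE GREEN TREE", chosen so that E becomes P) are assumptions made for this example, not part of the cited source.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, count) pairs for A-Z, most common first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    return Counter(letters).most_common()

# Hypothetical sample: a single shift alphabet (shift 11), so E -> P everywhere.
sample = "XPPE XP LE ESP RCPPY ECPP"
most_common_letter, count = letter_frequencies(sample)[0]
print(most_common_letter, count)  # the analyst would guess this letter stands for E
```

Under a single alphabet the most frequent ciphertext letter points straight at E; under a polyalphabetic cipher the counts for E are spread over several ciphertext letters, which is the effect described above.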
3.3 VIGENERE CIPHER
The Vigenere cipher is a method of encryption invented by Giovan
Battista Bellaso and described in his 1553 book, “La cifra del. Sig. Giovan
Battista Bellaso”. It was misattributed to Blaise de Vigenere in the 19th
century, and given his name. The cipher is a keyword-based system that uses
a series of different Caesar ciphers based on the letters of the keyword. It is a
simplified version of the more general polyalphabetic substitution cipher,
invented by Alberti ca 1465. This cipher is well-known because while it is
easy to understand and implement, it often appears to beginners to be
unbreakable. Consequently, many programmers have implemented
obfuscation or encryption schemes in their applications which are essentially
Vigenere ciphers, only to have them broken by the first cryptanalyst who
comes along. Use and cryptanalysis of the Vigenere cipher are therefore
frequently introduced at the beginning of courses on cryptography.
In the Vigenere cipher, the first row of the tableau is filled out with
a copy of the plaintext alphabet, and successive rows are simply shifted one
place to the left. (Such a simple tableau is called tabula recta, and
mathematically corresponds to adding the plaintext and key letters,
modulo 26.) A keyword is then used to choose which ciphertext alphabet to
use. Each letter of the keyword is used in turn, and then they are repeated
again from the beginning. So if the keyword is ’CAT’, the first letter of
plaintext is enciphered under alphabet ’C’, the second under ’A’, the third
under ’T’, the fourth under ’C’ again, and so on. In practice, Vigenere keys
were often phrases several words long. In 1863, Friedrich Kasiski published a
method for calculating the length of the keyword in a
Vigenere-enciphered message. Once the length is known, ciphertext letters that were
enciphered under the same alphabet can be picked out and attacked
separately as a number of semi-independent simple substitutions. The attack is complicated
by the fact that, within one alphabet, the letters are separated and do not form
complete words, but simplified by the fact that a tabula recta was usually
employed. Even today, a Vigenere-type cipher should theoretically be
difficult to break if mixed alphabets are used in the tableau, if the keyword is
random, and if the total length of ciphertext is less than
27.6 times the length of the keyword. These requirements are rarely
met in practice, so the security of Vigenere-enciphered messages is usually
lower than it might have been.
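The idea behind Kasiski's method can be sketched as follows: repeated ciphertext fragments tend to lie a multiple of the keyword length apart, so the greatest common divisor of those distances suggests the period. The sketch below is a simplified illustration, not Kasiski's full procedure; the function names are hypothetical, and a plain GCD is a crude estimator that coincidental repeats can spoil.

```python
from functools import reduce
from math import gcd

def repeat_distances(ciphertext, n=3):
    """Distances between successive occurrences of each repeated n-gram."""
    last_seen = {}
    distances = []
    for i in range(len(ciphertext) - n + 1):
        gram = ciphertext[i:i + n]
        if gram in last_seen:
            distances.append(i - last_seen[gram])
        last_seen[gram] = i
    return distances

def likely_period(ciphertext, n=3):
    """GCD of all repeat distances, or None when nothing repeats."""
    distances = repeat_distances(ciphertext, n)
    return reduce(gcd, distances) if distances else None
```

In practice an analyst would look at the most common factors of the distances rather than a single GCD, and would then confirm the guess by running frequency counts on each alphabet separately.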
Other notable polyalphabetics include:
• The Gronsfeld cipher. This is identical to the Vigenere except
that only 10 alphabets are used, and so the “keyword” is
numerical.
• The Beaufort cipher. This is practically the same as the
Vigenere, except the tabula recta is replaced by a backwards
one, mathematically equivalent to ciphertext = key – plaintext.
This operation is self-inverse, so that exactly the same table is
used in exactly the same way, for both encryption and
decryption.
• The autokey cipher, which mixes plaintext in to the keying to
avoid periodicity in the key.
• The running key cipher, where the key is made very long by
using a passage from a book or similar text.
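The self-inverse property of the Beaufort cipher mentioned in the list above follows from key - (key - plaintext) = plaintext (mod 26). A minimal sketch, assuming upper-case A to Z input only and using a hypothetical function name:

```python
def beaufort(text, key):
    """Beaufort cipher on A-Z: output = (key - input) mod 26."""
    out = []
    for i, ch in enumerate(text):
        k = ord(key[i % len(key)]) - ord("A")
        p = ord(ch) - ord("A")
        out.append(chr((k - p) % 26 + ord("A")))
    return "".join(out)

ciphertext = beaufort("VIGENERE", "CAT")
# Applying the same function with the same key restores the plaintext.
restored = beaufort(ciphertext, "CAT")
```

The same routine serves for both encryption and decryption, which is what distinguishes it from the additive Vigenere rule.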
3.3.1 Definition of Vigenere Cipher
A simple Vigenere cipher of period t, over an s-character alphabet,
involves a t-character key k_1 k_2 k_3 … k_t. The mapping of plaintext
m = m_1 m_2 m_3 … to ciphertext c = c_1 c_2 c_3 … is defined on individual characters
by c_i = (m_i + k_i) mod s, where the subscript i in k_i is taken modulo t (the key is re-used).
The simple Vigenere uses t shift ciphers defined by t shift values k_i, each
specifying one of s (mono-alphabetic) substitutions; k_i is used on the
characters in positions i, i+t, i+2t, …. In general, each of the t substitutions is
different; this is referred to as using t alphabets rather than a single
substitution mapping. The shift cipher is a simple Vigenere with period t = 1.
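The definition translates almost directly into code. The following is a minimal sketch, assuming the 26-letter upper-case alphabet (s = 26) and 0-based indexing, whereas the definition counts from 1; the function name and the `decrypt` flag are choices made for this illustration.

```python
import string

ALPHABET = string.ascii_uppercase  # s = 26 characters

def vigenere(text, key, decrypt=False, alphabet=ALPHABET):
    """Apply c_i = (m_i + k_i) mod s, or its inverse when decrypt=True."""
    s = len(alphabet)
    sign = -1 if decrypt else 1
    out = []
    for i, ch in enumerate(text):
        m = alphabet.index(ch)
        k = alphabet.index(key[i % len(key)])  # key index taken modulo t
        out.append(alphabet[(m + sign * k) % s])
    return "".join(out)

print(vigenere("VIGENERE", "CIPH"))  # XQVLPMGL, the example of Section 3.3.3
```

A key of length one reduces this to a shift cipher, matching the last sentence of the definition.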
3.3.2 Vigenere Table
Blaise de Vigenere was born in 1523 in Saint-Pourcain, France.
While serving as a diplomat in Rome, he came into contact with Giovanni
Battista della Porta in 1549 and learned of various encryption systems.
Vigenere’s book, Traicté des Chiffres (1585), “A Treatise on
Secret Writing”, was published after Vigenere returned to Paris. It contains
the basic 26×26 Vigenere tableau (Table 3.1).
Table 3.1 Vigenere tableau
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
A A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
B B C D E F G H I J K L M N O P Q R S T U V W X Y Z A
C C D E F G H I J K L M N O P Q R S T U V W X Y Z A B
D D E F G H I J K L M N O P Q R S T U V W X Y Z A B C
E E F G H I J K L M N O P Q R S T U V W X Y Z A B C D
F F G H I J K L M N O P Q R S T U V W X Y Z A B C D E
G G H I J K L M N O P Q R S T U V W X Y Z A B C D E F
H H I J K L M N O P Q R S T U V W X Y Z A B C D E F G
I I J K L M N O P Q R S T U V W X Y Z A B C D E F G H
J J K L M N O P Q R S T U V W X Y Z A B C D E F G H I
K K L M N O P Q R S T U V W X Y Z A B C D E F G H I J
L L M N O P Q R S T U V W X Y Z A B C D E F G H I J K
M M N O P Q R S T U V W X Y Z A B C D E F G H I J K L
N N O P Q R S T U V W X Y Z A B C D E F G H I J K L M
O O P Q R S T U V W X Y Z A B C D E F G H I J K L M N
P P Q R S T U V W X Y Z A B C D E F G H I J K L M N O
Q Q R S T U V W X Y Z A B C D E F G H I J K L M N O P
R R S T U V W X Y Z A B C D E F G H I J K L M N O P Q
S S T U V W X Y Z A B C D E F G H I J K L M N O P Q R
T T U V W X Y Z A B C D E F G H I J K L M N O P Q R S
U U V W X Y Z A B C D E F G H I J K L M N O P Q R S T
V V W X Y Z A B C D E F G H I J K L M N O P Q R S T U
W W X Y Z A B C D E F G H I J K L M N O P Q R S T U V
X X Y Z A B C D E F G H I J K L M N O P Q R S T U V W
Y Y Z A B C D E F G H I J K L M N O P Q R S T U V W X
Z Z A B C D E F G H I J K L M N O P Q R S T U V W X Y
The Vigenere encipherment of plaintext x (identified by its column
position) with the key k (identified by its row number) is the table entry in the
kth row and column position x; for example, plaintext x = B is enciphered with
the key k = 2 (row C) to ciphertext y = D. Vigenere polyalphabetic encipherment
extends a sequence of r letters (k_0, k_1, …, k_(r-1)) periodically to generate the
running key k = (k_0, k_1, …, k_(n-1), …) with k_i = k_(i mod r) for 0 ≤ i < ∞. For
example, the key of length 12 (with A = 0, …, Z = 25)
C  R  Y  P  T  O  G  R  A  P  H  Y
2  17 24 15 19 14 6  17 0  15 7  24
enciphers plaintext of length 20 using the repeated key
C  R  Y  P  T  O  G  R  A  P  H  Y  C  R  Y  P  T  O  G  R
2  17 24 15 19 14 6  17 0  15 7  24 2  17 24 15 19 14 6  17
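The periodic extension k_i = k_(i mod r) is easy to check mechanically. A small sketch, assuming the numbering A = 0, …, Z = 25; the helper names `running_key` and `to_numbers` are hypothetical.

```python
from itertools import cycle, islice

def running_key(key, n):
    """Repeat the key letters periodically to length n: k_i = k_(i mod r)."""
    return "".join(islice(cycle(key), n))

def to_numbers(letters):
    """Map A..Z to 0..25."""
    return [ord(ch) - ord("A") for ch in letters]

extended = running_key("CRYPTOGRAPHY", 20)
print(extended)
print(to_numbers(extended))
```

With this numbering C maps to 2 and H maps to 7, which is a useful check against transcription slips in hand-written key tables.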
Vigenere’s original scheme subtracted rather than added the key
from the plaintext:
x → y = (y_0, y_1, …, y_(n-1)), y_i = (x_i - k_i) (modulo m).
It was rediscovered more than two centuries later by Admiral Sir
Francis Beaufort, whose name is associated with the wind velocity scale.
3.3.3 Operation of Vigenere Cipher
The Vigenere was described as “impossible of translation” in the
respected journal, “Scientific American”. “The Alphabet Cipher” provides a
good description of how to use a table for encryption and decryption using
arbitrary keywords, but here is an alternate description:
1. The encipherer chooses a plaintext: VIGENERE
2. The encipherer chooses a keyword and repeats it to become
the length of the plaintext, e.g. the keyword, “CIPH”:
CIPHCIPH
3. To encipher letter L1 of the plaintext, the encipherer creates a
new alphabet wherein A is shifted to letter L1 of the
keyword, B is shifted to the next letter, etc.:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
CDEFGHIJKLMNOPQRSTUVWXYZAB
4. The encipherer finds the letter that corresponds to L1 of the plaintext in the
substitution alphabet. This is now L1 of the ciphertext: V ⇒ X
5. This is repeated for each letter in the plaintext and its
corresponding letter in the key: VIGENERE + CIPHCIPH ⇒
XQVLPMGL
(www.mathdaily.com/lessons/Vigen%E8re_cipher).
A simpler, but equivalent way to encode a message is to write out a
copy of the alphabet, and then write the keyword vertically beneath the letter
A. Then, starting with each letter, complete the alphabet (starting again with
A after reaching Z). For example, if the keyword is “CUP”, one would write:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
CDEFGHIJKLMNOPQRSTUVWXYZAB
UVWXYZABCDEFGHIJKLMNOPQRST
PQRSTUVWXYZABCDEFGHIJKLMNO
To encode a message, one locates the plaintext letter in the top row,
and then reads the ciphertext letter from one of the alphabets below, using
each one in turn. One can also write out the entire set of these shifted
alphabets and pick the right row for each letter of the key; the resulting block of
alphabets is known as a tabula recta. This use of multiple alphabets in rotation
to encrypt a message is why this is called a polyalphabetic cipher; the
systematic and repeated use of multiple alphabets (all ordered as in the natural
alphabet) is why this cipher is not as secure as polyalphabetic ciphers can be.
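The row-lookup procedure described above can be sketched as follows. The function names are hypothetical and the sketch assumes upper-case letters only; it builds one shifted alphabet per keyword letter and reads the ciphertext from each row in turn.

```python
import string

ALPHABET = string.ascii_uppercase

def shifted_alphabet(letter):
    """Row of the tabula recta that starts with the given key letter."""
    k = ALPHABET.index(letter)
    return ALPHABET[k:] + ALPHABET[:k]

def encode_with_rows(plaintext, keyword):
    """Find each plaintext letter in the top row, reading the rows in turn."""
    rows = [shifted_alphabet(k) for k in keyword]
    return "".join(rows[i % len(rows)][ALPHABET.index(p)]
                   for i, p in enumerate(plaintext))

for row in (shifted_alphabet(k) for k in "CUP"):
    print(row)  # the three rows written beneath the alphabet for keyword CUP
```

Because every row is the natural alphabet merely rotated, knowing one letter of a row reveals the whole row, which is the weakness noted at the end of the paragraph.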
3.4 SUBSTITUTION CIPHERS (BACKGROUND)
A substitution cipher is a cipher that replaces each plaintext symbol
with another ciphertext symbol. The receiver deciphers using the inverse
substitution. A substitution alphabet is the extended list of ciphertext symbols.
Examples are Caesar ciphers and the Atbash cipher.
This section considers the following types of classical ciphers:
simple (or mono-alphabetic) substitution, polygram substitution, and
homophonic substitution. The difference between codes and ciphers is also
noted. Polyalphabetic substitution ciphers are considered (Menezes et al
1997).
3.4.1 Mono-Alphabetic Substitution
The simple substitution cipher is one in which each plaintext
character is simply replaced by a corresponding one from a cipher alphabet.
The cipher alphabet may be shifted or reversed (creating the Caesar cipher
and atbash ciphers, respectively) or scrambled, in which case it is called a
“mixed alphabet” or “deranged alphabet”. Traditionally, mixed alphabets are
created by first writing out a keyword, then all the remaining letters.
Traditionally, the ciphertext is written out in blocks of fixed length to help
avoid errors and to disguise word boundaries from the plaintext.
Suppose the ciphertext and plaintext character sets are the same.
Let m = m_1 m_2 m_3 … be a plaintext message consisting of juxtaposed
characters m_i ∈ A, where A is some fixed character alphabet such as
A = {A, B, C, …, Z}. A simple substitution cipher or monoalphabetic substitution
cipher employs a permutation e over A, with encryption mapping
E_e(m) = e(m_1) e(m_2) e(m_3) …. Here juxtaposition indicates concatenation
(rather than multiplication), and e(m_i) is the character to which m_i is mapped
by e.
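As an illustration of the mapping above, the sketch below builds a keyword-based mixed alphabet (keyword first, then the remaining letters) and applies it as the permutation e. The keyword ZEBRAS and the function names are assumptions chosen for this example only.

```python
import string

ALPHABET = string.ascii_uppercase

def keyword_alphabet(keyword):
    """Keyword letters first (duplicates dropped), then the remaining letters."""
    seen = dict.fromkeys(keyword.upper() + ALPHABET)  # keeps first-seen order
    return "".join(seen)

def substitute(message, cipher_alphabet):
    """Apply the permutation e letter by letter: E_e(m) = e(m_1) e(m_2) ..."""
    return message.translate(str.maketrans(ALPHABET, cipher_alphabet))

def unsubstitute(ciphertext, cipher_alphabet):
    """Apply the inverse permutation."""
    return ciphertext.translate(str.maketrans(cipher_alphabet, ALPHABET))

mixed = keyword_alphabet("ZEBRAS")
print(mixed)
print(substitute("FLEE", mixed))
```

The keyword is assumed to contain letters only; characters outside A to Z in the message pass through `translate` unchanged.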
The disadvantage of mono-alphabetic encipherment is that it is easy
to cryptanalyse, since every bigram (for example _A) is mapped to the
same encrypted bigram each time.
The time complexity of mono-alphabetic substitution is given as O(n^2), where n is the
size of the alphabet; the number of possible keys is n!.
3.4.2 PolyGram Substitution (Example Playfair Cipher)
The Playfair cipher or Playfair square is a manual symmetric
encryption technique; this technique encrypts pairs of letters (digraphs),
instead of single letters as in the simple substitution cipher and rather more
complex Vigenere cipher systems. The Playfair is thus significantly harder to
break since the frequency analysis used for simple substitution ciphers does
not work with it. The usual form of the cipher used a 5 by 5 table and a key
word or phrase. Memorization of the key and 4 simple rules was all that was
required to create the 5 by 5 table and use the cipher. First fill in the spaces in
the table with the letters of the key (dropping any duplicate letters), then fill
the remaining spaces with the rest of the letters of the alphabet in order
(usually omitting “Q” to reduce the alphabet to fit, other versions put both “I”
and “J” in the same space). The key can be written in the top rows of the
table, from left to right, or in some other pattern, such as a spiral beginning in
the upper-left-hand corner and ending in the center. Then apply the following
4 rules, in order, to each pair of letters to encrypt a message:
• If the letters of a pair are both the same (or only one letter is
left), add an “X” after the first letter. Encrypt the new pair and
continue.
• If the letters appear on the same row of your table, replace
them with the letters to their immediate right respectively
(wrapping around to the left side of the row if a letter in the
original pair was on the right side of the row).
• If the letters appear on the same column of your table, replace
them with the letters immediately below respectively
(wrapping around to the top side of the column if a letter in
the original pair was on the bottom side of the column).
• If the letters are not on the same row or column, replace them
with the letters on the same row respectively but at the other
pair of corners of the rectangle defined by the original pair.
To decrypt, use the inverse of these 4 rules (dropping any extra
“X”s that don’t make sense in the final message when you finish).
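The four rules above can be sketched in code. This is one possible reading, not a definitive implementation: it assumes the variant that shares one cell between I and J, writes the key row by row from the top left, and does not treat the rare case of a doubled X specially. The function names are hypothetical.

```python
def build_square(key):
    """5x5 Playfair square as a 25-letter string (J folded into I)."""
    letters = (key.upper() + "ABCDEFGHIKLMNOPQRSTUVWXYZ").replace("J", "I")
    return "".join(dict.fromkeys(ch for ch in letters if ch.isalpha()))

def make_pairs(plaintext):
    """Split into digraphs, padding doubled or leftover letters with X."""
    text = [ch for ch in plaintext.upper().replace("J", "I") if ch.isalpha()]
    pairs, i = [], 0
    while i < len(text):
        a = text[i]
        b = text[i + 1] if i + 1 < len(text) else "X"
        if a == b:
            pairs.append((a, "X"))
            i += 1
        else:
            pairs.append((a, b))
            i += 2
    return pairs

def playfair_encrypt(plaintext, key):
    """Encrypt digraph by digraph using the row, column and rectangle rules."""
    square = build_square(key)
    out = []
    for a, b in make_pairs(plaintext):
        ra, ca = divmod(square.index(a), 5)
        rb, cb = divmod(square.index(b), 5)
        if ra == rb:        # same row: take the letters to the right
            ca, cb = (ca + 1) % 5, (cb + 1) % 5
        elif ca == cb:      # same column: take the letters below
            ra, rb = (ra + 1) % 5, (rb + 1) % 5
        else:               # rectangle: keep the rows, swap the columns
            ca, cb = cb, ca
        out.append(square[ra * 5 + ca] + square[rb * 5 + cb])
    return "".join(out)

print(playfair_encrypt("Hide the gold in the tree stump", "playfair example"))
```

Decryption would apply the same rules with the shifts reversed (left instead of right, up instead of down); the rectangle rule is its own inverse.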
PolyGram substitution ciphers involve groups of characters being
substituted by other groups of characters. For example, sequences of two
plaintext characters (bigrams) may be replaced by other bigrams. The same
may be done with sequences of three plaintext characters (trigrams), or more
generally using n-grams. In full bigram substitution over an alphabet of
26 characters, the key may be any permutation of the 26^2 bigrams, arranged in a table with
row and column indices corresponding to the first and second characters in
the bigram, and the table entries being the ciphertext bigrams substituted for
the plaintext pairs. There are then (26^2)! keys.
The advantage of this is, first, that the frequency distribution of digraphs
is much flatter than that of individual letters (though not actually flat in real
languages; for example, ’TH’ is much more common than ’XQ’ in English).
Second, the larger number of symbols requires correspondingly more
ciphertext to productively analyze letter frequencies. Because 26^2 = 676,
substituting pairs with a substitution alphabet would take an alphabet 676
symbols long, which would be rather unwieldy (Stewart Lee 1999).
3.4.3 Homophonic Substitution
The idea of homophonic substitution is for each fixed key k to
associate with each plaintext unit (e.g., character) m a set S (k, m) of potential
corresponding ciphertext units (generally all of common size). To encrypt m
under k, randomly choose one element from this set as the ciphertext. To
allow decryption, for each fixed key this one-to-many encryption function
must be injective on ciphertext space. Homophonic substitution results in
ciphertext data expansion. In homophonic substitution, | S (k, m)| should be
proportional to the frequency of m in the message space. The motivation is to
smooth out obvious irregularities in the frequency distribution of ciphertext
characters, which result from irregularities in the plaintext frequency
distribution when simple substitution is used. While homophonic substitution
complicates cryptanalysis based on simple frequency distribution statistics,
sufficient ciphertext may nonetheless allow frequency analysis, in conjunction
with additional statistical properties of plaintext manifested in the ciphertext.
For example, in long ciphertexts each element of S (k, m) will occur roughly
the same number of times. Bigram distributions may also provide
information (Menezes et al 1997).
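A toy version of homophonic substitution is sketched below. The key table is entirely hypothetical: four letters only, with the number of codes per letter loosely following their relative frequency, and two-digit codes standing in for the ciphertext units. It is meant to show the one-to-many mapping and the data expansion, not a usable cipher.

```python
import random

# Hypothetical toy key: each plaintext letter owns a disjoint set S(k, m)
# of two-digit codes, with more codes for more frequent letters.
HOMOPHONES = {
    "E": ["11", "27", "43", "58"],
    "T": ["14", "36", "72"],
    "A": ["19", "64"],
    "N": ["31"],
}
DECODE = {code: letter for letter, codes in HOMOPHONES.items() for code in codes}

def encrypt(message, rng=random):
    """Replace each letter by a randomly chosen element of its set."""
    return [rng.choice(HOMOPHONES[ch]) for ch in message]

def decrypt(codes):
    """The sets are disjoint, so each code maps back to exactly one letter."""
    return "".join(DECODE[code] for code in codes)

codes = encrypt("NEATEN")
print(codes, decrypt(codes))
```

Two encryptions of the same message will usually differ, yet both decrypt correctly because the code sets never overlap.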
3.5 SUBSTITUTION IN MODERN CRYPTOGRAPHY
The cryptographic concept of substitution carries on even today.
From a sufficiently abstract perspective, modern bit-oriented block ciphers
(e.g., Data Encryption Standard (DES) or Advanced Encryption Standard
(AES)) can be viewed as substitution ciphers on a very large binary alphabet.
In addition, block ciphers often include smaller substitution tables called
S-boxes.
3.6 SECURITY FOR SIMPLE SUBSTITUTION CIPHERS
A disadvantage of the keyword method of constructing a mixed alphabet (Section 3.4.1) is that the last letters
of the alphabet (which are mostly low frequency) tend to stay at the end. A
stronger way of constructing a mixed alphabet is to perform a columnar
transposition on the ordinary alphabet using the keyword, but this is not often
done. Although the number of possible keys is very large (26! ≈ 2^88.4, or
about 88 bits), this cipher is not very strong, being easily broken. Provided the
message is of reasonable length, the cryptanalyst can deduce the probable
meaning of the most common symbols by analyzing the frequency
distribution of the ciphertext. This allows formation of partial words, which
can be tentatively filled in, progressively expanding the (partial) solution.
Many people solve such ciphers for recreation, as with cryptogram puzzles in
the newspaper. According to the unicity distance of English, 27.6 letters of
ciphertext are required to crack a mixed alphabet simple substitution. In
practice, typically about 50 letters are needed, although some messages can be
broken with fewer if particular unusual patterns are found. In other
cases, the plaintext can be contrived to have a nearly flat frequency
distribution, and much longer plaintexts will then be required.
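The key-space figure quoted above can be checked directly. The second calculation is an assumption added for illustration: it divides the key size in bits by a commonly quoted estimate of about 3.2 bits of redundancy per letter of English, which is one way the 27.6-letter unicity distance is usually derived.

```python
import math

keys = math.factorial(26)   # number of possible mixed alphabets
bits = math.log2(keys)      # equivalent key length in bits
print(f"26! = {keys:.3e}, about {bits:.1f} bits")

# Assumed redundancy of English: roughly 3.2 bits per letter.
REDUNDANCY_BITS_PER_LETTER = 3.2
unicity = bits / REDUNDANCY_BITS_PER_LETTER
print(f"unicity distance of roughly {unicity:.1f} letters")
```

The large key space only rules out exhaustive search; it says nothing about resistance to frequency analysis, which is why the cipher is weak despite its 88-bit key.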
3.7 POLYALPHABETIC SUBSTITUTION CIPHER
The polyalphabetic substitution cipher is a simple extension of the
monoalphabetic one. The difference is that the message is broken into blocks
of equal length, say B, and then each position in the block (1… B) is
encrypted (or decrypted) using a different simple substitution cipher key. The
block size B is often referred to as the period of the cipher. An example of a
polyalphabetic substitution cipher is shown in Table 3.1. The block size (i.e.,
B) is chosen to be three, and Table 3.2 gives an example key and shows the
corresponding encryption. (Dimovski and Gligoroski 2003a).
Table 3.2 Example of the polyalphabetic substitution cipher key and
encryption process
KEY:
Plaintext:               ABCDEFGHIJKLMNOPQRSTUVWXYZ_
Ciphertext (Position 1): ND_WIEURYTLAKSJQHFGMZPXOBCV
Ciphertext (Position 2): LP_MKONJIBHUVGYCFTXDRZSEAWQ
Ciphertext (Position 3): GFTYHBVCDRUJNXSEIKM_ZAOLWQP
ENCRYPTION:
Position:   12312312312
Plaintext:  HOW_ARE_YOU
Ciphertext: RYOVLKIQWJR
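The encryption in Table 3.2 can be reproduced with a short sketch. The three key strings are taken from the table; the function names and the 0-based position counter are choices made for this illustration.

```python
PLAIN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ_"
KEYS = [
    "ND_WIEURYTLAKSJQHFGMZPXOBCV",  # position 1
    "LP_MKONJIBHUVGYCFTXDRZSEAWQ",  # position 2
    "GFTYHBVCDRUJNXSEIKM_ZAOLWQP",  # position 3
]

def encrypt(plaintext):
    """Use key (i mod B) for the character at 0-based index i."""
    return "".join(KEYS[i % len(KEYS)][PLAIN.index(ch)]
                   for i, ch in enumerate(plaintext))

def decrypt(ciphertext):
    """Invert each substitution by looking the character up in its key."""
    return "".join(PLAIN[KEYS[i % len(KEYS)].index(ch)]
                   for i, ch in enumerate(ciphertext))

print(encrypt("HOW_ARE_YOU"))  # RYOVLKIQWJR, as in Table 3.2
```

Each key is a permutation of the 27-character alphabet, so the lookup in `decrypt` always finds exactly one match.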
The decryption process is the reversal of the encryption. The
polyalphabetic substitution cipher is somewhat more difficult to cryptanalyse
than the simple substitution cipher because of the independent keys used to
encrypt successive characters in the plaintext, but it is still relatively simple to
cryptanalyse based on the n-gram
statistics of the plaintext language. Unlike the monoalphabetic
substitution cipher, where every bigram (for example _A) is mapped to the
same encrypted bigram each time, in the polyalphabetic
substitution cipher the encrypted value of a bigram depends on
two factors: the individual key values and the position of the characters within
the block.
3.8 VERNAM CIPHER
In modern terminology, a Vernam cipher is a stream cipher in
which the plaintext is XORed with a random or pseudorandom stream of data
of the same length to generate the ciphertext. If the stream of data is truly
random and used only once, this is the one-time pad. Substituting
pseudorandom data generated by a cryptographically secure pseudo-random
number generator is a common and effective construction for a stream cipher.
RC4 is an example of a Vernam cipher that was still widely used as of 2004.
3.8.1 Vernam History
Gilbert Sandford Vernam was an AT&T Bell Labs engineer who, in
1917, invented the stream cipher and later co-invented the one-time pad
cipher. Vernam proposed a teletype cipher in which a previously-prepared
key, kept on paper tape, is combined character by character with the plaintext
message to produce the ciphertext. To decipher the ciphertext, the same key
would be again combined character by character, producing the plaintext.
3.8.2 Principle of the Vernam Cipher (One-Time Pad)
One type of substitution cipher, the One-Time Pad (OTP), is quite
special. In its most common implementation, the one-time pad can be called a
substitution cipher only from an unusual perspective; typically, the plaintext
letter is combined (not substituted) in some manner (e.g., XOR) with the key
material character at that position. The one time pad is, in most cases,
impractical as it requires that the key material be as long as the plaintext,
actually random, used once and only once, and kept entirely secret from all
except the sender and intended receiver. When these conditions are violated,
even marginally, the one-time pad is no longer unbreakable. Each character in
the message is combined with one from the (random, secret, and used only
once) pad in the manner of a Vernam cipher. So the pad must be at least the
length of the message. Theoretically there is no way to decipher the message
without knowing the contents of the pad. For this reason it is very important
that the pad be protected (i.e., secret), random (i.e., unpredictable by anyone),
and used only once, lest the cipher be easily compromised.
3.9 THEORETICALLY SECURE CRYPTOSYSTEM
Of all the methods of encryption ever devised, only one has been
theoretically proved to be completely secure. It is called the Vernam cipher or
one-time pad. The worth of all other ciphers is based on computational
security. If a cipher is computationally secure, the probability of
cracking the encryption key using current computational technology and
algorithms within a reasonable time is supposedly extremely small, yet not
zero. In theory, every cryptographic algorithm except for the Vernam
cipher can be broken given enough ciphertext and time. For example,
Public Key (PK) cryptosystems such as Pretty Good Privacy (PGP) and
Rivest, Shamir and Adleman (RSA) are based on the following:
Calculate an integer N such that it has only two prime
factors, f1 and f2. This triad of integers forms the basis of the encryption and
decryption keys used in PK cryptosystems. The security of these systems
rests on the computational difficulty of calculating f1 and f2 from N
when N is a very large integer. To break this cipher, N must be factored, and at the
time these systems were devised the best publicly available factoring
algorithms would take millions of years to factor a 200-digit number. This
does not logically exclude the possibility of a new factoring algorithm being
discovered, or the existence of a secret factoring algorithm, or the invention of
technology capable of running current factoring algorithms at high speed.
Both the original design and the modern version of one-time pads are based
on the binary alphabet. The message, or plaintext, is converted to a sequence
of 0's and 1's, using some publicly known rule. The key is another sequence
of 0's and 1's of the same length. Each bit of the message is
then combined with the respective bit of the key, according to the rules of
addition in base 2 (that is, addition modulo 2, without carry):
0+0=0
0+1=1
1+0=1
1+1=0
The key is a random sequence of 0's and 1's, and therefore the
resulting cryptogram, the plaintext plus the key is also random and completely
scrambled unless one knows the key. The plaintext can be recovered by
adding (in base 2 again) the cryptogram and the key.
In the example, the sender adds each bit of the plaintext (01011100)
to the corresponding bit of the key (11001010) obtaining the cryptogram
(10010110), which is then transmitted to the receiver (Figure 3.1). The sender
and receiver must have exact copies of the key beforehand. The sender needs
the key to encrypt the plaintext, and the receiver needs the key to recover the
plaintext from the cryptogram. An eavesdropper, who has intercepted the
cryptogram and knows the general method of encryption but not the key, will
not be able to infer anything useful about the original message. It has been
proved that if the key is secret, the same length as the message, truly random,
and never reused, then the one-time pad is unbreakable. (Sergienko 2006).
Figure 3.1 Transmission of data
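The bit-by-bit addition in the example can be sketched as below. The function name `xor_bits` is hypothetical, and the bit strings are the ones quoted in the text.

```python
def xor_bits(a, b):
    """Add two equal-length bit strings position by position, modulo 2."""
    if len(a) != len(b):
        raise ValueError("key must be the same length as the message")
    return "".join(str((int(x) + int(y)) % 2) for x, y in zip(a, b))

plaintext = "01011100"
key = "11001010"
cryptogram = xor_bits(plaintext, key)
print(cryptogram)                 # 10010110
print(xor_bits(cryptogram, key))  # adding the key again restores 01011100
```

Adding the key twice cancels it out, which is why the sender and receiver can use the same operation and the same key.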
3.10 KEY MANAGEMENT PROBLEM IN VERNAM
All one-time pads suffer from a serious practical drawback, known
as the key distribution problem. Potential users have to agree secretly and in
advance on the key - a long, random sequence of 0's and 1's. Once they have
done this they can use the key for enciphering and deciphering, and the
resulting cryptograms can be transmitted publicly, for example, broadcast
by radio, posted on the Internet or printed in a newspaper, without compromising
the security of messages. But the key itself must be established between the
sender and the receiver by means of a very secure channel, for example, a
very secure telephone line, a private meeting or hand-delivery by a trusted
courier. Such a secure channel is usually available only at certain times and
under certain circumstances. So users far apart, in order to guarantee perfect
security of subsequent crypto-communication, have to carry around with them
an enormous amount of secret and meaningless information (cryptographic
keys), equal in volume to all the messages they might later wish to send. This
is, to say the least, not very convenient. Furthermore, even if a ‘secure’
channel is available, this security can never be truly guaranteed.
A fundamental problem remains because, in principle, any classical
private channel can be monitored passively, without the sender or receiver
knowing that the eavesdropping has taken place. This is because classical
physics - the theory of ordinary-scale bodies and phenomena such as paper
documents, magnetic tapes and radio signals - allows all physical properties
of an object to be measured without disturbing those properties. Since all
information, including cryptographic keys is encoded in measurable physical
properties of some object or signal, classical theory leaves open the possibility
of passive eavesdropping, because in principle it allows the eavesdropper to
measure physical properties without disturbing them.
The fastest method of
encrypting a message with a one-time pad is with a computer. A computer
simplifies the process because the message and pad are encoded in binary.
Each character is represented internally by a computer as a unique
combination of zeros and ones called bits; for example, the letter 'b' is
composed of the eight bits '01100010'. This binary number is 98 in decimal. To
encrypt the message, each bit of each letter in the plaintext is combined with
the corresponding bit of the corresponding pad letter using a transformation
called the bitwise exclusive or (abbreviated to XOR). This operation is
performed on each letter in sequence, i.e., the first letter of the plaintext is
XORed with the first letter of the pad to produce the first letter of the
ciphertext, then the second letter of the plaintext is XORed with the second
letter of the pad to produce the second letter of the ciphertext, and so on.
A basic example:
Suppose you wish to encrypt the message - begin at 17.30
Using the pad - #/KBZaF>TQV^Nc
First, all the bits in 'b' are XORed with the corresponding bits in '#'. This
produces the binary pattern for the character 'A'.
Table 3.3 shows the bitwise XOR operation.
Table 3.3 Bitwise XOR operation
Bit sequence for [b]   Bit sequence for [#]   Bitwise XOR [A]
1                      0                      1
1                      1                      0
0                      0                      0
0                      0                      0
0                      0                      0
1                      1                      0
0                      1                      1
The same process is repeated for the next letters:
'e' and '/' are XORed to produce 'J'
'g' and 'K' are XORed to produce ',' etc.
To do this manually necessitates that you have a list of all the
character binary codes, which is why a computer is helpful. The completed
ciphertext looks like [AJ,+4A'Jt`ap}S]. By XORing the ciphertext with their
duplicate pad, the receiver regenerates the plaintext.
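The character-level XOR of this example can be sketched as follows, assuming the characters are encoded as ASCII. The function name `xor_text` is hypothetical; the message and pad are those given above.

```python
def xor_text(text, pad):
    """XOR the character codes of text and pad, position by position."""
    if len(pad) < len(text):
        raise ValueError("pad must be at least as long as the message")
    return "".join(chr(ord(t) ^ ord(p)) for t, p in zip(text, pad))

message = "begin at 17.30"
pad = "#/KBZaF>TQV^Nc"
ciphertext = xor_text(message, pad)
print(ciphertext)
print(xor_text(ciphertext, pad))  # the receiver recovers the message
```

With an arbitrary pad the XOR result is not guaranteed to be a printable character; this particular pad happens to give printable output, which makes it convenient for a worked example.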
You can experimentally verify this procedure as follows:
1. Produce a table of the letters of the alphabet and numbers 0 to
9. Assign to each letter and digit a unique bit sequence. There
is no need to use eight bits, six are sufficient for this test. A
sample Table 3.4 is provided.
Table 3.4 Bit sequence for plaintext
Letter  Bit sequence   Letter  Bit sequence   Letter  Bit sequence   Letter  Bit sequence
a       111111         j       110110         s       101101         2       100100
b       111110         k       110101         t       101100         3       100011
c       111101         l       110100         u       101011         4       100010
d       111100         m       110011         v       101010         5       100001
e       111011         n       110010         w       101001         6       100000
f       111010         o       110001         x       101000         7       011111
g       111001         p       110000         y       100111         8       011110
h       111000         q       101111         z       100110         9       011101
i       110111         r       101110         1       100101         0       011100
2. Using the throws of two dice to index the rows and columns of
the Table 3.5, generate a pad of sufficient length for the
message.
Table 3.5 Random key generation
1 2 3 4 5 6
1 a g m s y 5
2 b h n t z 6
3 c i o u 1 7
4 d j p v 2 8
5 e k q w 3 9
6 f l r x 4 0
3. XOR each bit from each letter of the text with the
corresponding bit of the equivalent pad letter to create the
ciphertext.
4. XOR the ciphertext with the pad. The plaintext will be
regenerated.
5. One final test is to XOR the ciphertext with the plaintext. This
will reconstruct the pad.
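The five steps above can be tried out with the sketch below. Several things here are assumptions made for illustration: the six-bit codes follow the descending pattern visible in Table 3.4 (a = 111111), the `secrets` module stands in for physical dice, the sample message is invented, and the ciphertext is kept as a list of integers because an XOR result need not correspond to any symbol in the table.

```python
import secrets

SYMBOLS = "abcdefghijklmnopqrstuvwxyz1234567890"

def to_bits(symbol):
    """Six-bit value following the pattern of Table 3.4 (a = 111111 = 63)."""
    return 63 - SYMBOLS.index(symbol)

def roll_pad(length):
    """Two dice per pad symbol: one picks the row of Table 3.5, one the column."""
    pad = []
    for _ in range(length):
        row, col = secrets.randbelow(6), secrets.randbelow(6)
        pad.append(SYMBOLS[col * 6 + row])
    return "".join(pad)

def xor_symbols(a, b):
    """XOR the six-bit values of two equal-length symbol strings."""
    return [to_bits(x) ^ to_bits(y) for x, y in zip(a, b)]

message = "attack1700"                                         # hypothetical message
pad = roll_pad(len(message))                                   # step 2
cipher = xor_symbols(message, pad)                             # step 3
recovered = [c ^ to_bits(p) for c, p in zip(cipher, pad)]      # step 4
pad_again = [c ^ to_bits(m) for c, m in zip(cipher, message)]  # step 5
```

Step 5 is worth noting from a security point of view: anyone who learns both a plaintext and its ciphertext learns that stretch of the pad, which is one reason a pad must never be reused.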
3.11 SECURITY OF THE VERNAM CIPHER
The one-time pad is unbreakable if used properly. The pad must be
composed of truly random data, it must never be used more than once and it
must be kept secure.
If each key letter in the pad sequence is truly random, a
cryptanalyst can do no better than try every possible key letter for every
ciphertext message position. This is a hopeless situation for the attacker
because it is equivalent to trying all the possible messages the key could ever
encrypt. Even for a short pad such as the given example, the number of
possible messages is in the region of 200,000,000,000,000,000,000,000. The
ciphertext can provide no clues as to which one of these possibilities is the
real message.
3.12 TRANSPOSITION CIPHERS
Classical ciphers were first used hundreds of years ago. So far as
security is concerned, they are no match for today’s ciphers; however, this
does not mean that they are any less important to the field of cryptology.
Their importance stems from the fact that most of the ciphers in common use
today utilize the operations of the classical ciphers as their building blocks.
For example, the Data Encryption Standard (DES) (U.S. Department of
Commerce 1988), an encryption algorithm used widely in the finance
community throughout the world, uses only three very simple operators,
namely substitution, permutation (transposition) and bit-wise exclusive-or
(admittedly, in a complicated fashion). Given their simplicity and the fact that
they are used to construct other ciphers, the classical ciphers are usually the
first ones considered when researching new attack techniques. Many flavors
of classical ciphers exist, although most fall into one of two broad categories:
substitution ciphers and transposition (permutation) ciphers (Clark and
Dawson 1998). The transposition cipher rearranges the positions of the
plaintext characters in a different and complex order but "leaves the value of a
character or character string unaltered when transforming plaintext into
Ciphertext" (Grundlingh and Van Vuuren 2003).
3.13 TYPES OF TRANSPOSITION SYSTEMS
Transposition systems are fundamentally different from substitution
systems. In substitution systems, plaintext values are replaced with other
values. In transposition systems, plaintext values are rearranged without
otherwise changing them. All the plaintext characters that were present before
encipherment are still present after encipherment. Only the order of the text
changes (Field manual 1990).
• Most transposition systems rearrange text by single letters. It
is possible to rearrange complete words or groups of letters
rather than single letters, but these approaches are not very
secure and have little practical value. Larger groups than
single letters preserve too much recognizable plaintext.
• Some transposition systems go through a single transposition
process. These are called single transposition. Others go
through two distinctly separate transposition processes. These
are called double transposition.
• Most transposition systems use a geometric process. Plaintext
is written into a geometric figure, most commonly a rectangle
or square and extracted from the geometric figure by a
different path than the way it was entered. When the
geometric figure is a rectangle or square and the plaintext is
entered by rows and extracted by columns, it is called
columnar transposition. When some route other than rows and
columns is used, it is called route transposition.
• Another category of transposition is grille transposition. There
are several types of grilles, but each type uses a mask with cut
out holes that is placed over the worksheet. The mask may in
turn be rotated or turned over to provide different patterns
when placed in different orientations. At each position, the
holes line up with different spaces on the worksheet. After
writing plaintext into the holes, the mask is removed and the
ciphertext extracted by rows or columns. In some variations,
the plaintext may be written in rows or columns and the
ciphertext extracted using the grille. These systems may be
difficult to identify when first encountered, but once
the process is recognized, the systems are generally solvable.
• Transposition systems are easy to identify. Their frequency
counts will necessarily look just like plaintext, since the same
letters are still present. There should be no repeats longer than
two or three letters, except for the rare longer accidental
repeat. The monographic phi will be within plaintext limits,
but a digraphic phi should be lower, since repeated digraphs
are broken up by transposition. Identifying which type of
transposition is used is much more difficult initially, and you
may have to try different possibilities until you find the
particular method used or take advantage of special situations
which can occur.
• Columnar transposition systems can be exploited when keys
are reused with messages of the same length. The plaintext to
messages with reused keys can often be recovered without
regard to the actual method of encipherment. Once the
plaintext is recovered, the method can be reconstructed.
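The identification test in the list above (frequency counts and the monographic phi) can be illustrated with a short Python sketch. It assumes the usual definition of the observed phi as the sum of f(f - 1) over the letter frequencies f. The constants 0.0667 (English plaintext) and 1/26 (random text) are commonly quoted values, not figures taken from this chapter. The function names and the sample text are illustrative only.

```python
from collections import Counter

KAPPA_PLAIN = 0.0667   # commonly quoted value for English (assumption)
KAPPA_RANDOM = 1 / 26  # uniformly random letters


def letter_counts(text):
    """Frequency count of the letters in text, ignoring case and non-letters."""
    return Counter(ch for ch in text.upper() if ch.isalpha())


def monographic_phi(text):
    """Observed phi: sum of f * (f - 1) over all letter frequencies f."""
    return sum(f * (f - 1) for f in letter_counts(text).values())


def expected_phi(n, kappa):
    """Expected phi for n letters of text with coincidence rate kappa."""
    return kappa * n * (n - 1)


if __name__ == "__main__":
    plain = "TRANSPOSITIONSYSTEMSREARRANGETHELETTERSOFTHEPLAINTEXT"
    scrambled = plain[::-1]  # reversal, used here as a trivial transposition
    n = len(plain)
    print(letter_counts(plain) == letter_counts(scrambled))  # same letters present
    print(monographic_phi(plain), monographic_phi(scrambled))
    print(expected_phi(n, KAPPA_PLAIN), expected_phi(n, KAPPA_RANDOM))
```

Because a transposition only reorders letters, the frequency count and therefore the observed monographic phi are identical before and after encipherment, which is why the test points to transposition rather than substitution.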
3.14 DOUBLE TRANSPOSITION
A single columnar transposition could be attacked by guessing
possible column lengths, writing the message out in its columns (but in the
wrong order, as the key is not yet known) and then looking for possible
anagrams. Thus to make it stronger, a double transposition was often used.
This is simply a columnar transposition applied twice, with two different keys
of different (preferably relatively prime) length. Double transposition was
generally regarded as the most complicated cipher that an agent could operate
reliably under difficult field conditions.
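A double transposition can be sketched as two columnar transpositions applied in sequence. The Python sketch below treats each columnar transposition as a list of source positions, so that decryption is just the inverse permutation. It assumes no null padding (the last row of the grid may be incomplete), and the second key "2413" is a hypothetical choice made only so that the two key lengths (5 and 4) are relatively prime. All function names are illustrative.

```python
def columnar_indices(n, key):
    """Source positions, in ciphertext order, for a text of n characters.

    The text is imagined written row by row under the key; columns are then
    read off in ascending order of their key characters.
    """
    width = len(key)
    order = sorted(range(width), key=lambda c: key[c])
    return [i for c in order for i in range(c, n, width)]


def apply_indices(text, indices):
    """Build the output whose j-th character is text[indices[j]]."""
    return "".join(text[i] for i in indices)


def invert(indices):
    """Inverse permutation: undoes apply_indices(text, indices)."""
    inverse = [0] * len(indices)
    for out_pos, src in enumerate(indices):
        inverse[src] = out_pos
    return inverse


def double_encrypt(plaintext, key1, key2):
    n = len(plaintext)
    first = apply_indices(plaintext, columnar_indices(n, key1))
    return apply_indices(first, columnar_indices(n, key2))


def double_decrypt(ciphertext, key1, key2):
    n = len(ciphertext)
    first = apply_indices(ciphertext, invert(columnar_indices(n, key2)))
    return apply_indices(first, invert(columnar_indices(n, key1)))


if __name__ == "__main__":
    message = "CRYPTANALYSISOFTRANSPOSITIONSISTOUGH"
    cipher = double_encrypt(message, "31524", "2413")
    print(cipher)
    print(double_decrypt(cipher, "31524", "2413"))
```

Composing the two index lists would give a single permutation of the message positions, which shows that a double transposition is still a transposition, only one whose structure is much harder to recover by anagramming columns.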
3.15 COMBINATIONS
Transposition is often combined with other techniques. For
example, a simple substitution cipher combined with a columnar transposition
avoids the weakness of both. Replacing high frequency ciphertext symbols
with high frequency plaintext letters does not reveal chunks of plaintext
because of the transposition. Anagramming the transposition does not work
because of the substitution. The technique is particularly powerful if
combined with fractionation. A disadvantage is that such ciphers are
considerably more laborious and error prone than simpler ciphers.
3.16 TRANSPOSITION CIPHERS AND BIGRAMS
According to Russell et al (2003a), cryptography has a long and
colorful history. The earliest schemes, now termed the classical ciphers, were
designed to be carried out with pen and paper rather than by electronics.
Many were transposition algorithms which rearrange the order of letters in a
message. Classical cryptography became obsolete after the advent of
computers: more complex ciphers could be used and older ciphers broken
with greater ease. Nonetheless, modern analogues of classical schemes can
still be found as components of larger ciphers. In particular, some iterated
block ciphers such as the Data Encryption Standard (NIST 1999), incorporate
transpositions to provide diffusion. The cryptanalyst's tactic when presented
with a transposition was to exploit particular statistical features of the
ciphertext, as well as to rely upon intuition, luck and trial-and-error, to find
the correct decryption. As this was sometimes too slow a process, mechanized
aids were used as early as World War II (Bauer 1997) by which frequencies
of letter pairs (known as bigrams) were automatically examined in order to
narrow down the space of possible keys. The remaining few keys could then
be checked exhaustively by hand to recover the plaintext.
The possibility of fully automating this procedure is considered. A
straightforward implementation turns out to be incapable of decrypting harder
cryptograms due to random variation in the bigram heuristic. Cryptograms
which are hard for this algorithm are quantified. It will be shown that the
pheromone feedback mechanism of an Ant Colony System is capable of
overcoming some random variation and decrypting a wider variety of
messages. A preliminary version of this result was summarized in (Russell
et al 2003b).
3.17 FORMS OF THE TRANSPOSITION CIPHER
Two forms of the transposition cipher (Helen Fouché Gaines 1939)
are introduced and their cryptanalysis shown to be equivalent. The first is
known as columnar transposition. The columnar transposition cipher
arranges the plaintext in a rectangular grid, from left to right and from
top to bottom. The key determines the number of columns in the grid.
Each character in the key becomes a column header
followed by the plaintext message in successive rows beneath those headers.
Spaces are ignored or replaced with a "null" value. Finally, the encrypted
message is written in groups according to columns. The transposition cipher
basically rearranges the content according to a regular pattern. This could be
made more complex by additional shuffling of the positions of the characters.
The standard columnar transposition consists of writing the key out
as column headers, then writing the message out in successive rows below
these headers (filling in any spare spaces with nulls). Finally, the message
is read off in columns, in ascending order of the headers. As an example,
consider the plaintext "CRYPTANALYSISOFTRANSPOSITIONSISTOUGH",
encrypted using the key (31524):
3 1 5 2 4
C R Y P T
A N A L Y
S I S O F
T R A N S
P O S I T
I O N S I
S T O U G
H X X X X
Ciphertext: RNIROOTX PLONISUX CASTPISH TYFSTIGX YASASNOX
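The grid above can be reproduced with a short Python sketch. It assumes the key is given as a string of distinct characters, that "X" is the null, and that the plaintext is the one shown in the grid rows. The function names are illustrative, not part of any library.

```python
def columnar_encrypt(plaintext, key, null="X"):
    """Write the text in rows under the key, then read columns in key order."""
    width = len(key)
    padded = plaintext + null * (-len(plaintext) % width)  # fill the last row
    # Column c holds every width-th character, starting at offset c.
    columns = [padded[c::width] for c in range(width)]
    order = sorted(range(width), key=lambda c: key[c])
    return "".join(columns[c] for c in order)


def columnar_decrypt(ciphertext, key):
    """Write the ciphertext back into the columns, then read the rows."""
    width = len(key)
    if len(ciphertext) % width:
        raise ValueError("ciphertext does not fill a complete grid")
    height = len(ciphertext) // width
    order = sorted(range(width), key=lambda c: key[c])
    columns = [""] * width
    for i, c in enumerate(order):
        columns[c] = ciphertext[i * height:(i + 1) * height]
    return "".join(columns[c][r] for r in range(height) for c in range(width))


if __name__ == "__main__":
    cipher = columnar_encrypt("CRYPTANALYSISOFTRANSPOSITIONSISTOUGH", "31524")
    print(cipher)
    print(columnar_decrypt(cipher, "31524"))  # plaintext with the trailing nulls
```

The decrypted text still carries the trailing nulls. Removing them is left to the reader of the message, since the cipher itself cannot tell a null "X" from a genuine one.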
Decryption is simply a matter of writing the ciphertext back into the grid
using the same ordering of the columns and then reading off the rows. The
second form is termed complete-unit transposition. The plaintext is divided
into a series of blocks (units) of a fixed length w, again padding if
necessary. A permutation of size w is applied
to each block in turn, rearranging the letters. The sequence of permuted
blocks is then used as the ciphertext. Here is an example with the same key
and plaintext as before:
31524 31524 31524 31524 31524 31524 ...
CRYPT ANALY SISOF TRANS POSIT IONSI ...
RPCTY NLAYA IOSFS RNTSA OIPTS OSIIN ...
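The complete-unit form can be sketched in Python in the same way. The sketch assumes a key given as a string of distinct characters and "X" as the padding character; the function names are illustrative. Within each block, the letter under key digit 1 is written first, then the letter under digit 2, and so on, so that "CRYPT" under 31524 becomes "RPCTY".

```python
def complete_unit_encrypt(plaintext, key, pad="X"):
    """Permute each block of len(key) letters according to the key."""
    w = len(key)
    padded = plaintext + pad * (-len(plaintext) % w)
    order = sorted(range(w), key=lambda c: key[c])
    blocks = [padded[i:i + w] for i in range(0, len(padded), w)]
    return "".join("".join(block[c] for c in order) for block in blocks)


def complete_unit_decrypt(ciphertext, key):
    """Undo the per-block permutation (the length must be a multiple of w)."""
    w = len(key)
    if len(ciphertext) % w:
        raise ValueError("ciphertext length is not a multiple of the key size")
    order = sorted(range(w), key=lambda c: key[c])
    out = []
    for i in range(0, len(ciphertext), w):
        block = ciphertext[i:i + w]
        plain = [""] * w
        for j, c in enumerate(order):
            plain[c] = block[j]  # the j-th ciphertext letter came from position c
        out.append("".join(plain))
    return "".join(out)


if __name__ == "__main__":
    cipher = complete_unit_encrypt("CRYPTANALYSISOF", "31524")
    print(cipher)
    print(complete_unit_decrypt(cipher, "31524"))
```

Compared with the columnar form, the same key is applied along rows instead of columns, which is why writing either ciphertext into a grid of width w leads to the same column-rearrangement problem.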
In a sense this latter form of the transposition is also the most
general, as any transposition can be recast as a complete-unit transposition
with key size set to the length of the plaintext. Both of these forms of the
transposition cipher are susceptible to an attack known as multiple
anagramming. The key size w is assumed to be known (there are statistical
tests for this purpose), and the ciphertext is written into a grid with w
columns. For columnar transpositions, the ciphertext is written into the
grid column by column from left to right; dually, for complete-unit
transpositions, the ciphertext is entered row by row from top to bottom. The
columns then have to be rearranged to form readable plaintext in every row.
For example:
Ciphertext grid      Rearranged grid
R P C T Y            C R Y P T
N L A Y A            A N A L Y
I O S F S            S I S O F
R N T S A            T R A N S
O I P T S            P O S I T
O S I I N            I O N S I
T U S G O            S T O U G
X X H X X            H X X X X
Finding the correct rearrangement is clearly equivalent to finding
the key. Certain patterns inherent in natural language can be exploited in
order to do this efficiently.
3.18 HISTORICAL CRYPTANALYSIS OF TRANSPOSITION
CIPHERS
According to the National Institute of Standards and Technology
(NIST) (1999), one property of written natural language is that the
distribution of pairs of letters, known as bigrams, is not uniform. In
English, for example, 'TH' is common and 'QZ' is rare. Using some large
sample of text, stripped of
numbers, punctuation, white space and other non-letters, a standard
probability for each bigram can be obtained. For other texts, the observed
frequencies will tend to be close to these probabilities. Two columns placed
next to each other form several bigrams, one for each row. The bigram
adjacency score, Adj(I, J), is defined as the average probability of the
bigrams created by juxtaposing columns I and J, i.e.
Adj(I, J) = \frac{1}{h} \sum_{r=1}^{h} P_{std}(I_r J_r)    (3.1)
where I_r and J_r denote the r-th letter in column I or J respectively,
P_{std}(xy) is the standard probability of the bigram 'xy', and h is the
number of rows in a
column. The score will be higher for two correctly aligned columns, because
the bigrams will be from the plaintext. If they are incorrectly aligned, the
pairs will be much more random and likely to score lower. From the bigram
adjacency score, a pen-and-paper cryptanalyst infers the top candidates for
each column's neighbor. Together with other statistical clues, it is usually
straightforward to reassemble the columns correctly.
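The adjacency score of equation (3.1) can be sketched in Python. The sketch assumes that the standard probabilities are estimated from whatever sample text is supplied, and that a bigram missing from the sample has probability 0. Both are simplifying assumptions, and the function names are illustrative.

```python
from collections import Counter


def bigram_probabilities(sample):
    """Estimate P_std from a sample text, stripped of all non-letters."""
    letters = [ch for ch in sample.upper() if ch.isalpha()]
    pairs = Counter(a + b for a, b in zip(letters, letters[1:]))
    total = sum(pairs.values())
    return {bigram: count / total for bigram, count in pairs.items()}


def adjacency_score(col_i, col_j, p_std):
    """Adj(I, J): average P_std of the bigrams formed row by row from I then J."""
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of rows")
    h = len(col_i)
    return sum(p_std.get(a + b, 0.0) for a, b in zip(col_i, col_j)) / h


if __name__ == "__main__":
    # A toy table of probabilities, chosen only to make the arithmetic visible.
    p_std = {"TH": 0.5, "HE": 0.25}
    print(adjacency_score("TH", "HE", p_std))  # bigrams TH, HE -> 0.375
    print(adjacency_score("HE", "TH", p_std))  # bigrams HT, EH -> 0.0
```

The score is not symmetric: Adj(I, J) and Adj(J, I) are built from different bigrams, which is what allows the order of two neighboring columns to be inferred.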
A more general method is also known (Bauer 1997) that is less
reliant on ad hoc exploitation of particular features of the cryptogram. This
method has been partially automated, and will now be considered in more
detail. Some preliminaries are needed. The notation I || J means that when
columns I and J are adjacent in that order they form bigrams from the
plaintext. If this is not the case, I ∦ J is written. The multiple
anagramming problem
can be represented as a graph, as has been done in Figure 3.2. The graph is
called the anagramming graph of the problem. Each node denotes a column.
A directed arc from column I to column J indicates that I || J has not been
ruled out. Since the transposition key is a permutation, a candidate key can be
represented as some path that visits every column node exactly once.
Normally, this would specify w! possible
keys, where w is the width of the grid; even for small w this precludes an
exhaustive search. In the historical attack, arcs on the anagramming graph are
pruned to restrict the number of paths. The number of keys is hopefully
reduced to a point where each can be checked by hand to see which produces
a comprehensible plaintext. To prune the arcs, a cutoff value α is chosen; an
arc is included from node I to J if and only if Adj(I, J) > α.
Figure 3.2 The anagramming graph produced by the cryptogram used
as the running example. Each node is one column of the grid: CASTPISH,
RNIROOTX, YASASNOX, PLONISUX and TYFSTIGX.
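The pruning step can be sketched in Python as follows. The sketch builds the anagramming graph by keeping an arc from I to J only when Adj(I, J) exceeds the cutoff α, then lists every column ordering that follows surviving arcs and uses each column once. The toy bigram table, the cutoff 0.1 and the function names are assumptions made for illustration.

```python
def adjacency_score(col_i, col_j, p_std):
    """Average standard probability of the row-wise bigrams of I then J."""
    return sum(p_std.get(a + b, 0.0) for a, b in zip(col_i, col_j)) / len(col_i)


def anagramming_graph(columns, p_std, alpha):
    """arcs[i] lists every column j for which Adj(i, j) > alpha."""
    w = len(columns)
    return {
        i: [j for j in range(w)
            if j != i and adjacency_score(columns[i], columns[j], p_std) > alpha]
        for i in range(w)
    }


def candidate_orders(arcs):
    """Yield every path that follows arcs and visits each column exactly once."""
    w = len(arcs)

    def extend(path):
        if len(path) == w:
            yield tuple(path)
            return
        for nxt in arcs[path[-1]]:
            if nxt not in path:
                yield from extend(path + [nxt])

    for start in range(w):
        yield from extend([start])


if __name__ == "__main__":
    # Toy probabilities and three scrambled columns of the rows "THE", "AND".
    p_std = {"TH": 0.3, "HE": 0.3, "AN": 0.2, "ND": 0.2}
    columns = ["ED", "TA", "HN"]
    arcs = anagramming_graph(columns, p_std, alpha=0.1)
    print(arcs)                          # {0: [], 1: [2], 2: [0]}
    print(list(candidate_orders(arcs)))  # [(1, 2, 0)]
```

With a well-chosen cutoff, very few paths survive and each can be checked by eye. If α is too high, the correct arcs are pruned and no path remains; if it is too low, the search degenerates toward all w! orderings.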
3.19 SIMPLE TRANSPOSITION (ROW TRANSPOSITION)
A simple transposition or permutation cipher works by breaking a
message into fixed size blocks and then permuting the characters within each
block according to a fixed permutation, say P. The key to the transposition
cipher is simply the permutation P. So, the transposition cipher has the
property that the encrypted message contains all the characters that were in
the plaintext message. In other words, the unigram statistics for the message
are unchanged by the encryption process. The size of the permutation is
known as the period. Consider an example of a transposition cipher with
a period of 10 and a key P = {7,10,4,2,8,1,5,9,6,3}. In this case, the
message is broken into blocks of ten characters, and after encryption the
seventh character in the block will be moved to position 1, the tenth
to position 2, the fourth to
position 3, the second to position 4, the eighth to position 5, the first to
position 6, the fifth to position 7, the ninth to position 8, the sixth to
position 9 and the third to position 10.
Table 3.6 shows the key and the encryption process of the
previously described transposition cipher. Note that the padding
character "X" was appended to the end of the message so that the message
length is a multiple of the block size.
Table 3.6 Example of the transposition cipher key and encryption process
KEY:
Plaintext:  1  2  3  4  5  6  7  8  9 10
Ciphertext: 7 10  4  2  8  1  5  9  6  3
ENCRYPTION (positions 1 to 10 within each block):
Block:      1          2          3
Plaintext:  TRANSPOSIT ION_ALGORI THMXXXXXXX
Ciphertext: OTNRSTSIPA GI_OOIARLN XXXHXTXXXM
It is also clear that decryption can be achieved by following the
same process as encryption using the "inverse" of the encryption permutation.
In this case the decryption key, P^{-1}, is equal to {6,4,10,3,7,9,1,5,8,2}.
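The example of Table 3.6 and the inverse key can be checked with a short Python sketch. It assumes 1-based key entries as in the text, "X" as the padding character and "_" standing for the space in the message. The function names are illustrative.

```python
P = [7, 10, 4, 2, 8, 1, 5, 9, 6, 3]  # encryption key from the text


def permute_blocks(text, key, pad="X"):
    """Output position i of each block receives the key[i]-th input character."""
    n = len(key)
    text += pad * (-len(text) % n)  # pad to a multiple of the period
    return "".join(
        text[start + src - 1]
        for start in range(0, len(text), n)
        for src in key
    )


def inverse_permutation(key):
    """Return the key that undoes permute_blocks(text, key)."""
    inverse = [0] * len(key)
    for dest, src in enumerate(key, start=1):
        inverse[src - 1] = dest
    return inverse


if __name__ == "__main__":
    cipher = permute_blocks("TRANSPOSITION_ALGORITHM", P)
    print(cipher)
    print(inverse_permutation(P))
    print(permute_blocks(cipher, inverse_permutation(P)))
```

Encryption and decryption share one routine; only the key differs, which is the sense in which decryption follows "the same process as encryption".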