Data Privacy and Security - Home - Springer978-0-387-21707... · 2017-08-26 · David Salomon...

Data Privacy and Security



Springer New York Berlin Heidelberg Hong Kong London Milan Paris Tokyo


David Salomon

Data Privacy and Security

With 122 Illustrations


Springer


David Salomon
Department of Computer Science
California State University, Northridge
Northridge, CA 91330-8281
USA
dsalomon@csun.edu

Library of Congress Cataloging-in-Publication Data
Salomon, D. (David), 1938-
Data privacy and security / David Salomon.
p. cm.
Includes bibliographical references and index.
ISBN 978-1-4419-1816-1
1. Computer security. 2. Data encryption (Computer science) 3. Data protection. I. Title.
QA76.9.A25 S265 2003
005.8-dc21 2002044524

Printed on acid-free paper.

ISBN 978-1-4419-1816-1 ISBN 978-0-387-21707-9 (eBook) DOI 10.1007/978-0-387-21707-9

© 2003 Springer-Verlag New York, Inc. Softcover reprint of the hardcover 1st edition 2003

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

9 8 7 6 5 4 3 2 1 SPIN 10906224

www.springer-ny.com

Springer-Verlag New York Berlin Heidelberg A member of BertelsmannSpringer Science+Business Media GmbH


To J. Robert Henderson, whose help and support made this as well as other books possible.


Preface

Computers are indispensable in today's world, and many individuals spend substantial amounts of time using them. Most users consider the computer a tool for communications (email, Web browsing, and file transfers) and entertainment (playing games, listening to music, and watching movies). However, when the first modern electronic computers were developed, during and after World War II, their designers had other applications in mind. They were interested in a fast, reliable calculating machine to solve immediate practical problems such as creating and breaking secret codes, constructing accurate firing tables for cannons, and simulating complex physical processes such as weather and nuclear reactions. Thus, cryptography is one of the oldest computer applications.

Cryptography is the stuff of spy novels, action comics, and thriller movies. Some of us may remember how as kids we saved up bubble-gum wrappers to send away for Captain Midnight's secret decoder disk. On television and in the movies we commonly watch a nondescript gentleman in a gray flannel suit carrying a briefcase, presumably full of secrets, handcuffed to his wrist.

And what about you, gentle reader, sitting in your office, trying to email a confidential company memo to a colleague: a common, boring, but responsible task. You have to guarantee that your coworker is the only recipient of the message; you also want to be sure that the recipient has actually received it and is convinced that you, and no one else, sent it. Considering how easy it is to intercept email messages, and taking into account the sophistication of computer hackers and commercial spies, sending this memo is no trivial task. If you are like most people, you use cryptography. Specifically, you use a modern cryptographic technique on your computer to encrypt the message with the recipient's public key, and you send it directly from your computer to its destination. The recipient decrypts the message with the matching private key, which only the recipient holds.
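The public-key exchange just described can be made concrete with a toy example. The Python sketch below is textbook RSA (the method of Section 8.3) with deliberately tiny, hypothetical parameters (primes 61 and 53, public exponent 17) and no padding, so it illustrates only the roles of the two keys and is not a secure implementation. It assumes Python 3.9 or later, where `pow(e, -1, phi)` computes a modular inverse; the helper names are hypothetical.

```python
# Toy textbook RSA: illustrates public-key encryption only, NOT secure.
# The primes and exponent are hypothetical values chosen for readability.

def make_keys(p: int, q: int, e: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return (public_key, private_key), each an (exponent, modulus) pair."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent: inverse of e modulo phi
    return (e, n), (d, n)

def rsa_encrypt(m: int, public_key: tuple[int, int]) -> int:
    e, n = public_key
    return pow(m, e, n)  # c = m^e mod n, computed by anyone with the public key

def rsa_decrypt(c: int, private_key: tuple[int, int]) -> int:
    d, n = private_key
    return pow(c, d, n)  # m = c^d mod n, computed only by the key's owner

public, private = make_keys(61, 53, 17)
message = 65  # must satisfy 0 <= message < n
ciphertext = rsa_encrypt(message, public)
print(public, private[0], ciphertext, rsa_decrypt(ciphertext, private))
# prints: (17, 3233) 2753 2790 65
```

Anyone may know the pair (17, 3233) and encrypt with it; only the holder of 2753 can decrypt. Real keys use primes hundreds of digits long, so that factoring n, and hence recovering the private exponent, is infeasible.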

Cryptography is an old science (some may consider it an art). The timeline following the appendixes lists developments in this field since 1900 B.C. The methods developed before the computer age are today considered classical cryptography. With the advent of the computer in the mid-twentieth century, new cryptographic methods have been developed that are referred to as modern cryptography.


One of the earliest computers, the Colossus, was built in England during World War II for the specific purpose of deciphering German military codes. Early in the war, the German military used the Enigma machine to encrypt messages. The story of the Enigma and how its code was broken is told in Chapter 5. Later in the war, the British discovered that the Germans had started using another cipher, dubbed the Lorenz, that was far more complex than the Enigma. Breaking the Lorenz code required a sophisticated machine, one that could perform statistical analyses, data searching, and string matching, and that could easily be reconfigured to perform different operations as needed. Max Newman, one of the mathematicians working at Bletchley Park, came up with a design for such a machine, but his superiors were convinced that constructing it, especially during the war, was beyond their capabilities.

Fortunately, an engineer by the name of Tommy Flowers had heard about the idea and believed that it was workable. He worked for the British Post Office in North London, where he managed to convert Newman's design into a working machine in 10 months. He delivered the finished machine to Bletchley Park in December 1943. It was called Colossus, and it had two important features: it was completely electronic, using 1500 vacuum tubes and no mechanical relays, and it was programmable. It used paper tapes for input and output. Today, the Colossus is one of several candidates for the title "the first modern electronic computer," but for a long time it was kept secret.

After the war, Colossus was dismantled, and Flowers, obeying government instructions, destroyed its original blueprints. This is why, for many years, others were credited with the invention of the modern computer.

Cryptography is important and popular because it scrambles our data, making them unreadable and thereby providing privacy. There is, however, another approach to privacy. Data can be hidden instead of being encrypted. Data hiding, also called steganography, is different from cryptography but achieves the same goal, namely privacy and security of our data.
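Hiding rather than scrambling can be illustrated in a few lines. The Python sketch below is a minimal least-significant-bit scheme (the general idea behind Section 11.1, not the book's own code): each bit of the message replaces the lowest bit of one byte of a cover, such as the raw pixel values of a grayscale image. The cover bytes and the helper names `embed` and `extract` are hypothetical, and the sketch assumes an uncompressed cover with at least eight bytes per message byte.

```python
# Minimal least-significant-bit (LSB) steganography sketch.
# Assumption: `cover` holds raw 8-bit samples (e.g., grayscale pixels);
# one message bit goes into each sample, most significant bit first.

def embed(cover: bytes, message: bytes) -> bytes:
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for message")
    stego = bytearray(cover)
    for k, bit in enumerate(bits):
        stego[k] = (stego[k] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(stego)

def extract(stego: bytes, length: int) -> bytes:
    out = bytearray()
    for j in range(length):
        value = 0
        for k in range(8):  # reassemble 8 lowest bits into one message byte
            value = (value << 1) | (stego[8 * j + k] & 1)
        out.append(value)
    return bytes(out)

cover = bytes(range(100, 140))  # 40 hypothetical pixel values
stego = embed(cover, b"Hi")
print(extract(stego, 2))  # b'Hi'
```

Each cover byte changes by at most one, which is why the result looks unchanged to a casual observer. The message is hidden but not encrypted, so the two techniques can be combined.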

This book is about keeping data private, which is why it covers classical cryptography, modern cryptography, and steganography. Each of the three topics is illustrated and explained by presenting and describing various methods and techniques.

Modern cryptographic methods are mathematical and are based on concepts such as binary numbers, the modulo function, prime numbers, factoring large numbers, and permutations. Yet I believe that when this material is presented with an adequate introduction to each topic and with enough examples, anyone with even a little exposure to mathematics and computer algorithms can grasp the main ideas. The use of mathematics is kept to a minimum and the stress is on examples, diagrams, and clear descriptions. Instead of trying to be rigorous and prove every claim, the text often says "it can be shown that..." or "it can be proved that...."

An important feature of the book is the exercises, which are generously sprinkled throughout. Most of them encourage the reader to better understand a topic by doing a bit of work. The rest tempt the reader to try to come up with a new idea or a novel principle. It is important to try to work out the exercises, but the answers are provided and can always be consulted as a last resort.

Cryptography is the mathematical consequence of paranoid assumptions.

-Unknown

• The Introduction tells the story of the Zimmermann telegram to illustrate the effect secret codes and code breaking can have on important historical events. The main terms used in this field, such as cryptography, cryptanalysis, and steganography, are defined. The Introduction continues with a discussion of Kerckhoffs' principle, which states that the security of a secret code should depend not on the secrecy of the encryption algorithm but on the secrecy of the cryptographic key. The Introduction concludes with a list of important cryptographic resources.

• Chapter 1 discusses monoalphabetic substitution ciphers, where each symbol is replaced by another symbol and the replacement (substitution) rule does not vary. Section 1.2 illustrates how knowledge of the letter frequencies of a language can be used to break a monoalphabetic cipher. Section 1.4 discusses the Polybius monoalphabetic cipher, Section 1.6 explains the Playfair cipher, and Section 1.7 introduces homophonic substitution ciphers.

• Chapter 2 is devoted to transposition ciphers. Such a cipher does not change the letters of the plaintext; instead, it rearranges them according to a permutation of their positions. The topics covered in this chapter are transposition by turning template (Section 2.3), transposition with a key (Section 2.4), and the two-step ADFGVX cipher (Section 2.6).

• Polyalphabetic substitution ciphers are the topic of Chapter 3. In such a cipher, the substitution rule is varied each time a character is encrypted. The main encryption methods covered in this chapter are the Trithemius cipher (Section 3.4), the Vigenere cipher (Section 3.5) and how it was broken, the index of coincidence (Section 3.17), and Polybius's polyalphabetic cipher (Section 3.16).

• A polyalphabetic substitution cipher can be made absolutely secure through the use of a one-time pad based on random numbers, so Chapter 4 is a survey of random numbers, methods for generating both true and pseudo-random numbers, and statistical tests for randomness.

• The last word in encryption, before the computer age, was mechanical (or electromechanical) rotor encryption machines. Chapter 5 is devoted to these machines, specifically to the most famous of them, the German Enigma. The principles of rotor machines are explained, followed by a discussion of the Enigma, its history, principles of operation, and how its code was broken before and during World War II.

• Chapters 6, 7, and 8 discuss modern cryptography. Both symmetric-key and public-key encryption methods are discussed, with emphasis on block ciphers and stream ciphers.

• Does the future belong to quantum cryptography? This question is the topic of Chapter 9, where the principles of this esoteric field are explained.


• Steganography, the topic of Chapters 10 through 12, represents a different approach to privacy. Instead of being encrypted, the data are hidden. These chapters include an overview of steganographic techniques and descriptions of many methods for embedding data, watermarks, and fingerprints in text, image, video, and audio files.

• The appendixes present auxiliary material such as convolution, hash functions, cyclic redundancy code (CRC), and finite fields.

• Both a cryptography timeline and a glossary of important terms follow the appendixes. The former provides a bird's-eye view of the main stages in the development of cryptography, while the latter is a summary of important terms.

• The index caters to those who have already read the book and want to locate a familiar item, as well as to those new to the book who are looking for a particular topic. I have included any terms that may occur to a reader interested in any of the topics discussed in the book (even topics that are just mentioned in passing). As a result, even a quick glance at the index gives the reader an idea of the topics and subtopics included in the book. A special effort was made to include full names (first and middle names instead of initials) and dates of all persons mentioned in the book.
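The Chapter 1 summary above describes a substitution rule that never varies, and its weakness to letter-frequency analysis. The Python sketch below shows both; the key alphabet (keyboard order) and the sample sentence are hypothetical, chosen only for illustration.

```python
from collections import Counter
import string

# Hypothetical key: a fixed permutation of the 26-letter alphabet.
PLAIN = string.ascii_uppercase
KEY = "QWERTYUIOPASDFGHJKLZXCVBNM"

def substitute(text: str, src: str, dst: str) -> str:
    # Every occurrence of a letter is replaced the same way (monoalphabetic).
    return text.upper().translate(str.maketrans(src, dst))

plaintext = "MEET ME NEAR THE EAST GATE"
ciphertext = substitute(plaintext, PLAIN, KEY)
print(ciphertext)                          # DTTZ DT FTQK ZIT TQLZ UQZT
print(substitute(ciphertext, KEY, PLAIN))  # decryption uses the inverse table

# Because the rule never varies, letter frequencies survive encryption:
# the most common ciphertext letter (T) stands for plaintext E.
counts = Counter(c for c in ciphertext if c.isalpha())
print(counts.most_common(3))
```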
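The Chapter 2 summary describes ciphers that permute letter positions rather than the letters themselves. One common keyed form is columnar transposition (compare Section 2.4 of the contents). In this minimal Python sketch, the keyword, the padding letter `X`, the left-to-right tie-breaking rule, and the function names are all assumptions made for illustration.

```python
# Columnar transposition sketch: write the plaintext in rows under a keyword,
# then read the columns in the alphabetical order of the keyword's letters.
# Assumptions: short rows are padded with 'X'; repeated keyword letters are
# ordered left to right.

def column_order(keyword: str) -> list[int]:
    return sorted(range(len(keyword)), key=lambda i: (keyword[i], i))

def columnar_encrypt(plaintext: str, keyword: str, pad: str = "X") -> str:
    cols = len(keyword)
    text = "".join(c for c in plaintext.upper() if c.isalpha())
    text += pad * (-len(text) % cols)  # pad to a full rectangle
    rows = [text[i:i + cols] for i in range(0, len(text), cols)]
    return "".join("".join(row[c] for row in rows) for c in column_order(keyword))

def columnar_decrypt(ciphertext: str, keyword: str) -> str:
    cols = len(keyword)
    nrows = len(ciphertext) // cols
    grid = [[""] * cols for _ in range(nrows)]
    pos = 0
    for c in column_order(keyword):  # refill the columns in key order
        for r in range(nrows):
            grid[r][c] = ciphertext[pos]
            pos += 1
    return "".join("".join(row) for row in grid)

ct = columnar_encrypt("attack at dawn", "ZEBRA")
print(ct)                             # CAXTTXTANADXAKW
print(columnar_decrypt(ct, "ZEBRA"))  # ATTACKATDAWNXXX
```

The ciphertext contains exactly the letters of the padded plaintext, which is the hallmark of a transposition cipher: letter frequencies are unchanged, so frequency analysis reveals the language but not the order.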
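For the polyalphabetic ciphers of Chapter 3, the Vigenère cipher is the standard example: the shift applied to each letter is taken from a repeating keyword, so the substitution rule changes from letter to letter. The sketch below is a minimal Python version. It assumes plain ASCII input, drops non-letters, and expects a key made only of the letters A through Z; the `vigenere` helper is hypothetical.

```python
# Vigenère cipher sketch: letter i of the message is shifted by the alphabet
# position of key letter (i mod key length). Non-letters are dropped.

def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    letters = [c for c in text.upper() if c.isalpha()]
    shifts = [ord(k) - ord("A") for k in key.upper()]
    sign = -1 if decrypt else 1
    out = []
    for i, c in enumerate(letters):
        shift = sign * shifts[i % len(shifts)]  # the rule changes with position
        out.append(chr((ord(c) - ord("A") + shift) % 26 + ord("A")))
    return "".join(out)

ct = vigenere("ATTACK AT DAWN", "LEMON")
print(ct)                                   # LXFOPVEFRNHR
print(vigenere(ct, "LEMON", decrypt=True))  # ATTACKATDAWN
```

The two T's in ATTACK are encrypted as X and F, which is exactly the variation that defeats simple frequency counting.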
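The one-time pad that opens Chapter 4 is easiest to see in its byte-oriented form: XOR each message byte with one byte of a random pad at least as long as the message. This is the modern binary variant rather than the letter-based polyalphabetic form the summary mentions. The Python sketch below assumes the standard-library `secrets` module as the source of randomness, and its security holds only if the pad is kept secret and never reused; `otp_xor` is a hypothetical helper name.

```python
import secrets

# One-time pad sketch: XOR each message byte with one pad byte.
# The same operation decrypts, because (m ^ k) ^ k == m.

def otp_xor(data: bytes, pad: bytes) -> bytes:
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, pad))

message = b"MEET AT NOON"
pad = secrets.token_bytes(len(message))  # use once, then destroy
ciphertext = otp_xor(message, pad)
print(otp_xor(ciphertext, pad))  # b'MEET AT NOON'
```

Reusing a pad for a second message leaks the XOR of the two plaintexts, which is why the pad must be used only once.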

Currently, the book's Web site is part of the author's Web site, which is located at http://www.ecs.csun.edu/~dxs/. Domain name BooksByDavidSalomon.com has been reserved and will always point to any future location of the Web site. The author's email address is dsalomon@csun.edu, but it is planned that any email sent to (anyname)@BooksByDavidSalomon.com will be forwarded to the author.

Consumer electronics maker JVC and games developer Hudson Soft say they've found a way to fight CD-ROM software piracy.

The companies said Wednesday they've developed a new anti-copying technology, called "Root," that they claim will prevent CD-ROM discs from being duplicated. The technology is just one part of the computer industry's ongoing efforts to control software piracy.

The Root technology, which prevents illegal copying "from the roots up," the company says, uses encryption keys, an established method of protecting data. The technology encrypts a disc's contents so it cannot be read without a key, which is also located on the disc. The key is hidden in such a way that it can be read by any CD-ROM drive but cannot be written by a CD-R/RW drive, so that a copied version of the disc would be unreadable. The key is different for each disc and is hidden in a different place each time.

From CNET News.com, August 29, 2002, 4:01 PM PT

Northridge, California

Audience, level, and treatment: a description of such matters is what prefaces are supposed to be about.

-Paul R. Halmos, I Want To Be A Mathematician (1985)

David Salomon


Contents

Preface vii

Introduction 1

Basic Concepts 4
The Caesar Cipher 7
The Affine Cipher 8
The One-Time Pad 11
Kerckhoffs' Principle 15

Part I. Data Encryption 19

1 Monoalphabetic Substitution Ciphers 21

1.1 Letter Distributions 22
1.2 Breaking a Monoalphabetic Cipher 24
1.3 The Pigpen Cipher 28
1.4 Polybius's Monoalphabetic Cipher 29
1.5 Extended Monoalphabetic Ciphers 30
1.6 The Playfair Cipher 30
1.7 Homophonic Substitution Ciphers 35

2 Transposition Ciphers 39

2.1 Simple Examples 40
2.2 Cyclic Notation and Keys 44
2.3 Transposition by Turning Template 45
2.4 Columnar Transposition Cipher 48
2.5 Double Transposition 49
2.6 A 2-Step ADFGVX Cipher 52
2.7 An Approach to Decryption 53
2.8 Conclusions 56


3 Polyalphabetic Substitution Ciphers 59

3.1 Self-Reciprocal Ciphers 60
3.2 The Porta Polyalphabetic Cipher 60
3.3 The Beaufort Cipher 62
3.4 The Trithemius Cipher 63
3.5 The Vigenere Cipher 64
3.6 Breaking the Vigenere Cipher 66
3.7 Long Keys 72
3.8 A Variation on Vigenere 74
3.9 The Gronsfeld Cipher 75
3.10 Generating Permutations 76
3.11 The Eyraud Cipher 77
3.12 The Hill Cipher 80
3.13 The Jefferson Multiplex Cipher 82
3.14 Strip Ciphers 85
3.15 Polyphonic Ciphers and Ambiguity 85
3.16 Polybius's Polyalphabetic Cipher 87
3.17 The Index of Coincidence 88

4 Random Numbers 91

4.1 Manually Generated Random Numbers 92
4.2 True Random Numbers 93
4.3 Pseudo-Random Number Generators 97
4.4 Statistical Tests for Randomness 100

5 The Enigma 107

5.1 Rotor Machines 107
5.2 The Enigma: History 111
5.3 The Enigma: Operation 113
5.4 Breaking the Enigma Code 117

6 Stream Ciphers 131

6.1 Symmetric Key and Public Key 133
6.2 Stream Ciphers 134
6.3 Linear Shift Registers 136
6.4 Cellular Automata 139
6.5 Nonlinear Shift Registers 139
6.6 Other Stream Ciphers 145
6.7 Dynamic Substitution 145
6.8 The Latin Square Combiner 147
6.9 SEAL Stream Cipher 148
6.10 RC4 Stream Cipher 150


7 Block Ciphers 155

7.1 Block Ciphers 155
7.2 Lucifer 161
7.3 The Data Encryption Standard 162
7.4 Blowfish 175
7.5 IDEA 178
7.6 RC5 181
7.7 Rijndael 183

8 Public-Key Cryptography 195

8.1 Diffie-Hellman-Merkle Keys 195
8.2 Public-Key Cryptography 198
8.3 RSA Cryptography 199
8.4 Rabin Public-Key Method 203
8.5 El Gamal Public-Key Method 204
8.6 Pretty Good Privacy 205
8.7 Sharing Secrets: Threshold Schemes 206
8.8 The Four Components 212
8.9 Authentication 214
8.10 Elliptic Curve Cryptography 218

9 Quantum Cryptography 235

Part II. Data Hiding 243

10 Data Hiding in Text 245

10.1 Basic Features 247
10.2 Applications of Data Hiding 250
10.3 Watermarking 251
10.4 Intuitive Methods 252
10.5 Simple Digital Methods 255
10.6 Data Hiding in Text 255
10.7 Innocuous Text 258
10.8 Mimic Functions 262

11 Data Hiding in Images 269

11.1 LSB Encoding 269
11.2 BPCS Steganography 280
11.3 Lossless Data Hiding 285
11.4 Spread Spectrum Steganography 294
11.5 Data Hiding by Quantization 297
11.6 Patchwork 298
11.7 Signature Casting in Images 299
11.8 Transform Domain Methods 301
11.9 Robust Data Hiding in JPEG Images 303
11.10 Robust Frequency Domain Watermarking 309
11.11 Detecting Malicious Tampering 312


11.12 Wavelet Methods 314
11.13 Kundur-Hatzinakos Watermarking: I 321
11.14 Kundur-Hatzinakos Watermarking: II 323
11.15 Data Hiding in Binary Images 325
11.16 The Zhao-Koch Method 325
11.17 The Wu-Lee Method 328
11.18 The CPT Method 329
11.19 The TP Method 332
11.20 Data Hiding in Fax Images 336

12 Data Hiding: Other Methods 339

12.1 Protecting Music Scores 339
12.2 Data Hiding in MPEG-2 Video 341
12.3 Digital Audio 344
12.4 The Human Auditory System 347
12.5 Audio Watermarking in the Time Domain 351
12.6 Echo Hiding 353
12.7 The Steganographic File System 356
12.8 Ultimate Steganography? 361
12.9 Public-Key Steganography 362
12.10 Current Software 362

Part III. Essential Resources 367

Appendixes

A Convolution 369

A.1 One-Dimensional Convolution 369
A.2 Two-Dimensional Convolution 373

B Hashing 377

B.1 Hash Tables 377
B.2 Hash Functions 378
B.3 Collision Handling 379
B.4 Secure Hash Functions 381

C Cyclic Redundancy Codes 383

D Galois Fields 387

D.1 Field Definitions and Operations 387
D.2 GF(256) and Rijndael 395
D.3 Polynomial Arithmetic 399

Answers to Exercises 401

Cryptography Timeline 419

Glossary 429

Bibliography 441

Index 453

Each memorable verse of a true poet has two or three times the written content.

-Alfred de Musset