DEDICATED DIGITAL PROCESSORS
Methods in Hardware/Software System Design

F. Mayer-Lindenberg
Technical University of Hamburg-Harburg, Germany


Copyright © 2004 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England

    Telephone (+44) 1243 779777

Email (for orders and customer service enquiries): [email protected]. Visit our Home Page on www.wileyeurope.com or www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or emailed to [email protected], or faxed to (+44) 1243 770620.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

    Other Wiley Editorial Offices

    John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA

    Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA

    Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany

    John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia

    John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809

    John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

    British Library Cataloguing in Publication Data

    A catalogue record for this book is available from the British Library

    ISBN 0-470-84444-2

Typeset in 10/12pt Times by TechBooks, New Delhi, India. Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, Wiltshire. This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at least two trees are planted for each one used for paper production.


Contents

Preface

1 Digital Computer Basics
   1.1 Data Encoding
      1.1.1 Encoding Numbers
      1.1.2 Code Conversions and More Codes
   1.2 Algorithms and Algorithmic Notations
      1.2.1 Functional Composition and the Data Flow
      1.2.2 Composition by Cases and the Control Flow
      1.2.3 Alternative Algorithms
   1.3 Boolean Functions
      1.3.1 Sets of Elementary Boolean Operations
      1.3.2 Gate Complexity and Simplification of Boolean Algorithms
      1.3.3 Combined and Universal Functions
   1.4 Timing, Synchronization and Memory
      1.4.1 Processing Time and Throughput of Composite Circuits
      1.4.2 Serial and Parallel Processing
      1.4.3 Synchronization
   1.5 Aspects of System Design
      1.5.1 Architectures for Digital Systems
      1.5.2 Application Modeling
      1.5.3 Design Metrics
   1.6 Summary
   Exercises

2 Hardware Elements
   2.1 Transistors, Gates and Flip-Flops
      2.1.1 Implementing Gates with Switches
      2.1.2 Registers and Synchronization Signals
      2.1.3 Power Consumption and Related Design Rules
      2.1.4 Pulse Generation and Interfacing
   2.2 Chip Technology
      2.2.1 Memory Bus Interface
      2.2.2 Semiconductor Memory Devices
      2.2.3 Processors and Single-Chip Systems
      2.2.4 Configurable Logic, FPGA
   2.3 Chip Level and Circuit Board-Level Design
      2.3.1 Chip Versus Board-Level Design
      2.3.2 IP-Based Design
      2.3.3 Configurable Boards and Interconnections
      2.3.4 Testing
   2.4 Summary
   Exercises

3 Hardware Design Using VHDL
   3.1 Hardware Design Languages
   3.2 Entities and Signals
   3.3 Functional Behavior of Building Blocks
   3.4 Structural Architecture Definitions
   3.5 Timing Behavior and Simulation
   3.6 Test Benches
   3.7 Synthesis Aspects
   3.8 Summary
   Exercises

4 Operations on Numbers
   4.1 Single Bit Binary Adders and Multipliers
   4.2 Fixed Point Add, Subtract, and Compare
   4.3 Add and Subtract for Redundant Codes
   4.4 Binary Multiplication
   4.5 Sequential Adders, Multipliers and Multiply-Add Structures
   4.6 Distributed Arithmetic
   4.7 Division and Square Root
   4.8 Floating Point Operations and Functions
   4.9 Polynomial Arithmetic
   4.10 Summary
   Exercises

5 Sequential Control Circuits
   5.1 Mealy and Moore Automata
   5.2 Scheduling, Operand Selection and the Storage Automaton
   5.3 Designing the Control Automaton
   5.4 Sequencing with Counter and Shift Register Circuits
   5.5 Implementing the Control Flow
   5.6 Synchronization
   5.7 Summary
   Exercises

6 Sequential Processors
   6.1 Designing for ALU Efficiency
      6.1.1 Multifunction ALU Circuits
      6.1.2 Pipelining
   6.2 The Memory Subsystem
      6.2.1 Pipelined Memory Accesses, Registers, and the Von Neumann Architecture
      6.2.2 Instruction Set Architectures and Memory Requirements
      6.2.3 Caches and Virtual Memory, Soft Caching
   6.3 Simple Programmable Processor Designs
      6.3.1 CPU1 – The Basic Control Function
      6.3.2 CPU2 – An Efficient Processor for FPGA-based Systems
   6.4 Interrupt Processing and Context Switching
   6.5 Interfacing Techniques
      6.5.1 Pipelining Input and Output
      6.5.2 Parallel and Serial Interfaces, Counters and Timers
      6.5.3 Input/Output Buses
      6.5.4 Interfaces and Memory Expansion for the CPU2
   6.6 Standard Processor Architectures
      6.6.1 Evaluation of Processor Architectures
      6.6.2 Micro Controllers
      6.6.3 A High-Performance Processor Core for ASIC Designs
      6.6.4 Super-Scalar and VLIW Processors
   6.7 Summary
   Exercises

7 System-Level Design
   7.1 Scalable System Architectures
      7.1.1 Architecture-Based Hardware Selection
      7.1.2 Interfacing Component Processors
      7.1.3 Architectures with Networking Building Blocks
   7.2 Regular Processor Network Structures
   7.3 Integrated Processor Networks
   7.4 Static Application Mapping and Dynamic Resource Allocation
   7.5 Resource Allocation on Crossbar Networks and FPGA Chips
   7.6 Communicating Data and Control Information
   7.7 The π-Nets Language for Heterogeneous Programmable Systems
      7.7.1 Defining the Target System
      7.7.2 Algorithms and Elementary Data Types
      7.7.3 Application Processes and Communications
      7.7.4 Configuration and Reconfiguration
      7.7.5 Hardware Targets
      7.7.6 Software Targets
      7.7.7 Architectural Support for HLL Programming
   7.8 Summary
   Exercises

8 Digital Signal Processors
   8.1 Digital Signal Processing
      8.1.1 Analog-to-Digital Conversion
      8.1.2 Signal Sampling
      8.1.3 DSP System Structure
   8.2 DSP Algorithms
      8.2.1 FIR Filters
      8.2.2 Fast Fourier Transform
      8.2.3 Fast Convolution and Correlation
      8.2.4 Building Blocks for DSP Algorithms
   8.3 Integrated DSP Chips
   8.4 Integer DSP Chips – Integrated Processors for FIR Filtering
      8.4.1 The ADSP21xx Family
      8.4.2 The TMS320C54x Family
      8.4.3 Dual MAC Architectures
   8.5 Floating Point Processors
      8.5.1 The Sharc Family
      8.5.2 The TMS320C67xx Family
   8.6 DSP on FPGA
   8.7 Applications to Underwater Sound
      8.7.1 Echo Sounder Design
      8.7.2 Beam Forming
      8.7.3 Passive Sonar
   8.8 Summary
   Exercises

References

Index

Preface

This book is intended as an introduction to the design of digital processors that are dedicated to performing a particular task. It presents a number of general methods and also covers general-purpose architectures such as programmable processors and configurable logic. In fact, the dedicated digital system might be based on a standard microprocessor with dedicated software, or on an application-specific hardware circuit. It turns out that there is no clear distinction between hardware and software, and a number of techniques like algorithmic constructions using high-level languages, and automated design using compilation apply to both. For some time, dynamic allocation methods for storage and other resources have been common for software while hardware used to be configured statically. Even this distinction vanishes by using static allocation techniques to optimize software functions and by dynamically reconfiguring hardware substructures.

The emphasis in this book is on the common, system-level aspects of hardware and software structures. Among these are the timing of computations and handshaking that need to be considered in software but play a more prominent role in hardware design. The same applies to questions of power consumption. System design is presented as the optimization task to provide certain functions under given constraints at the lowest possible cost (a task considered as one of the basic characteristics of engineering). Detailed sample applications are taken from the domain of digital signal processing. The text also includes some detail on recent FPGA (field programmable gate array), memory, and processor chips, in particular DSP (digital signal processor) chips. The selected chips serve to demonstrate the state of the art and various design aspects; there remain interesting others that could not be covered just for reasons of space. The statements made in the text regarding these chips are all conclusions by the author that may be erroneous due to incomplete or wrong data. Viable corrections mailed to the author will be posted to a page dedicated to this book at the web site www.tu-harburg.de/ti6/ddp along with other supplementary information.

A non-standard topic of special interest covered in this book is the design of simple yet efficient processors that can be implemented on FPGA chips, and, more generally, the balance between serial and parallel processing in application-specific processors. A processor design of this kind is presented in detail (the 'CPU2' in Chapter 6), and also a system-level design tool supporting this processor and others. The VHDL source code for a version of this processor can also be downloaded from [55] along with the software tools for it for free use in FPGA designs and for further experimentation. Licensing and checking for patent protection are only required for commercial usage.

The book is the outcome of lectures on digital systems design, DSP, and processor networks given at the Technical University of Hamburg-Harburg, and is intended as an introductory textbook on digital design for students of electrical engineering and computer science. It presents a particular selection of topics and proposes guidelines to designing digital systems but does not attempt to be comprehensive; to study a broad subject such as digital processing, further reading is needed. As an unusual feature for an introductory text, almost every chapter discusses some subject that is non-standard and shows design options that may be unexpected to the reader, with the aim of stimulating further exploration and study. These extras can also serve as hooks to attach additional materials to lectures based on this book.

The book assumes some basic knowledge on how to encode numbers, on Boolean functions, algorithms and data structures, and programming, i.e. the topics usually covered in introductory lectures and textbooks on computer science such as [13, 20]. Some particular DSP algorithms and algorithms for constructing arithmetic operations from Boolean operations are treated. The system designer will, however, need additional knowledge on application-specific, e.g. DSP algorithms [14] and more general algorithms [15]. Also, semiconductor physics and technology are only briefly discussed to give some understanding of the electronic gate circuits and their power consumption, mostly concentrating on CMOS technology [10]. For the main subject of this book, the design of digital systems, further reading is recommended, too. In books such as [2, 49] the reader will find more detail on standard topics such as combinatorial circuit and automata design. They are treated rather briefly in this book and are focused on particular applications only in order to cover more levels of the design hierarchy. The text concentrates on the hierarchical construction of efficient digital systems starting from gate level building blocks and given algorithms and timing requirements. Even for these topics, further reading is encouraged. Through the additional literature the reader will gain an understanding of how to design both hardware and software of digital systems for specific applications. The references concentrate on easily accessible books and only occasionally cite original papers.

Chapter 1 starts with some general principles on how to construct digital systems from building blocks, in particular the notion of algorithms, which applies to both hardware and software. It discusses complexity issues including minimization, and, in particular, the timing and synchronization of computations. The presentation proceeds at a fairly abstract level to aspects of system-level specifications and introduces some important metrics for digital systems to be used in the sequel, e.g. the percentage of time in which a circuit such as an ALU (arithmetic and logic unit) of a processor performs computational steps.

Chapter 2 enters into the technological basics of digital computers, including transistor circuits and the properties of current integrated chips. It provides the most elementary hardware building blocks of digital systems, including auxiliary circuits such as clock generators, and circuits for input and output. Configurable logic and FPGA are introduced. Board and chip level design are considered, as well as the design of application-specific systems from IP (intellectual property) modules.

Chapter 3 then introduces the method of describing and designing hardware using a hardware description language. VHDL is briefly introduced as a standard language. All VHDL examples and exercises can be simulated and synthesized with the free design tools provided by FPGA companies such as Xilinx and Altera.

Chapter 4 proceeds to the realization of arithmetical functions as special Boolean functions on encoded numbers, including the multiply-add needed for DSP. Serial versions of these functions are also presented, and some special topics such as the distributed arithmetic realized with FPGA cells.

Chapter 5 further elaborates on the aspects of sequential control, starting with scheduling and operand storage. It includes a discussion of those automata structures suitable for generating control sequences, and, in particular, a memory-based realization of the controller automaton.

In Chapter 6 the concept of a programmable processor is discussed, including the handling of input and output, interrupt processing and DMA. The presentation of sequential processors does not attempt to trace the historical development but continues a logical path started in Chapter 5 towards what is needed for efficient control. This path does not always duplicate contemporary solutions. Two simple processor designs are presented to demonstrate various techniques to enhance the ALU efficiency mentioned above. Some standard microprocessors are discussed as well, and techniques used to boost performance in modern high-speed processors.

Chapter 7 proceeds to the system level where processors and FPGA chips are just components of a scalable architecture (as defined in Chapter 1), and the systems based on such an architecture are networks of sequential processors or heterogeneous networks including both FPGA-based logic circuits and programmable processors. The components need to be equipped with interfaces supporting their use in networks. The chapter also sketches a system-level design tool taking up several of the ideas and concepts presented before. It demonstrates a convenient setting for a compiler support surpassing the individual target processor or programmable logic circuit. The chapter also explains some automatic allocation techniques used by compilers and FPGA design tools.

Chapter 8 discusses the application domain of digital signal processing starting from the basics of signal sampling and proceeding to application-specific processors. Some recent commercial signal processors are discussed in detail, and the use of FPGA chips for DSP is considered. The final section discusses some specific examples of embedded digital systems performing high-speed real-time DSP of sonar (underwater sound) signals.

Throughout this book, the notion of a 'system' encompassing components and subsystems plays a crucial role. Processors will be viewed as complex system components, and processor-based systems as sub-systems of a digital system. In general, a digital system will contain several processor-based sub-systems depending on the performance and cost requirements. Dedicated digital systems are usually embedded sub-systems of some hybrid supersystem, and the operations performed by the sub-system need to be consistent with the operation of the entire system. It may not be enough to specify the interfaces with the supersystem, but necessary to analyze the dependency on other sub-systems of the total system that may be variable to some degree or be invariable givens. The reader is encouraged to proceed with this analysis to further levels, in particular to the dependencies within the social environment of engineering work, even if their analysis becomes more and more complex. It is a shame to see the beautiful technology of digital systems being applied to violate and destroy goods and lives. The judgement will, however, be different if the same techniques are used to create countermeasures. Fortunately, there are many applications in which the benefits of an achievement are not as doubtful, and the engineer may choose to concentrate on these.

1 Digital Computer Basics

    1.1 DATA ENCODING

A digital system is an artificial physical system that receives input at a number of sites and times by applying input 'signals' to it and responds to these with output that can later be measured by some output signals. A signal is a physical entity measurable at some sites and depending on time. The input signals usually encode some other, more abstract entities, e.g. numbers, and so do the outputs. In a simple setting, the numbers encoded in the output may be described as a function of the input numbers, and the artificial system is specifically designed to realize this function. More generally, the output may also depend on internal variables of the system, and the sites and times at which it occurs may be data dependent. The main topic of this book is how to systematically construct a system with some wanted processing behavior, e.g. one with a prescribed transfer function.

The application of such a system with a particular transfer function first involves the encoding of the input information into physical values that are applied at the input sites for some time by properly preparing its input signals; then some processing time elapses until the output signals become valid and encode the desired output values; and finally these encoded values are extracted from the measured physical values. For the systems considered, the input and output signals will be electrical voltages measured between pairs of reference sites and restricted to range within some allowed intervals.

In contrast to analogue circuits, an input signal to a digital system at the time at which it is valid is restricted to ranging within a finite set of disjoint intervals. These intervals are used to encode or simply are the elements of a finite set K. Any two voltages in the same interval represent the same element of K (Figure 1.1). Moreover, the circuits are designed so that for whatever particular values in the allowed intervals present at the inputs, the output will also range in allowed intervals and hence encode elements of K. If two sets of input values are 'equivalent', i.e. represent the same elements of K, then so are the corresponding outputs. Thus, the digital system computes a function mapping tuples of elements of K (encoded at the different input sites and times) to tuples of elements of K encoded by the outputs, i.e. a function K^n → K^m. The continuum of possible voltages of a digital signal is only used to represent the finite set K. This is compensated by the fact that the assignment of output values does not suffer from the unavoidable variations of the signals within the intervals due to loading, temperature, or tolerances of the electronic components. The correspondence of signal levels in the allowed intervals to elements of K is referred to as the physical encoding.

Figure 1.1 Range of an n-level digital signal (the allowed voltage intervals correspond to the elements k1, k2, .., kn of K)

The most common choice for K is the two-element set B = {0, 1}. This restricts the valid input and output values to just two corresponding intervals L and H ('low' and 'high'), e.g. the intervals L = [−0.5, 2] V and H = [3, 5.5] V of voltages between two reference sites. Most often, one of the reference sites is chosen to be a 'ground' reference that is common to all input and output signals. If there are n input sites and times to the system as well as the ground, the voltages at these encode n-tuples in the set B^n, and the outputs at m sites and times define an element of B^m. Then the system computes a 'Boolean' function:

f: B^n → B^m

To let the system compute f(b) for some specific input tuple b, one connects the input sites to specific voltages in the L and H intervals w.r.t. the ground reference, e.g. 0 V or 5 V, perhaps by means of switches, and the output tuple is determined from voltage measurements at the output sites.

The fact that the same type of signal occurs both at the input and at the output sites is intentional, as this permits digital circuits to be cascaded more easily by using the output of one machine as the input of another to construct more complex processing functions. This method will be used to construct machines computing arbitrary functions f as above from simple ones. If the output sites and times of the first machine are not identical to the input sites and times of the second, some effort is needed to produce a copy of the output of the first as the input of the second. In order to communicate an output voltage of a circuit site w.r.t. the ground reference to a nearby input site of another circuit at nearly the same time, it suffices to connect the sites by a metal wire that lets them assume the same potential. If the sites are apart and do not share a common ground reference, more effort is involved, and if the copy of the output value is needed later when the signal at the output has been allowed to change, the value must be communicated through some storage device. Copying the same output value to several different input sites of other circuits involves still more effort. This can be done by first applying the 'fan-out' function mapping an input x to the tuple (x, .., x) and then connecting the individual output components each to one of the inputs.

To build digital systems that process more general information than just binary tuples, a second level of 'logic' encoding is used as well as the physical one. The input information, e.g. a number, is first encoded as a binary n-tuple (a bit field), which in turn is represented to the machine as a voltage, as explained above. Similarly, the output m-tuple represented by the output voltages needs to be further decoded into a number. Obviously, only finite sets can be encoded by assigning different n-bit codes to their elements. If N and M are finite sets, binary encodings of N and decodings into M are mappings:

e: N → B^n
d: B^m → M

As in the case of the physical encoding, a decoding need not be injective nor defined on all of B^m, i.e. different binary m-tuples may encode the same element of M, and not all tuples need to be used as codes. By composing it with e and d, the Boolean function f computed by a digital system translates into the abstract function:

f◦: N → M defined by f◦(n) = d(f(e(n))) for n ∈ N

The function f◦ is also said to be computed by the system although e and d need to be applied before and after the operation of the machine. For the data exchange between subsystems of a digital system the codes can be chosen arbitrarily, but for the external input and output of a system intended to compute a given function f◦, e and d are chosen so that their application is straightforward and useful to further represent the data, using e.g. the binary digits of a number as its code both for the input and the output. Otherwise one could simply use the e(n) for some encoding e as the codes of the results f◦(n). This would satisfy the above requirements on codes, but make the operation of the machine insignificant and put all computational effort into the interpretation of the output codes.

Every digital system will necessarily have limited numbers of input and output signal sites. These numbers, however, do not limit the sizes of the input and output codes that can be operated on by the system. By applying sequences of data one by one to the same n input sites or collecting sequences of outputs from the same n output sites at k different, distinguished times (serial input and output), the input and output codes actually range in B^(n*k). Even a single input or output signal can pass tuples of arbitrary size. Moreover, digital systems are often used repetitively and then transform virtually unlimited sequences of input tuples into unlimited sequences of output tuples.

    1.1.1 Encoding Numbers

In this section we very briefly recall the most common choices for encoding numbers, and hint at some less common ones. Once bit fields encode numbers, the arithmetic operations translate into Boolean functions, and digital systems can be applied to perform numeric computations. Of particular interest are encodings of numbers by bit fields of a fixed size. Fixed size fields can be stored efficiently, and the arithmetical operations on them, which are still among the most elementary computational steps, can be given fast implementations. However, it is only finite sets of numbers that can be encoded by fields of a fixed size, and no non-trivial finite set of numbers is closed under the add and multiply operations. The maximum size of the encoded numbers will be exceeded (overflow), and results of the add and multiply operation within the size range may first have to be rounded to the nearest element of the encoded set. These difficulties can be overcome by tracking rounding errors and overflows and switching to encodings for a larger set of numbers by wider bit fields if required.

The most common binary encoding scheme for numbers within a digital system is the base-2 polyadic encoding on the finite set of integers from 0 to 2^n − 1 which assigns to a number m the unique tuple b = (b0, .., bn−1) (in string notation the word 'bn−1 .. b0') of its binary digits defined by the property:

m = b0 + 2b1 + 4b2 + ··· = Σ_{i=0}^{n−1} bi 2^i    (1)

In particular, for n = 1, the numbers 0, 1 are encoded in the obvious way by the elements 0, 1 ∈ B, and B can be considered as a subset of the integers. The terms 'unsigned binary number' or simply 'binary number' are often used to refer to this standard base-2 polyadic encoding. Every positive integer can be represented as a binary number by choosing n high enough. The (n+k)-bit binary code of a number m

(e.g., 1*2^k = 1*2^(k+1) − 1*2^k) and also covers negative numbers. The code of the number 0 is still unique, and the sign of a number is the sign of the highest non-zero digit.
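For illustration, the digits bi of equation (1) can be extracted by repeated division by 2. The following small C program is a minimal sketch (the helper name to_binary is ours, not from the text):

#include <stdio.h>

/* Decompose m into n base-2 digits b[0..n-1], least significant first,
   as in equation (1): m = sum of b[i]*2^i. Assumes 0 <= m < 2^n. */
void to_binary(unsigned m, int n, int b[]) {
    for (int i = 0; i < n; i++) {
        b[i] = m & 1;   /* digit b_i */
        m >>= 1;        /* divide by the base 2 */
    }
}

int main(void) {
    int b[8];
    to_binary(13, 8, b);              /* 13 = 1101 in binary */
    for (int i = 7; i >= 0; i--)      /* print the word b7..b0 */
        printf("%d", b[i]);
    printf("\n");                     /* prints 00001101 */
    return 0;
}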

Some encodings use particular ways to describe a number in terms of others and then concatenate codes for these others, or map the numbers into another mathematical structure for which an encoding is already defined. An integer k is e.g. uniquely characterized by its remainders after the integer divisions by different, mutually prime 'bases' m1, m2, ··· as long as 0 ≤ k < ∏i mi. The choice of m1 = 2^n − 1, m2 = 2^n e.g. gives a unique 2n-bit encoding for integers in the range 0 ≤ k < m1*m2.
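A small C sketch of this residue encoding for n = 8 (the variable names are ours):

#include <stdio.h>

/* Residue encoding of k by its remainders modulo the mutually prime
   bases m1 = 2^n - 1 and m2 = 2^n; unique for 0 <= k < m1*m2. */
int main(void) {
    const unsigned n = 8;
    const unsigned m1 = (1u << n) - 1, m2 = 1u << n;
    unsigned k = 40000;                    /* any k < m1*m2 = 65280 */
    unsigned r1 = k % m1, r2 = k % m2;     /* the 2n-bit code (r1, r2) */
    printf("k=%u -> (%u, %u)\n", k, r1, r2);
    return 0;
}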

The code set B^n is in a one-to-one correspondence to the set Pn of 'binary' polynomials of degree

Figure 1.2 Floating point code fields (bit 63: the sign s; bits 62–52: the exponent field ex; bits 51–0: the mantissa field man)

that force a result within the encoded set. In some applications even overflows are handled by 'approximating' the true results by the most positive or the most negative representable value (saturation). It must be checked whether the result of a computation is within the required error bounds for an application. This is mostly done statically by analyzing the algorithm to be executed and selecting an appropriate number of places. It is also possible to dynamically track the errors of a computation and to adaptively increase the number of places if the error becomes too high. The input data to a digital system may themselves be imprecise, e.g. derived from measurements of some continuous signal. Then the number of places is chosen so that the extra 'quantization' error due to the limited set of representable numbers is sufficiently small in comparison to the measurement error.

An n-bit fixed point encoding is for rational numbers q of the form q = m/2^r with m being an integer in the range −2^(n−1) ≤ m < 2^(n−1) and r being fixed. It is obtained by simply using the n-bit two's complement code of m defined by equation (3) as a code for q and corresponds to a scaling of the two's complement integer range (the redundant signed bit code could also be used). Usually, r = n − 1 so that −1 ≤ q < 1 and

q = −bn−1 + Σ_{i=0}^{n−2} bi 2^(i−n+1)    (6)
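The correspondence between the code m and the value q of equation (6) is easily expressed in C; the following is a minimal sketch for n = 16 (the helper names are ours):

#include <stdio.h>

/* 16-bit fixed point with r = n-1 = 15: q = m / 2^15, so -1 <= q < 1. */
#define FRAC_BITS 15

short to_fixed(double q)   { return (short)(q * (1 << FRAC_BITS)); } /* encode (truncates) */
double from_fixed(short m) { return (double)m / (1 << FRAC_BITS); }  /* decode per eq. (6) */

int main(void) {
    short m = to_fixed(-0.625);
    printf("code %d decodes to %f\n", m, from_fixed(m)); /* -20480 -> -0.625 */
    return 0;
}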

Floating point encoding is for rational numbers h of the form h = m*2^r with 1 ≤ m < 2 and a variable integer r in the range −2^(p−1) ≤ r < 2^(p−1). It is obtained by concatenating a q-bit fixed point code of m − 1 (the q-bit binary code of the integer (m − 1)*2^q) with the p-bit binary code for the positive integer r + off, with off = 2^(p−1) − 1. An extra bit is added for the sign, and numbers m*2^r with r = −2^(p−1) + 1 and 0 ≤ m < 1 (called denormalized) use the q-bit fixed point code for m. The total code size is thus p + q + 1 (Figure 1.2). Thus, if 'man', 'ex' are the non-negative integers encoded by the mantissa and exponent fields then for the normalized case of ex ≠ 0:

h = ±(2^(−q) man + 1)*2^(ex−off)    (7)

The common 32-bit IEEE standard format is q = 23, p = 8 ('single precision') and covers numbers in the range of ±3.37*10^38 that are defined to 6–7 decimal places. For the 64-bit 'double precision' format q = 52, p = 11, the range is ±1.67*10^308 with numbers being defined to 15–16 decimal places. There are standard 40- and 80-bit floating point formats as well.
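As an illustration of the field layout of Figure 1.2 applied to the 32-bit single precision format, the following C sketch (names are ours) extracts s, ex and man from a float and re-evaluates equation (7) for the normalized case:

#include <stdio.h>
#include <string.h>
#include <math.h>

/* Split a 32-bit IEEE single into sign, exponent and mantissa fields
   and re-evaluate equation (7) for the normalized case (ex != 0). */
int main(void) {
    float f = -6.5f;
    unsigned u;
    memcpy(&u, &f, sizeof u);            /* reinterpret the bit pattern */
    unsigned s   = u >> 31;              /* sign bit        */
    unsigned ex  = (u >> 23) & 0xff;     /* 8-bit exponent  */
    unsigned man = u & 0x7fffff;         /* 23-bit mantissa */
    double h = (man / 8388608.0 + 1.0) * pow(2.0, (int)ex - 127); /* 2^23 = 8388608, off = 127 */
    printf("s=%u ex=%u man=%u -> %c%f\n", s, ex, man, s ? '-' : '+', h);
    return 0;
}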

A simple non-standard floating point encoding for numbers m*2^r is obtained by concatenating a fixed point code for m and a two's complement integer code for r, yet dropping the normalization requirement of 1 ≤ m < 2. Then, different codes may represent the same number using different numbers of places for the mantissa. This can be used to track the precision of a result. Floating point arithmetics including unnormalized floating point number representations are discussed in more detail in [54].

Another non-standard m-bit encoding similar to a floating point format with a zero bit mantissa field is the logarithmic encoding for numbers of the form ±q^n for some fixed real number q close to 1 and −2^(m−2) < n < 2^(m−2). It is obtained by concatenating the (m−1)-bit binary code of n + 2^(m−2) with a sign bit. The all zeroes code is used to represent the number 0, and the code (0,0,..,0,1) is not used. The logarithmic encoding covers a large dynamic range of numbers and eases the implementation of multiplication (which is performed by adding exponents) [82].
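A sketch of this multiplication by exponent addition in C (ignoring the sign bit, the zero code and overflow; the names are ours):

#include <stdio.h>

/* Logarithmic magnitude codes store n + bias for a value q^n.
   Multiplying two values then reduces to adding exponents. */
static unsigned log_mul(unsigned a, unsigned b, unsigned bias) {
    return a + b - bias;   /* (n1+bias)+(n2+bias)-bias = (n1+n2)+bias */
}

int main(void) {
    unsigned bias = 1u << 14;                                 /* m = 16: bias 2^(m-2) */
    printf("%u\n", log_mul(bias + 5, bias + 7, bias) - bias); /* exponents add: 12 */
    return 0;
}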

    1.1.2 Code Conversions and More Codes

If c: N -> B^n and c′: N′ -> B^m are encodings on two sets N, N′ of numbers, the numbers in the set Q = N ∩ N′ are encoded both by c and c′. The code conversion function is defined on c(Q) ⊂ B^n and maps a code c(q) to c′(q). Often, code conversions are implemented as processing functions in the digital system and used to switch to the encodings that are most convenient for the desired processing steps (e.g. compact codes that can be communicated in a shorter time, or ones for which the implementation of the arithmetical operations is particularly simple).

The simplest conversions are those that transform an n-bit binary or two's complement code into an m-bit one by appending or stripping zero or sign bits. Other common conversions are between integer and floating point formats, or floating point formats of different lengths. If real numbers are first approximated by numbers in a set N on which an encoding is defined (as in the case of fixed and floating point encodings), the notion of code conversion becomes relaxed. The conversion from an m-bit to an n-bit fixed point code (6) is by appending zero bits if n > m or by performing a rounding operation otherwise, i.e. using the closest approximation by an n-bit fixed point number. A single precision (32-bit) floating point code can be exactly converted into a double precision (64-bit) code, but the conversion from double to single involves first performing a rounding operation to the closest number that can be represented in the shorter format. The conversion is defined on all double precision codes of numbers p satisfying −r ≤ p ≤ r where r is the maximum single precision number. If a number is to be converted that is absolutely greater than the maximum representable one in a fixed or floating point target format, then sometimes saturation to the maximum representable number of the right sign is performed.

Conversions are also needed for input and output. For example, numeric input and output are most convenient in the multiple decimal digits format whereas the arithmetic operations are implemented more efficiently for the two's complement codes. Or, the result of a computation performed with floating point numbers may be desired in a rational representation p/q. This conversion is achieved by means of Euclid's algorithm to expand it into a continued fraction [12].
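A minimal C sketch of this expansion (names are ours): Euclid's algorithm yields the continued fraction terms, and the standard convergent recurrence collapses them into rational approximations p/q:

#include <stdio.h>
#include <math.h>

/* Expand x into a continued fraction with Euclid's algorithm and
   collapse the terms into rational approximations p/q. */
int main(void) {
    double x = 3.14159265358979;
    long p0 = 1, q0 = 0, p1 = 0, q1 = 1;  /* convergent recurrences */
    for (int i = 0; i < 6; i++) {
        long a = (long)floor(x);          /* next continued-fraction term */
        long p = a * p0 + p1, q = a * q0 + q1;
        p1 = p0; q1 = q0; p0 = p; q0 = q;
        printf("a=%ld -> %ld/%ld\n", a, p0, q0); /* 3/1, 22/7, 333/106, 355/113, .. */
        if (x - a < 1e-12) break;         /* expansion terminated */
        x = 1.0 / (x - a);                /* Euclid step */
    }
    return 0;
}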

Another example is the inputting of an n-bit number code in parallel from n digital inputs. 'In parallel' means simultaneously from n nearby input sites. As changes at the input sites or their reading could occur with slight time delays, there is a chance of misreading the input. If the numeric input is known to only change by increments of ±1, it is useful to encode it in such a way that the codes of two numbers i and i + 1 only differ in one bit position, i.e. have a Hamming distance of 1. The Hamming distance of two codes b = (b0,..,bn−1) and c = (c0,..,cn−1) is defined by:

d(b, c) = Σ_{i=0}^{n−1} |bi − ci|

and simply counts the number of bit positions where the codes differ. The n-bit codes can be interpreted as the nodes of the n-dimensional hypercube as a subset of n-dimensional space. Then a code with the desired property defines a Hamiltonian path, i.e. a path along the edges of the cube that visits every node just once.
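In C, the Hamming distance is conveniently computed by counting the ones in the XOR of the two codes; a minimal sketch (the function name is ours):

#include <stdio.h>

/* Hamming distance of two bit-field codes: the number of positions
   where they differ, i.e. the number of ones in their XOR. */
int hamming(unsigned b, unsigned c) {
    unsigned x = b ^ c;
    int d = 0;
    while (x) { d += x & 1; x >>= 1; }
    return d;
}

int main(void) {
    printf("%d\n", hamming(0x9u, 0x8u));  /* the codes differ in 1 bit */
    return 0;
}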

This requirement on the codes is fulfilled by the Gray code. The n-bit Gray code gn(k) for integers k in the range 0 ≤ k < 2^n is constructed recursively from gn−1 codes by appending an nth bit as follows:

gn(k) = app(gn−1(k), 0)             for k < 2^(n−1)
gn(k) = app(gn−1(2^n − 1 − k), 1)   for k ≥ 2^(n−1)
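A C sketch following this recursive definition directly (the function name is ours); the printed 3-bit codes 000, 001, 011, 010, 110, 111, 101, 100 differ in exactly one bit from step to step:

#include <stdio.h>

/* n-bit Gray code per the recursion above: the low n-1 bits are
   g_{n-1}(k) or g_{n-1}(2^n-1-k), the appended nth bit distinguishes
   the two halves of the range. */
unsigned gray(int n, unsigned k) {
    if (n == 0) return 0;
    unsigned half = 1u << (n - 1);
    if (k < half) return gray(n - 1, k);              /* app(g_{n-1}(k), 0)         */
    return half | gray(n - 1, (half << 1) - 1 - k);   /* app(g_{n-1}(2^n-1-k), 1)   */
}

int main(void) {
    for (unsigned k = 0; k < 8; k++)   /* successive codes differ in one bit */
        printf("g3(%u) = %u%u%u\n", k,
               (gray(3, k) >> 2) & 1, (gray(3, k) >> 1) & 1, gray(3, k) & 1);
    return 0;
}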

If an n-bit code needs to be communicated to a different site by means of electrical signals or through storage media, there may be some chance that some of the bits get 'flipped' in this process. It is important to be able to detect such errors. To distinguish faulty and correct code transmissions, the n-bit code is mapped into a longer one, e.g. an (n+1)-bit code constructed by appending a 'parity' bit chosen so that the total number of ones becomes even. A single-bit error in the (n+1)-bit code then results in a parity error, i.e. an odd number of ones, and can easily be detected. For an error-free code the conversion to the original n-bit code is done by stripping the last bit. More generally, the n-bit code is subdivided into k-bit words that are interpreted as binary numbers and summed up modulo 2^k. This 'check sum' is appended to the code to form an (n+k)-bit code before it is communicated. Then many multi-bit errors can be detected (but not all).
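A minimal C sketch of the even-parity scheme (names are ours):

#include <stdio.h>

/* Append a parity bit to an n-bit code so the total number of ones
   is even; a single flipped bit then shows up as odd parity. */
unsigned parity(unsigned code) {
    unsigned p = 0;
    for (; code; code >>= 1) p ^= code & 1;
    return p;                        /* 1 if the number of ones is odd */
}

int main(void) {
    unsigned n = 8, code = 0xB5;                  /* 10110101: five ones */
    unsigned sent = code | (parity(code) << n);   /* 9-bit code, even parity */
    unsigned received = sent ^ (1u << 3);         /* flip one bit in transit */
    printf("sent ok: %d, received ok: %d\n",
           parity(sent) == 0, parity(received) == 0);  /* prints 1, 0 */
    return 0;
}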

Another common method is to append a CRC (cyclic redundancy check) code computed from the original bit sequence. A k-bit CRC code for the bit sequence (b0, .., bn−1) is obtained using two fixed polynomials p, q, with q having the degree k. It is the remainder of the binary polynomial division of the polynomial with the coefficient vector (0, .., 0, bn−1, .., b0) (k zeroes) plus the polynomial pX^n by q. The fixed size CRC does not uniquely encode the bit sequence, which is usually much longer, but it may be used as a fingerprint (a hash code) for it.
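The polynomial division behind a CRC is conveniently done bitwise with XOR arithmetic. The following C sketch is generic and uses an example degree-8 polynomial (0x07); it does not reproduce the particular p, q construction of the text:

#include <stdio.h>

/* Bitwise remainder of a message polynomial divided by a fixed
   degree-8 polynomial q (binary coefficients, XOR arithmetic). */
unsigned char crc8(const unsigned char *msg, int len) {
    unsigned char rem = 0;
    for (int i = 0; i < len; i++) {
        rem ^= msg[i];                       /* bring in the next 8 coefficients */
        for (int bit = 0; bit < 8; bit++)    /* one division step per bit */
            rem = (rem & 0x80) ? (unsigned char)((rem << 1) ^ 0x07)
                               : (unsigned char)(rem << 1);
    }
    return rem;                              /* the k-bit remainder (k = 8) */
}

int main(void) {
    unsigned char msg[] = { 0x31, 0x32, 0x33 };
    printf("crc = 0x%02x\n", crc8(msg, 3));
    return 0;
}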

Certain codes not only permit the detection of a limited number of bit errors but also their correction [16, 68]. In a code capable of correcting single bit errors, any two distinct error-free codes need to have a Hamming distance of > 2. Then for a code with a single bit error, there is a unique error-free code at the distance of 1 that is used as the corrected one. Due to the allowed tolerances for the values of the physical signals representing bits and for the times when they are read off, bit errors tend to be rare. If single bit errors are corrected, the probability of remaining bit errors drops considerably. A code allowing detection and correction of single-bit errors is obtained starting from a primitive polynomial p(X) of the degree n. Let N = 2^n − 1. Any (N−n)-tuple/polynomial b = (b0, .., bN−n−1) is converted to the N-tuple m(X) = b(X)*p(X) before being transmitted. If instead of m(X) the sequence m′(X) = m(X) + X^s with a single error at the bit position s is received, then s can be uniquely identified from the remainder of m′(X) after a division by p(X) due to the assumed property of p(X) and be corrected. b(X) is the result of the polynomial division of the corrected code by p(X). If a double bit fault has occurred, m′(X) = m(X) + X^r + X^s, then there is a unique code m′′(X) = b′′(X)*p(X) so that m′(X) = m′′(X) + X^t for some t, as the balls of Hamming radius 1 around the correct codes exhaust all of B^N. Then m′′ has the opposite parity to m. Double bit faults can hence be detected by restricting the encoding to tuples b of even parity (b and b*p have the same parity).

While for the error-handling capabilities some extra bits are deliberately invested, large, composite codes representing multi-component objects, e.g., high-dimensional vectors or text files composed of many ASCII character codes, need to be converted ('compressed') into smaller codes for the purposes of communications or storage and to be reconverted ('decompressed') afterwards. Common methods for data compression are (a) the use of different code sizes for the elements occurring in the object so that the most frequent ones have the shortest codes; (b) to substitute repetitions of the same component code by a single one plus a repeat count (run length coding); or (c) to encode the changes between subsequent groups of components if they can be described by smaller codes. If the large code is the result of a computation, it can be advantageous to simply encode and communicate the parameters of this computation, or a definition of this computation along with the parameters.
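A minimal C sketch of method (b), run length coding (names are ours; no escape handling for non-repeating data):

#include <stdio.h>

/* Run length coding: replace repetitions of a byte by the byte
   plus a repeat count. */
int rle(const unsigned char *in, int n, unsigned char *out) {
    int o = 0;
    for (int i = 0; i < n; ) {
        int run = 1;
        while (i + run < n && in[i + run] == in[i] && run < 255) run++;
        out[o++] = in[i];                 /* the repeated component code */
        out[o++] = (unsigned char)run;    /* its repeat count */
        i += run;
    }
    return o;                             /* compressed length */
}

int main(void) {
    unsigned char in[] = "aaaabbbcc", out[32];
    int m = rle(in, 9, out);
    for (int i = 0; i < m; i += 2) printf("(%c,%d) ", out[i], out[i + 1]);
    printf("\n");                         /* (a,4) (b,3) (c,2) */
    return 0;
}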

Finally, for the purpose of encryption, a standard code may be converted into another one that cannot be reconverted without knowing some secret parameter. Such code conversions related to secure communications have become important applications for digital systems in their own right.

    1.2 ALGORITHMS AND ALGORITHMIC NOTATIONS

Digital systems are constructed from building blocks of a few types that perform some simple transfer functions (called elementary). If the input and output signals of these are compatible, the output signals of a building block or copies of them can be used as input signals of another. For electronic building blocks using voltage signals between pairs of reference sites this is particularly simple. As already pointed out, the output signal sites are directly connected to the input sites by means of wires that force the potentials at the connected input and output reference sites to become the same after a short time. If an output value is required as an input later, it must be passed through an electronic storage device that conserves or delays it until that time.

For two building blocks with the (abstract or encoded) transfer functions f and g, respectively, their connection in series computes the composition 'g ◦ f', i.e. the function defined by:

(g ◦ f)(x) = g(f(x))

The procedure to compute some desired result from given input values is usually given by prescribing a number of computing steps, each performing a particular one of a small choice of basic functions or operations on the inputs or intermediate values. Such a computational procedure is called an algorithm for the desired total transfer function. If the elementary operations are the transfer functions of the hardware building blocks, then the algorithm can be considered to be a set of instructions on how to build a machine with the desired transfer function from the available building blocks, simply by providing a hardware building block of the right type for every operation in the algorithm and connecting outputs to inputs whenever the algorithm says that the output is an intermediate value that is used as an operand for the other operation. The same building block can be used at different times for different steps of the algorithm if the intermediate results required as their inputs are passed through storage devices to be available at the time the building block is used for them.

    The notion of an algorithm is thus used for a function

f: M → N

being represented as a composition of simpler functions or operations. The simple operations are called elementary, as they are not further reduced to still simpler operations. Besides the set of elementary operations, methods such as '◦' of composing simple or composite functions must be defined (and eventually be associated with methods of connecting hardware building blocks such as connecting through wires or through storage elements).

Figure 1.3 Data flow graph and related circuit (boxes represent machines performing the operations)

    1.2.1 Functional Composition and the Data Flow

The most basic composition is functional composition, allowing multi-argument functions to be applied to the results of multiple others, the above '◦' operator being a special case. Functional composition translates into feeding outputs from building blocks into selected inputs of multiple others. Algorithms for functions are usually described in a formal mathematical notation or an equivalent programming language. If the elementary operations are the arithmetic operations +, * etc. on numbers, one can use the ordinary mathematical notation for these and denote the result of every operation by a unique symbol in order to be able to reference it as an input of another operation (which needs some 'naming' notation; we choose the notation '-> name'). Then an algorithm to compute a result d from inputs a, b, c using functional composition only might read:

a + b -> r
c + 1 -> s
r * s -> d

The same algorithm could also be given by the single composite expression '(a+b)*(c+1)'. The algorithms of this kind (only using functional composition) always compute expressions formed from the inputs, constants and elementary operations. The references to the intermediate results can be represented as a directed graph with the individual operations as nodes (Figure 1.3). This graph showing the dependency of the operational steps is called the data flow graph (DFG) of the algorithm. It must not contain cyclic paths in order to define a computational procedure. The graph directly translates into a diagram of connected hardware building blocks.

Obviously, a formal notation as sketched above or taken from a programming language can be used to describe the building blocks and interconnections for a digital computer, at least one designed for computing expressions. If a standard language such as C is used for this purpose, one has to keep in mind that an assignment 'r = a + b;' similar to the above one only indicates a naming yet not a store operation to a variable; names and assignments must be unique. Also, the order of assignments does not prescribe an order of execution, as the operations are not executed serially. For languages like VHDL dedicated to describing hardware structures (see Chapter 3), this is the standard semantics.

    1.2.2 Composition by Cases and the Control Flow

Another common way to compose functional building blocks in an algorithm besides functional composition is the composition by cases corresponding to a condition being true or false. A mathematical shorthand notation for this would be:

f(x) = { g(x)  if c(x)
       { h(x)  otherwise

    while in a programming language this is usually indicated by an if/else construction:

if condition
    set of operations
    pass r1
else
    other set of operations
    pass r2

For each of the branches the 'pass' statement indicates what will be the result if this branch is taken (in C, one would assign the alternative results to common local variables).

A condition can be considered as a function outputting a Boolean result b ('true' or 'false'), and the branches compute the alternative results r1 and r2. A similar behavior would result from applying a select function 'sel' to r1, r2 and b that outputs its first argument, r1, if b is true and its second, r2, otherwise, i.e. from a special functional composition. In many cases an algorithm using branches can be transformed in this way into one using functional composition only. An important difference, however, is that 'sel', being a function, can only be applied if both r1 and r2 have been computed before, executing the operations in both branches, whereas in the if/else construction the operations of the unselected branch are not computed at all. Its result does not even need to be defined (e.g. due to a division by zero). If both r1 and r2 can be computed, the 'sel' version gives the same result as the if/else version, yet performs more operations.

The composition with a select function directly translates into a hardware structure if a building block performing the selection is provided. This can be used to implement the if/else composition. The operations for computing r1 and r2 must both be implemented on appropriate building blocks although only one of them will be used for any particular computation. To implement the if/else in its strict sense, one might look for some control device switching between alternative wirings of elementary building blocks depending on the branching condition. Then, the result of the unselected branch is definitely not computed (on a conventional computer this is implemented by 'jumping' over the instructions of the unselected branch). Building blocks may then be shared between both branches. If sharing of building blocks is not possible, then at least one does not have to wait for the result of the unselected branch.
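The difference is easily seen in C terms: a 'sel' function evaluates both alternatives before selecting, while if/else skips the unselected branch entirely. A minimal sketch (names are ours):

#include <stdio.h>

/* The select function: both alternatives are computed (as function
   arguments), then one is passed on depending on b. */
int sel(int r1, int r2, int b) { return b ? r1 : r2; }

int main(void) {
    int a = 12, c = 0, r;

    /* 'sel' version: would evaluate a/c even when c == 0.  */
    /* r = sel(a / c, 0, c != 0);   -- undefined for c == 0 */

    /* if/else version: the unselected branch is not computed at all. */
    if (c != 0) r = a / c;
    else        r = 0;
    printf("%d\n", r);
    return 0;
}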

The if/else branches in an algorithm impose a structure on the set of all operations specified in it that is known as its control flow, as they control which steps are actually performed. For an algorithm using branches, the number of operations actually performed becomes dependent on the input data. If the if/else construction is translated into a controlled hardware structure, the time and energy needed for the computation become data dependent.

If in a complex algorithm a pattern of dependent operations shows up several times, then one can arrive at a more concise description by giving the pattern a name and substituting its occurrences by references to this name, or by using an (implicit) indexing scheme distinguishing the individual instances. The latter is done using loops or, more generally, recursion. Here, a substructure (a set of dependent operations) is repeated a finite but, maybe, unlimited number of times depending on the data. If the number of times is data dependent, conditional branches and thereby the control flow are involved. In a formal language, the repeated substructure is identified by enclosing it between begin/end brackets and by naming it for the purpose of the recursive reference. As an example, the recursion for calculating the greatest common divisor (GCD) of two numbers might read:

function gcd(n, m)
{
    if n = m pass n
    else if n > m pass gcd(m, n-m)
    else pass gcd(n, m-n)
}

The individual operations cannot be performed by different hardware building blocks, as the total number of building blocks is necessarily limited while the number of recursive steps is not. If, however, a limit value is specified for the loop count or the depth of the recursion, the straightforward translation of the individual operations into hardware building blocks remains possible. With such a limitation the result of the recursion is undefined for inputs demanding a recursion depth beyond the limit (a special output value might be used to encode an invalid output). The expansion of the GCD recursion into elementary operations up to the depth of two starts with the expression shown in Listing 1.1 that could be used to build a GCD computer:

if n = m pass n
else if n > m
    n - m -> n1
    if m = n1 pass m
    else if m > n1
        m - n1 -> m1
        if n1 = m1 pass n1
        else pass invalid
    else
        n1 - m -> n2
        if m = n2 pass m
        else pass invalid
else
    m - n -> m1
    if n = m1 pass n
    else if n > m1 ..... etc. etc. .....

    Listing 1.1 Expanded GCD recursion
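For experimentation, the recursion with an explicit depth limit can be written in C as follows (a sketch; the function name and the INVALID marker are ours):

#include <stdio.h>

#define INVALID (-1)   /* special output value encoding 'invalid' */

/* The GCD recursion of the text with a fixed recursion depth limit,
   as in the expansion of Listing 1.1: beyond the limit the result
   is undefined and reported as INVALID. */
int gcd_limited(int n, int m, int depth) {
    if (n == m) return n;
    if (depth == 0) return INVALID;
    if (n > m) return gcd_limited(m, n - m, depth - 1);
    return gcd_limited(n, m - n, depth - 1);
}

int main(void) {
    printf("%d\n", gcd_limited(12, 8, 2));   /* 4: reached within depth 2 */
    printf("%d\n", gcd_limited(35, 2, 2));   /* INVALID: needs more steps */
    return 0;
}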

1.2.3 Alternative Algorithms

Once an algorithm for a function is known that is based on elementary operations for which corresponding hardware building blocks and interconnection facilities are available, it may serve as a blueprint to construct a special purpose computer to execute it. The design of a digital system will start by giving algorithms for the functions to be performed. After that, the operations need to be assigned to hardware building blocks. This assignment does not need to be one-to-one as some building blocks can be used for more than one operation. Our further discussion mostly concentrates on providing the building blocks and on the assignment of operations to building blocks, but the finding of the algorithms is of equal importance.

An important property of an algorithm is its complexity. It is defined as the number of operations used as elementary building blocks applied therein. If the algorithm contains branches, the number of operations actually performed may depend on the input data. Then the worst-case complexity and the mean complexity may differ. The complexity depends on the selection of building blocks. Numeric algorithms, for example, use arithmetic operations on encoded numbers as building blocks, and their complexity would be measured in terms of these. If the operations of the algorithm directly correspond to hardware building blocks, then its complexity measures the total hardware effort. If the operations execute one-by-one on the same block, the complexity translates into execution time.

A given function may have several algorithms based on the same set of elementary operations that differ in their total numbers of elementary operations (i.e. their complexity), and in their data and control flows. Often functions are defined by giving algorithms for them, but other algorithms may be used to execute them. It turns out that there can be a dependency of the optimum algorithm w.r.t. some performance metric, say, the speed of execution, on the target architecture, i.e. the available elementary building blocks and interconnection methods. Algorithms and architectures must fit. In some cases, algorithms for a given function can be transformed into other ones with slightly different characteristics using algebraic rules, and the specification of the system design through the algorithm is understood to allow for such transformations as a method of optimization. If an operation is associative and commutative (such as '+'), then for a set S of operands a, b, c, .., the result of the composite operation

ΣS = ..((a + b) + c) + ···

does not depend on the particular selection and order of individual '+' operations and operands but only on S. The 2n−1 add operations to add up 2n numbers can, for example, be arranged linearly or in a binary tree (Figure 1.4). Both versions can be used to construct processors from subunits performing the individual add operations. The linear version suffers from each adder stage having to wait for the result of the previous one (which takes some processing time) while in the tree version adders can operate simultaneously. If there is just one adder that has to be used sequentially, the tree version cannot exploit this but suffers from needing more memory to store intermediate results. When just defining the output, the arrangement of the '+' operations used to construct the system may be left unspecified.

Figure 1.4 Equivalent adder arrangements (a linear chain of '+' blocks and a balanced binary tree)
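Both arrangements are easily expressed in C; a minimal sketch (names are ours):

#include <stdio.h>

/* Two equivalent arrangements of the add operations (Figure 1.4).
   Both compute the same sum; in hardware the tree allows adders to
   operate simultaneously. */
int sum_linear(const int *a, int n) {
    int s = a[0];
    for (int i = 1; i < n; i++) s = s + a[i];  /* each add waits for the last */
    return s;
}

int sum_tree(const int *a, int n) {            /* n assumed a power of two */
    if (n == 1) return a[0];
    return sum_tree(a, n / 2) + sum_tree(a + n / 2, n / 2);
}

int main(void) {
    int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    printf("%d %d\n", sum_linear(a, 8), sum_tree(a, 8));  /* 36 36 */
    return 0;
}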

    1.3 BOOLEAN FUNCTIONS

From a given, even a small set of elementary operations, many functions may be constructed by means of algorithms, even if only functional compositions are allowed and branches and recursion are not used. As the operations performed by a digital system are Boolean functions, it is of interest to consider algorithms for Boolean functions based on some set of elementary operations. Any algorithm based on special Boolean operations that e.g. implement arithmetic operations on encoded numbers can be expanded into one based on the elementary operations once the arithmetic functions themselves have algorithms based on these.

    1.3.1 Sets of Elementary Boolean Operations

Some common Boolean operations that are used as building blocks in Boolean algorithms are the unary NOT operation defined by NOT(0) = 1, NOT(1) = 0, and the dual input AND, OR, NAND, NOR, XOR (exclusive OR) operations defined by:

x y | AND(x, y) | OR(x, y) | NAND(x, y) | NOR(x, y) | XOR(x, y)
0 0 |     0     |    0     |     1      |     1     |     0
1 0 |     0     |    1     |     1      |     0     |     1
0 1 |     0     |    1     |     1      |     0     |     1
1 1 |     1     |    1     |     0      |     0     |     0

    and the 3-argument SEL operation defined as in section 1.2.2 by:

SEL(x, y, 0) = x,  SEL(x, y, 1) = y   for all x, y ∈ B

The operations AND, OR, and XOR are commutative and associative so that they may be applied to sets of operands without having to specify an order of evaluation.

Theorem: Every totally defined function f: B^n → B can be obtained as a purely functional composition (a composite expression) of the constants 0, 1 and operations taken from any particular one of the following sets of operations:

(1) AND, OR and NOT
(2) NAND
(3) SEL
(4) AND, XOR

Figure 1.5 Selector tree implementation of a Boolean function (a tree of SEL blocks applied to the constant values f(0,0,0), .., f(1,1,1), selected by the input variables x, y, z)

In other words, every function has at least one algorithm over each of these sets of elementary Boolean operations. Although the theorem states the existence of such algorithms without explicitly indicating how to obtain them, its proof is by actually constructing them starting from a table listing the values of the function. For the single operation set consisting of the SEL operation only, the algorithm realizing a given function f is the selector tree shown in Figure 1.5 as a composition of functional building blocks. This algorithm uses 2^n − 1 SEL building blocks. The same SEL tree structure can be used for every function f by composing it with the appropriate input constants.

For the AND, OR and NOT set, a particular algorithm that can be immediately read off from the function table of f is the so-called disjunctive normal form (DNF) for f. If one writes 'xy' for 'AND(x, y)', 'x + y' for 'OR(x, y)', 'x^0' for 'NOT(x)' and 'x^1' for 'x', this algorithm is:

f(x1, .., xn) = Σ x1^b1 ·· xn^bn

where the sum (a multiple OR operation) extends over all n-tuples (b1, .., bn) for which f(b1, .., bn) = 1. That this really is an algorithm for f is easily verified using the fact that the term x1^b1 ·· xn^bn takes the value of 1 exactly on the tuple (b1, .., bn).
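A C sketch of evaluating a function through its DNF terms (names are ours; the example table is the XOR of three inputs):

#include <stdio.h>

/* Evaluate f(x1,..,xn) through its DNF: OR over all rows of the
   function table with value 1 of the AND-term that tests for that row. */
int dnf_eval(const int *table, int n, unsigned x) {
    for (unsigned b = 0; b < (1u << n); b++)
        if (table[b] && b == x)   /* the term x1^b1..xn^bn is 1 exactly on b */
            return 1;
    return 0;
}

int main(void) {
    int table[8];
    for (unsigned b = 0; b < 8; b++)                   /* f = x1 XOR x2 XOR x3 */
        table[b] = ((b >> 2) ^ (b >> 1) ^ b) & 1;
    printf("f(1,0,1) = %d\n", dnf_eval(table, 3, 5u)); /* prints 0 */
    return 0;
}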

To prove that a particular set of building blocks generates all Boolean functions, it is otherwise enough to verify that the AND, OR and NOT functions can be obtained from it. For example, AND, OR and NOT are partial functions of SEL obtained by keeping some of the SEL inputs constant (composing with the constants 0, 1):

(1) NOT(z) = SEL(1, 0, z)
(2) AND(y, z) = SEL(0, y, z)
(3) OR(x, z) = SEL(x, 1, z)

Vice versa, as explained above, the SEL, NAND and XOR operations are obtained as combinations of AND, OR and NOT using their DNF algorithms. The XOR operation can be expressed as

(4) XOR(x, z) = SEL(x, NOT(x), z)

Each of the sets of operations in the theorem can hence be used as a basis to construct Boolean functions and digital systems once they are implemented as hardware building blocks. The existence of these finite and even single element sets of basic operations generating all Boolean functions implies that general digital systems can be constructed from very small selections of building blocks. SEL was introduced in section 1.2.2 as an operation implementing control. It actually performs no operation resulting in new data values but only passes the selected argument. A machine capable of moving data and performing conditional branches can therefore compute every Boolean function by performing a suitable sequence of these.

In the recent discussion on quantum computers and reversible computations [8], bijective (both injective and surjective) Boolean functions from B^n onto itself are considered. Every Boolean function f: B^n → B can be obtained by composing a bijective Boolean function with extra constant inputs and only using some of its outputs. The mapping

(b0, b1, .., bn) -> (b0, .., bn−1, XOR(f(b0, .., bn−1), bn))

is, in fact, bijective, and with bn set to 0, the last component of the result becomes f(b0, .., bn−1). Arbitrary bijective mappings can be constructed from simple ones like the exchange function XCH(x, y) = (y, x) or the Fredkin controlled exchange function on B^3 defined by

F(x, y, 0) = (x, y, 0),  F(x, y, 1) = (y, x, 1)

    1.3.2 Gate Complexity and Simplification of Boolean Algorithms

The complexity of an algorithm describing a composite circuit of AND, OR, NOT 'gates' or similar building blocks is also known as its gate count. A given function may have many different algorithms based on a given set of elementary operations. This may be exploited by searching for one of minimum complexity (there could be other criteria as well), starting from an algorithm such as the selector tree or the DNF that can be read off from the function table and then simplifying it using appropriate simplification steps.

For the selector tree implementation of a function, simplification steps are the application of the rule SEL(x, x, z) = x that eliminates a SEL building block if the inputs to select from are the same values, and the rule SEL(0, 1, z) = z. Also, the formulas (1) to (4) in section 1.3.1 can be used to replace SEL building blocks by simpler ones. The leftmost column of no less than 2^(n−1) selectors in Figure 1.5 can be substituted this way by a single inverter (if at all), as the only possible outputs to the next column are the values SEL(0, 0, x) = 0, SEL(1, 0, x) = NOT(x), SEL(0, 1, x) = x and SEL(1, 1, x) = 1.

For the AND, OR and NOT building blocks, the well-known rules of Boolean algebra [12] can be used to simplify algorithms, in particular the rules

ab + ac = a(b + c)
(a + b)(a + c) = a + bc
a + a = a, and aa = a
a(a + b) = a, and a + ab = a
0a = 0, 1 + a = 1, a + 0 = a, and a1 = a
a + a◦ = 1, aa◦ = 0, and (a◦)◦ = a
u◦v◦ = (u + v)◦, and u◦ + v◦ = (uv)◦ (de Morgan's laws)

the scope of which can be further extended by applying the commutative and associative laws for the AND and OR operations. All of them reduce the number of operations to be performed. For example, the DNF for a Boolean function f is the more complex, the more ones there are in the function table. By applying de Morgan's laws to the DNF of the negated function f◦, one obtains the CNF (the conjunctive normal form):

f(x1, .., xn) = ∏ ((x1^b1)◦ + ·· + (xn^bn)◦)