DIGITAL DESIGN OF SIGNAL PROCESSING SYSTEMS

A PRACTICAL APPROACH

Shoab Ahmed Khan
National University of Sciences and Technology (NUST), Pakistan

WILEY
A John Wiley and Sons, Ltd, Publication



Contents

Preface
Acknowledgments

1 Overview
1.1 Introduction
1.2 Fueling the Innovation: Moore's Law
1.3 Digital Systems
1.3.1 Principles
1.3.2 Multi-core Systems
1.3.3 NoC-based MPSoC
1.4 Examples of Digital Systems
1.4.1 Digital Receiver for a Voice Communication System
1.4.2 The Backplane of a Router
1.5 Components of the Digital Design Process
1.5.1 Design
1.5.2 Implementation
1.5.3 Verification
1.6 Competing Objectives in Digital Design
1.7 Synchronous Digital Hardware Systems
1.8 Design Strategies
1.8.1 Example of Design Partitioning
1.8.2 NoC-based SoC for Carrier-class VoIP Media Gateway
1.8.3 Design Flow Migration
References

2 Using a Hardware Description Language
2.1 Overview
2.2 About Verilog
2.2.1 History
2.2.2 What is Verilog?
2.3 System Design Flow
2.4 Logic Synthesis
2.5 Using the Verilog HDL
2.5.1 Modules
2.5.2 Design Partitioning
2.5.3 Hierarchical Design
2.5.4 Logic Values
2.5.5 Data Types
2.5.6 Variable Declaration
2.5.7 Constants
2.6 Four Levels of Abstraction
2.6.1 Switch Level
2.6.2 Gate Level or Structural Modeling
2.6.3 Dataflow Level
2.6.4 Behavioral Level
2.6.5 Verilog Tasks
2.6.6 Verilog Functions
2.6.7 Signed Arithmetic
2.7 Verification in Hardware Design
2.7.1 Introduction to Verification
2.7.2 Approaches to Testing a Digital Design
2.7.3 Levels of Testing in the Development Cycle
2.7.4 Methods for Generating Test Cases
2.7.5 Transaction-level Modeling
2.8 Example of a Verification Setup
2.9 SystemVerilog
2.9.1 Data Types
2.9.2 Module Instantiation and Port Listing
2.9.3 Constructs of the C/C++ Type
2.9.4 for and do-while Loops
2.9.5 The always Procedural Block
2.9.6 The final Procedural Block
2.9.7 The unique and priority Case Statements
2.9.8 Nested Modules
2.9.9 Functions and Tasks
2.9.10 The Interface
2.9.11 Classes
2.9.12 Direct Programming Interface (DPI)
2.9.13 Assertion
2.9.14 Packages
2.9.15 Randomization
2.9.16 Coverage
Exercises
References


3 System Design Flow and Fixed-point Arithmetic
3.1 Overview
3.2 System Design Flow
3.2.1 Principles
3.2.2 Example: Requirements and Specifications of a UHF Software-defined Radio
3.2.3 Coding Guidelines for High-level Behavioral Description
3.2.4 Fixed-point versus Floating-point Hardware
3.3 Representation of Numbers
3.3.1 Types of Representation
3.3.2 Two's Complement Representation
3.3.3 Computing Two's Complement of a Signed Number
3.3.4 Scaling
3.4 Floating-point Format
3.4.1 Normalized and Denormalized Values
3.4.2 Floating-point Arithmetic Addition
3.4.3 Floating-point Multiplication
3.5 Qn.m Format for Fixed-point Arithmetic
3.5.1 Introducing Qn.m
3.5.2 Floating-point to Fixed-point Conversion of Numbers
3.5.3 Addition in Q Format
3.5.4 Multiplication in Q Format
3.5.5 Bit Growth in Fixed-point Arithmetic
3.5.6 Overflow and Saturation
3.5.7 Two's Complement Intermediate Overflow Property
3.5.8 Corner Cases
3.5.9 Code Conversion and Checking the Corner Case
3.5.10 Rounding the Product in Fixed-point Multiplication
3.5.11 MATLAB® Support for Fixed-point Arithmetic
3.5.12 SystemC Support for Fixed-point Arithmetic
3.6 Floating-point to Fixed-point Conversion
3.7 Block Floating-point Format
3.8 Forms of Digital Filter
3.8.1 Infinite Impulse Response Filter
3.8.2 Quantization of IIR Filter Coefficients
3.8.3 Coefficient Quantization Analysis of a Second-order Section
3.8.4 Folded FIR Filters
3.8.5 Coefficient Quantization of an FIR Filter
Exercises
References

4 Mapping on Fully Dedicated Architecture
4.1 Introduction
4.2 Discrete Real-time Systems
4.3 Synchronous Digital Hardware Systems
4.4 Kahn Process Networks
4.4.1 Introduction to KPN
4.4.2 KPN for Modeling Streaming Applications
4.4.3 Limitations of KPN
4.4.4 Modified KPN and MPSoC
4.4.5 Case Study: GMSK Communication Transmitter
4.5 Methods of Representing DSP Systems
4.5.1 Introduction
4.5.2 Block Diagram
4.5.3 Signal Flow Graph
4.5.4 Dataflow Graph or Data Dependency Graph
4.5.5 Self-timed Firing
4.5.6 Single-rate and Multi-rate SDFGs
4.5.7 Homogeneous SDFG
4.5.8 Cyclo-static DFG
4.5.9 Multi-dimensional Arrayed Dataflow Graphs
4.5.10 Control Flow Graphs
4.5.11 Finite State Machine
4.5.12 Transformations on a Dataflow Graph
4.5.13 Dataflow Interchange Format (DIF) Language
4.6 Performance Measures
4.6.1 Iteration Period
4.6.2 Sampling Period and Throughput
4.6.3 Latency
4.6.4 Power Dissipation
4.7 Fully Dedicated Architecture
4.7.1 The Design Space
4.7.2 Pipelining
4.7.3 Selecting Basic Building Blocks
4.7.4 Extending the Concept of One-to-One Mapping
4.8 DFG to HW Synthesis
4.8.1 Mapping a Multi-rate DFG in Hardware
4.8.2 Centralized Controller for DFG Realization
Exercises
References

5 Design Options for Basic Building Blocks
5.1 Introduction
5.2 Embedded Processors and Arithmetic Units in FPGAs
5.3 Instantiation of Embedded Blocks
5.3.1 Example of Optimized Mapping
5.3.2 Design Optimization for the Target Technology
5.4 Basic Building Blocks: Introduction
5.5 Adders
5.5.1 Overview
5.5.2 Half Adders and Full Adders
5.5.3 Ripple Carry Adder
5.5.4 Fast Adders
5.5.5 Carry Look-ahead Adder
5.5.6 Hybrid Ripple Carry and Carry Look-ahead Adder
5.5.7 Binary Carry Look-ahead Adder
5.5.8 Carry Skip Adder
5.5.9 Conditional Sum Adder
5.5.10 Carry Select Adder
5.5.11 Using Hybrid Adders
5.6 Barrel Shifter
5.7 Carry Save Adders and Compressors
5.7.1 Carry Save Adders
5.7.2 Compression Trees
5.7.3 Dot Notation
5.8 Parallel Multipliers
5.8.1 Introduction
5.8.2 Partial Product Generation
5.8.3 Partial Product Reduction
5.8.4 A Decomposed Multiplier
5.8.5 Optimized Compressors
5.8.6 Single- and Multiple-column Counters
5.9 Two's Complement Signed Multiplier
5.9.1 Basics
5.9.2 Sign Extension Elimination
5.9.3 String Property
5.9.4 Modified Booth Recoding Multiplier
5.9.5 Modified Booth Recoded Multiplier in RTL Verilog
5.10 Compression Trees for Multi-operand Addition
5.11 Algorithm Transformations for CSA
Exercises
References

6 Multiplier-less Multiplication by Constants
6.1 Introduction
6.2 Canonic Signed Digit Representation
6.3 Minimum Signed Digit Representation
6.4 Multiplication by a Constant in a Signal Processing Algorithm
6.5 Optimized DFG Transformation
6.6 Fully Dedicated Architecture for Direct-form FIR Filter
6.6.1 Introduction
6.6.2 Example: Five-coefficient Filter
6.6.3 Transposed Direct-form FIR Filter
6.6.4 Example: TDF Architecture
6.6.5 Hybrid FIR Filter Structure
6.7 Complexity Reduction
6.7.1 Sub-graph Sharing
6.7.2 Common Sub-expression Elimination
6.7.3 Common Sub-expressions with Multiple Operands
6.8 Distributed Arithmetic
6.8.1 Basics
6.8.2 Example: FIR Filter Design
6.8.3 M-Parallel Sub-filter-based Design
6.8.4 DA Implementation without Look-up Tables
6.9 FFT Architecture using FIR Filter Structure
Exercises
References

7 Pipelining, Retiming, Look-ahead Transformation and Polyphase Decomposition
7.1 Introduction
7.2 Pipelining and Retiming
7.2.1 Basics
7.2.2 Cut-set Retiming
7.2.3 Retiming using the Delay Transfer Theorem
7.2.4 Pipelining and Retiming in a Feedforward System
7.2.5 Re-pipelining: Pipelining using Feedforward Cut-set
7.2.6 Cut-set Retiming of a Direct-form FIR Filter
7.2.7 Pipelining using the Delay Transfer Theorem
7.2.8 Pipelining Optimized DFG
7.2.9 Pipelining Carry Propagate Adder
7.2.10 Retiming Support in Synthesis Tools
7.2.11 Mathematical Formulation of Retiming
7.2.12 Minimizing the Number of Registers and Critical Path Delay
7.2.13 Retiming with Shannon Decomposition
7.2.14 Peripheral Retiming
7.3 Digital Design of Feedback Systems
7.3.1 Definitions
7.3.2 Cut-set Retiming for a Feedback System
7.3.3 Shannon Decomposition to Reduce the IPB
7.4 C-slow Retiming
7.4.1 Basics
7.4.2 C-Slow for Block Processing
7.4.3 C-Slow for FPGAs and Time-multiplexed Reconfigurable Design
7.4.4 C-Slow for an Instruction Set Processor
7.5 Look-ahead Transformation for IIR Filters
7.6 Look-ahead Transformation for Generalized IIR Filters
7.7 Polyphase Structure for Decimation and Interpolation Applications
7.8 IIR Filter for Decimation and Interpolation
Exercises
References


8 Unfolding and Folding of Architectures
8.1 Introduction
8.2 Unfolding
8.3 Sampling Rate Considerations
8.3.1 Nyquist Sampling Theorem and Design Options
8.3.2 Software-defined Radio Architecture and Band-pass Sampling
8.3.3 A/D Converter Bandwidth and Band-pass Sampling
8.4 Unfolding Techniques
8.4.1 Loop Unrolling
8.4.2 Unfolding Transformation
8.4.3 Loop Unrolling for Mapping SW to HW
8.4.4 Unfolding to Maximize Use of a Compression Tree
8.4.5 Unfolding for Effective Use of FPGA Resources
8.4.6 Unfolding and Retiming in Feedback Designs
8.5 Folding Techniques
8.5.1 Definitions and the Folding Transformation
8.5.2 Folding Regular Structured DFGs
8.5.3 Folded Architectures for FFT Computation
8.5.4 Memory-based Folded FFT Processor
8.5.5 Systolic Folded Architecture
8.6 Mathematical Transformation for Folding
8.7 Algorithmic Transformation
Exercises
References

9 Designs based on Finite State Machines
9.1 Introduction
9.2 Examples of Time-shared Architecture Design
9.2.1 Bit-serial and Digit-serial Architectures
9.2.2 Sequential Architecture
9.3 Sequencing and Control
9.3.1 Finite State Machines
9.3.2 State Encoding: One-hot versus Binary Assignment
9.3.3 Mealy and Moore State Machine Designs
9.3.4 Mathematical Formulations
9.3.5 Coding Guidelines for Finite State Machines
9.3.6 SystemVerilog Support for FSM Coding
9.4 Algorithmic State Machine Representation
9.4.1 Basics
9.4.2 Example: Design of a Four-entry FIFO
9.4.3 Example: Design of an Instruction Dispatcher
9.5 FSM Optimization for Low Power and Area
9.6 Designing for Testability
9.6.1 Methodology
9.6.2 Coverage Metrics for Design Validation
9.7 Methods for Reducing Power Dissipation
9.7.1 Switching Power
9.7.2 Clock Gating Technique
9.7.3 FSM Decomposition
Exercises
References

10 Micro-programmed State Machines
10.1 Introduction
10.2 Micro-programmed Controller
10.2.1 Basics
10.2.2 Moore Micro-programmed State Machines
10.2.3 Example: LIFO and FIFO
10.3 Counter-based State Machines
10.3.1 Basics
10.3.2 Loadable Counter-based State Machine
10.3.3 Counter-based FSM with Conditional Branching
10.3.4 Register-based Controllers
10.3.5 Register-based Machine with Parity Field
10.3.6 Example to Illustrate Complete Functionality
10.4 Subroutine Support
10.5 Nested Subroutine Support
10.6 Nested Loop Support
10.7 Examples
10.7.1 Design for Motion Estimation
10.7.2 Design of a Wavelet Processor
Exercises
References

11 Micro-programmed Adaptive Filtering Applications
11.1 Introduction
11.2 Adaptive Filter Configurations
11.2.1 System Identification
11.2.2 Inverse System Modeling
11.2.3 Acoustic Noise Cancellation
11.2.4 Linear Prediction
11.3 Adaptive Algorithms
11.3.1 Basics
11.3.2 Least Mean Square (LMS) Algorithm
11.3.3 Normalized LMS Algorithm
11.3.4 Block LMS
11.4 Channel Equalizer using NLMS
11.4.1 Theory
11.4.2 Example: NLMS Algorithm to Update Coefficients
11.5 Echo Canceller
11.5.1 Acoustic Echo Canceller
11.5.2 Line Echo Cancellation (LEC)
11.6 Adaptive Algorithms with Micro-programmed State Machines
11.6.1 Basics
11.6.2 Example: LEC Micro-coded Accelerator
11.6.3 Address Registers Arithmetic
11.6.4 Pipelining Options
11.6.5 Optional Support for Coefficient Update
11.6.6 Multi MAC Block Design Option
11.6.7 Compression Tree and Single CPA-based Design
Exercises
References

12 CORDIC-based DDFS Architectures
12.1 Introduction
12.2 Direct Digital Frequency Synthesizer
12.3 Design of a Basic DDFS
12.4 The CORDIC Algorithm
12.4.1 Introduction
12.4.2 CORDIC Algorithm for Hardware Implementation
12.4.3 Hardware Mapping
12.4.4 Time-shared Architecture
12.4.5 C-slowed Time-shared Architecture
12.4.6 Modified CORDIC Algorithm
12.4.7 Recoding of Binary Representation as ±1
12.5 Hardware Mapping of Modified CORDIC Algorithm
12.5.1 Introduction
12.5.2 Hardware Optimization
12.5.3 Novel Optimal Hardware Design
Exercises
References

13 Digital Design of Communication Systems
13.1 Introduction
13.2 Top-level Design Options
13.2.1 Bus-based Design
13.2.2 Point-to-Point Design
13.2.3 Network-based Design
13.2.4 Hybrid Connectivity
13.2.5 Point-to-Point KPN-based Top-level Design
13.2.6 KPN with Shared Bus and DMA Controller
13.2.7 Network-on-Chip Top-level Design
13.2.8 Design of a Router for NoC
13.2.9 Run-time Reconfiguration
13.2.10 NoC for Software-defined Radio
13.3 Typical Digital Communication System
13.3.1 Source Encoding
13.3.2 Data Compression
13.3.3 Encryption
13.3.4 Channel Coding
13.3.5 Framing
13.3.6 Modulation
13.3.7 Digital Up-conversion and Mixing
13.3.8 Front End of the Receiver
Exercises
References

Index