Using Carry-Save Adders For Radix- 4, Can Be Used to Generate 3a – No Booth’s Slight Delay...
Uploaded by arthur-hand
Using Carry-Save Adders
• For Radix-4, a CSA Can Be Used to Generate 3a
– No Booth’s Recoding Needed
• Slight Delay Penalty from the CSA – 3 Gate Delays
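A minimal Python sketch of the radix-4 idea without Booth recoding, assuming unsigned operands, an even k, and hypothetical helper names. Each cycle the CSA merges the cumulative partial product with x_j·a and 2x_{j+1}·a, so when both bits are 1 the 3a multiple emerges from the CSA instead of being precomputed. The multiples are shifted left rather than shifting the partial product right, purely to keep the sketch short.

```python
def csa(x, y, z):
    """3-to-2 carry-save adder on integers: x + y + z == s + c."""
    s = x ^ y ^ z                                # per-bit sum
    c = ((x & y) | (x & z) | (y & z)) << 1       # per-bit majority moved to the next weight
    return s, c


def radix4_csa_multiply(a, x, k):
    """Unsigned k-bit radix-4 multiply (k even); 3a is never formed explicitly."""
    p = 0                                        # cumulative partial product (binary)
    for j in range(0, k, 2):
        m0 = a if (x >> j) & 1 else 0            # x_j * a
        m1 = (a << 1) if (x >> (j + 1)) & 1 else 0   # 2 * x_{j+1} * a
        s, c = csa(p, m0 << j, m1 << j)          # digit 3: a and 2a combined by the CSA
        p = s + c                                # one CPA per cycle
    return p
```

Only k/2 cycles run; the price is one CSA delay in front of each cycle's CPA.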
Upper Half of P in Stored-Carry Form
• For Radix-2, Better Used to Keep the Cumulative Partial Product in Redundant Form for the First k-1 Cycles
• Then Use a CPA in the Last Cycle
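A minimal Python sketch of the radix-2 scheme, assuming unsigned operands and a hypothetical function name. The cumulative partial product stays as a (sum, carry) pair, so each cycle costs only one CSA delay, and a single carry-propagate addition converts the pair to binary at the end. For brevity the multiplicand is shifted left instead of shifting the partial product right, and the CPA is modeled as a separate final step.

```python
def csa(x, y, z):
    """3-to-2 carry-save adder on integers: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def radix2_carry_save_multiply(a, x, k):
    """Unsigned k-bit multiply with the running product kept in carry-save form."""
    s, c = 0, 0                                  # redundant cumulative partial product
    for i in range(k):
        multiple = (a << i) if (x >> i) & 1 else 0   # x_i * a, aligned at weight 2^i
        s, c = csa(s, c, multiple)               # no carry propagation inside the loop
    return s + c                                 # the single CPA at the end
```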
CSA With Booth Recoding
• Better Usage when Combined with Booth’s Recoding
– Reduces Cycles by 50%
• Each Cycle Faster Due to CSA
• Sign of a, 2a Incorporated Directly in Recoder/Selector Instead of Add/Subtract Signal Generation
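A minimal Python sketch of radix-4 Booth recoding feeding a CSA accumulator, assuming a k-bit two's-complement multiplier with k even and hypothetical helper names. Each digit d_j = -2x_{2j+1} + x_{2j} + x_{2j-1} (with x_{-1} = 0) lies in {-2, -1, 0, 1, 2}, so only k/2 multiples of a are added. Python's integers behave as infinitely sign-extended two's complement, which is why the bitwise CSA also works for the negative multiples here.

```python
def csa(x, y, z):
    """3-to-2 carry-save adder; the bitwise identity also holds for negative ints."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def booth_radix4_digits(x, k):
    """Radix-4 Booth digits (least significant first) of k-bit two's-complement x."""
    bits = x & ((1 << k) - 1)
    digits, prev = [], 0                         # prev plays the role of x_{-1} = 0
    for j in range(0, k, 2):
        b0 = (bits >> j) & 1
        b1 = (bits >> (j + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)       # digit in {-2, -1, 0, 1, 2}
        prev = b1
    return digits


def booth_csa_multiply(a, x, k):
    """k/2 cycles, each adding d_j * a * 4**j through one CSA."""
    s, c = 0, 0
    for j, d in enumerate(booth_radix4_digits(x, k)):
        s, c = csa(s, c, (d * a) << (2 * j))     # select 0, +-a, or +-2a
    return s + c                                 # final CPA
```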
CSA Combined with Booth Recoding
Booth Recoder/Selector
• Circuitry Shown on Following Slide
• Negative Multiples -a, -2a Formed in 2’s Complement
• a, 2a Aligned at Right with Position i
• Must Be Padded with i Zeros to the Right
• Bitwise Complement (when -a, -2a Needed) Turns These Padding Zeros into Ones; the Subsequent LSb Add of 1 Turns Them Back into Zeros
• This Causes a Carry-in of 1 into Position i
• Can Ignore Positions 0 through i-1 (in Negative Multiples) and Insert the Carry-in Directly at Position i (Extra Dot)
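The padding argument can be checked arithmetically. A minimal Python sketch, with a hypothetical helper `booth_select` and an assumed w-bit multiple width: for a negative digit the selector outputs only the complemented w-bit multiple plus a carry-in of 1, both placed at position i, and that matches the full two's complement of the zero-padded multiple modulo 2^(w+i).

```python
def booth_select(digit, a, w):
    """Return (bits, carry_in) with bits + carry_in == digit * a (mod 2**w).

    digit is a radix-4 Booth digit in {-2, -1, 0, 1, 2}; w is an assumed width.
    """
    mask = (1 << w) - 1
    mult = abs(digit) * a                  # a or 2a (a wire shift in hardware)
    if digit < 0:
        return (~mult) & mask, 1           # complement now, add the 1 as a carry-in
    return mult & mask, 0


# The low positions 0..i-1 never appear: only (bits, carry_in) at weight 2^i are used.
w, i = 8, 3
bits, cin = booth_select(-2, 5, w)
padded_sum = ((bits + cin) << i) % (1 << (w + i))
assert padded_sum == ((-2 * 5) << i) % (1 << (w + i))
```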
Booth Recoder – Selector Circuit
Radix-4 with CSA – No Booth
Radices > 4
• Radix-8 (3 Bits at a Time, k/3 Multiples) Requires a 3-Level CSA Tree
– Might as Well Use Radix-16 (4 Bits at a Time)
– Still a 3-Level Tree, with One More CSA
• MUXes Can Be Replaced with Booth Recoder/Selector Circuits in Higher-Radix Multipliers
• Can Continue to Increase the Radix (e.g., Radix-256, 8 Bits at a Time), Leading to Wider Trees
• Tradeoff is Speed Versus Area
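The level counts above follow from simple 3-to-2 arithmetic. A minimal Python sketch, assuming a radix-2^b design that feeds b unrecoded multiples (x_i·a, 2x_{i+1}·a, ...) plus both halves of the stored-carry cumulative partial product into the tree, i.e. b + 2 values reduced to 2 (function name hypothetical):

```python
def csa_levels(n):
    """Levels of 3-to-2 CSAs needed to reduce n operands to 2 (greedy grouping)."""
    levels = 0
    while n > 2:
        n = 2 * (n // 3) + n % 3      # each full group of 3 becomes 2; leftovers pass down
        levels += 1
    return levels


for b in (2, 3, 4, 8):                # radix 4, 8, 16, 256
    n = b + 2                         # b multiples + stored-carry partial product
    print(f"radix {2 ** b}: {n} inputs, {n - 2} CSAs, {csa_levels(n)} levels")
```

Each CSA lowers the operand count by one, so n inputs always need n - 2 CSAs: radix-8 uses 3 CSAs in 3 levels and radix-16 uses 4 CSAs in the same 3 levels, while radix-256 grows to 5 levels.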
Radix-16 Multiplication
Classification of Multipliers
Twin-Beat Mult. with Radix-8 Booth Recoding
Full Tree Multipliers
• All k PPs Produced Simultaneously
• Input to k-input Multioperand Tree
• Multiples of a (Binary, High-Radix or Recoded) Formed at Top of Tree
• Multiple-Forming Circuits
– AND Gates (binary multiplier)
– radix-4 Booth (recoded multiplier)
• Tree Results in Product in Redundant Form (2 Values – Carry-Save, for Example)
• Final Product Formed with Converter (Fast CPA, for Example)
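A word-level Python sketch of a full tree multiplier, assuming unsigned operands and a hypothetical function name: AND gates form all k multiples at once, a tree of CSAs reduces them level by level to a carry-save pair, and one addition plays the converter (CPA). It models operand counts, not the bit-level wiring.

```python
def csa(x, y, z):
    """3-to-2 carry-save adder on integers: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def full_tree_multiply(a, x, k):
    """Return (product, tree_levels) for unsigned k-bit operands."""
    rows = [(a << i) if (x >> i) & 1 else 0 for i in range(k)]   # AND-gate multiples
    levels = 0
    while len(rows) > 2:
        groups = len(rows) // 3
        nxt = []
        for g in range(groups):
            nxt.extend(csa(*rows[3 * g:3 * g + 3]))   # 3 rows in, 2 rows out
        nxt.extend(rows[3 * groups:])                 # leftover rows skip this level
        rows = nxt
        levels += 1
    return sum(rows), levels                          # converter: a single CPA
```

For k = 8 the tree needs 4 CSA levels (8 → 6 → 4 → 3 → 2) before the CPA.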
General Parallel Multiplier
Tree Type Multiplier Classification
• Distinguished by Design of:
1. Partial Product Forming Circuits (e.g., Booth, High-Radix, etc.)
2. Reduction Tree Type
3. Redundant-to-Binary Converter
• If Redundant Result in Carry-Save Form, Converter is Just a CPA
• Could Use Other Redundant Adders, Such as Signed Binary (4:2 Compressors)
• High-Radix Multipliers Lead to Fewer Values to Accumulate
– Sequential Design: Fewer Cycles
– Parallel Design: Smaller Tree
– Tradeoff: Tree Complexity Versus Multiple-Forming Circuit Complexity
Wallace and Dadda Tree Multipliers
• Wallace – Combine Partial Products as Soon as Possible
• Dadda – Maintain the Same Critical Path Length (Tree Depth), but Combine as Late as Possible
• Wallace – Typically the Faster Design, Since the CPA at the End Is Narrower
• Dadda – Simpler Tree but Wider CPA at End
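Dadda's "as late as possible" rule is commonly expressed with a sequence of target column heights; this is the standard formulation, given here as background rather than taken from the slides. Each stage reduces every column to at most the next smaller d_j:

$$d_1 = 2,\qquad d_{j+1} = \left\lfloor \tfrac{3}{2}\,d_j \right\rfloor \quad\Longrightarrow\quad 2,\ 3,\ 4,\ 6,\ 9,\ 13,\ 19,\ 28,\ 42,\ 63,\ \dots$$

In the 4 × 4 example below, the tallest column holds 4 dots, so the targets are 3 and then 2, i.e., two reduction stages.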
4 × 4 Example
• 16 AND Gates Used to Form the x_i a_j Terms (Dots)
• Dots per Column: 1 2 3 4 3 2 1
Wallace Example
• 5 FAs, 3 HAs, 4-bit CPA
Dadda Examples
• Example 1: 3 FAs, 3 HAs, 6-bit CPA
• Example 2: 4 FAs, 2 HAs, 6-bit CPA
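A minimal counting sketch of Dadda reduction for an n × n AND array, using a hypothetical function `dadda_reduce` that tracks dot counts only (no wiring). Each stage lowers every column to the next target height; a half adder is placed when exactly one dot too many remains and a full adder otherwise, and carries produced in a column count toward the next column in the same stage. For n = 4 this simple rule reproduces the first Dadda example (3 FAs, 3 HAs, 6-bit CPA); the allocation in the second example is not modeled.

```python
def dadda_targets(max_height):
    """Dadda heights d_j < max_height, largest first (d_1 = 2, d_{j+1} = floor(1.5 d_j))."""
    seq = [2]
    while seq[-1] < max_height:
        seq.append(seq[-1] * 3 // 2)
    return [d for d in reversed(seq) if d < max_height]


def dadda_reduce(n):
    """Return (full_adders, half_adders, cpa_width) for an n x n unsigned multiplier."""
    heights = [min(w + 1, 2 * n - 1 - w) for w in range(2 * n - 1)]   # dots per column
    fa = ha = 0
    for d in dadda_targets(max(heights)):
        new, carry_in = [], 0
        for dots in heights:
            h = dots + carry_in            # dots already here plus this stage's carries
            carry_out = 0
            while h > d:
                if h - d >= 2:             # full adder: 3 dots -> 1 sum (net -2)
                    h -= 2
                    fa += 1
                else:                      # half adder: 2 dots -> 1 sum (net -1)
                    h -= 1
                    ha += 1
                carry_out += 1             # every adder sends one carry to the next column
            new.append(h)
            carry_in = carry_out
        if carry_in:
            new.append(carry_in)
        heights = new
    twos = [w for w, h in enumerate(heights) if h == 2]
    cpa_width = twos[-1] - twos[0] + 1 if twos else 0
    return fa, ha, cpa_width
```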
Trees in Numeric Representation
• A Hybrid Approach Is Often Used to Minimize the Width of the Final CPA
• MS Thesis Topic – Optimize Tree With Different Counter Types
Implementation Issues
• Logarithmic Depth Tree – Irregular Structure
• Design/Layout Difficult
• Various Length Signal Propagation Paths
• Hazards and Signal Skew
• Need Iterated Recursive Structures
• Automatic Synthesis and Layout
• Motivates Search for Alternative Reduction Tree Structures
Other Tree Architectures
• Can Compose from Larger Counters, e.g. (7:2)
– Use “0” Inputs for Some
– Or Prune the Tree for Some
• Use “slices” – Example is (11:2) – Next Slide
– Can Be Laid Out to Occupy a Narrow Vertical Slice and Replicated
– All Carries Produced in Level i Enter Level i+1
– Balanced Delay Tree Results
• 3 Columns of 1, 3, and 5 FAs
• Can Expand from 11 to 18 Inputs – Append a Column of 7 FAs
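The slice sizes follow from a counting argument. Within a slice, each FA consumes three values, keeps one sum in the slice, and exports one carry, while an equal number of carries arrives from the neighboring slice; so each FA lowers the slice's count by one, and an n-to-2 slice needs n - 2 FAs. A small Python check with hypothetical helper names; the odd-column pattern is an extrapolation of the two data points on the slide, not something the slide states:

```python
def slice_inputs(fa_columns):
    """Inputs an n-to-2 slice can absorb, given its FA column sizes (n - 2 FAs total)."""
    return sum(fa_columns) + 2


def odd_column_slice(m):
    """Columns of 1, 3, ..., 2m-1 FAs: m*m FAs, so m*m + 2 inputs."""
    cols = [2 * i + 1 for i in range(m)]
    return cols, slice_inputs(cols)


print(odd_column_slice(3))   # the (11:2) slice
print(odd_column_slice(4))   # expanded with a column of 7 FAs
```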
(11:2) Tree Slice
Other Tree Blocks
• Converter Stage is a Fast CPA
• Can Also Use SBD (Signed Binary Digit) Representation
• With SBD, the Converter Stage is a Fast Subtractor
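Why the SBD converter is a subtractor: a binary signed-digit vector with digits in {-1, 0, 1} splits into a positive-digit bit vector and a negative-digit bit vector, and its value is their difference. A minimal Python sketch, assuming digits are listed least significant first (function name hypothetical):

```python
def bsd_to_binary(digits):
    """Convert binary signed digits (LSD first, each in {-1, 0, 1}) to an int."""
    pos = sum(1 << i for i, d in enumerate(digits) if d == 1)    # bits of the +1 digits
    neg = sum(1 << i for i, d in enumerate(digits) if d == -1)   # bits of the -1 digits
    return pos - neg                                             # one subtraction


print(bsd_to_binary([1, -1, 0, 1]))   # 1 - 2 + 8
```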
Array Multipliers
• Can Eliminate Top CSA With 0 Input
• Can Replace 0 With y to Compute ax+y
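A word-level Python sketch of the one-sided CSA chain, assuming unsigned k-bit a and x with k ≥ 2 and a hypothetical function name. The top row's third input, which would otherwise be 0, carries y instead, so ax + y costs no extra hardware; with y = 0 that top row adds nothing, which is why it can be eliminated in a plain multiplier. This sketch uses k - 1 CSA rows (4 rows for k = 5).

```python
def csa(x, y, z):
    """3-to-2 carry-save adder on integers: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def array_multiply_add(a, x, y, k):
    """Compute a*x + y with a linear chain of k - 1 CSA rows and one final CPA."""
    rows = [(a << i) if (x >> i) & 1 else 0 for i in range(k)]   # AND-gate multiples
    s, c = csa(y, rows[0], rows[1])      # top row: y occupies the slot that held 0
    for r in rows[2:]:
        s, c = csa(s, c, r)              # one CSA row per remaining multiple
    return s + c                         # final CPA
```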
Array Multipliers (cont.)
• Tree is One-Sided
• Longest Delay is 4 CSAs Plus a k-bit CPA
• Slower than Wallace/Dadda Tree
• Regular Structure
– short wires in horiz., vert., diag. positions
– simple, efficient layout
– easily pipelined (latches after each CSA row)
Methods for Reducing Array Size
Reducing Array Size (cont.)
5 by 5 Array Multiplier (Unsigned)
Signed Array Multiplier
• Array with 2’s Complement
• Alternative is Pezaris Array with Different Cell Types
• Need Array of AND Gates for Multiple Generation
• Critical Path Runs Down the Main Diagonal, then Ripples Through the CPA
• Can Skip “h” Cells Along the Main Diagonal
– Lower-Right Cell Now Has 4 Inputs
– Move One to the “Extra” Input of the Second Cell in the Diagonal
– Less Regular Layout Now, but Faster
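The arithmetic a two's-complement array has to implement: the multiplier's top bit carries weight -2^(k-1), so the last partial-product row is subtracted rather than added. A minimal Python sketch of that identity only, assuming a k-bit two's-complement x and a hypothetical function name; it does not model the cell types of a specific design such as the Pezaris array.

```python
def signed_array_multiply(a, x, k):
    """a * x for k-bit two's-complement x: rows 0..k-2 add, the sign row subtracts."""
    bits = x & ((1 << k) - 1)
    total = 0
    for i in range(k - 1):
        if (bits >> i) & 1:
            total += a << i              # positive-weight partial products
    if (bits >> (k - 1)) & 1:
        total -= a << (k - 1)            # sign bit has weight -2^(k-1)
    return total
```

A negative a is handled here by Python's sign-extended integers; in hardware the same effect comes from sign-handling in the array cells.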
5 by 5 Array Multiplier (Signed)
5 by 5 Array Multiplier
• AND Gates Embedded inside FA Blocks
Pipelined Partial Tree Multiplier
Pipelined Array Multiplier