Dsp m.tech 64 Bit Mac Docx

  • 8/21/2019 Dsp m.tech 64 Bit Mac Docx

    1/52

    CHAPTER 1

    INTRODUCTION

    Recently there has been a trend to implement DSP functions using field

    programmable gate arrays (FPGAs). While application specific integrated circuits (ASICs) are

    the traditional solution to high performance applications, the high development costs and time-

    to-market factors prohibit the deployment of such solutions for certain cases. DSP processors

    offer high programmability, but the sequential execution nature of their architecture can

    adversely affect their throughput. FPGAs have grown in popularity because of

    the balance they give the designer in terms of flexibility, cost, and

    time-to-market. Digital filter structures, which are extensively used in applications such as

    speech processing, image and video processing, and telecommunications to name a few, are

    commonly implemented using FPGAs.

    In signal processing, there are many instances in which an input signal

    to a system contains extra unnecessary content or additional noise which can degrade the quality

    of the desired portion. In such cases we may remove, or filter out, the unwanted content. For

    example, in the telephone system there is no reason to transmit very high frequencies,

    since most speech falls within the band of 400 to 3,400 Hz. Therefore, all

    frequencies above and below that band are filtered out. The frequency band between 400 and

    3,400 Hz, which isn't filtered out, is known as the passband, and the frequency band that is

    blocked is known as the stopband.

    1.3 FINITE IMPULSE RESPONSE:

    A finite impulse response (FIR) filter is a filter structure that can be used to implement almost

    any sort of frequency response digitally. An FIR filter is usually implemented by using a series

    of delays, multipliers, and adders to create the filter's output.

    The figure below shows the basic block diagram for an FIR filter of length N. The delays result in

    operating on prior input samples. The h_k values are the coefficients used for multiplication, so

    that the output at time n is the summation of all the delayed samples multiplied by the

    appropriate coefficients.

    The difference equation that defines the output of an FIR filter in terms of its input is:

    $y[n] = b_0 x[n] + b_1 x[n-1] + \cdots + b_N x[n-N]$

    where:

    x[n] is the input signal,

    y[n] is the output signal,

    b_i are the filter coefficients, and

    N is the filter order; an Nth-order filter has (N + 1) terms on the right-hand side, and these

    are commonly referred to as taps.

    This equation can also be expressed as a convolution of the coefficient sequence b_i with the input

    signal:

    $y[n] = \sum_{i=0}^{N} b_i x[n-i]$

    That is, the filter output is a weighted sum of the current and a finite number of previous values

    of the input. Also the response of the filter depends upon the values of the filter coefficients and

    the input applied.
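
    The weighted sum described above can be sketched in a few lines of Python. This is a minimal

    illustration, not the implementation used in this work; the function name fir_filter and the

    example coefficients are assumptions chosen for the demonstration, and samples before n = 0

    are assumed to be zero.

```python
def fir_filter(b, x):
    """Direct-form FIR: y[n] = sum over i of b[i] * x[n - i].

    Samples before the start of x are treated as zero (an assumption).
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for i, coeff in enumerate(b):
            if n - i >= 0:
                acc += coeff * x[n - i]  # one multiply-accumulate per tap
        y.append(acc)
    return y


# Feeding in a unit impulse returns the coefficients themselves,
# which is why they are also called the impulse response.
coefficients = [0.5, 0.3, 0.2]          # hypothetical 3-tap filter
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(fir_filter(coefficients, impulse))
```

    The inner loop performs exactly one multiplication and one addition per tap, which is the

    operation a MAC unit carries out in hardware.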

    Figure : The logical structure of an FIR filter

    The process of selecting the filter's length and coefficients is called filter design. The goal is to

    set those parameters such that certain desired stopband and passband parameters will result from

    running the filter. Most engineers utilize a program such as MATLAB to do their filter design.

    But whatever tool is used, the results of the design effort should be the same:

    o A frequency response plot, like the one shown in Figure 1, which verifies that the filter

    meets the desired specifications, including ripple and transition bandwidth.

    o The filter's length and coefficients.

    The longer the filter (more taps), the more finely the response can be tuned.

    With the length, N, and coefficients, float h[N] = { ... }, decided upon, the implementation of the

    FIR filter is fairly straightforward. Listing 1 shows how it could be done in C. Running this code

    on a processor with a multiply-and-accumulate instruction (and a compiler that knows how to

    use it) is essential to achieving a large number of taps.

    1.4 Approach to Designing an FIR Filter:

    Filters are signal conditioners. Each functions by accepting an input signal, blocking prespecified

    frequency components, and passing the original signal minus those components to the output.

    For example, a typical phone line acts as a filter that limits frequencies to a range considerably

    smaller than the range of frequencies human beings can hear. That's why listening to CD-quality

    music over the phone is not as pleasing to the ear as listening to it directly.

    A digital filter takes a digital input, gives a digital output, and consists of digital components. In

    a typical digital filtering application, software running on a digital signal processor (DSP) reads

    input samples from an A/D converter, performs the mathematical manipulations dictated by

    theory for the required filter type, and outputs the result via a D/A converter.

    An analog filter, by contrast, operates directly on the analog inputs and is built entirely with

    analog components, such as resistors, capacitors, and inductors.

    There are many filter types, but the most common are lowpass, highpass, bandpass, and

    bandstop. A lowpass filter allows only low frequency signals (below some specified cutoff)

    through to its output, so it can be used to eliminate high frequencies. A lowpass filter is handy, in

    that regard, for limiting the uppermost range of frequencies in an audio signal; it's the type of

    filter that a phone line resembles.

    A highpass filter does just the opposite, by rejecting only frequency components below some

    threshold. An example highpass application is cutting out the audible 60 Hz AC power "hum",

    which can be picked up as noise accompanying almost any signal in the U.S.

    The designer of a cell phone or any other sort of wireless transmitter would typically place an

    analog bandpass filter in its output RF stage, to ensure that only output signals within its narrow,

    government-authorized range of the frequency spectrum are transmitted.

    Engineers can use bandstop filters, which pass both low and high frequencies, to block a

    predefined range of frequencies in the middle.

    Frequency response

    Simple filters are usually defined by their responses to the individual frequency components that

    constitute the input signal. There are three different types of responses. A filter's response to

    different frequencies is characterized as passband, transition band, or stopband. The passband

    response is the filter's effect on frequency components that are passed through (mostly)

    unchanged.

    Frequencies within a filter's stopband are, by contrast, highly attenuated. The transition band

    represents frequencies in the middle, which may receive some attenuation but are not removed

    completely from the output signal.

    In the figure below, which shows the frequency response of a lowpass filter, ω_p is the passband

    ending frequency, ω_s is the stopband beginning frequency, and A_s is the amount of attenuation in

    the stopband. Frequencies between ω_p and ω_s fall within the transition band and are attenuated to

    some lesser degree.

    Figure: The response of a lowpass filter to various input frequencies

    Given these individual filter parameters, one of numerous filter design software packages can

    generate the required signal processing equations and coefficients for implementation on a DSP.

    Before we can talk about specific implementations, however, some additional terms need to be

    introduced.

    Ripple is usually specified as a peak-to-peak level in decibels. It describes how little or how

    much the filter's amplitude varies within a band. Smaller amounts of ripple represent more

    consistent response and are generally preferable.

    Transition bandwidth describes how quickly a filter transitions from a passband to a stopband, or

    vice versa. The more rapid this transition, the narrower the transition bandwidth, and the more

    difficult the filter is to achieve. Though an almost instantaneous transition to full attenuation is

    typically desired, real-world filters don't often have such ideal frequency response curves.

    There is, however, a tradeoff between ripple and transition bandwidth, so that decreasing either

    will only serve to increase the other.
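
    To make the ripple definition concrete, the sketch below evaluates the gain of a set of FIR

    coefficients at a given frequency and reports the peak-to-peak variation across a band. It is

    an illustration under stated assumptions: the function names, the 4-tap averaging filter, and

    the choice of 200 evaluation points are all hypothetical, and frequency is expressed in radians

    per sample.

```python
import cmath
import math


def magnitude_db(h, omega):
    """Gain in dB of FIR coefficients h at frequency omega (radians/sample)."""
    response = sum(hk * cmath.exp(-1j * omega * k) for k, hk in enumerate(h))
    mag = abs(response)
    return -math.inf if mag == 0.0 else 20.0 * math.log10(mag)


def ripple_db(h, omega_lo, omega_hi, points=200):
    """Peak-to-peak gain variation in dB over the band [omega_lo, omega_hi]."""
    step = (omega_hi - omega_lo) / (points - 1)
    gains = [magnitude_db(h, omega_lo + step * i) for i in range(points)]
    return max(gains) - min(gains)


# Hypothetical example: a 4-tap moving average acts as a crude lowpass filter.
h = [0.25, 0.25, 0.25, 0.25]
print(magnitude_db(h, 0.0))      # gain at DC
print(ripple_db(h, 0.0, 0.1))    # variation across a narrow low-frequency band
```

    Widening the band passed to ripple_db makes the reported variation grow, which is the

    behaviour the ripple specification is meant to bound.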

    1.5 Properties of FIR Filter:

    An FIR filter has a number of useful properties which sometimes make it preferable to an infinite

    impulse response (IIR) filter. FIR filters:

    o Are inherently stable. This is due to the fact that all the poles are located at the origin and

    thus are located within the unit circle.

    o Require no feedback. This means that any rounding errors are not compounded by

    summed iterations. The same relative error occurs in each calculation. This also makes

    implementation simpler.

    o Can be designed to be linear phase, which means the phase change is proportional to

    the frequency. This is usually desired for phase-sensitive applications, for example

    crossover filters and mastering, where transparent filtering is adequate.

    The main disadvantage of FIR filters is that considerably more computation power is required

    compared with a similar IIR filter. This is especially true when low frequencies (relative to the

    sample rate) are to be affected by the filter.

    1.6 Filter design Techniques:

    To design a filter means to select the coefficients such that the system has specific

    characteristics. The required characteristics are stated in filter specifications. Most of the time

    filter specifications refer to the frequency response of the filter. There are different methods to

    find the coefficients from frequency specifications:

    1. Window design method

    2. Frequency Sampling method

    3. Weighted least squares design

    4. Minimax design

    5. Equiripple design.

    Software packages like MATLAB, GNU Octave, Scilab, and SciPy provide convenient ways to

    apply these different methods.
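
    As an illustration of the first method in the list, the sketch below designs a lowpass filter

    with the window method: an ideal (sinc) impulse response is truncated and multiplied by a

    Hamming window. It is a minimal sketch, not the design procedure used in this work; the

    function name, the choice of the Hamming window, and the example parameters are assumptions.

```python
import math


def lowpass_window_design(num_taps, cutoff):
    """Hamming-windowed sinc lowpass design.

    num_taps: filter length (must be at least 2).
    cutoff:   cutoff frequency as a fraction of the sample rate, 0 < cutoff < 0.5.
    """
    mid = (num_taps - 1) / 2.0
    h = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2.0 * cutoff                      # limit of the sinc at its centre
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        h.append(ideal * window)
    total = sum(h)
    return [value / total for value in h]             # normalise to unity gain at DC


# Hypothetical example: 21 taps, cutoff at one tenth of the sample rate.
coefficients = lowpass_window_design(21, 0.1)
print(coefficients)
```

    The resulting coefficients are symmetric about the centre tap, which is what gives a filter

    designed this way its linear phase.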

    Some of the time, the filter specifications refer to the time-domain shape of the input signal the

    filter is expected to "recognize". The optimum matched filter is obtained by sampling that shape and using

    those samples directly as the coefficients of the filter, which gives the filter an impulse response that

    is the time-reverse of the expected input signal.
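
    The matched filter idea can be sketched as follows. The pulse shape and the test signal are

    hypothetical values chosen only for illustration, and the helper names are assumptions; the

    point is that the taps are the template in reverse time order and that the output peaks where

    the template ends in the input.

```python
def matched_filter_taps(template):
    """FIR taps for a matched filter: the template in reverse time order."""
    return template[::-1]


def fir_filter(taps, x):
    """y[n] = sum over i of taps[i] * x[n - i], with x taken as zero before n = 0."""
    return [
        sum(t * x[n - i] for i, t in enumerate(taps) if n - i >= 0)
        for n in range(len(x))
    ]


template = [1, 2, 3]                    # hypothetical pulse shape to recognise
signal = [0, 0, 1, 2, 3, 0, 0]          # the pulse embedded in silence
output = fir_filter(matched_filter_taps(template), signal)
peak_index = output.index(max(output))  # sample at which the pulse has fully arrived
print(output, peak_index)
```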

    Chapter 2

    Design of High Performance 64 bit MAC Unit

    Introduction:

    A high performance 64-bit Multiplier-and-Accumulator (MAC) design is implemented in this

    paper. The MAC unit performs an important operation in many digital signal processing (DSP)

    applications. The multiplier is designed using a modified Wallace multiplier and the adder is a

    carry save adder.

    The MAC unit is an indispensable component in many digital signal processing (DSP) applications

    involving multiplications and/or accumulations. It is used in high performance digital

    signal processing systems. DSP applications include filtering, convolution, and inner

    products. Most digital signal processing methods use transforms such as the discrete

    cosine transform (DCT) or the discrete wavelet transform (DWT). Because they are basically

    accomplished by repetitive application of multiplication and addition, the speed of the

    multiplication and addition arithmetic determines the execution speed and performance of the

    entire calculation. Multiplication-and-accumulate operations are typical for digital filters.

    Therefore, the functionality of the MAC unit enables high-speed filtering and other processing

    typical for DSP applications. Since the MAC unit operates completely independently of the CPU,

    it can process data separately and thereby reduce CPU load. Applications such as optical

    communication systems, which are based on DSP, require extremely fast processing of huge

    amounts of digital data. The Fast Fourier Transform (FFT) also requires addition and

    multiplication. A 64-bit unit can handle wider operands and address more memory.

    MAC OPERATION:

    The Multiplier-Accumulator (MAC) operation is the key operation not only in DSP applications

    but also in multimedia information processing and various other applications. As mentioned

    above, the MAC unit consists of a multiplier, an adder, and a register/accumulator. In this paper, we used a

    64-bit modified Wallace multiplier. The MAC inputs are obtained from the memory location and

    given to the multiplier block. This will be useful in a 64-bit digital signal processor. The input

    fed from the memory location is 64 bits wide. The multiplier computes the product of its

    64-bit inputs, and hence its output is 128 bits wide. The

    multiplier output is given as the input to the carry save adder, which performs the addition. The function

    of the MAC unit is given by the following equation:

    $F = \sum_i P_i Q_i$ (1)

    The output of the carry save adder is 129 bits wide, i.e. 128 bits plus one bit for the carry. The

    output is then given to the accumulator register. The accumulator register used in this design is

    Parallel In Parallel Out (PIPO). Since the words are wide and the carry save adder produces all its

    output values in parallel, a PIPO register is used, where the input bits are taken in parallel and

    the output is taken in parallel. The output of the accumulator register is taken out or fed back as

    one of the inputs to the carry save adder.
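
    The data path described above can be modelled behaviourally in Python, whose integers are

    wide enough to hold the 128-bit product directly. This is a sketch under stated assumptions and

    not the VHDL design itself: operands are treated as unsigned, the accumulator is assumed to

    wrap at 129 bits, and the names Mac64 and mac are hypothetical.

```python
OPERAND_BITS = 64
ACC_BITS = 129  # 128-bit product plus one carry bit, as stated in the text


class Mac64:
    """Behavioural model of a multiply-accumulate unit (unsigned operands assumed)."""

    def __init__(self):
        self.acc = 0  # contents of the accumulator register

    def step(self, p, q):
        operand_mask = (1 << OPERAND_BITS) - 1
        product = (p & operand_mask) * (q & operand_mask)   # at most 128 bits
        # Add the product to the fed-back accumulator value and keep 129 bits.
        self.acc = (self.acc + product) & ((1 << ACC_BITS) - 1)
        return self.acc


def mac(ps, qs):
    """Compute F = sum of P_i * Q_i by stepping the model once per operand pair."""
    unit = Mac64()
    for p, q in zip(ps, qs):
        unit.step(p, q)
    return unit.acc


print(mac([1, 2, 3], [4, 5, 6]))
```

    Accumulating more than two full-scale products would overflow a 129-bit register, so a

    practical design has to decide how many guard bits the accumulator needs.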

    MODIFIED WALLACE MULTIPLIER:

    A modified Wallace multiplier is an efficient hardware implementation of a digital circuit

    that multiplies two integers. Conventional Wallace multipliers use many full adders and

    half adders in their reduction phase. Half adders do not reduce the number of partial

    product bits. Therefore, minimizing the number of half adders used in a multiplier reduction will

    reduce the complexity. Hence, a modification to the Wallace reduction is made in which the

    delay is the same as for the conventional Wallace reduction. The modified reduction method

    greatly reduces the number of half adders with a very slight increase in the number of full adders.

    Reduced complexity Wallace multiplier reduction consists of three phases. In the first phase the N x N

    product matrix is formed, and before it is passed on to the second phase the product matrix is

    rearranged to take the shape of an inverted pyramid. During the second phase the rearranged

    product matrix is grouped into non-overlapping groups of three rows, as shown in the figure. Single bits

    and pairs of bits in a group are passed on to the next stage, and groups of three bits are given to a full

    adder. The number of rows in each stage of the reduction phase is calculated by the formula

    $r_{i+1} = 2 \lfloor r_i / 3 \rfloor + r_i \bmod 3$

    If $r_i \bmod 3 = 0$, then $r_{i+1} = 2 r_i / 3$.

    If the number of rows calculated from the above equation for a stage of the second

    phase does not match the number of rows actually found in that stage,

    only then is a half adder used. The final output of the second phase has a height

    of two rows and is passed on to the third phase. During the third phase the output of the second phase

    is given to a carry propagation adder to generate the final output.

    Thus, a 64-bit modified Wallace multiplier is constructed, and the total number of stages in the

    second phase is 10. As per the equation, the number of rows in each of the 10 stages was

    calculated, and the use of half adders was restricted to the 10th stage. The total number of

    half adders used in the second phase is 8, and the total number of full adders used during

    the second phase is slightly higher than in the conventional Wallace multiplier.
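
    The stage count quoted above can be checked by iterating the row-count formula until only two

    rows remain. The short sketch below does this; the function name is an assumption, and the code

    models only the row counts, not the adders themselves.

```python
def wallace_row_counts(rows):
    """Row count after each reduction stage: r_next = 2 * floor(r / 3) + r mod 3."""
    counts = [rows]
    while counts[-1] > 2:
        r = counts[-1]
        counts.append(2 * (r // 3) + r % 3)
    return counts


counts = wallace_row_counts(64)
print(counts)            # [64, 43, 29, 20, 14, 10, 7, 5, 4, 3, 2]
print(len(counts) - 1)   # 10 reduction stages for a 64-bit multiplier
```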

    CARRY SAVE ADDER:

    In this design a 128-bit carry save adder is used, since the output of the multiplier is 128 bits (2N).

    The carry save adder reduces the addition of three numbers to the addition of two numbers. The propagation

    delay is 3 gates regardless of the number of bits. The carry save adder contains n full adders,

    each computing a single sum and carry bit based solely on the corresponding bits of the three input

    numbers. The entire sum can be calculated by shifting the carry sequence left by one place and

    then appending a 0 to the most significant bit of the partial sum sequence. The two

    sequences are then added with a ripple carry unit, resulting in an (n + 1)-bit value. The ripple carry unit refers

    to the process where the carry out of one stage is fed directly to the carry in of the next stage.

    This process is continued without adding any intermediate carry propagation. Since

    drawing a 128-bit carry save adder is impractical, a typical 8-bit carry save adder is

    shown in the figure. Here we are computing the sum of two 128-bit binary numbers, so 128

    half adders are used at the first stage instead of 128 full adders. Therefore, the carry save unit comprises

    128 half adders, each of which computes a single sum and carry bit based only on the

    corresponding bits of the two input numbers. If x and y are two 128-bit numbers,

    then it produces the partial sum and carry as S and C respectively.

    $S_i = x_i \oplus y_i$

    $C_i = x_i \wedge y_i$

    When two numbers are added using half adders, a ripple carry adder is used to combine the partial sum and carry. A

    ripple carry adder cannot compute a sum bit without waiting for the previous carry

    bit to be produced, and hence its delay is equal to that of n full adders. However, a carry

    save adder produces all the output values in parallel, resulting in a total computation time less

    than that of ripple carry adders. So, a Parallel In Parallel Out (PIPO) register is used as the accumulator in the final

    stage.
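
    The carry save step can be modelled with bitwise operations on Python integers, where each

    bit position plays the role of one adder cell. This is a behavioural sketch with hypothetical

    function names, not the gate-level design; the final addition stands in for the ripple carry

    unit described above.

```python
def carry_save_add(x, y, z=0, width=128):
    """Reduce three width-bit numbers to a partial sum word and a carry word.

    Every bit position is computed independently, as in a row of full adders.
    With z = 0 this reduces to the half adder case: S = x XOR y, C = x AND y.
    """
    mask = (1 << width) - 1
    x, y, z = x & mask, y & mask, z & mask
    partial_sum = x ^ y ^ z
    carry = (x & y) | (y & z) | (x & z)
    return partial_sum, carry


def resolve(partial_sum, carry):
    """Final carry-propagating addition: shift the carry word left by one and add."""
    return partial_sum + (carry << 1)


s, c = carry_save_add(5, 6, 7)
print(s, c, resolve(s, c))
```

    Only resolve involves carry propagation; carry_save_add itself has a delay that does not

    depend on the word width.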

    CHAPTER-5

    INTRODUCTION TO VLSI DOMAIN

    4.1 VLSI DESIGN:

    The complexity of the VLSI circuits being designed and used today makes the manual approach to

    design impractical. Design automation is the order of the day. With the rapid technological

    developments in the last two decades, the status of VLSI technology is characterized by the

    following:

    o A steady increase in the size and hence the functionality of the ICs.

    o A steady reduction in feature size and hence increase in the speed of operation as well as gate

    or transistor density.

    o A steady improvement in the predictability of circuit behavior.

    o A steady increase in the variety and size of software tools for VLSI design.

    The above developments have resulted in a proliferation of approaches to VLSI design.

    4.2 HISTORY OF VLSI:

    VLSI began in the 1970s when complex semiconductor and communication technologies

    were being developed. The microprocessor is a VLSI device. The term is no longer as common

    as it once was, as chips have increased in complexity into the hundreds of millions of transistors.

    This is the field which involves packing more and more logic devices into smaller and

    smaller areas. VLSI circuits can now be put into a small space a few millimeters across. VLSI

    circuits are everywhere: our computers, our cars, our brand new state-of-the-art digital cameras,

    our cell phones, and so on.

    4.3 VARIOUS INTEGRATIONS:

    In the early days of integrated circuits, only a few transistors could be placed on a chip as the

    scale used was large because of the contemporary technology, and manufacturing yields were

    low by today's standards. As the degree of integration was small, the design was done easily.

    Over time, millions, and today billions of transistors could be placed on one chip, and to make a

    good design became a task to be planned thoroughly.

    4.3.1 SSI Technology:

    The first integrated circuits contained only a few transistors. Called "small-scale

    integration" (SSI), digital circuits containing transistors numbering in the tens provided a few

    logic gates, for example, while early linear ICs such as the Plessey SL201 or the Philips TAA320

    had as few as two transistors. The term Large Scale Integration was first used by IBM scientist

    Rolf Landauer when describing the theoretical concept; from there came the terms SSI, MSI,

    VLSI, and ULSI.

    4.3.2 MSI Technology:

    The next step in the development of integrated circuits, taken in the late 1960s, introduced

    devices which contained hundreds of transistors on each chip, called "medium-scale integration" (MSI).

    They were attractive economically because while they cost little more to produce than SSI

    devices, they allowed more complex systems to be produced using smaller circuit boards, less assembly

    work (because of fewer separate components), and a number of other advantages.

    4.3.3 LSI Technology:

    Further development, driven by the same economic factors, led to "large-scale

    integration" (LSI) in the mid 1970s, with tens of thousands of transistors per chip.

    Integrated circuits such as 1K-bit RAMs, calculator chips, and the first microprocessors, that

    began to be manufactured in moderate quantities in the early 1970s, had under 4000 transistors.

    True LSI circuits, approaching 10,000 transistors, began to be produced around 1974, for

    computer main memories and second-generation microprocessors.

    4.3.4 VLSI:

    The final step in the development process, starting in the 1980s and continuing through the

    present, was very-large-scale integration (VLSI), which continues beyond several billion transistors as of 2009. In

    1986 the first one megabit RAM chips were introduced, which contained more than one million

    transistors. Microprocessor chips passed the million transistor mark in 1989 and the billion

    transistor mark in 2005. The trend continues largely unabated, with chips introduced in 2007

    containing tens of billions of memory transistors.

    VLSI DESIGN FLOW:

    Fig 4.1 VLSI design flow (flowchart; box labels: Start, Design Entity, Pre layout Simulation,

    Logic Synthesis, System Partitioning, Floor Planning, Placement, Routing, Circuit Extraction)

    4.4 ULSI, WSI, SOC and 3D-IC:

    To reflect further growth of the complexity, the term ULSI, which stands for "ultra-large-scale

    integration", was proposed for chips of complexity of more than 1 million transistors. Wafer-scale

    integration (WSI) is a system of building very large integrated circuits that uses an entire silicon wafer to

    produce a single "super-chip" through a combination of large size and reduced packaging.

    A system-on-a-chip ( SOC) is an integrated circuit in which all the components needed for a

    computer or other system are included on a single chip. The design of such a device can be complex and

    costly, and building disparate components on a single piece of silicon may compromise the efficiency of

    some elements. However, these drawbacks are offset by lower manufacturing and assembly costs and by

    a greatly reduced power budget: because signals among the components are kept on-die, much less power

    is required.

    A three-dimensional integrated circuit (3D-IC) has two or more layers of active electronic

    components that are integrated both vertically and horizontally into a single circuit, with lower power

    consumption.

    4.5 VLSI DESIGN FLOW AND THEIR DESCRIPTION:

    The design at the behavioral level is to be elaborated in terms of known and

    acknowledged functional blocks. It forms the next detailed level of design description. Once

    again the design is to be tested through simulation and iteratively corrected for errors. The

    elaboration can be continued one or two steps further. It leads to a detailed design description in

    terms of logic gates and transistor switches.

    Optimization

    The circuit at the gate level in terms of the gates and flip-flops can be redundant in

    nature. The same can be minimized with the help of minimization tools. The step is not shown

    separately in the figure. The minimized logical design is converted to a circuit in terms of the

    switch level cells from standard libraries provided by the foundries. The cell based design

    generated by the tool is the last step in the logical design process; it forms the input to the first

    level of physical design.

    Simulation

    The design descriptions are tested for their functionality at every level: behavioral, data

    flow, and gate. One has to check here whether all the functions are carried out as expected and

    rectify any errors. All such activities are carried out by the simulation tool. The tool also has an editor

    to carry out any corrections to the source code. Simulation involves testing the design for all its

    functions, functional sequences, timing constraints, and specifications. Normally testing and

    simulation at all the levels, from behavioral to switch level, are carried out by a single tool; the

    same is identified as the scope of the simulation tool in Figure 4.2.

    Fig 4.2 Scope of simulation tool

    4.6 Synthesis

    With the availability of design at the gate (switch) level, the logical design is complete.

    The corresponding circuit hardware realization is carried out by a synthesis tool. Two common

    approaches are as follows:

    The circuit is realized through an FPGA. The gate level design description is the starting point

    for the synthesis here. The FPGA vendors provide an interface to the synthesis tool. Through the

    interface the gate level design is realized as a final circuit. With many synthesis tools, one can

    directly use the design description at the data flow level itself to realize the final circuit through

    an FPGA. The FPGA route is attractive for limited volume production or a fast development

    cycle.

    The circuit is realized as an ASIC. A typical ASIC vendor will have his own library of basic

    components like elementary gates and flip-flops. Eventually the circuit is to be realized by

    selecting such components and interconnecting them conforming to the required design. This

    constitutes the physical design. Being an elaborate and costly process, a physical design may call

    for an intermediate functional verification through the FPGA route. The circuit realized through

    the FPGA is tested as a prototype. It provides another opportunity for testing the design closer to

    the final circuit.

    Physical Design

    A fully tested and error-free design at the switch level can be the starting point for a

    physical design [Baker & Boyce, Wolf]. It is to be realized as the final circuit using (typically) a

    million components in the foundry's library. The step-by-step activities in the process are

    described briefly as follows:

    System partitioning: The design is partitioned into convenient compartments or functional

    blocks. Often it would have been done at an earlier stage itself and the software design prepared

    in terms of such blocks. Interconnection of the blocks is part of the partition process.

    Floor planning: The positions of the partitioned blocks are planned and the blocks are arranged

    accordingly. The procedure is analogous to the planning and arrangement of domestic furniture

    in a residence. Blocks with I/O pins are kept close to the periphery; those which interact

    frequently or through a large number of interconnections are kept close together, and so on.

    Partitioning and floor planning may have to be carried out and refined iteratively to yield best

    results.

    Placement: The selected components from the ASIC library are placed in position on the

    Silicon floor. It is done with each of the blocks above.

    Routing: The components placed as described above are to be interconnected to the rest of the

    block. This is done for each of the blocks by suitably routing the interconnects. Once the routing is

    complete, the physical design can be taken as complete. The final mask for the design can be

    made at this stage and the ASIC manufactured in the foundry.

    Post Layout Simulation

    Once the placement and routing are completed, the performance specifications like

    silicon area, power consumed, path delays, etc., can be computed. Equivalent circuit can be

    extracted at the component level and performance analysis carried out. This constitutes the final

    stage called verification. One may have to go through the placement and routing activity once

    again to improve performance.

    Critical Subsystems

    The design may have critical subsystems. Their performance may be crucial to the overall

    performance; in other words, to improve the system performance substantially, one may have to

    design such subsystems afresh. The design here may imply redefinition of the basic feature size

    of the component, component design, placement of components, or routing done separately and

    specifically for the subsystem. A set of masks used in the foundry may have to be done afresh for

    the purpose.

    CHAPTER 6

    TOOLS AND HDL USED

    5.1 ROLE OF HDL

    An HDL provides the framework for the complete logical design of the ASIC. All the

    activities coming under the purview of an HDL are shown enclosed in bold dotted lines. Verilog

    and VHDL are the two most commonly used HDLs today. Both have constructs with which the

    design can be fully described at all the levels. There are additional constructs available to

    facilitate setting up of the test bench, spelling out test vectors for them and observing the

    outputs from the designed unit.

    IEEE has brought out Standards for the HDLs, and the software tools conform to them.

    Verilog as an HDL was introduced by Gateway Design Automation, which was later acquired by

    Cadence Design Systems; Cadence placed it into the public domain in 1990. It was established

    as a formal IEEE Standard in 1995, and a revised version was brought out in 2001. However,

    most of the simulation tools available today conform only to the 1995 version of the standard.

    VHDL, which is used by a substantial number of VLSI designers today, is the HDL used in this

    project for modeling the design.

    We have used Xilinx ISE 9.2i for simulation and synthesis purposes. We implemented

    the prescribed design in VHDL, a widely used industry and IEEE-standard HDL.

    5.2 Different Versions of Verilog

    o Verilog-95

    o Verilog-2001

    o Verilog-2005

    o SystemVerilog

    5.3 NEED FOR (VERILOG) HDL

    o Interoperability.

    o Technology independence.

    o Design reuse.

    o Several levels of abstraction.

    o Readability.

    o Standard language.

    o Widely supported.

    5.4 BRIEF HISTORY

    o Verilog was invented by Phil Moorby and Prabhu Goel during the winter of 1983/1984 at

    Automated Integrated Design Systems (later renamed Gateway Design Automation).

    o In 1985 it was in use as a hardware modeling language.

    o Gateway Design Automation was purchased by Cadence Design Systems in 1990.

    o Cadence transferred Verilog into the public domain under the Open Verilog International

    (OVI) organization.

    o Verilog was later standardized as IEEE Standard 1364-1995, commonly referred to as Verilog-95.

    5.4.1 Related Standards

    o Verilog-95 does not support (2's complement) signed nets and variables, so signed

    operations had to be performed using awkward bit-level manipulations.

    o In Verilog-2001, the same operations can be described more succinctly with the built-in

    operators +, -, /, * and >>>. Verilog-2001 also introduced the generate/endgenerate construct.

    o SystemVerilog is a superset of Verilog-2005, with many new features and capabilities to

    aid design-verification and design-modeling.
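
    To make the Verilog-95 limitation above concrete, the following minimal sketch shows the kind of
    bit-level manipulation needed to treat a plain bit vector as a 2's complement number and to shift
    it arithmetically. Python is used here only as an executable illustration and is not part of the
    project's HDL code; the function names and the 8-bit width are assumptions made for the example.
    In Verilog-2001 the same intent is expressed with signed declarations and the >>> operator.

```python
def to_signed(value, width):
    """Interpret an unsigned bit pattern as a 2's complement number."""
    mask = (1 << width) - 1
    value &= mask
    sign_bit = 1 << (width - 1)
    # If the sign bit is set, subtract 2**width to obtain the negative value.
    return value - (1 << width) if value & sign_bit else value


def arithmetic_shift_right(value, shift, width):
    """Emulate an arithmetic right shift on a 'width'-bit pattern by hand.

    The vacated upper bits are filled with copies of the sign bit, which is
    the manual sign replication required when no signed type is available.
    """
    mask = (1 << width) - 1
    value &= mask
    shift = min(shift, width)
    shifted = value >> shift                       # plain logical shift
    if value & (1 << (width - 1)):                 # negative: replicate the sign bit
        fill = ((1 << shift) - 1) << (width - shift)
        shifted |= fill
    return shifted & mask


if __name__ == "__main__":
    pattern = 0b11110000                           # -16 as an 8-bit 2's complement value
    print(to_signed(pattern, 8))
    print(format(arithmetic_shift_right(pattern, 2, 8), "08b"))
```

    Shifting the pattern for -16 right by two positions gives the pattern for -4, which is the result
    a signed shift is expected to produce.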

    5.5 VERILOG FEATURES

    o Case sensitive.

    o Verilog supports concurrency and sequentiality.

    o Verilog syntax is similar to that of the C programming language.

    o A Verilog design consists of a hierarchy of modules.

    5.6 LEVELS OF ABSTRACTION

    Verilog supports many possible styles of design description, which differ primarily in

    how closely they relate to the hardware.

    It is possible to describe a circuit in a number of ways:

    Switch level.

    Gate level.

    Data flow level.

    Behavioral level.

    Switch level description

    This is the lowest level of abstraction provided by Verilog. A module can be implemented in terms of

    switches (PMOS and NMOS) and storage nodes.

    Gate level description

    The module is implemented in terms of logic gates.

    Design at this level is similar to describing a design in terms of a gate-level logic diagram.

    For large circuits, a low-level description quickly becomes impractical.

    Dataflow level Description

    Circuit is described in terms of how data moves through the system.

    In the dataflow style you describe how information flows between registers in the

    system.

    The combinational logic is described at a relatively high level, while the placement and operation

    of registers are specified quite precisely.

    Fig 5.1. Data Flow of Verilog Description

    The behavior of the system over time is defined by registers.

    The lower level descriptions must be created or obtained.

    The behavioral description can be provided in the form of subprograms (functions or

    procedures).

    Behavioral level Description

    Circuit is described in terms of its operation over time.

    Representation might include, e.g., state diagrams, timing diagrams and algorithmic

    descriptions.

    The concept of time may be expressed precisely using delays (e.g., A = #10 B).

    If no actual delay is used, order of sequential operations is defined.

    In the lower level of abstraction (e.g., RTL) synthesis tools ignore detailed timing

    specifications.

    The actual timing results depend on implementation technology and efficiency of

    synthesis tools.

    There are few tools for behavioral synthesis.

    General format:

    always @(sensitivity_list)
    begin : block_name              // naming the block is optional; it permits local declarations
        [local_declarations]
        procedural_statements
        [wait_statement]
    end
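
    The difference between the description levels above can be illustrated with a one-bit full adder.
    The sketch below is written in Python purely as an executable illustration (it is not Verilog and
    not the project's code); the function names are hypothetical. The first function spells the
    adder out gate by gate, as a gate-level description would, while the second states only the
    arithmetic intent, as a behavioral description would. An exhaustive check confirms that the two
    descriptions agree.

```python
from itertools import product


def full_adder_gate_level(a, b, cin):
    """Gate-level view: the adder is spelled out as XOR, AND and OR gates."""
    axb = a ^ b                      # first XOR gate
    s = axb ^ cin                    # second XOR gate gives the sum bit
    cout = (a & b) | (axb & cin)     # two AND gates feeding an OR gate
    return s, cout


def full_adder_behavioral(a, b, cin):
    """Behavioral view: only the arithmetic intent is stated."""
    total = a + b + cin
    return total & 1, total >> 1


def equivalent():
    """Compare both descriptions over all 8 input combinations."""
    return all(
        full_adder_gate_level(a, b, c) == full_adder_behavioral(a, b, c)
        for a, b, c in product((0, 1), repeat=3)
    )


if __name__ == "__main__":
    print(equivalent())
```

    The behavioral form is shorter and leaves the gate structure to the synthesis tool, which is why
    low-level descriptions become impractical for large circuits.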

    CHAPTER 6

    SOFTWARE TOOLS

    6.1 SOFTWARE TOOL-XILINX:

    Xilinx ISE is a software tool produced by Xilinx for synthesis and analysis of HDL

    designs, which enables the developer to synthesize ("compile") their designs, perform timing

    analysis, examine RTL diagrams, simulate a design's reaction to different stimuli, and configure

    the target device with the programmer.

    Xilinx was founded in 1984 by two semiconductor engineers, Ross Freeman and Bernard

    Vonderschmitt, who were both working for integrated circuit and solid-state device manufacturer

    Zilog Corp.

    While working for Zilog, Freeman wanted to create chips that acted like a blank tape,

    allowing users to program the technology themselves. At the time, the concept was

    paradigm-changing. "The concept required lots of transistors and, at that time, transistors were considered

    extremely precious; people thought that Ross's idea was pretty far out", said Xilinx Fellow Bill

    Carter, who when hired in 1984 as the first IC designer was the company's eighth employee.

    Xilinx ISE is the software tool used here to simulate and synthesize designs written in VHDL. It is

    available in various versions, such as Xilinx 9.2i, Xilinx 10.1, Xilinx 10.5 etc. It provides various

    predefined libraries and packages.

    6.2 VERSION 9.2i:

    New Device Support.

    This release supports the new Spartan-3A DSP family.

    New Software Features.

    Following are the new features in this release.

    Operating System Support:

    Support for Windows Vista Business 32-bit operating system.

    This operating system is supported, but has had limited testing.

    Support for Windows XP Professional 64-bit operating system

    Support for Red Hat Enterprise WS 5.0 32-bit and 64-bit operating system. This operating

    system is supported, but has had limited testing.

    WHY XILINX ONLY?

    Many software tools, such as those from Cadence, can be used to simulate and synthesize VHDL

    designs. Compared with these tools, Xilinx ISE is cost effective.

    6.3 A BRIEF TUTORIAL: IMPLEMENTING VHDL DESIGNS USING XILINX ISE.

    This tutorial shows how to create, implement, simulate and synthesize VHDL designs

    for implementation in FPGA chips using Xilinx ISE 9.2i and ModelSim Xilinx Edition III v6.2g.

    1. Launch Xilinx ISE from either the shortcut on your desktop or from your Start menu

    under Programs -> Xilinx ISE 9.2i -> Project Navigator.

    2. Start a new project by clicking File -> New Project.

    3. In the resulting window, verify the Top-Level Source Type is VHDL. Change the

    Project Location to a suitable directory and give it whatever name you

    choose, e.g. lab3.

    4. The next window shows the details of the project and the target chip. We will be

    synthesizing designs into real chips so it is important to match the target chip with the

    particular board/chip you will be using. Beginning labs will be done in a Spartan 2E

    XC2S200E chip that comes in a PQ208 package with a speed grade of 6 as shown.

    5. Since we are starting a new design, the next couple of pop-up windows aren't relevant; just

    click Next and Next and Finish.

    6. You should now be in the main Project Navigator window. Select Project -> New

    Source from the menu.

    7. In the resulting pop-up window specify a VHDL Module source and give the file a name.

    I tend to just use the same name as the project itself, e.g. lab3. Click Next.

    8. The next pop-up window allows you to specify your inputs and outputs through the

    Wizard if you so desire. In this tutorial we will build a 2-to-1 multiplexer, so we can

    specify the inputs and outputs as shown below. Here, the default entity and

    architecture names have also been changed. Once all inputs and outputs are entered click

    Next and click Finish.

    9. You can see that the wizard has used STD_LOGIC as the default type for your signals

    and also filled in the basic entity and architecture details for you.

    10. Now you can fill in the rest of your code for your design. In this case, we can code the

    multiplexer as shown below. Make sure to frequently save your code.

    11. Once the code is entered, first check the syntax. Then set the source view to Behavioral

    Simulation and run the simulation of the design.

    12. After obtaining the simulation output, switch the source view to Synthesis/Implementation

    to obtain the synthesis report. A small window opens; click on it to view the synthesis

    report, the RTL schematic diagram and the technology schematic diagram.
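
    The expected behavior of the 2-to-1 multiplexer built in this tutorial can be written down as a
    small reference model and used to check the simulation waveform by hand. The sketch below is in
    Python, not VHDL, and is only an illustration: the signal names A, I0, I1 and Y follow the pin
    assignment used later in this report, and the convention that A = 0 selects I0 is an assumption,
    since the VHDL source itself is not reproduced here.

```python
from itertools import product


def mux2to1(a, i0, i1):
    """Reference model of a 2-to-1 multiplexer.

    Assumption: select a = 0 routes i0 to the output, a = 1 routes i1.
    """
    return i1 if a else i0


def truth_table():
    """Return rows (a, i0, i1, y) for all 8 input combinations."""
    return [(a, i0, i1, mux2to1(a, i0, i1)) for a, i0, i1 in product((0, 1), repeat=3)]


if __name__ == "__main__":
    print(" A I0 I1 | Y")
    for a, i0, i1, y in truth_table():
        print(f" {a}  {i0}  {i1} | {y}")
```

    Each row of the table can be compared against the simulated output Y for the same input values.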

    CHAPTER-7

    HARDWARE TOOLS

    A field-programmable gate array (FPGA) is a semiconductor device that can be configured by the

    customer or designer after manufacturing, hence the name "field-programmable". FPGAs are

    programmed using a logic circuit diagram or a source code in a hardware description language (HDL) to

    specify how the chip will work. They can be used to implement any logical function that an application-

    specific integrated circuit (ASIC) could perform, but the ability to update the functionality after shipping

    offers advantages for many applications.

    FPGAs contain programmable logic components called "logic blocks", and a hierarchy of

    reconfigurable interconnects that allow the blocks to be "wired together", somewhat like a one-chip

    programmable breadboard. Logic blocks can be configured to perform complex combinational functions,

    or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory

    elements, which may be simple flip-flops or more complete blocks of memory.

    7.1 HISTORY

    The FPGA industry sprouted from programmable read only memory (PROM) and programmable

    logic devices (PLDs). PROMs and PLDs both had the option of being programmed in batches in a factory

    or in the field (field programmable), however programmable logic was hard-wired between logic gates.

    Xilinx co-founders Ross Freeman and Bernard Vonderschmitt invented the first commercially

    viable field programmable gate array, the XC2064, in 1985. The XC2064 had programmable gates and

    programmable interconnects between gates, the beginnings of a new technology and market. The

    XC2064 boasted a mere 64 configurable logic blocks (CLBs), with two 3-input lookup tables (LUTs). More

    than 20 years later, Freeman was entered into the National Inventors Hall of Fame for his invention.

    7.2 ARCHITECTURE

    The most common FPGA architecture consists of an array of configurable logic blocks (CLBs), I/O

    pads, and routing channels. Generally, all the routing channels have the same width (number of wires).

    Multiple I/O pads may fit into the height of one row or the width of one column in the array.

    An application circuit must be mapped into an FPGA with adequate resources. While the

    number of CLBs and I/Os required is easily determined from the design, the number of routing tracks

    needed may vary considerably even among designs with the same amount of logic.

    Fig 7.1 Internal Structure of FPGA

    7.3 APPLICATIONS

    Applications of FPGAs include digital signal processing, software-defined radio, aerospace and

    defense systems, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography,

    bioinformatics, computer hardware emulation, radio astronomy and a growing range of other areas.

    SPECIFICATIONS OF SPARTAN-3 FPGA

    Figure 4.2 Image of Spartan-3E FPGA kit

    The Spartan-3 family of Field-Programmable Gate Arrays is specifically designed to

    meet the needs of high volume, cost-sensitive consumer electronic applications. The eight-

    member family offers densities ranging from 50,000 to five million system gates.

    The Spartan-3 family builds on the success of the earlier Spartan-IIE family by increasing

    the amount of logic resources, the capacity of internal RAM, the total number of I/Os, and the

    overall level of performance as well as by improving clock management functions. Numerous

    enhancements derive from the Virtex-II platform technology. These Spartan-3 FPGA

    enhancements, combined with advanced process technology, deliver more functionality and

    bandwidth per dollar than was previously possible, setting new standards in the programmable

    logic industry.

    Because of their exceptionally low cost, Spartan-3 FPGAs are ideally suited to a wide

    range of consumer electronics applications, including broadband access, home networking,

    display/projection and digital television equipment.

    The Spartan-3 family is a superior alternative to mask programmed ASICs. FPGAs avoid

    the high initial cost, the lengthy development cycles, and the inherent inflexibility of

    conventional ASICs. Also, FPGA programmability permits design upgrades in the field with no

    hardware replacement necessary, an impossibility with ASICs.

    4.2.2 FEATURES OF SPARTAN 3E

    Low-cost, high-performance logic solution for high-volume, consumer-oriented applications

    o Densities up to 74,880 logic cells

    SelectIO interface signaling

    o Up to 633 I/O pins

    o 622+ Mb/s data transfer rate per I/O

    o 18 single-ended signal standards

    o 8 differential I/O standards including LVDS, RSDS

    o Termination by Digitally Controlled Impedance

    o Signal swing ranging from 1.14V to 3.465V

    o Double Data Rate (DDR) support

    o DDR, DDR2 SDRAM support up to 333 Mbps

    Logic resources

    o Abundant logic cells with shift register capability

    o Wide, fast multiplexers

    o Fast look-ahead carry logic

    o Dedicated 18 x 18 multipliers

    o JTAG logic compatible with IEEE 1149.1/1532

    SelectRAM hierarchical memory

    o Up to 1,872 Kbits of total block RAM

    o Up to 520 Kbits of total distributed RAM

    Digital Clock Manager (up to four DCMs)

    o Clock skew elimination

    o Frequency synthesis

    o High resolution phase shifting

    Eight global clock lines and abundant routing

    Fully supported by Xilinx ISE and WebPACK software development systems

    4.2.3 ARCHITECTURAL OVERVIEW OF SPARTAN 3E

    Figure 4.3 Architectural overview of Spartan-3E FPGA kit

    The Spartan-3 family architecture consists of five fundamental programmable functional

    elements:

    Configurable Logic Blocks (CLBs) contain RAM-based Look-Up Tables (LUTs) to implement logic

    and storage elements that can be used as flip-flops or latches. CLBs can be programmed to

    perform a wide variety of logical functions as well as to store data.

    Input/Output Blocks (IOBs) control the flow of data between the I/O pins and the internal logic

    of the device. Each IOB supports bidirectional data flow plus 3-state operation.

    Digital Clock Manager (DCM) blocks provide self-calibrating, fully digital solutions for

    distributing, delaying, multiplying, dividing, and phase shifting clock signals.

    Block RAM provides data storage in the form of 18-Kbit dual-port blocks.

    Multiplier blocks accept two 18-bit binary numbers as inputs and calculate the product.
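
    How a RAM-based look-up table implements logic, as described for the CLBs above, can be sketched
    in a few lines. The model below is written in Python as an illustration only; the class name
    LookUpTable and the example function are hypothetical and are not taken from the Xilinx
    documentation. "Configuring" the LUT stores one output bit per input combination, and evaluating
    the logic function afterwards is just a memory read addressed by the inputs.

```python
from itertools import product


class LookUpTable:
    """Model of a k-input LUT: a (2**k x 1-bit) memory addressed by the inputs."""

    def __init__(self, num_inputs, func):
        self.num_inputs = num_inputs
        # Configuration step: store one output bit for every input combination.
        self.contents = [
            func(*self._address_to_bits(addr)) & 1
            for addr in range(1 << num_inputs)
        ]

    def _address_to_bits(self, addr):
        # Input i corresponds to bit i of the address.
        return tuple((addr >> i) & 1 for i in range(self.num_inputs))

    def read(self, *inputs):
        """Evaluate the function by reading the stored bit at the input address."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        addr = sum(bit << i for i, bit in enumerate(inputs))
        return self.contents[addr]


def example_function(a, b, c, d):
    """Hypothetical 4-input function used only to fill the table."""
    return (a & b) ^ (c | d)


if __name__ == "__main__":
    lut = LookUpTable(4, example_function)
    print(len(lut.contents))          # number of stored configuration bits
    print(lut.read(1, 1, 0, 0))
```

    A 4-input LUT therefore holds 16 configuration bits and can realize any Boolean function of its
    four inputs, which is why the same block can serve many different designs.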

    7.4 A BRIEF TUTORIAL: SOURCE CODE IS DUMPED INTO FPGA.

    1. Now let's look at the flow for actually synthesizing and implementing the design in the

    FPGA prototyping boards. Close ModelSim and go back to the Xilinx ISE environment.

    In the Sources subwindow change the selection in the dropdown box from Behavioral

    Simulation to Synthesis/Implementation.

    2. To properly synthesize the design we need to specify which pins on the chip all the inputs

    and outputs should be assigned to. In general of course we could assign the signals just

    about any way we want. Since we will be using specific prototype boards, we need to

    make sure our pin assignments match the switches, buttons, and LEDs so we can test our

    design. We will be starting with Digilab 2E boards that are connected to Digilab DIO2

    input/output boards. The I/O board has already been programmed and configured to have

    the following connections:

    3. To assign specific pins, expand the User Constraints selection under the Process

    subwindow and double-click on Assign Package Pins.

    4. A new application called Xilinx PACE should be launched.

    a. In the Design Object List subwindow you should see a listing of all the input and

    output signals from our design.

    Here is where we can specify which pin locations we want for each signal. Simply

    enter the pin numbers from the tables shown in Step 2 above, making sure to use a

    capital letter P in front of the pin specification. Let's assign our signals as follows:

    A  -> P163 (Switch 1)

    I0 -> P164 (Switch 2)

    I1 -> P166 (Switch 3)

    Y  -> P149 (LED 0)

    Once all pins have been assigned, save your constraints by selecting File -> Save

    from the menu bar and exit Xilinx PACE.

    5. Back in Xilinx ISE, in the Processes subwindow, double-click on the Synthesize - XST

    selection and wait for the process to complete. Then double-click on the Implement

    Design selection and wait for the process to complete. Then double-click on the

    Generate Programming File selection and wait for the process to complete. If all goes

    well, you should have green check marks for the whole design.

    6. There is a lot of information you can obtain through all of the objects listed in the

    Processes subwindow, but let us proceed to downloading the design onto the prototyping

    board for testing. First make sure the prototyping board is connected to the PC and has

    power on. Also make sure the slide switch on the FPGA board by the parallel port is set

    to JTAG (as opposed to Port). Then select Configure Device (iMPACT) underneath

    the Generate Programming File selection. You should see the following window.

    7. Now you need to specify which bitstream file to use to configure the device. For this

    tutorial we want to select the mux.bit file and click Open.

    You will probably get the message below. Just click Yes.

    You will also get a warning message saying the JTAG clock was updated in the bitstream

    file (which is good) so just click OK. There is a way to correct for that in the original

    design flow, but Xilinx automatically catches it here so I don't usually bother.

    8. You should now see the Spartan XC2S200E chip in the main window. Right click on the

    chip to prepare for downloading the bitstream file.

    Select Program on the resulting window.

    9. Click OK.

    If all goes well you should get the Programming Succeeded message

    10. Now just test and verify your design on the actual FPGA board!
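
    The pin assignments entered in Xilinx PACE in step 4 are saved by the tool as location
    constraints in a user constraints file. As an illustration of what such constraints look like in
    text form, the sketch below generates one constraint line per signal. It is written in Python and
    is only a sketch: the helper name ucf_lines is hypothetical, and the exact file contents written
    by PACE may differ from the plain NET/LOC form shown here.

```python
# Signal-to-pin mapping taken from the pin assignment in step 4 of this tutorial.
PIN_ASSIGNMENTS = {
    "A": "P163",    # Switch 1
    "I0": "P164",   # Switch 2
    "I1": "P166",   # Switch 3
    "Y": "P149",    # LED 0
}


def ucf_lines(assignments):
    """Return one location-constraint line per signal (hypothetical helper)."""
    lines = []
    for net, pin in assignments.items():
        # The tutorial requires a capital letter P in front of the pin number.
        if not (pin.startswith("P") and pin[1:].isdigit()):
            raise ValueError(f"unexpected pin name: {pin}")
        lines.append(f'NET "{net}" LOC = "{pin}";')
    return lines


if __name__ == "__main__":
    print("\n".join(ucf_lines(PIN_ASSIGNMENTS)))
```

    Keeping the mapping in one place makes it easy to check it against the board's switch and LED
    connections before implementing the design.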

    SIMULATION RESULTS

    SYNTHESIS REPORT

    =========================================================================
    *                            Final Report                               *
    =========================================================================
    Final Results
    RTL Top Level Output File Name     : topmodule_mac.ngr
    Top Level Output File Name         : topmodule_mac
    Output Format                      : NGC
    Optimization Goal                  : Speed
    Keep Hierarchy                     : No

    Design Statistics
    # IOs                              : 10

    Cell Usage :
    # BELS                             : 45
    #      GND                         : 1
    #      INV                         : 2
    #      LUT2                        : 2
    #      LUT3                        : 6
    #      LUT3_D                      : 2
    #      LUT3_L                      : 2
    #      LUT4                        : 23
    #      LUT4_L                      : 5
    #      MUXF5                       : 2
    # FlipFlops/Latches                : 19
    #      FDCE                        : 19
    # Clock Buffers                    : 1
    #      BUFGP                       : 1
    # IO Buffers                       : 9
    #      IBUF                        : 1
    #      OBUF                        : 8
    =========================================================================

    Device utilization summary:
    ---------------------------
    Selected Device : 3s500efg320-5

     Number of Slices:                 21  out of   4656     0%
     Number of Slice Flip Flops:       19  out of   9312     0%
     Number of 4 input LUTs:           42  out of   9312     0%
     Number of IOs:                    10
     Number of bonded IOBs:            10  out of    232     4%
     Number of GCLKs:                   1  out of     24     4%

    ---------------------------
    Partition Resource Summary:
    ---------------------------
      No Partitions were found in this design.
    ---------------------------

    =========================================================================
    TIMING REPORT

    NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
          FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
          GENERATED AFTER PLACE-and-ROUTE.

    Clock Information:
    ------------------
    -----------------------------------+------------------------+-------+
    Clock Signal                       | Clock buffer(FF name)  | Load  |
    -----------------------------------+------------------------+-------+
    clk                                | BUFGP                  | 19    |
    -----------------------------------+------------------------+-------+

    Asynchronous Control Signals Information:
    ----------------------------------------
    -----------------------------------+------------------------+-------+
    Control Signal                     | Buffer(FF name)        | Load  |
    -----------------------------------+------------------------+-------+
    rst                                | IBUF                   | 19    |
    -----------------------------------+------------------------+-------+

    Timing Summary:
    ---------------
    Speed Grade: -5

       Minimum period: 4.588ns (Maximum Frequency: 217.958MHz)
       Minimum input arrival time before clock: No path found
       Maximum output required time after clock: 8.868ns
       Maximum combinational path delay: No path found

    Timing Detail:
    --------------
    All values displayed in nanoseconds (ns)

    =========================================================================
    Timing constraint: Default period analysis for Clock 'clk'
      Clock period: 4.588ns (frequency: 217.958MHz)
      Total number of paths / destination ports: 148 / 38
    -------------------------------------------------------------------------
    Delay:               4.588ns (Levels of Logic = 4)
      Source:            accumulator_4 (FF)
      Destination:       accumulator_7 (FF)
      Source Clock:      clk rising
      Destination Clock: clk rising

      Data Path: accumulator_4 to accumulator_7
                                    Gate     Net
        Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
        ----------------------------------------  ------------
         FDCE:C->Q            11   0.514   0.823  accumulator_4 (accumulator_4)
         LUT3:I2->O            1   0.612   0.509  csa/st2[5].fa2/cout1_SW3_SW1_SW0 (N31)
         LUT4:I0->O            1   0.612   0.000  csa/st2[5].fa2/cout1_SW3_F (N53)
         MUXF5:I0->O           1   0.278   0.360  csa/st2[5].fa2/cout1_SW3 (N11)
         LUT4:I3->O            2   0.612   0.000  csa/st2[7].fa2/Mxor_s_xo1 (acc)
         FDCE:D                    0.268          accumulator_7
        ----------------------------------------
        Total                      4.588ns (2.896ns logic, 1.692ns route)
                                           (63.1% logic, 36.9% route)

    =========================================================================
    Timing constraint: Default OFFSET OUT AFTER for Clock 'clk'
      Total number of paths / destination ports: 117 / 8
    -------------------------------------------------------------------------
    Offset:              8.868ns (Levels of Logic = 6)
      Source:            accumulator_4 (FF)
      Destination:       fpgaout (PAD)
      Source Clock:      clk rising

      Data Path: accumulator_4 to fpgaout
                                    Gate     Net
        Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
        ----------------------------------------  ------------
         FDCE:C->Q            11   0.514   0.823  accumulator_4 (accumulator_4)
         LUT3:I2->O            1   0.612   0.509  csa/st2[5].fa2/cout1_SW3_SW1_SW0 (N31)
         LUT4:I0->O            1   0.612   0.000  csa/st2[5].fa2/cout1_SW3_F (N53)
         MUXF5:I0->O           1   0.278   0.360  csa/st2[5].fa2/cout1_SW3 (N11)
         LUT4:I3->O            2   0.612   0.410  csa/st2[7].fa2/Mxor_s_xo1 (acc)
         LUT4:I2->O            1   0.612   0.357  mac1 (fpgaout_7_OBUF)
         OBUF:I->O                 3.169          fpgaout_7_OBUF (fpgaout)
        ----------------------------------------
        Total                      8.868ns (6.409ns logic, 2.459ns route)
                                           (72.3% logic, 27.7% route)

    =========================================================================

    Total REAL time to Xst completion: 28.00 secs
    Total CPU time to Xst completion: 28.01 secs

    -->

    Total memory usage is 261200 kilobytes
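
    The timing figures in the report can be cross-checked with simple arithmetic: each path total is
    the sum of its gate (logic) delays and net (route) delays, and the maximum clock frequency is the
    reciprocal of the minimum period. The short Python sketch below performs this check using the
    values copied from the two data paths listed in the report. It is an illustration only; the names
    used in it are hypothetical, and the numbers remain synthesis estimates, as the report itself
    notes.

```python
# (gate delay, net delay) in ns for each row of the two data paths in the report.
CLK_TO_CLK_PATH = [
    (0.514, 0.823),   # FDCE:C->Q, accumulator_4
    (0.612, 0.509),   # LUT3
    (0.612, 0.000),   # LUT4
    (0.278, 0.360),   # MUXF5
    (0.612, 0.000),   # LUT4
    (0.268, 0.000),   # FDCE:D entry at accumulator_7
]
CLK_TO_PAD_PATH = [
    (0.514, 0.823),   # FDCE:C->Q, accumulator_4
    (0.612, 0.509),   # LUT3
    (0.612, 0.000),   # LUT4
    (0.278, 0.360),   # MUXF5
    (0.612, 0.410),   # LUT4
    (0.612, 0.357),   # LUT4 (mac1)
    (3.169, 0.000),   # OBUF driving the fpgaout pad
]


def path_delay(stages):
    """Return (logic, route, total) delay in ns, rounded to 3 decimals."""
    logic = sum(gate for gate, _ in stages)
    route = sum(net for _, net in stages)
    return round(logic, 3), round(route, 3), round(logic + route, 3)


def max_frequency_mhz(period_ns):
    """Convert a minimum clock period in ns to a maximum frequency in MHz."""
    return 1000.0 / period_ns


if __name__ == "__main__":
    for name, stages in (("clk to clk", CLK_TO_CLK_PATH), ("clk to pad", CLK_TO_PAD_PATH)):
        logic, route, total = path_delay(stages)
        print(f"{name}: {total} ns ({100 * logic / total:.1f}% logic)")
    print(f"fmax = {max_frequency_mhz(4.588):.2f} MHz")
```

    The sums reproduce the reported totals of 4.588 ns and 8.868 ns. The frequency computed from the
    rounded period is about 217.96 MHz, marginally different from the reported 217.958 MHz,
    presumably because the tool works from an unrounded period.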

    CONCLUSION

    Optimized and synthesizable VHDL code has been developed for the implementation of a 64-bit MAC

    unit. Each module was tested with sample vectors, and the outputs matched the expected results

    with minimal delay. Since the delay of the 64-bit unit is small, this design can be used in systems that

    require high-performance processors operating on a large number of bits.

    FUTURE SCOPE

    The future scope of this project is to design a 128-bit MAC unit. This would be even faster,

    but at the expense of some additional hardware. More specifically, an 8-tap filter can be designed

    with the present architecture.
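
    How a MAC unit would serve an 8-tap filter can be sketched with a small behavioral model: each
    output sample is the sum of eight products, accumulated one multiply-accumulate step at a time.
    The Python model below is an illustration only and does not reproduce the project's VHDL. The
    class and function names, the unsigned operands, the wrap-around behavior on overflow and the
    example coefficients are all assumptions made for this sketch.

```python
class MacUnit:
    """Behavioral model of a multiply-accumulate (MAC) unit.

    Assumption: operands are unsigned and the accumulator wraps modulo
    2**width when it overflows.
    """

    def __init__(self, width=64):
        self.mask = (1 << width) - 1
        self.acc = 0

    def reset(self):
        self.acc = 0

    def step(self, a, b):
        """One MAC operation: acc <- acc + a * b, truncated to the register width."""
        self.acc = (self.acc + a * b) & self.mask
        return self.acc


def fir_output(mac, samples, coefficients):
    """Compute one FIR output sample as a sum of products using the MAC unit."""
    if len(samples) != len(coefficients):
        raise ValueError("need one coefficient per tap")
    mac.reset()
    for x, h in zip(samples, coefficients):
        mac.step(x, h)
    return mac.acc


if __name__ == "__main__":
    taps = [1, 2, 3, 4, 4, 3, 2, 1]        # hypothetical 8-tap coefficient set
    window = [5, 0, 1, 2, 0, 0, 3, 1]      # hypothetical most recent 8 input samples
    print(fir_output(MacUnit(), window, taps))
```

    With this arrangement an 8-tap filter needs eight MAC steps per output sample, so the delay of
    the MAC unit directly bounds the achievable sample rate.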
