
EE590 Final Project: Power Minimization

Ryan Conrad, John Martin


ABSTRACT

The purpose of this project is to research and implement ways to reduce the power consumption of digital circuits through hardware design. We will find the major sources of power consumption in a typical FPGA design, and then we will find ways to decrease the amount of power consumed by each source. As time permits, we will implement the power reduction methods in a simple FPGA design and measure the resulting power difference through simulation and/or measurement on a functioning FPGA. Design, simulation, and implementation of FPGA designs will be performed using an Altera Stratix Field Programmable Gate Array and Altera’s suite of hardware design software. This suite includes the Quartus II 5.0 hardware design IDE for Altera FPGAs, SOPC (System on a Programmable Chip) Builder, and the Nios II Development Kit for software compilation specific to a custom processor.

INTRODUCTION

We have chosen to research methods of reducing power consumed by electronic devices. Based on what we uncover, we plan to implement a few methods to reduce the actual power consumed by a simple digital circuit design. These methods will be based upon or inspired by methods discovered in our research. In particular, we are interested in reducing the power consumed by a Field Programmable Gate Array (FPGA). This medium was chosen for several reasons. We believe that FPGAs and their relatives play a major role in the future of electronics, and have a desire to learn more about FPGAs and the design methods and tools associated with them. FPGAs also offer an ideal medium to test our power reduction schemes because they can be modified with relative ease to implement various designs of the same circuit without altering system externals or test equipment setup. This will result in a minimum of error in the differences in measured power due to changes in component configurations and test equipment. Finally, the design tools for FPGAs include the ability to simulate the power dissipated in a design implementation based on the configuration of the FPGA and user-defined input signals. Results of the design implementations and a comparison of the power differences will be presented.

DISCUSSION

SYSTEM REQUIREMENTS SPECIFICATIONS

Functional Requirements

The designed systems are mainly for research purposes. The goal is to extract data, both simulated and actual, collected on an FPGA. This data will quantitatively determine which methods work best for reducing power consumption in PC and embedded applications. The final system then must be set up so as to prove the results found previously, in a real-world application.


Logic to be tested must cover a wide range of implementations for the same function. The goal is to find implementations that cover a wide range of power consumption, and with as few tradeoffs as possible.

Simulated systems must provide data for both power consumption and propagation delays for each hardware design. All simulations must be able to output propagation delays for every block and path of a design, and must include a power analyzer to report the static and dynamic power dissipation of a design. Power simulations will be run over a 1 us period, with a 20 ns clock period where a clock applies. Addition simulations will run through as many transitions (addition operations) as possible during the 1 us period. Additions are to be planned such that each operation changes as many bits/gates as possible, to give a worst-case estimate of power consumption for each type of logic. The same requirements apply to shift register simulations. Timing analyses must attempt to account for worst-case propagation delays.

Actual data collected from the FPGA should be similar to that of the simulated data collected, in order to make comparisons between simulations and real-world extracted data. Power measurements are expected to have about 1mW resolution based on initial tests. Timing analyses are expected to have about 1ns of resolution, or run at a 1GHz clock.

In order for the functional processor test to prove these concepts, it must implement all of the different types of hardware previously tested, and also have a fully functional processor on chip as well. We must be able to demonstrate that it can run a function either in software or in our additional hardware. Tests run must run a set of instructions solely on the processor, and then run the same instruction set, optimizing for either the processor or our hardware. Data collected must be able to show quantitatively power consumption (average over a period of time) and the amount of time required to complete.

System Inputs

The following inputs were used in the simulation and measurement of the adder (and shift register) power dissipation independently of the processor (Processor was not on the chip while the simulation and measurements were accomplished).

Addition Simulations
- 32 bit A and B buses
- 32 bit A and B buses vs. 8 bit A and B buses

Shift Register Simulations
- 32 bit shift bus
- 32 bit shift bus vs. 8 bit shift bus
- 50 MHz clock

Addition FPGA Measurements
- start button
- clk (internal to board)

Shift Register FPGA Measurements
- start button
- single LED output
- clk (internal to board)

The following inputs were used in the implementation of the entire system, including the processor and peripherals.

Processor/Hardware Accelerator
- Clk (variable)
- Add 1, 2, 4, or 8

System Outputs

The following outputs were used in the simulation and measurement of the adder (and shift register) power dissipation independently of the processor (Processor was not on the chip while the simulation and measurements were accomplished).

Addition Simulations
- 32 bit sumOut
- 32 bit sumOut vs. 8 bit sumOut

Shift Register Simulations
- 32 bit shiftOut
- 32 bit shiftOut vs. 8 bit shiftOut

Addition FPGA Measurements
- single LED output
- clkOut (timing only)
- timing complete (timing only)
- startOut (timing only)

Shift Register FPGA Measurements
- single LED output
- clkOut (timing only)
- timing complete (timing only)
- startOut (timing only)

The following outputs were used in the implementation of the entire system, including the processor and peripherals.

Processor/Hardware Accelerator
- 8 LED bank

User Interface

It doesn’t make a lot of sense to refer to a user interface for the simulation part of this design, so all user interfaces will refer to actual hardware implementations.

Addition/shifting

The user will have a start/stop button on the Altera Nios development board to pause the addition or shift operation during power measurements. An oscilloscope will read out two voltage levels, with the product of the two representing the power dissipation. For timing measurements the user will have a start button, and the logic analyzer will capture clkOut and timingFinished for measurements of propagation delay.

Processor Hardware

On the Nios development board the user will have four buttons, which select addition operations that add 1, 2, 4, or 8 to the accumulating buffer. The initial implementation of this will be on 8 bits, to be extended to 32 bits eventually. The results of the addition will be displayed as an 8-bit binary number on a bank of LEDs. The other outputs the user will see are identical to those of the preceding activities: power measurements on the oscilloscope for comparison, and timing measurements on the logic analyzer.

Operating Specifications

The system shall operate in a standard commercial / industrial environment.

Temperature Range: 0-85 C
Power: 120 VAC 60 Hz / 9 VDC

SYSTEM DESIGN SPECIFICATIONS

We are to design a circuit we can use to test our chosen methods of power reduction. The circuit we have chosen to implement will add a value based on user input to an accumulator and display the sum in binary on a bank of LEDs. The design will use a custom processor with a custom-designed peripheral to perform the actual addition.

Modifications to the circuit which we will implement in an attempt to decrease the power dissipated will focus on three areas. The first area involves the design of the custom adder. Several versions of the adder will be implemented, including a ripple adder, a carry look-ahead adder, and hybrid combinations of the two. The parallel adder has been labeled a 32X2 adder, meaning it has two blocks, or one ripple chain. The 32X4 and 32X8 have respectively four blocks or 2 ripples, and 8 blocks or 4 ripples. The ripple adder has 32 ripple blocks. The respective power usage of each type of adder will be simulated and measured. In order to accomplish this, the peripheral adder will be required to be strictly modular, with each implementation of an adder interfacing with the same ‘wrapper’ hardware, which will in turn interface properly with the processor. Also to be simulated and measured are shift registers. We have 4 different shift registers: one shifts by one, shifting in a zero; the others shift by 2, 4, and 8, each shifting zeros into the applicable bits.
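As a rough behavioral sketch (not the project's hardware description), the adder and shifter variants described above can be modeled in Python. The function names are hypothetical, Python integer addition stands in for a carry look-ahead block, and the left-shift direction is an assumption, since the text does not state which way the registers shift.

```python
MASK32 = 0xFFFFFFFF


def full_adder(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a, b, width=32):
    """Ripple adder: the carry passes through every bit position in turn."""
    carry, total = 0, 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total  # carry-out is dropped


def blocked_add(a, b, blocks, width=32):
    """Hybrid adder: each block is summed in one step (a stand-in for a
    carry look-ahead block); only the carries between blocks ripple."""
    bits = width // blocks
    block_mask = (1 << bits) - 1
    carry, total = 0, 0
    for k in range(blocks):
        x = (a >> (k * bits)) & block_mask
        y = (b >> (k * bits)) & block_mask
        s = x + y + carry
        total |= (s & block_mask) << (k * bits)
        carry = s >> bits
    return total  # carry-out is dropped


def shift_in_zeros(value, n, width=32):
    """Shift by n positions (assumed leftward), filling with zeros."""
    return (value << n) & ((1 << width) - 1)
```

All variants produce the same sum; they differ only in how far a carry has to travel, which is what drives the speed and power trade-off that the simulations measure.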

The second modification is in the frequency of the clock. The frequency will be varied from a maximum of 50 MHz to a minimum frequency which will allow the processor to completely finish its operations prior to the earliest possible subsequent input. This results in requirements to have various read and write latencies for the processor with respect to the custom adder. For slower implementations of the adder (e.g., a full ripple adder) working at the highest frequencies, a stall cycle will be required to allow the adder to complete its operation prior to the processor reading the results.

The third area of modification will involve a method of stalling the processor when it is not needed (i.e., between inputs). In our implementation this will involve controlling the clock in such a manner that the processor does not see a clock cycle unless it is actually processing information.

The final implementation, utilizing all of the modifications for a power-stingy design is outlined herein.

System Inputs/Outputs

Inputs to the system consist of 5 push-buttons and a clock, all of which are included in the Nios Development board. The 5 push-buttons are debounced, logic low switches. The clock is used at two frequencies: 50MHz (this is the highest speed of the on-board clock), and 210 Hz.

The latter frequency is the lowest operating frequency at which the system can still recognize subsequent user inputs at the minimum possible interval between inputs, and fully execute the necessary code prior to the following input signal. The applicable block of code will always fully execute if given a minimum of 21 clock cycles following the receipt of the input signal. This includes one clock cycle for set-up time of the incoming signal, 19 cycles to perform the read, write, branch, and compare operations, and one cycle for jitter/insurance. In our attempts to push the buttons as fast as possible, we were unable to initiate more than 7 signals in a single second. Assuming that someone can push the button faster than us, we set the limit at 10 signals per second. Therefore, in order to meet our requirements, we must be able to complete 21 clock cycles in 100 ms. This yields a maximum clock period of 4.7619 ms, or a minimum frequency of 210 Hz.
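The arithmetic behind the 210 Hz figure can be checked with a short Python sketch. The constant and function names are hypothetical; the numbers are the ones given in the text.

```python
CYCLES_PER_INPUT = 21        # 1 set-up + 19 processing + 1 margin cycle
MAX_INPUTS_PER_SECOND = 10   # assumed upper bound on button presses


def slowest_usable_clock(cycles, inputs_per_second):
    """Return (period in seconds, frequency in Hz) of the slowest clock
    that still fits `cycles` clock cycles between two inputs."""
    window_s = 1.0 / inputs_per_second
    period_s = window_s / cycles
    return period_s, 1.0 / period_s


period_s, freq_hz = slowest_usable_clock(CYCLES_PER_INPUT, MAX_INPUTS_PER_SECOND)
print(f"period = {period_s * 1e3:.4f} ms, frequency = {freq_hz:.0f} Hz")
```

Raising the assumed input rate or the cycle count raises the required frequency proportionally, so the 210 Hz value is only as good as the 10 presses per second estimate.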

User Interface

The user is able to select the following options using the switches on the development board:
- Add 1
- Add 2
- Add 4
- Add 8
- Reset: resets the accumulator to 0000

Design Procedure

The first task was to implement several versions of an adder, including parallel (carry look-ahead), ripple, and hybrid. Each design was created to interface with a wrapper module that would allow easy substitution of distinct adder implementations. The wrapper module was designed, simulated, and implemented using test cases that would exercise each bit as regularly as possible. The resultant power dissipations and timing delays, as simulated and as actually measured, can be found in the results section of this paper.

A similar process was followed for shifters, with implementations designed to shift the bits of the register by 1, 2, 4, and 8. The results of the simulations can be found in the results section of this report. The shifter designs were not implemented in the processor peripheral designs due to time considerations.

After the initial simulations of each adder implementation, a peripheral adder module was set up to interface with an Altera Nios II processor, and a device driver and API were created to interface the hardware with the software that would be written to utilize the custom adders. Software was then written to utilize the processor with and without the peripheral custom adders. This was fairly simple to implement, as the Nios II processor treats its custom peripherals as memory. As such, an interface was designed to correctly interpret a write signal and data to fill in the appropriate register, while a read signal with the correct address on the address bus results in the data from the sum register being written to the appropriate processor data bus. The API/driver consisted mainly of modifying existing commands for writing to memory so that they would write to the adder registers instead.

Finally, hardware was designed which would allow the processor to be paused, and the software was modified to use this hardware to pause the processor on completion of a task. This pausing process was implemented with the clock upstream of the processor being enabled by an AND gate and an S-R flip-flop. The flip-flop initializes to an asserted state, and the clock propagates through the AND gate to the processor until the processor has completed its initialization routine. The processor then sends a ‘reset’ signal to the flip-flop, which disables the clock, pausing the processor. When an input signal comes in, the S-R flip-flop receives a ‘preset’ signal, the clock is enabled, the processor handles the data, and on completion of its task it sends a ‘reset’ signal to the S-R flip-flop, pausing the processor until a new input signal arrives.

System Description

The public interface to this system is composed of a master reset button and 4 switches. Each switch corresponds to an instruction to add a constant value (1, 2, 4, or 8) to the current value of the accumulator register. The current value of the accumulator is displayed in binary on the 8 LEDs available on the development board. In addition to these system signals, there are signals external to the system which are used to calculate the amount of power being dissipated. This set-up consists of two probes connecting to an oscilloscope, and one 1-ohm resistor in series with the power supply to aid in the measurement of current. For more information on measuring power on this system, see the testing section.
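A minimal sketch of how the two probe readings turn into a power figure is shown below. It assumes that one probe reads the voltage across the board and the other reads the drop across the 1-ohm series resistor; the report does not spell out the probe placement, and the function names and sample voltages are made up for illustration.

```python
SHUNT_OHMS = 1.0  # series resistor value from the text


def board_power(v_board, v_shunt, r_shunt=SHUNT_OHMS):
    """Power in watts: board voltage times the current inferred from
    the voltage drop across the series resistor (I = V / R)."""
    current = v_shunt / r_shunt
    return v_board * current


def average_power(samples, r_shunt=SHUNT_OHMS):
    """Average power over (v_board, v_shunt) sample pairs, e.g. the
    points captured in one oscilloscope window."""
    powers = [board_power(vb, vs, r_shunt) for vb, vs in samples]
    return sum(powers) / len(powers)


# Hypothetical readings, not measured data.
print(board_power(9.0, 0.25))                        # 2.25
print(average_power([(9.0, 0.25), (9.0, 0.35)]))
```

With a 1-ohm resistor the second voltage is numerically equal to the current in amperes, which is why the product of the two voltage readings can be read directly as power.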

Software Implementation

The software portion of this system was developed using Altera’s software development environment, the Nios II IDE. The code is extremely simple, involving only a loop which checks which (if any) input signal is asserted, adds the value corresponding to that input signal to the value currently in the accumulator, and writes the lowest eight bits of the value to the register corresponding to the 8-bit LED bank. Adding values involves the use of the custom peripheral adder, so that portion of the code is user-defined and uses an API written for this project to allow use of the peripheral. The functions in this API are available as an assembly-level function set, each writing to a specific register within the peripheral adder or reading the value in the output register. This gives the programmer maximum flexibility to optimize for different uses. For example, if one is adding a constant value to many different variables, it is undesirable to write the same constant value to one of the registers in the adder every time this operation is performed. In addition to the assembly-level function set, there is a C-level instruction which writes to each addend register in succession and then returns the sum. See the Appendix for the complete code. In addition to this software, some of the hardware was implemented using Verilog behavioral statements.

Hardware Implementation

Figure 1. Hardware Diagram for additional Adder Logic to be used with Embedded Processor


Adder Peripheral

The peripheral adder has four inputs and an output. The module includes the adder, three 8-bit registers, and hardware to interface with the processor. The actual adder portion consists entirely of combinational logic and is interchangeable, which makes it easy to choose the most appropriate adder for the intended purpose (trade-offs can be made with respect to speed and power consumption).

The 3 8-bit wide registers are D-flip-flops with parallel inputs and outputs, sharing the same clock. The output register is clocked on a negative edge, while the input registers are clocked on a positive edge of the clock. This allows the processor to read a calculated value on the instruction immediately following a write without having to include a stall instruction (assuming the design of the adder is chosen in such a way that the adder operation will complete in less than one-half the period of the clock). The clock signal is only sent to the decoders and registers when the processor is actually processing (pausing the processor also pauses the registers), preventing unnecessary losses due to unused clock signal transmissions.

The interface portion of the peripheral adder allows the processor to treat the peripheral as low-latency memory. This is a common practice. When a write signal and the correct address are fed to the decoder, it allows the clock signal to reach the intended input register, writing the information contained on the processor’s write_data bus to the register. A read signal arriving in conjunction with the address for the output register routes the output of the register to the read_data bus of the processor.

The adder peripheral does not currently support overflow or carry-out outputs, but these would be relatively simple to add to the design.
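The register interface described above can be approximated by a small behavioral model in Python. This is only a sketch: the class name, the register offsets, and the `add` helper are hypothetical, and the clock-edge timing of the real registers is not modeled.

```python
class AdderPeripheral:
    """Behavioral model of a memory-mapped adder with two 8-bit input
    registers and one 8-bit output register."""

    ADDR_A, ADDR_B, ADDR_SUM = 0, 1, 2  # hypothetical register offsets

    def __init__(self):
        self.reg_a = 0
        self.reg_b = 0
        self.reg_sum = 0

    def write(self, addr, data):
        """Decode the address and load the selected input register."""
        if addr == self.ADDR_A:
            self.reg_a = data & 0xFF
        elif addr == self.ADDR_B:
            self.reg_b = data & 0xFF
        else:
            raise ValueError("address is not a writable register")
        # The combinational adder result is captured by the output
        # register; there is no carry-out, so the sum wraps at 8 bits.
        self.reg_sum = (self.reg_a + self.reg_b) & 0xFF

    def read(self, addr):
        """Route the output register to the read data bus."""
        if addr != self.ADDR_SUM:
            raise ValueError("address is not a readable register")
        return self.reg_sum


def add(periph, a, b):
    """Write both addends in succession, then return the sum."""
    periph.write(periph.ADDR_A, a)
    periph.write(periph.ADDR_B, b)
    return periph.read(periph.ADDR_SUM)
```

Because the input registers hold their values, a caller that adds the same constant repeatedly only needs to write that constant once and can then update the other addend alone.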

Figure 2. Logic diagram for a traditional CPU performing addition

Known/Regular Period Implementation of System

If the input signal will always come at a known, regular interval, this is an optimal set-up for the system. Four inputs to the system are commands to add 1,2,4 or 8 to the accumulator on the CPU. The clock is chosen to be at a frequency where 19 clock cycles will take place between each occurrence of an input signal. This results in a system where every clock-cycle performs needed work, and no extra clock cycles are propagated nor logic gates changing levels unnecessarily. When combined with a low-power peripheral adder, maximum power savings are achieved. See the section on the analysis of results for further details.

[Figure: schematic showing an S-R flip-flop (inputs S and R, output Q), the CPU, and the 50 MHz clock source]

Figure 3. Hardware Logic for Setting up Adder Accumulator

Unknown/Irregular Period Implementation of system

This is the optimized implementation of the system when the period of the input signal is unknown or irregular. In this case, the four input signals going to the CPU are also routed to the ‘Set’ input of an S-R flip-flop used as a clock-enable. An input signal allows the clock signal to reach the CPU, un-pausing the processor. 21 clock cycles later, the CPU sends a logic-low signal to the ‘Reset’ input of the S-R flip-flop, and the clock is disabled, pausing the processor once again. In this case, the clock frequency is chosen to allow 21 clock cycles between two consecutive inputs at the lowest expected time delay between inputs, and the peripheral adder is chosen to allow a read operation on the clock cycle immediately following the write operation. Typically this will result in a wide range of acceptable input clock frequencies. This allows for a less accurate (cheaper) clock source, and if division of the clock is necessary, a division method resulting in low power consumption should be achievable. The only disadvantage of this method vs. the previous method is the two additional clock cycles needed to set and reset the S-R flip-flop. In many cases this disadvantage is outweighed by the less demanding clock. See the results analysis section for further details.
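The effect of the clock-enable can be illustrated with a tick-level Python model. It is a simplification under stated assumptions: time is counted in whole clock ticks, the initialization phase is skipped, an input that arrives while the clock is already enabled is ignored, and the function name is hypothetical.

```python
CYCLES_PER_TASK = 21  # cycles the CPU needs per input, from the text


def gated_cycles(input_ticks, total_ticks, cycles_per_task=CYCLES_PER_TASK):
    """Count the clock cycles that reach the CPU when an S-R flip-flop
    gates the clock: set by an input, reset by the CPU when done."""
    pending = set(input_ticks)
    enabled = False      # flip-flop output (clock enable)
    remaining = 0
    delivered = 0
    for tick in range(total_ticks):
        if tick in pending and not enabled:
            enabled = True            # 'Set' from the input signal
            remaining = cycles_per_task
        if enabled:
            delivered += 1            # clock passes through the AND gate
            remaining -= 1
            if remaining == 0:
                enabled = False       # 'Reset' from the CPU
    return delivered


# One second at 210 Hz with three button presses at arbitrary ticks.
print(gated_cycles([0, 50, 120], 210))   # 63 of 210 cycles reach the CPU
```

With sparse inputs most clock cycles never reach the processor; with inputs arriving every 21 ticks the gate stays open and the count equals the ungated case.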


TEST PLAN

Addition hardware will be first tested with two inputs switching, just to get an idea of power consumption. Afterwards a simulation will be run with adders to feed a constant B input in, and feed back the output to A at a regular clock interval. This same setup will be used for taking power measurements on the FPGA.

Shift register testing will be accomplished in simulation with the input to be shifted counting upwards by a random value. Simulation will also be run with a constant shift of varying shift lengths, with the output fed back into the input to be shifted. Timing will be measured as how long it takes the start value to shift up to the Nth bit. FPGA measurements will also be performed in the same manner. The feedback will be performed internally to the device.

The following diagram shows the setup for testing both timing and latency for adders. The value of 0x5555 5555 was picked on the assumption that it causes the greatest number of bits to change, thereby giving the worst-case power consumption and propagation delays.

Figure 4. Setup for running initial FPGA measurements with an adder
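To get a feel for how much switching the 0x5555 5555 test value produces, the Python sketch below counts how many output bits change on each step when the sum is fed back into input A. It assumes the accumulator starts at zero, which the test plan does not state, and it counts only output bits, not internal gate activity; the names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
B_CONST = 0x55555555  # constant B input from the test plan


def toggled_output_bits(steps, b=B_CONST, start=0):
    """Feed the sum back into A each step and count how many of the
    32 output bits change between consecutive sums."""
    a = start
    counts = []
    for _ in range(steps):
        nxt = (a + b) & MASK32
        counts.append(bin(a ^ nxt).count("1"))
        a = nxt
    return counts


print(toggled_output_bits(4))   # [16, 32, 16, 17]
```

Under these assumptions the third sum is 0xFFFF FFFF, the value the timing tests watch for, and the second step flips all 32 output bits at once.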


Power Measurement is to be performed with the following setup:

Figure 5. Test Equipment Setup for Measuring Power Across Nios Development Board

This power measurement unfortunately will be accomplished for the entire development board, not just the FPGA with our logic on it. To account for this, we will take some baseline data, to determine normal power consumption levels with nothing running, i.e. the power consumption of the board and peripherals.

Timing Analysis will be accomplished on the FPGA using a logic analyzer, via a start button, a startOut, clk, clkOut, and a finishedFlag. The start and clk outputs are connected directly to the inputs, and exist so the logic analyzer can detect them and display them along with the finishedFlag. In initial simulations, there is a delay between clk and clkOut, and start and startOut. There is also a delay due to logic setting the finishedFlag. Timing will be taken from positive edge of clkOut to finishedFlag going high.


TEST CASES

Additions: parallel 32X2, parallel 32X4, parallel 32X8, ripple

Test Type | Input A | Input B | Simulation Length | Power dynamic | Tpd Worst-Case
Simulation | Count by 1050 | Count by 1085 | 1 us | |
Simulation | Output | 32’h5555 5555 | 1 us | | N/A
FPGA data | Output | 32’h5555 5555 | 1 us | | N/A
Simulation Timing | Output | 32’h5555 5555 | Output = FFFF FFFF | N/A |
FPGA Timing data | | | | N/A |

Additions: ripple 32, ripple 8

Test Type | Input A | Input B | Simulation Length | Power dynamic | Tpd Worst-Case
Simulation | Count by 1050 | Count by 1085 | 1 us | |
Simulation | Output | 32’h5555 5555 | 1 us | | N/A
FPGA data | Output | 32’h5555 5555 | 1 us | | N/A
Simulation Timing | | | | N/A |
FPGA Timing data | | | | N/A |


Shifting: shift by 1, 2, 4, 8

Test Type | Input A | Simulation Length | Power dynamic | Tco Worst-Case
Simulation | 0x10011 | 1 us | |
Simulation | Alternate 10s | 1 us | | N/A
FPGA data | Alternate 10s | 1 us | | N/A
Simulation Timing | Alternate 10s | 1st bit shifts to end | N/A |
FPGA Timing data | Alternate 10s | | N/A |

Shifting: 8 bit vs. 32 bit

Test Type | Input A | Simulation Length | Power dynamic | Tco Worst-Case
Simulation | (2^N)-1 and 0 | 1 us | |
Simulation | Alternate 10s | 1 us | | N/A
FPGA data | Alternate 10s | 1 us | | N/A
Simulation Timing | Alternate 10s | | N/A |
FPGA Timing data | Alternate 10s | | N/A |


PRESENTATION, DISCUSSION, AND ANALYSIS OF RESULTS

Adders Initial Simulation

Initial simulations of 32 bit adders behave just as expected, with a few anomalies. As one would expect, the parallel adder is the fastest but consumes the most power, due to its larger amount of hardware. Worst-case propagation delay ranges from 20 ns to 35 ns, which is in the expected range. The following figure shows these relationships. The conclusion here is that a dynamic switching routine can be used for a hardware accelerator/coprocessor. If speed is desired, a full parallel adder should be used. If low power is desired, then a ripple or possibly a one-bit shift register adder should be used. This can be switched at runtime to achieve the best performance. The one drawback of this scheme is that the energy consumed over the entire operation may not be lower. The ripple adder consumes less power but is slower, so the integral of power over time may not be less, and therefore there may not be any real savings. Energy unfortunately was not measured at this stage.

Figure 6. Initial Simulated Power Consumption and Worst-Case Tpd for different adders

[Chart: Adder Power vs. Latency. Data labels, latency (ns) and dynamic power (mW): par32X2 19.96, 160.87; par32X4 24.34, 136.61; par32X8 31.4143.58 (the two values are fused in the source); ripple 35.196, 38.9.]
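Since energy was not measured, a crude estimate can be sketched from the chart values for the two extremes. This assumes each adder draws its simulated dynamic power for exactly one worst-case latency per addition, which the report does not establish; the simulated power was averaged over a clocked 1 us run, so the result is only a rough indication. The names are hypothetical.

```python
# (worst-case latency in ns, dynamic power in mW) from the Figure 6 labels
ADDERS = {
    "par32X2": (19.96, 160.87),
    "ripple": (35.196, 38.9),
}


def energy_pj(latency_ns, power_mw):
    """Energy per operation in picojoules (mW * ns = pJ), assuming the
    power is constant over one worst-case latency."""
    return latency_ns * power_mw


for name, (latency_ns, power_mw) in ADDERS.items():
    print(f"{name}: {energy_pj(latency_ns, power_mw):.1f} pJ per addition")
```

Under this assumption the ripple adder still comes out ahead on energy per addition, because its power drops by more than its latency grows. If both adders are instead clocked at the same rate, the comparison reduces to the power figures alone.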

The next comparison tested was short adders vs. long adders. This was intended to test the concept of short arithmetic vs. long arithmetic. The idea is that for many embedded operations a 32-bit number is not needed; a simple 8-bit number can be used. We tested adders to quantify how much could be saved this way. Simulations with the Quartus software showed that the short adder was both faster and consumed less power than the long adder, as would be expected. We found that a short adder gives about a 60% savings in propagation delay and a 55% savings in power. In contrast to the comparisons among 32-bit adders, there is no tradeoff in using short arithmetic. This indicates that short arithmetic offers one of the largest savings.

Figure 7. Initial Simulated Power Consumption and Worst-Case Tpd for short vs. long adder

Shifters Initial Simulation

Comparison of different types of 32-bit shifters was not very conclusive. The Tco values all fell in about the same range for one shift, as did the power consumption. This is as expected. The time it takes to perform a shift of 1 or 8 should be very similar, since each is just one operation. It might be expected that shifting by 8 would take more power than shifting by one, but initial simulation results don’t indicate that. The only real savings with this type of operation is that more useful work is performed in the same amount of time using the same amount of power.

[Chart: Short vs. Long Adder. Data labels, latency (ns) and dynamic power (mW): short adder 14.002, 10.36; full adder 35.196, 23.13.]

Figure 8. Simulated Power and Tco comparison between shifting different numbers of bits

Also as expected, the short shifter (shifting only 8 bits of a 32 bit word) used a lot less power than the long shifter, and the Tco values for both were very similar. The short shifter uses only about 1/3 of the power of the long shifter. This again could be very useful for dynamically using short arithmetic vs. long arithmetic.

Figure 9. Simulated Power and Tco comparison between short shifter and long shifter.

[Chart: Shifters Power and Latency. Data labels, Tco (ns) and dynamic power (mW): shiftByOne 9.106, 90.1; shiftByTwo 8.198, 105.69; shiftByFour 8.08, 87.94; shiftByEight 7.97, 96.37.]

[Chart: Long vs. Short Shifter. Data labels: short shift 111.11; long shift 329.65.]

Initial Simulation Discussion

The following table sums up the results of initial simulations for several different comparisons among low-level logic. The main conclusion here is that there are much larger savings for short vs. long arithmetic operations than for varying types of 32-bit operations. This seems to be the area to focus on for saving power with an embedded processor. This isn’t to say that it won’t also be useful to dynamically implement different types of arithmetic operations that have power vs. Tpd tradeoffs. It could still be very useful to implement such a dynamically controlled algorithm with an embedded processor.

Metric | Long Adders | Long vs. Short Adder | Long Shifters | Long vs. Short Shifter
Latency Difference | 43% | 60% | 12.5% | 4%
Power Difference | 76% | 55% | 6.5% | 66%

Table 1. Percent Difference for High and Low Ends of Power and Latency Spectrum, with Varying Types of Hardware
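Most entries in Table 1 can be reproduced from the chart data labels with a simple percent-difference calculation, sketched below in Python. The reading of the chart labels and the choice of which two values form each pair are assumptions; in particular, the long-shifter power figure only matches when shiftByEight is compared with shiftByOne.

```python
def pct_diff(high, low):
    """Percent by which `low` is below `high`."""
    return 100.0 * (high - low) / high


# Values taken from the chart data labels (latency in ns, power in mW).
long_adder_latency = pct_diff(35.196, 19.96)     # ripple vs. par32X2
long_adder_power = pct_diff(160.87, 38.9)        # par32X2 vs. ripple
short_adder_latency = pct_diff(35.196, 14.002)   # full vs. short adder
short_adder_power = pct_diff(23.13, 10.36)
shifter_tco = pct_diff(9.106, 7.97)              # shiftByOne vs. shiftByEight
shifter_power = pct_diff(96.37, 90.1)            # shiftByEight vs. shiftByOne
short_shifter_power = pct_diff(329.65, 111.11)   # long vs. short shift

print(round(long_adder_latency), round(long_adder_power))       # 43 76
print(round(short_adder_latency), round(short_adder_power))     # 60 55
print(round(shifter_tco, 1), round(shifter_power, 1))           # 12.5 6.5
print(round(short_shifter_power))                               # 66
```

The 4% latency entry for the long vs. short shifter cannot be checked this way, because the corresponding Tco labels did not survive in the chart data.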

FPGA Power Data

Unfortunately, there was only enough time to measure power and latency of the different types of adders. The results were similar to those of the simulations, with varying degrees of consistency. As mentioned previously, measured values were taken against baseline data of the FPGA running with no logic programmed into it. This baseline value was 2.007 W. All power values obtained are averaged over a 25 ms window. Power consumption actually measured on the FPGA did not show as pronounced a difference as the simulated values, but the general trends were there. Other applications on the development board may have been interfering, creating noise. The simulations may not be very accurate, or the differing lengths of time over which the power averages were taken may have been a contributing factor. The result is that the general trend of ripple adders consuming less power than parallel adders holds true, but there seem to be too many variables in these tests to conclusively quantify this difference. However, the percentage difference for the short vs. long adder seems to be of the same magnitude for short vs. long adder comparisons.


Figure 10. Measured Power Consumption of varying types of 32-bit adders

Figure 11. Measured Power Consumption of short vs. long adder

[Chart: Long vs. Short Adder Power. Data labels, dynamic power (mW): Ripple Adder 65; Short Ripple Adder 59.]

FPGA Latency Data

This data was also taken only for the different types of adders. We found that the simulated data was very close to what we could measure on the logic analyzer. There is room for error in the process. We don’t know exactly how long the delay is between clk input and clk output, nor between start input and start output. We also didn’t measure the delay between all the outputs being 0xFFFF FFFF and the logic setting a flag to detect that. With the measured latencies, again we see that the general trends simulated previously hold true: the ripple adder is slower than the parallel adder, taking about twice as long. Also, the short ripple adder is measurably faster than the long ripple adder. All of the measured data is within 2 ns of the actual time, because the logic analyzer runs on only a 500 MHz clock, giving a maximum resolution of 2 ns.

Measurement | Parallel Adder 32X2 | Par32X4 | Par32X8 | rippleAdder
FPGA Measured (ns) | 10 | 14 | 14 | 20
Simulated (ns) | 11.85 | 16.2 | 15.5 | 18
Absolute Difference (ns) | 1.85 | 2.2 | 1.5 | 2

Table 2. Measured and Simulated Tpd for Different Types of Adders

Measurement | Ripple Long Adder | Ripple Short Adder
FPGA Measured (ns) | 20 | 8
Simulated (ns) | 18 | 6.6
Absolute Difference (ns) | 2 | 1.4

Table 3. Measured and Simulated Tpd for Short vs. Long Adder
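The differences in Tables 2 and 3, and the resolution implied by the logic analyzer clock, can be recomputed with the short Python sketch below. The dictionary keys are hypothetical labels; the numbers are those in the tables.

```python
ANALYZER_HZ = 500e6
RESOLUTION_NS = 1e9 / ANALYZER_HZ   # one sample period in ns

# name: (FPGA measured Tpd in ns, simulated Tpd in ns)
TPD_NS = {
    "par32X2": (10, 11.85),
    "par32X4": (14, 16.2),
    "par32X8": (14, 15.5),
    "ripple32": (20, 18),
    "ripple8": (8, 6.6),
}


def abs_diff(measured, simulated):
    """Absolute difference in ns, rounded to two decimals."""
    return round(abs(measured - simulated), 2)


diffs = {name: abs_diff(m, s) for name, (m, s) in TPD_NS.items()}
print(RESOLUTION_NS)   # 2.0
print(diffs)
```

Every difference is at or near the 2 ns sampling resolution, so the measured and simulated values agree about as closely as this measurement set-up can show.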


Final Implementation

The final version of the processor with the adder peripheral was not achieved due to difficulties with the software packages. Peripherals to the processor were successfully implemented on the Stratix FPGA, including a preliminary version of the adder which would read values, but not write values to the registers. In the attempt to debug this module, some parameter was changed in one of the three programs involved, which caused further attempts to load software onto the custom processor to fail. Even previously saved projects which had worked on the processor no longer functioned. Attempts to recover before the deadline were unsuccessful. Due to this unfortunate occurrence, actual power readings from the processor-based modules could not be taken. However, calculations of the expected power and power savings are simple and straightforward:

CALCULATED POWER DISSIPATION AND SAVINGS WITH PROCESSOR IMPLEMENTATIONS:

The total power dissipated by the FPGA is the sum of two forms of power: static and dynamic. Static power dissipation for our purposes is defined as the power consumed when no logic element is changing state. This is the case when no clock signal is being generated, and is nearly equivalent to the power dissipated when no logic is present on the FPGA. Since we were unable to measure the power used by the FPGA independent of other peripherals on the development board, the static power of the FPGA alone could not be determined. However, the total static power dissipation for the development board was approximately 2.010 Watts. Dynamic power is defined for our purposes as any power associated with the change in state of a logic element. In practice, we consider all power dissipated beyond that which we would expect from static dissipation to be dynamic. So, the calculation for dynamic power is:

Dynamic Power = Total Power – Static Power.

Once again, the Total Power that we measure is the power drop across the entire development board, and so the Dynamic power that we calculate includes the dynamic power of any peripherals on the board which are changing state due to the clock. When running its default program at 50MHz, the FPGA development board dissipates a measured total power of approximately 3.630 Watts. Plugging the two values into the equation above, we have a dynamic power dissipation of 1.620 Watts across the development board. When running our parallel adder at 50 MHz and driving a single LED, 0.083 Watts of dynamic power are dissipated. We expect our final design to dissipate somewhere in between these two extremes, approximately 0.8 Watts at 50MHz.
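As a quick check of the subtraction above, the following sketch recomputes the board-level dynamic power from the two measured values quoted in the text. The constant and function names are chosen here for illustration only.

```python
# Board-level measurements quoted in the text, in watts.
STATIC_POWER_W = 2.010          # development board with no logic changing state
TOTAL_POWER_DEFAULT_W = 3.630   # default program running at 50 MHz


def dynamic_power_w(total_w, static_w):
    """Dynamic power is whatever exceeds the static baseline."""
    return total_w - static_w


default_dynamic_w = dynamic_power_w(TOTAL_POWER_DEFAULT_W, STATIC_POWER_W)
print(f"Dynamic power, default program: {default_dynamic_w:.3f} W")
```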

KNOWN, REGULAR PERIODIC INPUT:


This design is used when we know that the input signals will be coming in at a known, regular frequency. We capture the signal and perform our operations so that when we complete the loop and start over, the subsequent signal is ready to be captured and processed. For further details on the design of this implementation, see the hardware implementation portion of the system design specifications section.

Since Dynamic power is associated with the change of state of a logic element, and our design uses almost exclusively logic elements which change state only after a clock edge, the amount of dynamic power dissipated in our design can be approximated by:

Dynamic Power = kf
Where k = Average energy dissipated per clock cycle
And f = Number of clock cycles / unit of time.

Using our approximation of 0.8 Watts at 50 MHz, we can solve for our constant k, yielding a value of 16 × 10^-9 J/cycle. Since our implementation uses 19 clock cycles for each iteration, the total energy used to complete one full loop is 19 × 16 × 10^-9 J, or 304 nJ. This is a very small amount of energy, but at 1 GHz the design would be consuming 16 Watts of power. Since the default clock on the development board is 50 MHz, we set that as our baseline and calculate the power saved as follows:

Power Saved = Dynamic Power1 – Dynamic Power2
            = k(f1 – f2)
            = 0.8 – (16 × 10^-9 × f2)

If we know that a signal comes in every microsecond, we calculate a clock frequency of 19 cycles/μs, or 19 MHz. Plugging this into our formula, we arrive at a power savings of 496 mW, which is 62% of dynamic power or 18% of total power. A proper clock rate for a signal coming in every millisecond (19 kHz) results in a power savings of 799.7 mW, which is over 99.9% of dynamic power and a 28.5% reduction in total power.
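The figures above can be reproduced with the short sketch below. It assumes, as the quoted percentages imply, that "total power" means the 2.010 W static board power plus the 0.8 W estimated dynamic power. All names are illustrative and not taken from the project code.

```python
# Estimates quoted in the text.
BASELINE_DYNAMIC_W = 0.8    # expected dynamic power at the 50 MHz baseline
STATIC_POWER_W = 2.010      # measured static power of the board
BASELINE_CLOCK_HZ = 50e6
CYCLES_PER_LOOP = 19

# Average energy per clock cycle, in joules (about 16 nJ).
k = BASELINE_DYNAMIC_W / BASELINE_CLOCK_HZ


def savings_for_period(input_period_s):
    """Return (clock Hz, watts saved, fraction of dynamic, fraction of total)."""
    f2_hz = CYCLES_PER_LOOP / input_period_s   # one loop per input period
    saved_w = k * (BASELINE_CLOCK_HZ - f2_hz)
    # Assumption: total power = static + baseline dynamic power.
    total_w = STATIC_POWER_W + BASELINE_DYNAMIC_W
    return f2_hz, saved_w, saved_w / BASELINE_DYNAMIC_W, saved_w / total_w


for period_s in (1e-6, 1e-3):
    f2_hz, saved_w, dyn_frac, total_frac = savings_for_period(period_s)
    print(f"period {period_s:g} s: clock {f2_hz:,.0f} Hz, "
          f"saved {saved_w * 1e3:.1f} mW "
          f"({dyn_frac:.1%} of dynamic, {total_frac:.1%} of total)")
```

The same function also shows why trimming clock cycles helps this method: a smaller CYCLES_PER_LOOP lowers the required clock frequency in direct proportion.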

The disadvantages to this design are the necessity of a precise, slow clock, and the necessity of synchronizing the CPU with the input signals. The synchronization of the data with the CPU cycles is doable, if annoying, using combinational logic techniques and is not expected to significantly increase the power consumed by the FPGA. However, the slow precise clock typically means dividing down a faster clock through the implementation of a counter, and this division and counting is done at the higher frequency of the faster clock and its dividends, which expends a fair amount of power. Other methods of producing slow clock signals such as charging and discharging capacitors, etc. can also be power hungry, and are less precise, resulting in the need to add more logic to the FPGA in order to dynamically synchronize the input signals and the clock cycles. Still, if the correct clock frequency is easily obtainable, this design will result in very significant decreases in power, and we have shown that the method of reducing the clock frequency is an effective method to provide significant power savings. Optimizing our code to take fewer clock cycles results in additional savings when using this method as we can then further reduce our frequency.

UNKNOWN OR IRREGULAR PERIODIC INPUT


This implementation is based on input signals which come at unknown or irregular intervals. Similar to the previous example, we are attempting to avoid unnecessary power being wasted on unused clock cycles. We want the operating loop to cycle through once and only once between subsequent input signals. However, instead of attempting to slow down our clock to match the incoming signal periodicity, we will pause the processor after each operation loop, and un-pause the chip when an input signal is received. The cost in terms of clock cycles is two; that is, our new operational loop uses 21 clock cycles to complete vs. 19 for the previous design. For further details on the design of this implementation, see the hardware implementation portion of the system design specifications section.

Once again, Dynamic power is associated with the change of state of a logic element, and our design uses almost exclusively logic elements which change state only after a clock edge. Therefore, the amount of dynamic power dissipated in our design can be approximated by:

Dynamic Power = kf
Where k = Average energy dissipated per clock cycle
And f = Number of clock cycles experienced / unit of time.

However, this time we are not limited by the frequency of the clock, but by the frequency of the input signals. Each incoming signal results in a use of 21 × 16 × 10^-9 J, or 336 nJ. For the total dynamic power dissipated:

Dynamic Power = 336 × 10^-9 J × (number of inputs per second).

The highest number of input signals per second this design can support is approximately 2.4 million, based on our maximum frequency of 50MHz. Since our design is based on human manipulation, this is a ridiculously high number. There is no lower limit to the number of signals received. One input per year is easily accommodated (although a better power-saving scheme could undoubtedly be designed). Our formula for power saved is:

Power Saved = Dynamic Power1 – Dynamic Power2
            = 0.8 W – (336 × 10^-9 J × number of inputs per second)

The savings are nearly identical to those of the previous implementation. A 99.9% dynamic power reduction and a 28.5% total power reduction are achieved at any input rate below 2380 inputs per second.
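A small sketch of the input-driven model follows, using only the constants quoted in the text (16 nJ per cycle, 21 cycles per input, a 50 MHz maximum clock and a 0.8 W baseline). The function names are illustrative.

```python
ENERGY_PER_CYCLE_J = 16e-9
CYCLES_PER_INPUT = 21
MAX_CLOCK_HZ = 50e6
BASELINE_DYNAMIC_W = 0.8

# Energy spent servicing one input, in joules (about 336 nJ).
energy_per_input_j = ENERGY_PER_CYCLE_J * CYCLES_PER_INPUT

# Highest sustainable input rate: one 21-cycle loop per input at 50 MHz.
max_inputs_per_s = MAX_CLOCK_HZ / CYCLES_PER_INPUT


def dynamic_power_w(inputs_per_s):
    """Dynamic power when the processor only runs in response to inputs."""
    return energy_per_input_j * inputs_per_s


def power_saved_w(inputs_per_s):
    """Savings relative to free-running at the 50 MHz baseline."""
    return BASELINE_DYNAMIC_W - dynamic_power_w(inputs_per_s)


print(f"Energy per input: {energy_per_input_j * 1e9:.0f} nJ")
print(f"Maximum input rate: {max_inputs_per_s:,.0f} inputs/s")
for rate in (10, 2380, 10_000):
    saved = power_saved_w(rate)
    print(f"{rate:>6} inputs/s: saved {saved * 1e3:.3f} mW "
          f"({saved / BASELINE_DYNAMIC_W:.2%} of dynamic power)")
```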

The disadvantage to this design is in the 2 extra clock cycles of code and additional logic elements involved in pausing the processor. However, these power dissipations are only noticeable at higher input frequencies- upward of 10,000 inputs per second before it even makes a 5 mW difference. For our design purpose (human user input), the added power dissipation is completely negligible (less than 1 μW). In our opinion the disadvantage is more than compensated by the ability to use any clock speed from 210 Hz to 50MHz. This allows us to use a cheap clock that can be incredibly imprecise without adversely affecting the outcome in any way. If a fast clock is used, it is unnecessary to perform a lot of division in order to gain a usable clock frequency, thus eliminating the need for these potentially power-intensive peripherals.
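To put numbers on the overhead of the pause logic, the sketch below computes the power attributable to the two extra clock cycles and the slowest clock that still keeps up with a given input rate. It assumes the same 16 nJ per cycle used earlier. The names are illustrative.

```python
ENERGY_PER_CYCLE_J = 16e-9
CYCLES_PER_INPUT = 21      # loop length with the pause/un-pause logic
EXTRA_CYCLES = 2           # cycles added relative to the 19-cycle loop


def extra_power_w(inputs_per_s):
    """Power spent only on the two extra cycles per input."""
    return EXTRA_CYCLES * ENERGY_PER_CYCLE_J * inputs_per_s


def min_clock_hz(inputs_per_s):
    """Slowest clock that completes one loop between consecutive inputs."""
    return CYCLES_PER_INPUT * inputs_per_s


for rate in (10, 10_000):
    print(f"{rate:>6} inputs/s: overhead {extra_power_w(rate) * 1e6:.2f} uW, "
          f"minimum clock {min_clock_hz(rate):,.0f} Hz")
```

At 10 inputs per second this gives an overhead of about 0.32 μW and a minimum clock of 210 Hz, in line with the figures quoted above.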


Optimizing the code to run with fewer instructions would improve this method only marginally. Once again, for frequencies around 10 times per second, the improvement would be less than 1 μW. This method is an excellent way to reduce power consumption when a processor is not being used for a large percentage of time, but actually turning the processor power off is not desirable due to latency, power usage on power-up or other considerations. This method is ideal for the purpose of this specific task, in which we expect signals to be received as often as several times per second.

ERROR ANALYSIS

As already mentioned briefly, there were a few areas where error in our measurements could have been introduced. One of the big areas was the power measurements taken on the logic analyzer. Most of the signals measured had wide ranges (several hundred millivolts) which could have been noise or the actual signal. We were trying to see differences of less than 200 mV in the measured signal, so this introduces some error and uncertainty into our measurements. The other area is the tolerance of our 1 Ohm resistor. The resistor is not precisely 1 Ohm, so the final calculated power may be off somewhat.

For timing analysis, some of the issues have also been mentioned. The biggest are the unknown latencies between our signals and the logic analyzer. We don't know how long it takes the signal to go from clock input to clock output at the logic analyzer, or from the start input to the start output. Also, we don't know how long it takes to go from reaching the desired value (finish) to actually seeing the flag go high. We assume that these uncertainties are minimized because we are measuring from the start of one uncertain event to the start of another uncertain event. This assumes that both of these uncertain events take about the same amount of time.

SUMMARY AND CONCLUSION

The methods researched and implemented in this project were successful in reducing the overall power consumed by the Altera FPGA. The research of different configurations of adders and shifters shows that reasonable reductions in power could be gained by having an 8-bit math function library available to a processor. This improvement could easily reach dozens of mWatts of power reduction if a processor were performing a large number of arithmetic operations on 8-bit numbers at moderate to high clock frequencies. In our experience, this is a very likely scenario as many loops, radix schemes, etc. use numbers that do not exceed the capabilities of an 8-bit variable.

The research into clock frequency reduction and processor suspension was also highly successful, resulting in savings of hundreds of mW in power dissipation for applications where the processor is spending a significant amount of time waiting for outside inputs.

It is evident from our research and implementations that different methods of power reduction are more effective for different applications. The methods we implemented were useful for the specific applications noted above, but would not be effective at all under different circumstances, and could indeed result in increases to power consumption. It is necessary for a developer to thoroughly understand a system in order to implement effective power reduction techniques.

Finally, the Altera software packages had some flagrant issues. While they clearly had the ability to do everything that we needed to do, the documentation was poorly written, the user interface was not intuitive, and some of the default settings made it impossible to perform any useful functions on the development board. At the end of the Appendix is a guide for any who plan on using Altera FPGAs in the future for their design projects. We hope that the information we provide will allow others to learn to use these tools effectively in a much shorter period of time.


APPENDIX A

Additional Graphs and Charts

Simulation Setup for Initial Data Simulation

Figure A1. 32 Bit Adder Initial simulation waveform

Figure A2. 32 Bit adder vs. 8 bit adder Initial simulation


Figure A3. 32 Shift by 1, 2, 4, 8. (Value of 19 being shifted)

Figure A4. 8 bit vs. 32 bit shifting (Alternate shifting of (2^N)-1 and 0)


Ripple Adder Power Measurement

Figure A5. Oscilloscope screenshot of power measurement for Ripple Adder

Parallel Adder Power

Figure A6. Oscilloscope screenshot of power measurement for 32X2 Parallel Adder


APPENDIX B

Verilog Code

/***************************************************************************/
/* John Martin/Ryan Conrad Enterprises */
/***************************************************************************/

module Proc_interface(
    clk,
    resetn,
    chip_select,
    address,
    write,
    write_data,
    read,
    read_data
    //arith_out
);

//Parameter values to pass to pwm_register_file instance
/*
parameter clock_divide_reg_init = 32'h0000_0000;
parameter duty_cycle_reg_init = 32'h0000_0000;
*/

//Task I/O
input clk; //System clock - tied to all blocks
input resetn; //System reset - tied to all blocks
input chip_select; //Avalon Chip select
input [1:0] address; //Avalon Address bus
input write; //Avalon Write signal
input [7:0] write_data; //Avalon Write data bus
input read; //Avalon Read signal
output [7:0] read_data; //Avalon Read data bus

wire [7:0] sumWire;
wire [7:0] AWire;
wire [7:0] BWire;

//Adder instance (8-bit parallel adder, carry-in tied low)
parAdder8 parAdder8Inst(sumWire, AWire, BWire, 1'b0);

//Register File instance
taskRegFile mem_thing(
    //Avalon Signals
    .clk (clk),
    .resetn (resetn),
    .chip_select (chip_select),
    .address (address),
    .write (write),
    .write_data (write_data),
    .read (read),
    .read_data (read_data),
    .reg1Out (AWire),
    .reg2Out (BWire),
    .sumRegIn (sumWire)
);

endmodule

/***************************************************************************/


/* John Martin and Ryan Conrad *****************************************/

module taskRegFile(
    //Avalon Signals
    clk,
    resetn,
    chip_select,
    address,
    write,
    write_data,
    read,
    read_data,
    reg1Out,
    reg2Out,
    sumRegIn
);

//Parameters
parameter regReset = 8'h00;

//Inputs
input clk; //System Clock
input resetn; //System Reset
input chip_select; //Avalon Chip select signal
input [1:0] address; //Avalon Address bus
input write; //Avalon Write signal
input [7:0] write_data; //Avalon Write data bus
input read; //Avalon read signal
input [7:0] sumRegIn; //Sum from the adder

//Outputs
output [7:0] read_data; //Avalon read data bus
output [7:0] reg1Out;
output [7:0] reg2Out;

reg [7:0] readDataOutReg; //Read_data bus
reg [7:0] writeDataInReg1;
reg [7:0] writeDataInReg2;

assign reg1Out = writeDataInReg1;
assign reg2Out = writeDataInReg2;
assign read_data = readDataOutReg;

//Nodes used for address decoding
wire InReg1Selected, InReg2Selected, OutRegSelected;

//Nodes for determining if a valid write occurred to a specific address
wire writeToRegIn1, writeToRegIn2, writeToOutReg;

//Nodes for determining if a valid read occurred to a specific address
wire ReadToRegIn1, ReadToRegIn2, ReadToOutReg;

//Nodes used to determine if a valid access has occurred
wire valid_write, valid_read;

//Start Main Code

//address decode
assign InReg1Selected = !address[1] & !address[0]; //address 00
assign InReg2Selected = !address[1] & address[0]; //address 01
assign OutRegSelected = address[1] & !address[0]; //address 10


//determine if a valid transaction was initiated
assign valid_write = chip_select & write;
assign valid_read = chip_select & read;

//determine if a write occurred to a specific address
assign writeToRegIn1 = valid_write & InReg1Selected;
assign writeToRegIn2 = valid_write & InReg2Selected;
assign writeToOutReg = valid_write & OutRegSelected;

//determine if a read occurred to a specific address
assign ReadToRegIn1 = valid_read & InReg1Selected;
assign ReadToRegIn2 = valid_read & InReg2Selected;
assign ReadToOutReg = valid_read & OutRegSelected;

//Write to InReg1 Register
always@(posedge clk or negedge resetn)
begin
    if(~resetn)
    begin //Async Reset
        writeDataInReg1 <= regReset; //32'h0000_0000;
    end
    else
    begin
        if(writeToRegIn1)
        begin
            writeDataInReg1 <= write_data;
        end
        else
        begin
            writeDataInReg1 <= writeDataInReg1;
        end
    end
end

//Write to InReg2 Register
always@(posedge clk or negedge resetn)
begin
    if(~resetn)
    begin //Async Reset
        writeDataInReg2 <= regReset; //32'h0000_0000;
    end
    else
    begin
        if(writeToRegIn2)
        begin
            writeDataInReg2 <= write_data;
        end
        else
        begin
            writeDataInReg2 <= writeDataInReg2;
        end
    end
end

//Read Data Bus
always@(negedge clk or negedge resetn)
begin
    if(~resetn)
    begin //Async Reset
        readDataOutReg <= regReset;
    end
    else
        readDataOutReg <= sumRegIn; //non-blocking, to match the reset branch
end

endmodule

module parAdder8(Sum, A, B, Cin);

output [7:0]Sum;
input [7:0]A;
input [7:0]B;
input Cin;

wire G0, G1, G2, G3, G4, G5, G6, G7, P0, P1, P2, P3, P4, P5, P6, P7, C1, C2, C3, C4, C5, C6, C7;

sumbit bit0(G0, P0, Sum[0], A[0], B[0], Cin);
sumbit bit1(G1, P1, Sum[1], A[1], B[1], C1);
sumbit bit2(G2, P2, Sum[2], A[2], B[2], C2);
sumbit bit3(G3, P3, Sum[3], A[3], B[3], C3);
sumbit bit4(G4, P4, Sum[4], A[4], B[4], C4);
sumbit bit5(G5, P5, Sum[5], A[5], B[5], C5);
sumbit bit6(G6, P6, Sum[6], A[6], B[6], C6);
sumbit bit7(G7, P7, Sum[7], A[7], B[7], C7);

//Module name matches the carryLogic8 definition below
carryLogic8 instance1(C7, C6, C5, C4, C3, C2, C1, Cin, P0, P1, P2, P3, P4, P5, P6, G0, G1, G2, G3, G4, G5, G6);

endmodule

module carryLogic8(C7, C6, C5, C4, C3, C2, C1, Cin, P0, P1, P2, P3, P4, P5, P6, G0, G1, G2, G3, G4, G5, G6);

input Cin, P0, P1, P2, P3, P4, P5, P6, G0, G1, G2, G3, G4, G5, G6;
output C7, C6, C5, C4, C3, C2, C1;

wire POCinWire;
wire P1G0Wire, POP1CinWire;
wire P2G1Wire, P1P2G0Wire, POP1P2CinWire;
wire P3G2Wire, P2P3G1Wire, P1P2P3G0Wire, P0P1P2P3CinWire;
wire P4G3Wire, P3P4G2Wire, P2P3P4G1Wire, P1P2P3P4G0Wire, P0P1P2P3P4CinWire;
wire P5G4Wire, P4P5G3Wire, P3P4P5G2Wire, P2P3P4P5G1Wire, P1P2P3P4P5G0Wire, P0P1P2P3P4P5CinWire;
wire P6G5Wire, P5P6G4Wire, P4P5P6G3Wire, P3P4P5P6G2Wire, P2P3P4P5P6G1Wire, P1P2P3P4P5P6G0Wire, P0P1P2P3P4P5P6CinWire;

or C1or(C1, G0, POCinWire);
or C2or(C2, G1, P1G0Wire, POP1CinWire);
or C3or(C3, G2, P2G1Wire, P1P2G0Wire, POP1P2CinWire);
or C4or(C4, G3, P3G2Wire, P2P3G1Wire, P1P2P3G0Wire, P0P1P2P3CinWire);
or C5or(C5, G4, P4G3Wire, P3P4G2Wire, P2P3P4G1Wire, P1P2P3P4G0Wire, P0P1P2P3P4CinWire);
or C6or(C6, G5, P5G4Wire, P4P5G3Wire, P3P4P5G2Wire, P2P3P4P5G1Wire, P1P2P3P4P5G0Wire, P0P1P2P3P4P5CinWire);
or C7or(C7, G6, P6G5Wire, P5P6G4Wire, P4P5P6G3Wire, P3P4P5P6G2Wire, P2P3P4P5P6G1Wire, P1P2P3P4P5P6G0Wire, P0P1P2P3P4P5P6CinWire);

and P0CinAnd(POCinWire, P0, Cin);

and P1G0And(P1G0Wire, P1, G0);
and P0P1CinAnd(POP1CinWire, P0, P1, Cin);

and P2G1And(P2G1Wire, P2, G1);
and P1P2G0And(P1P2G0Wire, P1, P2, G0);
and P0P1P2CinAnd(POP1P2CinWire, P0, P1, P2, Cin);

and P3G2And(P3G2Wire, P3, G2);
and P2P3G1And(P2P3G1Wire, P2, P3, G1);
and P1P2P3G0And(P1P2P3G0Wire, P1, P2, P3, G0);
and P0P1P2P3CinAnd(P0P1P2P3CinWire, P0, P1, P2, P3,Cin);

and P4G3And(P4G3Wire, P4, G3);
and P3P4G2And(P3P4G2Wire, P3, P4, G2);
and P2P3P4G1And(P2P3P4G1Wire, P2, P3, P4, G1);
and P1P2P3P4G0And(P1P2P3P4G0Wire, P1, P2, P3, P4, G0);
and P0P1P2P3P4CinAnd(P0P1P2P3P4CinWire, P0, P1, P2, P3, P4, Cin);

and P5G4And(P5G4Wire, P5, G4);
and P4P5G3And(P4P5G3Wire, P4, P5, G3);
and P3P4P5G2And(P3P4P5G2Wire, P3, P4, P5, G2);
and P2P3P4P5G1And(P2P3P4P5G1Wire, P2,P3, P4, P5, G1);
and P1P2P3P4P5G0And(P1P2P3P4P5G0Wire, P1, P2, P3, P4, P5, G0);
and P0P1P2P3P4P5CinAnd(P0P1P2P3P4P5CinWire, P0, P1, P2, P3, P4, P5, Cin);

and P6G5And(P6G5Wire, P6, G5);
and P5P6G4And(P5P6G4Wire, P5, P6, G4);
and P4P5P6G3And(P4P5P6G3Wire, P4, P5, P6, G3);
and P3P4P5P6G2And(P3P4P5P6G2Wire, P3, P4, P5, P6, G2);
and P2P3P4P5P6G1And(P2P3P4P5P6G1Wire, P2,P3, P4, P5, P6, G1);
and P1P2P3P4P5P6G0And(P1P2P3P4P5P6G0Wire, P1, P2, P3, P4, P5, P6, G0);
and P0P1P2P3P4P5P6CinAnd(P0P1P2P3P4P5P6CinWire, P0, P1, P2, P3, P4, P5, P6, Cin);

endmodule

module sumbit(G, P, S, A, B, Cin);

input A, B, Cin;
output G, P, S;

and OnestAnd(G, A, B);   //generate
xor OnestXOR(P, A, B);   //propagate
xor TwondXOR(S, P, Cin); //sum bit

endmodule
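The generate/propagate structure of parAdder8, carryLogic8 and sumbit can be cross-checked against ordinary integer addition with a short behavioral model. The Python sketch below is not part of the original project. It is an illustrative reference model, under the assumption that each carry is the flattened sum-of-products form written out in carryLogic8 and that the carry out of bit 7 is discarded, as it is in parAdder8.

```python
def lookahead_add8(a, b, cin=0):
    """Bit-level model of an 8-bit carry-lookahead adder (sum only, mod 256)."""
    a_bits = [(a >> i) & 1 for i in range(8)]
    b_bits = [(b >> i) & 1 for i in range(8)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate, as in sumbit
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # propagate, as in sumbit

    carries = [cin]  # carries[i] is the carry into bit i
    for i in range(1, 8):
        # C_i = G_{i-1} | P_{i-1}G_{i-2} | ... | P_{i-1}..P_1 G_0 | P_{i-1}..P_0 Cin
        c = 0
        for j in range(i):
            term = g[j]
            for m in range(j + 1, i):
                term &= p[m]
            c |= term
        cin_term = cin
        for m in range(i):
            cin_term &= p[m]
        carries.append(c | cin_term)

    total = 0
    for i in range(8):
        total |= (p[i] ^ carries[i]) << i  # S_i = P_i xor C_i
    return total


# Spot-check the model against integer addition on a sample of operands.
mismatches = [
    (a, b, cin)
    for a in range(0, 256, 7)
    for b in range(0, 256, 5)
    for cin in (0, 1)
    if lookahead_add8(a, b, cin) != (a + b + cin) & 0xFF
]
print("mismatches:", len(mismatches))
```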

/*********@2005 jOHN mARTIN AND rYAN cONRAD*******/

// synthesis translate_off
`timescale 1ns / 100ps
// synthesis translate_on
module short_adder_0 (
    // inputs:
    address,
    chip_select,
    clk,
    read,
    resetn,
    write,
    write_data,

    // outputs:
    read_data
);

output [ 7: 0] read_data;
input [ 1: 0] address;
input chip_select;
input clk;
input read;
input resetn;
input write;
input [ 7: 0] write_data;

wire [ 7: 0] read_data;

Proc_interface the_Proc_interface (
    .address (address),
    .chip_select (chip_select),
    .clk (clk),
    .read (read),
    .read_data (read_data),
    .resetn (resetn),
    .write (write),
    .write_data (write_data)
);

endmodule


/************************* Parallel Adder 32X2 Code *************************/
module parAdder32(Sum, A, B, Cin);

output [31:0]Sum;
input [31:0]A;
input [31:0]B;
input Cin;

wire G0, G1, G2, G3, G4, G5, G6, G7, G8, G9, G10, G11, G12, G13, G14, G15, G16, G17, G18, G19, G20, G21, G22, G23, G24, G25, G26, G27, G28, G29, G30, G31;
wire P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11, P12, P13, P14, P15, P16, P17, P18, P19, P20, P21, P22, P23, P24, P25, P26, P27, P28, P29, P30, P31;
wire C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C16, C17, C18, C19, C20, C21, C22, C23, C24, C25, C26, C27, C28, C29, C30, C31;
wire C15out;
wire C15;

sumbit bit0(G0, P0, Sum[0], A[0], B[0], Cin);
sumbit bit1(G1, P1, Sum[1], A[1], B[1], C1);
sumbit bit2(G2, P2, Sum[2], A[2], B[2], C2);
sumbit bit3(G3, P3, Sum[3], A[3], B[3], C3);
sumbit bit4(G4, P4, Sum[4], A[4], B[4], C4);
sumbit bit5(G5, P5, Sum[5], A[5], B[5], C5);
sumbit bit6(G6, P6, Sum[6], A[6], B[6], C6);
sumbit bit7(G7, P7, Sum[7], A[7], B[7], C7);

sumbit bit8(G8, P8, Sum[8], A[8], B[8], C8);
sumbit bit9(G9, P9, Sum[9], A[9], B[9], C9);
sumbit bit10(G10, P10, Sum[10], A[10], B[10], C10);
sumbit bit11(G11, P11, Sum[11], A[11], B[11], C11);
sumbit bit12(G12, P12, Sum[12], A[12], B[12], C12);
sumbit bit13(G13, P13, Sum[13], A[13], B[13], C13);
sumbit bit14(G14, P14, Sum[14], A[14], B[14], C14);
/*([OUTPUTS]G,P,S, Cout, [INPUTS]A, B, Cin);*/
sumbitWithCarryOut bit15(G15, P15, Sum[15], C15out, A[15], B[15], C15);

sumbit bit16(G16, P16, Sum[16], A[16], B[16], C15out);
sumbit bit17(G17, P17, Sum[17], A[17], B[17], C17);
sumbit bit18(G18, P18, Sum[18], A[18], B[18], C18);
sumbit bit19(G19, P19, Sum[19], A[19], B[19], C19);
sumbit bit20(G20, P20, Sum[20], A[20], B[20], C20);
sumbit bit21(G21, P21, Sum[21], A[21], B[21], C21);
sumbit bit22(G22, P22, Sum[22], A[22], B[22], C22);
sumbit bit23(G23, P23, Sum[23], A[23], B[23], C23);

sumbit bit24(G24, P24, Sum[24], A[24], B[24], C24);
sumbit bit25(G25, P25, Sum[25], A[25], B[25], C25);
sumbit bit26(G26, P26, Sum[26], A[26], B[26], C26);
sumbit bit27(G27, P27, Sum[27], A[27], B[27], C27);
sumbit bit28(G28, P28, Sum[28], A[28], B[28], C28);
sumbit bit29(G29, P29, Sum[29], A[29], B[29], C29);
sumbit bit30(G30, P30, Sum[30], A[30], B[30], C30);
sumbit bit31(G31, P31, Sum[31], A[31], B[31], C31);

carryLogic16 instance1(C15, C14, C13, C12, C11, C10, C9, C8, C7, C6, C5, C4, C3, C2, C1, Cin, P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11, P12, P13, P14, G0, G1, G2, G3, G4, G5, G6, G7, G8, G9, G10, G11, G12, G13, G14);
carryLogic16 instance2(C31, C30, C29, C28, C27, C26, C25, C24, C23, C22, C21, C20, C19, C18, C17, C15out, P16, P17, P18, P19, P20, P21, P22, P23, P24, P25, P26, P27, P28, P29, P30, G16, G17, G18, G19, G20, G21, G22, G23, G24, G25, G26, G27, G28, G29, G30);

endmodule


module carryLogic16(C15, C14, C13, C12, C11, C10, C9, C8, C7, C6, C5, C4, C3, C2, C1, Cin, P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11, P12, P13, P14, G0, G1, G2, G3, G4, G5, G6, G7, G8, G9, G10, G11, G12, G13, G14);

input Cin, P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11, P12, P13, P14, G0, G1, G2, G3, G4, G5, G6, G7, G8, G9, G10, G11, G12, G13, G14;output C15, C14, C13, C12, C11, C10, C9, C8, C7, C6, C5, C4, C3, C2, C1;

wire POCinWire;wire P1G0Wire, POP1CinWire;wire P2G1Wire, P1P2G0Wire, POP1P2CinWire;wire P3G2Wire, P2P3G1Wire, P1P2P3G0Wire, P0P1P2P3CinWire;wire P4G3Wire, P3P4G2Wire, P2P3P4G1Wire, P1P2P3P4G0Wire, P0P1P2P3P4CinWire;wire P5G4Wire, P4P5G3Wire, P3P4P5G2Wire, P2P3P4P5G1Wire, P1P2P3P4P5G0Wire, P0P1P2P3P4P5CinWire;wire P6G5Wire, P5P6G4Wire, P4P5P6G3Wire, P3P4P5P6G2Wire, P2P3P4P5P6G1Wire, P1P2P3P4P5P6G0Wire, P0P1P2P3P4P5P6CinWire;wire P7G6Wire, P6P7G5Wire, P5P6P7G4Wire, P4P5P6P7G3Wire, P3P4P5P6P7G2Wire, P2P3P4P5P6P7G1Wire, P1P2P3P4P5P6P7G0Wire, P0P1P2P3P4P5P6P7CinWire;wire P8G7Wire, P7P8G6Wire, P6P7P8G5Wire, P5P6P7P8G4Wire, P4P5P6P7P8G3Wire, P3P4P5P6P7P8G2Wire, P2P3P4P5P6P7P8G1Wire, P1P2P3P4P5P6P7P8G0Wire, P0P1P2P3P4P5P6P7P8CinWire;wire P9G8Wire, P8P9G7Wire, P7P8P9G6Wire, P6P7P8P9G5Wire, P5P6P7P8P9G4Wire, P4P5P6P7P8P9G3Wire, P3P4P5P6P7P8P9G2Wire, P2P3P4P5P6P7P8P9G1Wire, P1P2P3P4P5P6P7P8P9G0Wire, P0P1P2P3P4P5P6P7P8P9CinWire;wire P10G9Wire, P9P10G8Wire, P8P9P10G7Wire, P7P8P9P10G6Wire, P6P7P8P9P10G5Wire, P5P6P7P8P9P10G4Wire, P4P5P6P7P8P9P10G3Wire, P3P4P5P6P7P8P9P10G2Wire, P2P3P4P5P6P7P8P9P10G1Wire, P1P2P3P4P5P6P7P8P9P10G0Wire, P0P1P2P3P4P5P6P7P8P9P10CinWire;wire P11G10Wire, P10P11G9Wire, P9P10P11G8Wire, P8P9P10P11G7Wire, P7P8P9P10P11G6Wire, P6P7P8P9P10P11G5Wire, P5P6P7P8P9P10P11G4Wire, P4P5P6P7P8P9P10P11G3Wire, P3P4P5P6P7P8P9P10P11G2Wire, P2P3P4P5P6P7P8P9P10P11G1Wire, P1P2P3P4P5P6P7P8P9P10P11G0Wire, P0P1P2P3P4P5P6P7P8P9P10P11CinWire;wire P12G11Wire, P11P12G10Wire, P10P11P12G9Wire, P9P10P11P12G8Wire, P8P9P10P11P12G7Wire, P7P8P9P10P11P12G6Wire, P6P7P8P9P10P11P12G5Wire, P5P6P7P8P9P10P11P12G4Wire, P4P5P6P7P8P9P10P11P12G3Wire, P3P4P5P6P7P8P9P10P11P12G2Wire, P2P3P4P5P6P7P8P9P10P11P12G1Wire, P1P2P3P4P5P6P7P8P9P10P11P12G0Wire, P0P1P2P3P4P5P6P7P8P9P10P11P12CinWire;wire P13G12Wire, P12P13G11Wire, P11P12P13G10Wire, P10P11P12P13G9Wire, P9P10P11P12P13G8Wire, P8P9P10P11P12P13G7Wire, P7P8P9P10P11P12P13G6Wire, P6P7P8P9P10P11P12P13G5Wire, P5P6P7P8P9P10P11P12P13G4Wire, 
P4P5P6P7P8P9P10P11P12P13G3Wire, P3P4P5P6P7P8P9P10P11P12P13G2Wire, P2P3P4P5P6P7P8P9P10P11P12P13G1Wire, P1P2P3P4P5P6P7P8P9P10P11P12P13G0Wire, P0P1P2P3P4P5P6P7P8P9P10P11P12P13CinWire;wire P14G13Wire, P13P14G12Wire, P12P13P14G11Wire, P11P12P13P14G10Wire, P10P11P12P13P14G9Wire, P9P10P11P12P13P14G8Wire, P8P9P10P11P12P13P14G7Wire, P7P8P9P10P11P12P13P14G6Wire, P6P7P8P9P10P11P12P13P14G5Wire, P5P6P7P8P9P10P11P12P13P14G4Wire, P4P5P6P7P8P9P10P11P12P13P14G3Wire, P3P4P5P6P7P8P9P10P11P12P13P14G2Wire, P2P3P4P5P6P7P8P9P10P11P12P13P14G1Wire, P1P2P3P4P5P6P7P8P9P10P11P12P13P14G0Wire, P0P1P2P3P4P5P6P7P8P9P10P11P12P13P14CinWire;

or C1or(C1, G0, POCinWire);or C2or(C2, G1, P1G0Wire, POP1CinWire);or C3or(C3, G2, P2G1Wire, P1P2G0Wire, POP1P2CinWire);or C4or(C4, G3, P3G2Wire, P2P3G1Wire, P1P2P3G0Wire, P0P1P2P3CinWire);or C5or(C5, G4, P4G3Wire, P3P4G2Wire, P2P3P4G1Wire, P1P2P3P4G0Wire, P0P1P2P3P4CinWire);or C6or(C6, G5, P5G4Wire, P4P5G3Wire, P3P4P5G2Wire, P2P3P4P5G1Wire, P1P2P3P4P5G0Wire, P0P1P2P3P4P5CinWire);or C7or(C7, G6, P6G5Wire, P5P6G4Wire, P4P5P6G3Wire, P3P4P5P6G2Wire, P2P3P4P5P6G1Wire, P1P2P3P4P5P6G0Wire, P0P1P2P3P4P5P6CinWire);or C8or(C8, G7, P7G6Wire, P6P7G5Wire, P5P6P7G4Wire, P4P5P6P7G3Wire, P3P4P5P6P7G2Wire, P2P3P4P5P6P7G1Wire, P1P2P3P4P5P6P7G0Wire, P0P1P2P3P4P5P6P7CinWire);or C9or(C9, G8, P8G7Wire, P7P8G6Wire, P6P7P8G5Wire, P5P6P7P8G4Wire, P4P5P6P7P8G3Wire, P3P4P5P6P7P8G2Wire, P2P3P4P5P6P7P8G1Wire, P1P2P3P4P5P6P7P8G0Wire, P0P1P2P3P4P5P6P7P8CinWire);or C10or(C10, G9, P9G8Wire, P8P9G7Wire, P7P8P9G6Wire, P6P7P8P9G5Wire, P5P6P7P8P9G4Wire, P4P5P6P7P8P9G3Wire, P3P4P5P6P7P8P9G2Wire, P2P3P4P5P6P7P8P9G1Wire, P1P2P3P4P5P6P7P8P9G0Wire, P0P1P2P3P4P5P6P7P8P9CinWire);or C11or(C11, G10, P10G9Wire, P9P10G8Wire, P8P9P10G7Wire, P7P8P9P10G6Wire, P6P7P8P9P10G5Wire, P5P6P7P8P9P10G4Wire, P4P5P6P7P8P9P10G3Wire, P3P4P5P6P7P8P9P10G2Wire, P2P3P4P5P6P7P8P9P10G1Wire, P1P2P3P4P5P6P7P8P9P10G0Wire, P0P1P2P3P4P5P6P7P8P9P10CinWire);or C12or(C12, G11, P11G10Wire, P10P11G9Wire, P9P10P11G8Wire, P8P9P10P11G7Wire, P7P8P9P10P11G6Wire, P6P7P8P9P10P11G5Wire, P5P6P7P8P9P10P11G4Wire, P4P5P6P7P8P9P10P11G3Wire, P3P4P5P6P7P8P9P10P11G2Wire, P2P3P4P5P6P7P8P9P10P11G1Wire, P1P2P3P4P5P6P7P8P9P10P11G0Wire, P0P1P2P3P4P5P6P7P8P9P10P11CinWire);or C13or(C13, G12, P12G11Wire, P11P12G10Wire, P10P11P12G9Wire, P9P10P11P12G8Wire, P8P9P10P11P12G7Wire, P7P8P9P10P11P12G6Wire, P6P7P8P9P10P11P12G5Wire, P5P6P7P8P9P10P11P12G4Wire, P4P5P6P7P8P9P10P11P12G3Wire, P3P4P5P6P7P8P9P10P11P12G2Wire, P2P3P4P5P6P7P8P9P10P11P12G1Wire, P1P2P3P4P5P6P7P8P9P10P11P12G0Wire, P0P1P2P3P4P5P6P7P8P9P10P11P12CinWire);or C14or(C14, G13, P13G12Wire, P12P13G11Wire, 
P11P12P13G10Wire, P10P11P12P13G9Wire, P9P10P11P12P13G8Wire, P8P9P10P11P12P13G7Wire, P7P8P9P10P11P12P13G6Wire, P6P7P8P9P10P11P12P13G5Wire, P5P6P7P8P9P10P11P12P13G4Wire, P4P5P6P7P8P9P10P11P12P13G3Wire, P3P4P5P6P7P8P9P10P11P12P13G2Wire, P2P3P4P5P6P7P8P9P10P11P12P13G1Wire, P1P2P3P4P5P6P7P8P9P10P11P12P13G0Wire, P0P1P2P3P4P5P6P7P8P9P10P11P12P13CinWire);or C15or(C15, G14, P14G13Wire, P13P14G12Wire, P12P13P14G11Wire, P11P12P13P14G10Wire, P10P11P12P13P14G9Wire, P9P10P11P12P13P14G8Wire, P8P9P10P11P12P13P14G7Wire, P7P8P9P10P11P12P13P14G6Wire, P6P7P8P9P10P11P12P13P14G5Wire, P5P6P7P8P9P10P11P12P13P14G4Wire, P4P5P6P7P8P9P10P11P12P13P14G3Wire, P3P4P5P6P7P8P9P10P11P12P13P14G2Wire,


P2P3P4P5P6P7P8P9P10P11P12P13P14G1Wire, P1P2P3P4P5P6P7P8P9P10P11P12P13P14G0Wire, P0P1P2P3P4P5P6P7P8P9P10P11P12P13P14CinWire);

and P0CinAnd(POCinWire, P0, Cin);

and P1G0And(P1G0Wire, P1, G0);and P0P1CinAnd(POP1CinWire, P0, P1, Cin);

and P2G1And(P2G1Wire, P2, G1);and P1P2G0And(P1P2G0Wire, P1, P2, G0);and P0P1P2CinAnd(POP1P2CinWire, P0, P1, P2, Cin);

and P3G2And(P3G2Wire, P3, G2);and P2P3G1And(P2P3G1Wire, P2, P3, G1);and P1P2P3G0And(P1P2P3G0Wire, P1, P2, P3, G0);and P0P1P2P3CinAnd(P0P1P2P3CinWire, P0, P1, P2, P3,Cin);

and P4G3And(P4G3Wire, P4, G3);and P3P4G2And(P3P4G2Wire, P3, P4, G2);and P2P3P4G1And(P2P3P4G1Wire, P2, P3, P4, G1);and P1P2P3P4G0And(P1P2P3P4G0Wire, P1, P2, P3, P4, G0);and P0P1P2P3P4CinAnd(P0P1P2P3P4CinWire, P0, P1, P2, P3, P4, Cin);

and P5G4And(P5G4Wire, P5, G4);and P4P5G3And(P4P5G3Wire, P4, P5, G3);and P3P4P5G2And(P3P4P5G2Wire, P3, P4, P5, G2);and P2P3P4P5G1And(P2P3P4P5G1Wire, P2,P3, P4, P5, G1);and P1P2P3P4P5G0And(P1P2P3P4P5G0Wire, P1, P2, P3, P4, P5, G0);and P0P1P2P3P4P5CinAnd(P0P1P2P3P4P5CinWire, P0, P1, P2, P3, P4, P5, Cin);

and P6G5And(P6G5Wire, P6, G5);and P5P6G4And(P5P6G4Wire, P5, P6, G4);and P4P5P6G3And(P4P5P6G3Wire, P4, P5, P6, G3);and P3P4P5P6G2And(P3P4P5P6G2Wire, P3, P4, P5, P6, G2);and P2P3P4P5P6G1And(P2P3P4P5P6G1Wire, P2,P3, P4, P5, P6, G1);and P1P2P3P4P5P6G0And(P1P2P3P4P5P6G0Wire, P1, P2, P3, P4, P5, P6, G0);and P0P1P2P3P4P5P6CinAnd(P0P1P2P3P4P5P6CinWire, P0, P1, P2, P3, P4, P5, P6, Cin);

and P7G6And(P7G6Wire, P7, G6);and P6P7G5And(P6P7G5Wire, P6, P7, G5);and P5P6P7G4And(P5P6P7G4Wire, P5, P6, P7, G4);and P4P5P6P7G3And(P4P5P6P7G3Wire, P4, P5, P6, P7, G3);and P3P4P5P6P7G2And(P3P4P5P6P7G2Wire, P3, P4, P5, P6, P7, G2);and P2P3P4P5P6P7G1And(P2P3P4P5P6P7G1Wire, P2, P3, P4, P5, P6, P7, G1);and P1P2P3P4P5P6P7G0And(P1P2P3P4P5P6P7G0Wire, P1,P2, P3, P4, P5, P6, P7, G0);and P0P1P2P3P4P5P6P7CinAnd(P0P1P2P3P4P5P6P7CinWire, P0, P1, P2, P3, P4, P5, P6, P7, Cin);

and P8G7And(P8G7Wire, P8, G7);and P7P8G6And(P7P8G6Wire, P7, P8, G6);and P6P7P8G5And(P6P7P8G5Wire, P6, P7, P8, G5);and P5P6P7P8G4And(P5P6P7P8G4Wire, P5, P6, P7, P8, G4);and P4P5P6P7P8G3And(P4P5P6P7P8G3Wire, P4, P5, P6, P7, P8, G3);and P3P4P5P6P7P8G2And(P3P4P5P6P7P8G2Wire, P3, P4, P5, P6, P7, P8, G2);and P2P3P4P5P6P7P8G1And(P2P3P4P5P6P7P8G1Wire, P2, P3, P4, P5, P6, P7, P8, G1);and P1P2P3P4P5P6P7P8G0And(P1P2P3P4P5P6P7P8G0Wire, P1,P2, P3, P4, P5, P6, P7, P8, G0);and P0P1P2P3P4P5P6P7P8CinAnd(P0P1P2P3P4P5P6P7P8CinWire, P0, P1, P2, P3, P4, P5, P6, P7, P8, Cin);

and P9G8And(P9G8Wire, P9, G8);and P8P9G7And(P8P9G7Wire, P8, P9, G7);and P7P8P9G6And(P7P8P9G6Wire, P7, P8, P9, G6);and P6P7P8P9G5And(P6P7P8P9G5Wire, P6, P7, P8, P9, G5);and P5P6P7P8P9G4And(P5P6P7P8P9G4Wire, P5, P6, P7, P8, P9, G4);and A1(P4P5P6P7P8P9G3Wire, P4, P5, P6, P7, P8, P9, G3);and A2( P3P4P5P6P7P8P9G2Wire, P3, P4, P5, P6, P7, P8, P9, G2);and A3(P2P3P4P5P6P7P8P9G1Wire, P2, P3, P4, P5, P6, P7, P8, P9, G1);and A4(P1P2P3P4P5P6P7P8P9G0Wire, P1,P2, P3, P4, P5, P6, P7, P8, P9, G0);and A5(P0P1P2P3P4P5P6P7P8P9CinWire, P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, Cin);

and A6(P10G9Wire, P10, G9);


and A7(P9P10G8Wire, P9, P10, G8);and A8(P8P9P10G7Wire, P8, P9, P10, G7);and A9(P7P8P9P10G6Wire, P7, P8, P9, P10, G6);and B1(P6P7P8P9P10G5Wire, P6, P7, P8, P9, P10, G5);and B2(P5P6P7P8P9P10G4Wire, P5, P6, P7, P8, P9,P10, G4);and B3(P4P5P6P7P8P9P10G3Wire, P4, P5, P6, P7, P8, P9, P10, G3);and B4(P3P4P5P6P7P8P9P10G2Wire, P3, P4, P5, P6, P7, P8, P9, P10, G2);and B5(P2P3P4P5P6P7P8P9P10G1Wire, P2, P3, P4, P5, P6, P7, P8, P9, P10, G1);and B6(P1P2P3P4P5P6P7P8P9P10G0Wire, P1,P2, P3, P4, P5, P6, P7, P8, P9, P10, G0);and B7(P0P1P2P3P4P5P6P7P8P9P10CinWire, P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, Cin);

and B8(P11G10Wire, P11, G10);and B9(P10P11G9Wire, P10, P11, G9);and Z1(P9P10P11G8Wire, P9, P10, P11, G8);and Z2(P8P9P10P11G7Wire, P8, P9, P10, P11,G7);and Z3(P7P8P9P10P11G6Wire, P7, P8, P9, P10, P11,G6);and Z4(P6P7P8P9P10P11G5Wire, P6, P7, P8, P9, P10, P11,G5);and Z5(P5P6P7P8P9P10P11G4Wire, P5, P6, P7, P8, P9,P10, P11, G4);and Z6(P4P5P6P7P8P9P10P11G3Wire, P4, P5, P6, P7, P8, P9, P10,P11, G3);and Z7(P3P4P5P6P7P8P9P10P11G2Wire, P3, P4, P5, P6, P7, P8, P9, P10, P11,G2);and Z8(P2P3P4P5P6P7P8P9P10P11G1Wire, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11,G1);and Z9(P1P2P3P4P5P6P7P8P9P10P11G0Wire, P1,P2, P3, P4, P5, P6, P7, P8, P9, P10, P11,G0);and Z10(P0P1P2P3P4P5P6P7P8P9P10P11CinWire,P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11,Cin);

and D1(P12G11Wire, P12, G11);and D2(P11P12G10Wire, P11, P12, G10);and D3(P10P11P12G9Wire, P10, P11, P12, G9);and D4(P9P10P11P12G8Wire, P9, P10, P11, P12, G8);and D5(P8P9P10P11P12G7Wire, P8, P9, P10, P11,P12, G7);and D6(P7P8P9P10P11P12G6Wire, P7, P8, P9, P10, P11,P12, G6);and D7(P6P7P8P9P10P11P12G5Wire, P6, P7, P8, P9, P10, P11,P12, G5);and D8(P5P6P7P8P9P10P11P12G4Wire, P5, P6, P7, P8, P9,P10, P11, P12, G4);and D9(P4P5P6P7P8P9P10P11P12G3Wire, P4, P5, P6, P7, P8, P9, P10,P11, P12, G3);and D10(P3P4P5P6P7P8P9P10P11P12G2Wire, P3, P4, P5, P6, P7, P8, P9, P10, P11,P12, G2);and D11(P2P3P4P5P6P7P8P9P10P11P12G1Wire, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11,P12, G1);and D12(P1P2P3P4P5P6P7P8P9P10P11P12G0Wire, P1,P2, P3, P4, P5, P6, P7, P8, P9, P10, P11,P12, G0);and D13(P0P1P2P3P4P5P6P7P8P9P10P11P12CinWire, P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11,P12, Cin);

and E1(P13G12Wire, P13, G12);and E2(P12P13G11Wire, P12, P13, G11);and E3(P11P12P13G10Wire, P11, P12, P13, G10);and E4(P10P11P12P13G9Wire, P10, P11, P12, P13, G9);and E5(P9P10P11P12P13G8Wire, P9, P10, P11, P12, P13, G8);and E6(P8P9P10P11P12P13G7Wire, P8, P9, P10, P11,P12, P13, G7);and E7(P7P8P9P10P11P12P13G6Wire, P7, P8, P9, P10, P11,P12, P13, G6);and E8(P6P7P8P9P10P11P12P13G5Wire, P6, P7, P8, P9, P10, P11,P12, P13, G5);and E9(P5P6P7P8P9P10P11P12P13G4Wire, P5, P6, P7, P8, P9,P10, P11, P12, P13, G4);and E10(P4P5P6P7P8P9P10P11P12P13G3Wire, P4, P5, P6, P7, P8, P9, P10,P11, P12, P13, G3);and E11(P3P4P5P6P7P8P9P10P11P12P13G2Wire, P3, P4, P5, P6, P7, P8, P9, P10, P11,P12, P13, G2);and E12(P2P3P4P5P6P7P8P9P10P11P12P13G1Wire, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11,P12, P13, G1);and E13(P1P2P3P4P5P6P7P8P9P10P11P12P13G0Wire, P1,P2, P3, P4, P5, P6, P7, P8, P9, P10, P11,P12, P13, G0);and E14(P0P1P2P3P4P5P6P7P8P9P10P11P12P13CinWire,P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11,P12, P13, Cin);

and F1(P14G13Wire, P14, G13);and F2(P13P14G12Wire, P13, P14, G12);and F3(P12P13P14G11Wire, P12, P13, P14, G11);and F4(P11P12P13P14G10Wire, P11, P12, P13, P14, G10);and F5(P10P11P12P13P14G9Wire, P10, P11, P12, P13, P14, G9);and F6(P9P10P11P12P13P14G8Wire, P9, P10, P11, P12, P13, P14, G8);and F7(P8P9P10P11P12P13P14G7Wire, P8, P9, P10, P11,P12, P13, P14, G7);and F8(P7P8P9P10P11P12P13P14G6Wire, P7, P8, P9, P10, P11,P12, P13, P14, G6);and F9(P6P7P8P9P10P11P12P13P14G5Wire, P6, P7, P8, P9, P10, P11,P12, P13, P14, G5);and F10(P5P6P7P8P9P10P11P12P13P14G4Wire, P5, P6, P7, P8, P9,P10, P11, P12, P13, P14, G4);and F11(P4P5P6P7P8P9P10P11P12P13P14G3Wire, P4, P5, P6, P7, P8, P9, P10,P11, P12, P13, P14, G3);and F12(P3P4P5P6P7P8P9P10P11P12P13P14G2Wire, P3, P4, P5, P6, P7, P8, P9, P10, P11,P12, P13, P14, G2);and F13(P2P3P4P5P6P7P8P9P10P11P12P13P14G1Wire, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11,P12, P13, P14, G1);and F14(P1P2P3P4P5P6P7P8P9P10P11P12P13P14G0Wire, P1,P2, P3, P4, P5, P6, P7, P8, P9, P10, P11,P12, P13, P14, G0);and F15(P0P1P2P3P4P5P6P7P8P9P10P11P12P13P14CinWire,P0, P1, P2, P3, P4, P5, P6, P7, P8, P9, P10, P11,P12, P13, P14, Cin);

endmodule
******************************************************************************************

***************************Shift By Eight Logic**********************************************************
module shiftByEight(shiftOut, setInputs, clk, set, count);

input [31:0] setInputs;
input clk;
input set;

integer choiceInput;
output [31:0] shiftOut;
output [6:0] count;
reg [6:0] count;

// Select the register input: load setInputs while set is high,
// otherwise feed the register output back in.
// The sensitivity list keeps this block from looping forever in simulation.
always @ (set or setInputs or shiftOut)
begin
  if (set)
  begin
    choiceInput = setInputs;
  end
  else
    choiceInput = shiftOut;
end

shiftByEightReg myReg(shiftOut, choiceInput, clk);

// Count clock cycles since the last set.
always @ (posedge clk)
begin
  if (set)
    count <= 1;
  else
    count <= count + 1;
end

endmodule

module shiftByEightReg(outBuss, inBuss, clk);
input [31:0] inBuss;
input clk;
output [31:0] outBuss;

integer i;
reg [31:0] outBuss;

// On each rising clock edge, shift the input up by eight bit positions
// and clear the low eight bits.
always @ (posedge clk)
begin
  outBuss[0] <= 1'b0;
  outBuss[1] <= 1'b0;
  outBuss[2] <= 1'b0;
  outBuss[3] <= 1'b0;
  outBuss[4] <= 1'b0;
  outBuss[5] <= 1'b0;
  outBuss[6] <= 1'b0;
  outBuss[7] <= 1'b0;

  for (i = 8; i < 32; i = i + 1)
  begin
    outBuss[i] <= inBuss[i-8];
  end
end

endmodule
******************************************************************************************


****************************Shift 8 bits only *************************************************
module shift8Only(outBuss, inBuss, clk);
input [31:0] inBuss;
input clk;
output [31:0] outBuss;

integer i;
reg [31:0] outBuss;

// On each rising clock edge, move inBuss[7:0] up one position into outBuss[8:1].
// Bit 0 and bits 9 through 31 are forced to zero.
always @ (posedge clk)
begin
  outBuss[0] <= 1'b0;

  for (i = 1; i < 9; i = i + 1)
  begin
    outBuss[i] <= inBuss[i-1];
  end

  for (i = 9; i < 32; i = i + 1)
    outBuss[i] <= 1'b0;
end

endmodule
******************************************************************************************
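As a cross-check on the two shift registers above, the sketch below models what each one produces on a single clock edge. It is only a software reference model written for this appendix, not part of the measured design; the function names shift_by_eight_step and shift8_only_step are hypothetical.

```c
#include <stdint.h>

/* Reference model of shiftByEightReg for one clock edge:
 * every input bit moves up eight positions and bits 0 through 7 are cleared.
 * Bits shifted past bit 31 are lost, as in the 32-bit register. */
uint32_t shift_by_eight_step(uint32_t in)
{
    return in << 8;
}

/* Reference model of shift8Only for one clock edge:
 * only bits 0 through 7 of the input are used; they move up one position,
 * bit 0 is cleared, and bits 9 through 31 are forced low. */
uint32_t shift8_only_step(uint32_t in)
{
    return (in & 0xFFu) << 1;
}
```

Comparing a simulation waveform against these two functions is a quick way to confirm that a modified version of either module still produces the same output.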


Appendix C
C Code

#ifndef COUNT_BINARY_C  /* opening guard restored to match the closing #endif */
#define COUNT_BINARY_C

#include "count_binary.h"
#include "adder_perif.h"

/* An "accumulator" variable. */
static alt_u8 count;

//outputs the contents of the accumulator to the LED bank
static void count_led()
{
#ifdef LED_PIO_BASE
    IOWR_ALTERA_AVALON_PIO_DATA(LED_PIO_BASE, count);
#endif
}

int main(void)
{
    /* Declare variables used only in main() */
    int i;
    int wait_time;

    count = 0;

    while( 1 )
    {
        IOWR_PAUSE();

        /* Choose the step size from the two add inputs. */
        if (IORD_ADD1())
            IOWR_ADDEND1(SHORT_ADDER_0_BASE, 0x01);
        else if (IORD_ADD2())
            IOWR_ADDEND1(SHORT_ADDER_0_BASE, 0x02);

        /* Add the step to the accumulator using the adder peripheral. */
        IOWR_ADDEND2(SHORT_ADDER_0_BASE, count);
        count = IORD_SUM(SHORT_ADDER_0_BASE);

        count_led();
        IOWR_UNPAUSE();
    }

    return(0);
}

#endif

/*******************************************************************************
 *
 * Copyright (c) 2005 John Martin and Ryan Conrad Enterprizes (CVN 68's)
 *
 *******************************************************************************/

#ifndef __ADD_PERIF_H_
#define __ADD_PERIF_H_

#include <io.h>

#define IOWR_ADDEND1(base, data) IOWR(base, 0, data)
#define ADDEND1_MSK (0xFF)
#define ADDEND1_OFST (0)

#define IOWR_ADDEND2(base, data) IOWR(base, 1, data)
#define ADDEND2_MSK (0xFF)
#define ADDEND2_OFST (0)

#define IORD_SUM(base) IORD(base, 2)
#define SUM_MSK (0xFF)
#define SUM_OFST (0)

#endif /* __ADD_PERIF_H_ */
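The register map in this header (addend 1 at offset 0, addend 2 at offset 1, sum at offset 2) can be exercised on a desktop machine without the board. The sketch below is a minimal illustration under stated assumptions, not the code that ran on the Nios II: IOWR and IORD are replaced by hypothetical stand-ins that index a plain array, and mock_adder_update assumes the hardware returns the two addends summed and truncated to eight bits, which the 0xFF masks suggest but the listing does not confirm.

```c
#include <stdint.h>

/* Hypothetical host-side stand-ins for the IOWR/IORD macros from <io.h>:
 * here "base" is simply a pointer to an array of 32-bit registers. */
#define IOWR(base, regnum, data) ((base)[(regnum)] = (uint32_t)(data))
#define IORD(base, regnum)       ((base)[(regnum)])

/* Register map copied from the header above. */
#define IOWR_ADDEND1(base, data) IOWR(base, 0, data)
#define IOWR_ADDEND2(base, data) IOWR(base, 1, data)
#define IORD_SUM(base)           IORD(base, 2)
#define SUM_MSK                  (0xFF)

/* Assumed behaviour of the adder hardware:
 * sum = addend1 + addend2, truncated to eight bits. */
void mock_adder_update(uint32_t *regs)
{
    regs[2] = (regs[0] + regs[1]) & SUM_MSK;
}

/* One pass of the accumulate step from main(): add "step" to "count"
 * through the register interface and return the new count. */
uint8_t accumulate_once(uint32_t *regs, uint8_t count, uint8_t step)
{
    IOWR_ADDEND1(regs, step);
    IOWR_ADDEND2(regs, count);
    mock_adder_update(regs);   /* the real peripheral does this in hardware */
    return (uint8_t)IORD_SUM(regs);
}
```

Calling accumulate_once repeatedly with a three-element register array reproduces the counting behaviour of the main loop, including the wrap-around at 255 that the eight-bit accumulator implies.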


ALTERA User’s Guide:

The following information is intended to help the reader become familiar with the Altera software packages more quickly. These tools are powerful and can help you build a good design relatively quickly, once you know them well enough to filter out the features you do not need.

First, understand that the Altera FPGA is an expensive powerhouse of electronics. The chip costs about $450, and the development kit approaches $1000. Upgraded chips are available for up to $3500. This is not something you are likely to use at a home hobby electronics bench. If you are looking for something cheaper, try the Cypress Semiconductor PSoC. The chips cost $3 to $20, and a development board costs $65.

Second, plan on investing some significant start-up time to run through several tutorials before implementing your own design. Take the tutorials in the order listed below as needed.

First, look at the getting started user guide for your development board, available at altera.com/literature: follow the link to development boards, then to your specific board. This will make sure you get all the peripherals working together properly. The LCD display will not work properly if the FLASH disk is inserted into the slot. Remove it and restart the FPGA to see a proper display.

Next, take the Quartus II 5.0 tutorial located in the Help menu of the Quartus software environment. This is an excellent tutorial that will help you understand the Quartus software and some of its most useful tools.

I would suggest programming something of your own at this point, to get used to the Quartus environment and to try some of the things you learned in the tutorial. The following resources will help you make the most of your design:

Nios Development Board Reference Manual (your FPGA edition)
This can be found in the same place as the getting started user guide. It has information on the various peripherals available on the development board, including pin-outs from the processor to/from each peripheral. This is an indispensable resource.

Quartus II Handbook
This can be found at altera.com/literature by following the link to ‘Quartus II’. It contains information on everything the software can do. If you need more information on how to implement a given feature, this is your reference manual.

After you create a new project, DO THIS FIRST: go to the Assignments menu, click Settings, go to the General section on the left, click Device, click Device and Pin Options, click Unused Pins, select As Inputs, Tri-Stated, and click OK. If you fail to do this, any design you program to your FPGA will reset itself immediately after it is written and the board will go back to the default program. This happens because some unused pin outputs a logic low to the reset module on the board, which resets the board and the FPGA.

We suggest starting with an incredibly simple part of your design and building iteratively. Save often, and keep back-ups of entire projects, not just files. Sometimes deleting a file deletes associated libraries as well (Nios II). We had to re-load the entire Quartus and Nios packages twice because a system library was deleted. These programs are all very memory-intensive and can slow a computer to a crawl. Our computer froze only once, but our projects were fairly simple.

Compiling can take a very long time. Use the iterative design options so that the programs will only change the necessary portions of your compiled design.

If you plan on using a processor as part of your design (and if you aren’t, you probably don’t need this FPGA…), and you want to use the Altera Nios Processor, take the following tutorials after familiarizing yourself with Quartus II 5.0:

Nios II Hardware Development Tutorial
This can be found at altera.com/literature by following the link to Nios II processor. This tutorial runs through the process of designing and instantiating a processor and other peripherals using SOPC Builder, a powerful system-on-a-chip development package.

Nios Software Development Tutorial
The only place I can find this tutorial is by searching altera.com or as a link in the getting started user’s guide. It will help you learn how to use the Nios II IDE for writing code that will run on your custom processor.

Also, the SOPC section (Volume 4) of the Quartus II Handbook shows how to create a custom peripheral that you can add to your SOPC. The example design is a PWM circuit and is very informative. If you are planning on writing your own peripheral, expect to spend some time picking apart the software files associated with this tutorial.

And that’s about it. Hopefully, by using this information, you will be able to get to actually working on your project weeks sooner than we did. I know I wish I had all this information before I started….
