Project Report Ashish


*****DESIGN OF 8-BIT RISC PROCESSOR*****

A

Project Report

submitted

in partial fulfilment

for the award of the Degree of

Bachelor of Technology

in Department of electronic & communication Engineering

by

***** ASHISH TOMAR*****

***** Enrolment No.-JNU08BTEC013 *****

Under the supervision of

***** ARPAN SHAH *****

***** Designation *****

Department of electronic & communication Engineering

Jagan Nath University

Jaipur

May 2012


Candidate Declaration

I, ASHISH TOMAR, hereby declare that the work presented in this report entitled “8 BIT RISC MICROPROCESSOR”, in partial fulfillment of the requirements for the award of Degree of Bachelor of Technology, submitted in the Department of ELECTRONIC & COMMUNICATION at Jagan Nath University, Jaipur, is an authentic record of my own work under the supervision of ARPAN SHAH.

I also declare that the work embodied in the present project report is my original work/extension of the existing work and has not been copied from any Journal/thesis/book, and has not been submitted by me for any other Degree/Diploma.

(Name & Signature of Candidate)

Enrolment No.: JNU08BTEC013

Date: 29TH MAY 2012


Certificate of the Supervisor(s)

This is to certify that the project report entitled “8 BIT RISC MICROPROCESSOR” submitted by ASHISH TOMAR for the award of Degree of Bachelor of Technology in the Department of ELECTRONIC & COMMUNICATION of Jagan Nath University, Jaipur, is a record of authentic work carried out by him/her under my/our supervision.

The matter embodied in this project report is the original work of the candidate and has not been submitted for the award of any other degree or diploma. It is further certified that he/she has worked with me/us for the required period in the Department of ELECTRONIC & COMMUNICATION, Jagan Nath University, Jaipur.

(Name and Signature of Supervisor)

Date:………………………………….


Acknowledgements

I would like to express my sincere gratitude to my project guide, ARPAN SHAH, for giving me the opportunity to work on this topic. It would never have been possible for me to take this project to this level without his innovative ideas and his relentless support and encouragement.

Name of Student(s): ASHISH TOMAR (Roll Number: 0802BTEC016)


Abstract

Field Programmable Gate Array (FPGA) devices offer a large set of advantages due to their reconfigurable nature. Although their performance is not comparable to that of ASIC devices, their flexibility is usually more important, especially when fast time-to-market is an issue and production is on a small scale. For that reason they are widely used in electronic applications, both during prototyping and in final-production systems. Processors are the most demanding when it comes to flexibility, cost and time to market.

RISC (Reduced Instruction Set Computer) machines have fixed-size instructions that can execute in one clock, and their instructions interface with memory via a fixed mechanism. There are only a small number of primitive instructions. RISC is based on using many simpler and faster instructions to do the same work as a single complicated instruction on a CISC (Complex Instruction Set Computer) machine.

The aim of this project is the design of an 8-bit RISC processor for FPGA implementation. The processor can execute 14 instructions, including 2 memory access operations. Verilog is the HDL chosen for design entry. Xilinx ISE WebPACK generates the programming file for the target device, a Spartan-3.


INDEX

1. INTRODUCTION
1.1. Reduced Instruction Set Computers
1.2. Field Programmable Gate Array
1.2.1. Look Up Tables
1.2.2. Programmable Logic Array
1.2.3. Programmable Array Logic
1.2.4. FPGA
1.2.5. Spartan-3
1.3. Hardware Description Languages
1.3.1. Importance of HDLs
1.3.2. Verilog HDL

2. FUNCTIONAL DESCRIPTION
2.1. Block Diagram
2.2. Specifications
2.3. Instructions
2.3.1. Move Instructions
2.3.2. Arithmetic Instructions
2.3.3. Jump Instructions
2.3.4. Memory Access Instructions
2.4. Targeted Performance Parameters

3. DESIGN ARCHITECTURE
3.1. Instruction Set Architecture
3.1.1. Instruction Format
3.1.2. Source/Destination Format
3.1.3. Instruction Examples
3.2. Modular Design
3.3. Top Level Entity
3.3.1. Block Diagram
3.3.2. Ports Description
3.3.3. Architecture
3.3.4. Source Register Selection
3.3.5. Memory Access Operations
3.3.6. Data Bus
3.3.7. Destination Decoder
3.3.8. Output Port Xout
3.4. Move Unit
3.5. Shift Unit
3.6. Arithmetic Unit
3.6.1. Block Diagram
3.6.2. Ports Description
3.6.3. Architecture
3.6.4. Functionality
3.6.5. Flags
3.7. Program Counter
3.8. Instruction Register
3.9. Instruction Decoder
3.10. Control Unit
3.11. Data Memory
3.12. Program Memory

4. DESIGN IMPLEMENTATION
4.1. HDL Entry
4.2. Functional Simulation
4.3. Synthesis
4.3.1. Synthesis Constraints
4.3.2. Synthesis Report
4.4. Translate
4.4.1. NGD Build Overview
4.4.2. Conversion of Netlist to NGD
4.5. MAP
4.5.1. MAP Input Files
4.5.2. MAP Output Files
4.5.3. MAP Report
4.5.4. Post MAP Timing Report
4.6. Place & Route
4.6.1. Overview
4.6.2. Placing
4.6.3. Routing
4.6.4. Post PAR Timing Report
4.7. BitGen Overview

5. SIMULATION RESULTS

6. CONCLUSION
6.1. Performance Parameters
6.2. Future Improvements

APPENDIX A – RTL CODING
A.1. Move Unit
A.2. Shift Unit
A.3. Arithmetic Unit
A.4. Program Counter
A.5. Instruction Register
A.6. Instruction Decoder
A.7. Control Unit
A.8. Main Processor Unit

APPENDIX B – INSTRUCTION SET


8 BIT RISC MICROPROCESSOR ARCHITECTURE


1. INTRODUCTION

1.1 REDUCED INSTRUCTION SET COMPUTER (RISC)

An important factor in computer design prior to 1980 was that all memories, including the memory to store program instructions, were very expensive. So if you were a computer designer, you would want to make each of the instructions you design short but powerful. That way, when programmers write programs using your instructions, their code will be dense and will require little memory, but each bit of code would do a lot of work. This would result in a bunch of instructions of different lengths. Finally, you would also end up with a very rich collection of instructions that can interface with the computer’s data memory in many different ways: either dealing directly with the data memory, or demanding that data first be stored into temporary locations (“registers”), or some mix of the two.

Now because of this rich, powerful, and variable-length group of (compact) instructions you’ve designed, the computer would have several characteristics. First, each instruction might take several clock cycles to complete. That’s because each instruction would be of a different size, so figuring out what each one says is complicated; because each instruction could talk to memory in a different way; and because each instruction could potentially do a lot of work. Second, and for the same reasons just given, the computer speed might be fairly slow.

But as time passed, memory became cheaper, compilers got better, and


the motivation for making small but really powerful instructions faded. In 1980, Patterson and Ditzel at Berkeley argued in favor of a different architecture having simple instructions, all of uniform length, that perform simpler operations. Sure, you’d need to specify more of these simpler instructions to equal one of the old-style complicated instructions, and yes, this takes more instruction memory, but memory is cheap, and your computer can run faster and take fewer clocks.

For example, say you had a complicated instruction called “MUL” that told

the computer to take two pieces of data from memory and multiply their sum

with a third piece of data and put the result back somewhere else. This one

instruction might take 10 clock cycles to complete. Now suppose we had a

simple instruction set. To do the same work as “MUL” did, we’d need perhaps

8 different instructions (a few loads, an add, a multiply, a store, etc.). But

each instruction completes in a single clock cycle because each is so simple.

And maybe the computer’s clock can run much faster, too. The downside of

the simple system, of course, is that it requires you to store 8 times as many

instructions.

A Comparison:

• Complicated system does “MUL”: 1 instruction x 10 clocks/instr x 10 ns/clock = 100 ns

• Simple system does the same work as “MUL”: 8 instructions x 1 clock/instr x 9 ns/clock = 72 ns
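The comparison above is the product of instruction count, clocks per instruction and clock period. A minimal Python sketch of that arithmetic follows; the function name `execution_time_ns` is illustrative only and is not part of the report's design.

```python
def execution_time_ns(instructions, clocks_per_instruction, ns_per_clock):
    """Total time = instruction count x clocks per instruction x clock period."""
    return instructions * clocks_per_instruction * ns_per_clock


# Figures taken from the comparison in the text.
complicated_ns = execution_time_ns(1, 10, 10)  # one "MUL" instruction
simple_ns = execution_time_ns(8, 1, 9)         # eight simple instructions
print(complicated_ns, simple_ns)
```

The simple system wins here only because its lower clock count and shorter clock period together outweigh the eightfold increase in instruction count.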

Systems based on this idea were built in the early 1980s by three groups: the Berkeley

machines RISC-I and RISC-II, the Stanford MIPS processor [2], and the IBM

801 [3]. Based on comparisons between these machines and what came

before, some characteristics commonly associated with RISC and CISC arose.


Reduced Instruction Set Computer (RISC) is based on using many

simpler and faster instructions to do the same work as a single

complicated instruction on a Complex Instruction Set Computer

(CISC).

RISC machines are machines that have

• Instructions that execute in one clock

• Instructions of a fixed size

• Instructions that interface with memory via a fixed mechanism

• A small number of primitive instructions

• Pipelining, a way to do more than one instruction at a time.


1.2. FIELD PROGRAMMABLE GATE ARRAYS (FPGA)

There is a better way to implement a logic function than to hook together

discrete 74XX packages. One can use semiconductor memory, integrated

circuits known as “Programmable Logic Devices” or get a Custom made IC to

implement logic.

1.2.1 LOOK UP TABLES (MEMORY)

To implement N functions of K variables, we need a memory with 2^K locations and N bits per location (use one address line for each variable and one data-out line for each function). Because the number of locations doubles with every added variable, memory is not efficient at implementing functions with lots of input variables or multiple functions with different inputs.

1.2.2 PROGRAMMABLE LOGIC ARRAY (PLA)

PLA was the first device used specially for implementing logic circuits, introduced in the early 1970s by Philips. The array consists of 2 levels of logic gates: a programmable “wired” AND-plane followed by a programmable “wired” OR-plane. It is designed to implement random logic expressions in SOP form. PLAs are difficult to manufacture because of the 2 levels of configurable logic, which also introduce significant propagation delay.

1.2.3 PROGRAMMABLE ARRAY LOGIC (PAL)

To overcome the problems of PLA, PAL devices were developed. A PAL has a single level of programmability: a programmable “wired” AND-plane followed by a fixed OR-plane. In a PAL, logic is represented in SOP form. The number of products in an SOP form is limited to a fixed number. The number of variables in each product term is limited by the number of input pins. The number of independent functions is limited by the number of output pins.


1.2.4 FIELD PROGRAMMABLE GATE ARRAYS

A Field Programmable Gate Array or FPGA is a semiconductor device containing programmable logic components and programmable interconnects. The programmable logic components can be programmed to duplicate the functionality of basic logic gates such as AND, OR, XOR, NOT or more complex combinatorial functions such as decoders or simple math functions. In most FPGAs, these programmable logic components (or logic blocks, in FPGA parlance) also include memory elements, which may be simple flip-flops or more complete blocks of memories.

A hierarchy of programmable interconnects allows the logic blocks of an FPGA to be interconnected as needed by the system designer, somewhat like a one-chip programmable breadboard. These logic blocks and interconnects can be programmed after the manufacturing process by the customer/designer (hence the term "field programmable") so that the FPGA can perform whatever logical function is needed.

FPGAs are generally slower than their application-specific integrated circuit

(ASIC) counterparts, can't handle as complex a design, and draw more

power. However, they have several advantages such as a shorter time to

market, ability to re-program in the field to fix bugs, and lower non-recurring

engineering costs.

The historical roots of FPGAs are in complex programmable logic devices (CPLDs). CPLD logic gate densities range from the equivalent of several thousand to tens of thousands of logic gates, while FPGAs typically range from tens of thousands to several million. The primary differences between CPLDs and FPGAs are architectural. A CPLD has a somewhat restrictive structure consisting of one or more programmable SOP logic arrays feeding a relatively small number of clocked registers. The result of this is less flexibility, with the advantage of more predictable timing delays and a higher logic to interconnect ratio. The FPGA architectures, on the other hand, are dominated by interconnect. This makes them far more flexible, but also far more complex to design for.

Another notable difference between CPLDs and FPGAs is the presence in most FPGAs of higher-level embedded functions (such as adders and multipliers) and embedded memories. A related, important difference is that many modern FPGAs support partial in-system reconfiguration, allowing their designs to be changed "on the fly" either for system upgrades or for dynamic reconfiguration.

A recent trend has been to take the architectural approach a step further by combining the logic blocks and interconnects of traditional FPGAs with embedded microprocessors and related peripherals to form complete "systems on a programmable chip". Examples of such hybrid technologies can be found in the Xilinx Virtex-II PRO and Virtex-4 devices, which include one or more PowerPC processors embedded within the FPGA's logic fabric. An alternate approach is to make use of "soft" processor cores that are implemented within the FPGA logic. These cores include the Xilinx MicroBlaze and PicoBlaze, and the Altera Nios and Nios II processors, as well as third-party processor cores.

Applications of FPGAs include DSP, software-defined radio, aerospace and defense systems, ASIC prototyping, medical imaging, computer vision, speech recognition, cryptography, bioinformatics, computer hardware emulation and a growing range of other areas. As their size, capabilities and speed increased they began to take over larger and larger functions to the state where they are now marketed as competitors for full systems on chips. They now find applications in any area or algorithm that can make use of the massive parallelism offered by their architecture.


To define the behavior of the FPGA the user provides a hardware description language (HDL) or a schematic design. Common HDLs are VHDL and Verilog. Then, using an electronic design automation tool, a technology-mapped netlist is generated. The netlist can then be fitted to the actual FPGA architecture using a process called place-and-route, usually performed by the FPGA company's proprietary place-and-route software. The user will validate the map, place and route results via timing analysis, simulation, and other verification methodologies. Once the design and validation process is complete, the binary file generated (also using the FPGA company's proprietary software) is used to (re)configure the FPGA device.

To simplify the design of complex systems in FPGAs, there exist libraries of predefined complex functions and circuits that have been tested and optimized to speed up the design process. These predefined circuits are commonly called IP cores, and are available from FPGA vendors and third-party IP suppliers.

In a typical design flow, an FPGA application developer will simulate the design at multiple stages throughout the design process. Initially the RTL description in VHDL or Verilog is simulated by creating test benches to stimulate the system and observe results. Then, after the synthesis engine has mapped the design to a netlist, the netlist is translated to a gate level description where simulation is repeated to confirm the synthesis proceeded without errors. Finally the design is laid out in the FPGA, at which point propagation delays can be added and the simulation run again with these values back-annotated onto the netlist.

1.2.5 SPARTAN – 3

The Spartan-3 family of FPGAs offers densities ranging from 50,000 to five million system gates. Because of their exceptionally low cost, Spartan-3 FPGAs are ideally suited to a wide range of consumer electronics applications, including broadband access, home networking, display/projection and digital television equipment.


Features:

- Up to 784 I/O pins

- 622 Mb/s data transfer rate per I/O

- Signal swing ranging from 1.14V to 3.45V

- Double Data Rate (DDR) support

- DDR, DDR2 SDRAM support up to 333 Mbps

1.3 HARDWARE DESCRIPTION LANGUAGE – VERILOG

The HDLs allow designers to model the concurrency of processes found in

hardware elements. HDLs such as Verilog HDL and VHDL became very

popular.

1.3.1. IMPORTANCE OF HDLs

HDLs have many advantages compared to traditional schematic-based

design.

Design can be described at a very abstract level by use of HDLs.

Functional Verification of the design can be done early in the design

cycle.

A textual description with comments is an easier way to develop and

debug circuits.

1.3.2. Verilog HDL

Verilog HDL has evolved as a standard hardware description language.

Verilog HDL offers many useful features for the hardware design.

Verilog is easy to learn and use. It is similar in syntax to the C

programming language.

Allows different levels of abstraction to be mixed in the same model.

Most popular synthesis tools support Verilog HDL.


2. FUNCTIONAL DESCRIPTION

This chapter gives the detailed information about the functionality of the

design and the implementation constraints.

2.1. BLOCK DIAGRAM

Fig 2.1 Functional Block Diagram

2.2. SPECIFICATIONS


The following instructions have to be implemented:

1. MOV dst, src -- dst <= src

2. INC dst, src -- dst <= src + 1

3. DEC dst, src -- dst <= src - 1

4. ADD src -- src <= src + A

5. SUB src -- src <= src - A

6. SL dst, src -- dst <= shift left src

7. SR dst, src -- dst <= shift right src

8. CMP src -- set Z flag if src = A

9. MVI A, immediate -- A <= immediate data

10. LOAD dst -- dst <= memory contents at address [CD]

11. STORE src -- memory at [CD] <= src

12. JMP immediate_offset -- jump to PC + imm_offset

13. JZ immediate_offset -- jump to PC + imm_offset if Z=1

14. JMPCD -- jump to address pointed by [CD]

Src, dst can be either A, B, C, D or X. PC is the program counter. [CD]

represents the contents of register C and D after concatenation. D is the least

significant byte.
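The [CD] concatenation can be sketched in Python as follows. The function name `cd_address` is hypothetical; the only facts taken from the text are that C and D are 8-bit registers and that D is the least significant byte.

```python
def cd_address(c, d):
    """Concatenate 8-bit registers C (high byte) and D (low byte) into [CD]."""
    return ((c & 0xFF) << 8) | (d & 0xFF)


# C = 0x12 and D = 0x34 address location 0x1234.
print(hex(cd_address(0x12, 0x34)))
```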

A, B, C and D are 8-bit registers.

X is an 8-bit wide input and output port.

X is visible at the periphery as "X In" and "X Out" as I/O ports. When anything

is assigned to X, it will appear at "X Out". When X is read, the contents at "X

In" will be used.

The Z flag is set whenever the result of any operation is zero. The C flag is set whenever an arithmetic operation produces a carry. The S flag is set whenever an arithmetic operation produces a negative number.
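A minimal Python sketch of how the three flags could be derived for an 8-bit addition is given below. It is a behavioural illustration, not the report's RTL: the function name `add8_with_flags` is invented here, and treating "negative" as "most significant bit of the result is 1" (two's complement) is an assumption, since the text does not define it.

```python
def add8_with_flags(src, a):
    """Add two 8-bit values and derive the Z, C and S flags from the result."""
    total = (src & 0xFF) + (a & 0xFF)
    result = total & 0xFF  # keep only the 8 bits that fit in a register
    flags = {
        "Z": int(result == 0),    # result is zero
        "C": int(total > 0xFF),   # carry out of bit 7
        "S": (result >> 7) & 1,   # assumption: negative means MSB set
    }
    return result, flags


print(add8_with_flags(0xFF, 0x01))
```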

It is assumed that the program memory and the data memory have

synchronous writes and asynchronous reads.

Write operation: On a clock edge when the WR is asserted the data on the

data bus is written into the location pointed by address.

Read operation: When the RD is asserted, the contents of the location

pointed by address will be presented at the data bus by the memory. When

RD is de-asserted the memory will stop driving the bus.

For the sake of simplicity, it is assumed that both the memories are fast

enough to complete the read and write operations in one clock.
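The assumed memory behaviour (synchronous write, asynchronous read) can be modelled with a small Python class. The class and method names are illustrative; returning `None` when RD is de-asserted stands in for the memory no longer driving the bus.

```python
class Memory:
    """Behavioural model: writes happen on a clock edge, reads are immediate."""

    def __init__(self, size):
        self.cells = [0] * size

    def read(self, address, rd):
        # Asynchronous read: data appears as soon as RD is asserted.
        # None models the memory not driving the bus when RD is de-asserted.
        return self.cells[address] if rd else None

    def clock_edge(self, address, data, wr):
        # Synchronous write: takes effect only on a clock edge with WR asserted.
        if wr:
            self.cells[address] = data & 0xFF


mem = Memory(16)
mem.clock_edge(3, 0xAB, wr=True)
print(mem.read(3, rd=True))
```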

2.3. INSTRUCTIONS

2.3.1. MOVE INSTRUCTIONS

There are two move instructions

2.3.1.1. Move

INSTRUCTION: MOV dst, src

This instruction copies the 8-bit data from the source register to the

destination register. Destination & Source can be registers A/B/C/D or the

input-output port X

2.3.1.2. Move Immediate Data

INSTRUCTION: MVI A, immediate data

This instruction moves the 8-bit data which is a part of the instruction

itself, to the register A.


2.3.2. ARITHMETIC INSTRUCTIONS

There are 5 arithmetic instructions, followed by 2 shift instructions

2.3.2.1. Increment

INSTRUCTION: INC dst, src

This instruction retrieves the 8-bit data from the source register/port,

increments it by 1 and stores in the destination register/port. The contents

of source register remain unchanged.

2.3.2.2. Decrement

INSTRUCTION: DEC dst, src

This instruction retrieves the 8-bit data from the source register/port,

decrements it by 1 and stores in the destination register/port. The

contents of source register remain unchanged.

2.3.2.3. Addition

INSTRUCTION: ADD src

This instruction retrieves the 8-bit data from the source register/port,

increments it by the contents of register A, and stores the result back in

the source register/port.

2.3.2.4. Subtraction

INSTRUCTION: SUB src

This instruction retrieves the 8-bit data from the source register/port,

decrements it by the contents of register A and stores the result back in

the source register/port.

2.3.2.5. Compare

INSTRUCTION: CMP src


This instruction retrieves the 8-bit data from the source register/port,

compares it with the contents of register A, and sets Z flag high if both are

equal. This instruction does not modify the contents of the source

register/port.

2.3.2.6. Shift Left

INSTRUCTION: SL dst, src

This instruction retrieves the 8-bit data from the source register/port and

left shifts the data by 1-bit and stores the result in destination

register/port. This instruction does not modify the contents of the source

register/port.

2.3.2.7. Shift Right

INSTRUCTION: SR dst, src

This instruction retrieves the 8-bit data from the source register/port and

right shifts the data by 1-bit and stores the result in destination

register/port. This instruction does not modify the contents of the source

register/port.
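The two shift instructions can be sketched on 8-bit values as below. The text does not say what is shifted into the vacated bit, so filling with zero (a logical shift) is an assumption of this sketch, as are the function names.

```python
def shift_left(src):
    """SL: shift an 8-bit value left by one; bit 0 is filled with 0 (assumed)."""
    return (src << 1) & 0xFF


def shift_right(src):
    """SR: shift an 8-bit value right by one; bit 7 is filled with 0 (assumed)."""
    return (src & 0xFF) >> 1


value = 0b10000001
print(format(shift_left(value), "08b"), format(shift_right(value), "08b"))
```

Both functions return a new value and leave `value` untouched, matching the statement that the source register/port is not modified.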

2.3.3. JUMP INSTRUCTIONS

The jump instructions are used to modify the sequence of instruction

execution, by changing the value of program counter. The processor can

execute three kinds of jump instructions.

2.3.3.1. Jump by immediate offset

INSTRUCTION: JMP immediate_offset

The value of the program counter is incremented by the value given as

the immediate data. Immediate data is a part of the instruction itself.

2.3.3.2. Jump by immediate offset if Z flag is Set

INSTRUCTION: JZ immediate_offset


The value of the program counter is incremented by the value given as

the immediate data, if the Z flag is high. Immediate data is a part of the

instruction itself. If the Z flag is not set, then the program counter will

increment by 1 as in other instructions.

2.3.3.3. Direct Jump

INSTRUCTION: JMPCD

The value of the program counter is changed to the address pointed by

the concatenation of the contents of the register C and D.
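The three jump behaviours can be summarised in one Python function that computes the next program counter value. Several details are assumptions made for this sketch and are not stated in the text: the offset is treated as an unsigned 8-bit value added to the current PC, and the PC is treated as 16 bits wide and wrapping.

```python
def next_pc(pc, op, z_flag=0, offset=0, c=0, d=0):
    """Return the next program counter value for the given instruction."""
    if op == "JMP":
        return (pc + (offset & 0xFF)) & 0xFFFF
    if op == "JZ" and z_flag:
        return (pc + (offset & 0xFF)) & 0xFFFF
    if op == "JMPCD":
        # Direct jump to the address formed by C (high byte) and D (low byte).
        return ((c & 0xFF) << 8) | (d & 0xFF)
    # Every other instruction, and JZ with Z = 0, moves to the next instruction.
    return (pc + 1) & 0xFFFF


print(hex(next_pc(0x0010, "JZ", z_flag=1, offset=5)))
```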

2.3.4. MEMORY ACCESS OPERATIONS

The processor can execute 2 memory access instructions.

2.3.4.1. Load Data

INSTRUCTION: LOAD dst

This instruction loads the destination register/port with 8-bit data retrieved

from the Data Memory. The 16-Bit address of the data memory, from

which data is retrieved, is given by the concatenation of the contents at

registers C and D.

2.3.4.2. Store Data

INSTRUCTION: STORE src

This instruction stores the 8-bit data of the source register in the data memory. The address of the data memory location where the contents of the source are stored is given by the concatenation of the contents of registers C and D.

2.4. TARGETED PERFORMANCE PARAMETERS

There are a few performance targets that the design needs to reach.

The design is expected to have a worst case delay of 5ns, i.e. the

processor is expected to have a maximum frequency of 200 MHz.


Instruction opcodes are to be designed in such a way that

implementation requires minimum hardware delays.

An optimum instruction size is to be chosen.

Tristate buffers are allowed inside the processor.

Each instruction has to be executed in a single clock cycle.

Modification of the instructions to improve performance is allowed.

More instructions may also be added.

3. DESIGN ARCHITECTURE

This chapter explains the internal architecture of the top level entity and the sub modules. The instruction set architecture was finalized first, followed by the final design.

3.1. INSTRUCTION SET ARCHITECTURE

The design is made for a total of 14 instructions. The instruction set is designed so that every instruction has the same size, which is chosen to be 11 bits. An ‘X’ in an instruction means a ‘don’t care’ condition, i.e. the instruction will work in the same way whether ‘1’ or ‘0’ is entered in that position.

3.1.1 INSTRUCTION FORMAT

Instructions MVI, JMP and JZ have immediate data/offset as the part of

the instructions

1. MVI : 01_< 8-Bit Immediate Data>_X

2. JMP : 10_< 8-Bit Immediate Offset>_X


3. JZ : 11_< 8-Bit Immediate Offset>_X

Instructions MOV, INC, DEC, SL and SR have both destination and

source as the part of the instruction.

4. MOV : 00_001_< 3-Bit Destination>_< 3_Bit Source>

5. INC : 00_010_< 3-Bit Destination>_< 3_Bit Source>

6. DEC : 00_011_< 3-Bit Destination>_< 3_Bit Source>

7. SL : 00_100_< 3-Bit Destination>_< 3_Bit Source>

8. SR : 00_101_< 3-Bit Destination>_< 3_Bit Source>

The destination register/port for the instructions ADD, CMP and SUB is the same as the source, so there is no need to mention the destination in the instruction.

9. CMP : 00_000_00X_< 3_Bit Source>

10. ADD : 00_000_010_< 3_Bit Source>

11. SUB : 00_000_011_< 3_Bit Source>

The source in the case of the LOAD instruction is fixed, i.e. the data memory, and in the case of the STORE instruction the destination is fixed, i.e. the data memory.

12. LOAD : 00_110_< 3_Bit Destination>_XXX

13. STORE : 00_111_XXX_< 3_Bit Source>

The direct jump instruction JMPCD doesn’t require any destination,

source or immediate data to be the part of the instruction

14. JMPCD : 00_000_1XX_XXX


3.1.2. SOURCE / DESTINATION FORMAT

Source can be one of the registers A, B, C, D or the input port Xin

Total of 3-bits are required to define the source

A : 000

B : 001

C : 010

D : 011

Xin : 1XX

Destination can be one of the registers A, B, C, D or the output port Xout

Total of 3-bits are required to define the destination

A : 000

B : 001

C : 010

D : 011

Xout : 1XX

3.1.3. INSTRUCTION EXAMPLES

1. MOV B,A – i.e. Move the contents of register A to B

Destination is B : 001

Source is A : 000

Instruction Code : 00_001_001_000

2. ADD D – i.e. Add the contents of register D to A and store the result in

D

Destination is D : Not Required, Same as Source

Source is D : 011

Instruction Code : 00_000_010_011


3. MVI A7 – i.e. Move immediate data ‘A7’ to register A

Destination is A : Not Required, It is fixed

Data : 1010_0111

Instruction Code : 01_1010_0111_1 / 01_1010_0111_0
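The three worked examples above can be reproduced with a small Python encoder built from the field layouts in section 3.1.1. This is only a sketch: the function and table names are invented for illustration, and every don't-care bit is encoded as 0 by assumption. Jump instructions are left out to keep the sketch short.

```python
# 3-bit register codes from section 3.1.2 (X stands for port Xin/Xout, 1XX -> 100).
REGISTERS = {"A": 0b000, "B": 0b001, "C": 0b010, "D": 0b011, "X": 0b100}
# 3-bit operation field of the 00_ooo_ddd_sss format.
TWO_OPERAND = {"MOV": 0b001, "INC": 0b010, "DEC": 0b011, "SL": 0b100, "SR": 0b101}
# Middle field of the 00_000_fff_sss format (CMP is 00X, encoded here as 000).
ONE_OPERAND = {"CMP": 0b000, "ADD": 0b010, "SUB": 0b011}


def encode_two_operand(op, dst, src):
    return (TWO_OPERAND[op] << 6) | (REGISTERS[dst] << 3) | REGISTERS[src]


def encode_one_operand(op, src):
    return (ONE_OPERAND[op] << 3) | REGISTERS[src]


def encode_mvi(data):
    # 01_<8-bit immediate>_X, with the trailing don't-care bit set to 0.
    return (0b01 << 9) | ((data & 0xFF) << 1)


def as_bits(code):
    """Format an instruction word as an 11-character binary string."""
    return format(code, "011b")


print(as_bits(encode_two_operand("MOV", "B", "A")))
print(as_bits(encode_one_operand("ADD", "D")))
print(as_bits(encode_mvi(0xA7)))
```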

3.2. MODULAR DESIGN

Selection of the correct design hierarchy is advantageous for the following

reasons.

Improves simulation and synthesis results

Improves debugging and modifying modular designs

Allows parallel engineering (a team of engineers can work on different

parts of the design at the same time)

Improves the placement and routing of the design by reducing routing

congestion and improving timing

Allows for easier code reuse in the current design, as well as in future

designs

In my design there are modules for arithmetic operations, logical

operations, move operations, jump operations, instructions register and

control unit. All the units are interconnected inside the Top module. The

different modules are:

Move Unit

Shift Unit

Arithmetic Unit

Program Counter

Instruction Register

Instruction Decoder

Control Unit

Data Memory

Program Memory

Selection of source register, selection of the destination register, selection of input data to the destination register, control signal for the buffer for Xout and control signal for data bus connected to the data memory are generated inside the top level entity.

3.3. TOP LEVEL ENTITY

3.3.1. BLOCK DIAGRAM

[Figure: block diagram of the MAIN PROCESSOR UNIT with ports Xin (8), Xout (8), Data_inout (8), IR_in (11), Addr_PC (16), Addr_data (16), Clk, Rst, wr_data and rd_data]


3.3.2. PORTS DESCRIPTION

1. Xin
Length : 8 Bit
Type : Input
Use : This port can be used by the user for providing immediate data for various instructions

2. Xout
Length : 8 Bit
Type : Output
Use : This port can be used by the user for getting the immediate result of various instructions

3. Clk
Length : 1 Bit
Type : Input
Use : This port provides the global clock signal used to synchronize the internal registers, program memory and the data memory

4. Rst
Length : 1 Bit
Type : Input
Use : This port provides the global reset signal to all the internal registers, program memory, data memory, instruction register etc.

5. Addr_PC
Length : 16 Bit
Type : Output
Use : This port serves as the address lines for the 64K x 11 Bits program memory

6. IR_in
Length : 11 Bit
Type : Input
Use : This port provides the 11-bit instruction fetched from the program memory to the processor

7. Data_inout
Length : 8 Bit
Type : Inout
Use : This port carries the 8-bit data to and from the data memory. Buffers control the direction of data flow

8. Addr_data
Length : 16 Bit
Type : Output
Use : This port serves as the address lines for the 64K x 8 Bits data memory

9. wr_data
Length : 1 Bit
Type : Output
Use : This port provides the write signal to the data memory when data has to be written to the data memory

10. rd_data
Length : 1 Bit
Type : Output
Use : This port provides the read signal to the data memory when data has to be read from the data memory


3.3.4. SOURCE REGISTER SELECTION

There are four registers A, B, C, D and one input port Xin. The source is identified by the instruction bits I[3:1]. The instruction bits I[2:1] identify the source register A/B/C/D. The instruction bit I[3] indicates whether the source is the input port Xin or one of the registers.

An 8-bit, 4-to-1 multiplexer with I[2:1] as select lines picks the register. Another 8-bit, 2-to-1 multiplexer with I[3] as select line then chooses between the input port Xin and the already selected register. For example, if I[3] is ‘1’ then, irrespective of the bits I[2:1], the source is the input port Xin; if I[3] is ‘0’ the source is selected according to the value of the bits I[2:1].
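The two-stage selection described above can be sketched as a small behavioural model. This is Python used only for illustration, not the Verilog of the design; the function name and the register order A, B, C, D for the codes 00 to 11 are assumptions. The instruction word is held in an integer whose bit 0 corresponds to I[1], since the report numbers instruction bits from 1.

```python
def select_source(instr, regs, xin):
    """Return the 8-bit source operand for an 11-bit instruction word.

    instr: integer whose bit 0 holds I[1] (the report numbers bits from 1).
    regs:  list [A, B, C, D]; this index-to-register order is an assumption.
    xin:   value on the input port Xin.
    """
    reg_select = instr & 0b11        # I[2:1]: select lines of the 4-to-1 mux
    use_xin = (instr >> 2) & 1       # I[3]: select line of the 2-to-1 mux
    reg_value = regs[reg_select]     # output of the 8-bit 4-to-1 multiplexer
    return (xin if use_xin else reg_value) & 0xFF
```

With I[3] set, the register bits are ignored, which matches the "irrespective of the bits I[2:1]" remark in the text.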

3.3.5. MEMORY ACCESS OPERATIONS

There are two memory access operations, load and store. Both use the same bi-directional data bus to read and write data, so the direction of data flow is controlled with the help of two 8-bit tristate buffers. The control lines wr_data and rd_data are generated inside the control unit. The write/store operation is synchronous and the read/load operation is asynchronous. The address for the data memory is given by the concatenation of the registers C and D.
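A minimal behavioural sketch of this memory interface follows, in Python for illustration only. The class and function names are hypothetical, and treating C as the high byte of the address is an assumption, since the text states only that C and D are concatenated. A disabled read buffer (high impedance) is modelled as `None`.

```python
MEMORY_SIZE = 1 << 16  # 16 address lines give 65536 byte-wide locations


def data_address(c, d):
    """Concatenate registers C and D into a 16-bit address.

    Assumption: C supplies the high byte and D the low byte.
    """
    return ((c & 0xFF) << 8) | (d & 0xFF)


class DataMemory:
    def __init__(self):
        self.cells = [0] * MEMORY_SIZE

    def rising_edge(self, wr_data, address, bus_value):
        """Synchronous write: store the bus value on a clock edge if wr_data is high."""
        if wr_data:
            self.cells[address] = bus_value & 0xFF

    def read(self, rd_data, address):
        """Asynchronous read: return the stored byte, or None when the buffer is off."""
        return self.cells[address] if rd_data else None
```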


3.3.6. DATA BUS

The contents of the source register/port are processed by three parallel modules, i.e. the Move Unit, the Arithmetic Unit and the Shift Unit. The data to be placed on the data bus is selected by an 8-bit, 4-to-1 multiplexer. Three of its inputs are the outputs of the three units mentioned above, and the fourth input is the 8-bit line from the data memory (for the LOAD instruction). The select lines for this multiplexer are generated by the control unit.

3.3.7. DESTINATION DECODER

The Data bus is the common input to all the registers. The data from the

data bus is stored on a particular destination register by enabling the ‘load’

signal of that particular register. The load signals are generated using a 2X4

decoder. The four outputs represent the load signals of the four registers.

The 2-bit input to the decoder comes from the destination bits of the

instruction.

The destination is represented in the bits I[6:4] of the instruction. Only two

(least significant, I[5:4]) of these bits are required to select one of the four

registers, the third bit is used to select Xout as the destination.

The instructions ADD and SUB have the destination same as the source. So for these two instructions the bits used as input to the destination decoder are I[2:1]. A 2-bit, 2-to-1 multiplexer is used for this purpose. The inputs to this MUX are I[5:4] and I[2:1]. The select line is generated inside the control unit.

One more signal ‘En_dec’ is used, which serves as the enable for the decoder. This signal is also generated inside the control unit. If the control signal for Xout goes high, the destination decoder is also disabled.
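The destination decoder can be sketched behaviourally as follows (Python, illustration only). The function name and the register order A, B, C, D for the codes 00 to 11 are assumptions. Following the control unit description in section 3.10, the decoder is taken to be enabled only when both `en_dec` and the Xout control signal are low (a NOR), and `s2` is the select line that chooses the source bits for ADD and SUB.

```python
def destination_loads(instr, s2, en_dec, xout_ctrl):
    """Return the load signals [ld_A, ld_B, ld_C, ld_D] for one instruction.

    instr:     11-bit instruction word, bit 0 holding I[1].
    s2:        1 selects I[2:1] (ADD/SUB), 0 selects I[5:4].
    en_dec:    high when the instruction has no register destination.
    xout_ctrl: high when Xout is the destination.
    """
    dst_bits = (instr >> 3) & 0b11            # I[5:4]
    src_bits = instr & 0b11                   # I[2:1]
    select = src_bits if s2 else dst_bits     # 2-bit 2-to-1 multiplexer
    enable = not (en_dec or xout_ctrl)        # NOR of the two disabling signals
    return [int(enable and select == i) for i in range(4)]  # 2-to-4 decoder
```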


3.3.8. OUTPUT PORT Xout

There is a latency of one clock between the loading of the instruction and

the storing of result when the destination is selected to be one of the

registers, because the registers are loaded with the result only on positive

edge of the clock. But when the destination is selected to be Xout port, then

there is no latency. So to make the operations symmetric I have included one

more 8-bit register X. The output of this register is connected to the port

Xout. So the value of Xout also changes only on the rising edge of the clock.

A 1-bit register is also introduced in the design to store the value of the control signal for the tristate buffer for Xout.

As for the destination decoder, the control signal for the Xout tristate buffer is generated using a 1-bit, 2-to-1 multiplexer. The inputs to this MUX are I[6] and I[3]. The select line is generated inside the control unit. Another signal ‘Xout_buf’ is ANDed with the output of the MUX. The result is stored in a 1-bit register ‘X_buf’, the output of which is connected to the control line of the tristate buffer for Xout. The signal ‘Xout_buf’ is generated inside the control unit.

3.4. MOVE UNIT

The move unit performs two instructions:

1. MOV dst, src

2. MVI immediate data

3.4.1. ARCHITECTURE

[Figure: Move unit architecture. An 8-bit 2-to-1 multiplexer with select line I[10] chooses between src and the instruction bits I[9:2]]

Instruction : Instruction Code

1. MVI immediate data : 0_1_<8-bit immediate data>_X

2. MOV dst, src : 0_0_001_<3-bit Destination>_<3-bit source>

So depending upon the instruction bit I[10], the multiplexer selects either the instruction bits I[9:2] (i.e. the immediate data) or the source.
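As a rough illustration, the move unit reduces to a single multiplexer. The sketch below is Python rather than the Verilog of the design, and the function name is hypothetical; the instruction word is an integer whose bit 0 corresponds to I[1].

```python
def move_unit(instr, src):
    """Return the move unit result for an 11-bit instruction word."""
    is_mvi = (instr >> 9) & 1          # I[10]: 1 for MVI, 0 for MOV
    immediate = (instr >> 1) & 0xFF    # I[9:2]: 8-bit immediate data
    return immediate if is_mvi else src & 0xFF
```

For the MVI A7 example of section 3.1, the word 01_1010_0111_0 yields 0xA7 whatever the source is, while a MOV word simply passes the source through.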

3.5. SHIFT UNIT

The shift unit performs two instructions:

1. SL dst, src

2. SR dst, src

3.5.1. ARCHITECTURE

[Figure: Shift unit architecture. An 8-bit 2-to-1 multiplexer with select line I[7] chooses between {src[6:0], 0} and {0, src[7:1]} to produce Result_su]

Instruction : Instruction Code

1. SL dst, src : 00_10_0_<3-bit Destination>_<3-bit source>

2. SR dst, src : 00_10_1_<3-bit Destination>_<3-bit source>

So depending upon the instruction bit I[7], the multiplexer selects either the source shifted left by 1 bit (I[7] = 0) or the source shifted right by 1 bit (I[7] = 1).
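A behavioural sketch of the shift unit, in Python for illustration (the function name is hypothetical). Both shifts fill the vacated bit with 0, as in the concatenations {src[6:0], 0} and {0, src[7:1]}.

```python
def shift_unit(instr, src):
    """Return the shift unit result for an 11-bit instruction word."""
    shift_right = (instr >> 6) & 1     # I[7]: 0 for SL, 1 for SR
    src &= 0xFF
    if shift_right:
        return src >> 1                # {0, src[7:1]}
    return (src << 1) & 0xFF           # {src[6:0], 0}
```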

3.6. ARITHMETIC UNIT

The arithmetic unit performs five instructions:

1. INC dst, src

2. DEC dst, src

3. ADD src

4. SUB src

5. CMP src

3.6.1. BLOCK DIAGRAM

[Figure: Arithmetic unit block diagram. Inputs: src (8), A (8), Cin, Sub, I[8], q_S, q_C. Outputs: Result_au (8), S, C, Z]

3.6.2. PORTS DESCRIPTION

1. src
Length : 8 Bit
Type : Input
Use : This port provides the data from the source register/port.

2. A
Length : 8 Bit
Type : Input
Use : This port always provides the contents of register A for the SUB, ADD and CMP instructions.

3. Cin
Length : 1 Bit
Type : Input
Use : This port provides the carry-in signal to the adder inside the arithmetic unit. This signal is generated inside the control unit.

4. Sub
Length : 1 Bit
Type : Input
Use : This signal is generated inside the control unit. If Sub goes high, the 2nd input to the adder is complemented; together with Cin this gives its 2’s complement form.

5. I[8]
Length : 1 Bit
Type : Input
Use : This is the 8th bit of the instruction. This line is used to select the 2nd input to the adder inside the unit.

6. q_C
Length : 1 Bit
Type : Input
Use : This signal is the enable signal for the Carry flag.

7. q_S
Length : 1 Bit
Type : Input
Use : This signal is the enable signal for the Sign flag.

8. Result_au
Length : 8 Bit
Type : Output
Use : This port gives the result of the arithmetic unit.

9. Z
Length : 1 Bit
Type : Output
Use : This signal is given to the Zero flag inside the top entity.

10. C
Length : 1 Bit
Type : Output
Use : This signal is given to the Carry flag inside the top entity.

11. S
Length : 1 Bit
Type : Output
Use : This signal is given to the Sign flag inside the top entity.

3.6.3. ARCHITECTURE

The basic block inside the arithmetic unit is an 8-bit ripple carry adder.


One input to the adder is fixed, i.e. the 8-bit source. The second input to the

adder depends upon the instruction to execute. The subtraction operations

are also performed using the same adder by performing the 2’s complement

operation of the input to be subtracted by using 8 XOR gates.

One input to the arithmetic unit comes from the Source register/port

and the second input is fixed to register A

Sign, Carry and Zero flags are the part of the top level entity, but their

values are generated inside the arithmetic unit only.

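| Inst. No. | Inst. | Inst. Code          | I/P1 | I/P2 | Cin | Sub | Operation |
|-----------|-------|---------------------|------|------|-----|-----|-----------|
| q5        | INC   | 000_1_0_<dst><src>  | Src  | 0    | 1   | 0   | Src + 1   |
| q6        | DEC   | 000_1_1_<dst><src>  | Src  | 0    | 0   | 1   | Src - 1   |
| q10       | ADD   | 000_0_00_10_<src>   | Src  | A    | 0   | 0   | Src + A   |
| q11       | SUB   | 000_0_00_11_<src>   | Src  | A    | 1   | 1   | Src - A   |
| q9        | CMP   | 000_0_00_0X_<src>   | Src  | A    | 1   | 1   | Src - A   |

[Figure: 8-bit adder with inputs I/P1 (Src) and I/P2 (0 or A, selected by I[8]), control inputs Cin and Sub, and outputs Result_au, Z and Cout]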

Depending upon the value of instruction bit I[8], the input 2 will be either 0 or register A.

Instruction nos. given here are generated by the instruction decoder discussed later.

Thus by controlling the values Cin, Sub and I/P2, different operations can be performed by the same unit.

o If Sub is ‘1’ and Cin is ‘0’ then the 2nd input is converted to its 1’s complement form.

o If Sub is ‘1’ and Cin is ‘1’ then the 2nd input is converted to its 2’s complement form, i.e. to its negative value.

3.6.4. FUNCTIONALITY

1. INC: The 2nd input to the adder is 0 and Cin is high, so the result comes out to be source + 1.

2. DEC: The 2nd input is zero, Sub is high and Cin is low, so the result is source + 1’s complement of 0, i.e. 1111_1111, which is also the 2’s complement of 1. So the result comes out to be source – 1.

3. ADD: Cin and Sub are both low, so the 2nd input, i.e. A, is passed as it is. The result comes out to be source + contents of register A.

4. SUB: Cin and Sub are both high, so the 2nd input, i.e. A, is converted to its 2’s complement form, i.e. its negative value. The result comes out to be source - contents of register A.

5. CMP: Its functionality is exactly the same as SUB, the only difference being that the result in this case is not stored in any register.

3.6.5. FLAGS

The flags are part of the top level entity, but the values to be loaded in them are generated inside the arithmetic unit.

1. Carry: This is high only if there is a carry out and the instruction being executed is ADD or INC.

2. Sign: This is high only if carry out is low and the instruction being executed is SUB, CMP or DEC.

3. Zero: This is high if the result of the arithmetic unit is 0.

The signals q_S and q_C controlling the Sign and Carry flags are generated inside the Control unit.
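The functionality and flag rules above can be checked with a small behavioural model of the adder. This is a Python sketch for illustration, not the Verilog of the design; the function name and the order of the returned tuple are assumptions. The control inputs are passed in exactly as the table in section 3.6.3 lists them.

```python
def arithmetic_unit(src, a, cin, sub, i8, q_c, q_s):
    """Return (result, carry, sign, zero) for one arithmetic operation.

    i8 is instruction bit I[8]: 1 selects 0 as the 2nd adder input
    (INC, DEC), 0 selects register A (ADD, SUB, CMP).
    """
    ip2 = 0 if i8 else (a & 0xFF)
    if sub:
        ip2 ^= 0xFF                        # eight XOR gates: 1's complement
    total = (src & 0xFF) + ip2 + cin       # 8-bit adder with carry-in
    result = total & 0xFF
    cout = (total >> 8) & 1
    carry = int(q_c and cout)              # Carry: carry out on ADD or INC
    sign = int(q_s and not cout)           # Sign: no carry out on SUB, CMP or DEC
    zero = int(result == 0)
    return result, carry, sign, zero
```

For example, SUB with source 5 and A = 7 gives 0xFE with the Sign flag set, and DEC of 0 gives 0xFF with the Sign flag set, which is consistent with the flag definitions above.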

3.7. PROGRAM COUNTER

This unit performs three instructions:

1. JMP immediate offset

2. JZ immediate offset

3. JMPCD

3.7.1. ARCHITECTURE

If the instruction is JMPCD, i.e. q14 is high, the program counter is loaded with the value stored in registers C & D.

If q14 is low then there can be three cases:

1. The instruction is JMP.

2. The instruction is JZ and the Zero flag is set. In both these cases the program counter is loaded with a new value equal to the old value plus the 8-bit immediate offset specified in the instruction bits I[9:2].

3. If none of the above conditions is met, the program counter is simply incremented by 1.

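[Figure: Program counter architecture, showing the 16-bit adder, the program counter register (clk, rst), the S4-controlled selection between the constant 0000_0001 and the offset I[9:2], the q14-controlled loading of CD, and the 16-bit address lines for the program memory]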

Signal S4 is generated inside the control unit
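The next-address logic can be sketched as follows (Python, illustration only; the function name is hypothetical). Three points are assumptions because the text does not state them: C is taken as the high byte of the CD value, the 8-bit offset is zero-extended rather than sign-extended, and the counter wraps around at 16 bits.

```python
def next_pc(pc, instr, q14, s4, c, d):
    """Return the value loaded into the program counter on the next clock edge."""
    if q14:                                    # JMPCD: load the CD value
        return ((c & 0xFF) << 8) | (d & 0xFF)
    offset = (instr >> 1) & 0xFF if s4 else 1  # I[9:2] for a taken jump, else 1
    return (pc + offset) & 0xFFFF              # 16-bit adder
```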

3.8. INSTRUCTION REGISTER

The instruction register is an 11-bit edge-triggered register. It loads the instruction on the positive edge of the clock. The instruction is fed to the instruction register from the program memory. The address for the program memory is given by the value of the program counter.

3.9. INSTRUCTION DECODER

This unit is used to identify the instruction being executed. The input to this unit is the op-code part of the instruction, which comes from the instruction register. The output of this unit is a 14-bit port where each bit represents one of the 14 instructions. All the instructions have different operation codes, so at any time only one of the 14 output bits will be high.

1. MVI : 01_< 8-Bit Immediate Data>_X

q[1] = I[11]’ I[10]

2. JMP : 10_< 8-Bit Immediate Offset>_X

q[2] = I[11] I[10]’

3. JZ : 11_< 8-Bit Immediate Offset>_X

q[3] = I[11] I[10]

4. MOV : 00_001_< 3-Bit Destination>_< 3_Bit Source>

q[4] = I[11]’ I[10]’ I[9]’ I[8]’ I[7]

5. INC : 00_010_< 3-Bit Destination>_< 3_Bit Source>

q[5] = I[11]’ I[10]’ I[9]’ I[8] I[7]’

6. DEC : 00_011_< 3-Bit Destination>_< 3_Bit Source>

q[6] = I[11]’ I[10]’ I[9]’ I[8] I[7]

7. SL : 00_100_< 3-Bit Destination>_< 3_Bit Source>

q[7] = I[11]’ I[10]’ I[9] I[8]’ I[7]’

8. SR : 00_101_< 3-Bit Destination>_< 3_Bit Source>

q[8] = I[11]’ I[10]’ I[9] I[8]’ I[7]


9. CMP : 00_000_00X_< 3_Bit Source>

q[9] = I[11]’ I[10]’ I[9]’ I[8]’ I[7]’ I[6]’ I[5]’

10. ADD : 00_000_010_< 3_Bit Source>

q[10] = I[11]’ I[10]’ I[9]’ I[8]’ I[7]’ I[6]’ I[5] I[4]’

11. SUB : 00_000_011_< 3_Bit Source>

q[11] = I[11]’ I[10]’ I[9]’ I[8]’ I[7]’ I[6]’ I[5] I[4]

12. LOAD : 00_110_< 3_Bit Destination>_XXX

q[12] = I[11]’ I[10]’ I[9] I[8] I[7]’

13. STORE : 00_111_XXX_< 3_Bit Source>

q[13] = I[11]’ I[10]’ I[9] I[8] I[7]

14. JMPCD : 00_000_1XX_XXX

q[14] = I[11]’ I[10]’ I[9]’ I[8]’ I[7]’ I[6]
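The fourteen decode equations can be mirrored in a short behavioural sketch. This is Python for illustration only; the function name is hypothetical, and the instruction word is an integer whose bit 0 corresponds to I[1]. It returns the mnemonic rather than a one-hot vector, and JZ is taken to be the 11 prefix, as its equation q[3] = I[11] I[10] requires.

```python
def decode(instr):
    """Return the mnemonic selected by the op-code bits of an 11-bit word."""
    prefix = (instr >> 9) & 0b11               # I[11:10]
    if prefix != 0b00:
        return {0b01: "MVI", 0b10: "JMP", 0b11: "JZ"}[prefix]
    opcode = (instr >> 6) & 0b111              # I[9:7]
    if opcode != 0b000:
        return {0b001: "MOV", 0b010: "INC", 0b011: "DEC", 0b100: "SL",
                0b101: "SR", 0b110: "LOAD", 0b111: "STORE"}[opcode]
    if (instr >> 5) & 1:                       # I[6]
        return "JMPCD"
    if ((instr >> 4) & 1) == 0:                # I[5]
        return "CMP"
    return "SUB" if (instr >> 3) & 1 else "ADD"  # I[4]
```

Because every branch returns exactly one mnemonic, the sketch also reflects the property that only one of the 14 decoder outputs is high at a time.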

3.10. CONTROL UNIT

The control unit generates the control signals required by the different modules and by the top level entity. The inputs to the control unit are the decoded instructions from the instruction decoder and the values of the flags. The outputs are the control signals listed below.

Signals to arithmetic unit

1. q_C: This is the enabling signal for the carry flag. It is high only if the instruction being executed is ADD(q10) or INC(q5).

q_C = q[5] + q[10]

2. q_S: This is the enabling signal for the sign flag. It is high only if the instruction being executed is SUB(q11) or DEC(q6) or CMP(q9).

q_S = q[6] + q[9] + q[11]

3. Sub: As shown in the table in the arithmetic unit, this signal is high in the case of DEC, CMP and SUB.

Sub = q[6] + q[9] + q[11]

4. Cin: As shown in the table in the arithmetic unit, this signal is high in the case of INC, CMP and SUB.

Cin = q[5] + q[9] + q[11]

Signals to Program Counter

1. S4: This signal selects the immediate offset to be added to the contents of the program counter. It is high if “the instruction being executed is JMP” or if “the instruction being executed is JZ and the Zero flag is set at the same time”.

S4 = q[2] + q[3].Z

Signals to Data Memory

1. wr_data: This signal goes high if the instruction being executed is

STORE.

wr_data = q[13]

2. rd_data: This signal goes high if the instruction being executed is LOAD.

rd_data = q[12]

Signals to Top level Entity

1. ld_flags: This is the load signal for the flags. This signal is high if the instruction being executed is an arithmetic instruction.

ld_flags = q[5] + q[6] + q[9] + q[10] + q[11]

2. S2: This signal selects either the destination or the source bits as the input to the destination decoder. This signal is high only if the instruction being executed is ADD or SUB, which have the destination same as the source.

S2 = q[10] + q[11]

3. Xout_buf: This signal is ANDed with the destination bit to generate the

control signal for the Xout tristate buffer. This signal is high only if the

instruction being executed involves any destination.

Xout_buf = q[4] + q[5] + q[6] + q[7] + q[8] + q[10] + q[11] + q[12]

4. En_dec: This signal is NORed with the control signal of Xout tristate

buffer to generate the enable signal for the Destination Decoder. This

signal is high only if the instruction being executed doesn’t involve any

destination. So if either the control signal for Xout goes high or this

En_dec signal goes high, it will disable the destination decoder.

En_dec = q[1] + q[2] + q[3] + q[9] + q[13] + q[14]

5. S1, S0: These are the select lines for the multiplexer which selects the unit whose result is placed on the data bus.

Their value is 00 for Move Unit

01 for Arithmetic Unit

10 for Shift Unit

11 for LOAD Instruction

So these signals are generated by a 4-to-2 encoder. The input to the encoder is E[3:0], where:

E[0] = q[1] + q[4]

E[1] = q[5] + q[6] + q[9] + q[10] + q[11]

E[2] = q[7] + q[8]

E[3] = q[12]
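The control equations of this section can be collected into one behavioural sketch (Python, illustration only; the function name and the dictionary layout are hypothetical). The instruction is identified by its number n, so that q[n] is the single decoder output that is high. The encoder output for instructions that drive none of E[3:0] is not specified in the text, so returning 0 in that case is an assumption.

```python
def control_signals(n, zero_flag):
    """Return the main control signals for instruction number n (1..14)."""
    q = [False] + [n == k for k in range(1, 15)]   # q[1]..q[14], one-hot
    e = [q[1] or q[4],                             # E[0]: Move Unit
         q[5] or q[6] or q[9] or q[10] or q[11],   # E[1]: Arithmetic Unit
         q[7] or q[8],                             # E[2]: Shift Unit
         q[12]]                                    # E[3]: LOAD
    s1s0 = e.index(True) if any(e) else 0          # 4-to-2 encoder, 0 assumed if idle
    return {
        "q_C": q[5] or q[10],
        "q_S": q[6] or q[9] or q[11],
        "Sub": q[6] or q[9] or q[11],
        "Cin": q[5] or q[9] or q[11],
        "S4": q[2] or (q[3] and bool(zero_flag)),
        "wr_data": q[13],
        "rd_data": q[12],
        "ld_flags": q[5] or q[6] or q[9] or q[10] or q[11],
        "S2": q[10] or q[11],
        "S1S0": s1s0,
    }
```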

3.11. DATA MEMORY

The data memory is a block RAM of size 64 Kbytes (65536 locations of 8 bits each). The data memory has a synchronous write and an asynchronous read. Its address lines come from the concatenation of the contents of the registers C & D. The data line for the memory is bidirectional. Write and Read operations are controlled by the wr_data and rd_data signals generated by the control unit.

3.12. PROGRAM MEMORY

The program memory is a block RAM with 65536 locations and 11 bits per

location. This stores the instructions to be executed by the processor. Read

operation is asynchronous. The address line for the program memory comes

from the 16-bit program counter.

4. DESIGN IMPLEMENTATION

This chapter details the complete design flow for the FPGA implementation

of the design. The target device is SPARTAN 3.


Fig 4.1 FPGA Design Flow

4.1. HDL ENTRY

The first step in the implementation of the design is creating the HDL code based on the design criteria. The following recommendations were followed to create an effective design.

Using RTL Code

Usage of register transfer level (RTL) code and avoiding (when possible)

instantiating specific components creates designs with the following

characteristics.

Readable code

Faster and simpler simulation

Portable code for migration to different device families


Reusable code for future designs

In my design, Verilog is the HDL used to make the design entry.

4.2. FUNCTIONAL SIMULATION

Functional or RTL simulation is used to verify the syntax and functionality of

the design. The following recommendations were used for simulating the

design.

Typically with larger hierarchical HDL designs, one should perform

separate simulations on each module before testing the entire design.

This makes it easier to debug your code.

Once each module functions as expected, a test bench is created to

verify that entire design functions as planned. The same test bench is

used again for the final timing simulation to confirm that the design

functions as expected under worst-case delay conditions.

My design’s functionality was tested successfully

4.3. SYNTHESIS

After creating the HDL design, it must be synthesized. During synthesis, behavioral information in the HDL file is translated into a structural netlist, and the design is optimized for a Xilinx device. Xilinx offers its own synthesis tool, Xilinx Synthesis Technology (XST), which synthesizes HDL designs to create Xilinx specific netlist files called NGC files. The NGC file is a netlist that contains both logical design data and constraints, and it takes the place of both EDIF and NCF files.

4.3.1. SYNTHESIS CONSTRAINTS

Constraints are essential to help you meet your design goals or obtain the

best implementation of your design. Constraints are available in XST to

control various aspects of the synthesis process itself, as well as placement

and routing. Synthesis algorithms have been tuned to automatically provide

optimal results in most situations. In some cases, however, synthesis may fail

to initially achieve optimal results; some of the available constraints allow

you to explore different synthesis alternatives to meet your specific needs.

Following is a list of some HDL Options that can be set within the HDL

Options tab of the Process Properties dialog box for FPGA devices:

FSM Encoding Algorithm

Case Implementation Style

FSM Style

RAM Extraction

RAM Style

Mux Style

Decoder Extraction

Priority Encoder Extraction

Shift Register Extraction

Logical Shifter Extraction

4.3.2. SYNTHESIS REPORT

While synthesizing the design, Xilinx XST also creates a synthesis report containing many details such as Device utilization, Macro Statistics, Timing etc. The following shows some parts of the synthesis report generated for the top level entity of my design.

HDL Synthesis Report
====================

Macro Statistics
----------------
# Adders/Subtractors : 2
 16-bit adder carry out : 1
 8-bit adder carry in/out : 1
# Registers : 9
 1-bit register : 2
 11-bit register : 1
 16-bit register : 1
 8-bit register : 5
# Multiplexers : 3
 1-bit 4-to-1 multiplexer : 2
 8-bit 4-to-1 multiplexer : 1
# Tristates : 3
 8-bit tristate buffer : 3
# Xors : 1
 8-bit xor2 : 1

Device utilization summary:
---------------------------

Selected Device : 3s200pq208-5

 Number of Slices: 82 out of 1920 4%
 Number of Slice Flip Flops: 77 out of 3840 2%
 Number of 4 input LUTs: 146 out of 3840 3%
 Number of bonded IOBs: 71 out of 141 50%
 Number of GCLKs: 1 out of 8 12%

TIMING REPORT
-------------

 Minimum period: 10.599ns (Maximum Frequency: 94.347MHz)
 Minimum input arrival time before clock: 7.845ns
 Maximum output required time after clock: 10.277ns
 Maximum combinational path delay: 7.862ns
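The maximum frequency in the timing report is the reciprocal of the minimum period. A quick arithmetic check is sketched below in Python; the function name is only illustrative.

```python
def max_frequency_mhz(min_period_ns):
    """Convert a minimum clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / min_period_ns
```

For the reported period of 10.599 ns this gives roughly 94.35 MHz, in line with the 94.347 MHz in the report; the small difference in the last digits presumably comes from the tool working with an unrounded period.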

4.4. TRANSLATE

4.4.1. NGD Build Overview

NGD Build reads in a netlist file in EDIF or NGC format and creates a NGD file

that contains a logical description of the design in terms of logic elements,

such as AND gates, OR gates, decoders, flip-flops, and RAMs.

The NGD file contains both a logical description of the design reduced to


Xilinx Native Generic Database (NGD) primitives and a description of the

original hierarchy expressed in the input netlist. The output NGD file can be

mapped to the desired device family.

4.4.2. Conversion of Netlist to NGD File

NGD Build performs the following steps to convert a netlist to an NGD file:

1. Reads the source netlist. NGD Build invokes the Netlist Launcher. The

Netlist Launcher determines the input netlist type and starts the appropriate

netlist reader program. The netlist reader incorporates NCF files associated

with each netlist. NCF files contain timing and layout constraints for each

module.

2. Reduces all components in the design to NGD primitives. NGD Build

merges components that reference other files. NGD Build also finds the

appropriate system library components, physical macros (NMC files), and

behavioral models.

3. Checks the design by running a Logical Design Rule Check (DRC) on the converted design. Logical DRC is a series of tests on a logical design.

4. Writes an NGD file as output

4.5. MAP

The MAP program maps a logical design to a Xilinx FPGA. The input to MAP

is an NGD file, which is generated using the NGD Build program. The NGD file

contains a logical description of the design that includes both the hierarchical

components used to develop the design and the lower level Xilinx primitives.

The NGD file also contains any number of NMC (macro library) files, each of

which contains the definition of a physical macro. MAP first performs a logical


DRC (Design Rule Check) on the design in the NGD file. MAP then maps the

design logic to the components (logic cells, I/O cells, and other components)

in the target Xilinx FPGA. The output from MAP is an NCD (Native Circuit

Description) file—a physical representation of the design mapped to the

components in the targeted Xilinx FPGA. The mapped NCD file can then be

placed and routed using the PAR program.

4.5.1. MAP Input Files

MAP uses the following files as input:

• NGD file—Native Generic Database file. This file contains a logical

description of the design expressed both in terms of the hierarchy used when

the design was first created and in terms of lower-level Xilinx primitives to

which the hierarchy resolves. The file also contains all of the constraints

applied to the design during design entry or entered in a UCF (User

Constraints File). The NGD file is created by the NGD Build program.

• NMC file—Macro library file. An NMC file contains the definition of a physical

macro. When there are macro instances in the NGD design file, NMC files are

used to define the macro instances. There is one NMC file for each type of

macro in the design file.

• Guide NCD file—An optional input file generated from a previous MAP run.

An NCD file contains a physical description of the design in terms of the

components in the target Xilinx device. A guide NCD file is an output NCD file

from a previous MAP run that is used as an input to guide a later MAP run.

• Guide NGM file—A binary design file containing all of the data in the input

NGD file as well as information on the physical design produced by the

mapping.


4.5.2. MAP Output Files

Output from MAP consists of the following files:

• NCD (Native Circuit Description) file—a physical description of the design in

terms of the components in the target Xilinx device.

• PCF (Physical Constraints File)—an ASCII text file that contains constraints

specified during design entry expressed in terms of physical elements. The

physical constraints in the PCF are expressed in Xilinx’s constraint language.

MAP creates a PCF file if one does not exist or rewrites an existing file.

• NGM file—a binary design file that contains all of the data in the input NGD

file as well as information on the physical design produced by mapping. The

NGM file is used to correlate the back-annotated design netlist to the

structure and naming of the source design.

• MRP (MAP report)—a file that contains information about the MAP run. The

MRP file lists any errors and warnings found in the design, lists design

attributes specified, and details on how the design was mapped (for example,

the logic that was removed or added and how signals and symbols in the

logical design were mapped into signals and components in the physical

design). The file also supplies statistics about component usage in the

mapped design.

4.5.3. MAP REPORT

The MAP report is generated in the following format

______________________________

Table of Contents

----------------------------------------------


Section 1 - Errors

Section 2 - Warnings

Section 3 - Informational

Section 4 - Removed Logic Summary

Section 5 - Removed Logic

Section 6 - IOB Properties

Section 7 - RPMs

Section 8 - Guide Report

Section 9 - Area Group Summary

Section 10 - Modular Design Summary

Section 11 - Timing Report

Section 12 - Configuration String Information

Section 13 - Additional Device Resource Counts

____________________________________

4.5.4. POST MAP TIMING REPORT

The timing report generated after the MAP process contains all the component delays, but it does not account for the interconnect delays. So the delays for components of the same type come out to be exactly the same.

The Post MAP Timing Report for my Design is:

Data Sheet report:
-----------------
All values displayed in nanoseconds (ns)

Setup/Hold to clock clk
+----------------+------------+------------+
| Clock          |  Setup to  |  Hold to   |
| Source         | clk (edge) | clk (edge) |
+----------------+------------+------------+
| data_in_out<0> |   1.356(R) |   0.134(R) |
| data_in_out<1> |   1.305(R) |   0.134(R) |
| data_in_out<2> |   1.356(R) |   0.134(R) |
| data_in_out<3> |   1.305(R) |   0.134(R) |
| data_in_out<4> |   1.356(R) |   0.134(R) |
| data_in_out<5> |   1.305(R) |   0.134(R) |
| data_in_out<6> |   1.356(R) |   0.134(R) |
| data_in_out<7> |   1.305(R) |   0.134(R) |
| ir_in<10>      |   3.202(R) |   0.643(R) |
| ir_in<11>      |   3.202(R) |  -1.117(R) |
| ir_in<1>       |   3.202(R) |  -1.117(R) |
| ir_in<2>       |   3.202(R) |  -1.117(R) |
| ir_in<3>       |   3.202(R) |  -1.117(R) |
| ir_in<4>       |   3.202(R) |  -1.117(R) |
| ir_in<5>       |   3.202(R) |  -1.117(R) |
| ir_in<6>       |   3.202(R) |  -1.117(R) |
| ir_in<7>       |   3.202(R) |   0.643(R) |
| ir_in<8>       |   3.202(R) |   0.643(R) |
| ir_in<9>       |   3.202(R) |  -1.117(R) |
| xin<0>         |   4.237(R) |  -0.832(R) |
| xin<1>         |   4.247(R) |  -0.349(R) |
| xin<2>         |   4.026(R) |  -0.832(R) |
| xin<3>         |   4.036(R) |  -0.832(R) |
| xin<4>         |   3.815(R) |  -0.832(R) |
| xin<5>         |   3.825(R) |  -0.832(R) |
| xin<6>         |   3.380(R) |  -0.349(R) |
| xin<7>         |   3.004(R) |  -0.832(R) |
+----------------+------------+------------+

Clock clk to Pad
+----------------+------------+
|                | clk (edge) |
| Destination    |   to PAD   |
+----------------+------------+
| addr_data<0>   |   6.407(R) |
| addr_data<10>  |   6.407(R) |
| addr_data<11>  |   6.407(R) |
| addr_data<12>  |   6.407(R) |
| addr_data<13>  |   6.407(R) |
| addr_data<14>  |   6.407(R) |
| addr_data<15>  |   6.407(R) |
| addr_data<1>   |   6.407(R) |
| addr_data<2>   |   6.407(R) |
| addr_data<3>   |   6.407(R) |
| addr_data<4>   |   6.407(R) |
| addr_data<5>   |   6.407(R) |
| addr_data<6>   |   6.407(R) |
| addr_data<7>   |   6.407(R) |
| addr_data<8>   |   6.407(R) |
| addr_data<9>   |   6.407(R) |
| addr_pc<0>     |   6.407(R) |
| addr_pc<10>    |   6.407(R) |
| addr_pc<11>    |   6.407(R) |
| addr_pc<12>    |   6.407(R) |
| addr_pc<13>    |   6.407(R) |
| addr_pc<14>    |   6.407(R) |
| addr_pc<15>    |   6.407(R) |
| addr_pc<1>     |   6.407(R) |
| addr_pc<2>     |   6.407(R) |
| addr_pc<3>     |   6.407(R) |
| addr_pc<4>     |   6.407(R) |
| addr_pc<5>     |   6.407(R) |
| addr_pc<6>     |   6.407(R) |
| addr_pc<7>     |   6.407(R) |
| addr_pc<8>     |   6.407(R) |
| addr_pc<9>     |   6.407(R) |
| data_in_out<0> |   7.565(R) |
| data_in_out<1> |   7.565(R) |
| data_in_out<2> |   7.565(R) |
| data_in_out<3> |   7.565(R) |
| data_in_out<4> |   7.565(R) |
| data_in_out<5> |   7.565(R) |
| data_in_out<6> |   7.565(R) |
| data_in_out<7> |   7.565(R) |
| rd_data        |   7.164(R) |
| wr_data        |   7.164(R) |
| xout<0>        |   6.618(R) |
| xout<1>        |   6.618(R) |
| xout<2>        |   6.618(R) |
| xout<3>        |   6.618(R) |
| xout<4>        |   6.618(R) |
| xout<5>        |   6.618(R) |
| xout<6>        |   6.618(R) |
| xout<7>        |   6.618(R) |
+----------------+------------+

Pad to Pad
+----------------+-----------------+---------+
| Source Pad     | Destination Pad |  Delay  |
+----------------+-----------------+---------+
| xin<0>         | data_in_out<0>  |   6.159 |
| xin<1>         | data_in_out<1>  |   6.159 |
| xin<2>         | data_in_out<2>  |   6.159 |
| xin<3>         | data_in_out<3>  |   6.159 |
| xin<4>         | data_in_out<4>  |   6.159 |
| xin<5>         | data_in_out<5>  |   6.159 |
| xin<6>         | data_in_out<6>  |   6.159 |
| xin<7>         | data_in_out<7>  |   6.159 |
+----------------+-----------------+---------+

Analysis completed Tue May 30 13:11:29 2006

4.6. PLACE AND ROUTE

4.6.1. OVERVIEW

After you create a Native Circuit Description (NCD) file with the MAP

program, you can place and route that design file using PAR. PAR accepts a

mapped NCD file as input, places and routes the design, and outputs an NCD

file to be used by the bit stream generator (BitGen). The NCD file output by

PAR can also be used as a guide file for additional runs of PAR that may be

done after making minor changes to your design.

PAR places and routes a design based on the following considerations:

• Timing-driven—The Xilinx timing analysis software enables PAR to place

and route a design based upon timing constraints.

• Non Timing-driven (cost-based)—Placement and routing are performed

using various cost tables that assign weighted values to relevant factors such

as constraints, length of connection, and available routing resources. Non

timing-driven placement and routing is used if no timing constraints are

present.

4.6.2. PLACING

The PAR placer executes multiple phases of the placer. PAR writes the NCD

after all the placer phases are complete. During placement, PAR places


components into sites based on factors such as constraints specified in the

PCF file, the length of connections, and the available routing resources.

4.6.3. ROUTING

After placing the design, PAR executes multiple phases of the router. The

router performs a converging procedure for a solution that routes the design

to completion and meets timing constraints. Once the design is fully routed,

PAR writes an NCD file, which can be analyzed against timing. PAR writes a

new NCD as the routing improves throughout the router phases.

Note: Timing-driven placement and timing-driven routing are automatically invoked if PAR finds timing constraints in the physical constraints file (PCF).

4.6.4. POST PAR TIMING REPORT

The timing report generated after the MAP process contains only the component delays, whereas the timing report generated after PAR contains both the component delays and the interconnect delays. The interconnect delays turn out to be comparable to the component delays. For example, the clock-to-pad delay of addr_data<5> rises from 6.407 ns after MAP to 8.380 ns after PAR. In addition, the delays for components of the same type are no longer identical, because their routing paths differ.

The post-PAR timing report for my design is:

Data Sheet report:
-----------------
All values displayed in nanoseconds (ns)

Setup/Hold to clock clk
---------------+------------+------------+
Clock          |  Setup to  |  Hold to   |
Source         | clk (edge) | clk (edge) |
---------------+------------+------------+
data_in_out<0> |    3.411(R)|    0.111(R)|
data_in_out<1> |    3.599(R)|   -0.001(R)|
data_in_out<2> |    3.271(R)|    0.056(R)|
data_in_out<3> |    3.497(R)|   -0.016(R)|
data_in_out<4> |    3.551(R)|    0.085(R)|
data_in_out<5> |    4.543(R)|   -0.583(R)|
data_in_out<6> |    3.925(R)|   -0.144(R)|
data_in_out<7> |    3.065(R)|   -0.077(R)|
ir_in<10>      |    2.622(R)|    0.534(R)|
ir_in<11>      |    2.623(R)|   -0.401(R)|
ir_in<1>       |    2.623(R)|   -0.400(R)|
ir_in<2>       |    2.622(R)|   -0.400(R)|
ir_in<3>       |    2.623(R)|   -0.400(R)|
ir_in<4>       |    2.623(R)|   -0.401(R)|
ir_in<5>       |    2.623(R)|   -0.400(R)|
ir_in<6>       |    2.622(R)|   -0.400(R)|
ir_in<7>       |    2.622(R)|    0.753(R)|
ir_in<8>       |    2.623(R)|    0.394(R)|
ir_in<9>       |    2.623(R)|   -0.401(R)|
xin<0>         |    8.194(R)|   -2.109(R)|
xin<1>         |    7.703(R)|   -1.613(R)|
xin<2>         |    7.778(R)|   -1.795(R)|
xin<3>         |    8.995(R)|   -1.671(R)|
xin<4>         |    7.548(R)|   -1.576(R)|
xin<5>         |    8.228(R)|   -1.709(R)|
xin<6>         |    7.885(R)|   -1.298(R)|
xin<7>         |    6.916(R)|   -2.378(R)|
---------------+------------+------------+

Clock clk to Pad
---------------+------------+
               | clk (edge) |
Destination    |   to PAD   |
---------------+------------+
addr_data<0>   |    9.442(R)
addr_data<10>  |    9.144(R)
addr_data<11>  |    9.149(R)
addr_data<12>  |    8.825(R)
addr_data<13>  |    8.607(R)
addr_data<14>  |    8.525(R)
addr_data<15>  |    8.784(R)
addr_data<1>   |    8.424(R)
addr_data<2>   |    8.638(R)
addr_data<3>   |    9.100(R)
addr_data<4>   |    8.361(R)
addr_data<5>   |    8.380(R)
addr_data<6>   |    8.640(R)
addr_data<7>   |    9.172(R)
addr_data<8>   |    8.907(R)
addr_data<9>   |    8.852(R)
addr_pc<0>     |    9.099(R)
addr_pc<10>    |    8.084(R)
addr_pc<11>    |    8.290(R)
addr_pc<12>    |    8.528(R)
addr_pc<13>    |    8.178(R)
addr_pc<14>    |    8.735(R)
addr_pc<15>    |    8.824(R)
addr_pc<1>     |    9.076(R)
addr_pc<2>     |    9.333(R)
addr_pc<3>     |    9.067(R)
addr_pc<4>     |   11.008(R)
addr_pc<5>     |    8.909(R)
addr_pc<6>     |    9.681(R)
addr_pc<7>     |    9.785(R)
addr_pc<8>     |    8.384(R)
addr_pc<9>     |    9.393(R)
data_in_out<0> |   12.164(R)
data_in_out<1> |   13.308(R)
data_in_out<2> |   12.626(R)
data_in_out<3> |   12.626(R)
data_in_out<4> |   12.632(R)
data_in_out<5> |   14.403(R)
data_in_out<6> |   12.408(R)
data_in_out<7> |   14.370(R)
rd_data        |   12.080(R)
wr_data        |   12.561(R)
xout<0>        |    9.245(R)
xout<1>        |    9.249(R)
xout<2>        |    9.616(R)
xout<3>        |    8.549(R)
xout<4>        |    9.300(R)
xout<5>        |    9.623(R)
xout<6>        |    9.578(R)
xout<7>        |    9.265(R)
---------------+------------+

Pad to Pad
---------------+----------------+---------+
Source Pad     |Destination Pad |  Delay  |
---------------+----------------+---------+
xin<0>         |data_in_out<0>  |    9.217
xin<1>         |data_in_out<1>  |    8.800
xin<2>         |data_in_out<2>  |    9.069
xin<3>         |data_in_out<3>  |    8.827
xin<4>         |data_in_out<4>  |    8.639
xin<5>         |data_in_out<5>  |    9.744
xin<6>         |data_in_out<6>  |    9.310
xin<7>         |data_in_out<7>  |   10.122
---------------+----------------+---------+


Analysis completed Tue May 30 13:15:43 2006

4.7 BITGEN OVERVIEW

BitGen, the Xilinx bit stream generation program, produces a bit stream for Xilinx device configuration. After the design is completely routed, the device must be configured so that it can execute the desired function. BitGen takes a fully routed NCD (Native Circuit Description) file as input and produces a configuration bit stream, which is a binary file with a .bit extension. The BIT file contains all of the configuration information from the NCD file that defines the internal logic and interconnections of the FPGA, plus device-specific information from other files associated with the target device. The binary data in the BIT file is then downloaded into the FPGA's memory cells, or it is used to create a PROM file.

The final bit file was downloaded into the FPGA device, and real-time verification was performed.

5. CONCLUSION

The design was successfully implemented on the target device and was verified by both functional and post-PAR simulation.

5.1. PERFORMANCE PARAMETERS

The design achieved the following performance parameters:

1. Throughput : 1 instruction/cycle

2. Initial Latency : 1 clock cycle

3. No. of Pipelining Stages : 2

4. Max. Operating Freq : 97 MHz

5.2. FUTURE IMPROVEMENTS

1. More instructions can be included in the design with the same instruction size by using the don't-care bits.

2. The number of pipelining stages can be increased from the current 2 to 4 or 5. The first pipelining stage is Read-Fetch-Execute and the second pipelining stage is Write. Dividing the first stage further into three stages would also improve the maximum operating frequency to a great extent.

APPENDIX A – RTL CODING

A.1. MOVE UNIT

/* ~~~~MOVE UNIT~~~~ */

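module move_unit(I_10, src, I_2_9, result);
    input        I_10;     // instruction bit I[10]: 1 for MVI, 0 for MOV
    input  [7:0] src;      // value of the source register (MOV)
    input  [7:0] I_2_9;    // immediate data, instruction bits I[9:2] (MVI)
    output [7:0] result;

    assign result = I_10 ? I_2_9 : src;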

endmodule


A.2. SHIFT UNIT

/* ~~~~SHIFT UNIT~~~~ */

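module shift_unit(src, I_7, result);
    input  [7:0] src;
    input        I_7;      // instruction bit I[7]: 1 for SR, 0 for SL
    output [7:0] result;

    // Logical shift by one position; the vacated bit is filled with 0.
    assign result = I_7 ? {1'b0, src[7:1]} : {src[6:0], 1'b0};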

endmodule
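
The behaviour of the shift unit can be checked with a small self-checking testbench. The following is a minimal sketch and not part of the original design: the module name shift_unit_tb and the test vector are assumptions chosen for illustration, and shift_unit is repeated from A.2 only so that the block compiles on its own.

```verilog
// Copy of shift_unit from A.2, repeated so this sketch is self-contained.
module shift_unit(src, I_7, result);
    input  [7:0] src;
    input        I_7;
    output [7:0] result;

    assign result = I_7 ? {1'b0, src[7:1]} : {src[6:0], 1'b0};
endmodule

// Hypothetical testbench (name and test vector are illustrative only).
module shift_unit_tb;
    reg  [7:0] src;
    reg        I_7;
    wire [7:0] result;

    shift_unit dut(.src(src), .I_7(I_7), .result(result));

    initial begin
        src = 8'b1001_0110;

        I_7 = 1'b0;                  // SL: shift left, LSB filled with 0
        #1;
        if (result !== 8'b0010_1100)
            $display("FAIL SL: result=%b", result);

        I_7 = 1'b1;                  // SR: shift right, MSB filled with 0
        #1;
        if (result !== 8'b0100_1011)
            $display("FAIL SR: result=%b", result);

        $display("shift_unit checks finished");
        $finish;
    end
endmodule
```

Any FAIL line printed during simulation indicates a mismatch with the expected logical shift.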

A.3. ARITHMETIC UNIT

/* ~~~~8-BIT FULL-ADDER~~~~ */

module full_adder_8bit(in1, in2, sum, cout, cin);
    input  [7:0] in1, in2;
    input        cin;
    output [7:0] sum;
    output       cout;

    assign {cout, sum} = in1 + in2 + cin;
endmodule

/* ~~~~ARITHMETIC UNIT~~~~ */

module arithmetic_unit(A, src, I_8, cin, sub, q_c, q_s, result, S, C, Z);
    input  [7:0] A;
    input  [7:0] src;
    input        I_8;        // instruction bit I[8]: 1 forces the second operand to zero
    input        cin;
    input        sub;        // 1 inverts the second operand (subtraction)
    input        q_c, q_s;   // enables for the carry and sign flags
    output [7:0] result;
    output       S, C, Z;

    wire [7:0] in2, in2_final;
    wire       cout;

    assign in2       = I_8 ? 8'b0 : A;
    assign in2_final = in2 ^ {8{sub}};

    full_adder_8bit a1(.in1(src), .in2(in2_final), .cin(cin), .cout(cout), .sum(result));

    assign C = q_c && cout;       //CARRY FLAG
    assign S = q_s && (!cout);    //SIGN FLAG
    assign Z = ~|result;          //ZERO FLAG: 1 when every result bit is 0

endmodule
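
The way the control inputs select an operation can be illustrated with a self-checking testbench. This is a sketch under stated assumptions, not part of the original design: the testbench name arithmetic_unit_tb, the task name check and the operand values are hypothetical, the control values for ADD, INC, SUB and DEC follow the control unit equations in A.7, and the two modules are repeated from A.3 so that the block compiles on its own. Note that with sub = 1 and cin = 1 the unit computes src - A.

```verilog
// Copies of the A.3 modules, repeated so this sketch is self-contained.
module full_adder_8bit(in1, in2, sum, cout, cin);
    input  [7:0] in1, in2;
    input        cin;
    output [7:0] sum;
    output       cout;
    assign {cout, sum} = in1 + in2 + cin;
endmodule

module arithmetic_unit(A, src, I_8, cin, sub, q_c, q_s, result, S, C, Z);
    input  [7:0] A, src;
    input        I_8, cin, sub, q_c, q_s;
    output [7:0] result;
    output       S, C, Z;
    wire   [7:0] in2, in2_final;
    wire         cout;

    assign in2       = I_8 ? 8'b0 : A;
    assign in2_final = in2 ^ {8{sub}};
    full_adder_8bit a1(.in1(src), .in2(in2_final), .cin(cin), .cout(cout), .sum(result));
    assign C = q_c && cout;
    assign S = q_s && (!cout);
    assign Z = ~|result;
endmodule

// Hypothetical testbench (names and operand values are illustrative only).
module arithmetic_unit_tb;
    reg  [7:0] A, src;
    reg        I_8, cin, sub, q_c, q_s;
    wire [7:0] result;
    wire       S, C, Z;

    arithmetic_unit dut(A, src, I_8, cin, sub, q_c, q_s, result, S, C, Z);

    // Wait for the combinational outputs to settle, then compare.
    task check;
        input [7:0] exp_result;
        input       exp_S, exp_C, exp_Z;
        begin
            #1;
            if (result !== exp_result || S !== exp_S || C !== exp_C || Z !== exp_Z)
                $display("FAIL: result=%h S=%b C=%b Z=%b", result, S, C, Z);
        end
    endtask

    initial begin
        // ADD: 200 + 100 = 300, so the result wraps to 44 and carry is set
        A = 8'd100; src = 8'd200; I_8 = 0; sub = 0; cin = 0; q_c = 1; q_s = 0;
        check(8'd44, 1'b0, 1'b1, 1'b0);

        // INC: 255 + 1 wraps to 0, so carry and zero are set
        A = 8'd0; src = 8'hFF; I_8 = 1; sub = 0; cin = 1; q_c = 1; q_s = 0;
        check(8'h00, 1'b0, 1'b1, 1'b1);

        // SUB: src - A = 5 - 7 = -2 (8'hFE), so the sign flag is set
        A = 8'd7; src = 8'd5; I_8 = 0; sub = 1; cin = 1; q_c = 0; q_s = 1;
        check(8'hFE, 1'b1, 1'b0, 1'b0);

        // DEC: 1 - 1 = 0, so only the zero flag is set
        A = 8'd0; src = 8'd1; I_8 = 1; sub = 1; cin = 0; q_c = 0; q_s = 1;
        check(8'h00, 1'b0, 1'b0, 1'b1);

        $display("arithmetic_unit checks finished");
        $finish;
    end
endmodule
```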

A.4. PROGRAM COUNTER

/* ~~~~16-BIT ADDER~~~~ */

module adder_16(in1, in2, out, cout, cin);
    input         cin;
    input  [15:0] in1, in2;
    output        cout;
    output [15:0] out;

    assign {cout, out} = in1 + in2 + cin;

endmodule

/* ~~~~PROGRAM COUNTER~~~~ */

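module program_counter(ld_pc, rst, clk, c, d, I_2_9, s4, q14, PC);
    input             ld_pc;
    input             rst;
    input             clk;
    input       [7:0] c;        // C register, high byte of the JMPCD target
    input       [7:0] d;        // D register, low byte of the JMPCD target
    input       [7:0] I_2_9;    // jump offset, instruction bits I[9:2]
    input             s4, q14;
    output reg [15:0] PC;

    wire        cin = 1'b0;
    wire [15:0] in2, adder_out, pc_in;
    wire [7:0]  in2_half;

    assign in2_half = s4 ? I_2_9 : 8'b0000_0001;    // jump offset when s4 is set, otherwise +1
    assign in2      = {8'b0000_0000, in2_half};
    assign pc_in    = q14 ? {c, d} : adder_out;     // JMPCD loads {C, D}

    // The carry out of the adder is not used.
    adder_16 a16 (.in1(PC), .in2(in2), .out(adder_out), .cout(), .cin(cin));

    // Nonblocking assignments avoid simulation races with other clocked blocks.
    always @(posedge clk, posedge rst)
    begin
        if (rst)
            PC <= 16'b0;
        else if (ld_pc)
            PC <= pc_in;
    end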

endmodule
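
The three ways in which the program counter is updated (sequential increment, jump by an offset, and JMPCD to the address held in C and D) can be exercised with a short testbench. This is a sketch under stated assumptions rather than part of the original design: the testbench name, the 10 ns clock and the register values are hypothetical, and the two modules are adapted from A.4 (using nonblocking assignments and a 16-bit reset constant) so that the block compiles on its own.

```verilog
// Adapted copies of the A.4 modules, so this sketch is self-contained.
module adder_16(in1, in2, out, cout, cin);
    input         cin;
    input  [15:0] in1, in2;
    output        cout;
    output [15:0] out;
    assign {cout, out} = in1 + in2 + cin;
endmodule

module program_counter(ld_pc, rst, clk, c, d, I_2_9, s4, q14, PC);
    input             ld_pc, rst, clk;
    input       [7:0] c, d, I_2_9;
    input             s4, q14;
    output reg [15:0] PC;

    wire [7:0]  in2_half = s4 ? I_2_9 : 8'b0000_0001;
    wire [15:0] adder_out;
    wire [15:0] pc_in = q14 ? {c, d} : adder_out;

    adder_16 a16 (.in1(PC), .in2({8'b0, in2_half}), .out(adder_out),
                  .cout(), .cin(1'b0));

    always @(posedge clk, posedge rst)
    begin
        if (rst)        PC <= 16'b0;
        else if (ld_pc) PC <= pc_in;
    end
endmodule

// Hypothetical testbench (names, clock period and values are illustrative).
module program_counter_tb;
    reg         clk, rst, s4, q14;
    reg  [7:0]  c, d, offset;
    wire [15:0] PC;

    program_counter dut(.ld_pc(1'b1), .rst(rst), .clk(clk), .c(c), .d(d),
                        .I_2_9(offset), .s4(s4), .q14(q14), .PC(PC));

    always #5 clk = ~clk;               // rising edges at 5, 15, 25, ...

    initial begin
        clk = 0; rst = 1; s4 = 0; q14 = 0;
        c = 8'h12; d = 8'h34; offset = 8'h10;
        #12 rst = 0;                    // release reset between clock edges

        @(posedge clk); #1;             // sequential step: 0 + 1
        if (PC !== 16'h0001) $display("FAIL increment: PC=%h", PC);

        s4 = 1;
        @(posedge clk); #1;             // jump by offset: 1 + 8'h10
        if (PC !== 16'h0011) $display("FAIL offset jump: PC=%h", PC);

        s4 = 0; q14 = 1;
        @(posedge clk); #1;             // JMPCD: PC = {C, D}
        if (PC !== 16'h1234) $display("FAIL JMPCD: PC=%h", PC);

        $display("program_counter checks finished");
        $finish;
    end
endmodule
```

Because the offset is zero-extended to 16 bits before the addition, a jump by offset can only move forward from the current program counter value.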

A.5. INSTRUCTION REGISTER

/* ~~~~INSTRUCTION REGISTER~~~~ */

module instruction_register(clk, rst, ld_ir, ir_in, I);
    input             clk;
    input             rst;
    input             ld_ir;
    input      [11:1] ir_in;
    output reg [11:1] I;

    always @(posedge clk, posedge rst)
    begin
        if (rst)
            I <= 11'b0100_0000_000;    // reset value decodes as MVI with data 0
        else if (ld_ir)
            I <= ir_in;
    end

endmodule

A.6. INSTRUCTION DECODER


/* ~~~~INSTRUCTION DECODER~~~~ */

module instruction_decoder(I_4_11, q);
    input  [11:4] I_4_11;
    output [14:1] q;

    assign q[1]  = (!I_4_11[11]) & I_4_11[10];                      //MVI
    assign q[2]  = I_4_11[11] & (!I_4_11[10]);                      //JMP offset
    assign q[3]  = I_4_11[11] & I_4_11[10];                         //JZ
    assign q[4]  = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9])
                   & (!I_4_11[8]) & I_4_11[7];                      //MOV
    assign q[5]  = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9])
                   & I_4_11[8] & (!I_4_11[7]);                      //INC
    assign q[6]  = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9])
                   & I_4_11[8] & I_4_11[7];                         //DEC
    assign q[7]  = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9]
                   & (!I_4_11[8]) & (!I_4_11[7]);                   //SL
    assign q[8]  = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9]
                   & (!I_4_11[8]) & I_4_11[7];                      //SR
    assign q[9]  = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9])
                   & (!I_4_11[8]) & (!I_4_11[7]) & (!I_4_11[6])
                   & (!I_4_11[5]);                                  //CMP
    assign q[10] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9])
                   & (!I_4_11[8]) & (!I_4_11[7]) & (!I_4_11[6])
                   & I_4_11[5] & (!I_4_11[4]);                      //ADD
    assign q[11] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9])
                   & (!I_4_11[8]) & (!I_4_11[7]) & (!I_4_11[6])
                   & I_4_11[5] & I_4_11[4];                         //SUB
    assign q[12] = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9]
                   & I_4_11[8] & (!I_4_11[7]);                      //LOAD
    assign q[13] = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9]
                   & I_4_11[8] & I_4_11[7];                         //STORE
    assign q[14] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9])
                   & (!I_4_11[8]) & (!I_4_11[7]) & I_4_11[6];       //JMPCD

endmodule

A.7. CONTROL UNIT

/* ~~~~CONTROL UNIT~~~~ */

module control_unit(q, z, q_c, q_s, s0, s1, s2, wr_data, rd_data, ld_pc,
                    ld_ir, en_dec, sub, cin, s4, xout_buf, ld_flags);
    input  [14:1] q;
    input         z;
    output reg    q_c, q_s;
    output reg    s0, s1, s2, s4;
    output        wr_data, rd_data;
    output        ld_pc, ld_ir;
    output reg    ld_flags;
    output reg    en_dec;
    output reg    sub, cin;
    output reg    xout_buf;

    reg [3:0] E;

    assign ld_pc = 1'b1;
    assign ld_ir = 1'b1;

    assign rd_data = q[12];
    assign wr_data = q[13];

    always @ *
    begin
        // Instruction classes: move, arithmetic, shift, load
        E[0] = q[1] | q[4];
        E[1] = q[5] | q[6] | q[9] | q[10] | q[11];
        E[2] = q[7] | q[8];
        E[3] = q[12];

        // {s1, s0} selects the data bus source in the main processor
        case (E)
            4'b0010: begin s0 = 1'b1; s1 = 1'b0; end
            4'b0100: begin s0 = 1'b0; s1 = 1'b1; end
            4'b1000: begin s0 = 1'b1; s1 = 1'b1; end
            default: begin s0 = 1'b0; s1 = 1'b0; end
        endcase

        sub = q[6] | q[9] | q[11];
        cin = q[5] | q[9] | q[11];
        q_c = q[5] | q[10];
        q_s = q[6] | q[11] | q[9];
        ld_flags = E[1];

        // Jump taken: JMP, or JZ with the zero flag set
        s4 = q[2] | (q[3] && z);

        s2 = q[10] | q[11];

        en_dec = q[1] | q[2] | q[3] | q[9] | q[13] | q[14];

        xout_buf = q[4] | q[5] | q[6] | q[7] | q[8] | q[10] | q[11] | q[12];
    end

endmodule

A.8. MAIN PROCESSOR UNIT

/* ~~~~MAIN PROCESSOR UNIT~~~~ */

module main_processor(clk, rst, xin, xout, wr_data, rd_data, addr_data,
                      data_in_out, ir_in, addr_pc);
    input         clk;
    input         rst;
    input  [7:0]  xin;
    input  [11:1] ir_in;
    inout  [7:0]  data_in_out;

    output [7:0]  xout;
    output        wr_data;
    output        rd_data;
    output [15:0] addr_data;
    output [15:0] addr_pc;

    reg Sign, Carry, Zero;
    reg [7:0] A_reg, B_reg, C_reg, D_reg, X_reg;
    reg X_buf;

    reg [7:0] data_bus;
    reg ld_a_temp, ld_B, ld_C, ld_D;

    wire cin, sub, q_C, q_S, s, c, z;
    wire s0, s1, s2, s4;
    wire xout_buf, en_dec;
    wire ld_A, ld_ir, ld_pc, ld_flags;
    wire [11:1] I;
    wire [14:1] q;
    wire [7:0] result_au, result_su, result_mu;
    wire [7:0] src;
    wire [7:0] data_in;

    arithmetic_unit au1(A_reg, src, I[8], cin, sub, q_C, q_S, result_au, s, c, z);

    control_unit cu1(q, Zero, q_C, q_S, s0, s1, s2, wr_data, rd_data, ld_pc,
                     ld_ir, en_dec, sub, cin, s4, xout_buf, ld_flags);

    instruction_decoder id1(I[11:4], q);

    instruction_register ir1(clk, rst, ld_ir, ir_in, I);

    move_unit mu1(I[10], src, I[9:2], result_mu);

    program_counter pc1(ld_pc, rst, clk, C_reg, D_reg, I[9:2], s4, q[14], addr_pc);

    shift_unit su1(src, I[7], result_su);

    assign xout      = X_buf ? X_reg : 8'bz;
    assign ld_A      = ld_a_temp || q[1];
    assign addr_data = {C_reg, D_reg};

    //SRC Multiplexer
    assign src = I[3] ? xin : (I[2] ? (I[1] ? D_reg : C_reg) : (I[1] ? B_reg : A_reg));

    assign data_in     = rd_data ? data_in_out : 8'bz;
    assign data_in_out = wr_data ? src : 8'bz;

    // Register and flag updates. Nonblocking assignments avoid simulation
    // races with the other clocked blocks (program counter, instruction register).
    always @ (posedge clk, posedge rst)
    begin
        if (rst)
        begin
            A_reg <= 8'b0;
            B_reg <= 8'b0;
            C_reg <= 8'b0;
            D_reg <= 8'b0;
            X_reg <= 8'b0;
            X_buf <= 1'b0;

            Sign  <= 1'b0;
            Carry <= 1'b0;
            Zero  <= 1'b0;
        end
        else
        begin
            X_reg <= data_bus;
            X_buf <= xout_buf & (s2 ? I[3] : I[6]);

            if (ld_flags)
            begin
                Carry <= c;
                Zero  <= z;
                Sign  <= s;
            end

            if (ld_A)
                A_reg <= data_bus;

            if (ld_B)
                B_reg <= data_bus;

            if (ld_C)
                C_reg <= data_bus;

            if (ld_D)
                D_reg <= data_bus;
        end
    end

    always @ *
    begin
        // Destination Decoder
        if (!((xout_buf & (s2 ? I[3] : I[6])) || en_dec))
        begin
            case (s2 ? I[2:1] : I[5:4])
                2'b00: begin ld_a_temp = 1'b1; ld_B = 1'b0; ld_C = 1'b0; ld_D = 1'b0; end
                2'b01: begin ld_a_temp = 1'b0; ld_B = 1'b1; ld_C = 1'b0; ld_D = 1'b0; end
                2'b10: begin ld_a_temp = 1'b0; ld_B = 1'b0; ld_C = 1'b1; ld_D = 1'b0; end
                2'b11: begin ld_a_temp = 1'b0; ld_B = 1'b0; ld_C = 1'b0; ld_D = 1'b1; end
            endcase
        end
        else
        begin
            ld_a_temp = 1'b0; ld_B = 1'b0; ld_C = 1'b0; ld_D = 1'b0;
        end

        // Data bus multiplexer
        case ({s1, s0})
            2'b01:   data_bus = result_au;
            2'b10:   data_bus = result_su;
            2'b11:   data_bus = data_in;
            default: data_bus = result_mu;
        endcase
    end

endmodule

APPENDIX B – INSTRUCTION SET

ADD A : 11'h010
ADD B : 11'h011
ADD C : 11'h012
ADD D : 11'h013
ADD X : 11'h014

CMP A : 11'h000
CMP B : 11'h001
CMP C : 11'h002
CMP D : 11'h003
CMP X : 11'h004

DEC A, A : 11'h0C0
DEC A, B : 11'h0C1
DEC A, C : 11'h0C2
DEC A, D : 11'h0C3
DEC A, X : 11'h0C4

DEC B, A : 11'h0C8
DEC B, B : 11'h0C9
DEC B, C : 11'h0CA
DEC B, D : 11'h0CB
DEC B, X : 11'h0CC

DEC C, A : 11'h0D0
DEC C, B : 11'h0D1
DEC C, C : 11'h0D2
DEC C, D : 11'h0D3
DEC C, X : 11'h0D4

DEC D, A : 11'h0D8
DEC D, B : 11'h0D9
DEC D, C : 11'h0DA
DEC D, D : 11'h0DB
DEC D, X : 11'h0DC

DEC X, A : 11'h0E0
DEC X, B : 11'h0E1
DEC X, C : 11'h0E2
DEC X, D : 11'h0E3
DEC X, X : 11'h0E4

INC A, A : 11'h080
INC A, B : 11'h081
INC A, C : 11'h082
INC A, D : 11'h083
INC A, X : 11'h084

INC B, A : 11'h088
INC B, B : 11'h089
INC B, C : 11'h08A
INC B, D : 11'h08B
INC B, X : 11'h08C

INC C, A : 11'h090
INC C, B : 11'h091
INC C, C : 11'h092
INC C, D : 11'h093
INC C, X : 11'h094

INC D, A : 11'h098
INC D, B : 11'h099
INC D, C : 11'h09A
INC D, D : 11'h09B
INC D, X : 11'h09C

INC X, A : 11'h0A0
INC X, B : 11'h0A1
INC X, C : 11'h0A2
INC X, D : 11'h0A3
INC X, X : 11'h0A4

JMP : [2'b10, <8-bit data>, 1'b0]
JMPCD : 11'h020
JZ : [2'b11, <8-bit data>, 1'b0]

LOAD A : 11'h180
LOAD B : 11'h188
LOAD C : 11'h190
LOAD D : 11'h198
LOAD X : 11'h1A0

MOV A, A : 11'h040
MOV A, B : 11'h041
MOV A, C : 11'h042
MOV A, D : 11'h043
MOV A, X : 11'h044

MOV B, A : 11'h048
MOV B, B : 11'h049
MOV B, C : 11'h04A
MOV B, D : 11'h04B
MOV B, X : 11'h04C

MOV C, A : 11'h050
MOV C, B : 11'h051
MOV C, C : 11'h052
MOV C, D : 11'h053
MOV C, X : 11'h054

MOV D, A : 11'h058
MOV D, B : 11'h059
MOV D, C : 11'h05A
MOV D, D : 11'h05B
MOV D, X : 11'h05C

MOV X, A : 11'h060
MOV X, B : 11'h061
MOV X, C : 11'h062
MOV X, D : 11'h063
MOV X, X : 11'h064

MVI : [2'b01, <8-bit data>, 1'b0]

SL A, A : 11'h100
SL A, B : 11'h101
SL A, C : 11'h102
SL A, D : 11'h103
SL A, X : 11'h104

SL B, A : 11'h108
SL B, B : 11'h109
SL B, C : 11'h10A
SL B, D : 11'h10B
SL B, X : 11'h10C

SL C, A : 11'h110
SL C, B : 11'h111
SL C, C : 11'h112
SL C, D : 11'h113
SL C, X : 11'h114

SL D, A : 11'h118
SL D, B : 11'h119
SL D, C : 11'h11A
SL D, D : 11'h11B
SL D, X : 11'h11C

SL X, A : 11'h120
SL X, B : 11'h121
SL X, C : 11'h122
SL X, D : 11'h123
SL X, X : 11'h124

SR A, A : 11'h140
SR A, B : 11'h141
SR A, C : 11'h142
SR A, D : 11'h143
SR A, X : 11'h144

SR B, A : 11'h148
SR B, B : 11'h149
SR B, C : 11'h14A
SR B, D : 11'h14B
SR B, X : 11'h14C

SR C, A : 11'h150
SR C, B : 11'h151
SR C, C : 11'h152
SR C, D : 11'h153
SR C, X : 11'h154

SR D, A : 11'h158
SR D, B : 11'h159
SR D, C : 11'h15A
SR D, D : 11'h15B
SR D, X : 11'h15C

SR X, A : 11'h160
SR X, B : 11'h161
SR X, C : 11'h162
SR X, D : 11'h163
SR X, X : 11'h164

STORE A : 11'h1C0
STORE B : 11'h1C1
STORE C : 11'h1C2
STORE D : 11'h1C3
STORE X : 11'h1C4

SUB A : 11'h018
SUB B : 11'h019
SUB C : 11'h01A
SUB D : 11'h01B
SUB X : 11'h01C
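
The two-operand opcodes in this appendix follow a regular pattern: the word is the opcode of the "A, A" form with the destination register code placed three bits above the source register code, where A, B, C, D and X are coded 0 to 4. The sketch below illustrates that pattern and the immediate format used by MVI, JMP and JZ. It is not part of the original design; the module name isa_encode_demo, the function names encode_rr and encode_imm, and the parameter names are hypothetical.

```verilog
// Hypothetical helper module illustrating the opcode layout of Appendix B.
module isa_encode_demo;
    // Register codes shared by the source and destination fields
    localparam [2:0] REG_A = 3'd0, REG_B = 3'd1, REG_C = 3'd2,
                     REG_D = 3'd3, REG_X = 3'd4;

    // Base opcodes, taken from the "A, A" rows of the table
    localparam [10:0] OP_INC = 11'h080, OP_DEC = 11'h0C0, OP_SR = 11'h140;

    // Two-operand form: base opcode, destination in bits 5:3, source in bits 2:0
    function [10:0] encode_rr;
        input [10:0] base;
        input [2:0]  dst, src;
        begin
            encode_rr = base | {5'b0, dst, 3'b0} | {8'b0, src};
        end
    endfunction

    // Immediate form: {2-bit opcode, 8-bit data, 1'b0}
    function [10:0] encode_imm;
        input [1:0] op;
        input [7:0] data;
        begin
            encode_imm = {op, data, 1'b0};
        end
    endfunction

    initial begin
        // Compare against rows of the table
        if (encode_rr(OP_INC, REG_B, REG_C) !== 11'h08A) $display("FAIL INC B, C");
        if (encode_rr(OP_DEC, REG_X, REG_D) !== 11'h0E3) $display("FAIL DEC X, D");
        if (encode_rr(OP_SR,  REG_D, REG_X) !== 11'h15C) $display("FAIL SR D, X");

        // MVI with an arbitrary example operand
        $display("MVI 8'h5A encodes as 11'h%h", encode_imm(2'b01, 8'h5A));
        $finish;
    end
endmodule
```

The bit positions in the comments refer to the 11-bit literal; in the RTL the instruction is indexed as I[11:1], so the source field is I[3:1] and the destination field is I[6:4].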

