Project Report Ashish
*****DESIGN OF 8-BIT RISC PROCESSOR*****
A
Project Report
submitted
in partial fulfilment
for the award of the Degree of
Bachelor of Technology
in the Department of Electronic & Communication Engineering
by
***** ASHISH TOMAR*****
***** Enrolment No.-JNU08BTEC013 *****
Under the supervision of
***** ARPAN SHAH *****
***** Designation *****
Department of Electronic & Communication Engineering
Jagan Nath University
Jaipur
May 2012
Candidate Declaration
I, ASHISH TOMAR, hereby declare that the work presented in this report entitled
“8 BIT RISC MICROPROCESSOR” in partial fulfillment of the requirements for
the award of Degree of Bachelor of Technology, submitted in the Department of
ELECTRONIC & COMMUNICATION at Jagan Nath University, Jaipur, is an
authentic record of my own work under the supervision of ARPAN SHAH.
I also declare that the work embodied in the present project report is my original
work/extension of the existing work and has not been copied from any
Journal/thesis/book, and has not been submitted by me for any other
Degree/Diploma.
(Name & Signature of Candidate)
Enrolment No.: JNU08BTEC013
Date: 29TH MAY 2012
Certificate of the Supervisor(s)
This is to certify that the project report entitled “8 BIT RISC
MICROPROCESSOR” submitted by ASHISH TOMAR for the award of
Degree of Bachelor of Technology in the Department of ELECTRONIC &
COMMUNICATION of Jagan Nath University, Jaipur, is a record of authentic
work carried out by him/her under my/our supervision.
The matter embodied in this project report is the original work of the candidate and
has not been submitted for the award of any other degree or diploma. It is further
certified that he/she has worked with me/us for the required period in the
Department of ELECTRONIC & COMMUNICATION, Jagan Nath University,
Jaipur.
(Name and Signature of Supervisor)
Date:………………………………….
Acknowledgements
I would like to express my sincere gratitude to my project guide “ARPAN SHAH” for giving me the opportunity to work on this topic. It would never have been possible for me to take this project to this level without his innovative ideas and his relentless support and encouragement.
Name of Student(s): ASHISH TOMAR (Roll Number): 0802BTEC016
Abstract
Field Programmable Gate Array (FPGA) devices offer a large set of advantages due to
their reconfigurable nature. Although their performance is not comparable to ASIC devices, their
flexibility is usually more important especially when fast time-to-market is an issue and the
production is on small scale basis. For that reason they are widely used in electronic applications
both during prototyping and in final-production systems. Processors are the most demanding
when it comes to flexibility, cost and time to market.
RISC (Reduced Instruction Set Computer) machines have fixed-size instructions
that can execute in one clock, and their instructions interface with memory via a fixed mechanism.
There are only a small number of primitive instructions. RISC is based on using many simpler
and faster instructions to do the same work as a single complicated instruction on CISC
(Complex Instruction Set Computer) machine.
The aim of this project is the design of an 8-bit RISC processor for FPGA implementation.
The processor can execute 14 instructions, including 2 memory access operations. Verilog is
the HDL chosen for design entry. Xilinx ISE WebPACK generates the programming file for the
target device, Spartan-3.
INDEX
1. INTRODUCTION
1.1. Reduced Instruction Set Computers
1.2. Field Programmable Gate Array
1.2.1. Look Up Tables
1.2.2. Programmable Logic Array
1.2.3. Programmable Array Logic
1.2.4. FPGA
1.2.5. Spartan-3
1.3. Hardware Description Languages
1.3.1. Importance of HDLs
1.3.2. Verilog HDL
2. FUNCTIONAL DESCRIPTION
2.1. Block Diagram
2.2. Specifications
2.3. Instructions
2.3.1. Move Instructions
2.3.2. Arithmetic Instructions
2.3.3. Jump Instructions
2.3.4. Memory Access Instructions
2.4. Targeted Performance Parameters
3. DESIGN ARCHITECTURE
3.1. Instruction Set Architecture
3.1.1. Instruction Format
3.1.2. Source/Destination Format
3.1.3. Instruction Examples
3.2. Modular Design
3.3. Top Level Entity
3.3.1. Block Diagram
3.3.2. Ports Description
3.3.3. Architecture
3.3.4. Source Register Selection
3.3.5. Memory Access Operations
3.3.6. Data Bus
3.3.7. Destination Decoder
3.3.8. Output Port Xout
3.4. Move Unit
3.5. Shift Unit
3.6. Arithmetic Unit
3.6.1. Block Diagram
3.6.2. Ports Description
3.6.3. Architecture
3.6.4. Functionality
3.6.5. Flags
3.7. Program Counter
3.8. Instruction Register
3.9. Instruction Decoder
3.10. Control Unit
3.11. Data Memory
3.12. Program Memory
4. DESIGN IMPLEMENTATION
4.1. HDL Entry
4.2. Functional Simulation
4.3. Synthesis
4.3.1. Synthesis Constraints
4.3.2. Synthesis Report
4.4. Translate
4.4.1. NGD Build Overview
4.4.2. Conversion of Netlist to NGD
4.5. MAP
4.5.1. MAP Input Files
4.5.2. MAP Output Files
4.5.3. MAP Report
4.5.4. Post MAP Timing Report
4.6. Place & Route
4.6.1. Overview
4.6.2. Placing
4.6.3. Routing
4.6.4. Post PAR Timing Report
4.7. BitGen Overview
5. SIMULATION RESULTS
6. CONCLUSION
6.1. Performance Parameters
6.2. Future Improvements
APPENDIX A – RTL CODING
A.1. Move Unit
A.2. Shift Unit
A.3. Arithmetic Unit
A.4. Program Counter
A.5. Instruction Register
A.6. Instruction Decoder
A.7. Control Unit
A.8. Main Processor Unit
APPENDIX B – INSTRUCTION SET
8 BIT RISC MICROPROCESSOR ARCHITECTURE
1. INTRODUCTION
1.1 REDUCED INSTRUCTION SET COMPUTER (RISC)
An important factor in computer design prior to 1980 was that all memories,
including the memory to store program instructions, were very expensive. So
if you were a computer designer, you would want to make each of the
instructions you design to be short but powerful. That way, when
programmers write programs using your instructions, their code will be dense
and will require little memory, but each bit of code would do a lot of work.
This would result in a bunch of instructions of different lengths. Finally, you would
also end up with a very rich collection of instructions that can interface with
the computer’s data memory in many different ways: either dealing directly
with the data memory, or demanding that data first be stored into temporary
locations (“registers”), or some mix of the two.
Now because of this rich, powerful, and variable-length group of (compact)
instructions you’ve designed, the computer would have several
characteristics. First, each instruction might take several clock cycles to
complete. That’s because each instruction would be of a different size, so
figuring out what each one says is complicated; because each instruction
could talk to memory in a different way; and because each instruction could
potentially do a lot of work. Second, and for the same reasons just given, the
computer speed might be fairly slow.
But as time passed, memory became cheaper, compilers got better, and
the motivation for making small but really powerful instructions faded. In
1980, Patterson and Ditzel at Berkeley argued in favor of a different
architecture having simple instructions, all of uniform length, that perform simpler
operations. Sure, you’d need to specify more of these simpler instructions to
equal one of the old-style complicated instructions, and yes, this takes more
instruction memory, but memory is cheap, and your computer can run faster
and take fewer clocks per instruction.
For example, say you had a complicated instruction called “MUL” that told
the computer to take two pieces of data from memory and multiply their sum
with a third piece of data and put the result back somewhere else. This one
instruction might take 10 clock cycles to complete. Now suppose we had a
simple instruction set. To do the same work as “MUL” did, we’d need perhaps
8 different instructions (a few loads, an add, a multiply, a store, etc.). But
each instruction completes in a single clock cycle because each is so simple.
And maybe the computer’s clock can run much faster, too. The downside of
the simple system, of course, is that it requires you to store 8 times as many
instructions.
A Comparison:
• Complicated system does “MUL”:
• 1 instruction x 10 clocks/instr x 10 nsecond/clock = 100ns
• Simple system does the same work as “MUL”:
• 8 instructions x 1 clock/instr x 9 nseconds/clock = 72ns
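The comparison above is a product of three factors: instruction count, clocks per instruction and clock period. A minimal Python sketch of that arithmetic, using only the figures quoted above (the function name is illustrative and not part of the report):

```python
def execution_time_ns(instructions, clocks_per_instr, ns_per_clock):
    """Total time = instruction count x clocks per instruction x clock period."""
    return instructions * clocks_per_instr * ns_per_clock

# Complicated system: one "MUL" taking 10 clocks of 10 ns each.
complex_ns = execution_time_ns(1, 10, 10)

# Simple system: eight single-clock instructions with a 9 ns clock.
simple_ns = execution_time_ns(8, 1, 9)

print(complex_ns, simple_ns)
```

The simple system wins here only because the shorter clock period and single-clock execution outweigh the eightfold increase in instruction count.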
Three systems based on this idea were built in the early 80’s: the Berkeley
machines RISC-I and RISC-II, the Stanford MIPS processor [2], and the IBM
801 [3]. Based on comparisons between these machines and what came
before, some characteristics commonly associated with RISC and CISC arose.
Reduced Instruction Set Computer (RISC) is based on using many
simpler and faster instructions to do the same work as a single
complicated instruction on a Complex Instruction Set Computer
(CISC).
RISC machines are machines that have
• Instructions execute in one clock
• Instructions of a fixed size
• Instructions interface with memory via fixed mechanism
• A small number of primitive instructions
• Pipelining, a way to do more than one instruction at a time.
1.2. FIELD PROGRAMMABLE GATE ARRAYS (FPGA)
There is a better way to implement a logic function than to hook together
discrete 74XX packages. One can use semiconductor memory, integrated
circuits known as “Programmable Logic Devices” or get a Custom made IC to
implement logic.
1.2.1 LOOK UP TABLES (MEMORY)
To implement N functions of K variables, we need a memory with 2^K
locations and N bits per location (use one address line for each variable and one
data-out line for each function). Memory is therefore not efficient at implementing
functions with many input variables, or multiple functions with different
inputs.
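As a rough illustration of the look-up-table idea, the Python sketch below fills a memory of 2^K words, each N bits wide, from N Boolean functions of K variables. The helper names and the two example functions (majority and parity) are chosen only for this illustration.

```python
def build_lut(functions, k):
    """Return a list of 2**k words; bit j of each word holds functions[j]."""
    memory = []
    for address in range(2 ** k):
        # Address line i carries the value of input variable i.
        inputs = [(address >> i) & 1 for i in range(k)]
        word = 0
        for j, func in enumerate(functions):
            word |= (func(*inputs) & 1) << j
        memory.append(word)
    return memory

def lookup(memory, inputs, j):
    """Read function j for the given input values (variable 0 first)."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return (memory[address] >> j) & 1

def majority(a, b, c):
    return int(a + b + c >= 2)

def parity(a, b, c):
    return a ^ b ^ c

# N = 2 functions of K = 3 variables need 2**3 = 8 locations of 2 bits each.
lut = build_lut([majority, parity], k=3)
```

Each extra input variable doubles the number of locations, which is why a plain memory scales poorly for wide functions.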
1.2.2 PROGRAMMABLE LOGIC ARRAY (PLA)
PLA was the first device used specifically for implementing logic circuits,
introduced in the early 1970s by Philips; the array consists of 2 levels of logic
gates, a programmable “wired” AND-plane followed by a programmable
“wired” OR-plane. It is designed to implement random logic expressions in
SOP form. PLAs are difficult to manufacture because of the 2 levels of
configurable logic, which also introduce significant propagation delay.
1.2.3 PROGRAMMABLE ARRAY LOGIC (PAL)
To overcome the problems of PLA, PAL devices were developed. A PAL has a
single level of programmability: a programmable “wired” AND-plane and a
fixed OR-plane. In a PAL, logic is represented in SOP form. The number of
products in an SOP form is limited to a fixed number. The number of
variables in each product term is limited by the number of input pins. The number
of independent functions is limited by the number of output pins.
1.2.4 FIELD PROGRAMMABLE GATE ARRAYS
A Field Programmable Gate Array or FPGA is a semiconductor device
containing programmable logic components and programmable interconnects.
The programmable logic components can be programmed to duplicate the
functionality of basic logic gates such as AND, OR, XOR, NOT or more
complex combinatorial functions such as decoders or simple math functions. In
most FPGAs, these programmable logic components (or logic blocks, in FPGA
parlance) also include memory elements, which may be simple flip-flops or
more complete blocks of memories.
A hierarchy of programmable interconnects allows the logic blocks of an
FPGA to be interconnected as needed by the system designer, somewhat like
a one-chip programmable breadboard. These logic blocks and interconnects
can be programmed after the manufacturing process by the customer/designer
(hence the term "field programmable") so that the FPGA can perform
whatever logical function is needed.
FPGAs are generally slower than their application-specific integrated circuit
(ASIC) counterparts, can't handle as complex a design, and draw more
power. However, they have several advantages such as a shorter time to
market, ability to re-program in the field to fix bugs, and lower non-recurring
engineering costs.
The historical roots of FPGAs are in complex programmable logic devices
(CPLDs). CPLD logic gate densities range from the equivalent of several
thousand to tens of thousands of logic gates, while FPGAs typically range from
tens of thousands to several million. The primary differences between CPLDs
and FPGAs are architectural. A CPLD has a somewhat restrictive structure
consisting of one or more programmable SOP logic arrays feeding a relatively
small number of clocked registers. The result of this is less flexibility, with the
advantage of more predictable timing delays and a higher logic to
interconnect ratio. The FPGA architectures, on the other hand, are dominated by
interconnect. This makes them far more flexible, but also far more complex to
design for.
Another notable difference between CPLDs and FPGAs is the presence in
most FPGAs of higher-level embedded functions (such as adders and
multipliers) and embedded memories. A related, important difference is that many
modern FPGAs support partial in-system reconfiguration, allowing their
designs to be changed "on the fly" either for system upgrades or for dynamic
reconfiguration.
A recent trend has been to take the architectural approach a step further
by combining the logic blocks and interconnects of traditional FPGAs with
embedded microprocessors and related peripherals to form complete "systems
on a programmable chip". Examples of such hybrid technologies can be
found in the Xilinx Virtex-II PRO and Virtex-4 devices, which include one or
more PowerPC processors embedded within the FPGA's logic fabric. An
alternate approach is to make use of "soft" processor cores that are implemented
within the FPGA logic. These cores include the Xilinx MicroBlaze and
PicoBlaze, and the Altera Nios and Nios II processors, as well as third-party
processor cores.
Applications of FPGAs include DSP, software-defined radio, aerospace and
defense systems, ASIC prototyping, medical imaging, computer vision,
speech recognition, cryptography, bioinformatics, computer hardware
emulation and a growing range of other areas. As their size, capabilities and speed
increased they began to take over larger and larger functions, to the point
where they are now marketed as competitors for full systems on chips. They
now find applications in any area or algorithm that can make use of the
massive parallelism offered by their architecture.
To define the behavior of the FPGA the user provides a hardware
description language (HDL) or a schematic design. Common HDLs are VHDL and
Verilog. Then, using an electronic design automation tool, a technology-mapped
netlist is generated. The netlist can then be fitted to the actual FPGA
architecture using a process called place-and-route, usually performed by the
FPGA company's proprietary place-and-route software. The user will validate
the map, place and route results via timing analysis, simulation, and other
verification methodologies. Once the design and validation process is
complete, the binary file generated (also using the FPGA company's proprietary
software) is used to (re)configure the FPGA device. To simplify the design of
complex systems in FPGAs, there exist libraries of predefined complex
functions and circuits that have been tested and optimized to speed up the
design process. These predefined circuits are commonly called IP cores, and are
available from FPGA vendors and third-party IP suppliers. In a typical design
flow, an FPGA application developer will simulate the design at multiple
stages throughout the design process. Initially the RTL description in VHDL or
Verilog is simulated by creating test benches to stimulate the system and
observe results. Then, after the synthesis engine has mapped the design to a
netlist, the netlist is translated to a gate level description where simulation is
repeated to confirm the synthesis proceeded without errors. Finally the
design is laid out in the FPGA, at which point propagation delays can be added
and the simulation run again with these values back-annotated onto the
netlist.
1.2.5 SPARTAN – 3
The Spartan-3 family of FPGAs offers densities ranging from 50,000 to five
million system gates. Spartan-3 FPGAs are ideally suited to a wide range of
consumer electronics applications, including broadband access, home
networking, display/projection and digital television equipment, because of
their exceptionally low cost.
Features:
- Up to 784 I/O pins
- 622 Mb/s data transfer rate per I/O
- Signal swing ranging from 1.14V to 3.45V
- Double Data Rate (DDR) support
- DDR, DDR2 SDRAM support up to 333 Mbps
1.3 HARDWARE DESCRIPTION LANGUAGE – VERILOG
The HDLs allow designers to model the concurrency of processes found in
hardware elements. HDLs such as Verilog HDL and VHDL became very
popular.
1.3.1. IMPORTANCE OF HDLs
HDLs have many advantages compared to traditional schematic-based
design.
• Design can be described at a very abstract level by use of HDLs.
• Functional verification of the design can be done early in the design
cycle.
• A textual description with comments is an easier way to develop and
debug circuits.
1.3.2. Verilog HDL
Verilog HDL has evolved as a standard hardware description language.
Verilog HDL offers many useful features for hardware design.
• Verilog is easy to learn and use. It is similar in syntax to the C
programming language.
• It allows different levels of abstraction to be mixed in the same model.
• Most popular synthesis tools support Verilog HDL.
2. FUNCTIONAL DESCRIPTION
This chapter gives the detailed information about the functionality of the
design and the implementation constraints.
2.1. BLOCK DIAGRAM
Fig 2.1 Functional Block Diagram
2.2. SPECIFICATIONS
The following instructions have to be implemented:
1. MOV dst, src -- dst <= src
2. INC dst, src -- dst <= src + 1
3. DEC dst, src -- dst <= src - 1
4. ADD src -- src <= src + A
5. SUB src -- src <= src - A
6. SL dst, src -- dst <= shift left src
7. SR dst, src -- dst <= shift right src
8. CMP src -- set Z flag if src = A
9. MVI A, immediate -- A <= immediate data
10. LOAD dst -- dst <= memory contents at address [CD]
11. STORE src -- memory at [CD] <= src
12. JMP immediate_offset -- jump to PC + imm_offset
13. JZ immediate_offset -- jump to PC + imm_offset if Z=1
14. JMPCD -- jump to address pointed by [CD]
Src and dst can each be A, B, C, D or X. PC is the program counter. [CD]
represents the contents of registers C and D after concatenation; D is the least
significant byte.
A, B, C and D are 8-bit registers.
X is an 8-bit wide input and output port.
X is visible at the periphery as "X In" and "X Out" I/O ports. When anything
is assigned to X, it will appear at "X Out". When X is read, the contents at "X
In" will be used.
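The register and port behaviour described above can be modelled in a few lines. The Python sketch below is a behavioural model only, not the report's Verilog; the class and method names are hypothetical.

```python
class Registers:
    """Four 8-bit registers plus the X port (separate input and output sides)."""

    def __init__(self):
        self.values = {"A": 0, "B": 0, "C": 0, "D": 0}
        self.x_in = 0    # driven from outside the processor
        self.x_out = 0   # visible outside the processor

    def read(self, name):
        # Reading X returns whatever is present at "X In".
        return self.x_in if name == "X" else self.values[name]

    def write(self, name, value):
        value &= 0xFF  # registers and port are 8 bits wide
        if name == "X":
            self.x_out = value  # writing X appears at "X Out"
        else:
            self.values[name] = value

    def cd_address(self):
        # [CD]: C is the most significant byte, D the least significant.
        return (self.values["C"] << 8) | self.values["D"]

regs = Registers()
regs.write("C", 0x12)
regs.write("D", 0x34)
regs.x_in = 0x55
regs.write("X", 0xAA)
```

Note that a write to X followed by a read of X does not return the written value, because the two directions use different pins.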
Z flag is set whenever the result of any operation is zero. C flag is set
whenever any arithmetic operation results in a carry. S flag is
set whenever any arithmetic operation results in a negative
number.
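A small Python sketch of how the three flags could be derived for an 8-bit addition is given below. It assumes that "negative" means bit 7 of the 8-bit result is set (two's complement), which the text does not state explicitly; the function name is illustrative.

```python
def add_with_flags(src, a):
    """Add two 8-bit values and return (result, flags)."""
    total = (src & 0xFF) + (a & 0xFF)   # up to 9 bits wide
    result = total & 0xFF               # value kept in the 8-bit register
    flags = {
        "Z": int(result == 0),          # result is zero
        "C": int(total > 0xFF),         # carry out of bit 7
        "S": (result >> 7) & 1,         # assumption: sign = bit 7 of the result
    }
    return result, flags

wrapped = add_with_flags(0xFF, 0x01)
negative = add_with_flags(0x7F, 0x01)
```

With these inputs, `wrapped` shows a zero result with carry, and `negative` shows a result whose top bit is set.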
It is assumed that the program memory and the data memory have
synchronous writes and asynchronous reads.
Write operation: On a clock edge when the WR is asserted the data on the
data bus is written into the location pointed by address.
Read operation: When the RD is asserted, the contents of the location
pointed by address will be presented at the data bus by the memory. When
RD is de-asserted the memory will stop driving the bus.
For the sake of simplicity, it is assumed that both the memories are fast
enough to complete the read and write operations in one clock.
2.3. INSTRUCTIONS
2.3.1. MOVE INSTRUCTIONS
There are two move instructions.
2.3.1.1. Move
INSTRUCTION: MOV dst, src
This instruction copies the 8-bit data from the source register to the
destination register. Destination & Source can be registers A/B/C/D or the
input-output port X.
2.3.1.2. Move Immediate Data
INSTRUCTION: MVI A, immediate data
This instruction moves the 8-bit data which is a part of the instruction
itself, to the register A.
2.3.2. ARITHMETIC INSTRUCTIONS
There are 5 arithmetic instructions and 2 shift instructions.
2.3.2.1. Increment
INSTRUCTION: INC dst, src
This instruction retrieves the 8-bit data from the source register/port,
increments it by 1 and stores in the destination register/port. The contents
of source register remain unchanged.
2.3.2.2. Decrement
INSTRUCTION: DEC dst, src
This instruction retrieves the 8-bit data from the source register/port,
decrements it by 1 and stores in the destination register/port. The
contents of source register remain unchanged.
2.3.2.3. Addition
INSTRUCTION: ADD src
This instruction retrieves the 8-bit data from the source register/port,
increments it by the contents of register A, and stores the result back in
the source register/port.
2.3.2.4. Subtraction
INSTRUCTION: SUB src
This instruction retrieves the 8-bit data from the source register/port,
decrements it by the contents of register A and stores the result back in
the source register/port.
2.3.2.5. Compare
INSTRUCTION: CMP src
This instruction retrieves the 8-bit data from the source register/port,
compares it with the contents of register A, and sets Z flag high if both are
equal. This instruction does not modify the contents of the source
register/port.
2.3.2.6. Shift Left
INSTRUCTION: SL dst, src
This instruction retrieves the 8-bit data from the source register/port and
left shifts the data by 1-bit and stores the result in destination
register/port. This instruction does not modify the contents of the source
register/port.
2.3.2.7. Shift Right
INSTRUCTION: SR dst, src
This instruction retrieves the 8-bit data from the source register/port and
right shifts the data by 1-bit and stores the result in destination
register/port. This instruction does not modify the contents of the source
register/port.
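The arithmetic and shift instructions above can be summarised as a behavioural Python model. This is a sketch, not the Arithmetic Unit or Shift Unit RTL: it assumes results wrap to 8 bits and that both shifts are logical (zero fill), neither of which is stated in this section. All names are illustrative.

```python
MASK = 0xFF  # registers are 8 bits wide

# Each entry maps a mnemonic to a function of (source value, register A).
OPERATIONS = {
    "INC": lambda src, a: src + 1,
    "DEC": lambda src, a: src - 1,
    "ADD": lambda src, a: src + a,
    "SUB": lambda src, a: src - a,
    "SL": lambda src, a: src << 1,   # assumption: logical shift, zero fill
    "SR": lambda src, a: src >> 1,   # assumption: logical shift, zero fill
}

def execute(op, src, a=0):
    """Return the 8-bit value written to the destination (or back to src)."""
    return OPERATIONS[op](src & MASK, a & MASK) & MASK

def compare(src, a):
    """CMP writes no register; it only reports the Z flag (1 when equal)."""
    return int((src & MASK) == (a & MASK))
```

For example, `execute("ADD", 200, 100)` wraps around the 8-bit range, and `compare(7, 7)` reports equality through Z.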
2.3.3. JUMP INSTRUCTIONS
The jump instructions are used to modify the sequence of instruction
execution, by changing the value of program counter. The processor can
execute three kinds of jump instructions.
2.3.3.1. Jump by immediate offset
INSTRUCTION: JMP immediate_offset
The value of the program counter is incremented by the value given as
the immediate data. Immediate data is a part of the instruction itself.
2.3.3.2. Jump by immediate offset if Z flag is Set
INSTRUCTION: JZ immediate_offset
The value of the program counter is incremented by the value given as
the immediate data, if the Z flag is high. Immediate data is a part of the
instruction itself. If the Z flag is not set, then the program counter will
increment by 1 as in other instructions.
2.3.3.3. Direct Jump
INSTRUCTION: JMPCD
The value of the program counter is changed to the address pointed by
the concatenation of the contents of the register C and D.
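The three jump behaviours reduce to one next-address calculation. The Python sketch below is a behavioural model under stated assumptions: the program counter is 16 bits wide (matching the 16-bit program address port described later), the 8-bit offset is treated as unsigned, and the address wraps at 16 bits. The function and parameter names are hypothetical.

```python
def next_pc(pc, op, offset=0, z=0, c=0, d=0):
    """Return the program counter value after executing one instruction."""
    if op == "JMP" or (op == "JZ" and z):
        target = pc + (offset & 0xFF)          # assumption: unsigned offset
    elif op == "JMPCD":
        target = ((c & 0xFF) << 8) | (d & 0xFF)  # address from [CD]
    else:
        target = pc + 1                        # every other case, incl. JZ not taken
    return target & 0xFFFF                     # assumption: 16-bit wrap-around
```

If the real design treats the offset as signed, backward relative jumps become possible and only the first branch of this sketch changes.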
2.3.4. MEMORY ACCESS OPERATIONS
The processor can execute 2 memory access instructions.
2.3.4.1. Load Data
INSTRUCTION: LOAD dst
This instruction loads the destination register/port with 8-bit data retrieved
from the Data Memory. The 16-Bit address of the data memory, from
which data is retrieved, is given by the concatenation of the contents at
registers C and D.
2.3.4.2. Store Data
INSTRUCTION: STORE src
This instruction stores the 8-bit data of the source register in the Data
Memory. The address of the data memory where the contents of the source are
stored is given by the concatenation of the contents of registers C and D.
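As a behavioural illustration of the two memory access instructions, the Python sketch below keeps the registers in a dictionary and the data memory in a sparse dictionary keyed by 16-bit address. Unwritten locations read as 0 here, which is an assumption of the sketch; port X is left out for brevity, and all names are illustrative.

```python
def cd_address(regs):
    """16-bit data address: C is the high byte, D the low byte."""
    return ((regs["C"] & 0xFF) << 8) | (regs["D"] & 0xFF)

def store(regs, memory, src):
    """STORE src: memory at [CD] <= src."""
    memory[cd_address(regs)] = regs[src] & 0xFF

def load(regs, memory, dst):
    """LOAD dst: dst <= memory contents at [CD]."""
    regs[dst] = memory.get(cd_address(regs), 0)  # assumption: default 0

regs = {"A": 0x5A, "B": 0x00, "C": 0x12, "D": 0x34}
memory = {}
store(regs, memory, "A")   # write A to address 0x1234
load(regs, memory, "B")    # read the same location into B
```

Because the address always comes from C and D, a program must set up those two registers before every LOAD or STORE to a new location.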
2.4. TARGETED PERFORMANCE PARAMETERS
There are a few performance parameters that the design needs to reach.
• The design is expected to have a worst case delay of 5ns, i.e. the
processor is expected to have a maximum frequency of 200 MHz.
• Instruction opcodes are to be designed in such a way that
implementation requires minimum hardware delays.
• An optimum instruction size is to be chosen.
• Tristate buffers are allowed inside the processor.
• Each instruction has to be executed in a single clock cycle.
• Modification of the instructions to improve performance is allowed.
More instructions may also be added.
3. DESIGN ARCHITECTURE
This chapter explains the internal architecture of the top level entity and the
sub modules. The instruction set architecture was finalized first, followed by the
final design.
3.1. INSTRUCTION SET ARCHITECTURE
The design is made for a total of 14 instructions. Every instruction has the
same size, chosen to be 11 bits. An ‘X’ in an instruction denotes a ‘don’t care’
bit, i.e. the instruction works the same way whether ‘1’ or ‘0’ is
entered in that position.
3.1.1 INSTRUCTION FORMAT
Instructions MVI, JMP and JZ have immediate data/offset as part of
the instruction.
1. MVI : 01_< 8-Bit Immediate Data>_X
2. JMP : 10_< 8-Bit Immediate Offset>_X
3. JZ : 11_< 8-Bit Immediate Offset>_X
Instructions MOV, INC, DEC, SL and SR have both destination and
source as the part of the instruction.
4. MOV : 00_001_< 3-Bit Destination>_< 3_Bit Source>
5. INC : 00_010_< 3-Bit Destination>_< 3_Bit Source>
6. DEC : 00_011_< 3-Bit Destination>_< 3_Bit Source>
7. SL : 00_100_< 3-Bit Destination>_< 3_Bit Source>
8. SR : 00_101_< 3-Bit Destination>_< 3_Bit Source>
The destination register/port for the instructions ADD, CMP and SUB is the
same as the source, so there is no need to mention the destination in the
instruction.
9. CMP : 00_000_00X_< 3_Bit Source>
10. ADD : 00_000_010_< 3_Bit Source>
11. SUB : 00_000_011_< 3_Bit Source>
The source in the case of the LOAD instruction is fixed, i.e. the data
memory, and in the case of the STORE instruction the destination is fixed, i.e. the
data memory.
12. LOAD : 00_110_< 3_Bit Destination>_XXX
13. STORE : 00_111_XXX_< 3_Bit Source>
The direct jump instruction JMPCD doesn’t require any destination,
source or immediate data to be the part of the instruction
14. JMPCD : 00_000_1XX_XXX
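The formats listed above can be collected into a small encoder. The Python sketch below is illustrative only: it uses the register codes given in Section 3.1.2, writes every don't-care bit as 0, and takes the JZ prefix as 11, the only two-bit prefix not used by MVI (01), JMP (10) or the register formats (00). The names are hypothetical.

```python
REG = {"A": 0b000, "B": 0b001, "C": 0b010, "D": 0b011, "X": 0b100}

PREFIX = {"MVI": 0b01, "JMP": 0b10, "JZ": 0b11}   # JZ = 11 is an assumption
TWO_OPERAND = {"MOV": 0b001, "INC": 0b010, "DEC": 0b011, "SL": 0b100, "SR": 0b101}
ONE_OPERAND = {"CMP": 0b000, "ADD": 0b010, "SUB": 0b011}  # opcode field is 000

def encode(mnemonic, dst=None, src=None, imm=None):
    """Return the 11-bit instruction word as an integer."""
    if mnemonic in PREFIX:
        # prefix(2) | immediate(8) | don't care(1)
        return (PREFIX[mnemonic] << 9) | ((imm & 0xFF) << 1)
    if mnemonic in TWO_OPERAND:
        # 00 | opcode(3) | destination(3) | source(3)
        return (TWO_OPERAND[mnemonic] << 6) | (REG[dst] << 3) | REG[src]
    if mnemonic in ONE_OPERAND:
        # 00 | 000 | sub-opcode(3) | source(3)
        return (ONE_OPERAND[mnemonic] << 3) | REG[src]
    if mnemonic == "LOAD":
        return (0b110 << 6) | (REG[dst] << 3)
    if mnemonic == "STORE":
        return (0b111 << 6) | REG[src]
    if mnemonic == "JMPCD":
        return 0b100 << 3
    raise ValueError("unknown mnemonic: " + mnemonic)
```

For instance, `format(encode("MOV", dst="B", src="A"), "011b")` gives the bit string `00001001000`.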
3.1.2. SOURCE / DESTINATION FORMAT
Source can be one of the registers A, B, C, D or the input port Xin.
A total of 3 bits is required to define the source.
A : 000
B : 001
C : 010
D : 011
Xin : 1XX
Destination can be one of the registers A, B, C, D or the output port Xout.
A total of 3 bits is required to define the destination.
A : 000
B : 001
C : 010
D : 011
Xout : 1XX
3.1.3. INSTRUCTION EXAMPLES
1. MOV B,A – i.e. Move the contents of register A to B
Destination is B : 001
Source is A : 000
Instruction Code : 00_001_001_000
2. ADD D – i.e. Add the contents of register D to A and store the result in
D
Destination is D : Not Required, Same as Source
Source is D : 011
Instruction Code : 00_000_010_011
3. MVI A7 – i.e. Move immediate data ‘A7’ to register A
Destination is A : Not Required, It is fixed
Data : 1010_0111
Instruction Code : 01_1010_0111_1 / 01_1010_0111_0
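The worked examples can be checked in the opposite direction by decoding the instruction codes back into mnemonics. The Python sketch below follows the formats of Section 3.1.1 and the register codes of Section 3.1.2; it takes prefix 11 to mean JZ (an assumption, as 11 is the only prefix not used by the other formats) and ignores don't-care bits. The function names are illustrative.

```python
def reg_name(code):
    """Map a 3-bit source/destination code to a name; 1XX selects port X."""
    return "X" if code & 0b100 else "ABCD"[code & 0b011]

def decode(word):
    """Decode an 11-bit instruction word into a tuple starting with the mnemonic."""
    prefix = (word >> 9) & 0b11
    if prefix:
        immediate = (word >> 1) & 0xFF           # lowest bit is a don't care
        return ({1: "MVI", 2: "JMP", 3: "JZ"}[prefix], immediate)
    opcode = (word >> 6) & 0b111
    dst = (word >> 3) & 0b111
    src = word & 0b111
    if opcode == 0b000:
        if dst & 0b100:
            return ("JMPCD",)
        if dst & 0b010:
            return ("SUB" if dst & 0b001 else "ADD", reg_name(src))
        return ("CMP", reg_name(src))
    if opcode == 0b110:
        return ("LOAD", reg_name(dst))
    if opcode == 0b111:
        return ("STORE", reg_name(src))
    names = {0b001: "MOV", 0b010: "INC", 0b011: "DEC", 0b100: "SL", 0b101: "SR"}
    return (names[opcode], reg_name(dst), reg_name(src))
```

Both codes given for the MVI example decode to the same instruction, since they differ only in the don't-care bit.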
3.2. MODULAR DESIGN
Selection of the correct design hierarchy is advantageous for the following
reasons.
Improves simulation and synthesis results
Improves debugging and modifying modular designs
Allows parallel engineering (a team of engineers can work on different
parts of the design at the same time)
Improves the placement and routing of the design by reducing routing
congestion and improving timing
Allows for easier code reuse in the current design, as well as in future
designs
In my design there are modules for arithmetic operations, logical
operations, move operations, jump operations, instructions register and
control unit. All the units are interconnected inside the Top module. The
different modules are:
Move Unit
Shift Unit
Arithmetic Unit
Program Counter
Instruction Register
Instruction Decoder
Control Unit
Data Memory
Program Memory
The selection of the source register, the selection of the destination register,
the selection of input data to the destination register, the control signal for
the Xout buffer and the control signal for the data bus connected to the data
memory are all generated inside the top level entity.
3.3. TOP LEVEL ENTITY
3.3.1. BLOCK DIAGRAM
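[Figure: block diagram of the MAIN PROCESSOR UNIT with ports Xin (8), Xout (8), Clk, Rst, IR_in (11), Data_inout (8), Addr_PC (16), Addr_data (16), wr_data and rd_data]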
3.3.2. PORTS DESCRIPTION
1. Xin
Length : 8 Bit
Type : Input
Use : This port can be used by the user for providing immediate
data for various instructions
2. Xout
Length : 8 Bit
Type : Output
Use : This port can be used by the user for getting the
immediate result of various instructions
3. Clk
Length : 1 Bit
Type : Input
Use : This port provides the global clock signal used to
synchronize the internal registers, program memory and the data memory
4. Rst
Length : 1 Bit
Type : Input
Use : This port provides the global reset signal to all the internal
registers, program memory, data memory, instruction register etc.
5. Addr_PC
Length : 16 Bit
Type : Output
Use : This port serves as the address lines for the 6K x 11 Bits
program memory
6. IR_in
Length : 11 Bit
Type : Input
Use : This port provides the 11-bit instruction fetched from the
program memory to the processor
7. Data_inout
Length : 8 Bit
Type : Inout
Use : This port carries the 8-bit data to and from the data
memory. Buffers control the direction of data flow
8. Addr_data
Length : 16 Bit
Type : Output
Use : This port serves as the address lines for the 6K x 8 Bits
data memory
9. wr_data
Length : 1 Bit
Type : Output
Use : This port provides the write signal to the data memory
when data has to be written to the data memory
10. rd_data
Length : 1 Bit
Type : Output
Use : This port provides the read signal to the data memory
when data has to be read from the data memory
3.3.4. SOURCE REGISTER SELECTION
There are four registers A, B, C, D and one input port Xin. The source is
identified with the help of instruction bits I[3:1]. The instruction bits I[2:1]
are used to identify the source register A/B/C/D. The instruction bit I[3] is
used to identify whether the source is the input port Xin or one of the
registers.
An 8-bit, 4-to-1 multiplexer with I[2:1] as the select lines is used to identify
the register. Another 8-bit, 2-to-1 multiplexer with I[3] as the select line is
used to select either the input port Xin or the already selected register. For
example, if bit I[3] is ‘1’ then irrespective of the bits I[2:1] the source will be the input
port Xin, and if I[3] is ‘0’ then the source will be selected according to the
value of the bits I[2:1].
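The two-stage selection can be sketched behaviourally in Python as follows. The text numbers instruction bits from 1, so I[3:1] corresponds to the three lowest bits of the integer used here; the function name is illustrative and this is not the report's RTL.

```python
def select_source(instruction, a, b, c, d, x_in):
    """Return the 8-bit source value chosen by instruction bits I[3:1]."""
    reg_select = instruction & 0b11          # I[2:1]: picks A, B, C or D
    port_select = (instruction >> 2) & 1     # I[3]: 1 selects the Xin port

    register_value = (a, b, c, d)[reg_select]          # 8-bit 4-to-1 multiplexer
    return x_in if port_select else register_value     # 8-bit 2-to-1 multiplexer
```

When I[3] is 1 the lower two bits have no effect on the result, which is why the source code for Xin is written as 1XX.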
3.3.5. MEMORY ACCESS OPERATIONS
There are two memory access operations, load and store. Both operations
use the same bi-directional data bus to read and write data, so the
direction of data flow is controlled with the help of two 8-bit tristate
buffers. The control lines wr_data and rd_data are generated
inside the control unit. The write/store operation is synchronous and the
read/load operation is asynchronous. The address for the data memory is given
by the concatenation of the registers C and D.
3.3.6. DATA BUS
The contents of the source register/port are modified by three parallel
modules, i.e. Move Unit, Arithmetic Unit and Shift Unit. The data to be sent on
the data bus is selected by an 8-bit, 4-to-1 multiplexer with three of the inputs
being the outputs of the three above mentioned units and the fourth input being
the 8-bit line from data memory (for the LOAD instruction). The select lines for
this multiplexer are generated by the control unit.
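A minimal sketch of this data bus multiplexer is given below. The select encoding (00 Move, 01 Arithmetic, 10 Shift, 11 LOAD) is the one listed for S1, S0 in the control unit section; the module and port names are hypothetical.

```verilog
// Illustrative sketch of the data bus multiplexer (names are hypothetical).
module data_bus_mux_sketch (
    input  wire [7:0] result_mu,  // Move Unit result
    input  wire [7:0] result_au,  // Arithmetic Unit result
    input  wire [7:0] result_su,  // Shift Unit result
    input  wire [7:0] mem_data,   // data read from data memory (LOAD)
    input  wire       s1, s0,     // select lines from the control unit
    output reg  [7:0] data_bus
);
    always @(*) begin
        case ({s1, s0})
            2'b00:   data_bus = result_mu;
            2'b01:   data_bus = result_au;
            2'b10:   data_bus = result_su;
            default: data_bus = mem_data;
        endcase
    end
endmodule
```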
3.3.7. DESTINATION DECODER
The data bus is the common input to all the registers. The data from the
data bus is stored in a particular destination register by enabling the ‘load’
signal of that register. The load signals are generated using a 2-to-4
decoder. The four outputs represent the load signals of the four registers.
The 2-bit input to the decoder comes from the destination bits of the
instruction.
The destination is represented in the bits I[6:4] of the instruction. Only the two
least significant bits, I[5:4], are required to select one of the four
registers; the third bit, I[6], is used to select Xout as the destination.
The instructions ADD and SUB have the destination same as the source. So
for these two instructions the bits used as input to the destination decoder
are I[2:1]. A 2-bit, 2-to-1 multiplexer is used for this purpose. The inputs to this
MUX are I[5:4] and I[2:1]. The select line is generated inside the control unit.
One more signal, ‘En_dec’, is used to control the enable of the
decoder. This signal is also generated inside the control unit. If the control
signal for Xout goes high, the destination decoder is also disabled.
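The destination decoding can be sketched as below. This is a sketch under assumptions: the names are hypothetical, the output order load[0]..load[3] = A..D is assumed, and the enable is formed by NORing En_dec with the Xout control signal, as described for En_dec in the control unit section.

```verilog
// Illustrative sketch of the destination decoder (names are hypothetical).
module dest_decoder_sketch (
    input  wire [6:1] instr,      // instruction bits I[6:1]
    input  wire       s2,         // '1' for ADD/SUB: use the source bits I[2:1]
    input  wire       en_dec,     // high when the instruction has no destination
    input  wire       xout_ctrl,  // control signal of the Xout tristate buffer
    output wire [3:0] load        // load signals (assumed order: [0]=A ... [3]=D)
);
    // 2-bit 2-to-1 multiplexer choosing the register index
    wire [1:0] dst = s2 ? instr[2:1] : instr[5:4];

    // decoder is enabled only when neither En_dec nor the Xout control is high
    wire enable = ~(en_dec | xout_ctrl);

    // 2-to-4 decoder with enable
    assign load = enable ? (4'b0001 << dst) : 4'b0000;
endmodule
```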
3.3.8. OUTPUT PORT Xout
There is a latency of one clock between the loading of the instruction and
the storing of the result when the destination is one of the
registers, because the registers are loaded with the result only on the positive
edge of the clock. But when the destination is the Xout port, there is no such
latency. So to make the operations symmetric I have included one
more 8-bit register, X. The output of this register is connected to the port
Xout, so the value of Xout also changes only on the rising edge of the clock.
A 1-bit register is also introduced in the design to store the value of the
control signal for the Xout tristate buffer.
As for the destination decoder, the control signal for the Xout tristate
buffer is generated using a 1-bit, 2-to-1 multiplexer. The inputs to this MUX
are I[6] and I[3]. The select line is generated inside the control unit.
Another signal, ‘Xout_buf’, is ANDed with the output of the MUX.
The result is stored in a 1-bit register ‘X_buf’, the output of which is
connected to the control line of the tristate buffer for Xout. The signal
‘Xout_buf’ is generated inside the control unit.
3.4. MOVE UNIT
The move unit performs two instructions:
1. MOV dst, src
2. MVI immediate data
3.4.1. ARCHITECTURE
Instruction               Instruction Code
1. MVI immediate data   : 0_1_<8-bit immediate data>_X
2. MOV dst, src         : 0_0_001_<3-bit destination>_<3-bit source>
So depending upon the instruction bit I[10], the multiplexer will select either
the instruction bits I[9:2] (i.e. the immediate data) or the source data.
3.5. SHIFT UNIT
The shift unit performs two instructions:
1. SL dst, src
2. SR dst, src
3.5.1. ARCHITECTURE
[Figure: 8-bit 2-to-1 multiplexer with inputs {src[6:0], 0} and {0, src[7:1]}, select line I[7], output Result_su]
Instruction        Instruction Code
1. SL dst, src   : 00_10_0_<3-bit destination>_<3-bit source>
2. SR dst, src   : 00_10_1_<3-bit destination>_<3-bit source>
So depending upon the instruction bit I[7], the multiplexer will output the
source either left shifted by 1 bit or right shifted by 1 bit.
3.6. ARITHMETIC UNIT
The arithmetic unit performs five instructions:
1. INC dst, src
2. DEC dst, src
3. ADD src
4. SUB src
5. CMP src
3.6.1. BLOCK DIAGRAM
[Figure: ARITHMETIC UNIT block. Inputs: src (8 bits), A (8 bits), Cin, Sub, I[8], q_S, q_C. Outputs: Result_au (8 bits), S, C, Z]
3.6.2. PORTS DESCRIPTION
1. src
Length : 8 Bit
Type : Input
Use : This port provides the data from the source register/port.
2. A
Length : 8 Bit
Type : Input
Use : This port always provides the contents of register A for
SUB, ADD and CMP instructions.
3. Cin
Length : 1 Bit
Type : Input
Use : This port provides the carry-in signal to the adder inside
the arithmetic unit. This signal is generated inside the control unit.
4. Sub
Length : 1 Bit
Type : Input
Use : This signal is generated inside the control unit. If Sub
goes high then the 2nd input to the adder is complemented (see 3.6.3)
5. I[8]
Length : 1 Bit
Type : Input
Use : This is the 8th bit of the instruction. This line is used to select the 2nd input to the adder inside the unit.
6. q_C
Length : 1 Bit
Type : Input
Use : This signal is the enable signal for the Carry flag.
7. q_S
Length : 1 Bit
Type : Input
Use : This signal is the enable signal for the Sign flag.
8. Result_au
Length : 8 Bit
Type : Output
Use : This port gives the result of the arithmetic unit.
9. Z
Length : 1 Bit
Type : Output
Use : This signal is given to the Zero flag inside the top entity
10. C
Length : 1 Bit
Type : Output
Use : This signal is given to the Carry flag inside the top entity
11. S
Length : 1 Bit
Type : Output
Use : This signal is given to the Sign flag inside the top entity
3.6.3. ARCHITECTURE
The basic block inside the arithmetic unit is an 8-bit ripple carry adder.
One input to the adder is fixed, i.e. the 8-bit source. The second input to the
adder depends upon the instruction being executed. The subtraction operations
are also performed using the same adder, by complementing the input to be
subtracted with 8 XOR gates and setting the carry-in.
One input to the arithmetic unit comes from the source register/port
and the second input is fixed to register A.
The Sign, Carry and Zero flags are part of the top level entity, but their
values are generated inside the arithmetic unit.
Inst. No.  Inst  Inst. Code            I/P1  I/P2  Cin  Sub  Operation
q5         INC   000_1_0_<dst><src>    Src   0     1    0    Src + 1
q6         DEC   000_1_1_<dst><src>    Src   0     0    1    Src - 1
q10        ADD   000_0_00_10_<src>     Src   A     0    0    Src + A
q11        SUB   000_0_00_11_<src>     Src   A     1    1    Src - A
q9         CMP   000_0_00_0X_<src>     Src   A     1    1    Src - A
[Figure: 8-Bit Adder. I/P1 = Src (8 bits); I/P2 = 0 or A (8 bits), selected by I[8]; control inputs Cin and Sub; outputs Result_au (8 bits), Z and Cout]
Depending upon the value of instruction bit I[8], input 2 will be either
0 or register A.
Instruction nos. given here are generated by the instruction decoder
discussed later.
Thus by controlling the values of Cin, Sub and I/P2, different operations can
be performed by the same unit.
o If Sub is ‘1’ and Cin is ‘0’ then the 2nd input is converted to its 1’s
complement form.
o If Sub is ‘1’ and Cin is ‘1’ then the 2nd input is converted to its 2’s
complement form i.e. to its negative value.
3.6.4. FUNCTIONALITY
1. INC: The 2nd input to the adder is 0 and Cin is high, so the result
comes out to be source + 1.
2. DEC: The 2nd input is zero, Sub is high and Cin is low, so the result is
source + 1’s complement of 0, i.e. 1111_1111, which is also the 2’s
complement of 1. So the result comes out to be source – 1.
3. ADD: Cin and Sub both are low, so the 2nd input, i.e. A, is passed as it
is. The result comes out to be source + contents of register A.
4. SUB: Cin and Sub both are high, so the 2nd input, i.e. A, is converted to
its 2’s complement form, i.e. its negative value. The result comes out to
be source - contents of register A.
5. CMP: Its functionality is exactly the same as SUB, the only difference
being that the result in this case is not stored in any register.
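The five cases above can be reproduced with the following sketch. It is not the report's RTL: the names are hypothetical, a behavioural `+` is used in place of the ripple carry adder, and the flag outputs follow the Carry/Sign/Zero rules of the flags section (Carry only with q_C, Sign only when carry out is low and q_S is high).

```verilog
// Illustrative sketch of the arithmetic unit (names are hypothetical).
module arith_unit_sketch (
    input  wire [7:0] src,        // I/P1: source register/port
    input  wire [7:0] a,          // contents of register A
    input  wire       cin,        // carry-in from the control unit
    input  wire       sub,        // complement the 2nd input when high
    input  wire       i8,         // I[8]: '1' selects 0, '0' selects A as I/P2
    input  wire       q_c, q_s,   // enables for the Carry and Sign flag values
    output wire [7:0] result_au,
    output wire       c, s, z
);
    wire [7:0] ip2     = i8 ? 8'h00 : a;       // 2nd adder input
    wire [7:0] ip2_xor = ip2 ^ {8{sub}};       // 8 XOR gates: 1's complement if Sub = 1
    wire       cout;

    // 9-bit sum so that the carry out is kept
    assign {cout, result_au} = {1'b0, src} + {1'b0, ip2_xor} + {8'b0, cin};

    assign c = cout & q_c;                     // carry out on ADD/INC
    assign s = ~cout & q_s;                    // no carry out on SUB/CMP/DEC
    assign z = (result_au == 8'h00);
endmodule
```

For example, with Src = 5 and A = 3, the SUB settings (Cin = 1, Sub = 1, I[8] = 0) give 5 + 1111_1100 + 1, i.e. a result of 2 with carry out high.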
3.6.5. FLAGS
The flags are part of the top level entity, but the values to be
loaded in them are generated inside the arithmetic unit.
1. Carry: This is high only if there is a carry out and the instruction
being executed is ADD or INC
2. Sign: This is high only if carry out is low and the instruction being
executed is SUB, CMP or DEC
3. Zero: This is high if the result of the arithmetic unit is 0
The signals q_S and q_C controlling the Sign and Carry flags are generated
inside the control unit.
3.7. PROGRAM COUNTER
This unit performs three instructions:
1. JMP immediate offset
2. JZ immediate offset
3. JMPCD
3.7.1. ARCHITECTURE
If the instruction is JMPCD, i.e. q14 is high, then the program counter is
loaded with the value stored in registers C & D.
If q14 is low then there can be three cases:
1. The instruction is JMP.
2. The instruction is JZ and the Zero flag is set.
3. Neither of the above conditions is met.
In cases 1 and 2 the program counter is loaded with a new value which is
equal to the old value plus the 8-bit immediate offset specified in
the instruction bits I[9:2]. In case 3 the program counter is simply
incremented by 1.
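[Figure: PROGRAM COUNTER. A multiplexer controlled by S4 selects either I[9:2] or 0000_0001 (8 bits) as the value added to the program counter by a 16-BIT ADDER; q14 selects the 16-bit CD value instead of the adder output; inputs rst and clk; the 16-bit output drives the address lines for program memory]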
Signal S4 is generated inside the control unit
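The next-address logic can be sketched as follows. Several details are assumptions made only for this sketch: the names are hypothetical, the reset is taken to be synchronous and active high, and the 8-bit offset is zero-extended to 16 bits before the addition (the report does not state whether the offset is signed).

```verilog
// Illustrative sketch of the program counter (names are hypothetical).
module pc_sketch (
    input  wire        clk,
    input  wire        rst,     // assumed synchronous, active high
    input  wire        q14,     // JMPCD decoded
    input  wire        s4,      // '1' selects the immediate offset
    input  wire [7:0]  offset,  // instruction bits I[9:2]
    input  wire [15:0] cd,      // concatenation of registers C and D
    output reg  [15:0] pc       // address lines for program memory
);
    // 8-bit 2-to-1 multiplexer: immediate offset or the constant 0000_0001
    wire [7:0]  step    = s4 ? offset : 8'h01;

    // 16-bit adder (offset zero-extended: an assumption), bypassed for JMPCD
    wire [15:0] pc_next = q14 ? cd : pc + {8'h00, step};

    always @(posedge clk) begin
        if (rst)
            pc <= 16'h0000;
        else
            pc <= pc_next;
    end
endmodule
```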
3.8. INSTRUCTION REGISTER
The instruction register is an 11-bit edge-triggered register. It loads the
instruction on the positive edge of the clock. The instruction is fed to the
instruction register from the program memory. The address for the program
memory is given by the value of the program counter.
3.9. INSTRUCTION DECODER
This unit is used to identify the instruction being executed. The input to this
unit is the op-code part of the instruction, which comes from the instruction
register. The output of this unit is a 14-bit port where each bit represents one
of the 14 instructions. All the instructions have different operation codes, so
at any time only one of the 14 bits will be high in the output.
1. MVI : 01_< 8-Bit Immediate Data>_X
q[1] = I[11]’ I[10]
2. JMP : 10_< 8-Bit Immediate Offset>_X
q[2] = I[11] I[10]’
3. JZ : 11_< 8-Bit Immediate Offset>_X
q[3] = I[11] I[10]
4. MOV : 00_001_< 3-Bit Destination>_< 3-Bit Source>
q[4] = I[11]’ I[10]’ I[9]’ I[8]’ I[7]
5. INC : 00_010_< 3-Bit Destination>_< 3-Bit Source>
q[5] = I[11]’ I[10]’ I[9]’ I[8] I[7]’
6. DEC : 00_011_< 3-Bit Destination>_< 3-Bit Source>
q[6] = I[11]’ I[10]’ I[9]’ I[8] I[7]
7. SL : 00_100_< 3-Bit Destination>_< 3-Bit Source>
q[7] = I[11]’ I[10]’ I[9] I[8]’ I[7]’
8. SR : 00_101_< 3-Bit Destination>_< 3-Bit Source>
q[8] = I[11]’ I[10]’ I[9] I[8]’ I[7]
9. CMP : 00_000_00X_< 3-Bit Source>
q[9] = I[11]’ I[10]’ I[9]’ I[8]’ I[7]’ I[6]’ I[5]’
10. ADD : 00_000_010_< 3-Bit Source>
q[10] = I[11]’ I[10]’ I[9]’ I[8]’ I[7]’ I[6]’ I[5] I[4]’
11. SUB : 00_000_011_< 3-Bit Source>
q[11] = I[11]’ I[10]’ I[9]’ I[8]’ I[7]’ I[6]’ I[5] I[4]
12. LOAD : 00_110_< 3-Bit Destination>_XXX
q[12] = I[11]’ I[10]’ I[9] I[8] I[7]’
13. STORE : 00_111_XXX_< 3-Bit Source>
q[13] = I[11]’ I[10]’ I[9] I[8] I[7]
14. JMPCD : 00_000_1XX_XXX
q[14] = I[11]’ I[10]’ I[9]’ I[8]’ I[7]’ I[6]
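The decode equations above translate almost directly into continuous assignments. The sketch below uses hypothetical module and wire names and indexes the instruction as I[11:1], as in the equations.

```verilog
// Illustrative sketch of the instruction decoder (names are hypothetical).
module instr_decoder_sketch (
    input  wire [11:1] i,   // instruction bits I[11:1]
    output wire [14:1] q    // one bit per instruction
);
    wire grp00 = ~i[11] & ~i[10];                 // I[11]' I[10]'
    wire zero3 = grp00 & (i[9:7] == 3'b000);      // ... I[9]' I[8]' I[7]'

    assign q[1]  = ~i[11] &  i[10];               // MVI
    assign q[2]  =  i[11] & ~i[10];               // JMP
    assign q[3]  =  i[11] &  i[10];               // JZ
    assign q[4]  = grp00 & (i[9:7] == 3'b001);    // MOV
    assign q[5]  = grp00 & (i[9:7] == 3'b010);    // INC
    assign q[6]  = grp00 & (i[9:7] == 3'b011);    // DEC
    assign q[7]  = grp00 & (i[9:7] == 3'b100);    // SL
    assign q[8]  = grp00 & (i[9:7] == 3'b101);    // SR
    assign q[9]  = zero3 & ~i[6] & ~i[5];         // CMP
    assign q[10] = zero3 & ~i[6] &  i[5] & ~i[4]; // ADD
    assign q[11] = zero3 & ~i[6] &  i[5] &  i[4]; // SUB
    assign q[12] = grp00 & (i[9:7] == 3'b110);    // LOAD
    assign q[13] = grp00 & (i[9:7] == 3'b111);    // STORE
    assign q[14] = zero3 &  i[6];                 // JMPCD
endmodule
```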
3.10. CONTROL UNIT
The control unit generates the control signals required by the different
modules and the top level entity. The inputs to the control unit are the decoded
instructions from the instruction decoder and the values of the flags. The
outputs are the control signals listed below.
Signals to arithmetic unit
1. q_C: This is the enabling signal for the carry flag. It is high only if the
instruction being executed is ADD(q10) or INC(q5).
q_C = q[5] + q[10]
2. q_S: This is the enabling signal for the sign flag. It is high only if the
instruction being executed is SUB(q11) or DEC(q6) or CMP(q9).
q_S = q[6] + q[9] + q[11]
3. Sub: As shown in the table in arithmetic unit, this signal is high in the
case of DEC, CMP and SUB
Sub = q[6] + q[9] + q[11]
4. Cin: As shown in the table in arithmetic unit, this signal is high in the
case of INC, CMP and SUB
Cin = q[5] + q[9] + q[11]
Signals to Program Counter
1. S4: This signal selects the immediate offset to be added to the contents of
the program counter. It is high if “the instruction being executed is
JMP” or if “the instruction being executed is JZ and the Zero flag is set at
the same time”
S4 = q[2] + q[3].Z
Signals to Data Memory
1. wr_data: This signal goes high if the instruction being executed is
STORE.
wr_data = q[13]
2. rd_data: This signal goes high if the instruction being executed is LOAD.
rd_data = q[12]
Signals to Top level Entity
1. ld_flags: This is the load signal for the flags. This signal is high if the
instruction being executed is an arithmetic instruction.
ld_flags = q[5] + q[6] + q[9] + q[10] + q[11]
2. S2: This signal selects either the destination or the source bits as the
input to the destination decoder. This signal is high only if the
instruction being executed is ADD or SUB, which have the destination same as
the source.
S2 = q[10] + q[11]
3. Xout_buf: This signal is ANDed with the destination bit to generate the
control signal for the Xout tristate buffer. This signal is high only if the
instruction being executed involves any destination.
Xout_buf = q[4] + q[5] + q[6] + q[7] + q[8] + q[10] + q[11] + q[12]
4. En_dec: This signal is NORed with the control signal of Xout tristate
buffer to generate the enable signal for the Destination Decoder. This
signal is high only if the instruction being executed doesn’t involve any
destination. So if either the control signal for Xout goes high or this
En_dec signal goes high, it will disable the destination decoder.
En_dec = q[1] + q[2] + q[3] + q[9] + q[13] + q[14]
5. S1, S0: These are the select lines for the multiplexer which selects
which unit's result is placed on the data bus.
Their value is 00 for Move Unit
01 for Arithmetic Unit
10 for Shift Unit
11 for LOAD Instruction
So these signals are generated by a 4-to-2 encoder. The input to the
encoder is E[3:0], where:
E[0] = q[1] + q[4]
E[1] = q[5] + q[6] + q[9] + q[10] + q[11]
E[2] = q[7] + q[8]
E[3] = q[12]
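The control equations of this section can be collected into one combinational sketch. The module and port names are hypothetical; the encoder is written out as S1 = E[2] + E[3] and S0 = E[1] + E[3], which follows from the 00/01/10/11 assignment above given that only one E bit is high at a time.

```verilog
// Illustrative sketch of the control unit equations (names are hypothetical).
module control_unit_sketch (
    input  wire [14:1] q,        // decoded instruction, one bit high at a time
    input  wire        zero,     // Zero flag
    output wire q_c, q_s, sub, cin,            // to the arithmetic unit
    output wire s4,                            // to the program counter
    output wire wr_data, rd_data,              // to the data memory
    output wire ld_flags, s2, xout_buf, en_dec, s1, s0  // to the top level entity
);
    assign q_c      = q[5] | q[10];
    assign q_s      = q[6] | q[9] | q[11];
    assign sub      = q[6] | q[9] | q[11];
    assign cin      = q[5] | q[9] | q[11];
    assign s4       = q[2] | (q[3] & zero);
    assign wr_data  = q[13];
    assign rd_data  = q[12];
    assign ld_flags = q[5] | q[6] | q[9] | q[10] | q[11];
    assign s2       = q[10] | q[11];
    assign xout_buf = q[4] | q[5] | q[6] | q[7] | q[8] | q[10] | q[11] | q[12];
    assign en_dec   = q[1] | q[2] | q[3] | q[9] | q[13] | q[14];

    // 4-to-2 encoder inputs E[3:0]
    wire [3:0] e = {q[12],
                    q[7] | q[8],
                    q[5] | q[6] | q[9] | q[10] | q[11],
                    q[1] | q[4]};
    assign s1 = e[2] | e[3];
    assign s0 = e[1] | e[3];
endmodule
```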
3.11. DATA MEMORY
The data memory is a block RAM of size 64 Kbytes (65536 locations of 8 bits).
The data memory has a synchronous write and an asynchronous read. Its address
lines come from the concatenation of the contents of the registers C & D. The
data line for the memory is bidirectional. Write and Read operations are
controlled by the wr_data and rd_data signals generated by the control unit.
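A behavioural model of such a memory, useful mainly for simulation, might look like the sketch below. The names are hypothetical, and the model simply tristates its side of the bidirectional line whenever rd_data is low; it is not the block RAM instance used in the design.

```verilog
// Illustrative behavioural sketch of the data memory (names are hypothetical).
module data_memory_sketch (
    input  wire        clk,
    input  wire        wr_data,   // write enable from the control unit
    input  wire        rd_data,   // read enable from the control unit
    input  wire [15:0] addr,      // {C, D}
    inout  wire [7:0]  data       // bidirectional data line
);
    reg [7:0] mem [0:65535];

    // synchronous write
    always @(posedge clk) begin
        if (wr_data)
            mem[addr] <= data;
    end

    // asynchronous read; the line is released when not reading
    assign data = rd_data ? mem[addr] : 8'bz;
endmodule
```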
3.12. PROGRAM MEMORY
The program memory is a block RAM with 65536 locations and 11 bits per
location. This stores the instructions to be executed by the processor. Read
operation is asynchronous. The address line for the program memory comes
from the 16-bit program counter.
4. DESIGN IMPLEMENTATION
This chapter details the complete design flow for the FPGA implementation
of the design. The target device is SPARTAN 3.
Fig 4.1 FPGA Design Flow
4.1. HDL ENTRY
The first step in implementation of the design is creating the HDL code
based on the design criteria. The following recommendations were followed
to create an effective design.
Using RTL Code
Using register transfer level (RTL) code and avoiding (when possible)
the instantiation of specific components creates designs with the following
characteristics:
• Readable code
• Faster and simpler simulation
• Portable code for migration to different device families
• Reusable code for future designs
In my design, Verilog is the HDL used for the design entry.
4.2. FUNCTIONAL SIMULATION
Functional or RTL simulation is used to verify the syntax and functionality of
the design. The following recommendations were used for simulating the
design.
• Typically with larger hierarchical HDL designs, one should perform
separate simulations on each module before testing the entire design.
This makes it easier to debug the code.
• Once each module functions as expected, a test bench is created to
verify that the entire design functions as planned. The same test bench is
used again for the final timing simulation to confirm that the design
functions as expected under worst-case delay conditions.
My design’s functionality was tested successfully.
4.3. SYNTHESIS
After creating HDL design, you must synthesize it. During synthesis,
behavioral information in the HDL file is translated into a structural netlist,
and the design is optimized for a Xilinx device. Xilinx offers its own synthesis
tool, Xilinx Synthesis Technology (XST). XST is a Xilinx® tool that synthesizes
HDL designs to create Xilinx® specific netlist files called NGC files. The NGC
file is a netlist that contains both logical design data and constraints, and
takes the place of both EDIF and NCF files.
4.3.1. SYNTHESIS CONSTRAINTS
Constraints are essential to help you meet your design goals or obtain the
best implementation of your design. Constraints are available in XST to
control various aspects of the synthesis process itself, as well as placement
and routing. Synthesis algorithms have been tuned to automatically provide
optimal results in most situations. In some cases, however, synthesis may fail
to initially achieve optimal results; some of the available constraints allow
you to explore different synthesis alternatives to meet your specific needs.
Following is a list of some HDL Options that can be set within the HDL
Options tab of the Process Properties dialog box for FPGA devices:
FSM Encoding Algorithm
Case Implementation Style
FSM Style
RAM Extraction
RAM Style
Mux Style
Decoder Extraction
Priority Encoder Extraction
Shift Register Extraction
Logical Shifter Extraction
4.3.2. SYNTHESIS REPORT
While synthesizing the design, Xilinx XST also creates a synthesis report
containing details such as Device utilization, Macro Statistics, Timing etc. The
following shows some parts of the synthesis report generated for the top
level entity of my design.
HDL Synthesis Report
====================
Macro Statistics
----------------
# Adders/Subtractors : 2
 16-bit adder carry out : 1
 8-bit adder carry in/out : 1
# Registers : 9
 1-bit register : 2
 11-bit register : 1
 16-bit register : 1
 8-bit register : 5
# Multiplexers : 3
 1-bit 4-to-1 multiplexer : 2
 8-bit 4-to-1 multiplexer : 1
# Tristates : 3
 8-bit tristate buffer : 3
# Xors : 1
 8-bit xor2 : 1
Device utilization summary:
---------------------------
Selected Device : 3s200pq208-5
 Number of Slices: 82 out of 1920 4%
 Number of Slice Flip Flops: 77 out of 3840 2%
 Number of 4 input LUTs: 146 out of 3840 3%
 Number of bonded IOBs: 71 out of 141 50%
 Number of GCLKs: 1 out of 8 12%
TIMING REPORT
-------------
 Minimum period: 10.599ns (Maximum Frequency: 94.347MHz)
 Minimum input arrival time before clock: 7.845ns
 Maximum output required time after clock: 10.277ns
 Maximum combinational path delay: 7.862ns
4.4. TRANSLATE
4.4.1. NGD Build Overview
NGD Build reads in a netlist file in EDIF or NGC format and creates a NGD file
that contains a logical description of the design in terms of logic elements,
such as AND gates, OR gates, decoders, flip-flops, and RAMs.
The NGD file contains both a logical description of the design reduced to
Xilinx Native Generic Database (NGD) primitives and a description of the
original hierarchy expressed in the input netlist. The output NGD file can be
mapped to the desired device family.
4.4.2. Conversion of Netlist to NGD File
NGD Build performs the following steps to convert a netlist to an NGD file:
1. Reads the source netlist. NGD Build invokes the Netlist Launcher. The
Netlist Launcher determines the input netlist type and starts the appropriate
netlist reader program. The netlist reader incorporates NCF files associated
with each netlist. NCF files contain timing and layout constraints for each
module.
2. Reduces all components in the design to NGD primitives. NGD Build
merges components that reference other files. NGD Build also finds the
appropriate system library components, physical macros (NMC files), and
behavioral models.
3. Checks the design by running a Logical Design Rule Check (DRC) on the
converted design. Logical DRC is a series of tests on a logical design.
4. Writes an NGD file as output.
4.5. MAP
The MAP program maps a logical design to a Xilinx FPGA. The input to MAP
is an NGD file, which is generated using the NGD Build program. The NGD file
contains a logical description of the design that includes both the hierarchical
components used to develop the design and the lower level Xilinx primitives.
The NGD file also contains any number of NMC (macro library) files, each of
which contains the definition of a physical macro. MAP first performs a logical
DRC (Design Rule Check) on the design in the NGD file. MAP then maps the
design logic to the components (logic cells, I/O cells, and other components)
in the target Xilinx FPGA. The output from MAP is an NCD (Native Circuit
Description) file—a physical representation of the design mapped to the
components in the targeted Xilinx FPGA. The mapped NCD file can then be
placed and routed using the PAR program.
4.5.1. MAP Input Files
MAP uses the following files as input:
• NGD file—Native Generic Database file. This file contains a logical
description of the design expressed both in terms of the hierarchy used when
the design was first created and in terms of lower-level Xilinx primitives to
which the hierarchy resolves. The file also contains all of the constraints
applied to the design during design entry or entered in a UCF (User
Constraints File). The NGD file is created by the NGD Build program.
• NMC file—Macro library file. An NMC file contains the definition of a physical
macro. When there are macro instances in the NGD design file, NMC files are
used to define the macro instances. There is one NMC file for each type of
macro in the design file.
• Guide NCD file—An optional input file generated from a previous MAP run.
An NCD file contains a physical description of the design in terms of the
components in the target Xilinx device. A guide NCD file is an output NCD file
from a previous MAP run that is used as an input to guide a later MAP run.
• Guide NGM file—A binary design file containing all of the data in the input
NGD file as well as information on the physical design produced by the
mapping.
4.5.2. MAP Output Files
Output from MAP consists of the following files:
• NCD (Native Circuit Description) file—a physical description of the design in
terms of the components in the target Xilinx device.
• PCF (Physical Constraints File)—an ASCII text file that contains constraints
specified during design entry expressed in terms of physical elements. The
physical constraints in the PCF are expressed in Xilinx’s constraint language.
MAP creates a PCF file if one does not exist or rewrites an existing file.
• NGM file—a binary design file that contains all of the data in the input NGD
file as well as information on the physical design produced by mapping. The
NGM file is used to correlate the back-annotated design netlist to the
structure and naming of the source design.
• MRP (MAP report)—a file that contains information about the MAP run. The
MRP file lists any errors and warnings found in the design, lists design
attributes specified, and details on how the design was mapped (for example,
the logic that was removed or added and how signals and symbols in the
logical design were mapped into signals and components in the physical
design). The file also supplies statistics about component usage in the
mapped design.
4.5.3. MAP REPORT
The MAP report is generated in the following format
______________________________
Table of Contents
----------------------------------------------
Section 1 - Errors
Section 2 - Warnings
Section 3 - Informational
Section 4 - Removed Logic Summary
Section 5 - Removed Logic
Section 6 - IOB Properties
Section 7 - RPMs
Section 8 - Guide Report
Section 9 - Area Group Summary
Section 10 - Modular Design Summary
Section 11 - Timing Report
Section 12 - Configuration String Information
Section 13 - Additional Device Resource Counts
____________________________________
4.5.4. POST MAP TIMING REPORT
The timing report generated after the MAP process contains all the component
delays, but it does not account for the interconnect delays. So the
delays for the same type of components come out to be exactly the same.
The Post MAP Timing Report for my design is:
Data Sheet report:
-----------------
All values displayed in nanoseconds (ns)

Setup/Hold to clock clk
+----------------+------------+------------+
| Clock          | Setup to   | Hold to    |
| Source         | clk (edge) | clk (edge) |
+----------------+------------+------------+
| data_in_out<0> |   1.356(R) |   0.134(R) |
| data_in_out<1> |   1.305(R) |   0.134(R) |
| data_in_out<2> |   1.356(R) |   0.134(R) |
| data_in_out<3> |   1.305(R) |   0.134(R) |
| data_in_out<4> |   1.356(R) |   0.134(R) |
| data_in_out<5> |   1.305(R) |   0.134(R) |
| data_in_out<6> |   1.356(R) |   0.134(R) |
| data_in_out<7> |   1.305(R) |   0.134(R) |
| ir_in<10>      |   3.202(R) |   0.643(R) |
| ir_in<11>      |   3.202(R) |  -1.117(R) |
| ir_in<1>       |   3.202(R) |  -1.117(R) |
| ir_in<2>       |   3.202(R) |  -1.117(R) |
| ir_in<3>       |   3.202(R) |  -1.117(R) |
| ir_in<4>       |   3.202(R) |  -1.117(R) |
| ir_in<5>       |   3.202(R) |  -1.117(R) |
| ir_in<6>       |   3.202(R) |  -1.117(R) |
| ir_in<7>       |   3.202(R) |   0.643(R) |
| ir_in<8>       |   3.202(R) |   0.643(R) |
| ir_in<9>       |   3.202(R) |  -1.117(R) |
| xin<0>         |   4.237(R) |  -0.832(R) |
| xin<1>         |   4.247(R) |  -0.349(R) |
| xin<2>         |   4.026(R) |  -0.832(R) |
| xin<3>         |   4.036(R) |  -0.832(R) |
| xin<4>         |   3.815(R) |  -0.832(R) |
| xin<5>         |   3.825(R) |  -0.832(R) |
| xin<6>         |   3.380(R) |  -0.349(R) |
| xin<7>         |   3.004(R) |  -0.832(R) |
+----------------+------------+------------+

Clock clk to Pad
+----------------+------------+
|                | clk (edge) |
| Destination    | to PAD     |
+----------------+------------+
| addr_data<0>   |   6.407(R) |
| addr_data<10>  |   6.407(R) |
| addr_data<11>  |   6.407(R) |
| addr_data<12>  |   6.407(R) |
| addr_data<13>  |   6.407(R) |
| addr_data<14>  |   6.407(R) |
| addr_data<15>  |   6.407(R) |
| addr_data<1>   |   6.407(R) |
| addr_data<2>   |   6.407(R) |
| addr_data<3>   |   6.407(R) |
| addr_data<4>   |   6.407(R) |
| addr_data<5>   |   6.407(R) |
| addr_data<6>   |   6.407(R) |
| addr_data<7>   |   6.407(R) |
| addr_data<8>   |   6.407(R) |
| addr_data<9>   |   6.407(R) |
| addr_pc<0>     |   6.407(R) |
| addr_pc<10>    |   6.407(R) |
| addr_pc<11>    |   6.407(R) |
| addr_pc<12>    |   6.407(R) |
| addr_pc<13>    |   6.407(R) |
| addr_pc<14>    |   6.407(R) |
| addr_pc<15>    |   6.407(R) |
| addr_pc<1>     |   6.407(R) |
| addr_pc<2>     |   6.407(R) |
| addr_pc<3>     |   6.407(R) |
| addr_pc<4>     |   6.407(R) |
| addr_pc<5>     |   6.407(R) |
| addr_pc<6>     |   6.407(R) |
| addr_pc<7>     |   6.407(R) |
| addr_pc<8>     |   6.407(R) |
| addr_pc<9>     |   6.407(R) |
| data_in_out<0> |   7.565(R) |
| data_in_out<1> |   7.565(R) |
| data_in_out<2> |   7.565(R) |
| data_in_out<3> |   7.565(R) |
| data_in_out<4> |   7.565(R) |
| data_in_out<5> |   7.565(R) |
| data_in_out<6> |   7.565(R) |
| data_in_out<7> |   7.565(R) |
| rd_data        |   7.164(R) |
| wr_data        |   7.164(R) |
| xout<0>        |   6.618(R) |
| xout<1>        |   6.618(R) |
| xout<2>        |   6.618(R) |
| xout<3>        |   6.618(R) |
| xout<4>        |   6.618(R) |
| xout<5>        |   6.618(R) |
| xout<6>        |   6.618(R) |
| xout<7>        |   6.618(R) |
+----------------+------------+

Pad to Pad
+----------------+-----------------+---------+
| Source Pad     | Destination Pad | Delay   |
+----------------+-----------------+---------+
| xin<0>         | data_in_out<0>  |   6.159 |
| xin<1>         | data_in_out<1>  |   6.159 |
| xin<2>         | data_in_out<2>  |   6.159 |
| xin<3>         | data_in_out<3>  |   6.159 |
| xin<4>         | data_in_out<4>  |   6.159 |
| xin<5>         | data_in_out<5>  |   6.159 |
| xin<6>         | data_in_out<6>  |   6.159 |
| xin<7>         | data_in_out<7>  |   6.159 |
+----------------+-----------------+---------+
Analysis completed Tue May 30 13:11:29 2006
4.6. PLACE AND ROUTE
4.6.1. OVERVIEW
After you create a Native Circuit Description (NCD) file with the MAP
program, you can place and route that design file using PAR. PAR accepts a
mapped NCD file as input, places and routes the design, and outputs an NCD
file to be used by the bit stream generator (BitGen). The NCD file output by
PAR can also be used as a guide file for additional runs of PAR that may be
done after making minor changes to your design.
PAR places and routes a design based on the following considerations:
• Timing-driven—The Xilinx timing analysis software enables PAR to place
and route a design based upon timing constraints.
• Non Timing-driven (cost-based)—Placement and routing are performed
using various cost tables that assign weighted values to relevant factors such
as constraints, length of connection, and available routing resources. Non
timing-driven placement and routing is used if no timing constraints are
present.
4.6.2 PLACING
The PAR placer executes multiple phases of the placer. PAR writes the NCD
after all the placer phases are complete. During placement, PAR places
components into sites based on factors such as constraints specified in the
PCF file, the length of connections, and the available routing resources.
4.6.3. ROUTING
After placing the design, PAR executes multiple phases of the router. The
router performs a converging procedure for a solution that routes the design
to completion and meets timing constraints. Once the design is fully routed,
PAR writes an NCD file, which can be analyzed against timing. PAR writes a
new NCD as the routing improves throughout the router phases.
Note: Timing-driven place and timing-driven routing are automatically
invoked if PAR finds timing constraints in the physical constraints file
4.6.4. POST PAR TIMING REPORT
The timing report generated after the MAP process contains only the component
delays, but the timing report generated after PAR contains both the component
and the interconnect delays. The interconnect delays come out to be
comparable to the component delays. Now the delays for the same type of
components are no longer the same, because of the different routing paths.
The Post PAR Timing Report for my design is:
Data Sheet report:
-----------------
All values displayed in nanoseconds (ns)

Setup/Hold to clock clk
+----------------+------------+------------+
| Clock          | Setup to   | Hold to    |
| Source         | clk (edge) | clk (edge) |
+----------------+------------+------------+
| data_in_out<0> |   3.411(R) |   0.111(R) |
| data_in_out<1> |   3.599(R) |  -0.001(R) |
| data_in_out<2> |   3.271(R) |   0.056(R) |
| data_in_out<3> |   3.497(R) |  -0.016(R) |
| data_in_out<4> |   3.551(R) |   0.085(R) |
| data_in_out<5> |   4.543(R) |  -0.583(R) |
| data_in_out<6> |   3.925(R) |  -0.144(R) |
| data_in_out<7> |   3.065(R) |  -0.077(R) |
| ir_in<10>      |   2.622(R) |   0.534(R) |
| ir_in<11>      |   2.623(R) |  -0.401(R) |
| ir_in<1>       |   2.623(R) |  -0.400(R) |
| ir_in<2>       |   2.622(R) |  -0.400(R) |
| ir_in<3>       |   2.623(R) |  -0.400(R) |
| ir_in<4>       |   2.623(R) |  -0.401(R) |
| ir_in<5>       |   2.623(R) |  -0.400(R) |
| ir_in<6>       |   2.622(R) |  -0.400(R) |
| ir_in<7>       |   2.622(R) |   0.753(R) |
| ir_in<8>       |   2.623(R) |   0.394(R) |
| ir_in<9>       |   2.623(R) |  -0.401(R) |
| xin<0>         |   8.194(R) |  -2.109(R) |
| xin<1>         |   7.703(R) |  -1.613(R) |
| xin<2>         |   7.778(R) |  -1.795(R) |
| xin<3>         |   8.995(R) |  -1.671(R) |
| xin<4>         |   7.548(R) |  -1.576(R) |
| xin<5>         |   8.228(R) |  -1.709(R) |
| xin<6>         |   7.885(R) |  -1.298(R) |
| xin<7>         |   6.916(R) |  -2.378(R) |
+----------------+------------+------------+

Clock clk to Pad
+----------------+------------+
| Destination    | clk (edge) |
|                | to PAD     |
+----------------+------------+
| addr_data<0>   |   9.442(R) |
| addr_data<10>  |   9.144(R) |
| addr_data<11>  |   9.149(R) |
| addr_data<12>  |   8.825(R) |
| addr_data<13>  |   8.607(R) |
| addr_data<14>  |   8.525(R) |
| addr_data<15>  |   8.784(R) |
| addr_data<1>   |   8.424(R) |
| addr_data<2>   |   8.638(R) |
| addr_data<3>   |   9.100(R) |
| addr_data<4>   |   8.361(R) |
| addr_data<5>   |   8.380(R) |
| addr_data<6>   |   8.640(R) |
| addr_data<7>   |   9.172(R) |
| addr_data<8>   |   8.907(R) |
| addr_data<9>   |   8.852(R) |
| addr_pc<0>     |   9.099(R) |
| addr_pc<10>    |   8.084(R) |
| addr_pc<11>    |   8.290(R) |
| addr_pc<12>    |   8.528(R) |
| addr_pc<13>    |   8.178(R) |
| addr_pc<14>    |   8.735(R) |
| addr_pc<15>    |   8.824(R) |
| addr_pc<1>     |   9.076(R) |
| addr_pc<2>     |   9.333(R) |
| addr_pc<3>     |   9.067(R) |
| addr_pc<4>     |  11.008(R) |
| addr_pc<5>     |   8.909(R) |
| addr_pc<6>     |   9.681(R) |
| addr_pc<7>     |   9.785(R) |
| addr_pc<8>     |   8.384(R) |
| addr_pc<9>     |   9.393(R) |
| data_in_out<0> |  12.164(R) |
| data_in_out<1> |  13.308(R) |
| data_in_out<2> |  12.626(R) |
| data_in_out<3> |  12.626(R) |
| data_in_out<4> |  12.632(R) |
| data_in_out<5> |  14.403(R) |
| data_in_out<6> |  12.408(R) |
| data_in_out<7> |  14.370(R) |
| rd_data        |  12.080(R) |
| wr_data        |  12.561(R) |
| xout<0>        |   9.245(R) |
| xout<1>        |   9.249(R) |
| xout<2>        |   9.616(R) |
| xout<3>        |   8.549(R) |
| xout<4>        |   9.300(R) |
| xout<5>        |   9.623(R) |
| xout<6>        |   9.578(R) |
| xout<7>        |   9.265(R) |
+----------------+------------+

Pad to Pad
+----------------+-----------------+---------+
| Source Pad     | Destination Pad | Delay   |
+----------------+-----------------+---------+
| xin<0>         | data_in_out<0>  |   9.217 |
| xin<1>         | data_in_out<1>  |   8.800 |
| xin<2>         | data_in_out<2>  |   9.069 |
| xin<3>         | data_in_out<3>  |   8.827 |
| xin<4>         | data_in_out<4>  |   8.639 |
| xin<5>         | data_in_out<5>  |   9.744 |
| xin<6>         | data_in_out<6>  |   9.310 |
| xin<7>         | data_in_out<7>  |  10.122 |
+----------------+-----------------+---------+
Analysis completed Tue May 30 13:15:43 2006
4.7 BITGEN OVERVIEW
BitGen produces a bit stream for Xilinx device configuration. After the
design is completely routed, it is necessary to configure the device so that it
can execute the desired function. This is done using files generated by
BitGen, the Xilinx bit stream generation program. BitGen takes a fully routed
NCD (native circuit description) file as input and produces a configuration bit
stream—a binary file with a .bit extension. The BIT file contains all of the
configuration information from the NCD file that defines the internal logic and
interconnections of the FPGA, plus device-specific information from other files
associated with the target device. The binary data in the BIT file is then
downloaded into the FPGA's memory cells, or it is used to create a PROM file.
The final BIT file was downloaded into the FPGA device, and the design was
verified in real time on the hardware.
5. CONCLUSION
The design was implemented successfully on the target device and was verified
by both functional simulation and post-place-and-route (post-PAR) simulation.
5.1. PERFORMANCE PARAMETERS
The design achieved the following performance parameters:
1. Throughput : 1 instruction/cycle
2. Initial Latency : 1 clock
3. No. of Pipelining Stages : 2
4. Max. Operating Freq : 97 MHz
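
The reported frequency and throughput imply a minimum clock period and a peak instruction rate. As a rough check, the short Python sketch below derives both. It is not part of the original Verilog project, the function names are illustrative, and it ignores the initial latency and any stalls, so the instruction rate is an upper bound rather than a measured value.

```python
# Illustrative arithmetic only; these names do not appear in the original design.
F_MAX_HZ = 97e6          # reported maximum operating frequency (97 MHz)
INSTR_PER_CYCLE = 1      # reported throughput


def clock_period_ns(f_hz):
    """Minimum clock period in nanoseconds for a clock of f_hz hertz."""
    return 1e9 / f_hz


def peak_mips(f_hz, instr_per_cycle):
    """Upper bound on the instruction rate, in millions of instructions per second."""
    return f_hz * instr_per_cycle / 1e6


if __name__ == "__main__":
    print(f"clock period: {clock_period_ns(F_MAX_HZ):.2f} ns")
    print(f"peak rate: {peak_mips(F_MAX_HZ, INSTR_PER_CYCLE):.0f} MIPS")
```

At 97 MHz this gives a clock period of about 10.3 ns and a peak rate of about 97 million instructions per second.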
5.2. FUTURE IMPROVEMENTS
1. More instructions can be included in the design with the same instruction
size by using the don't-care bits.
2. The number of pipeline stages can be increased from the current 2 to 4-5.
At present the first stage is Read-Fetch-Execute and the second stage is
Write. Dividing the first stage into three stages would shorten the
critical path, so the maximum operating frequency should also improve
considerably.
APPENDIX A – RTL CODING
A.1. MOVE UNIT
/* ~~~~MOVE UNIT~~~~ */
module move_unit(I_10, src, I_2_9, result);
  input        I_10;
  input  [7:0] src;
  input  [7:0] I_2_9;
  output [7:0] result;

  // I_10 = 1 selects the 8-bit immediate field; otherwise the source operand passes through.
  assign result = I_10 ? I_2_9 : src;
endmodule
A.2. SHIFT UNIT
/* ~~~~SHIFT UNIT~~~~ */
module shift_unit(src, I_7, result);
  input  [7:0] src;
  input        I_7;
  output [7:0] result;

  // I_7 = 1: logical shift right (SR); I_7 = 0: logical shift left (SL). A zero is shifted in.
  assign result = I_7 ? {1'b0, src[7:1]} : {src[6:0], 1'b0};
endmodule
A.3. ARITHMETIC UNIT
/* ~~~~8-BIT FULL-ADDER~~~~ */
module full_adder_8bit(in1, in2, sum, cout, cin);
  input  [7:0] in1, in2;
  input        cin;
  output [7:0] sum;
  output       cout;

  assign {cout, sum} = in1 + in2 + cin;
endmodule

/* ~~~~ARITHMETIC UNIT~~~~ */
module arithmetic_unit(A, src, I_8, cin, sub, q_c, q_s, result, S, C, Z);
  input  [7:0] A;
  input  [7:0] src;
  input        I_8;
  input        cin;
  input        sub;
  input        q_c, q_s;
  output [7:0] result;
  output       S, C, Z;

  wire [7:0] in2, in2_final;
  wire       cout;

  assign in2       = I_8 ? 8'b0 : A;    // I_8 = 1 (INC/DEC): second operand is zero
  assign in2_final = in2 ^ {8{sub}};    // invert the second operand when subtracting

  full_adder_8bit a1(.in1(src), .in2(in2_final), .cin(cin), .cout(cout), .sum(result));

  assign C = q_c && cout;       // CARRY FLAG
  assign S = q_s && (!cout);    // SIGN FLAG
  assign Z = (!result[7]) & (!result[6]) & (!result[5]) & (!result[4]) &
             (!result[3]) & (!result[2]) & (!result[1]) & (!result[0]);  // ZERO FLAG
endmodule
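
To cross-check the flag logic above in software, the following is a minimal Python reference model of arithmetic_unit. It is a sketch written for this appendix, not part of the original project. The Python function and the returned dictionary keys are illustrative, and the control values used in the checks (for example sub = 1, cin = 1 for SUB) are taken from the control unit in A.7.

```python
def arithmetic_unit(a, src, i_8, cin, sub, q_c, q_s):
    """Software model of the arithmetic_unit module (8-bit operands)."""
    in2 = 0 if i_8 else (a & 0xFF)                 # I_8 = 1 forces the second operand to zero
    in2_final = in2 ^ (0xFF if sub else 0x00)      # bitwise inversion when subtracting
    total = (src & 0xFF) + in2_final + (1 if cin else 0)
    result = total & 0xFF
    cout = (total >> 8) & 1
    return {
        "result": result,
        "C": bool(q_c and cout),                   # carry flag
        "S": bool(q_s and not cout),               # sign flag (no carry out means a borrow)
        "Z": result == 0,                          # zero flag
    }


if __name__ == "__main__":
    # SUB with src = 3 and A = 5: src - A = -2, i.e. 0xFE with the sign flag set.
    print(arithmetic_unit(a=5, src=3, i_8=0, cin=1, sub=1, q_c=0, q_s=1))
```

The model makes the subtraction scheme visible: the second operand is inverted and cin supplies the +1, so a missing carry out indicates a negative result.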
A.4. PROGRAM COUNTER
/* ~~~~16-BIT ADDER~~~~ */
module adder_16(in1, in2, out, cout, cin);
  input         cin;
  input  [15:0] in1, in2;
  output        cout;
  output [15:0] out;

  assign {cout, out} = in1 + in2 + cin;
endmodule

/* ~~~~PROGRAM COUNTER~~~~ */
module program_counter(ld_pc, rst, clk, c, d, I_2_9, s4, q14, PC);
  input             ld_pc;
  input             rst;
  input             clk;
  input      [7:0]  c;
  input      [7:0]  d;
  input      [7:0]  I_2_9;
  input             s4, q14;
  output reg [15:0] PC;

  wire        cin = 1'b0;
  wire [15:0] in2, adder_out, pc_in;
  wire [7:0]  in2_half;

  assign in2_half = s4 ? I_2_9 : 8'b0000_0001;   // jump offset when s4 = 1, otherwise +1
  assign in2      = {8'b0000_0000, in2_half};    // offset is zero-extended to 16 bits
  assign pc_in    = q14 ? {c, d} : adder_out;    // q14 (JMPCD) loads PC from {C, D}

  // The carry out of the adder is not used.
  adder_16 a16 (.in1(PC), .in2(in2), .out(adder_out), .cin(cin), .cout());

  always @(posedge clk, posedge rst)
  begin
    if (rst)
      PC <= 16'b0;        // full 16-bit reset value; non-blocking for a clocked register
    else if (ld_pc)
      PC <= pc_in;
  end
endmodule
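
The next-address selection can be summarised with a small Python model. This is a sketch under the assumption that ld_pc is asserted (the control unit in A.7 ties it to 1). The function name next_pc is illustrative and not from the original design.

```python
def next_pc(pc, c, d, i_2_9, s4, q14):
    """Software model of program_counter for one clock edge with ld_pc = 1."""
    if q14:                                   # JMPCD: absolute jump to {C, D}
        return ((c & 0xFF) << 8) | (d & 0xFF)
    step = (i_2_9 & 0xFF) if s4 else 1        # taken JMP/JZ adds the 8-bit field, else +1
    return (pc + step) & 0xFFFF               # 16-bit wrap-around, carry out discarded


if __name__ == "__main__":
    print(hex(next_pc(0x0100, 0, 0, 0x20, s4=True, q14=False)))
```

Because the 8-bit field is zero-extended before the addition, a relative jump can only move forward by up to 255 locations (or wrap past 0xFFFF). Backward jumps need JMPCD with the target held in C and D.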
A.5. INSTRUCTION REGISTER
/* ~~~~INSTRUCTION REGISTER~~~~ */
module instruction_register(clk, rst, ld_ir, ir_in, I);
  input             clk;
  input             rst;
  input             ld_ir;
  input      [11:1] ir_in;
  output reg [11:1] I;

  always @(posedge clk, posedge rst)
  begin
    if (rst)
      I <= 11'b0100_0000_000;   // reset value decodes as MVI with an immediate of 0
    else if (ld_ir)
      I <= ir_in;
  end
endmodule
A.6. INSTRUCTION DECODER
/* ~~~~INSTRUCTION DECODER~~~~ */
module instruction_decoder(I_4_11, q);
  input  [11:4] I_4_11;
  output [14:1] q;

  assign q[1] = (!I_4_11[11]) & I_4_11[10]; // MVI
  assign q[2] = I_4_11[11] & (!I_4_11[10]); // JMP offset
  assign q[3] = I_4_11[11] & I_4_11[10]; // JZ
  assign q[4] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & I_4_11[7]; // MOV
  assign q[5] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & I_4_11[8] & (!I_4_11[7]); // INC
  assign q[6] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & I_4_11[8] & I_4_11[7]; // DEC
  assign q[7] = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9] & (!I_4_11[8]) & (!I_4_11[7]); // SL
  assign q[8] = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9] & (!I_4_11[8]) & I_4_11[7]; // SR
  assign q[9] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & (!I_4_11[7]) & (!I_4_11[6]) & (!I_4_11[5]); // CMP
  assign q[10] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & (!I_4_11[7]) & (!I_4_11[6]) & I_4_11[5] & (!I_4_11[4]); // ADD
  assign q[11] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & (!I_4_11[7]) & (!I_4_11[6]) & I_4_11[5] & I_4_11[4]; // SUB
  assign q[12] = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9] & I_4_11[8] & (!I_4_11[7]); // LOAD
  assign q[13] = (!I_4_11[11]) & (!I_4_11[10]) & I_4_11[9] & I_4_11[8] & I_4_11[7]; // STORE
  assign q[14] = (!I_4_11[11]) & (!I_4_11[10]) & (!I_4_11[9]) & (!I_4_11[8]) & (!I_4_11[7]) & I_4_11[6]; // JMPCD
endmodule
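
The one-hot decode equations above can be restated as a priority-free lookup. The Python sketch below is an illustrative software model, not part of the original project. It assumes the 11-bit instruction is held in an integer whose bit 0 corresponds to I[1] and whose bit 10 corresponds to I[11]. The function name decode is hypothetical.

```python
def decode(word):
    """Return the mnemonic selected by instruction_decoder for an 11-bit word."""
    top2 = (word >> 9) & 0b11                  # I[11:10]
    if top2 == 0b01:
        return "MVI"
    if top2 == 0b10:
        return "JMP"
    if top2 == 0b11:
        return "JZ"
    group = (word >> 6) & 0b111                # I[9:7]
    names = {0b001: "MOV", 0b010: "INC", 0b011: "DEC", 0b100: "SL",
             0b101: "SR", 0b110: "LOAD", 0b111: "STORE"}
    if group in names:
        return names[group]
    if (word >> 5) & 1:                        # I[6] set
        return "JMPCD"
    if not ((word >> 4) & 1):                  # I[5] clear
        return "CMP"
    return "SUB" if (word >> 3) & 1 else "ADD" # I[4] separates SUB from ADD


if __name__ == "__main__":
    for w in (0x010, 0x04A, 0x1A0, 0x1C0):
        print(f"{w:03X} -> {decode(w)}")
```

Every 11-bit word falls into exactly one branch, which mirrors the fact that the q outputs of the decoder are mutually exclusive.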
A.7. CONTROL UNIT
/* ~~~~CONTROL UNIT~~~~ */
module control_unit(q, z, q_c, q_s, s0, s1, s2, wr_data, rd_data, ld_pc,
                    ld_ir, en_dec, sub, cin, s4, xout_buf, ld_flags);
  input  [14:1] q;
  input         z;
  output reg    q_c, q_s;
  output reg    s0, s1, s2, s4;
  output        wr_data, rd_data;
  output        ld_pc, ld_ir;
  output reg    ld_flags;
  output reg    en_dec;
  output reg    sub, cin;
  output reg    xout_buf;

  reg [3:0] E;

  // PC and IR are loaded on every clock.
  assign ld_pc = 1'b1;
  assign ld_ir = 1'b1;

  assign rd_data = q[12];   // LOAD
  assign wr_data = q[13];   // STORE

  always @ *
  begin
    E[0] = q[1] | q[4];                          // move unit (MVI, MOV)
    E[1] = q[5] | q[6] | q[9] | q[10] | q[11];   // arithmetic unit
    E[2] = q[7] | q[8];                          // shift unit
    E[3] = q[12];                                // memory read

    // {s1, s0} selects the source of the internal data bus.
    case (E)
      4'b0010: begin s0 = 1'b1; s1 = 1'b0; end
      4'b0100: begin s0 = 1'b0; s1 = 1'b1; end
      4'b1000: begin s0 = 1'b1; s1 = 1'b1; end
      default: begin s0 = 1'b0; s1 = 1'b0; end
    endcase

    sub      = q[6] | q[9] | q[11];
    cin      = q[5] | q[9] | q[11];
    q_c      = q[5] | q[10];
    q_s      = q[6] | q[11] | q[9];
    ld_flags = E[1];

    s4 = q[2] | (q[3] && z);

    s2 = q[10] | q[11];

    en_dec = q[1] | q[2] | q[3] | q[9] | q[13] | q[14];

    xout_buf = q[4] | q[5] | q[6] | q[7] | q[8] | q[10] | q[11] | q[12];
  end
endmodule
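
As a reading aid, the control equations can be tabulated per mnemonic. The Python sketch below is an illustrative model written for this appendix, not part of the original project. It covers only a subset of the control_unit outputs, takes a mnemonic string instead of the one-hot q vector, and reports the data-bus source by name instead of as {s1, s0}.

```python
ARITHMETIC = ("INC", "DEC", "CMP", "ADD", "SUB")


def control_signals(op, zero_flag=False):
    """Subset of the control_unit outputs for the instruction mnemonic `op`."""
    if op in ARITHMETIC:
        bus = "result_au"        # {s1, s0} = 01
    elif op in ("SL", "SR"):
        bus = "result_su"        # {s1, s0} = 10
    elif op == "LOAD":
        bus = "data_in"          # {s1, s0} = 11
    else:
        bus = "result_mu"        # {s1, s0} = 00 (default branch)
    return {
        "bus": bus,
        "sub": op in ("DEC", "CMP", "SUB"),
        "cin": op in ("INC", "CMP", "SUB"),
        "q_c": op in ("INC", "ADD"),
        "q_s": op in ("DEC", "CMP", "SUB"),
        "ld_flags": op in ARITHMETIC,
        "s2": op in ("ADD", "SUB"),
        "s4": op == "JMP" or (op == "JZ" and bool(zero_flag)),
        "en_dec": op in ("MVI", "JMP", "JZ", "CMP", "STORE", "JMPCD"),
    }


if __name__ == "__main__":
    print(control_signals("SUB"))
```

For example, SUB and CMP produce the same arithmetic controls (sub = 1, cin = 1). They differ in en_dec, which is set for CMP so that the destination decoder does not write the result back to a register.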
A.8. MAIN PROCESSOR UNIT
/* ~~~~MAIN PROCESSOR UNIT~~~~ */
module main_processor(clk, rst, xin, xout, wr_data, rd_data, addr_data, data_in_out, ir_in, addr_pc);
  input         clk;
  input         rst;
  input  [7:0]  xin;
  input  [11:1] ir_in;
  inout  [7:0]  data_in_out;
  output [7:0]  xout;
  output        wr_data;
  output        rd_data;
  output [15:0] addr_data;
  output [15:0] addr_pc;

  reg        Sign, Carry, Zero;
  reg  [7:0] A_reg, B_reg, C_reg, D_reg, X_reg;
  reg        X_buf;
  reg  [7:0] data_bus;
  reg        ld_a_temp, ld_B, ld_C, ld_D;

  wire        cin, sub, q_C, q_S, s, c, z;
  wire        s0, s1, s2, s4;
  wire        xout_buf, en_dec;
  wire        ld_A, ld_ir, ld_pc, ld_flags;
  wire [11:1] I;
  wire [14:1] q;
  wire [7:0]  result_au, result_su, result_mu;
  wire [7:0]  src;
  wire [7:0]  data_in;

  arithmetic_unit au1(A_reg, src, I[8], cin, sub, q_C, q_S, result_au, s, c, z);

  control_unit cu1(q, Zero, q_C, q_S, s0, s1, s2, wr_data, rd_data, ld_pc, ld_ir, en_dec, sub, cin, s4, xout_buf, ld_flags);

  instruction_decoder id1(I[11:4], q);
  instruction_register ir1(clk, rst, ld_ir, ir_in, I);

  move_unit mu1(I[10], src, I[9:2], result_mu);

  program_counter pc1(ld_pc, rst, clk, C_reg, D_reg, I[9:2], s4, q[14], addr_pc);

  shift_unit su1(src, I[7], result_su);

  assign xout      = X_buf ? X_reg : 8'bz;
  assign ld_A      = ld_a_temp || q[1];     // MVI always loads register A
  assign addr_data = {C_reg, D_reg};

  // SRC Multiplexer
  assign src = I[3] ? xin : (I[2] ? (I[1] ? D_reg : C_reg) : (I[1] ? B_reg : A_reg));

  // Bidirectional data bus: read on LOAD, drive on STORE
  assign data_in     = rd_data ? data_in_out : 8'bz;
  assign data_in_out = wr_data ? src : 8'bz;

  // Registers and flags (non-blocking assignments for clocked logic)
  always @ (posedge clk, posedge rst)
  begin
    if (rst)
    begin
      A_reg <= 8'b0;
      B_reg <= 8'b0;
      C_reg <= 8'b0;
      D_reg <= 8'b0;
      X_reg <= 8'b0;
      X_buf <= 1'b0;
      Sign  <= 1'b0;
      Carry <= 1'b0;
      Zero  <= 1'b0;
    end
    else
    begin
      X_reg <= data_bus;
      X_buf <= xout_buf & (s2 ? I[3] : I[6]);

      if (ld_flags)
      begin
        Carry <= c;
        Zero  <= z;
        Sign  <= s;
      end

      if (ld_A)
        A_reg <= data_bus;
      if (ld_B)
        B_reg <= data_bus;
      if (ld_C)
        C_reg <= data_bus;
      if (ld_D)
        D_reg <= data_bus;
    end
  end

  always @ *
  begin
    // Destination Decoder
    if (!((xout_buf & (s2 ? I[3] : I[6])) || en_dec))
    begin
      case (s2 ? I[2:1] : I[5:4])
        2'b00: begin ld_a_temp = 1'b1; ld_B = 1'b0; ld_C = 1'b0; ld_D = 1'b0; end
        2'b01: begin ld_a_temp = 1'b0; ld_B = 1'b1; ld_C = 1'b0; ld_D = 1'b0; end
        2'b10: begin ld_a_temp = 1'b0; ld_B = 1'b0; ld_C = 1'b1; ld_D = 1'b0; end
        2'b11: begin ld_a_temp = 1'b0; ld_B = 1'b0; ld_C = 1'b0; ld_D = 1'b1; end
      endcase
    end
    else
    begin
      ld_a_temp = 1'b0; ld_B = 1'b0; ld_C = 1'b0; ld_D = 1'b0;
    end

    // Data bus multiplexer
    case ({s1, s0})
      2'b01:   data_bus = result_au;
      2'b10:   data_bus = result_su;
      2'b11:   data_bus = data_in;
      default: data_bus = result_mu;
    endcase
  end
endmodule
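
The operand selection in the top-level module can be illustrated with a short Python model. This is a simplified sketch, not part of the original project. It assumes the instruction word is an integer whose bit 0 is I[1], it ignores en_dec (so it applies only to instructions that write a result back), and it reports "X" whenever the X output buffer would be enabled. The function names are hypothetical.

```python
def select_src(word, regs, xin):
    """Source operand chosen by I[3:1], as in the SRC multiplexer."""
    if (word >> 2) & 1:                         # I[3] set: external input xin
        return xin
    return regs["ABCD"[word & 0b11]]            # I[2:1]: 00 = A, 01 = B, 10 = C, 11 = D


def select_dest(word, s2):
    """Destination name; s2 is true for ADD/SUB, which reuse the source field."""
    x_bit = (word >> 2) & 1 if s2 else (word >> 5) & 1     # I[3] or I[6]
    if x_bit:
        return "X"
    field = (word & 0b11) if s2 else ((word >> 3) & 0b11)  # I[2:1] or I[5:4]
    return "ABCD"[field]


if __name__ == "__main__":
    regs = {"A": 1, "B": 2, "C": 3, "D": 4}
    print(select_dest(0x04A, s2=False), select_src(0x04A, regs, xin=9))
```

The model shows why ADD and SUB take a single register operand: with s2 set, the same field names both the source and the destination, and register A supplies the other operand inside the arithmetic unit.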
APPENDIX B – INSTRUCTION SET
ADD A : 11’H010   ADD B : 11’H011   ADD C : 11’H012   ADD D : 11’H013   ADD X : 11’H014
CMP A : 11’H000   CMP B : 11’H001   CMP C : 11’H002   CMP D : 11’H003   CMP X : 11’H004
DEC A, A : 11’H0C0   DEC A, B : 11’H0C1   DEC A, C : 11’H0C2   DEC A, D : 11’H0C3   DEC A, X : 11’H0C4
DEC B, A : 11’H0C8   DEC B, B : 11’H0C9   DEC B, C : 11’H0CA   DEC B, D : 11’H0CB   DEC B, X : 11’H0CC
DEC C, A : 11’H0D0   DEC C, B : 11’H0D1   DEC C, C : 11’H0D2   DEC C, D : 11’H0D3   DEC C, X : 11’H0D4
DEC D, A : 11’H0D8   DEC D, B : 11’H0D9   DEC D, C : 11’H0DA   DEC D, D : 11’H0DB   DEC D, X : 11’H0DC
DEC X, A : 11’H0E0   DEC X, B : 11’H0E1   DEC X, C : 11’H0E2   DEC X, D : 11’H0E3   DEC X, X : 11’H0E4
INC A, A : 11’H080   INC A, B : 11’H081   INC A, C : 11’H082   INC A, D : 11’H083   INC A, X : 11’H084
INC B, A : 11’H088   INC B, B : 11’H089   INC B, C : 11’H08A   INC B, D : 11’H08B   INC B, X : 11’H08C
INC C, A : 11’H090   INC C, B : 11’H091   INC C, C : 11’H092   INC C, D : 11’H093   INC C, X : 11’H094
INC D, A : 11’H098   INC D, B : 11’H099   INC D, C : 11’H09A   INC D, D : 11’H09B   INC D, X : 11’H09C
INC X, A : 11’H0A0   INC X, B : 11’H0A1   INC X, C : 11’H0A2   INC X, D : 11’H0A3   INC X, X : 11’H0A4
JMP : [2’B10, < 8-Bit Data>, 1’b0]
JMPCD : 11’H020
JZ : [2’B11, < 8-Bit Data>, 1’b0]
LOAD A : 11’H180   LOAD B : 11’H188   LOAD C : 11’H190   LOAD D : 11’H198   LOAD X : 11’H1A0
MOV A, A : 11’H040   MOV A, B : 11’H041   MOV A, C : 11’H042   MOV A, D : 11’H043   MOV A, X : 11’H044
MOV B, A : 11’H048   MOV B, B : 11’H049   MOV B, C : 11’H04A   MOV B, D : 11’H04B   MOV B, X : 11’H04C
MOV C, A : 11’H050   MOV C, B : 11’H051   MOV C, C : 11’H052   MOV C, D : 11’H053   MOV C, X : 11’H054
MOV D, A : 11’H058   MOV D, B : 11’H059   MOV D, C : 11’H05A   MOV D, D : 11’H05B   MOV D, X : 11’H05C
MOV X, A : 11’H060   MOV X, B : 11’H061   MOV X, C : 11’H062   MOV X, D : 11’H063   MOV X, X : 11’H064
MVI : [2’B01, < 8-Bit Data>, 1’b0]
SL A, A : 11’H100   SL A, B : 11’H101   SL A, C : 11’H102   SL A, D : 11’H103   SL A, X : 11’H104
SL B, A : 11’H108   SL B, B : 11’H109   SL B, C : 11’H10A   SL B, D : 11’H10B   SL B, X : 11’H10C
SL C, A : 11’H110   SL C, B : 11’H111   SL C, C : 11’H112   SL C, D : 11’H113   SL C, X : 11’H114
SL D, A : 11’H118   SL D, B : 11’H119   SL D, C : 11’H11A   SL D, D : 11’H11B   SL D, X : 11’H11C
SL X, A : 11’H120   SL X, B : 11’H121   SL X, C : 11’H122   SL X, D : 11’H123   SL X, X : 11’H124
SR A, A : 11’H140   SR A, B : 11’H141   SR A, C : 11’H142   SR A, D : 11’H143   SR A, X : 11’H144
SR B, A : 11’H148   SR B, B : 11’H149   SR B, C : 11’H14A   SR B, D : 11’H14B   SR B, X : 11’H14C
SR C, A : 11’H150   SR C, B : 11’H151   SR C, C : 11’H152   SR C, D : 11’H153   SR C, X : 11’H154
SR D, A : 11’H158   SR D, B : 11’H159   SR D, C : 11’H15A   SR D, D : 11’H15B   SR D, X : 11’H15C
SR X, A : 11’H160   SR X, B : 11’H161   SR X, C : 11’H162   SR X, D : 11’H163   SR X, X : 11’H164
STORE A : 11’H1C0   STORE B : 11’H1C1   STORE C : 11’H1C2   STORE D : 11’H1C3   STORE X : 11’H1C4
SUB A : 11’H018   SUB B : 11’H019   SUB C : 11’H01A   SUB D : 11’H01B   SUB X : 11’H01C
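
The opcodes listed in this appendix follow a regular field layout, so they can be generated rather than looked up. The Python sketch below is an illustrative encoder, not part of the original project. The field positions and base opcodes are taken from the instruction decoder in Appendix A.6 (STORE uses I[9:7] = 111 there), the function and table names are hypothetical, and the instruction word is assumed to be an integer whose bit 0 is I[1].

```python
REG = {"A": 0, "B": 1, "C": 2, "D": 3, "X": 4}   # X sets the bit just above the 2-bit field

TWO_OPERAND = {"MOV": 0x040, "INC": 0x080, "DEC": 0x0C0, "SL": 0x100, "SR": 0x140}
SRC_ONLY = {"CMP": 0x000, "ADD": 0x010, "SUB": 0x018, "STORE": 0x1C0}
IMMEDIATE = {"MVI": 0b01, "JMP": 0b10, "JZ": 0b11}


def encode(op, *operands):
    """Return the 11-bit instruction word for a mnemonic and its operands."""
    if op in TWO_OPERAND:                         # e.g. encode("MOV", "B", "C")
        dest, src = operands
        return TWO_OPERAND[op] | (REG[dest] << 3) | REG[src]
    if op in SRC_ONLY:                            # e.g. encode("ADD", "X")
        (src,) = operands
        return SRC_ONLY[op] | REG[src]
    if op == "LOAD":                              # destination only
        (dest,) = operands
        return 0x180 | (REG[dest] << 3)
    if op in IMMEDIATE:                           # [2-bit opcode, 8-bit data, 1'b0]
        (value,) = operands
        return (IMMEDIATE[op] << 9) | ((value & 0xFF) << 1)
    if op == "JMPCD":
        return 0x020
    raise ValueError(f"unknown mnemonic: {op}")


if __name__ == "__main__":
    program = [("MVI", 0x05), ("MOV", "B", "A"), ("ADD", "B"), ("JMPCD",)]
    print([f"{encode(*ins):03X}" for ins in program])
```

A generator of this kind is also a convenient way to produce test vectors for the ir_in port during simulation.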