
Project Report On

Memory map selection of real time SDRAM

controller using Verilog

By

RAHUL VERMA

(9015694258)


TABLE OF CONTENTS Page

DECLARATION ............................................................................................................................ii

CERTIFICATE .............................................................................................................................iii

ACKNOWLEDGEMENTS ..........................................................................................................iv

ABSTRACT ..................................................................................................................................vi

LIST OF FIGURES .....................................................................................................................vii

LIST OF TABLES........................................................................................................................viii

LIST OF ABBREVIATIONS…………………………………………………………………...ix

CHAPTER 1 (INTRODUCTION)……………………………………………………………..01

1.1 LITERATURE SURVEY……………………………………………………...02

1.2 GOAL OF THE PROJECT…………………………………………………….03

CHAPTER 2 (BACKGROUND)………………………………………………………………04

2.1 RANDOM ACCESS MEMORY………………………………………………..04

2.2 STATIC RANDOM ACCESS MEMORY …………………………………....04

2.3 DYNAMIC RANDOM ACCESS MEMORY ……………..………………….05

2.4 DEVELOPMENT OF DRAM ………………………………………………...06

2.4.1 DRAM …………………………………………………………………...07

2.4.2 SYNCHRONOUS DRAM……………………………………………….07

2.4.3 DDR1 SDRAM…………………………………………………………....08

2.4.4 DDR2 SDRAM……………………………………………………………08


2.4.5 DDR3 SDRAM………………………………………………………..…09

2.5 TIMELINE……………………………………………………………………09

CHAPTER 3 (METHODOLOGY)…………………………………………………………...11

3.1 HARDWARE…………………………………………………………………11

3.1.1 VIRTEX-6 FPGA………………………………………………………..11

3.1.2 ML605 BOARD………………………………………………………...12

3.2 TOOLS………………………………………………………………………..12

3.2.1 XILINX INTEGRATED SOFTWARE ENVIRONMENT (ISE)………..13

3.2.2 SYNTHESIS AND SIMULATION………………………………………14

3.2.3 IMPLEMENTATION AND HARDWARE VALIDATION…………...14

3.2.4 ANALYSIS OF TURN-AROUND TIMES…………………………….17

3.2.5 XILINX CORE GENERATOR…………………………………………19

CHAPTER 4 (ARCHITECTURE)……………………………………………………………20

4.1 CONTROL INTERFACE MODULE…………………………………………21

4.2 COMMAND MODULE…………………….……………….………………...22

4.3 DATA PATH MODULE………………………………………………….24

CHAPTER 5 (OPERATION).....................................................................................................25

5.1 SDRAM OVERVIEW…………………………………………………………26

5.2 FUNCTIONAL DESCRIPTION………………………………………………27

5.3 SDRAM CONTROLLER COMMAND INTERFACE……………………….28

5.3.1 NOP COMMAND……………………………………………………….29

5.3.2 READA COMMAND…………………………………………………...30

5.3.3 WRITEA COMMAND……………………………………………….…31

5.3.4 REFRESH COMMAND…………………………………………….…..32


5.3.5 PRECHARGE COMMAND………………………………………….....34

5.3.6 LOAD_MODE COMMAND……………………………………………35

5.3.7 LOAD_REG1 COMMAND……………………………………………..36

5.3.8 LOAD_REG2 COMMAND……………………………………………..37

CHAPTER 6 (ELEMENTS OF MEMORY BANK)…………………………………………38

6.1 DECODER…………………………………………………………………….38

6.1.1 A 2 TO 4 SINGLE BIT DECODER…………………………………….38

6.2 DEMUX………………………………………………………………………..40

6.3 RAM…………………………………………………………………………...41

6.3.1 TYPES OF RAM………………………………………………………...42

6.4 MUX…………………………………………………………………………...44

6.5 BUFFER……………………………………………………………………….45

6.5.1 VOLTAGE BUFFER…………………………………………………….46

6.5.2 CURRENT BUFFER…………………………………………………….47

6.6 MEMORY BANK……………………………………………………………..48

CHAPTER 7 (RESULT AND CONCLUSIONS)……………………………………………..51

7.1 POWER CONSUMED WHEN ALL 8 BANKS ARE ON…………..………51

7.1.1 PROJECT………………………………………………………...………51

7.1.2 DEVICE ………………………………………………………………….51

7.1.3 ENVIRONMENT………………………………………………………...52

7.1.4 DEFAULT ACTIVITY………...………………………………….……..52

7.1.5 ON-CHIP POWER SUMMARY………………………………………...53

7.1.6 THERMAL SUMMARY………………………………………………...53

7.1.7 POWER SUPPLY SUMMARY………………………………………….53

7.1.8 CONFIDENCE LEVEL………………………………………………….54

7.1.9 BY HIERARCHY………………………………………………………..55


7.2 POWER CONSUMED WHEN ONLY ONE MEMORY BANK IS IN USE…..56

7.2.1 PROJECT………………………………………………………………..56

7.2.2 DEVICE……………………………………………………………….....56

7.2.3 ENVIRONMENT………………………………………………………...57

7.2.4 DEFAULT ACTIVITY RATES…………………………………………57

7.2.5 ON-CHIP POWER SUMMARY………………………………………...58

7.2.6 THERMAL SUMMARY………………………………………………...58

7.2.7 POWER SUPPLY SUMMARY……………………………………...….58

7.2.8 CONFIDENCE LEVEL………………………………………………….59

7.2.9 BY HIERARCHY………………………………………………………..60

7.3 CONCLUSION…………………………………………………………….….60

CHAPTER 8 (FUTURE SCOPE)……………………………………………………………...61

REFERENCES...............................................................................................................................62


LIST OF FIGURES Page

Figure 2.1 DRAM Row Access Latency vs. Year 09

Figure 2.2 DRAM Column Address Time vs. Year 10

Figure 3.1 Screenshot of ISE Project Navigator 13

Figure 3.2 Flow Chart and Timing for Simulation and Hardware Validation 15

Figure 3.3 iSim Screen Shot 18

Figure 3.4 ChipScope Screen Shot 19

Figure 4.0 Architecture of SDRAM controller 20

Figure 4.1 Control Interface Module 21

Figure 4.2 Command Module Block Diagram 23

Figure 4.3 Data Path Module 24

Figure 5.0 SDR SDRAM Controller System-Level Diagram 25

Figure 5.1 Timing diagram for a READA command 30

Figure 5.2 Timing diagram for a WRITEA command 31

Figure 5.3 Timing diagram for a REFRESH command 32

Figure 5.4 Timing diagram for a PRECHARGE command 34

Figure 5.5 Timing diagram for a LOAD_MODE command 35

Figure 6.1 RTL of decoder 39


Figure 6.2 Simulation of Decoder 40

Figure 6.3 RTL of DEMUX 41

Figure 6.4 Simulation of DEMUX 42

Figure 6.5 RTL of RAM 44

Figure 6.6 Simulation of RAM 44

Figure 6.7 RTL of MUX 46

Figure 6.8 Simulation of MUX 46

Figure 6.9 RTL of Buffer 48

Figure 6.10 Simulation of Buffer 49

Figure 6.11 RTL of Memory Bank 50

Figure 6.12 Simulation of Memory Bank 50


LIST OF TABLES Page

Table 5.1 SDRAM Bus Commands 26

Table 5.2 Interface Signals 28

Table 5.3 Interface Commands 29

Table 5.4 REG1 Bit Definitions 36

Table 7.1 Project 51

Table 7.2 Device 51

Table 7.3 Environment 52

Table 7.4 Default Activity 52

Table 7.5 On-Chip Power Summary 53

Table 7.6 Thermal Summary 53

Table 7.7 Power Supply Summary 53

Table 7.8 Power Supply Current 54

Table 7.9 Confidence Level 54

Table 7.10 By Hierarchy 55

Table 7.11 Project 56

Table 7.12 Device 56


Table 7.13 Environment 57

Table 7.14 Default Activity 57

Table 7.15 On-Chip Power Summary 58

Table 7.16 Thermal Summary 58

Table 7.17 Power Supply Summary 58

Table 7.18 Power Supply Current 59

Table 7.19 Confidence Level 59

Table 7.20 By Hierarchy 60


LIST OF ABBREVIATIONS

A/D Analog To Digital

CAS Column Address Strobe

CLB Configurable Logic Block

DRAM Dynamic Random-Access Memory

FPGA Field-Programmable Gate Array

ISE Integrated Software Environment

I/O Input/Output

LUTs Look-Up Tables

NCD Native Circuit Description

RAM Random Access Memory

RAS Row Address Strobe

ROM Read Only Memory

SDRAM Synchronous Dynamic Random-Access Memory

SRAM Static Random-Access Memory

XST Xilinx Synthesis Technology


CHAPTER 1

INTRODUCTION

Embedded applications with real-time requirements are mapped to heterogeneous multiprocessor systems. The computational demands placed upon these systems are continuously increasing, while power and area budgets limit the amount of resources that can be expended. To reduce costs, applications are often forced to share hardware resources. Functional correctness of real-time applications is only guaranteed if their timing requirements are considered throughout the entire system; when the requirements are not met, the result may be an unacceptable loss of functionality or severe quality degradation. We focus on the real-time properties of the (off-chip) memory.

SDRAM is a commonly used memory type because it provides a large amount of storage space at a low cost per bit. It comprises a hierarchical structure of banks and rows that have to be opened and closed explicitly by the memory controller, where only one row in each bank can be open at a time. Requests to the open row are served at a low latency, while a request to a different row incurs a high latency, since it requires closing the open row and subsequently opening the requested row. Locality thus strongly influences the performance of the memory subsystem.

The worst-case (minimum) bandwidth and worst-case (maximum) latency are determined by the way requests are mapped to the memory. The worst-case latency can be optimized by accessing the memory at a small granularity (i.e. a few words), such that the individual requests take a small amount of time to complete. This allows fine-grained sharing of the memory resource, at the expense of efficiency, since the overhead of opening and closing rows is amortized over only a small number of bits. Latency-sensitive requests like cache misses favor this configuration. Conversely, to optimize for bandwidth, the memory has to be used as efficiently as possible, which requires memory maps that use a large access granularity.


Existing memory controllers offer only limited configurability of the memory mapping and are unable to balance this trade-off based on the application requirements. A memory controller must take the latency and bandwidth requirements of all of its applications into account, while staying within the given power budget. This requires an understanding of the effect that different memory maps have on the attainable worst-case bandwidth, latency and power.

1.1 LITERATURE SURVEY

Synchronous DRAM (SDRAM) has become a mainstream memory of choice in embedded

system memory design due to its speed, burst access and pipeline features. For high-end

applications using processors such as Motorola MPC 8260 or Intel StrongArm, the interface to

the SDRAM is supported by the processor’s built-in peripheral module. However, for other

applications, the system designer must design a controller to provide proper commands for

SDRAM initialization, read/write accesses and memory refresh.

In some cases, SDRAM is chosen because the previous generations of DRAM (FP and EDO) are

either end-of-life or not recommended for new designs by the memory vendors. From the board

design point of view, design using earlier generations of DRAM is much easier and more

straightforward than using SDRAM unless the system bus master provides the SDRAM interface

module as mentioned above. This SDRAM controller reference design, located between the

SDRAM and the bus master, reduces the user’s effort to deal with the SDRAM command

interface by providing a simple generic system interface to the bus master.

In today's SDRAM market, there are two major types of SDRAM distinguished by their data transfer rates. The most common, single data rate (SDR) SDRAM, transfers data on the rising edge of the clock. The other is double data rate (DDR) SDRAM, which transfers data on both the rising and falling edges to double the data transfer throughput. Apart from the data transfer phase and the different power-on initialization and mode register definitions, these two SDRAM types share the same command set and basic design concepts. This reference design is targeted for SDR SDRAM; however, due to the similarity of SDR and DDR SDRAM, this design can also be modified for a DDR SDRAM controller.


For illustration purposes, the Micron SDR SDRAM MT48LC32M4A2 (8Meg x 4 x 4 banks) is

chosen for this design. Also, this design has been verified by using Micron’s simulation model.

It is highly recommended to download the simulation model from the SDRAM vendors for

timing simulation when any modifications are made to this design.

Several SDRAM controllers focusing on real-time applications have been proposed, all trying to maximize the worst-case performance. One approach uses a static command schedule computed at design time; full knowledge of the application behavior is thus required, making it unable to deal with dynamism in the request streams. Another proposed controller dynamically schedules pre-computed sequences of SDRAM commands according to a fixed set of scheduling rules, and a similar design dynamically schedules commands at run-time according to a set of rules from which an upper bound on the latency of a request is determined. These controllers use a memory map that always interleaves requests over all banks in the SDRAM, which sets a high lower bound on the smallest request size that can be supported efficiently. One supports multiple bursts to each bank in an access to increase guaranteed bandwidth for large requests; another allows only single-burst accesses to all banks in a fixed sequential manner, although multiple banks can be clustered to create a single logical resource. None of the mentioned controllers take power into account, despite it being an increasingly important design constraint.

1.2 GOAL OF THE PROJECT

1) We explore the full memory map design space by allowing requests to be interleaved over a

variable number of banks. This reduces the minimum access granularity and can thus be

beneficial for applications with small requests or tight latency constraints.

2) We propose a configuration methodology that is aware of the real-time and power constraints,

such that an optimal memory map can be selected.


CHAPTER 2

BACKGROUND

There are two different types of random access memory: static and dynamic. Static random access memory (SRAM) is used for high-speed, low-power applications, while dynamic random access memory (DRAM) is used for its low cost and high density. Designers have been working to make DRAM faster and more energy efficient. The following sections will discuss the differences between these two types of RAM, as well as present the progression of DRAM towards a faster, more energy efficient design.

2.1 RANDOM ACCESS MEMORY

Today, the most common type of memory used in digital systems is random access memory

(RAM). The time it takes to access RAM is not affected by the data’s location in memory. RAM

is volatile, meaning if power is removed, then the stored data is lost. As a result, RAM cannot be

used for permanent storage. However, RAM is used during runtime to quickly store and retrieve

data that is being operated on by a computer. In contrast, nonvolatile memory, such as hard

disks, can be used for storing data even when not powered on. Unfortunately, it takes much

longer for the computer to store and access data from this memory. There are two types of

RAM: static and dynamic. In the following sections the differences between the two types and

the evolution of DRAM will be discussed.

2.2 STATIC RANDOM ACCESS MEMORY

Static random access memory (SRAM) stores data as long as power is being supplied to the

chip.


Each memory cell of SRAM stores one bit of data using six transistors: a four-transistor flip-flop and two access transistors. SRAM is the faster of the two types of RAM because it does not involve capacitors, which require sense amplification of a small charge. For this reason, it is used in the cache memory of computers. Additionally, SRAM requires a very small amount of power to maintain its data in standby mode. Although SRAM is fast and energy efficient, it is also expensive due to the amount of silicon needed for its large cell size. This presented the need for a denser memory cell, which brought about DRAM.

2.3 DYNAMIC RANDOM ACCESS MEMORY

According to Wakerly, "In order to build RAMs with higher density (more bits per chip), chip designers invented memory cells that use as little as one transistor per bit." Each DRAM cell consists of one transistor and a capacitor. Since capacitors "leak" or lose charge over time, DRAM must have a refresh cycle to prevent data loss.

According to a high-performance DRAM study on earlier versions of DRAM, DRAM's refresh cycle is one reason DRAM is slower than SRAM. The cells of DRAM use sense amplifiers to transmit data to the output buffer in the case of a read, and to transmit data back to the memory cell in the case of a refresh. During a refresh cycle, the sense amplifier reads the degraded value on a capacitor into a D-latch and writes the same value back to the capacitor so it is charged correctly for a 1 or 0. Since all rows of memory must be refreshed, and the sense amplifier must determine the value of an already small, degraded charge, refresh takes a significant amount of time. The refresh cycle typically occurs about every 64 milliseconds; the refresh interval of the latest DRAM (DDR3) is about 1 microsecond.

Although refresh increases memory access time, according to a high-performance DRAM study

on earlier versions of DRAM, the greatest amount of time is lost during row

addressing, more specifically, "[extracting] the required data from the sense amps/row caches".

During addressing, the memory controller first strobes the row address (RAS) onto the address

bus. Once the RAS is sent, a sense amplifier (one for each cell in the row) determines if a

charge indicating a 1 or 0 is loaded into each capacitor.


This step is long because “the sense amplifier has to read a very weak charge” and “the row is

formed by the gates of memory cells." The controller then chooses a cell in the row from which to read by strobing the column address (CAS) onto the address bus. A write

requires the enable signal to be asserted at the same time as the CAS, while a read requires the

enable signal to be de-asserted. The time it takes the data to move onto the bus after the CAS is

called the CAS latency.

Although recent generations of DRAM are still slower than SRAM, DRAM is used when a larger amount of memory is required since it is less expensive. For example, in embedded systems, a small block of SRAM is used for the critical data path, and a large block of DRAM is used to satisfy all other needs. The following section will discuss the development of DRAM into a faster, more energy efficient memory.

2.4 DEVELOPMENT OF DRAM

Many factors are considered in the development of high performance RAM. Ideally, the

developer would always like memory to transfer more data and respond in less time; memory

would have higher bandwidth and lower latency. However, improving upon one factor often

involves sacrificing the other.

Bandwidth is the amount of data transferred per second. It depends on the width of the data bus and the frequency at which data is being transferred. Latency is the time between when the address strobe is sent to memory and when the data is placed on the data bus. DRAM is slower than SRAM because it must periodically pause for refresh cycles and because it takes a much longer time to extract data onto the memory bus. Advancements have been made, however, to several different aspects of DRAM to increase bandwidth and decrease latency.

Over time, DRAM has evolved to become faster and more energy efficient by decreasing in cell

size and increasing in capacity. In the following section, we will look at different types of

DRAM and how DDR3 memory has come to be.


2.4.1 DRAM

One of the reasons the original DRAM was very slow is its extensive addressing overhead. In the original DRAM, an address was required for every 64-bit access to memory. Each access took six clock cycles. For four 64-bit accesses to consecutive addresses in memory, the notation for timing was 6-6-6-6. Dashes separate memory accesses and the numbers indicate how long the accesses take. This DRAM timing example took 24 cycles to access the memory four times. In contrast, more recent DRAM implements burst technology, which can send many 64-bit words to consecutive addresses. While the first access still takes six clock cycles due to addressing, the next three adjacent addresses can be accessed in as little as one clock cycle each, since the addressing does not need to be repeated.

During burst mode, the timing would be 6-1-1-1, a total of nine clock cycles. The original DRAM

is also slower than its descendants because it is asynchronous. This means there is no memory

bus clock to synchronize the input and output signals of the memory chip. The timing

specifications are not based on a clock edge, but rather on maximum and minimum timing

values (in seconds). The user would need to worry about designing a state machine with idle

states, which may be inconsistent when running the memory at different frequencies.

2.4.2 Synchronous DRAM

In order to decrease latency, SDRAM utilizes a memory bus clock to synchronize signals to and

from the system and memory. Synchronization ensures that the memory controller does not need

to follow strict timing; it simplifies the implemented logic and reduces memory access latency.

With a synchronous bus, data is available at each clock cycle.

SDRAM divides memory into two to four banks for concurrent access to different parts of memory. Simultaneous access allows continuous data flow by ensuring there will always be a memory bank ready for access. The addition of banks adds another segment to the addressing, resulting in a bank, row and column address.


The memory controller determines if an access addresses the same bank and row as the

previous access, so only a column address strobe must be sent. This allows the access to occur

much more quickly and can decrease overall latency.

2.4.3 DDR1 SDRAM

DDR1 SDRAM (i.e. first generation of SDRAM) doubles the data rate (hence the term DDR)

of SDRAM without changing clock speed or frequency. DDR transfers data on both the rising

and falling edge of the clock, has a pre-fetch buffer and low voltage signaling, which makes

it more energy efficient than previous designs.

Unlike SDRAM, which transfers 1 bit per clock cycle from the memory array to the data queue,

DDR1 transfers 2 bits to the queue in two separate pipelines. The bits are released in order on the

same output line. This is called a 2n-prefetch architecture. In addition, DDR1 utilizes double

transition clocking by triggering on both the rising and falling edge of the clock to transfer data.

As a result, the bandwidth of DDR1 is doubled without an increase in the clock frequency.

In addition to doubling the bandwidth, DDR1 made advances in energy efficiency. DDR1 can operate at 2.5 V instead of the 3.3 V operating point of SDRAM thanks to low voltage signaling technology.

2.4.4 DDR2 SDRAM

Data rates of DDR2 SDRAM are up to eight times higher than original SDRAM. At an operating voltage of 1.8 V, it achieves lower power consumption than DDR1. DDR2 SDRAM has a 4-bit prefetch buffer, an improvement over the DDR1 2-bit prefetch. This means that 4 bits are transferred per clock cycle from the memory array to the data bus, which increases bandwidth.


2.4.5 DDR3 SDRAM

DDR3 provides two burst modes for both reading and writing: burst chop (BC4) and burst length eight (BL8). BC4 allows bursts of four by treating data as though half of it is masked. This creates a smooth transition when switching from DDR2 to DDR3 memory. However, burst mode BL8 is the primary burst mode. BL8 allows the most data to be transferred in the least amount of time; it transfers the greatest number of 64-bit data packets (eight) to or from consecutive addresses in memory, which means addressing occurs once for every eight data packets sent. In order to support a burst length of eight data packets, DDR3 SDRAM has an 8-bit prefetch buffer. DDR3, like its predecessors, improves not only bandwidth but also energy conservation. Power consumption of DDR3 can be up to 30 percent less than DDR2. The DDR3 operating voltage is the lowest yet, at 1.5 V, and low voltage versions are supported at 1.35 V.

2.5 TIMELINE

Ideally, memory performance would improve at the same rate as central processing unit

(CPU) performance. However, memory latency has only improved about five percent each

year. The longest latency (RAS latency) of the newest release of DRAM for each year is

shown in the plot in Figure 2.1.

Figure 2.1 DRAM Row Access Latency vs. Year


As seen in Figure 2.1, the row access latency decreases linearly with every new release of

DRAM until 1996. Once SDRAM is released in 1996, the difference in latency from year to

year is much smaller. With recent memory releases it is much more difficult to reduce RAS

latency.

This can be seen especially for the DDR2 and DDR3 memory releases from 2006 to 2012. CAS latency, unlike RAS latency, consistently decreases (bandwidth increases) with every memory release, and in the new DDR3 memory it is very close to 0 ns. Figure 2.2 shows the column access

latency.

Figure 2.2 DRAM Column Address Time vs. Year

Looking at some prominent areas of the CAS graph, it can be seen in Figure 2.2 that bandwidth

greatly increased (CAS decreased) from 1983 to 1986. This is due to the switch from NMOS

DRAMs to CMOS DRAMs. In 1996 the first SDRAM was released. The CAS latency decreased

(bandwidth increased) due to synchronization and banking. In later years, the CAS latency does

not decrease by much, but this is expected since the latency is already much smaller. Comparing

Figure 2.2 to Figure 2.1, CAS time decreases much more drastically than RAS time. This

means the bandwidth greatly improves, while latency improves much more slowly. In 2010,

when DDR2 was released, it can be seen that latency was sacrificed (Figure 2.1) for an

increase in bandwidth (Figure 2.2).


CHAPTER 3

METHODOLOGY

In this section the ML605 and Virtex-6 board hardware is described as well as the tools

utilized for design and validation. The Xilinx Integrated Software Environment (ISE) was used

for design and iSim and ChipScope were used for validation in simulation and in hardware.

3.1 HARDWARE

3.1.1 Virtex-6 FPGA

The Virtex-6 FPGA (XC6VLX240T) is used to implement the arbiter. This FPGA has 241,152 logic cells, and its pins are organized into banks (40 pins per bank). Its logic cells, or slices, are composed of four look-up tables (LUTs), multiplexers and arithmetic carry logic.

LUTs implement Boolean functions, and multiplexers enable combinatorial logic. Two slices

form a configurable logic block (CLB). In order to distribute a clock signal to all these logic

blocks, the FPGA has five types of clock lines: BUFG, BUFR, BUFIO, BUFH, and high-performance clock. These lines satisfy "requirements of high fan out, short propagation delay,

and extremely low skew”. The clock lines are also split into categories depending on the sections

of the FPGA and components they drive. The three categories are: global, regional, and I/O lines.

Global clock lines drive all flip-flops, clock enables, and many logic inputs. Regional clock lines

drive all clock destinations in their region and two bordering regions. There are six to eighteen

regions in an FPGA. Finally, I/O clock lines are very fast and only drive I/O logic and

serializer/deserializer circuits.


3.1.2 ML605 Board

The Virtex-6 FPGA is included on the ML605 Development Board. In addition to the FPGA, the

development board includes a 512 MB DDR3 small outline dual inline memory module

(SODIMM), which our design arbitrates access to. A SODIMM is the type of board the memory

is manufactured on .The FPGA also includes 32 MB of linear BPI Flash and 8 Kb of IIC

EEPROM.

Communication mechanisms provided on the board include Ethernet, SFP transceiver

connector, GTX port, USB to UART Bridge, USB host and peripheral port, and PCI Express.

The only connection used during this project was the USB JTAG connector. It was used to

program and debug the FPGA from the host computer.

There are three clock sources on the board: a 200 MHz differential oscillator, a 66 MHz single-ended oscillator and SMA connectors for an external clock. This project utilizes the 200 MHz

oscillator. Peripherals on the ML605 board were useful for debugging purposes. The push

buttons were used to trigger sections of code execution in ChipScope such as reading and

writing from memory. Dip switches acted as configuration inputs to our code. For example,

they acted as a safety to ensure the buttons on the board were not automatically set to active

when the code was downloaded to the board. In addition, the value on the switches indicated

which system would begin writing first for debugging purposes. LEDs were used to check

functionality of sections of code as well, and for additional validation, they can be used to

indicate if an error has occurred. Although we did not use it, the ML605 board provides an LCD.

3.2 TOOLS

Now that the hardware where the design is placed has been described, the software used to manipulate the design can be described. The tools for design include those provided within the Xilinx Integrated Software Environment, and the tools used for validation include iSim and ChipScope. This section also looks at the turn-around time for both validation tools and what it means for the design process.


3.2.1 Xilinx Integrated Software Environment (ISE)

We designed the arbiter using the Verilog hardware description language in the Xilinx Integrated Software Environment (ISE). ISE is an environment in which the user can "take [their] design from design entry through Xilinx device programming". The main workbench for ISE is the ISE Project Navigator. The Project Navigator tool allows the user to effectively manage their design and call upon development processes. Figure 3.1 shows a screenshot of ISE Project Navigator:

Figure 3.1 Screen Shot of ISE Project Navigator

Figure 3.1 shows some main windows in ISE Project Navigator. On the right hand side is the window for code entry. The hierarchical view of modules in the design appears on the left, and when implementation is selected from the top, the design implementation progress is shown in the bottom window. If simulation were selected instead of implementation, there would be an option to run the design in simulation.

The main processes called upon by ISE are synthesis, implementation, and bit stream

generation. During synthesis, Xilinx Synthesis Technology (XST) is called upon. XST

synthesizes Verilog, VHDL or mixed language designs and creates netlist files. Netlist files, or

NGC files, contain the design logic and constraints.


They are saved for use in the implementation process. During synthesis, XST checks for synthesis errors (parsing) and infers macros from the code. When XST infers macros, it recognizes parts of the code that can be replaced with components from its library, such as MUXes and RAM, and encodes them in a way that is best for reduced area and/or increased speed.

Implementation is the longest process to perform on the design. The first step of

implementation is to combine the netlists and constraints into a design/NGD file. The NGD

file is the design file reduced to Xilinx primitives. This process is called translation. During the

second step, mapping, the design is fitted into the target device. This involves turning logic into

FPGA elements such as configurable logic blocks. Mapping produces a native circuit

description (NCD) file.

The third step, place and route, uses the mapped NCD file to place the design and route it subject to the timing constraints. Finally, the program file is generated and, at the finish of this step, a bit stream is ready to be downloaded to the board.

3.2.2 Synthesis and Simulation

Once the design has been synthesized, simulation of the design is possible. Simulating a design enables verification of logic functionality and timing. We used the simulation tool in ISE (iSim) to view timing and signal values. In order to utilize iSim, we created a test bench to provide the design with stimulus. Since simulation only requires design synthesis, it is a relatively fast process. The short turn-around time of simulation means we were able to iteratively test small changes to the design and, therefore, debug our code efficiently.
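As an illustration of this flow, a minimal test bench might look as follows. This is only a sketch: the module name, ports and timing are assumptions for illustration, not the project's actual test bench.

`timescale 1ns / 1ps
// Minimal test bench sketch (DUT name and ports are hypothetical).
module tb_design;
  reg  clk   = 1'b0;
  reg  reset = 1'b1;
  reg  start = 1'b0;
  wire done;

  always #2.5 clk = ~clk;   // 200 MHz clock (5 ns period)

  // device under test (hypothetical name and port list)
  arbiter dut (
    .clk  (clk),
    .reset(reset),
    .start(start),
    .done (done)
  );

  initial begin
    #100 reset = 1'b0;      // release reset after 100 ns
    #20  start = 1'b1;      // stimulate one operation
    @(posedge done);        // wait for the design to respond
    $display("done at %t", $time);
    $finish;
  end
endmodule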

3.2.3 Implementation and Hardware Validation

Once the design was working in simulation, we still needed to test the design’s

functionality in hardware. Testing the design in hardware is the most reliable validation

method. In order to download the design to the board, it first needs to be implemented in ISE.


Implementation has a much longer turn-around time than synthesis, so while functionality in

hardware ensures the design is working, simulation is the practical choice for iterative

verification.

In order to test our design in hardware, we utilized ChipScope Pro Analyzer, a GUI which allows the user to "configure [their] device, choose triggers, setup the console, and view results of the capture on the fly". In order to use ChipScope Pro, you may either insert ChipScope Pro cores into the design using the Core Generator, a tool that can be accessed in ISE Project Navigator, or utilize the Plan Ahead or Core Inserter tool, which automatically inserts cores into the design netlist for you. One method of inserting ChipScope cores into the design is by utilizing the Plan Ahead software. The Plan Ahead tool enables the creation of floorplans.

Figure 3.2 Flow Chart and Timing for Simulation and Hardware Validation


Floorplans provide an initial view of "the design's interconnect flow and logic module sizes". This helps the designer to "avoid timing, utilization, and routing congestion issues". Plan Ahead also allows the designer to create and configure I/O ports and analyze implementation results, which aids in the discovery of bottlenecks in the design.

For our project, however, we utilized Plan Ahead only for its ability to automatically insert ChipScope cores. Plan Ahead proved to be inefficient for our purposes since many times, when a change was made in the design, the whole netlist would need to be selected again. In addition, there were bugs in the software that greatly affected the turn-around time of debugging, and it crashed several times. If Plan Ahead were used for floor planning and other design tasks, it might have proved to be much more useful.

In place of Plan Ahead, we utilized the Core Generator within ISE. The ChipScope cores provided by Xilinx include ICON, ILA, VIO, ATC2, and IBERT. The designer can choose which cores to insert by using the Core Generator in ISE. The ICON core provides communication between the different cores and the computer running ChipScope. It can connect up to fifteen ILA, VIO, and ATC2 cores.

The ILA core is used to synchronously monitor internal signals. It contains logic to trigger inputs

and outputs and capture data. ILA cores allow up to sixteen trigger ports, which can be 1 to

256 bits wide. The VIO core can monitor signals like ILA, but also drive internal FPGA signals

real-time. The ATC2 core is similar to the ILA core, but was created for Agilent FPGA

dynamic probe technology. Finally, the IBERT core contains “all the logic to control, monitor,

and change transceiver parameters and perform bit error ratio tests".

The only ChipScope cores we were concerned with in this project were the ICON and ILA cores. We inserted one ChipScope ILA core and one ICON core using the Core Generator within ISE Project Navigator. The ILA core allowed us to monitor internal signals in the FPGA. Instead of inserting a VIO core, which allows inputs to and outputs from ChipScope, we used buttons to trigger the execution of write and read logic.


3.2.4 Analysis of Turn-Around Times

As introduced in sections 3.2.2 and 3.2.3, implementation takes much longer than synthesis.

Therefore, when it comes down to turn-around time, simulation is much more effective for

iterative debugging. In Figure 3.2, the phases for simulation and hardware validation can be

seen as well as the time it takes to complete each phase.

For simulation, the process starts at Verilog code, becomes synthesized logic, and, using a test bench, is run in iSim for viewing. This process takes about eight minutes total. A system's simulation run-time is much longer than if it were running on hardware, but simulation is still faster than hardware validation because it does not have to undergo implementation.

The bottleneck in our simulation process is the set up time for the DDR3 memory model which

accounts for most of the simulation time. Hardware validation starts at Verilog code, is

synthesized, implemented, and imported into ChipScope. This whole process takes about fifteen

minutes.

Most of the time spent for hardware validation is on implementation of the design. In addition,

hardware validation requires more of the user’s attention. It is more difficult and takes more

time to set up a ChipScope core than it does to create a test bench for simulation. While a

test bench (green) involves writing some simple code, a ChipScope core (orange) involves

setting up all the signals to be probed. Not only is simulation faster, but the iSim tool is easier to use than ChipScope. Figure 3.3 shows a screenshot of iSim.


Figure 3.3 iSim Screen Shot

The screen shot of iSim shows the instance names in the first column, all the signals to choose

from in the second, and the signals and their waveforms in the third and fourth columns. The

user can view any signal without having to port it out of the design and re-implement like

when using ChipScope. When adding an additional signal in iSim, only simulation needs to be

restarted. The iSim interface makes debugging much easier with collapsible signal viewing,

grouping abilities, and a large window for viewing many signals at once.

A screen shot of ChipScope is shown in Figure 3.4. In ChipScope, you can view the devices, signals, triggers, and waveforms windows. The time ChipScope is able to capture is much less than iSim. For this reason, triggers are required to execute different parts of code; this is where the buttons were utilized. If a signal could not fit into the allowable number of signal inputs or was forgotten, it would need to be added to the design and implemented all over again, a much longer turn-around time than simulation. Therefore, simulation is used for iterative debugging and functionality testing, while hardware validation is the next step to ensure design accuracy.


Figure 3.4 ChipScope Screen Shot

3.2.5 Xilinx Core Generator

One tool in ISE that was very important to our project was the CORE Generator. The CORE Generator provided us with not only the ChipScope cores but also the memory controller and FIFOs. It can be accessed within ISE Project Navigator and provides many additional functions for the designer.

The options provided for creating FIFOs, for example, include common or independent clocks, first-word fall-through, a variety of flags to indicate the amount of data in the FIFO, and the write width, read width and depth.

The different width capabilities allowed us to create asynchronous FIFOs. The memory

controller was created using the Xilinx memory interface generator (MIG). There were options

to use an AXI4, native, or user interface, which is discussed in a following section on interfacing

with the Xilinx MIG.
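For example, a FIFO generated with independent read and write clocks might be instantiated as shown below. The core name and signal names here are ours for illustration; the port names follow the FIFO Generator's usual template.

// Sketch of instantiating a generated dual-clock FIFO (names assumed).
fifo_generator_0 host_to_mem_fifo (
  .wr_clk (host_clk),    // write side runs in the host clock domain
  .rd_clk (mem_clk),     // read side runs in the memory clock domain
  .din    (host_data),   // write width as configured in the core
  .wr_en  (host_valid),
  .rd_en  (mem_ready),
  .dout   (mem_data),    // read width may differ from the write width
  .full   (fifo_full),
  .empty  (fifo_empty)
);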


CHAPTER 4

ARCHITECTURE

The SDR SDRAM Controller consists of four main modules: the SDRAM controller, control

interface, command, and data path modules. The SDRAM controller module is the top-level

module that instantiates the three lower modules and brings the whole design together. The

control interface module accepts commands and related memory addresses from the host,

decoding the command and passing the request to the command module. The command module

accepts commands and addresses from the control interface module, and generates the proper

commands to the SDRAM. The data path module handles the data path operations during

WRITEA and READA commands. The SDRAM controller module also instantiates a PLL that is

used in the CLOCK_LOCK mode to improve I/O timing. This PLL is not essential to the

operation of the SDR SDRAM Controller and can be easily removed.

Figure 4.0 Architecture of SDRAM controller


4.1 CONTROL INTERFACE MODULE

The control interface module decodes and registers commands from the host, and passes the

decoded NOP, WRITEA, READA, REFRESH, PRECHARGE, and LOAD_MODE commands,

and ADDR to the command module. The LOAD_REG1 and LOAD_REG2 commands are

decoded and used internally to load the REG1 and REG2 registers with values from ADDR.

Figure 4.1 shows the control interface module block diagram.

Figure 4.1 Control Interface Module
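The decode-and-register behavior described above can be sketched in Verilog as follows. The CMD encodings and the address width used here are assumptions for illustration; the actual command mapping is defined by Table 5.3.

// Sketch of the control interface decode/register stage.
module ctrl_if_sketch (
  input             CLK,
  input      [2:0]  CMD,
  input      [15:0] ADDR,
  output reg        NOP, READA, WRITEA, REFRESH,
  output reg [15:0] REG1, REG2, SADDR
);
  always @(posedge CLK) begin
    // registered, decoded command strobes for the command module
    NOP     <= (CMD == 3'b000);
    READA   <= (CMD == 3'b001);
    WRITEA  <= (CMD == 3'b010);
    REFRESH <= (CMD == 3'b011);
    // LOAD_REG1/LOAD_REG2 are consumed internally: the value on
    // ADDR is latched into the corresponding configuration register
    if (CMD == 3'b110) REG1 <= ADDR;
    if (CMD == 3'b111) REG2 <= ADDR;
    SADDR <= ADDR;  // registered address forwarded with the command
  end
endmodule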


The control interface module also contains a 16-bit down counter and control circuit that is used

to generate periodic refresh commands to the command module. The 16-bit down counter is

loaded with the value from REG2 and counts down to zero. The REFRESH_REQ output is

asserted when the counter reaches zero and remains asserted until the command module

acknowledges the request. The acknowledge from the command module causes the down counter

to be reloaded with REG2 and the process repeats. REG2 is a 16-bit value that represents the

period between REFRESH commands that the SDR SDRAM Controller issues. The value is set

by the equation INT(refresh_period/clock_period).

For example, if an SDRAM device that is connected to the SDR SDRAM Controller has a 64-ms,

4096-cycle refresh requirement, the device must have a REFRESH command issued to it at least

every 64 ms/4096 = 15.625 µs. If the SDRAM and SDR SDRAM Controller are clocked by a 100-MHz clock, the maximum value of REG2 is 15.625 µs/0.01 µs = 1562d.
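A minimal sketch of this down counter and request logic is shown below; REFRESH_ACK stands for the acknowledge from the command module (the exact signal name is an assumption). For the example above, REG2 would be loaded with 1562.

// Sketch of the 16-bit refresh down counter and request flag.
module refresh_timer_sketch (
  input             CLK,
  input      [15:0] REG2,         // int(refresh_period/clock_period)
  input             REFRESH_ACK,  // acknowledge from command module
  output reg        REFRESH_REQ = 1'b0
);
  reg [15:0] timer = 16'd0;

  always @(posedge CLK) begin
    if (REFRESH_ACK) begin
      timer       <= REG2;        // reload; the period restarts
      REFRESH_REQ <= 1'b0;
    end else if (timer == 16'd0)
      REFRESH_REQ <= 1'b1;        // held high until acknowledged
    else
      timer <= timer - 16'd1;
  end
endmodule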

4.2 COMMAND MODULE

The command module accepts decoded commands from the control interface module, refresh

requests from the refresh control logic, and generates the appropriate commands to the SDRAM.

The module contains a simple arbiter that arbitrates between the commands from the host

interface and the refresh requests from the refresh control logic. The refresh requests from the

refresh control logic have priority over the commands from the host interface. If a command from

the host arrives at the same time or during a hidden refresh operation, the arbiter holds off the

host by not asserting CMDACK until the hidden refresh operation is complete. If a hidden refresh

command is received while a host operation is in progress, the hidden refresh is held off until the

host operation is complete. Figure 4.2 shows the command module block diagram.


Figure 4.2 Command Module Block Diagram

After the arbiter has accepted a command from the host, the command is passed on to the command generator portion of the command module. The command module uses three shift registers to generate the appropriate timing between the commands that are issued to the SDRAM. One shift register is used to control the timing of the ACTIVATE command; a second is used to control the positioning of the READA or WRITEA commands; a third is used to time command durations, which allows the arbiter to determine if the last requested operation has been completed.
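One of these shift registers can be sketched as follows; the names and the 8-bit depth are ours for illustration. The idea is that a one-hot token loaded when a command is accepted advances one tap per clock, so a tap fires a fixed number of cycles later.

// Sketch of a command timing shift register (names assumed).
module cmd_shift_sketch (
  input        CLK,
  input        do_reada,    // command accepted by the arbiter
  input  [2:0] RCD,         // RAS-to-CAS delay in clocks (from REG1)
  output       issue_reada  // asserts RCD clocks after acceptance
);
  reg [7:0] rd_shift = 8'd0;

  always @(posedge CLK) begin
    if (do_reada)
      rd_shift <= 8'd1 << RCD;    // place the token RCD taps away
    else
      rd_shift <= rd_shift >> 1;  // token moves one tap per clock
  end

  assign issue_reada = rd_shift[0];
endmodule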

The command module also performs the multiplexing of the address to the SDRAM. The row

portion of the address is multiplexed out to the SDRAM outputs A[11:0] during the

ACTIVATE(RAS) command. The column portion is then multiplexed out to the SDRAM address

outputs during a READA (CAS) or WRITEA command.
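This multiplexing can be sketched as below; the split of the registered host address into row and column fields is an assumption for illustration, as is forcing A10 high to request auto-precharge.

// Sketch of the row/column address multiplexing (field split assumed).
module addr_mux_sketch (
  input  [21:0] SADDR,    // registered host address
  input         row_sel,  // high while issuing ACTIVATE (RAS)
  output [11:0] SA        // multiplexed SDRAM address pins A[11:0]
);
  assign SA = row_sel ? SADDR[21:10]              // row during ACTIVATE
                      : {1'b0, 1'b1, SADDR[9:0]}; // column, A10 = 1
endmodule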

The output signal OE is generated by the command module to control tristate buffers in the last

stage of the DATAIN path in the data path module.


4.3 DATA PATH MODULE

The data path module provides the SDRAM data interface to the host. Host data is accepted on

DATAIN for WRITEA commands and data is provided to the host on DATAOUT during READA

commands.

Figure 4.3 shows the data path module block diagram.

Figure 4.3 Data Path Module

The DATAIN path consists of a 2-stage pipeline to align data properly relative to the CMDACK

and the commands that are issued to the SDRAM. DATAOUT consists of a 2-stage pipeline that

registers data from the SDRAM during a READA command. The DATAOUT pipeline delay can be reduced to one or even zero registers, with the only effect being that the relationship of DATAOUT to CMDACK changes.
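Both pipelines can be sketched as simple register pairs; the 32-bit width and the DQ_IN/DQ_OUT names are assumptions for illustration.

// Sketch of the two 2-stage data pipelines (width and names assumed).
module datapath_sketch (
  input             CLK,
  input      [31:0] DATAIN,   // write data from the host
  input      [31:0] DQ_IN,    // read data captured from the SDRAM
  output reg [31:0] DQ_OUT,   // write data toward the SDRAM
  output reg [31:0] DATAOUT   // read data presented to the host
);
  reg [31:0] din_r, dout_r;

  always @(posedge CLK) begin
    din_r   <= DATAIN;  // stage 1: register host write data
    DQ_OUT  <= din_r;   // stage 2: align with the SDRAM write command
    dout_r  <= DQ_IN;   // stage 1: capture SDRAM read data
    DATAOUT <= dout_r;  // stage 2: align read data with CMDACK
  end
endmodule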


CHAPTER 5

OPERATION

The single data rate (SDR) synchronous dynamic random access memory (SDRAM) controller

provides a simplified interface to industry standard SDR SDRAM. The SDR SDRAM Controller

is available in either Verilog HDL or VHDL and is optimized for the architecture. The SDR

SDRAM Controller supports the following features:

Burst lengths of 1, 2, 4, or 8 data words.

CAS latency of 2 or 3 clock cycles.

16-bit programmable refresh counter used for automatic refresh.

2 chip selects for SDRAM devices.

Supports the NOP, READA, WRITEA, AUTO_REFRESH, PRECHARGE, ACTIVATE,

BURST_STOP, and LOAD_MR commands.

Support for full-page mode operation.

Data mask line for write operations.

PLL to increase system performance.

Figure 5.0 SDR SDRAM Controller System-Level Diagram


5.1 SDRAM OVERVIEW

SDRAM is high-speed dynamic random access memory (DRAM) with a synchronous interface.

The synchronous interface and fully-pipelined internal architecture of SDRAM allows extremely

fast data rates if used efficiently. Internally, SDRAM devices are organized in banks of memory,

which are addressed by row and column. The number of row- and column-address bits and the

number of banks depends on the size of the memory.

SDRAM is controlled by bus commands that are formed using combinations of the RASN, CASN,

and WEN signals. For instance, on a clock cycle where all three signals are high, the associated

command is a no operation (NOP). A NOP is also indicated when the chip select is not asserted.

Table 5.1 shows the standard SDRAM bus commands.

Table 5.1 SDRAM Bus Commands
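As a sketch of what Table 5.1 captures, the standard encodings over {CSN, RASN, CASN, WEN} could be written inside the controller as constants; the values below follow the common JEDEC truth table.

// Standard SDR SDRAM bus command encodings: {CSN, RASN, CASN, WEN}.
localparam CMD_NOP       = 4'b0111;  // no operation
localparam CMD_ACTIVE    = 4'b0011;  // open (activate) a row
localparam CMD_READ      = 4'b0101;  // begin a read burst
localparam CMD_WRITE     = 4'b0100;  // begin a write burst
localparam CMD_BST       = 4'b0110;  // burst terminate
localparam CMD_PRECHARGE = 4'b0010;  // close (precharge) a row
localparam CMD_AREF      = 4'b0001;  // auto refresh
localparam CMD_LMR       = 4'b0000;  // load mode register
// Any cycle with CSN high is ignored by the device, i.e. also a NOP.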

SDRAM banks must be opened before a range of addresses can be written to or read from. The

row and bank to be opened are registered coincident with the ACT command.

When a bank is accessed for a read or a write it may be necessary to close the bank and re-open it

if the row to be accessed is different than the row that is currently opened.

Closing a bank is done with the PCH command.


The primary commands used to access SDRAM are RD and WR. When the WR command is issued, the initial column address and data word are registered. When a RD command is issued, the

initial address is registered. The initial data appears on the data bus 1 to 3 clock cycles later.

This is known as CAS latency and is due to the time required to physically read the internal DRAM

core and register the data on the bus. The CAS latency depends on the speed of the SDRAM and

the frequency of the memory clock. In general, the faster the clock, the more cycles of CAS latency

are required. After the initial RD or WR command, sequential reads and writes continue until the

burst length is reached or a BT command is issued. SDRAM memory devices support burst lengths

of 1, 2, 4, or 8 data cycles. The ARF is issued periodically to ensure data retention. This function is

performed by the SDR SDRAM Controller and is transparent to the user.

The LMR is used to configure the SDRAM mode register which stores the CAS latency, burst

length, burst type, and write burst mode. Consult the SDRAM specification for additional details.

SDRAM comes in dual in-line memory modules (DIMMs), small-outline DIMMs (SO-DIMMs)

and chips. To reduce pin count SDRAM row and column addresses are multiplexed on the same

pins. SDRAM often includes more than one bank of memory internally and DIMMS may require

multiple chip selects.

5.2 FUNCTIONAL DESCRIPTION

Table 5.2 shows the SDR SDRAM Controller interface signals. All signals are synchronous to the

system clock and outputs are registered at the SDR SDRAM Controller’s outputs.


Table 5.2 Interface Signals

5.3 SDRAM CONTROLLER COMMAND INTERFACE

The SDR SDRAM Controller provides a synchronous command interface to the SDRAM and several control registers. Table 5.3 shows the commands, which are described in the following sections. The following rules apply to the commands, with reference to Table 5.2:

All commands, except NOP, are driven by the user onto CMD[2:0]; ADDR and DATAIN

are set appropriately for the requested command. The controller registers the command on

the next rising clock edge.


To acknowledge the command the controller asserts CMDACK for one clock period.

For READA or WRITEA commands, the user should start receiving or writing data on

DATAOUT and DATAIN.

The user must drive NOP onto CMD[2:0] by the next rising clock edge after CMDACK is

asserted.

Table 5.3 Interface Commands
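Inside a test bench, these rules could be followed with a small task such as the one sketched below; the port widths and the NOP encoding are assumptions, while CMD, ADDR, CMDACK and CLK are the interface signals of Table 5.2.

// Test bench task sketching the command handshake (widths assumed).
task issue_cmd(input [2:0] command, input [21:0] address);
  begin
    @(posedge CLK);
    CMD  <= command;    // drive the command with its address
    ADDR <= address;
    @(posedge CMDACK);  // controller acknowledges for one clock
    @(posedge CLK);
    CMD  <= 3'b000;     // return to NOP (encoding assumed)
  end
endtask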

5.3.1 NOP Command

NOP is a no operation command to the controller. When NOP is detected by the controller, it

performs a NOP in the following clock cycle. A NOP must be issued the following clock cycle

after the controller has acknowledged a command.

The NOP command has no effect on SDRAM accesses that are already in progress.


5.3.2 READA Command

Figure 5.1 Timing diagram for a READA command

The READA command instructs the SDR SDRAM Controller to perform a burst read with auto-precharge to the SDRAM at the memory address specified by ADDR. The SDR SDRAM Controller issues an ACTIVATE command to the SDRAM followed by a READA command. The read burst data first appears on DATAOUT (RCD + CL + 2) clocks after the SDR SDRAM Controller asserts CMDACK. During a READA command the user must keep DM low.

When the controller is configured for full-page mode, the READA command becomes READ (read without auto-precharge). Figure 5.1 shows an example timing diagram for a READA

command.


The following sequence describes the general operation of the READA command:

The user asserts READA, ADDR and DM.

The SDR SDRAM Controller asserts CMDACK to acknowledge the command and

simultaneously starts issuing commands to the SDRAM devices.

One clock after CMDACK is asserted, the user must assert NOP.

The controller presents the first read burst value on DATAOUT; the remainder of the read burst follows on every clock cycle.

5.3.3 WRITEA Command

Figure 5.2 Timing diagram for a WRITEA command

The WRITEA command instructs the SDR SDRAM Controller to perform a burst write with auto-precharge to the SDRAM at the memory address specified by ADDR.


The SDR SDRAM Controller will issue an ACTIVATE command to the SDRAM followed by a

WRITEA command. The first data value in the burst sequence must be presented with the

WRITEA and ADDR address. The host must start clocking data along with the desired DM values

into the SDR SDRAM Controller (tRCD – 2) clocks after the SDR SDRAM Controller has

acknowledged the WRITEA command.

See an SDRAM data sheet for how to use the data mask lines DM/DQM. When the SDR SDRAM Controller is in the full-page mode, WRITEA becomes WRITE (write without auto-precharge). Figure 5.2 shows an example timing diagram for a WRITEA command. The following sequence

describes the general operation of a WRITEA command:

The user asserts WRITEA, ADDR, the first write data value on DATAIN, and the desired

data mask value on DM, with reference to Tables 5.2 and 5.3.

The SDR SDRAM Controller asserts CMDACK to acknowledge the command and

simultaneously starts issuing commands to the SDRAM devices.

One clock after CMDACK is asserted, the user asserts NOP on CMD.

The user clocks data and data mask values into the SDR SDRAM Controller through

DATAIN and DM.

5.3.4 REFRESH Command

The REFRESH command instructs the SDR SDRAM Controller to perform an ARF command to

the SDRAM. The SDR SDRAM Controller acknowledges the REFRESH command with

CMDACK. Figure 5.3 shows an example timing diagram of the REFRESH command.


Figure 5.3 Timing diagram for a REFRESH command

The following sequence describes the general operation of a REFRESH command:

The user asserts REFRESH on the CMD input.

The SDR SDRAM Controller asserts CMDACK to acknowledge the command and

simultaneously starts issuing commands to the SDRAM devices.

The user asserts NOP on CMD.


5.3.5 PRECHARGE Command

Figure 5.4 Timing diagram for a PRECHARGE command

The PRECHARGE command instructs the SDR SDRAM Controller to issue a PCH (precharge) command to the SDRAM. The SDR SDRAM Controller acknowledges the command with CMDACK. The PCH command is also used to generate a burst stop to the SDRAM; terminating a burst with PRECHARGE is only supported in full-page mode.

Note that the SDR SDRAM Controller adds a latency of 4 clocks from when the host issues the command to when the SDRAM sees the PRECHARGE command. A PRECHARGE that terminates a full-page burst must therefore be asserted (4 + CL – 1) clocks before the desired end of the burst (the CL – 1 requirement is imposed by the SDRAM devices). For example, to stop a full-page read burst after 100 cycles with a CAS latency of 3, the PRECHARGE command must be issued 100 – (4 + 3 – 1) = 94 clocks into the burst.
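
This burst-stop arithmetic can be captured in a few constants. The sketch below merely restates the rule above with example numbers (a 100-cycle burst and CL = 3); all names and values are illustrative.

    // Illustrative burst-stop calculation for a full-page read.
    localparam integer CL           = 3;    // CAS latency (example)
    localparam integer CTRL_LATENCY = 4;    // clocks added by the controller
    localparam integer BURST_LEN    = 100;  // desired burst length (example)

    // Clock, counted from the start of the burst, on which the host
    // must assert PRECHARGE: BURST_LEN - (CTRL_LATENCY + CL - 1) = 94.
    localparam integer PCH_CYCLE = BURST_LEN - (CTRL_LATENCY + CL - 1);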


Figure 5.4 shows an example timing diagram of the PRECHARGE command. The following sequence describes the general operation of a PRECHARGE command:

1. The user asserts PRECHARGE on CMD.
2. The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices.
3. The user asserts NOP on CMD.

5.3.6 LOAD_MODE Command

The LOAD_MODE command instructs the SDR SDRAM Controller to issue an LMR (load mode register) command to the SDRAM. The value to be written into the SDRAM mode register must be present on ADDR[11:0] together with the LOAD_MODE command; ADDR[11:0] is mapped directly to the SDRAM pins A11–A0 when the SDR SDRAM Controller issues the LMR to the SDRAM. Figure 5.5 shows an example timing diagram.

The following sequence describes the general operation of a LOAD_MODE command:

1. The user asserts LOAD_MODE on CMD.
2. The SDR SDRAM Controller asserts CMDACK to acknowledge the command and simultaneously starts issuing commands to the SDRAM devices.
3. One clock after the SDR SDRAM Controller asserts CMDACK, the user asserts NOP on CMD.
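
For reference, a typical SDR SDRAM mode-register value could be assembled as below. The field layout shown is the common JEDEC one (A2–A0 burst length, A3 burst type, A6–A4 CAS latency, A9 write burst mode); it is given for illustration only, and the device data sheet remains authoritative.

    // Illustrative mode-register value for a typical SDR SDRAM; this
    // word would be driven on ADDR[11:0] with the LOAD_MODE command.
    localparam [2:0] MR_BURST_LEN  = 3'b011;  // burst length 8
    localparam       MR_BURST_TYPE = 1'b0;    // sequential bursts
    localparam [2:0] MR_CAS_LAT    = 3'b011;  // CAS latency 3
    localparam [1:0] MR_OP_MODE    = 2'b00;   // standard operation
    localparam       MR_WR_BURST   = 1'b0;    // burst read and burst write

    localparam [11:0] MODE_REG_VAL = {2'b00, MR_WR_BURST, MR_OP_MODE,
                                      MR_CAS_LAT, MR_BURST_TYPE,
                                      MR_BURST_LEN};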


Figure 5.5 Timing diagram for a LOAD_MODE command

5.3.7 LOAD_REG1 Command

Table 5.4 REG1 Bit Definitions


The LOAD_MODE command's counterpart for the controller itself, LOAD_REG1, loads the internal configuration register REG1, whose bit fields are defined in Table 5.4.

CL is the CAS latency of the SDRAM in clock periods and depends on the memory device speed grade and clock frequency; consult the SDRAM data sheet for appropriate settings. CL must be set to the same value as the CAS latency programmed into the SDRAM devices.

RCD is the RAS-to-CAS delay in clock periods and depends on the SDRAM speed grade and clock frequency: RCD = INT(tRCD/clock_period), where tRCD is the value from the SDRAM data sheet and clock_period is the period of the common SDR SDRAM Controller and SDRAM clock.

RRD is the refresh-to-RAS delay in clock periods and likewise depends on the SDRAM speed grade and clock frequency: RRD = INT(tRRD/clock_period), where tRRD is the value from the SDRAM data sheet.

PM is the page mode bit: if PM = 0, the SDR SDRAM Controller operates in non-page mode; if PM = 1, it operates in page mode (see Section “Full-Page Mode Operation” for more information). BL is the burst length for which the SDRAM devices have been configured.
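
As a rough illustration of how these fields are derived from data-sheet numbers (the actual REG1 bit packing follows Table 5.4 and is not reproduced here), with a 100 MHz clock and example tRCD/tRRD values:

    // Illustrative REG1 timing-field derivation; the clock and
    // data-sheet values are examples, and the packing into REG1
    // itself is defined by Table 5.4.
    localparam integer CLK_PERIOD_NS = 10;  // 100 MHz controller clock
    localparam integer T_RCD_NS      = 20;  // example tRCD from data sheet
    localparam integer T_RRD_NS      = 15;  // example tRRD from data sheet

    localparam integer RCD = T_RCD_NS / CLK_PERIOD_NS;  // INT(20/10) = 2
    localparam integer RRD = T_RRD_NS / CLK_PERIOD_NS;  // INT(15/10) = 1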

5.3.8 LOAD_REG2 Command

The LOAD_REG2 command instructs the SDR SDRAM Controller to load the internal configuration register REG2. REG2 is a 16-bit value that sets the period between the REFRESH commands the SDR SDRAM Controller issues, computed as INT(refresh_period/clock_period). For example, if an SDRAM device connected to the SDR SDRAM Controller has a 64-ms, 4096-cycle refresh requirement, the device must receive a REFRESH command at least every 64 ms/4096 = 15.625 µs. If the SDRAM and SDR SDRAM Controller are clocked at 100 MHz, the maximum value of REG2 is 15.625 µs/0.01 µs = 1562 (decimal). The value to be written into REG2 must be presented on the ADDR input simultaneously with the assertion of the LOAD_REG2 command.
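
The worked example above translates directly into constants. The sketch below assumes the same 64-ms, 4096-cycle device and 100 MHz clock; the names are illustrative.

    // Illustrative REG2 computation (64 ms / 4096 rows at 100 MHz).
    localparam integer CLK_PERIOD_NS = 10;                 // 100 MHz
    localparam integer REFRESH_NS    = 64_000_000 / 4096;  // 15625 ns
    localparam [15:0]  REG2_VAL      = REFRESH_NS / CLK_PERIOD_NS;  // 1562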


CHAPTER 6

ELEMENTS OF MEMORY BANK

6.1 DECODER

A decoder is a device that performs the reverse operation of an encoder, undoing the encoding so that the original information can be retrieved; the same method used to encode is usually just reversed to decode. It is a combinational circuit that converts binary information from n input lines to a maximum of 2^n unique output lines.

6.1.1 A 2-to-4 line single-bit decoder

In digital electronics, a decoder can take the form of a multiple-input, multiple-output logic circuit that converts coded inputs into coded outputs, where the input and output codes are different, e.g. n-to-2^n and binary-coded-decimal decoders. The enable inputs must be asserted for the decoder to function; otherwise its outputs assume a single "disabled" output code word. Decoding is necessary in applications such as data multiplexing, 7-segment displays and memory address decoding.

The simplest example of a decoder circuit is an AND gate, because the output of an AND gate is high (1) only when all its inputs are high; such an output is called an active-high output. If a NAND gate is used instead, the output is low (0) only when all its inputs are high; such an output is called an active-low output. Slightly more complex are the n-to-2^n binary decoders: combinational circuits that convert binary information from n coded inputs to a maximum of 2^n unique outputs. We say a maximum of 2^n outputs because, if the n-bit coded information has unused bit combinations, the decoder may have fewer than 2^n outputs.


Common sizes are the 2-to-4, 3-to-8 and 4-to-16 decoders; for example, a 3-to-8 decoder can be formed from two 2-to-4 decoders (with enable signals).
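
As a point of reference for the RTL view in Figure 6.1, a minimal Verilog sketch of the 2-to-4 building block with an enable input might look as follows (signal and module names are illustrative, not the project's):

    // Minimal 2-to-4 line decoder with enable (illustrative sketch).
    // When en is low, all outputs assume the "disabled" code 0000.
    module decoder_2to4 (
      input        en,   // enable input
      input  [1:0] a,    // 2-bit coded input
      output [3:0] y     // one-hot decoded output
    );
      assign y = en ? (4'b0001 << a) : 4'b0000;
    endmodule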

Figure 6.1 RTL of decoder

Similarly, a 4-to-16 decoder can be formed by combining two 3-to-8 decoders. In this type of design, the enable inputs of both 3-to-8 decoders originate from a 4th input, which acts as a selector between the two decoders: it enables either the top or the bottom decoder, producing outputs D(0) through D(7) from the first decoder and D(8) through D(15) from the second.

Figure 6.2 Simulation of decoder


A decoder that contains enable inputs is also known as a decoder-demultiplexer. Thus, a 4-to-16 decoder is produced by adding a 4th input shared between both 3-to-8 decoders, yielding 16 outputs.

6.2 DEMUX

The data distributor, known more commonly as a demultiplexer or "demux" for short, is the exact opposite of the multiplexer described in Section 6.4. The demultiplexer converts a serial data signal at its input into parallel data at its outputs.

Figure 6.3 RTL of DEMUX


The demultiplexer takes a single input data line and switches it to any one of a number of individual output lines, one at a time.
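
A minimal Verilog sketch of a 1-to-4 demultiplexer, analogous to the RTL in Figure 6.3, is shown below (illustrative names, not the project's code):

    // Minimal 1-to-4 demultiplexer (illustrative sketch): the select
    // lines route the single data input to exactly one output line.
    module demux_1to4 (
      input        d,    // single data input
      input  [1:0] sel,  // select lines
      output [3:0] y     // individual output lines
    );
      assign y = d ? (4'b0001 << sel) : 4'b0000;
    endmodule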

Figure 6.4 Simulation of DEMUX

6.3 RAM

Random-access memory (RAM) is a form of computer data storage. A random-access memory device allows data items to be read and written in roughly the same amount of time regardless of the order in which the data items are accessed. In contrast, with other direct-access data storage media such as hard disks, CD-RWs, DVD-RWs and the older drum memory, the time required to read and write data items varies significantly depending on their physical locations on the recording medium, owing to mechanical limitations such as media rotation speeds and arm movement delays.

Today, random-access memory takes the form of integrated circuits. Strictly speaking, modern types of DRAM are not random access, as data is read in bursts, although the name DRAM/RAM has stuck. However, many types of SRAM are still random access even in the strict sense.


RAM is normally associated with volatile types of memory (such as DRAM memory modules), where stored information is lost if the power is removed, although many efforts have been made to develop non-volatile RAM chips. Other types of non-volatile memory exist that allow random access for read operations but either do not allow write operations or place limitations on them; these include most types of ROM and a type of flash memory called NOR flash.

6.3.1 TYPES OF RAM

The two main forms of modern RAM are static RAM (SRAM) and dynamic RAM (DRAM). In SRAM, a bit of data is stored using the state of a flip-flop. This form of RAM is more expensive to produce, but is generally faster and requires less power than DRAM and, in modern computers, is often used as cache memory for the CPU. DRAM stores a bit of data using a transistor and capacitor pair, which together comprise a memory cell. The capacitor holds a high or low charge (1 or 0, respectively), and the transistor acts as a switch that lets the control circuitry on the chip read the capacitor's state of charge or change it. As this form of memory is less expensive to produce than static RAM, it is the predominant form of computer memory used in modern computers.
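
For context alongside the RTL view in Figure 6.5, a minimal synchronous single-port RAM of the kind synthesis tools infer can be sketched as follows; the 8-bit width and 16-word depth are arbitrary example parameters, not the project's:

    // Minimal synchronous single-port RAM (illustrative sketch).
    module ram_sp #(
      parameter DATA_W = 8,   // example data width
      parameter ADDR_W = 4    // example address width (16 words)
    ) (
      input                   clk,
      input                   we,    // write enable
      input      [ADDR_W-1:0] addr,
      input      [DATA_W-1:0] din,
      output reg [DATA_W-1:0] dout
    );
      reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

      always @(posedge clk) begin
        if (we)
          mem[addr] <= din;  // synchronous write
        dout <= mem[addr];   // synchronous read (returns old data)
      end
    endmodule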

Figure 6.5 RTL of RAM


Both static and dynamic RAM are considered volatile, as their state is lost or reset when power is removed from the system. By contrast, read-only memory (ROM) stores data by permanently enabling or disabling selected transistors, such that the memory cannot be altered. Writeable variants of ROM (such as EEPROM and flash memory) share properties of both ROM and RAM, enabling data to persist without power and to be updated without special equipment. These persistent forms of semiconductor ROM include USB flash drives and memory cards for cameras and portable devices. ECC memory (which can be either SRAM or DRAM) includes special circuitry to detect and/or correct random faults (memory errors) in the stored data, using parity bits or an error-correcting code.

In general, the term RAM refers solely to solid-state memory devices (either DRAM or SRAM), and more specifically to the main memory of most computers. In optical storage, the term DVD-RAM is somewhat of a misnomer since, unlike CD-RW or DVD-RW, it does not need to be erased before reuse. Nevertheless, a DVD-RAM behaves much like a hard disc drive, if somewhat more slowly.

Figure 6.6 Simulation of RAM


6.4 MUX

In electronics, a multiplexer is a device that selects one of several analog or digital input signals and forwards the selected input onto a single line. A multiplexer with 2^n inputs has n select lines, which are used to choose which input line to send to the output. Multiplexers are mainly used to increase the amount of data that can be sent over a network within a given amount of time and bandwidth. A multiplexer is also called a data selector.
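
A minimal 4-to-1 multiplexer in Verilog, matching the behaviour described above, might be sketched as follows (illustrative names, not the project's code):

    // Minimal 4-to-1 multiplexer (illustrative sketch): sel chooses
    // which of the four data inputs drives the single output line.
    module mux_4to1 (
      input  [3:0] d,    // four data inputs
      input  [1:0] sel,  // select lines
      output       y     // selected output
    );
      assign y = d[sel];
    endmodule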

Figure 6.7 RTL of MUX

An electronic multiplexer can be considered a multiple-input, single-output switch, and a demultiplexer a single-input, multiple-output switch. The schematic symbol for a multiplexer is an isosceles trapezoid with the longer parallel side containing the input pins and the shorter parallel side containing the output pin.


A 2-to-1 multiplexer is often drawn alongside an equivalent switch: the wire connects the desired input to the output. An electronic multiplexer makes it possible for several signals to share one device or resource, for example one A/D converter or one communication line, instead of having one device per input signal.

Figure 6.8 Simulation of MUX

6.5 BUFFER

A buffer amplifier (often simply called a buffer) is one that provides electrical impedance transformation from one circuit to another. Two main types of buffer exist: the voltage buffer and the current buffer.


6.5.1 VOLTAGE BUFFER

A voltage buffer amplifier is used to transfer a voltage from a first circuit, having a high output impedance level, to a second circuit with a low input impedance level. The interposed buffer amplifier prevents the second circuit from loading the first circuit unacceptably and interfering with its desired operation. In the ideal voltage buffer, the input resistance is infinite and the output resistance is zero (the impedance of an ideal voltage source is zero). Other properties of the ideal buffer are perfect linearity, regardless of signal amplitude, and instant output response, regardless of the speed of the input signal.

If the voltage is transferred unchanged (the voltage gain Av is 1), the amplifier is a unity-gain buffer, also known as a voltage follower because the output voltage follows or tracks the input voltage. Although the voltage gain of a voltage buffer amplifier may be (approximately) unity, it usually provides considerable current gain and thus power gain. However, it is commonplace to say that it has a gain of 1 (or the equivalent 0 dB), referring to the voltage gain.

As an example, consider a Thévenin source (voltage VA, series resistance RA) driving a resistor load RL. Because of voltage division (also referred to as "loading"), the voltage across the load is only VA RL / ( RL + RA ). However, if the Thévenin source drives a unity-gain buffer, the voltage input to the amplifier is VA, with no voltage division, because the amplifier input resistance is infinite. At the output, the dependent voltage source delivers voltage Av VA = VA to the load, again without voltage division, because the output resistance of the buffer is zero. A Thévenin equivalent circuit of the combined original Thévenin source and the buffer is an ideal voltage source VA with zero Thévenin resistance.
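
The buffer simulated in this project is a digital element; as a generic illustration only (not necessarily the implementation behind Figure 6.9), a simple tri-state buffer in Verilog can be written as:

    // Generic tri-state buffer (illustrative sketch): when en is high
    // the input is passed through; otherwise the output floats (high-Z).
    module tri_buffer (
      input  en,  // output enable
      input  d,   // data input
      output y    // buffered output
    );
      assign y = en ? d : 1'bz;
    endmodule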

Figure 6.9 RTL of buffer


6.5.2 CURRENT BUFFER

Typically, a current buffer amplifier is used to transfer a current from a first circuit, having a low output impedance level, to a second circuit with a high input impedance level. The interposed buffer amplifier prevents the second circuit from loading the first circuit unacceptably and interfering with its desired operation.

In the ideal current buffer, the input impedance is zero and the output impedance is infinite (the impedance of an ideal current source is infinite). Again, the other properties of the ideal buffer are perfect linearity, regardless of signal amplitude, and instant output response, regardless of the speed of the input signal.

For a current buffer, if the current is transferred unchanged (the current gain βi is 1), the amplifier is again a unity-gain buffer, this time known as a current follower because the output current follows or tracks the input current.

Figure 6.10 Simulation of buffer


As an example, consider a Norton source (current IA, parallel resistance RA) driving a resistor load RL. Because of current division (also referred to as "loading"), the current delivered to the load is only IA RA / ( RL + RA ). However, if the Norton source drives a unity-gain current buffer, the current input to the amplifier is IA, with no current division, because the amplifier input resistance is zero. At the output, the dependent current source delivers current βi IA = IA to the load, again without current division, because the output resistance of the buffer is infinite. A Norton equivalent circuit of the combined original Norton source and the buffer is an ideal current source IA with infinite Norton resistance.
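
As an illustrative set of numbers (assumed, not taken from the report): with IA = 1 mA and RA = RL = 1 kΩ, the unbuffered load receives only IA RA / ( RL + RA ) = 0.5 mA, whereas with an ideal unity-gain current buffer the full 1 mA reaches the load.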

6.6 MEMORY BANK

A memory bank is a logical unit of storage in electronics, and is hardware dependent. In a computer, the memory banks may be determined by the memory access controller together with the physical organization of the hardware memory slots.

In a typical synchronous dynamic random-access memory (SDRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM), a bank consists of multiple rows and columns of storage units and is usually spread out across several chips. In a single read or write operation only one bank is accessed; therefore, the number of bits in a column or a row, per bank and per chip, multiplied by the number of chips in a bank, equals the memory bus width in bits (for a single channel). The size of a bank is further determined by the number of bits in a column and a row, per chip, multiplied by the number of chips in a bank.
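
Tying this back to the building blocks above, a highly simplified 4-bank arrangement can be sketched in Verilog as follows; it reuses the illustrative ram_sp module from Section 6.3, and none of the names are the project's actual code:

    // Highly simplified 4-bank memory with bank selection (sketch).
    // A decoder turns the bank-select bits into per-bank write
    // enables, and a multiplexer returns the selected bank's data.
    module memory_bank_sketch #(
      parameter DATA_W = 8,
      parameter ADDR_W = 4
    ) (
      input               clk,
      input               we,
      input         [1:0] bank_sel,  // selects one of four banks
      input  [ADDR_W-1:0] addr,
      input  [DATA_W-1:0] din,
      output [DATA_W-1:0] dout
    );
      // decoder: one-hot write enable per bank
      wire [3:0] bank_we = we ? (4'b0001 << bank_sel) : 4'b0000;

      wire [DATA_W-1:0] dout0, dout1, dout2, dout3;

      ram_sp #(DATA_W, ADDR_W) bank0 (clk, bank_we[0], addr, din, dout0);
      ram_sp #(DATA_W, ADDR_W) bank1 (clk, bank_we[1], addr, din, dout1);
      ram_sp #(DATA_W, ADDR_W) bank2 (clk, bank_we[2], addr, din, dout2);
      ram_sp #(DATA_W, ADDR_W) bank3 (clk, bank_we[3], addr, din, dout3);

      // output multiplexer: route the selected bank's read data out
      assign dout = (bank_sel == 2'd0) ? dout0 :
                    (bank_sel == 2'd1) ? dout1 :
                    (bank_sel == 2'd2) ? dout2 : dout3;
    endmodule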


Figure 6.11 RTL of memory bank

Some computers have several identical memory banks of RAM and use bank switching to switch between them. Harvard-architecture computers have (at least) two very different banks of memory, one for program storage and one for data storage.


Figure 6.12 Simulation of memory bank


CHAPTER 7

RESULTS AND CONCLUSIONS

7.1 POWER CONSUMED WHEN ALL 8 BANKS ARE ON

7.1.1 Project

Table 7.1 Project

7.1.2 Device

Table 7.2 Device


7.1.3 Environment

Table 7.3 Environment

7.1.4 Default Activity

Table 7.4 Default Activity


7.1.5 On-Chip Power Summary

Table 7.5 On-Chip Power Summary

7.1.6 Thermal Summary

Table 7.6 Thermal Summary

7.1.7 Power Supply Summary

Table 7.7 Power Supply Summary


Table 7.8 Power Supply Current

7.1.8 Confidence Level

Table 7.9 Confidence Level


7.1.9 By Hierarchy

Table 7.10 By Hierarchy


7.2 POWER CONSUMED WHEN ONLY ONE MEMORY BANK IS IN USE

7.2.1 Project

Table 7.11 Project

7.2.2 Device

Table 7.12 Device


7.2.3 Environment

Table 7.13 Environment

7.2.4 Default Activity Rates

Table 7.14 Default Activity


7.2.5 On-Chip Power Summary

Table 7.15 On-Chip Power Summary

7.2.6 Thermal Summary

Table 7.16 Thermal Summary

7.2.7 Power Supply Summary

Table 7.17 Power Supply Summary


Table 7.18 Power Supply Current

7.2.8 Confidence Level

Table 7.19 Confidence Level


7.2.9 By Hierarchy

Table 7.20 By Hierarchy

7.3 CONCLUSION

This project addresses the problem of finding a memory map for firm real-time workloads in the context of SDRAM memory controllers. Existing controllers either use a static memory map or provide only limited configurability. We treat the number of banks over which requests are interleaved as a flexible configuration parameter, whereas previous work considers it a fixed part of the controller architecture. This degree of freedom is used to optimize the memory configuration for the mix of applications and their requirements, which benefits the worst-case performance in terms of bandwidth, latency and power.


CHAPTER 8

FUTURE SCOPE

The advantage of the newer DDR generations over SDR SDRAM, DDR1 SDRAM and DDR2 SDRAM is that they synchronize the data transfer and double the transfer rate of the preceding generation, while keeping production cost low. The present controller has been successfully designed in Verilog HDL and synthesized using the Xilinx tools. Natural directions for extending this work are:

1. DDR4 SDRAM, the 4th generation of DDR SDRAM.
2. DDR3 SDRAM, which improves on DDR SDRAM by using differential signalling and lower voltages to deliver significant performance advantages over DDR SDRAM.
3. The DDR3 SDRAM standards, which are still being developed and improved.

