FIFOs with BRAMs (FIFO_3), EE 560

Post on 14-Jun-2020


By

Gandhi Puvvada

This is a continuation of the EE201L lecture on Synchronous 1-clock and 2-clock FIFOs (the FIFO_1 lecture).

To follow this lecture, it is not necessary to go through the FIFO_2 lecture, which deals with width and depth expansion of FIFOs.

However, please do go through the GRAY_1 lecture, which covers Gray code counter design and Gray-to-binary code conversion.

FIFOs with BRAMs (FIFO_3)

EE 560

CONTENTS

1. Statement of the problem

2. Quick review of the FIFO_1 2-clock FIFO

3. BRAMs: info from Xilinx

4. Impact of I_Reg & O_Reg on when to increment WP & RP and when to convey them to the other side.

5. Consumer design. When (and how often) you can activate REN

6. FWFT

7. Implementation details

In both the FIFO_1 and FIFO_2 lectures (and slide-sets) we have assumed (for convenience) that the FIFO storage is made up of an ARRAY of REGISTERS.

We all know that, for large FIFOs, we can't afford an Array of Registers. It has to be a memory structure.

It has to be an SSRAM (or a BRAM in the case of an FPGA).

In this lecture, we discuss what we need to do to account for (a) the IReg (Input Register) in the case of a Flow-Through BRAM and (b) the IReg (Input Register) and the OReg (Output Register) in the case of a Pipelined BRAM.

Question #1 of EE560 Final Exam of Summer 2012 is related to the current discussion. So we will review the question and the solution as part of this lecture.

First let us review the 2-clock FIFO from the FIFO_1 lecture.

FULL and EMPTY:

In a single-clock FIFO we have two options. Option 1: use two n-bit counters (WP and RP, for a FIFO that is 2**n locations deep) plus a FF that remembers whether the FIFO was most recently running around the AF (Almost Full) condition or the AE (Almost Empty) condition. The FF resolves the ambiguity of WP = RP, which can mean either FULL or EMPTY. Option 2: use two (n+1)-bit counters. Then (WP = RP) represents the EMPTY condition only, and (WP-RP) mod 2**(n+1) = 2**n represents the FULL condition.

In a two-clock FIFO, the (n+1)-bit counters option is the only option. Otherwise we can cause deadlock.
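The (n+1)-bit counter option can be checked with a few lines of arithmetic. The sketch below is a Python illustration, not HDL; the names N, depth, is_empty, is_full, and mem_addr are made up for this note, and N = 4 is chosen only to match a 16-location FIFO.

```python
N = 4                      # address bits (assumed): the FIFO holds 2**N = 16 items
DEPTH = 1 << N             # 2**n locations
PTR_MOD = 1 << (N + 1)     # WP and RP are (n+1)-bit counters


def depth(wp: int, rp: int) -> int:
    """Number of stored items: (WP - RP) mod 2**(n+1)."""
    return (wp - rp) % PTR_MOD


def is_empty(wp: int, rp: int) -> bool:
    """EMPTY only when the two (n+1)-bit pointers are equal."""
    return depth(wp, rp) == 0


def is_full(wp: int, rp: int) -> bool:
    """FULL when the pointers differ by exactly 2**n."""
    return depth(wp, rp) == DEPTH


def mem_addr(ptr: int) -> int:
    """Only the lower n bits of a pointer address the storage."""
    return ptr % DEPTH
```

With these definitions, WP = 16 and RP = 0 address the same location but are reported as FULL, not EMPTY, which is exactly the ambiguity the extra bit removes.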

From Question #1 of EE560 Final Exam of Summer 2012

16-location Register Array

Writes complete at the end of the ____________ (current/next) clock. Reads are _________________ (synchronous/asynchronous).

Compare the three: (1) a Register Array, (2) a Flow-Through BRAM, and (3) a Pipelined BRAM.

SOLUTION

Register Array: Glitches on WENQ are OK. Writes complete at the end of the current clock. Reads are asynchronous and continuous. Reads incur only a mux delay.

Flow-Through BRAM: Glitches on WENQ are OK. Writes complete at the end of the next clock. Reads are synchronous and incur one clock delay.

Pipelined BRAM: Glitches on WENQ are OK. Writes complete at the end of the next clock. Reads are synchronous and incur two clock delays.
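One way to picture the three read behaviours in this comparison is a toy timing model. The sketch below is a Python illustration, not HDL; READ_LATENCY and read_trace are names made up for this note, and the latencies of 0, 1, and 2 clocks are taken from the comparison itself.

```python
# Read latency in clocks, as stated in the comparison (mux delay ignored).
READ_LATENCY = {
    "register_array": 0,      # asynchronous, continuous read
    "flow_through_bram": 1,   # address captured by IReg first
    "pipelined_bram": 2,      # IReg followed by OReg
}


def read_trace(kind, mem, addr_per_clock):
    """Return the data visible at the read output during each clock.

    addr_per_clock[t] is the read address presented during clock t.
    None means that no requested data has reached the output yet.
    """
    latency = READ_LATENCY[kind]
    out = []
    for t in range(len(addr_per_clock)):
        src = t - latency     # clock in which this data was requested
        out.append(mem[addr_per_clock[src]] if src >= 0 else None)
    return out
```

For a memory holding 10, 11, 12, 13 and addresses 0, 1, 2, 3 presented on consecutive clocks, the register array shows 10 in the first clock, the Flow-Through BRAM in the second, and the Pipelined BRAM in the third.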

Spartan-6 FPGA Block RAM Resources User Guide: http://www.xilinx.com/support/documentation/user_guides/ug383.pdf

For simplicity (and to be portable to other FPGAs and ASICs), in our designs we assume that no such latch is available. So we use the default Write First mode of the Xilinx BRAM which keeps the latch transparent.

For simplicity and to make our design portable to other FPGAs and ASICs, we ignore the availability of this output register enable control. Moreover, we do not instantiate the BRAM. We code in such a way that a BRAM is inferred with no such control.

Flow-Through BRAM or Pipelined BRAM

Writes complete at the end of the next clock.

Since the actual writing happens inside the memory, driven by internally generated timing pulses during the next (subsequent) clock, it is proper to wait until the very end of that next clock.

So is it OK to increment the WP immediately and convey the incremented WP to the consumer immediately?

Mr. Bruin: I would increment WP at the end of that subsequent clock.

Miss Bruin: I do not think that we need to delay incrementation of the WP as two clocks are lost in synchronization of the WP to form WPss anyways. So it is never a premature signalling.

Mr. Trojan: Both of you are wrong.


Our design should allow the producer to write data on consecutive clocks if it has data ready and there is space in the FIFO. So WP should be updated on every clock (not once in two clocks). So Mr. Bruin is wrong.

The RCLK may be much faster than the WCLK. Say RCLK is running at 100 MHz and WCLK at 1 MHz. Then two clocks of RCLK take only 20 ns, a very small fraction of one WCLK period of 1 us (1000 ns). So Miss Bruin is wrong to think that there is no premature signalling.

Mr. Trojan's SOLUTION: Delay the conveyance of the WP by 1 clock of the WCLK by using a register as a delay register clocked by the WCLK.
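The effect of the delay register can be checked with a small cycle-by-cycle model. The sketch below is a Python illustration, not HDL; the class WriteSide and its fields are names made up for this note, and pointer wrap-around and the FULL check are left out to keep it short.

```python
class WriteSide:
    """Toy model of the write side of a BRAM-based FIFO (assumed structure)."""

    def __init__(self):
        self.wp = 0           # incremented immediately, addresses the next write
        self.wp_delayed = 0   # delay register: the value conveyed to the consumer
        self.ireg = None      # (address, data) captured by the BRAM input register
        self.array = {}       # locations whose write has actually completed

    def wclk_edge(self, wen, data=None):
        """Advance the model by one rising edge of WCLK."""
        # A write captured on the previous edge completes by the end of
        # the clock that is now ending.
        if self.ireg is not None:
            addr, value = self.ireg
            self.array[addr] = value
        # The delay register takes the WP value from before this edge.
        self.wp_delayed = self.wp
        if wen:
            self.ireg = (self.wp, data)
            self.wp += 1
        else:
            self.ireg = None
```

In this model WP still advances on every clock, so back-to-back writes remain possible, while wp_delayed never counts a location whose write is still in flight.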

Show the design below. Would you place the delay FF at A or B or C?

[Figure: write-side pointer logic with candidate locations A, B, and C for the delay FF]

OK, any changes in the consumer side?

Compared to the register array, reads incur one clock delay in the case of a Flow-Through BRAM and incur two clocks delay in the case of a Pipelined BRAM.

Question regarding the consumer design:

So should we have a simplistic consumer design where, once we ascertain that the FIFO is not empty and that our downstream parts are ready for consumption, we initiate a read-enable, wait for 1 or 2 clocks, and then receive and consume the data? Shall we design a state machine to do this?

Question regarding RP pointer incrementation and conveyance of the pointer to WCLK domain

Shall we increment the read pointer RP immediately at the end of the clock in which we activate the REN (Read Enable) or when we actually receive the data one or two clocks after?

Shall we convey the incremented RP to the producer in the write-clock domain immediately or one or two clocks late as we did not yet complete the initiated consumption?

Well, perhaps that depends upon whether you increment RP immediately or after one or two clocks, doesn't it?

SOLUTION to the question regarding the consumer design:

A simplistic consumer design using a state machine would be very inefficient as you would take 2 or 3 clocks to consume one data item. In the lab you are provided with an inefficient consumer design to illustrate this.

A good consumer design should allow consumption on every clock; otherwise we defeat the very purpose of the FIFO. So there shall be either no state machine or a Mealy-type state machine in which, at times, you stay in one state consuming one item per clock on consecutive clocks.

Question: How is it possible to consume on every clock if it takes more than one clock to read an item?

Answer: Consider the following small shop selling DVD Players.

Let us say the DVD players are imported from Japan and sold here in Los Angeles. It takes a week to deliver our orders, so we always place orders in advance and stock the DVD players. So how do we place orders in advance in such a way that we have no difficulty storing the players when they are delivered?

We place an order depending upon the space available in our storage minus the orders in the pipe. If there is place to store 10 more DVD players, and only three DVD players are in the pipe, we can place an order for 7 DVD players.

Question: How is it possible to consume on every clock if it takes more than one clock to read an item?

Answer: Going by the DVD Player shop analogy, we need a small store besides the BRAM-based FIFO to hold the items whose reads have been initiated. Say a small Register-based FIFO of 4 locations is maintained in the consumer itself.

The consumer can initiate the reading of only one data item per clock by activating REN (Read Enable). So, if the BRAM FIFO (the big FIFO) is not empty, and if the Register-based FIFO (the small FIFO) has extra space after accounting for the orders in the pipe, then we can activate the REN.

OK, How do we keep track of the orders in the pipe?

Corresponding to each of the delay-causing registers (IReg alone, or IReg and OReg), you keep a VALID Flip-Flop. Let us call them Valid_1 (for IReg) and Valid_2 (for OReg, if present). Send a one into Valid_1 when REN is activated.

At each clock edge, you need to do the following:

-- for a Flow-Through FIFO with IReg only
Valid_1 <= REN_Internal;

-- for a pipelined FIFO with IReg and OReg
Valid_1 <= REN_Internal;
Valid_2 <= Valid_1;

[Figure: RENQ, VALID_1, and IReg (Flow-Through BRAM)]

[Figure: RENQ, VALID_1, VALID_2, IReg, and OReg (Pipelined BRAM)]

Whatever you do in the consumer design, make sure that the order of consumption is the original order of production. That is the essence of the First In First Out (FIFO)!
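The read-enable rule, the VALID flip-flops, and the ordering requirement can be tried out together in a small cycle-by-cycle model. The sketch below is a Python illustration, not HDL; run_consumer and its variable names are made up for this note, the small FIFO depth of 4 comes from the text, and the downstream logic is assumed to accept one item on every clock.

```python
from collections import deque

SMALL_DEPTH = 4  # register-based FIFO inside the consumer (size from the text)


def run_consumer(items, latency, clocks):
    """Simulate `clocks` RCLK edges; latency is 1 (IReg) or 2 (IReg + OReg)."""
    big = deque(items)            # stands in for the BRAM-based (big) FIFO
    valid = [False] * latency     # Valid_1, and Valid_2 when latency == 2
    data = [None] * latency       # data travelling alongside the valid bits
    small = deque()               # the small FIFO
    consumed = []
    for _ in range(clocks):
        # Orders in the pipe = reads initiated but not yet delivered.
        in_pipe = sum(valid)
        ren = bool(big) and (SMALL_DEPTH - len(small) - in_pipe) > 0
        # Downstream consumes one item per clock when one is available.
        if small:
            consumed.append(small.popleft())
        # Shift the pipeline: Valid_1 <= REN, Valid_2 <= Valid_1.
        arriving_valid, arriving = valid[-1], data[-1]
        valid = [ren] + valid[:-1]
        data = [big.popleft() if ren else None] + data[:-1]
        if arriving_valid:
            small.append(arriving)
        assert len(small) <= SMALL_DEPTH   # the small FIFO never overflows
    return consumed
```

After the initial latency, this model consumes one item per clock, and the items come out in the order in which they were produced.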

FWFT = First Word Fall Through

[Figure: RENQ, VALID_1, VALID_2, IReg, and OReg]

Can we do FWFT (First Word Fall Through) only for the small FIFO or both the big and small FIFOs?

How about the single-clock FIFO and the two-clock FIFO?

What is meant by "Snake Path" (a term coined by our


Question regarding RP pointer incrementation and conveyance of the pointer to WCLK domain

Answer: First, we need to increment the RP immediately, since we hope to make the next read request on the very next clock.

Then we should refrain from conveying this newly incremented RP to the WCLK domain immediately, as we have not yet removed the data item from the memory array. If we do not delay conveying the RP, and the FIFO is running full with a much faster WCLK, then we run the risk of overwriting the oldest item (the item being read) before it is read.
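The same reasoning can be checked on the read side with a small cycle-by-cycle model. The sketch below is a Python illustration, not HDL; the class ReadSide and its fields are names made up for this note, and it assumes that a read initiated at one RCLK edge has left the memory array by the next edge.

```python
class ReadSide:
    """Toy model of the read side of a BRAM-based FIFO (assumed structure)."""

    def __init__(self):
        self.rp = 0           # incremented immediately, addresses the next read
        self.rp_delayed = 0   # value conveyed towards the WCLK domain
        self.ireg = None      # read address captured by the BRAM input register
        self.read_out = 0     # items that have actually left the memory array

    def rclk_edge(self, ren):
        """Advance the model by one rising edge of RCLK."""
        # A read captured on the previous edge has completed by now.
        if self.ireg is not None:
            self.read_out += 1
        # The delay register takes the RP value from before this edge.
        self.rp_delayed = self.rp
        if ren:
            self.ireg = self.rp
            self.rp += 1
        else:
            self.ireg = None
```

In this model rp_delayed never exceeds read_out, so the producer is never told that a location is free before its item has left the array.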

So conclusion: delay the RP by one clock of RCLK before

SOLUTION

[Figure: Flow-Through BRAM (RENQ, VALID_1, IReg) next to Pipelined BRAM (RENQ, VALID_1, VALID_2, IReg, OReg)]

For the Pipelined BRAM: is the delay the same, or is it a two-clock delay?

SOLUTION

Same. 1-clock delay.

Treat it (the OReg) as part of the consumer. It cannot possibly be overwritten by the producer.

Show your design. Would you place the delay FF at A or B or C?

[Figure: read-side pointer logic with candidate locations A, B, and C for the delay FF]

SOLUTION

A

Question #1 of EE560 Final Exam of Summer 2012

Partly discussed in the GRAY_1 lecture

From the GRAY_1 lecture

Compare the following two designs with the one on the previous page, and state whether they are right or wrong. If they are right, then state whether they are faster or slower designs, and cheaper or more expensive.

For each of the two designs: Right / Wrong, Faster / Slower, Cheaper / Expensive

The Gray code counter lags by 1 clock. After reset:

BIN:  0 1 2 3 4 5 6
GRAY: 0 0 1 2 3 4 5

Here, for our BRAM-based FIFO, we need a binary counter as well as a Gray Code counter.

We need to delay the Gray code by 1 clock, either explicitly or indirectly.

So the following does both the generation and the delaying of the Gray code. Shown below is WP_Gray_delayed. RP_Gray_delayed can be produced in a similar fashion.
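Since the original figure is not reproduced in this transcript, here is one possible way to get a Gray code that lags the binary counter by one clock: register the Gray conversion of the current binary count. The sketch below is a Python illustration, not HDL; to_gray, simulate, and the 5-bit width are assumptions made for this note, and the counter is assumed to increment on every clock.

```python
def to_gray(b: int) -> int:
    """Standard binary-to-Gray conversion: b XOR (b >> 1)."""
    return b ^ (b >> 1)


def simulate(clocks: int, width: int = 5):
    """Return (WP_bin, WP_Gray_delayed) after reset and after each clock edge."""
    mask = (1 << width) - 1
    wp_bin = 0
    wp_gray_delayed = 0
    trace = [(wp_bin, wp_gray_delayed)]   # state right after reset
    for _ in range(clocks):
        # Both registers load from the values present before the edge, so
        # the Gray register receives the Gray code of the OLD binary count.
        wp_bin, wp_gray_delayed = (wp_bin + 1) & mask, to_gray(wp_bin)
        trace.append((wp_bin, wp_gray_delayed))
    return trace
```

The trace starts with binary counts 0, 1, 2, 3, 4 paired with the Gray codes of 0, 0, 1, 2, 3, which is the one-clock lag listed after reset.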

Q #1.5.2.2 (and solution) of EE560 Final Su 2012

Q#1.5.2.3 (and solution) of EE560 Final Su 2012

Q#1.5.2.4 of EE560 Final Su 2012

Q#1.5.2.4 solution of EE560 Final Su 2012