Gandhi Puvvada

This lecture continues the EE201L lecture on synchronous 1-clock and 2-clock FIFOs (the FIFO_1 lecture).

You do not need to go through the FIFO_2 lecture, which deals with width and depth expansion of FIFOs, to follow this one.

However, please do go through the GRAY_1 lecture on Gray code counter design and Gray-to-binary code conversion.

FIFOs with BRAMs (FIFO_3)

EE 560

CONTENTS

1. Statement of the problem

2. Quick review of the FIFO_1 2-clock FIFO

3. BRAMs: info from Xilinx

4. Impact of IReg & OReg on when to increment WP & RP and when to convey them to the other side

5. Consumer design: when (and how often) you can activate REN

6. FWFT

7. Implementation details

In both the FIFO_1 and FIFO_2 lectures (and slide sets), we assumed, for convenience, that the FIFO storage is an array of registers.

For large FIFOs, we cannot afford an array of registers. The storage has to be a memory structure.

It has to be an SSRAM (or a BRAM in the case of an FPGA).

In this lecture, we discuss what we need to do to account for (a) the IReg (Input Register) in the case of a Flow-Through BRAM and (b) the IReg (Input Register) and the OReg (Output Register) in the case of a Pipelined BRAM.

Question #1 of EE560 Final Exam of Summer 2012 is related to the current discussion. So we will review the question and the solution as part of this lecture.

First let us review the 2-clock FIFO from the FIFO_1 lecture.

FULL and EMPTY:

In a single-clock FIFO we have two options. We can use two n-bit counters (WP and RP, for a FIFO that is 2**n locations deep) plus a FF that remembers whether the FIFO was most recently running near the AF (Almost Full) condition or the AE (Almost Empty) condition; this resolves the ambiguity of WP = RP, which can mean either FULL or EMPTY. Or we can use two (n+1)-bit counters. Then WP = RP represents the EMPTY condition only, and (WP - RP) mod 2**(n+1) = 2**n represents the FULL condition.
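The (n+1)-bit counter option can be checked with a small behavioral model. This is only a sketch in Python, not RTL: the names N, occupancy, is_empty, and is_full are made up for illustration, and n = 4 is chosen to match a 16-location FIFO.

```python
# Behavioral sketch (not RTL). All names here are illustrative assumptions.
N = 4                    # address bits: the FIFO is 2**N = 16 locations deep
DEPTH = 1 << N
PTR_MOD = 1 << (N + 1)   # WP and RP are (N+1)-bit counters


def occupancy(wp: int, rp: int) -> int:
    """Number of stored items: (WP - RP) mod 2**(N+1)."""
    return (wp - rp) % PTR_MOD


def is_empty(wp: int, rp: int) -> bool:
    """EMPTY only when the two (N+1)-bit pointers are equal."""
    return occupancy(wp, rp) == 0


def is_full(wp: int, rp: int) -> bool:
    """FULL when the pointers differ by exactly 2**N."""
    return occupancy(wp, rp) == DEPTH
```

Note that the lower N bits of WP and RP are equal in both the EMPTY and the FULL case; it is the extra most significant bit that tells the two apart.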

In a two-clock FIFO, the (n+1)-bit counters option is the only option. Otherwise we can cause deadlock.

From Question #1 of EE560 Final Exam of Summer 2012

16-location Register Array

Writes complete at the end of the ____________ (current/next) clock. Reads are _________________ (synchronous/asynchronous).

Compare the three: (1) a Register Array, (2) a Flow-Through BRAM, and (3) a Pipelined BRAM

Register Array: Glitches on WENQ are OK. Writes complete at the end of the current clock. Reads are asynchronous and continuous. Reads incur only a mux delay.

Flow-Through BRAM: Glitches on WENQ are OK. Writes complete at the end of the next clock. Reads are synchronous and incur one clock delay.

Pipelined BRAM: Glitches on WENQ are OK. Writes complete at the end of the next clock. Reads are synchronous and incur two clock delays.

SOLUTION

Spartan-6 FPGA Block RAM Resources User Guide: http://www.xilinx.com/support/documentation/user_guides/ug383.pdf

For simplicity (and to be portable to other FPGAs and ASICs), in our designs we assume that no such latch is available. So we use the default Write First mode of the Xilinx BRAM which keeps the latch transparent.

For simplicity and to make our design portable to other FPGAs and ASICs, we ignore the availability of this output register enable control. Moreover, we do not instantiate the BRAM. We code in such a way that a BRAM is inferred with no such control.

Flow-Through BRAM or Pipelined BRAM

Writes complete at the end of the next clock.

Since the actual writing happens inside the memory, driven by internally generated timing pulses during the next (subsequent) clock, it is proper to wait until the very end of that next clock.

So is it OK to increment the WP immediately and convey the incremented WP to the consumer immediately?

Mr. Bruin: I would increment WP at the end of that subsequent clock.

Miss Bruin: I do not think we need to delay incrementing the WP, since two clocks are lost anyway in synchronizing the WP to form WPss. So the signalling is never premature.

Mr. Trojan: Both of you are wrong.

Our design should allow the producer to write data on consecutive clocks if it has data ready and there is space in the FIFO. So WP should be updated on every clock (not once in two clocks). So Mr. Bruin is wrong.

The RCLK may be much faster than the WCLK. Say RCLK runs at 100 MHz and WCLK runs at 1 MHz. Then two clocks of RCLK take only 20 ns, a very small fraction of one WCLK period of 1 us (1000 ns). So Miss Bruin is wrong to think that there is no premature signalling.
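The arithmetic behind Mr. Trojan's objection can be written out as a quick check. This is a sketch in Python with made-up helper names (period_ns, signalling_is_premature). It assumes a two-stage synchronizer and uses a deliberately simplified criterion: the signalling is called premature if the synchronizer delay is shorter than the one WCLK period the write still needs to complete.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def signalling_is_premature(rclk_mhz: float, wclk_mhz: float,
                            sync_stages: int = 2) -> bool:
    """Simplified check (an assumption, not a timing analysis):
    an undelayed WP reaches the consumer too early if the
    synchronizer delay is shorter than one WCLK period."""
    sync_delay_ns = sync_stages * period_ns(rclk_mhz)
    return sync_delay_ns < period_ns(wclk_mhz)


# The numbers from the slide: RCLK = 100 MHz, WCLK = 1 MHz.
sync_delay = 2 * period_ns(100)   # two RCLK clocks
wclk_period = period_ns(1)        # one WCLK clock
```

With the slide's numbers, the two synchronizer clocks cover only 20 ns of the 1000 ns that the write needs, so relying on the synchronizer delay alone does not work when RCLK is much faster than WCLK.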

Mr. Trojan's SOLUTION: Delay the conveyance of the WP by 1 clock of the WCLK by using a register as a delay register clocked by the WCLK.

Show the design below. Would you place the delay FF at A or B or C?
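The effect of the delay register can be seen in a clock-by-clock model of the producer side. This is a behavioral sketch in Python, not the exam solution: simulate_producer and its variable names are assumptions, the synchronizer toward RCLK is left out, and the model does not say where (A, B, or C) the delay FF sits in the figure.

```python
def simulate_producer(wen_per_clock):
    """Model one WCLK domain, one list entry per clock.

    wp          increments at the end of the clock in which WEN is active
    wp_delayed  copy of wp, one WCLK later (the delay register)
    completed   writes that have actually finished inside the BRAM
    Returns (wp, wp_delayed, completed) after each clock edge.
    """
    wp = 0
    wp_delayed = 0
    pending = 0      # a write captured by IReg, finishing during the next clock
    completed = 0
    trace = []
    for wen in wen_per_clock:
        # All registers update together at the clock edge,
        # so every next value is computed from the old values.
        next_completed = completed + pending
        next_pending = 1 if wen else 0
        next_wp_delayed = wp
        next_wp = wp + (1 if wen else 0)
        completed, pending = next_completed, next_pending
        wp_delayed, wp = next_wp_delayed, next_wp
        trace.append((wp, wp_delayed, completed))
    return trace


trace = simulate_producer([1, 1, 0, 0])
```

In this trace, wp runs ahead of the completed writes for as long as a write is pending, while wp_delayed never does. The producer can still write on every clock, because it keeps using the undelayed wp for its own addressing.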

OK, any changes in the consumer side?

Compared to the register array, reads incur a one-clock delay in the case of a Flow-Through BRAM and a two-clock delay in the case of a Pipelined BRAM.

Question regarding the consumer design:

So should we have a simplistic consumer design where, once we ascertain that the FIFO is not empty and that our downstream parts are ready for consumption, we initiate a read-enable, wait for 1 or 2 clocks, and then receive and consume the data? Shall we design a state machine to do this?

Question regarding RP pointer incrementation and conveyance of the pointer to WCLK domain

Shall we increment the read pointer RP immediately at the end of the clock in which we activate the REN (Read Enable) or when we actually receive the data one or two clocks after?

Shall we convey the incremented RP to the producer in the write-clock domain immediately or one or two clocks late as we did not yet complete the initiated consumption?

Well, perhaps that depends upon whether you increment RP immediately or after one or two clocks, doesn't it?

SOLUTION to the question regarding the consumer design:

A simplistic consumer design using a state machine would be very inefficient as you would take 2 or 3 clocks to consume one data item. In the lab you are provided with an inefficient consumer design to illustrate this.

A good consumer design should allow consumption on every clock. No question about it, as otherwise we defeat the very purpose of the FIFO. So there shall be either no state machine, or a Mealy-type state machine that at times stays in one state, consuming one item per clock on consecutive clocks.

Question: How is it possible to consume on every clock if it takes more than one clock to read an item?

Answer: Consider the following small shop selling DVD Players.

Let us say the DVD players are imported from Japan and sold here in Los Angeles. It takes a week to deliver our orders, so we always place orders in advance and stock the DVD players. How do we manage to place orders in advance in such a way that we have no difficulty storing them when they are delivered?

We size each order as the space available in our storage minus the orders already in the pipe. If there is room to store 10 more DVD players, and only three DVD players are in the pipe, we can place an order for 7 DVD players.

Question: How is it possible to consume on every clock if it takes more than one clock to read an item?

Answer: Going by the DVD player shop analogy, we need a small store beside the BRAM-based FIFO to hold the items whose reads have been initiated. Say a small register-based FIFO of 4 locations is maintained in the consumer itself.

The consumer can initiate the reading of only one data item per clock, by activating REN (Read Enable). So, if the BRAM FIFO (the big FIFO) is not empty, and if the register-based FIFO (the small FIFO) has extra space after accounting for the orders in the pipe, then we can activate the REN.
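The ordering rule and the resulting REN condition can be sketched as two small functions. This is illustrative Python, and the names orders_allowed and can_activate_ren, as well as their parameters, are assumptions made for this example.

```python
def orders_allowed(free_slots: int, in_pipe: int) -> int:
    """Space available in storage minus the orders already in the pipe."""
    return max(free_slots - in_pipe, 0)


def can_activate_ren(big_fifo_empty: bool,
                     small_fifo_free: int,
                     reads_in_pipe: int) -> bool:
    """REN may be activated when the big (BRAM) FIFO is not empty and the
    small (register) FIFO still has room after counting reads in flight."""
    return (not big_fifo_empty) and orders_allowed(small_fifo_free,
                                                   reads_in_pipe) > 0
```

For the DVD shop, orders_allowed(10, 3) gives 7. For the consumer, a small FIFO with 2 free locations and 2 reads already in the pipe must not issue another read, even if the big FIFO has data.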

OK, How do we keep track of the orders in the pipe?

For each of the delay-causing registers (IReg alone, or IReg and OReg), keep a VALID flip-flop: call them Valid_1 (for IReg) and Valid_2 (for OReg, if present). Send a one into Valid_1 whenever REN is activated.

At each clock edge, you need to do the following:

-- for a Flow-Through FIFO with IReg only
Valid_1 <= REN_Internal;

-- for a Pipelined FIFO with IReg and OReg
Valid_1 <= REN_Internal;
Valid_2 <= Valid_1;
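To see what the valid flip-flops buy us, the assignments above can be stepped clock by clock in a behavioral model. This is a Python sketch, not RTL: run_valid_pipeline is a made-up name, and the list it returns records, for each clock, whether read data is available at the BRAM output during that clock.

```python
def run_valid_pipeline(ren_sequence, pipelined: bool):
    """Step Valid_1 (and Valid_2) through a sequence of REN values.

    Returns one flag per clock: 1 if read data is available at the
    BRAM output during that clock, else 0.
    """
    valid_1 = 0
    valid_2 = 0
    data_valid = []
    for ren in ren_sequence:
        # What the consumer sees during this clock (before the edge).
        data_valid.append(valid_2 if pipelined else valid_1)
        # Clock edge: both flip-flops load from the old values, which
        # mirrors Valid_1 <= REN_Internal; Valid_2 <= Valid_1;
        valid_1, valid_2 = ren, valid_1
    return data_valid


ren = [1, 1, 0, 1, 0, 0]
flow_through = run_valid_pipeline(ren, pipelined=False)
pipelined = run_valid_pipeline(ren, pipelined=True)
```

The REN pattern comes out unchanged but shifted by one clock (Flow-Through) or two clocks (Pipelined), so back-to-back RENs yield back-to-back data. At any moment, the number of ones held in the valid flip-flops is the number of orders in the pipe.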

[Figures: VALID_1 alongside IReg (Flow-Through BRAM); VALID_1 and VALID_2 alongside IReg and OReg (Pipelined BRAM)]

Whatever you do in the consumer design, make sure that the order of consumption is the original order of production. That is the essence of the First In First Out (FIFO)!

FWFT = First Word Fall Through

[Figure: RENQ, with VALID_1 and VALID_2 alongside IReg and OReg]

Can we do FWFT (First Word Fall Through) only for the small FIFO or both the big and small FIFOs?

How about the single-clock FIFO and the two-clock FIFO?

What is meant by "Snake Path" (a term coined by our

Shall we increment the read pointer RP immediately at the end of the clock in which we activate the REN (Read Enable) or when we actually receive the data one or two clocks after?

Shall we convey the incremented RP to the producer in the write-clock domain immediately or one or two clocks late as we did not yet complete the initiated consumption?

Well, perhaps that depends upon whether you increment RP immediately or after one or two clocks, doesn't it?

Answer: First, we need to increment the RP immediately, since we hope to make the next read request on the very next clock.

Then we should refrain from conveying this newly incremented RP to the WCLK domain immediately, as we have not yet removed the data item from the memory array. If we do not delay conveying the RP, and the FIFO is running full and the WCLK is much faster, we run the risk of overwriting the oldest item (the item being read) before it is read.

So, conclusion: delay the RP by one clock of RCLK before conveying it to the WCLK domain.
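The overwrite risk can be shown with a full FIFO and a single read. This is a behavioral sketch in Python under stated assumptions: a 16-location FIFO with 5-bit pointers, the names producer_sees_full, rclk_edge, and full_fifo_scenario are invented here, and the synchronizer into the WCLK domain is ignored (which corresponds to the worst case of a very fast WCLK).

```python
DEPTH = 16
PTR_MOD = 2 * DEPTH      # (n+1)-bit pointers for a 2**n-deep FIFO


def producer_sees_full(wp: int, rp_seen: int) -> bool:
    """FULL as computed by the producer from the RP value it is shown."""
    return (wp - rp_seen) % PTR_MOD == DEPTH


def rclk_edge(rp: int, rp_delayed: int, ren: bool):
    """One RCLK edge: RP increments immediately on REN,
    RP_delayed takes the old RP (the one-clock delay register)."""
    next_rp = (rp + 1) % PTR_MOD if ren else rp
    return next_rp, rp


def full_fifo_scenario():
    wp, rp, rp_delayed = 16, 0, 0            # FIFO is full
    rp, rp_delayed = rclk_edge(rp, rp_delayed, ren=True)
    # The read of location 0 is still in progress during this clock.
    undelayed_view = producer_sees_full(wp, rp)          # premature
    delayed_view = producer_sees_full(wp, rp_delayed)    # safe
    rp, rp_delayed = rclk_edge(rp, rp_delayed, ren=False)
    later_view = producer_sees_full(wp, rp_delayed)      # slot now released
    return undelayed_view, delayed_view, later_view


views = full_fifo_scenario()
```

Shown the undelayed RP, the producer believes a location is free while that location is still being read. Shown the delayed RP, it keeps seeing FULL for one more RCLK and only then gets the location back.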

SOLUTION

[Figures: RENQ with VALID_1 and IReg (Flow-Through BRAM); RENQ with VALID_1, VALID_2, IReg, and OReg (Pipelined BRAM)]

Or two-clock delay?

SOLUTION

Same. 1-clock delay.

Treat it as part of the consumer. This cannot possibly be overwritten by the producer.

Show your design. Would you place the delay FF at A or B or C?

SOLUTION

Question #1 of EE560 Final Exam of Summer 2012

Partly discussed in the GRAY_1 lecture

From the GRAY_1 lecture

Compare the following two designs with the one on the previous page, state if they are right or wrong. If they are right, then state whether they are faster or slower designs, cheaper or expensive.

Right / Wrong
Faster / Slower
Cheaper / Expensive

The Gray code counter lags by 1 clock. After reset:
BIN:  0 1 2 3 4 5 6
GRAY: 0 0 1 2 3 4 5

Here, for our BRAM-based FIFO, we need a binary counter as well as a Gray Code counter.

We need to delay the Gray code by 1 clock either explicitly or indirectly.

So the following does both generation and delaying of the Gray code. Shown below is WP_Gray_delayed. RP_Gray_delayed can be produced in a similar fashion.
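One way to get the delay for free is to register the Gray code of the current binary count rather than of the next count. The Python sketch below models that idea. It is an assumption about the approach, not a transcription of the figure: run_counter, bin_to_gray, and gray_to_bin are illustrative names, and the Gray register is reported by the binary value it encodes so that it can be compared with the BIN/GRAY table from the GRAY_1 lecture.

```python
def bin_to_gray(b: int) -> int:
    """Binary to Gray: each bit XORed with its more significant neighbor."""
    return b ^ (b >> 1)


def gray_to_bin(g: int) -> int:
    """Gray to binary: XOR of g with all of its right shifts."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def run_counter(clocks: int, width: int = 5):
    """Binary counter plus a Gray register loaded from the CURRENT count.

    Returns (bin_count, value encoded by gray_delayed) for each clock.
    """
    mask = (1 << width) - 1
    bin_count = 0
    gray_delayed = 0
    rows = []
    for _ in range(clocks):
        rows.append((bin_count, gray_to_bin(gray_delayed)))
        # Clock edge: both registers load from the old bin_count, so the
        # Gray register ends up one clock behind the binary counter.
        bin_count, gray_delayed = (bin_count + 1) & mask, bin_to_gray(bin_count)
    return rows


rows = run_counter(7)
```

The second column of rows trails the first by one clock after reset, which is the explicit one-clock delay we wanted for the conveyed pointer, obtained without adding a separate delay register.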

Q #1.5.2.2 (and solution) of EE560 Final Su 2012

Q#1.5.2.3 (and solution) of EE560 Final Su 2012

Q#1.5.2.4 of EE560 Final Su 2012

Q#1.5.2.4 solution of EE560 Final Su 2012
