Digital System Design with PLDs and FPGAs Field ...nptel.ac.in/courses/117108040/downloads/Field...
Digital System Design with PLDs and FPGAs
Field Programmable Gate Arrays
Kuruvilla Varghese
DESE
Indian Institute of Science
Topics
• FPGA Architecture (Xilinx, Altera, Actel)
• FPGA related Design issues
• FPGA related Timing issues
• Tool Flow
• FPGA Configuration
• SoPC
• Debugging
• Case Study
Field Programmable Gate Arrays
• ASIC, MPGA/Standard Cell, FPGA
• Volumes, NRE cost, Turn around time
• Array of logic resources with programmable interconnection.
• Logic resources (Combinational, Flip flops)
• Combinational: LUT, Multiplexers, Gates
• Programmable interconnections: SRAM, Flash, Anti-fuse
• Special Resources: PLL/DLL, RAMs, FIFOs, Memory Controllers, Network Interfaces, Processors
Commercial FPGAs
• Xilinx
– Spartan-3, Spartan-6
– Virtex-4, Virtex-5, Virtex-6
– Artix-7, Kintex-7, Virtex-7, Zynq
• Altera
– Cyclone, Cyclone II, Cyclone III, Cyclone IV, Cyclone V
– Arria II, Arria V
– Stratix II, Stratix III, Stratix IV, Stratix V
Commercial FPGAs
• Actel
– Axcelerator (Antifuse)
– IGLOO, IGLOOE (Flash)
– ProASIC Plus (Flash)
– ProASIC3, ProASIC3E (Flash)
– RTAX (Radiation Tolerant, Anti-fuse)
– RTSX -SU (Radiation Tolerant, Anti-fuse)
– SmartFusion, SmartFusion2 (ARM Cortex-M3)
Structure of an FPGA
Source: Xilinx Data Sheets
Detailed View
[Figure: array of CLBs with switch blocks (SB) between them]
Switch Block
Types of switch blocks
FPGA
Source: Xilinx Data Sheets
• I/O Blocks (Tri-state output / Input, Synchronizing Flip-
flops)
• Array of Configurable Logic Blocks
• Horizontal and Vertical wires with programmable switches
in between
• Single length, Double length, Quad, Hex and Long lines
• Resources available to user
• Resources for configuring programmable switches in the
interconnect structures and Logic blocks
Programmable Connections
• SRAM (Pass Transistor)
• Flash
• Antifuse
SRAM (Pass Transistor)
Source: Xilinx Data Sheets
Pass Transistor with configuration cell
• Flip-Flop to store the switch status (4 Transistors)
• Write Transistor to write Configuration status
• Total: 6 Transistors
• FFs controlling the Switches are organized as SRAM hence the name
[Figure labels: Pass Transistor, Flip-Flop, Write Transistor]
Flash Transistor
• MOS transistor with a floating gate
• Conducts when not programmed off
• Can be electrically programmed ‘off’ or ‘on’
Flash Transistor
Flash Cell Write
Flash Cell Erase
Anti-fuse
Programmable Connections
Name       Volatile   Re-programmable   Delay   Area
Flash      No         In-circuit        Large   Medium
SRAM       Yes        In-circuit        Large   Large
Anti-fuse  No         No                Small   Small
Logic Block size
• Coarse grain
– Owing to the area of the SRAM interconnect switch (6 transistors), the
Logic Blocks are made large in SRAM-based FPGAs
– Utilization is kept high through configurability within the logic
block
• Fine Grain
– Since the antifuse occupies less area and has less delay,
antifuse-based FPGAs employ smaller logic blocks
Logic Cell Structure – Coarse Grain
Source: Xilinx Data Sheets
Logic Cell Structure – Fine Grain
Design Methodology
[Figure: design flow. Stages: HDL Source, Synthesis, PAR/Fitting, Programming. Verification steps: Functional Simulation, Logic Simulation, Static Timing Analysis, Timing Simulation. Other labels: Constraints, Equations/Netlists, Timing Model, Configuration File]
Structure of an FPGA
Source: Xilinx Data Sheets
Commercial Tools
• Simulators
– ModelSim (Mentor Graphics)
– Active-HDL (Aldec)
• Synthesis Tools
– Synplify Pro (Synopsys)
– Precision Synthesis (Mentor Graphics)
• Vendor Tools
– Xilinx ISE (Synthesis, Simulation, PAR, Programming, …)
– Xilinx Vivado (Synthesis, Simulation, PAR, Programming, …)
– Altera Quartus II (Synthesis, Simulation, PAR, Programming, …)
– Actel Libero (Synthesis, Simulation, PAR, Programming, …)
Commercial Tools
• Cadence Suite
• Synopsys Suite
• Mentor Graphics Suite
Xilinx Virtex FPGA
• SRAM based programmable connections, configuration
• LUT based combinational Logic
• Flip-Flops with sync/async reset/preset
• Large Configurable Logic Cells (CLBs)
• Block RAM (SPRAM, DPRAM, FIFO)
• LUT as Distributed RAM
• Low skew clock trees, DLL, Tri-state gates for Buses
• Carry Chains / Cascade Chains
• JTAG, Serial, and Parallel Configuration schemes
• I/O Blocks (Registered / Non-registered)
• Multiple I/O standards
Xilinx Virtex FPGA
Present-day FPGAs use PLLs instead of DLLs and have DSP blocks for fixed-point arithmetic.
Source: Xilinx Data Sheets
Virtex CLB
Source: Xilinx Data Sheets
LUT
• Address lines as inputs, data line as output (read mode)
• Truth table written during configuration (write)
• 4 input, 6 input LUTs
• Fixed AND, Programmable OR
[Figure: 2-input LUT with inputs X and Y on address lines A1 and A0; data output D0 = X XOR Y]
A1 A0 | D0
 0  0 |  0
 0  1 |  1
 1  0 |  1
 1  1 |  0
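The read-mode behaviour described above can be sketched in VHDL as a 4 x 1 constant memory addressed by the two inputs. This is only an illustrative model: the entity name lut2_xor and the constant TRUTH_TABLE are assumptions introduced here, not names from the slides or from any vendor library.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: models the 2-input LUT of the slide as a 4 x 1 memory.
entity lut2_xor is
  port (x, y : in  std_logic;
        d0   : out std_logic);
end entity lut2_xor;

architecture rtl of lut2_xor is
  -- Truth table for addresses 00, 01, 10, 11. In a real FPGA these bits
  -- are written into the LUT during configuration.
  constant TRUTH_TABLE : std_logic_vector(0 to 3) := "0110";
  signal addr : unsigned(1 downto 0);
begin
  addr <= x & y;                          -- A1 = x, A0 = y
  d0   <= TRUTH_TABLE(to_integer(addr));  -- read mode: address in, data out
end architecture rtl;
```

Changing only the contents of TRUTH_TABLE turns the same structure into any other 2-input function, which is what configuration does to a real LUT.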
FPGA Configuration / Programming
• Writing to configuration memory
• Configuring options in Logic blocks
– Writing LUTs with truth tables
– Combining LUTs,
– Using LUTs as memory
– Selecting clocks, Set/Reset for FFs
– Configuring Various Muxes in Slices
• Using special resources (RAM, FIFOs, PLLs)
• Programming Switch matrices
• Programming I/O blocks
Virtex Family
Source: Xilinx Data Sheets
Important Specifications
• CLB Array, Block RAM Bits
• User I/O, Differential I/O
• Distributed RAM Bits can be calculated from number of
CLBs (multiply by 4 x 64)
• System gate and logic gate counts are not useful: they are
equivalent gate counts and cannot be meaningfully compared across
vendors
Structure of an FPGA
Virtex CLB
Source: Xilinx Data Sheets
4 input LUT and Flip-Flops
• Use LUT and FFs independently
• Use LUT followed with FFs
[Figure: two identical LUT/flip-flop stages; each shows a 4-input LUT (inputs I3..I0, output O) and a D flip-flop (D, Q, CK, AR), with a signal labelled S]
4 input LUT and Flip-Flops
• Independent LUT Outputs: X, Y
• Dedicated inputs to FF: BX, BY
5 input LUT
• Two 4-input LUTs are muxed to form a 5-input LUT using the F5 mux.
The select line is connected to BX, so the bottom FF cannot be used
independently. The F5 mux output is connected to this FF.
[Figure: two 4-input LUTs (inputs I3..I0, output O) feeding the F5 mux, with I4 as the select input]
6 Input LUT
• Two 5-input LUTs are muxed using the F6 mux to form a 6-input LUT. The select line is connected to BY, so the top FF cannot be used independently. The F6 mux output is connected to this FF.
[Figure: four 4-input LUTs (inputs I3..I0, output O) combined in pairs by two F5 muxes (select I4), whose outputs are combined by the F6 mux (select I5)]
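The F5 and F6 multiplexers implement Shannon's expansion: the extra input selects between two smaller truth tables. As a sketch (the cofactor names f_0, f_1, g_0, g_1 are introduced here for illustration only):

```latex
\begin{aligned}
f(I_4, I_3, I_2, I_1, I_0) &= \overline{I_4}\, f_0(I_3, I_2, I_1, I_0) + I_4\, f_1(I_3, I_2, I_1, I_0) && \text{(F5 mux)}\\
g(I_5, I_4, \dots, I_0) &= \overline{I_5}\, g_0(I_4, \dots, I_0) + I_5\, g_1(I_4, \dots, I_0) && \text{(F6 mux)}
\end{aligned}
```

Here f_0 and f_1 are the truth tables of f with I_4 = 0 and I_4 = 1, each stored in one 4-input LUT (2 x 16 = 32 bits for five inputs). Likewise, a general 6-input function needs 4 x 16 = 64 bits, that is, four LUTs.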
Cascading LUTs
• 5-input and 6-input LUTs built with the F5 and F6 muxes are required
in the most general case, considering all possible minterms
• But specific 6-input functions can be implemented using a
cascade of two LUTs
6 inputs using 2 LUTs
Y = ABCDE or ABCDF
Y = (ABCD) and (E or F)
ABCD = X
Y = X and (E or F)
[Figure: first LUT (inputs A, B, C, D; truth table ABCD) produces X; second LUT (inputs X, E, F; truth table X and (E or F)) produces Y]
5 inputs using 2 cascaded LUTs
Y = ABCDE
Y = (ABCD) and E
ABCD = X
Y = X and E
[Figure: first LUT (inputs A, B, C, D; truth table ABCD) produces X; second LUT (inputs X, E; truth table X and E) produces Y]
5 inputs using 2 cascaded LUTs
Y = ABCDE or AB/CDE/
Y = (ACD) and (BE or B/E/)
ACD = X
Y = X and (BE or B/E/)
[Figure: first LUT (inputs A, C, D; truth table ACD) produces X; second LUT (inputs X, B, E; truth table X and (BE or B/E/)) produces Y]
5 inputs using 5 input LUT
Y = ABCD xor E
ABCD = Z
Y = ZE/ or Z/E
[Figure: both 4-input LUTs take A, B, C, D; the F5 mux, with E as select, produces Y]
Virtex CLB: LUT
• LUT and FF can be used separately or together
• Four 4-input LUTs
• 5-input LUT from two 4-input LUTs using the F5 mux
• 6-input LUT from two 5-input LUTs through the F6 mux
• Four 4-input LUTs / Two 5-input LUTs / One 6-input LUT
• FF: Sync/Async Set-Reset, Clock Enable
– Since both set and reset are available, registers can be
initialized to any value without extra overhead.
LUT as RAM
• General routing lines can be used to write LUT through the LUT
RAM write control circuit to use LUT as Distributed RAM
[Figure: 4-input LUT (inputs I3..I0, output O) with a LUT RAM Write control block]
Virtex CLB: LUT as distributed RAM
• The LUT is written while configuring the FPGA, when it is used for logic
implementation.
• Write control signals can be connected to routing wires so that
the LUT can be used as RAM when it is not used for logic implementation.
• Four 16x1 distributed RAMs per CLB
• These can be combined to make various memory sizes and data widths.
• Since it is spread across CLBs, it is called Distributed RAM
• Since it is spread across CLBs, access latency can vary, so be careful if
you use it without registering the read data.
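A 16x1 memory with a synchronous write and an asynchronous read is the usual coding style for which synthesis tools may infer LUT-based distributed RAM. The following is a minimal sketch under that assumption; the entity and signal names are hypothetical, and whether a given tool actually maps it to distributed RAM depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 16 x 1 RAM; the size matches one 4-input LUT.
entity ram16x1 is
  port (clk  : in  std_logic;
        we   : in  std_logic;
        addr : in  std_logic_vector(3 downto 0);
        din  : in  std_logic;
        dout : out std_logic);
end entity ram16x1;

architecture rtl of ram16x1 is
  type ram_t is array (0 to 15) of std_logic;
  signal ram : ram_t := (others => '0');
begin
  -- Synchronous write through the write-control path
  write_p : process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
    end if;
  end process write_p;

  -- Asynchronous read: the LUT address inputs select the stored bit
  dout <= ram(to_integer(unsigned(addr)));
end architecture rtl;
```

Registering dout in the user logic is one way to address the latency caution in the last bullet.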
Carry Chain
• Adder
• Requires two lookup tables (Si and Ci+1) at each stage.
• This along with routing makes adder big and slow
• Hence dedicated carry chain to make adder faster,
implementing Ci+1.
$S_i = A_i \oplus B_i \oplus C_i$
$C_{i+1} = A_i B_i + (A_i \oplus B_i) C_i$
Carry Chain
$C_{i+1} = A_i B_i + (A_i \oplus B_i) C_i$
[Figure: carry-chain stage with a LUT, a 0/1 carry multiplexer and an XOR gate; inputs Ai, Bi, Ci; outputs Si, Ci+1]
Source: Xilinx Data Sheets
Carry Chain
• For adders, use the ‘+’ operator so that the carry chains can be used.
• For higher-level functions such as counters, the synthesis tool
infers and uses carry chains.
• The AND gate combining Ai and Bi shown in the Slice diagram
is for partial product generation in multipliers
• In some FPGAs, the carry chain has features to cascade
(AND/OR) the LUT outputs.
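As an illustration of the first bullet, an adder written with the ‘+’ operator from ieee.numeric_std leaves the synthesis tool free to map the carry onto the dedicated chain. This is a sketch only; the entity name adder8 and the 8-bit width are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 8-bit adder with carry out.
entity adder8 is
  port (a, b : in  std_logic_vector(7 downto 0);
        sum  : out std_logic_vector(7 downto 0);
        cout : out std_logic);
end entity adder8;

architecture rtl of adder8 is
  signal result : unsigned(8 downto 0);  -- one extra bit for the carry out
begin
  -- Extend both operands to 9 bits so the carry is not lost
  result <= resize(unsigned(a), 9) + resize(unsigned(b), 9);
  sum    <= std_logic_vector(result(7 downto 0));
  cout   <= result(8);
end architecture rtl;
```

Writing the sum and carry equations gate by gate instead would describe the same function, but may hide the adder structure from the tool.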
Control of Sequential Circuits
[Figure: FSM / Controller driving the enable input en (RA_L) of a Reg / Counter / ... block; clock clk]
Clock Gating
[Figure: register (D7:0, RA_E) clocked by CLK', a gated version of CLK controlled by RA_L; timing diagram of CLK, RA_L and CLK']
Re-circulating Multiplexer
[Figure: register (D7:0, RA_E) with a 2:1 multiplexer at its D input, controlled by RA_L; timing diagram of CLK and RA_L]
Register write on the clock edge
Re-circulating Multiplexer
[Figure: D flip-flop with a 2:1 multiplexer (0/1) at its D input, alongside a D flip-flop with clock enable (CE)]
if (clk'event and clk = '1') then
  if (cntrl_sig = '1') then
    q <= d;
  end if;
end if;
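The fragment above is the body of a clocked process. A complete, compilable version might look like the sketch below; the entity name reg_ce and the 8-bit width are assumptions, while clk, cntrl_sig, d and q are the names used on the slide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 8-bit register with enable.
entity reg_ce is
  port (clk       : in  std_logic;
        cntrl_sig : in  std_logic;  -- load enable
        d         : in  std_logic_vector(7 downto 0);
        q         : out std_logic_vector(7 downto 0));
end entity reg_ce;

architecture rtl of reg_ce is
begin
  process (clk)
  begin
    if (clk'event and clk = '1') then
      if (cntrl_sig = '1') then
        q <= d;  -- load new data
      end if;    -- otherwise q keeps its value (recirculation)
    end if;
  end process;
end architecture rtl;
```

Because the clock itself is never gated, every flip-flop still sees the same low-skew clock; only the data input is controlled.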
Clock Gating for low power
[Figure: registers (D7:0, RA_E, RA_L) clocked by gated clocks CLK1 and CLK2 derived from CLK; timing diagram of CLK, RA_L, CLK1 and CLK2]
Combinational Circuit Mapping
[Figure: combinational block (Comb) mapped to one or more LUTs]
Sequential Circuit Mapping
[Figure: FF, Comb, FF mapped to one or more LUTs and one or more flip-flops]
Counter, FSM Mapping
[Figure: next state logic (NSL), FF and output logic (OL) mapped to one or more LUTs and one or more FFs]
Virtex IOB
Source: Xilinx Data Sheets
Virtex IOB
• Three paths: Output, Input, Tri-state enable
• Direct or Through Flip-flops (synchronization)
• Flip-Flops: Set/reset, Clock enable, Clock selection
• Programmable delay at input to make hold time zero (not an issue once registered at IOB, as tcq > th)
• Programmable pull-up, pull-down, hold, slew rate
• PAR tool may move some of the input/output registers to IOB
Virtex IOB
• Various IO standards
– LVTTL
– LVCMOS33, LVCMOS25
– LVCMOS18, LVCMOS15, LVCMOS12 …
– PCI33, PCI66
– …
• Some IO standards require a Reference voltage for Inputs
• Banks of I/O pins support some of the IO standards
Weak keeper (Hold)
• The hold circuit holds the previous state of the bus, but with only a weak drive, so that the bus can still be driven to ‘0’ or ‘1’.
• This avoids unnecessary switching of inputs due to noise, which could occur if the bus were left in high impedance.
[Figure: weak keeper circuit on a bus line]
Detailed View
[Figure: array of CLBs with switch blocks (SB) between them]
Virtex Routing
Source: Xilinx Data Sheets
• Direct connection to adjacent CLB
• 24 single length lines (per GRM in each direction)
• 72 buffered Hex lines (per 6th GRM in each direction)
• 12 buffered long lines (horizontal & vertical)
• 4 tri-state lines (horizontal & vertical)
Bus Lines
• For busing and multiplexing it is better to use tri-state gates than
multiplexers
Source: Xilinx Data Sheets
Fitting Example: FSM
• FSM, with 2 inputs, 3 states, and 2 Mealy outputs. How
many CLBs to fit in?
– State Variables: 2 flip-flops (3 states)
– NSL: 2 state variables + 2 inputs = 4 inputs
– OL: 2 Inputs + 2 state variables = 4 inputs
– 2 LUTs for NSL
– 2 FFs for state variables,
– 2 LUTs for OL
– This requires 1 CLB, with two FFs left unused. In fact, even if the outputs are
registered, it can still be accommodated in one CLB
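The count above can be written out as a short worked calculation. It assumes, as the slides do for Virtex, that one CLB provides four 4-input LUTs and four flip-flops.

```latex
\begin{aligned}
\text{state flip-flops} &= \lceil \log_2 3 \rceil = 2 \\
\text{NSL LUTs} &= 2 \quad (\text{one 4-input LUT per state variable}) \\
\text{OL LUTs} &= 2 \quad (\text{one 4-input LUT per Mealy output}) \\
\text{total} &= 4 \text{ LUTs and } 2 \text{ FFs} \;\le\; 4 \text{ LUTs and } 4 \text{ FFs per CLB}
\end{aligned}
```

The two spare flip-flops are what allow the two outputs to be registered without needing a second CLB.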
CLBs, FSM
[Figure: next state logic (NSL), FF and output logic (OL) mapped onto a CLB]
Source: Xilinx Data Sheets
Fitting Example: Counter
• 8 bit up counter with parallel load feature
– State Variables: 8 Flip-flops
– Incrementer uses carry chain
– NSL: 1 state variables + load + 1 din = 3 inputs per state
variable
– NSL requires 8 LUTs
– This requires 2 CLBs ( 4 Slices)
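A behavioural description of such a counter could look like the sketch below. The entity name counter8, the port names and the asynchronous reset are assumptions added for illustration; the slide specifies only the 8-bit width and the parallel load.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 8-bit up counter with parallel load.
entity counter8 is
  port (clk, reset, load : in  std_logic;
        din              : in  std_logic_vector(7 downto 0);
        count            : out std_logic_vector(7 downto 0));
end entity counter8;

architecture rtl of counter8 is
  signal q : unsigned(7 downto 0);  -- 8 state flip-flops
begin
  process (clk, reset)
  begin
    if reset = '1' then
      q <= (others => '0');
    elsif rising_edge(clk) then
      if load = '1' then
        q <= unsigned(din);  -- parallel load
      else
        q <= q + 1;          -- '+' lets the tool use the carry chain
      end if;
    end if;
  end process;

  count <= std_logic_vector(q);
end architecture rtl;
```

Each bit then needs one LUT for the load/increment selection and one flip-flop, which is consistent with the 8 LUTs and 8 FFs counted on the slide.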
CLB, Counter
[Figure: incrementer (+1) and FF mapped onto a CLB]
Signal Paths in CLB
library ieee;
use ieee.std_logic_1164.all;
entity test is
port (a, b, c, d, e, f, g, h: in std_logic; z: out std_logic);
end entity test;
architecture arch_test of test is
begin
process (a, b)
begin
if (a = '1') then z <= '0';
elsif (b'event and b = '1') then
if (c = '1') then
z <= (d and e and f and g) xor h;
end if;
end if;
end process;
end arch_test;
[Figure: signal paths in the CLB for this code; labels a, b, c, z, with d, e, f, g shown at the inputs of two LUTs, and h]
Virtex DPRAM
Source: Xilinx Data Sheets
• True Dual port Memory
• Each port can be read/write, read or write
• Synchronous reads and writes
• Can be combined for larger widths and depths
• Instantiated through Core Generator Tool
• On a simultaneous read and write to the same location there is a
conflict, and the read data could be wrong
• Can be initialized in VHDL code
Metastability
[Figure: D flip-flop and timing diagram of CLK, D and Q marking ts, th and tco]
ts: Setup time: Minimum time input must be valid before the active clock edge
th: Hold time: Minimum time input must be valid after the active clock edge
tco: Propagation delay for input to appear at the output from active clock edge
Minimum Clock period
[Figure: data path from a first D flip-flop through combinational logic (Comb) to a second D flip-flop, both clocked by clk]
tclk > tco + tcomb + tsetup
tco(min) + tcomb(min) > th(max)
Here we consider the data path from the first flip-flop to the next, and
estimate the minimum clock period for proper latching of data into the
second flip-flop
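A worked example of the first inequality, using made-up delay values chosen only for illustration (they are not taken from any data sheet):

```latex
\begin{aligned}
t_{co} &= 1\,\text{ns}, \quad t_{comb} = 5\,\text{ns}, \quad t_{setup} = 1\,\text{ns} \\
t_{clk} &> t_{co} + t_{comb} + t_{setup} = 1 + 5 + 1 = 7\,\text{ns} \\
f_{max} &< \frac{1}{7\,\text{ns}} \approx 142.9\,\text{MHz}
\end{aligned}
```

The second inequality is a separate check: it does not depend on the clock period, so it cannot be fixed by slowing the clock.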
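Minimum Clock period
• Sequential Circuit / FSM
tclk > tco + tcomb + tsetup
tco(min) + tcomb(min) > th(max)
[Figure: FSM with combinational logic (Comb) taking Inputs and the present state (PS) and producing Outputs and the next state (NS); D flip-flop with Clock and asynchronous Reset (AR)]
Clock skew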
• The previous analysis assumes that the clock reaches all flip-flops at
the same time. In practice this is not true, as wire delays and
buffer delays add up.
• This creates relative delays between pairs of flip-flops or registers
• For analysis it is important to consider the clock skew between
flip-flops/registers where there is a data path between them.
• Clock Skew:
– Difference in arrival time of the clock at the flip-flops
Max Path and Min Path
[Figure: chip-level clock distribution showing a Max Path and a Min Path]
Clock Skew: Max path
[Figure: flip-flop clocked by CLK1 feeding combinational logic (Comb) and a flip-flop clocked by CLK2; the timing diagram marks tclk, tskew, tco, tcomb, ts and slack]
tclk – tskew > tco(max) + tcomb(max) + tsetup
tclk > tco(max) + tcomb(max) + tsetup + tskew
slack = tclk – (tco(max) + tcomb(max) + tsetup + tskew)
Clock Skew: Max path
• Analysis for data path from first flip-flop to next
• We assume tco + tcomb is greater than the hold time of flip-flop
• Hence, when a clock edge comes to both the flip-flops, new data
from first flip-flop arrives at the second flip-flop after the clock
edge, even after the hold time and won’t get latched in second
flip-flop
• But, we estimate the clock period such that when the next clock
edge comes to second flip-flop, data from the first flip-flop due
to current clock edge get latched in the second flip-flop
Clock Skew: Max path
• Since, the clock to the second flip-flop is skewed or comes early
compared to first, clock period has to accommodate this skew,
requiring a larger clock period than the case where there would
have been no skew
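To make the effect of skew on the max path concrete, here is a worked example with illustrative values (assumed numbers, not data-sheet figures):

```latex
\begin{aligned}
t_{co(max)} &= 1\,\text{ns}, \quad t_{comb(max)} = 5\,\text{ns}, \quad t_{setup} = 1\,\text{ns}, \quad t_{skew} = 0.5\,\text{ns} \\
t_{clk} &> t_{co(max)} + t_{comb(max)} + t_{setup} + t_{skew} = 7.5\,\text{ns} \\
\text{slack at } t_{clk} = 10\,\text{ns} &= 10 - (1 + 5 + 1 + 0.5) = 2.5\,\text{ns}
\end{aligned}
```

Without skew the same path would need only 7 ns, so in this example the early clock at the second flip-flop costs 0.5 ns of clock period.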
Clock Skew: Min path
[Figure: flip-flop clocked by CLK1 feeding combinational logic (Comb) and a flip-flop clocked by CLK2; the timing diagram marks tclk, tskew, th, tco and tcomb]
• Same edge
tco(min) + tcomb(min) > tskew(max) + thold
• Next edge
tclk > tco + tcomb + tsetup - tskew
Clock Skew: Min path
• Here, an analysis like the one for the max path (i.e. from one clock edge at the first
flip-flop to the next clock edge at the second flip-flop) would result in a smaller
clock period, as the clock edge arrives late at the second flip-flop
• But now the real danger is that data from the first flip-flop, launched by the current edge,
appears within the hold time window of the current edge at the second
flip-flop
• If that happens, the only solution is to add extra delay to the data path between
these flip-flops, or to route the clock in the opposite direction
• Practically, this can happen in shift registers, as there may be no
combinational delay between flip-flops
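A worked hold check for the shift-register case, again with assumed values chosen only to show the arithmetic:

```latex
\begin{aligned}
t_{co(min)} &= 0.5\,\text{ns}, \quad t_{comb(min)} = 0\,\text{ns}, \quad t_{skew(max)} = 0.4\,\text{ns}, \quad t_{hold} = 0.2\,\text{ns} \\
t_{co(min)} + t_{comb(min)} &= 0.5\,\text{ns} \;\not>\; t_{skew(max)} + t_{hold} = 0.6\,\text{ns} \quad \text{(hold violation)} \\
\text{extra data-path delay needed} &> 0.6 - 0.5 = 0.1\,\text{ns}
\end{aligned}
```

Note that the clock period does not appear anywhere in this check, which is why a hold violation cannot be cured by lowering the clock frequency.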
Clock routing
• Requirement
– Minimum relative delay between any 2 flip-flops, at least between flip
flops where there is a datapath
• Solution
– Balance the number of buffers and approximate the length of wire from
clock input to the flip-flops
– H Clock Tree
Virtex Clock Tree
Source: Xilinx Data Sheets
DLL
[Figure: DLL block with labels CLKI, CLKO, CLKFB, CLKIN and CLKOUT; timing diagram of CLKIN and CLKOUT marking tskew and tadd]
The DLL delays CLKOUT by "tadd" so that the clock edges of CLKIN and CLKOUT match.
DLL / PLL
• In a DLL, input clock is delayed for de-skew
• In a PLL, a VCO synthesizes a clock synchronous to the input clock
• DLL adjusts the phase of the input clock.
• PLL synthesizes the clock of same phase and frequency as that of the input clock.
• PLL has the problem of working with a limited range of frequencies, but in FPGAs clock frequency may not change in most cases.
• PLL also cleans up the input jitters.
• Xilinx Virtex 5 has PLL blocks in addition to DLL in DCM.
Current FPGAs
• PLL
• Digital Clock Manager (DCM)
– DLL for de-skewing
– Phase shifter
– Frequency multiplication / division
• Clock Buffers, Muxes (Glitchless)
• All these can be connected in clock path
– Clock pins, Clock tree
Special Resources Usage
• Resources
– Buffers
– DLL / PLL
– Block RAMs
– DSP Blocks
• Usage
– Vendor library components
– Inferred by synthesis tool, when possible
– VHDL attributes with code
Virtex Configuration
• JTAG: Prototyping (PC Board)
• Master Serial:
– Configuring from a Serial PROM
– Embedded boards
• Slave Serial
– Works as a slave to master FPGA connected to a serial PROM
• SelectMAP
– 8 /16 bit wide synchronous slave configuration of FPGA
– Suitable for FPGA Interfaces to a CPU
Virtex Configuration: Serial PROM
Source: Xilinx Data Sheets
Serial Configuration
• Multiple FPGAs are configured from a single serial (Flash) PROM.
• Master FPGA supplies clock to PROM and slave FPGAs
• Master and slave FPGAs are daisy chained.
• After power on or after PROGRAM request, all FPGAs configuration memory is
cleared.
• Init phase synchronization is done through INIT I/O pin
• Master FPGA programs first sending out ‘1’s on DOUT and slave FPGA waits.
• Once master FPGA is configured it sends configuration stream for first slave and
so on.
• DONE synchronization is done through open drain output DONE, to form wired
AND operation
SelectMAP Scheme
Source: Xilinx Data Sheets
SelectMAP Configuration: Timing
Source: Xilinx Data Sheets
FPGA Controls while configuring
• While the FPGA is being configured, its internal state is not defined and its pin levels are also not defined.
• Xilinx FPGAs have two internal signals to keep the FPGA state sane during and after configuration.
• GTS: This signal drives all FPGA outputs to tri-state
• GSR: This signal goes to the set/reset of every flip-flop and holds each flip-flop in its specified set or reset state.
• Once FPGA is configured, these signals are released.
• Use separate user resets, for normal reset operation.
Spartan 6: Configuration
• Boundary Scan / JTAG / TAP / IEEE 1149.1
– Single Device, Chain
• Master Serial (Chain, Ganged) (SPI: x1, x2, x4)
• Slave Serial (SPI: x1, x2, x4)
• Master SelectMAP (x8, x16)
– Single Device, Chain, Ganged
• Slave SelectMAP (x8, x16)
Spartan 6: Bit Stream encryption
• The bitstream is AES encrypted with a 256-bit key using the BitGen
tool
• The encryption key is programmed into the FPGA device through
JTAG for decryption.
• Once programmed, the FPGA can be configured to disallow read back
• The configuration also can’t be read back.
• The AES key can be permanently fused in the FPGA, or held in an
SRAM with external battery backup
Spartan 6: Bit Stream compression
• The bitstream can be compressed when a lot of resources are
unused
• Less memory for storage
• Less configuration time
Spartan 6: Multi Boot
• Multiple Configuration Images in Program Flash
• At least, one Main configuration and one fallback/golden
configuration
• During configuration, if a CRC error occurs in the bitstream, or
sync word detection times out (WDT), the device tries the
fallback configuration
• Supported in SPI (x1, x2, x4) and BPI Modes
Spartan 6: DSP Slices
• Slices to support DSP computations
• 18 bit 2’s complement pre-adder
• 18 x 18 bit Multiplier, 36 bit result
• Result is sign extended to 48 bit
• 48 bit 2’s complement adder/subtracter
Spartan 6: DSP48A1 Slice
Source: Xilinx Data Sheets
Debug: Internal Signal Probing
• Probing the internal signals in FPGA for debug.
• Signal Probe / Logic Analysis
• Use a Signal Capture IP
• Interface this IP to the JTAG port
• PC based software to configure signal capture IP and display the signal waveforms
• Xilinx: ChipScope Pro
• Altera: Signal Probe
Xilinx ChipScope Pro
Source: Xilinx Data Sheets
Virtex Pins
Source: Xilinx Data Sheets
One hot encoding
[Figure: FSM drawn with separate Next State Logic and Output Logic blocks, and again with a single Logic block; labels D, CK, Q, AR, NS, PS, Inputs, Outputs, Clock, Reset]
tclk > tco + tlogic + tsetup
One hot encoding
• e.g. FSM with 5 inputs, 18 states, and 6 outputs
• NSL: 5 + 5 = 10 inputs (worst case)
• For Virtex (Worst Case)
– Basic block: 4 input LUT
– 1 CLB → 6 input LUT
– 16 CLB’s for 10 input LUT
• NSL would be distributed increasing the delay bringing down the
clock frequency of FSM.
• Solution: one hot encoding, where each state is encoded using a
flip flop.
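The worst-case CLB count quoted above follows from the fact that every input added beyond six doubles the truth table. As a short derivation (n is a symbol introduced here for the number of inputs):

```latex
\begin{aligned}
\text{CLBs for an } n\text{-input function} &= 2^{\,n-6} \quad (n \ge 6,\ \text{one CLB per 6-input LUT}) \\
n = 10: \quad 2^{10-6} &= 2^{4} = 16 \text{ CLBs}
\end{aligned}
```

This is a worst-case bound for an arbitrary function of all ten inputs; real next state logic often needs fewer, but it still spreads over several CLBs and routing hops.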
One hot encoding
Dj = condi . Qi + condj . Qj
NSL: 5 + 2 inputs (Worst Case)
[Figure: state diagram fragment with states Si and Sj and transition conditions condi and condj]
One-hot encoding Output logic
• Most Moore outputs are direct decode of a state or decode of
more than one state
• If output is a decode of a single state, then that state flip-flop
output is the output signal
• In case of multiple states produce an output, the output
signal is the logical OR of all those state flip-flops
• Thus, one-hot encoding reduces the output logic also, at the
cost of extra state flip-flops
One hot encoding
• State encoding
– Sequential, gray, one-hot-one, one-hot-zero
• User defined attributes (state encoding)
– attribute state_encoding of type-name: type is value;
(sequential, gray, one-hot-one, one-hot-zero)
attribute state_encoding of statetype: type is gray;
– attribute enum_encoding of type-name: type is "string";
attribute enum_encoding of statetype: type is "00 01 11 10";
One-hot one, One-hot zero
• One-hot one
00001
00010
00100
01000
10000
• One-hot zero (Almost one-hot)
0000
0001
0010
0100
1000
• Easy to initialize (reset all flip-flops)
• Starting state is never revisited
One hot encoding
• Explicit declaration of states
signal pr_state, nx_state: std_logic_vector(3 downto 0);
constant a: std_logic_vector(3 downto 0) := "0001";
constant b: std_logic_vector(3 downto 0) := "0010";
constant c: std_logic_vector(3 downto 0) := "0100";
constant d: std_logic_vector(3 downto 0) := "1000";
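Using explicit one-hot codes like these, each next-state bit reduces to a small OR of (state flip-flop AND condition) terms, as in Dj = condi . Qi + condj . Qj. The sketch below shows this for a made-up four-state sequence a, b, c, d; the entity name, the input go and the output busy are hypothetical and serve only to illustrate the style.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical FSM: waits in a until go = '1', then steps b, c, d, a.
entity onehot_fsm is
  port (clk, reset, go : in  std_logic;
        busy           : out std_logic);
end entity onehot_fsm;

architecture rtl of onehot_fsm is
  signal pr_state, nx_state : std_logic_vector(3 downto 0);
  constant a : std_logic_vector(3 downto 0) := "0001";
begin
  -- One flip-flop per state; reset places the machine in state a
  state_reg : process (clk, reset)
  begin
    if reset = '1' then
      pr_state <= a;
    elsif rising_edge(clk) then
      pr_state <= nx_state;
    end if;
  end process state_reg;

  -- Next state logic: each bit depends on at most two state bits and go
  nx_state(0) <= (pr_state(0) and not go) or pr_state(3);  -- stay in a, or d to a
  nx_state(1) <= pr_state(0) and go;                       -- a to b
  nx_state(2) <= pr_state(1);                              -- b to c
  nx_state(3) <= pr_state(2);                              -- c to d

  -- Moore output: logical OR of the state flip-flops b, c and d
  busy <= pr_state(1) or pr_state(2) or pr_state(3);
end architecture rtl;
```

Every next-state equation here fits in a single 4-input LUT, which is the point of the one-hot style: more flip-flops, but shallow logic.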
Altera Stratix
• Two levels of interconnections
• SRAM based programmable connections
• Logic Array Block (10 LE’s)
• LUT as combinational Logic
• Flip-Flops with sync/async reset/preset
• RAM Block (SPRAM, DPRAM, FIFO)
• Low skew clock trees, PLL
• Carry, Cascade chains
• DSP Blocks (Multipliers, Shift Registers)
• I/O Blocks (Registered / Non-registered)
• Multiple I/O standards
• JTAG, Parallel, and Serial Configurations
Altera Stratix
Source: Altera Data Sheets
Actel 54SX-A
• Antifuse based programmable interconnections
• Simple Combinational and Registered cells
• Simple I/O Blocks
• Low skew Clock trees
• Multiple I/O standards
• Hardware probe pins
Actel 54SX-A, C Cell
Actel 54SX-A, R Cell
Actel 54SX-A
Actel 54SX-A Routing
Actel 54SX-A Probe
Actel ProASIC Plus
ProASIC Plus, Logic Tile
Source (figures above): Actel Data Sheets
Latch / FF
[Figure: latch built with a multiplexer (inputs D and clk, output Q); flip-flop built from two latches (D, CLK, Q)]
ProASIC Plus Routing
• Fast Connect
• Short Lines (1, 2, 4), Long Lines
• Clock Tree
• Pad Ring (Pin Locking)
• SRAM Blocks
• Programming Tech: Flash
• Non-volatile
CPLD vs FPGA
Features                  CPLD      FPGA
Logic                     AND-OR    Mux / LUT / Gates
Register to Logic ratio   Small     Large
Timing                    Simple    Complex
Architecture Variation    Small     Large
Programming Technology    Flash     SRAM, Anti-Fuse, Flash
Capacity                  10 K      2 M LUT + RAM
Static Timing Analysis (STA)
• Timing simulation: simulates the real time operation of the circuit, with timing models of blocks for the specified test vectors
• Time consuming for exhaustive simulation
• Static Timing Analysis, analyzes various path delay from Block and wire delays
• Can make mistake as it is not aware of the real time behavior of the circuit (inputs, FSM/Controller behavior)
• A path that is never used in circuit operation may be reported (False paths)
• Registers which are not enabled every clock cycle may be reported (Multi-cycle paths)
STA: Sequential Circuit
• The register-to-register path decides the clock frequency. But if either of the other two paths has a larger
delay, the maximum of the three must be taken as the minimum clock period.
• In real life this is often not a great concern: many a time we are designing IPs which go
inside the chip and interface to other blocks close by. Even when inputs and outputs are
brought to external pins, proper placement should take care of these delays.
[Figure: Input to first flip-flop (Setup to clock), first flip-flop through Comb to second flip-flop (Register to Register Path, Clock to setup), second flip-flop to Output (Clock to output); common clock CLK]
Static Timing Analysis: Sequential Circuit
• Clock to Setup: Register to register path with longest delay
– Clock to Setup on destination clock <clk_signal>
• Clock to Pad: FF output delay - from FF output to chip
output pin
– Clock <clk_signal> to Pad
• Setup to Clock: Setup / Hold time of FF with respect to
input pin/pad
– Setup/Hold to clock <clk_signal>
Static Timing Analysis
• Take the maximum delay of the three to find the maximum clock
frequency for timing simulation
• But the actual throughput is given by Clock to Setup
(the register-to-register path with the longest delay)
• In most cases, the Clock to Pad of a module is not of
consequence, as these outputs, when used in the top-level module,
go as inputs to a nearby module.
False Paths
• Improbable Paths
• Static Paths (e.g. Input Registers)
• Paths between clock domains
Multi-cycle path
[Figure: two D flip-flops with clock enables CE1 and CE2, common clock clk, and combinational logic (Comb) between them]
Clock Enable CE2 comes 3 clock cycles after CE1
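Since the destination register is enabled only three clock cycles after the source register, the path between them has three clock periods to settle. As a sketch with assumed delay values (not taken from the slides):

```latex
\begin{aligned}
\text{single-cycle check:} \quad & t_{co} + t_{comb} + t_{setup} < t_{clk} \\
\text{multi-cycle check (3 cycles):} \quad & t_{co} + t_{comb} + t_{setup} < 3\, t_{clk} \\
\text{e.g. } t_{clk} = 5\,\text{ns},\ t_{co} + t_{comb} + t_{setup} = 12\,\text{ns}: \quad & 12 \not< 5 \ \text{but}\ 12 < 15
\end{aligned}
```

A static timing analyzer that is not told about the enable relationship would report this path as failing, which is why it has to be declared as a multi-cycle path in the constraints.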
Critical Path
[Figure: FF1 (enable CE1) and FF2 (enable CE2) on a common clock clk, with combinational blocks C1 and C2 in the data path between them]
Critical path delay = tCO + tC1 + tC2 + tS
Constraint driven PAR
• Constraint editor
• I/O constraints
– I/O locations
– I/O standards (LVTTL, PCI66-3, LVDS ..)
– Drive strength (current)
– Slew rate
– I/O termination (pull up, pull down, hold)
– Input delay
Timing constraints
• Global
– Clock period, pad to setup, clock to pad
• Per port
– Pad to setup, clock to pad
• Per group (by net and clock)
– Pad to setup, clock to pad
– FROM – TO, FROM – THRU – TO
• False Paths
• Multi-cycle paths