Chapter9 Intro FPGA

  • 8/2/2019 Chapter9 Intro FPGA

    1/62

    Introduction to FPGA Technology, Devices and Tools

    FPGA Devices & Technology

    World of Integrated Circuits

    - Full-Custom ASICs
    - Semi-Custom ASICs
    - User Programmable: PLD, FPGA

    ASIC (Application Specific Integrated Circuit)

    - Designed all the way from behavioral description to physical layout
    - Designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry

    FPGA (Field Programmable Gate Array)

    - Small development overhead
    - No NRE (non-recurring engineering) costs
    - Quick time to market
    - No minimum quantity order
    - Reprogrammable

    How can we make programmable logic?

    One-time programmable:
    - Fuses (destroy internal links with current)
    - Anti-fuses (grow internal links)
    - PROM

    Reprogrammable:
    - EPROM
    - EEPROM
    - Flash
    - SRAM (volatile)

    What is an FPGA?

    [Figure: FPGA architecture with an array of Configurable Logic Blocks, I/O Blocks around the periphery, and Block RAMs]

    Which Way to Go?

    FPGAs:
    - Off-the-shelf
    - Low development cost
    - Short time to market
    - Reconfigurability

    ASICs:
    - High performance
    - Low power
    - Low cost in high volumes

    Other FPGA Advantages

    - The manufacturing cycle for an ASIC is very costly, lengthy, and engages lots of manpower
    - Mistakes not detected at design time have a large impact on development time and cost
    - FPGAs are perfect for rapid prototyping of digital circuits
    - Easy upgrades, as in the case of software
    - Unique applications: reconfigurable computing

    Major FPGA Vendors

    SRAM-based FPGAs:
    - Xilinx, Inc.
    - Altera Corp.
    - Atmel
    - Lattice Semiconductor

    Flash & antifuse FPGAs:
    - Actel Corp.
    - QuickLogic Corp.

    Xilinx and Altera together share over 60% of the market

    XILINX

    Xilinx

    - Primary products: FPGAs and the associated CAD software
    - Main headquarters in San Jose, CA
    - Fabless* semiconductor and software company, with foundry partners:
      - UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
      - Seiko Epson (Japan)
      - TSMC (Taiwan)
    - Products: Programmable Logic Devices; ISE Alliance and Foundation Series Design Software

    Basic Spartan-II FPGA Block Diagram

    CLB Structure

    [Figure: a CLB containing two slices; each slice has two 4-input look-up tables (inputs F1-F4 and G1-G4), carry & control logic, and two D flip-flops, with outputs X, XB, Y, YB and a carry chain from CIN to COUT]

    - Each slice has 2 LUT-FF pairs with associated carry logic
    - Two 3-state buffers (BUFT) are associated with each CLB, accessible by all CLB outputs

    CLB Slice Structure

    Each slice contains two sets of the following:

    - Four-input LUT
      - Any 4-input logic function,
      - or 16-bit x 1 synchronous RAM,
      - or 16-bit shift register
    - Carry & control
      - Fast arithmetic logic
      - Multiplier logic
      - Multiplexer logic
    - Storage element
      - Latch or flip-flop
      - Set and reset
      - True or inverted inputs
      - Synchronous or asynchronous control

    LUT (Look-Up Table) Functionality

    - Look-up tables are the primary elements for logic implementation
    - Each LUT can implement any function of 4 inputs

    [Figure: two example circuits, one with inputs x1-x4 and one with inputs x1, x2, each implemented in a single LUT that stores the circuit's truth table]

    x1 x2 x3 x4 | y (x1-x4 circuit) | y (x1, x2 circuit)
     0  0  0  0 |         0         |         1
     0  0  0  1 |         1         |         1
     0  0  1  0 |         0         |         1
     0  0  1  1 |         0         |         1
     0  1  0  0 |         0         |         1
     0  1  0  1 |         1         |         1
     0  1  1  0 |         0         |         1
     0  1  1  1 |         1         |         1
     1  0  0  0 |         0         |         1
     1  0  0  1 |         1         |         1
     1  0  1  0 |         0         |         1
     1  0  1  1 |         0         |         1
     1  1  0  0 |         1         |         0
     1  1  0  1 |         1         |         0
     1  1  1  0 |         0         |         0
     1  1  1  1 |         0         |         0
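    A LUT does not compute its function with gates; it stores the truth table and reads out one bit per input combination. The minimal Python behavioral sketch below illustrates this. The `Lut4` class is hypothetical, and the address order (x1 as the most significant bit, matching the row order of the table) is an assumption, not a vendor INIT convention. It shows that even a two-input NAND of x1 and x2 occupies a whole LUT, storing 1 in every row except the four where x1 = x2 = 1.

```python
from typing import Callable


class Lut4:
    """Hypothetical behavioral model of a 4-input LUT (a 16 x 1 memory).

    Assumed address order: row = 8*x1 + 4*x2 + 2*x3 + x4.
    """

    def __init__(self, init: int) -> None:
        if not 0 <= init < 1 << 16:
            raise ValueError("LUT contents must fit in 16 bits")
        self.init = init

    @classmethod
    def from_function(cls, f: Callable[[int, int, int, int], int]) -> "Lut4":
        # Evaluate f on all 16 input combinations; store each result as one bit.
        init = 0
        for row in range(16):
            x1, x2, x3, x4 = (row >> 3) & 1, (row >> 2) & 1, (row >> 1) & 1, row & 1
            if f(x1, x2, x3, x4):
                init |= 1 << row
        return cls(init)

    def __call__(self, x1: int, x2: int, x3: int, x4: int) -> int:
        # "Computing" the function is just reading the addressed bit.
        row = (x1 << 3) | (x2 << 2) | (x3 << 1) | x4
        return (self.init >> row) & 1


# A 2-input gate still uses a whole LUT; x3 and x4 are simply ignored.
nand_lut = Lut4.from_function(lambda x1, x2, x3, x4: int(not (x1 and x2)))
print(hex(nand_lut.init))  # 0xfff: rows 12-15 (x1 = x2 = 1) hold 0
```

    Because every function costs the same single read, the delay through a LUT does not depend on which 4-input function it implements.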

    5-Input Functions Implemented Using Two LUTs

    - One CLB slice can implement any function of 5 inputs
    - The logic function is partitioned between two LUTs
    - The F5 multiplexer selects between the two LUT outputs

    [Figure: the two LUTs of a slice (inputs F1-F4 and G1-G4, each configurable as ROM or RAM) feeding the F5 multiplexer, whose select is driven by BX]

    5-Input Functions Implemented Using Two LUTs

    X5 X4 X3 X2 X1 | Y
     0  0  0  0  0 | 0
     0  0  0  0  1 | 1
     0  0  0  1  0 | 0
     0  0  0  1  1 | 0
     0  0  1  0  0 | 1
     0  0  1  0  1 | 1
     0  0  1  1  0 | 0
     0  0  1  1  1 | 0
     0  1  0  0  0 | 1
     0  1  0  0  1 | 0
     0  1  0  1  0 | 0
     0  1  0  1  1 | 1
     0  1  1  0  0 | 1
     0  1  1  0  1 | 1
     0  1  1  1  0 | 1
     0  1  1  1  1 | 1
     1  0  0  0  0 | 0
     1  0  0  0  1 | 0
     1  0  0  1  0 | 0
     1  0  0  1  1 | 0
     1  0  1  0  0 | 0
     1  0  1  0  1 | 0
     1  0  1  1  0 | 0
     1  0  1  1  1 | 1
     1  1  0  0  0 | 0
     1  1  0  0  1 | 1
     1  1  0  1  0 | 0
     1  1  0  1  1 | 1
     1  1  1  0  0 | 0
     1  1  1  0  1 | 1
     1  1  1  1  0 | 0
     1  1  1  1  1 | 0

    [Figure: the table is split on X5; the 16 rows with X5 = 0 are loaded into one LUT, the 16 rows with X5 = 1 into the other, and the two LUT outputs are multiplexed to OUT]
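    The split is Shannon expansion on X5: f = X5'·f0(X4..X1) + X5·f1(X4..X1), where f0 and f1 are each 4-input functions that fit in one LUT. The minimal Python sketch below derives the two 16-bit LUT contents from the 32-row Y column of the 5-input truth table and checks that the multiplexer reproduces every row. It is a behavioral model only. The helper names are hypothetical, and bit i of each LUT is assumed to hold the row whose X4..X1 value is i.

```python
# Y column of the 5-input truth table, rows 00000..11111 (X5 is the MSB).
Y = "0100110010011111" "0000000101010100"


def split_on_msb(column: str) -> tuple[int, int]:
    """Hypothetical helper: 16-bit contents for the X5 = 0 and X5 = 1 LUTs."""
    if len(column) != 32:
        raise ValueError("expected a 32-row truth table")
    lut0 = sum(1 << i for i, bit in enumerate(column[:16]) if bit == "1")
    lut1 = sum(1 << i for i, bit in enumerate(column[16:]) if bit == "1")
    return lut0, lut1


def f5_output(lut0: int, lut1: int, x5: int, low4: int) -> int:
    """Both LUTs see the same inputs X4..X1; X5 drives the multiplexer select."""
    selected = lut1 if x5 else lut0
    return (selected >> low4) & 1


LUT0, LUT1 = split_on_msb(Y)
print(f"LUT0 = 0x{LUT0:04X}, LUT1 = 0x{LUT1:04X}")  # LUT0 = 0xF932, LUT1 = 0x2A80

# Check all 32 rows: multiplexing the two LUTs reproduces the original table.
for row in range(32):
    assert f5_output(LUT0, LUT1, row >> 4, row & 0xF) == int(Y[row])
```

    Any 5-input function can be split this way, which is why one slice (two LUTs plus the F5 multiplexer) covers all of them.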

    Dedicated Expansion Multiplexers

    [Figure: a CLB with two slices, each containing two LUTs joined by a MUXF5; a MUXF6 combines the outputs of the two slices]

    - MUXF5 combines 2 LUTs to create any 5-input function (LUT5), or selected functions of up to 9 inputs, or a 4x1 multiplexer
    - MUXF6 combines 2 slices to form any 6-input function (LUT6), or selected functions of up to 19 inputs, or an 8x1 multiplexer
    - Dedicated muxes are faster and more space-efficient
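    To see how the 8x1 case fits into one CLB, here is a behavioral Python sketch. The function names are hypothetical, and it models data flow only, not timing. Each LUT acts as a 2:1 multiplexer, which uses 3 of its 4 inputs. MUXF5 picks between the two LUTs of a slice, and MUXF6 picks between the two slices. This also explains the input counts above: a 4x1 mux needs 6 inputs and an 8x1 mux needs 11, more than any single LUT has.

```python
def lut_mux2(d0: int, d1: int, s0: int) -> int:
    """A 2:1 mux is a 3-input function, so it fits in one 4-input LUT."""
    return d1 if s0 else d0


def muxf5(lut_a: int, lut_b: int, s1: int) -> int:
    """Dedicated F5 mux (hypothetical model): picks one of a slice's two LUTs."""
    return lut_b if s1 else lut_a


def muxf6(f5_a: int, f5_b: int, s2: int) -> int:
    """Dedicated F6 mux (hypothetical model): picks one of the CLB's two slices."""
    return f5_b if s2 else f5_a


def mux8(data: list[int], sel: int) -> int:
    """8:1 multiplexer from 4 LUTs, 2 MUXF5s and 1 MUXF6."""
    s0, s1, s2 = sel & 1, (sel >> 1) & 1, (sel >> 2) & 1
    luts = [lut_mux2(data[2 * i], data[2 * i + 1], s0) for i in range(4)]
    slice0 = muxf5(luts[0], luts[1], s1)  # data[0..3]: a 4:1 mux in one slice
    slice1 = muxf5(luts[2], luts[3], s1)  # data[4..7]
    return muxf6(slice0, slice1, s2)


inputs = [0, 1, 1, 0, 1, 0, 0, 1]
print([mux8(inputs, s) for s in range(8)])  # [0, 1, 1, 0, 1, 0, 0, 1]
```

    The selected index is 4·s2 + 2·s1 + s0, so the output always equals `data[sel]`. Only the first selection level consumes LUT logic. The other two levels use the dedicated multiplexers, which is where the speed and area savings come from.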

    Distributed RAM

    [Figure: distributed RAM primitives built from LUTs: RAM16X1S (one LUT), RAM32X1S and RAM16X2S (two LUTs each), and the dual-port RAM16X1D (two LUTs). Ports include D, WE, WCLK, A0-A3 (A0-A4 for RAM32X1S) and O; RAM16X1D adds the second read address DPRA0-DPRA3 and the outputs SPO and DPO]

    - A CLB LUT is configurable as distributed RAM
    - One LUT equals a 16x1 RAM
    - Implements single and dual ports
    - Cascade LUTs to increase RAM size
    - Synchronous write
    - Synchronous/asynchronous read
    - Accompanying flip-flops are used for synchronous read
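    A behavioral Python sketch of the dual-port case shows the key timing property: writes happen only on a clock edge, while reads are combinational (asynchronous). The class name mirrors the RAM16X1D primitive in the figure, but the model itself is hypothetical. Its port names follow the figure.

```python
class Ram16x1d:
    """Hypothetical behavioral model of a 16 x 1 dual-port distributed RAM.

    Assumptions for this sketch: one write port (address A, data D, enable WE,
    sampled at a WCLK rising edge) and two asynchronous read ports,
    SPO (reads at A) and DPO (reads at DPRA).
    """

    def __init__(self, init: int = 0) -> None:
        self.bits = [(init >> i) & 1 for i in range(16)]

    def clock_edge(self, a: int, d: int, we: int) -> None:
        # Synchronous write: only on a WCLK edge, and only with WE asserted.
        if we:
            self.bits[a & 0xF] = d & 1

    def spo(self, a: int) -> int:
        # Asynchronous read on the single-port side: no clock involved.
        return self.bits[a & 0xF]

    def dpo(self, dpra: int) -> int:
        # Asynchronous read through the second, read-only address port.
        return self.bits[dpra & 0xF]


ram = Ram16x1d()
ram.clock_edge(a=5, d=1, we=1)
print(ram.spo(5), ram.dpo(5), ram.dpo(6))  # 1 1 0
```

    The real primitive keeps one copy of the data in each of its two LUTs, and both copies are written at address A. A single array behaves identically from the outside. Registering `spo`/`dpo` in the slice flip-flops would give the synchronous read option.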

    Shift Register

    - A register-rich FPGA allows pipeline stages to be added to increase throughput
    - Data paths must be balanced to keep the desired functionality

    [Figure: a 64-bit path through Operation A (4 cycles) followed by Operation B (8 cycles) takes 12 cycles, while a parallel 64-bit path through Operation C takes 3 cycles: a 9-cycle imbalance]
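    One way to remove a 9-cycle imbalance is to delay Operation C's result by 12 - 3 = 9 extra cycles. Because each LUT can act as a 16-bit shift register, a 9-stage delay costs one LUT per bit instead of 9 flip-flops per bit. The Python sketch below models such a delay line. The `DelayLine` class is hypothetical, and for brevity each stage carries a whole word rather than one bit.

```python
from collections import deque


class DelayLine:
    """Hypothetical clocked shift register: output = input from `depth` cycles ago."""

    def __init__(self, depth: int, reset_value: int = 0) -> None:
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.stages = deque([reset_value] * depth, maxlen=depth)

    def clock(self, data_in: int) -> int:
        # On each clock edge the oldest value leaves and the new value enters.
        data_out = self.stages[0]
        self.stages.append(data_in)  # maxlen discards stages[0] automatically
        return data_out


# Path A->B takes 12 cycles and path C takes 3, so delay C's output by 9 cycles.
balance = DelayLine(depth=12 - 3)
outputs = [balance.clock(word) for word in range(100, 120)]
print(outputs[9:12])  # [100, 101, 102]: each word emerges 9 clocks after entering
```

    After balancing, results from both paths that belong to the same input word reach the final stage in the same clock cycle.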

    Carry & Control Logic

    [Figure: a CLB slice with its carry & control logic blocks highlighted, placed between each look-up table and its flip-flop and linked from CIN to COUT]

    Fast Carry Logic

    - Each CLB contains separate logic and routing for the fast generation of sum & carry signals
    - Increases the efficiency and performance of adders, subtractors, accumulators, comparators, and counters
    - Carry logic is independent of the normal logic and routing resources

    [Figure: a dedicated carry chain running from LSB to MSB through the carry logic, separate from the general routing]
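    The dedicated chain can be pictured as one carry multiplexer per bit. If the two operand bits differ (propagate), the incoming carry passes straight through. Otherwise the carry out equals either operand bit (generate when both are 1, kill when both are 0). Xilinx libraries name the corresponding primitives MUXCY and XORCY. The Python model below is a behavioral sketch of that formulation, not of any specific device's circuit.

```python
def ripple_add(a: int, b: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Add two `width`-bit numbers along a mux-based carry chain.

    Behavioral sketch; returns (sum, carry_out).
    """
    carry = cin
    total = 0
    for i in range(width):  # LSB first: the carry travels toward the MSB
        ai, bi = (a >> i) & 1, (b >> i) & 1
        propagate = ai ^ bi                # computed in the LUT
        total |= (propagate ^ carry) << i  # sum bit (XORCY-style XOR)
        # Carry mux: pass the incoming carry if propagating, else a_i (== b_i).
        carry = carry if propagate else ai
    return total, carry


print(ripple_add(0b1011, 0b0110, width=4))  # (1, 1): 11 + 6 = 17 = 0b1_0001
```

    The per-bit carry path here is a single multiplexer. The dedicated hardware chain keeps that path off the general routing, which is what makes wide adders and counters fast.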

    Accessing Carry Logic

    All major synthesis tools can infer carry logic for arithmetic functions:

    Addition (SUM

    Block RAM

    [Figure: Spartan-II true dual-port block RAM with independent Port A and Port B]

    - Most efficient memory implementation
    - Dedicated blocks of memory
    - Ideal for most memory requirements
    - 4 to 14 memory blocks
    - 4096 bits per block
    - Use multiple blocks for larger memories
    - Builds both single-port and true dual-port RAMs
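    The number of blocks a memory needs depends on the aspect ratio each block is configured for. The Python sketch below makes three assumptions. First, a 4096-bit block can be used as 4096x1, 2048x2, 1024x4, 512x8 or 256x16, the widths of the RAMB4 primitives. Second, all blocks in one memory use the same configuration, tiled in depth and in width. Third, the `blocks_needed` helper is hypothetical.

```python
from math import ceil

BLOCK_BITS = 4096
WIDTHS = (1, 2, 4, 8, 16)  # assumed per-block data widths (depth = 4096 // width)


def blocks_needed(depth: int, width: int) -> tuple[int, int]:
    """Hypothetical helper: (block count, per-block width) of the cheapest uniform tiling."""
    return min(
        (ceil(depth / (BLOCK_BITS // w)) * ceil(width / w), w) for w in WIDTHS
    )


print(blocks_needed(512, 20))  # (3, 8): three 512x8 blocks side by side
print(blocks_needed(5000, 3))  # (5, 4): five 1024x4 blocks stacked in depth
```

    The second case shows that tiling can waste bits: 5000 x 3 = 15,000 bits would fit in 4 blocks, but a uniform tiling needs 5. Memories deeper than one block also need address decoding and output multiplexing in CLB logic, which this model ignores.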

    Dual Port Block RAM

    Dual-Port Bus Flexibility

    [Figure: RAMB4_S4_S16 primitive. Port A: 4-bit width, 1K deep (ADDRA[9:0], DIA[3:0], DOA[3:0], WEA, ENA, RSTA, CLKA). Port B: 16-bit width, 256 deep (ADDRB[7:0], DIB[15:0], DOB[15:0], WEB, ENB, RSTB, CLKB)]

    - Each port can be configured with a different data bus width
    - Provides easy data width conversion without any additional logic
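    Both ports address the same 4096 bits, so width conversion amounts to two views of one bit array. The Python sketch below assumes a bit-linear layout, in which a port of width W at address a covers bits a·W through a·W + W - 1. Under this layout, 4-bit word 4k + j is nibble j of 16-bit word k. The class and method names are hypothetical.

```python
class DualWidthRam:
    """Hypothetical model: one 4096-bit array seen through ports of different widths."""

    def __init__(self, total_bits: int = 4096) -> None:
        self.bits = [0] * total_bits

    def write(self, width: int, addr: int, value: int) -> None:
        # Assumed bit-linear layout: word `addr` occupies bits addr*width .. +width-1.
        base = addr * width
        for i in range(width):
            self.bits[base + i] = (value >> i) & 1

    def read(self, width: int, addr: int) -> int:
        base = addr * width
        return sum(self.bits[base + i] << i for i in range(width))


ram = DualWidthRam()
# Write four nibbles through the 4-bit port A (ADDRA = 0..3)...
for addr, nibble in enumerate([0x1, 0x2, 0x3, 0x4]):
    ram.write(width=4, addr=addr, value=nibble)
# ...and read them back as one word through the 16-bit port B (ADDRB = 0).
print(hex(ram.read(width=16, addr=0)))  # 0x4321
```

    A 4-bit producer and a 16-bit consumer can therefore share the buffer directly, with no packing logic in the CLBs.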

    Two Independent Single-Port RAMs

    [Figure: RAMB4_S1_S1 used as two single-port RAMs. Each port is 1 bit wide (DIA[0]/DOA[0], DIB[0]/DOB[0]) and 2K deep (ADDRA[10:0], ADDRB[10:0]), with controls WEA/WEB, ENA/ENB, RSTA/RSTB and CLKA/CLKB; the address MSB above ADDR[10:0] is tied to GND on one port and to VCC on the other]

    - To access the lower RAM, tie the MSB address bit to logic low
    - To access the upper RAM, tie the MSB address bit to logic high
    - Added advantage of true dual-port: no wasted RAM bits
    - A dual-port 4K RAM can be split into two single-port 2K RAMs
    - Simultaneous, independent access to each RAM
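    The tie-off trick amounts to prepending a constant bit, so the 12-bit physical address of the 4096 x 1 block becomes {MSB, ADDR[10:0]}. The Python sketch below models only this address wiring, not RAM timing. It shows that the two ports can never touch the same cell and that together they cover every bit. The `physical_address` helper is hypothetical.

```python
def physical_address(msb: int, addr11: int) -> int:
    """Hypothetical helper: concatenate a tied-off MSB with an 11-bit address."""
    if not 0 <= addr11 < 2048:
        raise ValueError("user address must fit in 11 bits")
    return (msb << 11) | addr11


LOWER, UPPER = 0, 1   # MSB tied to GND or to VCC
memory = [0] * 4096   # the 4096 x 1 block RAM (RAMB4_S1_S1)

# A write through the lower-RAM port at address 7...
memory[physical_address(LOWER, 7)] = 1
# ...does not disturb address 7 of the upper-RAM port.
print(memory[physical_address(UPPER, 7)])  # 0

# The two 2K RAMs together cover all 4096 bits with no overlap.
lower_cells = {physical_address(LOWER, a) for a in range(2048)}
upper_cells = {physical_address(UPPER, a) for a in range(2048)}
print(len(lower_cells | upper_cells), len(lower_cells & upper_cells))  # 4096 0
```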

    I/O Banking

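    Basic I/O Block Structure

    [Figure: an IOB with three D flip-flops, each with a clock enable (EC) and set/reset (SR), placed on the three-state control path, the output path, and the input path; signals include Three-State, Output, Clock, Set/Reset, FF Enable, Direct Input, and Registered Input]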

    IOB Functionality

    - The IOB provides the interface between the package pins and the CLBs
    - Each IOB can work as uni- or bi-directional I/O
    - Outputs can be forced into high impedance
    - Inputs and outputs can be registered (advised for high-performance I/O)
    - Inputs can be delayed

    Routing Resources

    [Figure: CLBs arranged in a grid, with routing channels meeting at Programmable Switch Matrices (PSMs)]

    Clock Distribution

    FPGA Nomenclature

    ALTERA

    Device Families & Tools

    Logic Element: FLEX10K

    Logic Array Block: FLEX10K

    FLEX10K Architecture

    Stratix Architecture

    Stratix Device Family

    Feature                                  EP1S10     EP1S20     EP1S25     EP1S30     EP1S40     EP1S60     EP1S80    EP1S120
    Logic elements (LEs)                     10,570     18,460     25,660     32,470     41,250     57,120     79,040    114,140
    M512 RAM blocks (512 bits + parity)          94        194        224        295        384        574        767      1,118
    M4K RAM blocks (4 Kbits + parity)            60         82        138        171        183        292        364        520
    M-RAM blocks (512 Kbits + parity)             1          2          2          4          4          6          9         12
    Total RAM bits                          920,448  1,669,248  1,944,576  3,317,184  3,423,744  5,215,104  7,427,520 10,118,016
    DSP blocks                                    6         10         10         12         14         18         22         28
    Embedded multipliers                         48         80         80         96        112        144        176        224
    PLLs                                          6          6          6         10         12         12         12         12
    Maximum user I/O pins                       426        586        706        726        822      1,022      1,238      1,314

    Engineering SampleAvailability

    NowUse

    ProductionUse

    ProductionN/A Now N/A Now 2003

    ProductionDevice Availability

    March2003

    Now Now NowMarch2003

    April2003

    January2003

    2003
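    The "Total RAM bits" row can be checked against the block counts once parity is included. Assuming one parity bit per 8 data bits, an M512 block holds 576 bits, an M4K block holds 4,608 bits, and a 512-Kbit block (Stratix M-RAM) holds 589,824 bits. A short Python check reproduces all eight totals.

```python
# Capacity of each block type including parity (assumed: 1 parity bit per 8 data bits).
M512_BITS = 512 + 512 // 8           # 576
M4K_BITS = 4096 + 4096 // 8          # 4,608
MRAM_BITS = 524288 + 524288 // 8     # 589,824 (512 Kbits + parity)

# (M512, M4K, 512-Kbit) block counts per device, taken from the table.
BLOCKS = {
    "EP1S10": (94, 60, 1),
    "EP1S20": (194, 82, 2),
    "EP1S25": (224, 138, 2),
    "EP1S30": (295, 171, 4),
    "EP1S40": (384, 183, 4),
    "EP1S60": (574, 292, 6),
    "EP1S80": (767, 364, 9),
    "EP1S120": (1118, 520, 12),
}

totals = {
    name: m512 * M512_BITS + m4k * M4K_BITS + mram * MRAM_BITS
    for name, (m512, m4k, mram) in BLOCKS.items()
}

for name, total in totals.items():
    print(f"{name:8} {total:>12,}")  # e.g. "EP1S10        920,448"
```

    All eight computed totals agree with the table's "Total RAM bits" row.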

    FPGA Technology Roadmap

    Year               1995    1996    1997    2000    2003                                        2004 (?)
    Technology (µm)    0.6     0.35    0.25    0.18    0.13                                        0.07
    Gate count         25K     100K    250K    1M      100K LC*, 8 Mb RAM, 400 18x18 multipliers
    Transistor count   3.5M    12M     23M     75M     430M                                        1B

    *note: Xilinx Virtex-II Pro XC2VP100 (9/16/2003)

    Advanced architecture on modern FPGAs

    More guts

    Additional components:
    - RAM blocks
    - Dedicated multipliers
    - Tri-state buffers
    - Transceivers
    - Processor cores
    - DSP blocks

    Dedicated Arithmetic Blocks

    Altera

    Xilinx

    QuickLogic

    Processor Cores

    PowerPC on Virtex-II Pro

    - Embedded 300+ MHz Harvard architecture core
    - Low power consumption: 0.9 mW/MHz
    - Five-stage data path pipeline
    - Hardware multiply/divide unit
    - Thirty-two 32-bit general purpose registers
    - 16 KB two-way set-associative instruction cache
    - 16 KB two-way set-associative data cache
    - Memory Management Unit (MMU)
      - 64-entry unified Translation Look-aside Buffer (TLB)
      - Variable page sizes (1 KB to 16 MB)
    - Dedicated On-Chip Memory (OCM) interface
    - Supports the IBM CoreConnect bus architecture
    - Debug and trace support
    - Timer facilities

    ARM in Excalibur

    - Industry-standard ARM922T 32-bit RISC processor core operating at up to 200 MHz
    - ARMv4T instruction set with Thumb extensions
    - Memory management unit (MMU) included for real-time operating system (RTOS) support
    - Harvard cache architecture with separate 8-Kbyte instruction and 8-Kbyte data caches, each 64-way set-associative
    - Embedded programmable on-chip peripherals
    - ETM9 embedded trace module to assist software debugging
    - Flexible interrupt controller
    - Universal asynchronous receiver/transmitter (UART)
    - General-purpose timer
    - Watchdog timer

    FPGA Tools

    Design process (1)

    Design and implement a simple unit that speeds up encryption on an 8031 microcontroller, using an RC5-like cipher with a fixed key set. Unlike in Experiment 5, this time your unit must be able to perform the encryption algorithm by itself, executing 32 rounds.

    library IEEE;
    use IEEE.std_logic_1164.all;
    use IEEE.std_logic_unsigned.all;

    -- Interface of the RC5-like encryption core
    entity RC5_core is
      port(clock, reset, encr_decr: in std_logic;
           data_input: in std_logic_vector(31 downto 0);
           data_output: out std_logic_vector(31 downto 0);
           out_full: in std_logic;
           key_input: in std_logic_vector(31 downto 0);
           key_read: out std_logic  -- last port: no trailing semicolon
      );
    end RC5_core;  -- must match the entity name

    Specification (Lab Experiments)

    VHDL description (Your Source Files)

    Functional simulation

    Synthesis

    Post-synthesis simulation

    Design process (2)

    Implementation

    Configuration

    Timing simulation

    On-chip testing

    Active-HDL

    Simulation Tools

    Synthesis Tools

    Logic Synthesis

    architecture MLU_DATAFLOW of MLU is
      signal A1: STD_LOGIC;
      signal B1: STD_LOGIC;
      signal Y1: STD_LOGIC;
      signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
    begin
      A1

    Features of synthesis tools

    - Interpret RTL code
    - Produce a synthesized circuit netlist in the standard EDIF format
    - Give preliminary performance estimates
    - Some can display circuit schematics corresponding to the EDIF netlist

    Implementation

    After synthesis, the entire implementation process is performed by FPGA vendor tools:
    - Xilinx ISE Foundation 6.2i
    - Altera Quartus II 4.0
    - 3rd-party tools for the Alliance version

    Circuit Compilation

    1. Technology mapping
    2. Placement: assign a logical LUT to a physical location
    3. Routing: select wire segments and switches for interconnection
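    At its core, routing is a shortest-path search through the graph of wire segments and programmable switches. Production FPGA routers are considerably more elaborate (for example, timing-driven and negotiating between competing nets). As a toy illustration only, here is Lee's classic breadth-first maze routing on a grid. The grid, the `#` convention for resources already used by other nets, and the function name are all hypothetical.

```python
from collections import deque
from typing import Optional

Point = tuple[int, int]  # (row, column) of a routing resource in the toy grid


def lee_route(grid: list[str], src: Point, dst: Point) -> Optional[list[Point]]:
    """Toy Lee maze router: shortest free path from src to dst, or None."""
    rows, cols = len(grid), len(grid[0])
    came_from: dict[Point, Optional[Point]] = {src: None}
    frontier = deque([src])
    while frontier:  # wave expansion: cells are visited in order of distance
        cell = frontier.popleft()
        if cell == dst:
            path = []
            while cell is not None:  # backtrace from the target to the source
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#" and nxt not in came_from:
                came_from[nxt] = cell
                frontier.append(nxt)
    return None  # the net cannot be routed with the remaining resources


grid = [
    "....",
    ".##.",
    "....",
]
path = lee_route(grid, (1, 0), (1, 3))
print(len(path) - 1)  # 5 segments: the occupied middle forces a detour
```

    Because breadth-first search expands cells in order of distance, the first time it reaches the target it has found a shortest path. Real routers replace the uniform step cost with delay- and congestion-aware costs.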

    Routing Example

    [Figure: a routed design on the FPGA, with the programmable connections used by the nets highlighted]

    Static Timing Analyzer

    - Performs static analysis of the circuit's performance
    - Reports critical paths with all sources of delay
    - Determines the maximum clock frequency

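    Static Timing Analysis

    Critical path: the longest path from the outputs of registers to the inputs of registers

    [Figure: input in feeds a D flip-flop whose Q output passes through combinational logic (delay tP logic) into a second D flip-flop that drives out; both flip-flops are clocked by clk]

    $$t_{critical} = t_{P,FF} + t_{P,logic} + t_{S,FF}$$

    where $t_{P,FF}$ is the flip-flop clock-to-output propagation delay, $t_{P,logic}$ is the propagation delay of the combinational logic, and $t_{S,FF}$ is the flip-flop setup time.

    Min. clock period = length of the critical path: $T_{min} = t_{critical}$

    Max. clock frequency: $f_{max} = 1 / T_{min}$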
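    The critical-path formula can be applied mechanically. The Python sketch below performs a tiny static timing analysis. Every delay value and node name in it is made up for illustration. It finds the longest register-to-register path through a DAG of logic delays, adds the flip-flop clock-to-output and setup times, and converts the result into a maximum clock frequency.

```python
from functools import lru_cache

T_P_FF = 1.0   # hypothetical clock-to-output delay of the launching flip-flop (ns)
T_S_FF = 0.5   # hypothetical setup time of the capturing flip-flop (ns)

# Hypothetical combinational netlist as a DAG: node -> (delay in ns, successors).
# "reg_q" is the launching register's output, "reg_d" the capturing register's input.
NETLIST = {
    "reg_q": (0.0, ["lut1", "lut2"]),
    "lut1": (0.7, ["lut3"]),
    "lut2": (0.4, ["lut3", "reg_d"]),
    "lut3": (0.9, ["reg_d"]),
    "reg_d": (0.0, []),
}


@lru_cache(maxsize=None)
def longest_delay(node: str) -> float:
    """Largest accumulated logic delay from `node` to the capturing register."""
    delay, successors = NETLIST[node]
    return delay + max((longest_delay(s) for s in successors), default=0.0)


t_logic = longest_delay("reg_q")          # critical path: reg_q -> lut1 -> lut3 -> reg_d
t_critical = T_P_FF + t_logic + T_S_FF    # = minimum clock period
f_max_mhz = 1000.0 / t_critical           # 1 / period, converting ns to MHz
print(f"t_logic = {t_logic:.1f} ns, period = {t_critical:.1f} ns, f_max = {f_max_mhz:.0f} MHz")
# t_logic = 1.6 ns, period = 3.1 ns, f_max = 323 MHz
```

    The analysis is "static" because it only sums worst-case delays along paths and never simulates input vectors. Memoizing `longest_delay` keeps the traversal linear in the size of the netlist.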

    Configuration

    - Once a design is implemented, you must create a file that the FPGA can understand
    - This file is called a bitstream: a BIT file (.bit extension)
    - The BIT file can be downloaded directly to the FPGA, or it can be converted into a PROM file, which stores the programming information