Transcript of commbook_chap3

  • On-Chip Communication Architectures: Standards

    ICS 295
    Sudeep Pasricha and Nikil Dutt
    Slides based on book chapter 3
    © 2008 Sudeep Pasricha & Nikil Dutt

  • Outline
    • Why Standards?
    • On-chip standard bus architectures
      • AMBA 2.0/3.0
      • IBM CoreConnect
      • STMicroelectronics STBus
      • Sonics Smart Interconnect
    • Socket-based on-chip bus interface standards
      • OCP-IP

  • Why Standards?
    • SoC components (IPs) have an interface to the outside world consisting of a set of pins responsible for sending/receiving addresses, data, and control signals
    • The number and functionality of pins must adhere to a specific interface standard
    • Important for seamless integration of SoC IPs
      • helps avoid integration mismatches
      • e.g. 1: connecting an IP with 32 data pins to a 30-bit data bus
      • e.g. 2: connecting an IP supporting data bursts to a bus with no burst support
    • Mismatches require development of logic wrappers at IP interfaces to ensure correct data transfers
      • time consuming to create, reduce performance, take up area

  • Why Standards?
    • Interface standards define a specific data transfer protocol
      • decide number and functionality of pins at IP interfaces
      • make it easy to connect diverse IPs quickly
    • Two categories of standards for SoC communication:
      • Standard bus architectures
        • define interface between IPs and bus architecture
        • define (at least some) specifics of bus architecture that implements data transfer protocol
      • Socket-based bus interface standards
        • define interface between IPs and bus architecture
        • freedom w.r.t. choice and implementation of bus architecture
    • Ideally, designers want one standard to interconnect all IPs
    • In reality, several competing standards have emerged

  • Standard Bus Architectures
    • AMBA 2.0, 3.0 (ARM)
    • CoreConnect (IBM)
    • Sonics Smart Interconnect (Sonics)
    • STBus (STMicroelectronics)
    • Wishbone (Opencores)
    • Avalon (Altera)
    • PI Bus (OMI)
    • MARBLE (Univ. of Manchester)
    • CoreFrame (PalmChip)

    widely used


  • AMBA 2.0

  • AHB Basic Transfer
    • Split ownership of address and data bus

  • AHB Basic Transfer
    • Data transfer with slave wait states

  • AHB Pipelining
    • Transaction pipelining increases bus bandwidth

  • AHB Architecture
    • Centralized arbitration / decode
    • 1 unidirectional address bus (HADDR)
    • 2 unidirectional data buses (HWDATA, HRDATA)
    • At any time only 1 data bus is active

  • AHB Arbitration
    • Figure labels: HBREQ_M1, HBREQ_M2, HBREQ_M3, Arbiter
    • Arbitration protocol is specified, but not the arbitration policy

  • Cost of Arbitration in AHB
    • Time for arbitration
    • Time for handshaking

  • AHB Pipelined Burst Transfers
    • Bursts cut down on arbitration and handshaking time, improving performance
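
    The saving from bursts can be illustrated with a deliberately simplified cycle count. The sketch below assumes that arbitration and handshaking cost a fixed number of cycles per bus ownership and that each data word takes one cycle; the function name and the default cycle counts are illustrative assumptions, not figures from the AMBA specification, and overlap between arbitration and data phases is ignored.

```python
def cycles_for(n_words, burst_len, arb_cycles=2, handshake_cycles=1):
    """Total bus cycles to move n_words when the arbitration and
    handshaking overhead is paid once per burst (assumed toy model)."""
    bursts = -(-n_words // burst_len)  # ceiling division
    overhead = bursts * (arb_cycles + handshake_cycles)
    return overhead + n_words          # one cycle per data word


if __name__ == "__main__":
    for burst_len in (1, 4, 16):
        print(burst_len, cycles_for(16, burst_len))
```

    Under these assumptions, moving 16 words as single transfers pays the overhead 16 times, while a single 16-beat burst pays it once.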

  • AHB Burst Types
    • Fixed length bursts
    • Incremental bursts access sequential locations
      • e.g. 0x64, 0x68, 0x6C, 0x70 for INCR4, transferring 4-byte data
    • Wrapping bursts wrap around the address if the starting address is not aligned to the total no. of bytes in the transfer
      • e.g. 0x64, 0x68, 0x6C, 0x60 for WRAP4, transferring 4-byte data
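
    The two address sequences above can be reproduced with a short sketch. The function name and parameters are hypothetical; the wrap boundary is taken to be the total number of bytes in the burst (beats times bytes per beat), as the slide describes.

```python
def burst_addresses(start, beats, size_bytes, wrap=False):
    """Return the address of each beat in an incrementing or wrapping burst."""
    if not wrap:
        return [start + i * size_bytes for i in range(beats)]
    span = beats * size_bytes       # total bytes in the burst
    base = start - (start % span)   # aligned wrap boundary
    offset = start - base
    return [base + (offset + i * size_bytes) % span for i in range(beats)]


if __name__ == "__main__":
    print([hex(a) for a in burst_addresses(0x64, 4, 4)])             # INCR4
    print([hex(a) for a in burst_addresses(0x64, 4, 4, wrap=True)])  # WRAP4
```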

  • AHB Control Signals
    • Transfer direction
      • HWRITE: write transfer when high, read transfer when low
    • Transfer size
      • HSIZE[2:0] indicates the size of the transfer

  • AHB Control Signals
    • Protection control
      • HPROT[3:0] provides additional information about a bus access
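
    A small decoder makes the control fields concrete. This sketch assumes the AMBA 2.0 AHB encodings (HSIZE selects 2^HSIZE bytes; HPROT bit 0 distinguishes data access from opcode fetch, bit 1 privileged from user, bit 2 bufferable, bit 3 cacheable); the function names are hypothetical, and the bit meanings should be checked against the specification before relying on them.

```python
def decode_hsize(hsize):
    """Transfer size in bytes for a 3-bit HSIZE value (assumed 2**HSIZE)."""
    if not 0 <= hsize <= 7:
        raise ValueError("HSIZE is a 3-bit field")
    return 1 << hsize


def decode_hprot(hprot):
    """Split a 4-bit HPROT value into its assumed per-bit attributes."""
    if not 0 <= hprot <= 0xF:
        raise ValueError("HPROT is a 4-bit field")
    return {
        "data_access": bool(hprot & 0b0001),  # 0 = opcode fetch
        "privileged": bool(hprot & 0b0010),   # 0 = user access
        "bufferable": bool(hprot & 0b0100),
        "cacheable": bool(hprot & 0b1000),
    }


def describe(hwrite, hsize, hprot):
    """Summarize one transfer's control signals as a dictionary."""
    info = {"direction": "write" if hwrite else "read",
            "bytes": decode_hsize(hsize)}
    info.update(decode_hprot(hprot))
    return info


if __name__ == "__main__":
    print(describe(hwrite=1, hsize=2, hprot=0b0011))
```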

  • AHB Split Transfers
    • Improves bus utilization
    • May cause deadlocks if not carefully implemented

  • AHB Bus Matrix Topology
    • In addition to shared bus and hierarchical bus, AHB can be implemented as a bus matrix

  • APB State Diagram
    • Figure annotations: "When AHB wants to drive a transfer"; "One cycle penalty for APB peripheral address decoding"; "Transfer occurs here"
    • No (multi-cycle) bursts or pipelined transfers
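
    The state diagram can be sketched as a three-state machine. The state names IDLE, SETUP, and ENABLE follow the AMBA 2.0 APB description; the function names and the boolean "transfer pending" input are simplifying assumptions made for illustration.

```python
def apb_next_state(state, transfer_pending):
    """One step of a simplified APB state machine."""
    if state == "IDLE":
        # Leave IDLE only when the bridge wants to drive a transfer.
        return "SETUP" if transfer_pending else "IDLE"
    if state == "SETUP":
        # SETUP always lasts one cycle (the address decode penalty).
        return "ENABLE"
    if state == "ENABLE":
        # The transfer completes here; go back for another one or idle.
        return "SETUP" if transfer_pending else "IDLE"
    raise ValueError(f"unknown state: {state}")


def run(pending_per_cycle, state="IDLE"):
    """Return the list of states visited, starting from `state`."""
    trace = [state]
    for pending in pending_per_cycle:
        state = apb_next_state(state, pending)
        trace.append(state)
    return trace


if __name__ == "__main__":
    print(run([True, False, False, False]))
```

    Every transfer passes through both SETUP and ENABLE, which is why APB has neither bursts nor pipelined transfers.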

  • AHB-APB Bridge
    • Bridges AHB signals to the APB
    • AHB side: high performance
    • APB side: low power (and low performance)

  • AMBA 3.0
    • Introduces AXI high performance protocol
      • Support for separate read address, write address, read data, write data, and write response channels
      • Out of order (OO) transaction completion
      • Fixed mode burst support
        • Useful for I/O peripherals
      • Advanced system cache support
        • Specify if transaction is cacheable/bufferable
        • Specify attributes such as write-back/write-through
      • Enhanced protection support
        • Secure/non-secure transaction specification
      • Exclusive access (for semaphore operations)
      • Register slice support for high frequency operation

  • AHB vs. AXI Burst
    • AHB Burst
      • Address and data are locked together (single pipeline stage)
      • HREADY controls intervals of address and data
    • AXI Burst
      • One address for entire burst

  • AHB vs. AXI Burst
    • AXI Burst
      • Simultaneous read, write transactions
      • Better bus utilization

  • AXI Out of Order Completion
    • With AHB
      • If one slave is very slow, all data is held up
      • SPLIT transactions provide very limited improvement
    • With AXI Burst
      • Multiple outstanding addresses, out of order (OO) completion allowed
      • Fast slaves may return data ahead of slow slaves
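
    The effect of out of order completion can be illustrated with a toy timing model. It is not a model of either protocol: the function names are hypothetical, the "AHB-like" case assumes each request must finish before the next begins, and the "AXI-like" case assumes one address is issued per cycle and each slave responds after its own latency.

```python
def ahb_like(latencies):
    """Completion cycle of each request when requests are fully serialized."""
    t, done = 0, []
    for lat in latencies:
        t += lat
        done.append(t)
    return done


def axi_like(latencies):
    """Completion cycle of each request with multiple outstanding addresses:
    request i is issued at cycle i and completes lat cycles later."""
    return [i + lat for i, lat in enumerate(latencies)]


def completion_order(done):
    """Request indices sorted by completion time."""
    return sorted(range(len(done)), key=lambda i: done[i])


if __name__ == "__main__":
    lat = [10, 2, 3]  # request 0 targets a slow slave
    print(ahb_like(lat), axi_like(lat), completion_order(axi_like(lat)))
```

    In the serialized model the slow first request delays every later one; with outstanding addresses the two fast requests finish first.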

  • Register Slices for Max Frequency
    • Register slices can be applied across any channel
    • Allows maximum frequency of operation by matching channel latency to channel delay
    • Allows system topology to be matched to performance requirements
    • Figure labels (write data channel signals): WID, WDATA, WSTRB, WLAST, WVALID, WREADY

  • Summary: AHB vs. AXI


  • IBM CoreConnect
    • PLB
      • Pipelined
      • Burst modes
      • Split transactions
      • Multiple masters
    • OPB
      • Low bandwidth
      • Burst mode
      • Multiple masters
    • DCR
      • Low throughput
      • 1 r/w = 2 cycles
      • Ring type data bus

  • Processor Local Bus (PLB)
    • High performance synchronous bus
    • Shared address, separate read and write data buses
    • Support for 32-bit address, 16, 32, 64, and 128-bit data bus widths
    • Dynamic bus sizing
      • byte, half-word, word, and double-word transfers
    • Up to 16 masters and any number of slaves
    • AND-OR implementation structure
    • Variable or fixed length (16-64 byte) burst transfers
    • Pipelined transfers
    • SPLIT transfer support
    • Overlapped read and write transfers (up to 2 transfers per cycle)
    • Centralized arbiter
    • Locked transfer support for atomic accesses

  • PLB Transfer Phases
    • Address and data phases are decoupled

  • Overlapped PLB Transfers
    • PLB allows address and data buses to have different masters at the same time

  • PLB Arbiter
    • Bus Control Unit
      • each master drives a 2-bit signal that encodes 4 priority levels
      • in case of a tie, arbiter uses static or round-robin (RR) scheme
    • Timer
      • pre-empts long burst masters
      • ensures high priority requests are served with low latency
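
    The grant decision of the bus control unit can be sketched as follows. The function and parameter names are hypothetical, and two points are assumptions: that a larger 2-bit value means higher priority, and that the static tie-break favors the lowest master id. The timer-based pre-emption is not modeled.

```python
def plb_arbitrate(requests, last_granted=-1, num_masters=16, tie_break="rr"):
    """Pick a master from `requests`, a dict of master id -> priority (0..3).

    The highest priority wins; ties are resolved either by a fixed order
    ("static") or round-robin relative to the last granted master ("rr").
    """
    if not requests:
        return None
    top = max(requests.values())
    tied = sorted(m for m, p in requests.items() if p == top)
    if tie_break == "static":
        return tied[0]
    # Round-robin: the first tied master after last_granted, cyclically.
    return min(tied, key=lambda m: (m - last_granted - 1) % num_masters)


if __name__ == "__main__":
    reqs = {0: 1, 2: 3, 5: 3}
    print(plb_arbitrate(reqs, last_granted=2))
    print(plb_arbitrate(reqs, tie_break="static"))
```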

  • On-chip Peripheral Bus (OPB)
    • Synchronous bus to connect low performance peripherals and reduce capacitive loading on PLB
    • Shared address bus, multiple data buses
    • Up to a 64-bit address bus width
    • 32- or 64-bit read, write data bus width support
    • Support for multiple masters
    • Bus parking (or locking) for reduced transfer latency
    • Sequential address transfers (burst mode)
    • Dynamic bus sizing
      • byte, half-word, word, double-word transfers
    • MUX-based (or AND-OR) structural implementation
    • Single cycle data transfer between OPB masters and slaves
    • Timeout capability to guarantee low latency for high priority transfers

  • Device Control Register (DCR) Bus
    • Low speed synchronous bus, used for on-chip device configuration purposes
      • meant to off-load the PLB from lower performance status and control read and write transfers
    • 10-bit, up to 32-bit address bus
    • 32-bit read and write data buses
    • 4-cycle minimum read or write transfers
    • Slave bus timeout inhibit capability
    • Multi-master arbitration
    • Privileged and non-privileged transfers
    • Daisy-chain (serial) or distributed-OR (parallel) bus topologies


  • Sonics Smart Interconnect
    • Consists of 3 synchronous bus-based interconnect specifications
      • SonicsMX
        • high performance interconnect fabric
      • SonicsLX
        • high performance interconnect fabric, but with less advanced features
      • Synapse 3220
        • peripheral interconnect designed to connect slower peripheral components

  • SonicsMX
    • High performance synchronous bus fabric
    • Pipelined, non-blocking, multi-threaded communication support
    • Split/outstanding transactions for high performance
    • Configurable data bus width: 32, 64, or 128 bits
    • Socket-based connection support, using native OCP 2.0 interface
    • Bandwidth and latency-based arbitration schemes to obtain desired quality of service (QoS) for threads
    • Register points (RPs) for pipelining long interconnects and providing timing isolation
    • Protection mode support
    • Advanced error handling support
    • Fine-grained power management support

  • SonicsMX Topology
    • SonicsMX supports full crossbar, partial crossbar, and shared bus topologies

  • SonicsMX Arbitration
    • Weighted QoS
      • available bandwidth distributed among masters based on ratio of bandwidth weights configured for each master
    • Priority QoS
      • extends bandwidth-based scheme above
      • 1-2 threads are assigned a static priority (guaranteed service)
      • Other threads assigned bandwidth weights (best effort)
    • Controlled QoS
      • dynamically switches between three arbitration schemes based on traffic characteristics
        • Static priority (guaranteed service)
        • Bandwidth weighted scheme (best effort)
        • Guaranteed bandwidth allocation (guaranteed service)
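
    The weighted QoS rule, bandwidth split in proportion to configured weights, reduces to a one-line ratio. The sketch below is only that arithmetic; the function name, master names, weights, and bandwidth figure are made up for illustration and say nothing about how SonicsMX implements the scheme in hardware.

```python
def weighted_shares(weights, total_bandwidth):
    """Split total_bandwidth among masters in proportion to their weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one positive weight is required")
    return {m: total_bandwidth * w / total_weight for m, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical masters and weights; bandwidth in arbitrary units.
    print(weighted_shares({"cpu": 4, "dma": 2, "video": 2}, 800))
```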

  • SonicsLX
    • High performance synchronous bus fabric
      • subset of SonicsMX feature set
      • pipelined, multithreaded, non-blocking communication support
      • weighted and priority QoS modes
      • SPLIT transactions

  • Synapse 3220
    • Synchronous bus targeted at low bandwidth, physically dispersed peripheral slave cores

  • Synapse 3220 Features
    • Up to 4 masters and 63 slaves
    • Up to 24-bit configurable address bus
    • Configurable data bus widths
      • 8, 16, 32 bits
    • Fair arbitration scheme, with high priority allowed for a single initiator thread
    • Power management interface
    • Exclusive (semaphore) access support
    • Error detection and recovery
      • watchdog timer to identify unresponsive peripherals
    • Protection mode support


  • STBus
    • Consists of 3 synchronous bus-based interconnect specifications
      • Type 1
        • Simplest protocol, meant for peripheral access
      • Type 2
        • More complex protocol
        • Pipelined, SPLIT transactions
      • Type 3
        • Most advanced protocol
        • OO transactions, transaction labeling/hints

  • Type 1
    • Simple handshake mechanism
    • 32-bit address bus
    • Data bus sizes of 8, 16, 32, 64 bits
    • Similar to IBM CoreConnect DCR bus

  • Type 2
    • Supports all Type 1 functionality
    • Pipelined transfers
    • SPLIT transactions
    • Data bus sizes up to 256 bits
    • Compound operations
      • READMODWRITE
        • Returns read data and locks slave till same master writes to location
      • SWAP
        • Exchanges data value between master and slave
      • FLUSH/PURGE
        • Ensure coherence between local and main memory
      • USER
        • Reserved for user defined operations

  • Type 3
    • Supports all Type 2 functionality
    • OO transaction completion
    • Requires only a single response/ACK for multiple data transfers (burst mode)

  • STBus
    • All types have
      • MUX-based implementation
      • Shared, partial or full crossbar implementation

  • STBus Arbitration
    • Static priority
      • Non-preemptive
    • Programmable priority
    • Latency based
      • Each master has a register with max. allowed latency (in clock cycles)
        • If value is 0, master must be granted bus access as soon as it requests it
      • Each master also has a counter loaded with max. latency value when master makes a request
      • Master counters are decremented at every subsequent cycle
      • Arbiter grants access to master with lowest counter value
      • In case of a tie, static priority is used
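
    The latency-based scheme can be sketched as a small behavioral model. The class and method names are hypothetical, one grant per cycle is assumed, counters are assumed to saturate at 0, and the static priority used for ties is assumed to follow the order in which masters are listed at construction.

```python
class LatencyArbiter:
    """Behavioral sketch of latency-based arbitration (assumptions above)."""

    def __init__(self, max_latency):
        self.max_latency = dict(max_latency)   # master -> max latency (cycles)
        self.static_order = list(max_latency)  # earlier = higher static priority
        self.counters = {}                     # pending master -> countdown

    def request(self, master):
        # The counter is loaded with the max latency when the request is made.
        self.counters.setdefault(master, self.max_latency[master])

    def tick(self):
        """Grant the pending master with the lowest counter, then age the rest."""
        if not self.counters:
            return None
        winner = min(
            self.counters,
            key=lambda m: (self.counters[m], self.static_order.index(m)),
        )
        del self.counters[winner]
        for m in self.counters:
            self.counters[m] = max(0, self.counters[m] - 1)
        return winner


if __name__ == "__main__":
    arb = LatencyArbiter({"cpu": 0, "dma": 4, "lcd": 2})
    arb.request("dma")
    arb.request("lcd")
    print(arb.tick())
    arb.request("cpu")
    print(arb.tick(), arb.tick())
```

    A master configured with max. latency 0 always has the lowest possible counter, so it wins as soon as it requests, matching the rule on the slide.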

  • STBus Arbitration
    • Bandwidth based
      • Similar to TDMA/RR scheme
    • STB
      • Hybrid of latency based and programmable priority schemes
      • In normal mode, programmable priority scheme is used
      • Masters have max. latency registers and counters (latency based scheme)
      • Each master also has an additional latency-counter-enable bit
      • If this bit is set, and counter value is 0, master is in panic state
      • If one or more masters are in panic state, programmable priority scheme is overridden, and panic state masters are granted access
    • Message based
      • Pre-emptive static priority scheme
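
    The STB hybrid rule can be sketched as a single grant function. The function name and dictionary keys are hypothetical. Two details are assumptions because the slide does not state them: a larger number means higher programmable priority, and when several masters are in panic state the programmable priority chooses among them.

```python
def stb_grant(masters):
    """Choose which requesting master is granted under the hybrid scheme.

    Each master is a dict with keys: name, priority, counter,
    latency_enable, requesting.
    """
    requesting = [m for m in masters if m["requesting"]]
    if not requesting:
        return None
    # Panic state: latency-counter-enable bit set and counter at 0.
    panic = [m for m in requesting
             if m["latency_enable"] and m["counter"] == 0]
    pool = panic or requesting  # panic masters override normal priority
    return max(pool, key=lambda m: m["priority"])["name"]


if __name__ == "__main__":
    cpu = {"name": "cpu", "priority": 3, "counter": 5,
           "latency_enable": True, "requesting": True}
    dma = {"name": "dma", "priority": 1, "counter": 0,
           "latency_enable": True, "requesting": True}
    print(stb_grant([cpu, dma]))
```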

  • Socket-based Interface Standards
    • Define the interface of components
    • Do not define the bus architecture implementation
    • Shield the IP designer from knowledge of the interconnection system, and enable the same IP to be ported across different systems
    • Require adaptor components to interface with the implementation

  • Socket-based Interface Standards
    • Must be generic, comprehensive, and configurable
      • to capture basic functionality and advanced features of a wide array of bus architecture implementations
    • Adaptor (or translational) logic component
      • Must be created only once for each implementation (e.g. AMBA)
      • − adds area, performance penalties, more design time
      • + enhances reuse, speeds up design time across many designs
    • Commonly used socket-based interface standards
      • Open Core Protocol (OCP) ver. 2.0
        • Most popular
        • used in Sonics Smart Interconnect
      • VSIA Virtual Component Interface (VCI)
        • Subset of OCP
      • DTL
        • Proprietary

  • OCP 2.0
    • Point-to-point synchronous interface
    • Bus architecture independent
    • Configurable data flow (address, data, control) signals for area-efficient implementation
    • Configurable sideband signals to support additional communication requirements
    • Pipelined transfer support
    • Burst transfer support
    • OO transaction completion support
    • Multiple threads

  • OCP 2.0 Signals
    • Dataflow
      • Basic signals
      • Simple extensions
        • e.g. byte enables, data byte parity, error correction codes, etc.
      • Burst extensions
        • e.g. length, type (WRAP/INCR), pack/unpack, ACK requirements, etc.
      • Tag extensions
        • Assign IDs to transactions for reordering support
      • Thread extensions
        • Assign IDs to threads for multi-threading support
    • Sideband (optional)
      • Not part of the dataflow process
      • Convey control and status information such as reset, interrupt, error, and core-specific flags
    • Test (optional)
      • Add support for scan, clock control, and IEEE 1149.1 (JTAG)

  • OCP 2.0 Protocol Hierarchy
    • Data flow signals are combined into groups of request signals, response signals, and data handshake signals
    • Groups map one-to-one to their corresponding protocol phases (request, response, handshaking)
    • Different combinations of protocol phases are used by different types of transfers (e.g. single request/multiple data burst)
    • Burst transactions comprise a set of transfers linked together, having a defined address sequence and no. of transfers

  • OCP 2.0 Profiles
    • OCP 2.0 specifies pre-defined configurations of the interface called profiles
      • consist of OCP interface signals, specific protocol features, and application guidelines
    • Two sets of profiles are provided
      • Profiles for new IP cores implementing native OCP interfaces
        • Block data flow
        • Sequential undefined length data flow (streaming access)
        • Register access
      • Profiles for designers of bridges between OCP & other bus protocols
        • Simple H-bus
        • X-bus packet write
        • X-bus packet read

  • Example: SoC with Mixed Profiles

  • Summary
    • Standards are important for seamless integration of SoC IPs
      • avoid costly integration mismatches
    • Two categories of standards for SoC communication:
      • Standard bus architectures
        • define interface between IPs and bus architecture
        • define (at least some) specifics of bus architecture that implements data transfer protocol
        • e.g. AMBA 2.0/3.0, CoreConnect, Sonics Smart Interconnect, STBus
      • Socket-based bus interface standards
        • define interface between IPs and bus architecture
        • do not define bus architecture implementation specifics
        • e.g. OCP 2.0
    • Open issue: robust standards for DSM-aware communication