Lizheng Zhang, Yu Hen Hu, Charlie Chen
Wave Pipelined Global Interconnect
Outline
• Background and Motivation
• Wave-pipelined global interconnect
• Design challenges in a wave-pipelined global interconnect
• Performance Evaluation
• Conclusion
Global Interconnect Micro-Architecture
• Conventional Interconnect Design
  - Single clock cycle delay
  - Buffer insertion
• High-speed Interconnect Design (GHz range)
  - Flip-flop/latch insertion
  - Wave pipelining
Motivation
• Promises of Wave-Pipelined Global Interconnect
  - High speed
    - Flip-flops add setup-time and CLK-to-Q delay overhead
  - Low power consumption
    - No flip-flops in the middle of the interconnect
  - Minimum error rate
    - Cascaded flip-flop stages are sensitive to noise
  - No need for a globally synchronized clock
    - With a phase-lock-loop-based receiver
Wave-Pipelined Computation
• Data takes multiple clock cycles to propagate from FFI to FFO
• Each clock cycle, FFI issues a set of data and FFO captures a set of results
• Multiple sets of data propagate through the logic block simultaneously
• The timing constraints are very stringent
  - Consecutive data sets must not overlap
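As a rough illustration of the points above, the number of data sets travelling between FFI and FFO at once can be estimated from the path delay and the clock period. The function name and the example numbers are assumptions made for illustration; they do not come from the slides.

```python
import math


def waves_in_flight(path_delay_ps: float, clock_period_ps: float) -> int:
    """Estimate how many data sets occupy the path between FFI and FFO.

    A new data set is issued every clock period, so a path whose delay
    spans several periods holds several data sets at the same time.
    """
    if path_delay_ps <= 0 or clock_period_ps <= 0:
        raise ValueError("delays must be positive")
    return math.ceil(path_delay_ps / clock_period_ps)


# Hypothetical example: an 1800 ps path clocked at 2 GHz (500 ps period).
print(waves_in_flight(1800, 500))  # 4
```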
Clocking Output
• Multiple data paths from input to output
• Logic depth becomes larger
  - Difference between Tmax and Tmin becomes larger
• Limits the number of logic levels that can be wave pipelined
  - Shaded areas overlap
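The overlap condition can be sketched as a simple check: consecutive data sets stay separated at the output only if the clock period exceeds the spread between the slowest and fastest paths. This is a simplified form of the usual wave-pipelining constraint. The `margin_ps` parameter (standing in for setup/hold and clock uncertainty) and all numbers are assumptions, not values from the slides.

```python
def min_wave_period(t_max_ps: float, t_min_ps: float, margin_ps: float = 0.0) -> float:
    """Clock period that must be exceeded for consecutive waves not to overlap."""
    if t_min_ps > t_max_ps:
        raise ValueError("t_min_ps must not exceed t_max_ps")
    return (t_max_ps - t_min_ps) + margin_ps


def waves_overlap(t_max_ps: float, t_min_ps: float,
                  clock_period_ps: float, margin_ps: float = 0.0) -> bool:
    """True if the clock period is too short for the given delay spread."""
    return clock_period_ps <= min_wave_period(t_max_ps, t_min_ps, margin_ps)


# Hypothetical numbers: a deeper logic block widens the Tmax - Tmin spread.
print(waves_overlap(900, 500, 500))   # False: spread 400 ps < 500 ps period
print(waves_overlap(1100, 500, 500))  # True: spread 600 ps > 500 ps period
```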
Internal Nodes
• Multiple-input gates
• All inputs of any gate have to arrive at the same time
  - Very difficult to achieve for large logic blocks
Wave-Pipelined Interconnect
• Only uniform buffer insertion
  - No multiple signal propagation paths
  - No multiple-input gates
• Except for noise, no intrinsic timing constraints
Challenges for Wave Pipelined Interconnect
• Interconnect delay uncertainty caused by noise:
  - Process variation
  - Thermal noise, supply voltage fluctuation, coupling noise
• Large number of wave-pipelined logic levels
  - Global interconnect: long wire
• Synchronization is needed in the receiver
  - Delay uncertainty accumulates
  - Smaller clock cycle to get high throughput
• Dynamically change the phase of the receiver clock
  - Phase lock loop
• Challenge for on-chip global interconnect
  - Fully digital, to be easily integrated
  - High speed, low power and area overhead
Phase Lock Loop
• Din is sampled by CLKS at the rising edge
• The falling edge of CLKS is dynamically aligned with transitions in Din
• CLKS will be significantly skewed/jittered relative to the expected receiving clock CLKR
  - A FIFO re-timer is used to transfer data between clock domains
Alexander Phase Detector
Challenges
• High speed makes it difficult for FF1 to register the output of FFN
• Intentional clock skew
  - FFN and FF2 use the same clock
  - FF0 and FF1 use a delayed clock
  - S2 is also delayed before it enters the XOR
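For orientation, the textbook decision rule of an Alexander (bang-bang) phase detector can be sketched behaviorally. This is not the slide's circuit: the FF0/FF1/FF2/FFN wiring and the intentional clock skew are not modeled, and the names `prev_bit`, `edge_sample`, and `curr_bit` are assumptions. How "early" and "late" map to the Up and Down signals is a design convention the slides do not state.

```python
def alexander_pd(prev_bit: int, edge_sample: int, curr_bit: int) -> str:
    """Classify one bit interval as 'early', 'late', or 'hold'.

    prev_bit and curr_bit are two consecutive data samples; edge_sample is
    taken between them, at the instant where a data transition is expected.
    """
    if prev_bit == curr_bit:
        return "hold"   # no data transition, so no phase information
    if edge_sample == prev_bit:
        return "early"  # edge sample was taken before the data transition
    return "late"       # edge sample was taken after the data transition


print(alexander_pd(0, 0, 1))  # early
print(alexander_pd(0, 1, 1))  # late
print(alexander_pd(1, 1, 1))  # hold
```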
Loop Filter
• The next phase adjustment cannot happen before the current one takes effect
  - Guarantees feedback loop stability
• Digital counter
  - Shift register: high speed
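One way to picture such a filter is a token moving along a shift register: votes from the phase detector shift it up or down, a phase step is issued when it reaches either end, and further votes are ignored for a hold-off period so the previous step can take effect. This is a behavioral sketch only; the class name, `depth`, and `holdoff` are hypothetical parameters, not the design on the slide.

```python
class ShiftRegisterLoopFilter:
    """Behavioral model of a vote-accumulating loop filter with hold-off."""

    def __init__(self, depth: int = 2, holdoff: int = 2) -> None:
        self.depth = depth      # votes needed in one direction (assumed)
        self.holdoff = holdoff  # cycles to wait after a phase step (assumed)
        self.pos = 0            # token position relative to the centre
        self.wait = 0           # remaining hold-off cycles

    def step(self, vote: int) -> int:
        """Feed one vote (+1, -1, or 0); return +1/-1 when a phase step is issued."""
        if vote not in (-1, 0, 1):
            raise ValueError("vote must be -1, 0, or +1")
        if self.wait > 0:
            self.wait -= 1      # previous adjustment has not taken effect yet
            return 0
        self.pos += vote
        if abs(self.pos) >= self.depth:
            out = 1 if self.pos > 0 else -1
            self.pos = 0
            self.wait = self.holdoff
            return out
        return 0


lf = ShiftRegisterLoopFilter(depth=2, holdoff=2)
print([lf.step(1) for _ in range(6)])  # [0, 1, 0, 0, 0, 1]
```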
Four-Phase VCO
• Four phase steps for CLKS
  - Reduces power consumption
  - Generated from a digital phase generator (PD)
• The four counter states (C1C0) are Gray coded
  - Eliminates glitches when multiplexing between neighboring clock phases
  - The digital counter is also implemented as a shift register: high speed
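The benefit of Gray coding the counter states can be checked with a short sketch: stepping between neighboring phases changes exactly one of the two select bits, whereas a plain binary count changes both bits on the 01 to 10 step. The helper names are illustrative assumptions.

```python
def gray(n: int) -> int:
    """Convert a binary count to its reflected Gray code."""
    return n ^ (n >> 1)


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Gray-coded states (C1C0) for the four clock phases.
states = [format(gray(i), "02b") for i in range(4)]
print(states)  # ['00', '01', '11', '10']

# Every step to the neighboring phase, including the wrap-around,
# flips exactly one select bit.
print(all(bits_changed(gray(i), gray((i + 1) % 4)) == 1 for i in range(4)))  # True

# A plain binary counter flips two bits when going from 01 to 10.
print(bits_changed(0b01, 0b10))  # 2
```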
Digital Phase Generator (PD)
FIFO Re-Timer
• A cyclic FIFO queue buffers the phase difference between the sampling clock CLKS and the receiving clock CLKR
• 4 entries are sufficient
  - Maximum phase difference between CLKS and CLKR is 180°
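A cyclic FIFO re-timer can be modeled as a ring buffer with an enqueue pointer advanced in the CLKS domain and a dequeue pointer advanced in the CLKR domain. The sketch below is behavioral only. The two-entry lag between the pointers and the `retime` helper are assumptions chosen for illustration, and overflow/underflow detection is deliberately left out.

```python
class CyclicFifo:
    """Ring buffer with independent enqueue and dequeue pointers."""

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
        self.slots = [None] * depth
        self.enq = 0  # enqueue pointer, advanced by the sampling clock CLKS
        self.deq = 0  # dequeue pointer, advanced by the receiving clock CLKR

    def enqueue(self, bit: int) -> None:
        self.slots[self.enq] = bit
        self.enq = (self.enq + 1) % self.depth

    def dequeue(self) -> int:
        bit = self.slots[self.deq]
        self.deq = (self.deq + 1) % self.depth
        return bit


def retime(bits, depth: int = 4, lag: int = 2):
    """Pass bits through the FIFO with the read pointer trailing by `lag` entries."""
    fifo = CyclicFifo(depth)
    out = []
    for i, b in enumerate(bits):
        fifo.enqueue(b)
        if i >= lag:
            out.append(fifo.dequeue())
    for _ in range(min(lag, len(bits))):  # drain the entries still queued
        out.append(fifo.dequeue())
    return out


data = [1, 0, 1, 1, 0, 0, 1, 0]
print(retime(data) == data)  # True: order is preserved across the clock domains
```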
PLL Layout Picture
[Figure: PLL layout with labeled blocks VCO, PG, PD, LF, MUX, and Counter. 0.18 µm technology, Cadence tools.]
FIFO Layout
[Figure: FIFO layout with labeled blocks FIFO Queue, Dequeue Pointer, and Enqueue Pointer. 0.18 µm technology, Cadence tools.]
Simulation Waveforms
[Figure: simulated waveforms for Up, Down, Dout, and Din. 0.18 µm technology, Cadence tools.]
Performance Analysis
• When will a wave-pipelined interconnect system generate a bit error?
  - Signals propagating through a buffered wire are subject to noise
    - Waveform distortion
  - The receiver can only tolerate waveform distortion within some limit
    - Maximum tolerable waveform distortion
Waveform Distortion
• The received waveform can have a larger or smaller pulse width than the transmitted waveform
• Distortion Rate = $\dfrac{W_{d,\mathrm{out}} - W_{d,\mathrm{in}}}{W_{d,\mathrm{in}}}$
[Figure: Data Transmitted and Data Received waveforms]
Noise Tolerance
• The current data transition is located within the Lock Region
• The next data transition has to fall within the Sample Region for the current data to be registered correctly
• Maximum tolerable waveform distortion is 25%:
  - 0.75 CLK < Wd < 1.25 CLK (CLK denotes the clock period)
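The tolerance window can be expressed as a small check. The sketch assumes the distortion rate is the relative change in pulse width between the transmitted and received waveforms, and that the transmitted pulse is one clock period wide. The function names and the example widths are illustrative, not taken from the slides.

```python
def distortion_rate(w_in_ps: float, w_out_ps: float) -> float:
    """Relative change of the received pulse width versus the transmitted one."""
    if w_in_ps <= 0:
        raise ValueError("transmitted pulse width must be positive")
    return (w_out_ps - w_in_ps) / w_in_ps


def is_tolerable(w_out_ps: float, clock_period_ps: float, limit: float = 0.25) -> bool:
    """True if the received width lies strictly inside (1 - limit, 1 + limit) periods."""
    low = (1 - limit) * clock_period_ps
    high = (1 + limit) * clock_period_ps
    return low < w_out_ps < high


# Hypothetical pulses on a 2 GHz link (500 ps clock period).
print(round(distortion_rate(500, 560), 2))  # 0.12
print(is_tolerable(560, 500))               # True
print(is_tolerable(640, 500))               # False: 28% wider than nominal
```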
Monte Carlo Simulation (1)
• Probability of waveform distortion larger than 25%
  - Once per 5 days
• Bit error rate
  - Once per 5 days
[Figure: Waveform distortion of wave-pipelined wire (1.4 mm/segment, 2 GHz)]
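To put "once per 5 days" on the usual per-bit scale, the figure can be converted under the assumption that one bit is transferred per clock cycle at 2 GHz. That assumption and the helper name are mine; the slides give only the per-5-days figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bit_error_rate(clock_hz: float, days_per_error: float) -> float:
    """Errors per transmitted bit, assuming one bit per clock cycle."""
    bits_sent = clock_hz * days_per_error * SECONDS_PER_DAY
    return 1.0 / bits_sent


ber = bit_error_rate(2e9, 5)
print(f"{ber:.2e}")  # 1.16e-15
```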
Flip Flop Pipelining
• One flip-flop pipelined stage has the same wire length as one wave-pipelined wire segment
• Clock cycle: TCLK > Tstage + ΔCLK
  - Tstage = Tprop + Tdrive + Twire + Tacpt + Tsetup
  - ΔCLK is the uncontrollable clock skew and jitter
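The clock-period bound for a flip-flop pipelined stage can be written out directly from the terms listed above. All delay values in the example are hypothetical placeholders chosen only to show the arithmetic; they are not measurements from this work.

```python
def stage_delay_ps(t_prop: float, t_drive: float, t_wire: float,
                   t_acpt: float, t_setup: float) -> float:
    """Tstage = Tprop + Tdrive + Twire + Tacpt + Tsetup (all in ps)."""
    return t_prop + t_drive + t_wire + t_acpt + t_setup


def min_clock_period_ps(t_stage: float, skew_jitter: float) -> float:
    """Bound that the clock period must exceed: stage delay plus skew and jitter."""
    return t_stage + skew_jitter


def max_clock_ghz(period_ps: float) -> float:
    """Clock frequency in GHz for a period given in ps."""
    return 1000.0 / period_ps


# Hypothetical component delays, for illustration only.
t_stage = stage_delay_ps(t_prop=60, t_drive=80, t_wire=120, t_acpt=40, t_setup=50)
bound = min_clock_period_ps(t_stage, skew_jitter=50)
print(t_stage, bound, max_clock_ghz(bound))  # 350 400 2.5
```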
Monte Carlo Simulation (2)
[Figure: two histograms of DFF pipelining stage delay, x-axis Delay [ps]. At 77 °C: μ = 319.6 ps, σ = 26.4 ps. At 27 °C: μ = 298.3 ps, σ = 24.1 ps.]
To achieve the same bit error rate (1 bit error per 5 days):
• Maximum clock rate of DFF pipelining: 1.7 GHz
• Wave pipelining works with a 2.0 GHz clock
  - 17% speed-up
DFF Pipelining Stage Delay
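The speed-up follows from the two clock rates quoted on this slide; a short check of the arithmetic, with helper names chosen for illustration:

```python
def period_ps(freq_ghz: float) -> float:
    """Clock period in ps for a frequency in GHz."""
    return 1000.0 / freq_ghz


def speedup_percent(f_new_ghz: float, f_old_ghz: float) -> float:
    """Relative clock-rate gain of f_new over f_old, in percent."""
    return (f_new_ghz / f_old_ghz - 1.0) * 100.0


print(round(period_ps(1.7), 1))             # 588.2 (DFF pipelining)
print(round(period_ps(2.0), 1))             # 500.0 (wave pipelining)
print(round(speedup_percent(2.0, 1.7), 1))  # 17.6
```

The quoted 17% is this ratio rounded down.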
Power and Area Comparison
• Short interconnect
  - DFF pipelining is more power/area efficient
• Long interconnect
  - Wave pipelining is more power/area efficient
[Figure panels: Layout Area; Power Consumption]
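One plausible reading of this crossover is that DFF pipelining pays an overhead per stage, while wave pipelining pays a fixed overhead for its PLL and FIFO receiver. The toy model below makes that assumption explicit. The cost units and numbers are hypothetical, and the buffered wire itself is assumed to cost the same in both schemes.

```python
import math


def dff_overhead(n_segments: int, cost_per_ff_stage: float) -> float:
    """Overhead that grows with length: one flip-flop stage per wire segment."""
    return n_segments * cost_per_ff_stage


def crossover_segments(cost_per_ff_stage: float, receiver_cost: float) -> int:
    """Smallest segment count at which the fixed receiver cost is the lower one."""
    return math.floor(receiver_cost / cost_per_ff_stage) + 1


# Hypothetical costs in arbitrary units: each flip-flop stage costs 1,
# the wave-pipelining receiver (PLL plus FIFO) costs 6.
n = crossover_segments(cost_per_ff_stage=1.0, receiver_cost=6.0)
print(n)                                                   # 7
print(dff_overhead(n - 1, 1.0) > 6.0, dff_overhead(n, 1.0) > 6.0)  # False True
```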
Conclusion
• Short interconnect
  - DFF pipelining is more power efficient
• Long interconnect
  - Wave pipelining is more power efficient
• Same reliability requirement
  - Wave pipelining has higher performance
• No globally synchronous clock is needed