Synthesis Challenges for Next-Generation High-Performance and
High-Density PLDs
Jason Cong, Department of Computer Science
University of California, Los Angeles, USA
Songjie Xu, Aplus Design Technologies, Inc.
Los Angeles, USA
Slide 2
Outline
• Introduction
• Synthesis Challenges for New Architectures
• Synthesis Challenges for High Density and High Performance
• Concluding Remarks
Slide 3
PLD Industry Growth
• Enjoyed the same exponential growth as the rest of the semiconductor industry
• With an even faster rate
Introduction
[Figure: bar chart of annual growth rate (1994-1998) by company/industry: Semiconductor Industry, Altera, Intel, LSI Logic. Data labels as extracted: 27.78%, 36.07%, 24.50%, 15.71%.]
Slide 4
Definitions
• PLD (Programmable Logic Device)
  - CPLD (Complex PLD)
    - Extensions of early PAL
    - Consist of PLA-like blocks
    - Macrocell
  - FPGA (Field Programmable Gate Array)
    - Typically based on look-up tables (LUTs)
    - Multiple LUTs form a programmable logic block (PLB)
Slide 5
CPLD
• Example: Altera MAX 7000
Slide 6
Macrocell
• Example: Altera MAX 7000
  - Each macrocell has a logic array, a product-term select matrix, and a programmable register
Slide 7
Definitions
• PLD (Programmable Logic Device)
  - CPLD (Complex PLD)
    - Extensions of early PAL
    - Consist of PLA-like blocks
    - Macrocell
  - FPGA (Field Programmable Gate Array)
    - Typically based on look-up tables (LUTs)
    - Multiple LUTs form a programmable logic block (PLB)
Slide 8
FPGA
• Example: Xilinx XC4000
Slide 9
PLB
• Example: Xilinx XC4000
  - Each PLB has two 4-LUTs, one 3-LUT, and two FFs
Slide 10
Advance of PLD Architectures
• 1980’s
  - Altera MAX 5000: 32-192 P-terms; 600-3,750 usable gates
  - Xilinx XC2000: 64-100 LUTs; 1,200-1,800 logic gates
• 1998/1999
  - Altera APEX 20K: 51,840 logic elements (LUTs); 442,368 RAM bits; 3,456 P-term macrocells; 60,000-1.5M usable gates
  - Xilinx Virtex: 58K-4M system gates; 1Mb distributed RAM; 832Kb embedded memory
Slide 11
PLD Synthesis Tends to Fall Behind ...
• Additional features and capabilities in a new architecture often place new requirements on synthesis tools
• Higher density and higher performance demand better scalability and more efficient optimization
• The devil is always in the software …
  - Tool effort is often underestimated
  - Quick customization from an ASIC or existing PLD synthesis tool leads to considerably inferior results
  - Software is often the bottleneck of a new PLD product release ...
Slide 12
Challenges to PLD Synthesis
• Support for new PLD architectures
  - Hierarchical architectures
  - Heterogeneous architectures
• Support for high-performance and high-density PLD designs
  - Layout-driven synthesis
  - Incremental synthesis
  - IP-based synthesis
Slide 13
Outline
• Introduction
• Synthesis Challenges for New Architectures
• Synthesis Challenges for High Density and High Performance
• Concluding Remarks
Slide 14
PLD Architecture Development
• Two important trends
  - Hierarchical architectures
  - Heterogeneous architectures
• Synthesis needs
Synthesis Challenges for New Architectures
Slide 15
PLD Architecture Development Trend: Hierarchical Architectures
• Basic idea
  - Group basic logic blocks into clusters
  - Fast local programmable interconnects inside clusters
  - May have multiple levels of hierarchy
• Benefits
  - Exploit the inherent locality of interconnections in most applications
  - Lead to improvement in both performance and density
Slide 16
Example Hierarchical Architectures
• Altera FLEX 10K
  - Each LAB has 8 LEs
  - Each LE has a 4-LUT and a programmable register
Slide 17
Two Types of Clusters
• Hard-wired connection based cluster (HCC)
  - Intra-cluster connection is formed by hard wires
  - e.g. CLB in XC4000
• Programmable interconnection based cluster (PIC)
  - Intra-cluster connection is formed by a local programmable interconnection array
  - e.g. LAB in FLEX 10K and APEX 20K
Slide 18
Existing Synthesis Results for HCC
• Traditional approach
  - Map into LUTs and then combine the LUTs to form HCCs in a heuristic post-processing step
• Recent advance [Cong & Hwang, FPGA’97]
  - Use Boolean matching techniques to completely characterize the set of functions that can be implemented in an HCC
  - Map a netlist directly into HCCs
Slide 19
Hard-Wired Connection Based Clusters (HCCs)
• Example: Xilinx XC4000 CLB
  - Each CLB has two 4-LUTs connected to a 3-LUT
Slide 20
Example: Boolean Matching for XC4K CLB
• Characterization based on functional decomposition
  - f(X) = H(F(X1), G(X2)),
  - f(X) = H(F(X1), G(X2), x),
  - f(X) = H(F(X1, x), G(X2), x),
  - f(X) = H(F(X1, x), G(X2, x), x).
• Conditions
  - F and G input sizes ≤ 4
• Result: matched all “difficult examples” (over 1,700) from Xilinx
  - Best known tool produced only about 70% match
[Figure: XC4K CLB in which LUTs F and G, together with input x, feed LUT H to produce f(X)]
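The decomposition form f(X) = H(F(X1), G(X2), x) can be made concrete with a small truth-table model. The sketch below is illustrative only and is not the Boolean matching algorithm of [Cong & Hwang, FPGA’97]; the names LUT, clb_eval, and and9 are hypothetical. It shows that a 9-input AND fits one CLB-shaped structure when F and G each take 4 inputs.

```python
from itertools import product


class LUT:
    """A K-input look-up table stored as a truth table with 2**K entries."""

    def __init__(self, k, func):
        self.k = k
        # Enumerate every input combination once and store the output bit.
        self.table = {
            bits: int(bool(func(*bits))) for bits in product((0, 1), repeat=k)
        }

    def __call__(self, *bits):
        return self.table[tuple(bits)]


def clb_eval(F, G, H, x1, x2, x):
    """Evaluate f(X) = H(F(X1), G(X2), x) for one input assignment."""
    return H(F(*x1), G(*x2), x)


# Hypothetical example: a 9-input AND split as X1 (4 bits), X2 (4 bits), x (1 bit).
F = LUT(4, lambda a, b, c, d: a & b & c & d)
G = LUT(4, lambda a, b, c, d: a & b & c & d)
H = LUT(3, lambda f, g, x: f & g & x)


def and9(bits):
    """9-input AND realized by the two 4-LUTs and the 3-LUT above."""
    return clb_eval(F, G, H, bits[0:4], bits[4:8], bits[8])
```

A real matcher has to search over the input partitions (X1, X2, x) and over the four decomposition forms listed on the slide; this model only evaluates one fixed choice.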
Slide 21
Example: Mapping to XC4K CLB
• Given a function f(0,1,2,3,4,5) where
  a = 1’ + 3, b = 1 + 3
  f = 0’245b’ + 0’245’b + 0’145b + 012’5’a + 0’2’4’5a + 025b + 0’2’5’a’ + 045a’ + 05’b’
• How many XC4K CLBs are needed to implement f(0,1,2,3,4,5)?
Slide 22
Example: Mapping to XC4K CLB (Cont’d)
Mapping      Packing     #CLBs  #Levels
Chortle-crf  simple      9      4
FlowMap      simple      8      3
FlowMap      functional  6      3
Boolean      -           1      1
[Figure: the Boolean matching result, a single CLB with LUTs F, G, and H over inputs 0-5]
Slide 23
Programmable Interconnection Based Cluster (PIC)
• Example: Altera APEX 20K
  - Each LAB has 10 LEs (LUT + FF) connected through a fully programmable matrix
Existing Synthesis Results for PIC
• Common approaches
  - Map into basic logic blocks and then group them into clusters under size and pin constraints
  - Recent progress on circuit clustering
    - Performance-driven clustering for combinational circuits [Lawler’69] [Yang & Wong, T-CAD’97]
    - Simultaneous clustering with retiming for sequential circuits [Pan, et al, T-CAD’98] [Cong, et al, DAC’99]
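The "size and pin constraints" mentioned above can be sketched as a simple feasibility check on one candidate cluster. This is a minimal illustration, not any of the cited algorithms; the function name and the parameters max_blocks and max_input_pins are hypothetical, and only input pins are counted.

```python
def cluster_is_feasible(cluster, fanins, max_blocks, max_input_pins):
    """Check one candidate cluster against a size limit and an input-pin limit.

    cluster: iterable of logic-block names placed in the cluster
    fanins:  dict mapping a block name to the names of the signals driving it
    """
    members = set(cluster)
    # A signal needs a cluster input pin only if its driver is outside the cluster.
    external_inputs = {
        src
        for node in members
        for src in fanins.get(node, ())
        if src not in members
    }
    return len(members) <= max_blocks and len(external_inputs) <= max_input_pins
```

For example, with fanins = {"c": ["a", "b"], "d": ["c", "e"]}, the cluster ["c", "d"] uses three input pins (a, b, e), because the connection from c to d stays inside the cluster.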
Slide 25
Benefits of Considering Retiming during Clustering
• Proper clustering allows retiming to hide inter-cluster delays (e.g., assume gate_delay = 1, inter_cluster_delay = 2)
[Figure: Clustering A and Clustering B, both with the same cut size and Φ=8 before retiming. For one clustering retiming cannot help (Φ=8); for the other retiming reduces delay (Φ=6).]
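The delay model quoted on this slide (gate_delay = 1, inter_cluster_delay = 2) can be sketched as a longest-path computation over a combinational netlist. The sketch below is an assumption-laden illustration: it does not model retiming, and the function name and netlist encoding are hypothetical. It only shows how the choice of clustering changes the critical path.

```python
from functools import lru_cache


def critical_path_delay(fanins, cluster_of, gate_delay=1, inter_cluster_delay=2):
    """Longest path delay of an acyclic netlist under a given clustering.

    fanins:     dict mapping a gate to the gates that drive it
    cluster_of: dict mapping every gate to its cluster id
    """

    @lru_cache(maxsize=None)
    def arrival(node):
        latest_input = 0
        for pred in fanins.get(node, ()):
            # An edge costs extra only when it crosses a cluster boundary.
            crosses = cluster_of[pred] != cluster_of[node]
            wire = inter_cluster_delay if crosses else 0
            latest_input = max(latest_input, arrival(pred) + wire)
        return latest_input + gate_delay

    return max(arrival(node) for node in cluster_of)


# Hypothetical 4-gate chain a -> b -> c -> d, clustered two different ways.
chain = {"b": ["a"], "c": ["b"], "d": ["c"]}
delay_ab_cd = critical_path_delay(chain, {"a": 0, "b": 0, "c": 1, "d": 1})
delay_ac_bd = critical_path_delay(chain, {"a": 0, "b": 1, "c": 0, "d": 1})
```

Both clusterings use two clusters of two gates, yet the first crosses a cluster boundary once and the second three times, so their path delays differ.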
Slide 26
Major Challenge in Synthesis for Hierarchical Architectures
• Can we synthesize a design directly into a multi-level hierarchical architecture?
  - Most existing PLD synthesis algorithms transform a given design into a flat netlist of basic PLBs and then go through a separate clustering/partitioning step
  - Very few consider synthesizing directly for hierarchical architectures
Slide 27
PLD Architecture Development Trend: Heterogeneous Architectures
• Three types of heterogeneous architectures
  - Type 1: Multiple sizes and/or configurations of the same type of logic blocks
    - e.g. ORCA 2C, VF1, XC4000
  - Type 2: Multiple types of logic blocks
    - LUTs, macrocells, and MUXes
    - e.g. APEX 20K
  - Type 3: Different kinds of resources on the same chip
    - Programmable logic blocks
    - Embedded memory blocks (EMBs)
    - Embedded processors
Slide 28
Type 1 Heterogeneous Architectures
• Example: Xilinx XC4000
  - Each CLB can implement two 4-LUTs or one 5-LUT
Slide 29
Synthesis Results for Type 1 Heterogeneous Architectures
• Area minimization
  - [He & Rose, FPGA’94]
  - [Korupolu, et al, DAC’98]
  - [Cong, Ding & Wu, FPGA’99]
• Delay minimization
  - HeteroMap [Cong & Xu, DAC’98]
  - Delay-optimal polynomial-time algorithm
• Evaluation results show
  - Heterogeneous architectures are superior to homogeneous ones for both area and delay
  - “One size fits all” doesn’t produce the best results
Slide 30
Architecture Evaluation: Homogeneous vs. Heterogeneous FPGAs
[Figure: bar chart of Mapping-Delay and MemoryCell-Area for 3-LUT, 4-LUT, 5-LUT, 6-LUT, and 3-4-5-6-LUT heterogeneous FPGAs]
Delay(3-LUT) : Delay(4-LUT) : Delay(5-LUT) : Delay(6-LUT) = 1 : 1.3 : 1.7 : 2
Area(3-LUT) : Area(4-LUT) : Area(5-LUT) : Area(6-LUT) = 1 : 2 : 4 : 8
Slide 31
“AT²-Metric” for Homogeneous and Heterogeneous FPGAs
Delay(3-LUT) : Delay(4-LUT) : Delay(5-LUT) : Delay(6-LUT) = 1 : 1.3 : 1.7 : 2
Area(3-LUT) : Area(4-LUT) : Area(5-LUT) : Area(6-LUT) = 1 : r : r^2 : r^3
[Figure: Area x Delay x Delay plotted against r (1 to 3) for 3-LUT, 4-LUT, 5-LUT, 6-LUT, and 3-4-5-6-LUT FPGAs]
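The ratios above are enough to sketch how an Area x Delay x Delay figure could be computed for one mapping solution. This is a minimal sketch under stated assumptions: area is taken as the sum of per-LUT areas, delay as the sum of LUT delays along one critical path, and the function names and example numbers are hypothetical rather than taken from the evaluation.

```python
# Relative LUT delays from the slide: 3-LUT : 4-LUT : 5-LUT : 6-LUT = 1 : 1.3 : 1.7 : 2.
DELAY_RATIO = {3: 1.0, 4: 1.3, 5: 1.7, 6: 2.0}


def lut_area(k, r=2.0):
    """Relative area of a k-LUT with ratio 1 : r : r^2 : r^3 for k = 3..6."""
    return r ** (k - 3)


def at2_metric(lut_counts, critical_path, r=2.0):
    """Area x Delay x Delay for one mapping solution.

    lut_counts:    dict mapping LUT size k to the number of k-LUTs used
    critical_path: list of LUT sizes along the critical path (assumed input)
    """
    area = sum(count * lut_area(k, r) for k, count in lut_counts.items())
    delay = sum(DELAY_RATIO[k] for k in critical_path)
    return area * delay ** 2


# Hypothetical homogeneous 4-LUT mapping: 100 LUTs, three LUT levels.
example = at2_metric({4: 100}, [4, 4, 4], r=2.0)
```

With r = 2 the area ratio reduces to the 1 : 2 : 4 : 8 memory-cell ratio of the previous slide; sweeping r corresponds to moving along the horizontal axis of the plot.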
Slide 32
Type 2 Heterogeneous Architectures
• An example: Altera APEX 20K
  - Embedded system blocks (ESBs) can implement dual-port RAM, ROM, FIFO, CAM blocks, and P-term logic
  - In P-term mode, each ESB has 16 macrocells
  - Each macrocell has two P-terms
Slide 33
Synthesis for Type 2 Heterogeneous Architectures
• Very little work
• Preliminary study for a hybrid architecture of LUTs and P-term blocks [Kaviani, Ph.D. thesis’99]
  - Use a greedy approach for hybrid mapping
  - Use LUTs for density optimization
  - Use P-term blocks for performance optimization
Slide 34
Type 3 Heterogeneous Architectures
• An example: FLEX 10K (logic array + embedded memory blocks (EMBs))
  - 576 to 12,160 LEs
  - 3 to 20 embedded array blocks (EABs)
  - Each EAB has 2K bits (11x1, 10x2, 9x4, 8x8)
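The four EAB configurations can be checked with a little arithmetic, assuming that "11x1" means 11 address bits by 1 data bit (2^11 x 1 = 2048 bits), and likewise for the others. The sketch below enumerates those shapes and tests whether a logic function with a given number of inputs and outputs would fit in a single block used as a look-up memory; the function names are hypothetical.

```python
EAB_BITS = 2048  # "2K bits" per embedded array block


def eab_configs(total_bits=EAB_BITS, widths=(1, 2, 4, 8)):
    """Return (address_bits, data_width) pairs whose capacity equals total_bits."""
    configs = []
    for width in widths:
        depth = total_bits // width          # number of words at this width
        address_bits = depth.bit_length() - 1  # log2(depth) for a power of two
        configs.append((address_bits, width))
    return configs


def fits_in_one_eab(n_inputs, n_outputs):
    """True if an n_inputs-to-n_outputs function fits in one configuration."""
    return any(
        n_inputs <= address_bits and n_outputs <= width
        for address_bits, width in eab_configs()
    )
```

Under this reading, an 8-input, 8-output function fits in one block, while an 11-input, 2-output function does not, since the only 11-address-bit shape is 1 bit wide.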
Slide 35
Field-Programmable System-on-a-Chip (FPSOC)
[Figure: General-purpose FPSOC with processor, memory, and programmable logic; application-specific FPSOC with processor, memory, programmable logic, and ASIC]
Slide 36
Synthesis for Type 3 Heterogeneous Architectures
• Explore logic implementation using EMBs
  - Area minimization
    - EMB_Pack [Cong & Xu, FPGA’98]
      - With delay constraint
    - SMAP [Wilton, FPGA’98]
  - Delay minimization
    - [Cong & Xu, ICCAD’98]
• The general synthesis problem for FPSOC is largely untouched
Slide 37
Synthesis Needs for FP-SOC
• Partition the design/application to heterogeneous resources. E.g.
  - Software/hardware partitioning
  - Memory/logic partitioning
• Efficient use of each type of resources. E.g.
  - Code generation for embedded CPUs
  - Automatic synthesis for FPGA
• Scheduling & synchronization of various components. E.g.
  - Real-time O/S
• Trade-off between heterogeneous resources
• Support for IP integration
Slide 38
Outline
• Introduction
• Synthesis Challenges for New Architectures
• Synthesis Challenges for High Density and High Performance
• Concluding Remarks
Slide 39
Important Synthesis Problems
• Layout-driven synthesis
• Incremental synthesis
• IP-based design
Synthesis Challenges for High Density and High Performance
Slide 40
Layout-Driven Synthesis
• Scaling of IC feature size [NTRS’97]
  - Interconnect delay becomes more and more dominant in the overall circuit delay
• FPGA design
  - Interconnect delay has always been very significant (due to programmable switches)
• Layout design has a significant impact on performance
• Synthesis needs to consider its impact on layout
Slide 41
Logic vs. Local Interconnect vs. Global Interconnect Delay
Delay Resource        Delay Value (ns)
Logic Element (LE)    2.4
Local Interconnect    0.5
Row Interconnect      4.7
Column Interconnect   7.2
(Altera FLEX8K part)
Slide 42
Delay Distribution
[Figure: pie chart of delay distribution: Logic 30%, Local Interconnect 9%, Global Interconnect 61%]
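A breakdown of this kind can be computed from the FLEX8K delay values (LE 2.4 ns, local 0.5 ns, row 4.7 ns, column 7.2 ns) once the resources on a path are known. The sketch below is illustrative: the path composition is hypothetical and is not the data behind the pie chart, and treating row and column interconnect together as "global" is an assumption.

```python
# Per-resource delays in ns, from the Altera FLEX8K delay table.
DELAY_NS = {"logic_element": 2.4, "local": 0.5, "row": 4.7, "column": 7.2}

# Assumed grouping: row and column interconnect are counted as global interconnect.
GROUP = {
    "logic_element": "logic",
    "local": "local interconnect",
    "row": "global interconnect",
    "column": "global interconnect",
}


def delay_breakdown(path_counts):
    """Return (total path delay in ns, share of delay per resource group).

    path_counts: dict mapping a resource name to how many times the path uses it
    """
    totals = {}
    for resource, count in path_counts.items():
        group = GROUP[resource]
        totals[group] = totals.get(group, 0.0) + count * DELAY_NS[resource]
    path_delay = sum(totals.values())
    shares = {group: delay / path_delay for group, delay in totals.items()}
    return path_delay, shares


# Hypothetical path: 4 LEs, 4 local hops, 2 row hops, 1 column hop.
total_ns, shares = delay_breakdown(
    {"logic_element": 4, "local": 4, "row": 2, "column": 1}
)
```

Even with only three global hops, the global interconnect group accounts for the largest share of this hypothetical path, which is the qualitative point of the chart.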
Slide 43
Challenges and Opportunities for Layout-Driven Synthesis
• Challenges:
  - Interconnect design is not finalized until after placement and routing
  - Both synthesis and layout are highly complex. How can they be properly combined without a complexity explosion?
• Opportunities: substantial performance gain
  - Example: mapping with consideration of fast interconnections (cascade chains)
Slide 44
Comparison between FlowMap and Fast Interconnection Mapping
[Figure: bar chart comparing FlowMap (K=4) with Fast Interconnect Mapping on Mapping-Delay and #4-LUT; change labels as extracted: -34% (Mapping-Delay), +24% (#4-LUT)]
• Delay assumption: 4-LUT fast pin delay = 0.7 ns, 4-LUT slow pin delay = 2.7 ns, fast interconnect delay = 0.2 ns, general interconnect delay = 4.1 ns
• LUT fast interconnect is connected to the fast pin
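The delay assumption above can be turned into a small path-delay calculation. This is a sketch under two assumptions that the slide does not state: each LUT-to-LUT hop costs its interconnect delay plus the delay of the pin it enters, and a general-interconnect hop enters a slow pin. The function name and the example paths are hypothetical.

```python
# Delay assumption from the slide, in ns.
FAST_PIN, SLOW_PIN = 0.7, 2.7
FAST_WIRE, GENERAL_WIRE = 0.2, 4.1


def path_delay(hops):
    """Sum the delay of a path given the kind of each LUT-to-LUT hop.

    hops: sequence of "fast" or "general", one entry per hop.
    Assumption: fast interconnect feeds the fast pin; general feeds a slow pin.
    """
    total = 0.0
    for hop in hops:
        if hop == "fast":
            total += FAST_WIRE + FAST_PIN
        elif hop == "general":
            total += GENERAL_WIRE + SLOW_PIN
        else:
            raise ValueError(f"unknown hop type: {hop!r}")
    return total


# Hypothetical comparison: a 3-level path on general interconnect versus
# a 4-level path in which two hops use the fast interconnect.
three_general = path_delay(["general"] * 3)
two_fast_two_general = path_delay(["fast", "fast", "general", "general"])
```

Under these assumptions the deeper path is still faster, which suggests why a mapper that is aware of fast interconnect may accept more LUTs in exchange for lower delay.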
Slide 45
Comparison between FlowMap and Fast Interconnection Mapping (Cont’d)
[Figure: bar chart comparing FlowMap (K=4) with Fast Interconnect Postprocessing on Mapping-Delay and #4-LUT; change labels as extracted: -21% (Mapping-Delay), +0% (#4-LUT)]
• Delay assumption: 4-LUT fast pin delay = 0.7 ns, 4-LUT slow pin delay = 2.7 ns, fast interconnect delay = 0.2 ns, general interconnect delay = 4.1 ns
• LUT fast interconnect is connected to the fast pin
Slide 46
Incremental Synthesis
• Motivation
  - PLD designs are getting more complex
  - The entire design process is iterative/incremental
  - Resynthesizing an entire large design is not acceptable given multiple design iterations
  - The highly incremental design process requires fast incremental synthesis capabilities
Slide 47
Requirements on Incremental Synthesis
• Preservability
  - Preserve as much information as possible from the existing synthesis solution
• Efficiency
  - A faster synthesis system will enable more design iterations and shorten the overall design time
• Quality of the synthesis solution
  - Delay, area, etc. should be as close as possible to those of complete re-synthesis
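One simple way to read the preservability requirement is to re-synthesize only the part of the netlist that a design change can reach and keep the rest of the existing solution. The sketch below computes that region as the transitive fanout of the changed nodes. It is an illustrative assumption, not the method of any specific incremental synthesis tool, and the function name and netlist encoding are hypothetical.

```python
from collections import deque


def affected_nodes(fanouts, changed):
    """Return the changed nodes plus everything in their transitive fanout.

    fanouts: dict mapping a node to the nodes it drives
    changed: iterable of nodes modified by the design change
    """
    seen = set(changed)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for succ in fanouts.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


# Hypothetical netlist: a -> c, b -> c, b -> d, c -> e.
netlist = {"a": ["c"], "b": ["c", "d"], "c": ["e"]}
to_resynthesize = affected_nodes(netlist, ["b"])
```

In this example node a lies outside the affected region, so its existing synthesis result could be preserved as-is.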
Slide 48
Status on Incremental Synthesis
• Very little existing work
  - ECO [Kukimoto & Fujita, ICCAD’92]
    - No structural change is allowed
    - Only functional change is allowed
  - Incremental mapping [Cong & Hui, DAC’2000]
    - Preserve optimal mapping depth
    - Achieve over 300X speed-up for circuits of about 100,000 gates compared to re-mapping by FlowMap
• Much more work is needed in this area
Slide 49
IP-Based Design
• Motivation
  - Design reuse to improve productivity
  - Better performance and density
• Example:
  - Altera IP MegaStore
Slide 50
Slide 51
Requirements on IP-Based Design
• IP representation: should allow migration between
  - Different FPGA vendors
  - Different FPGA generations
• Characterization
  - Functionality
  - Performance
• Interface with synthesis tools
  - Automatic inference/instantiation
  - Optimization and constraint propagation
  - Simulation and verification
• IP protection
  - How to prevent unauthorized use?
  - E.g. embed watermarks in FPGA mapping solutions [Kirovski, et al, ICCAD’98]
Slide 52
Outline
• Introduction
• Synthesis Challenges for New Architectures
• Synthesis Challenges for High Density and High Performance
• Concluding Remarks
Slide 53
Concluding Remarks
• PLD market is going through a rapid expansion
• PLD synthesis is facing many new challenges
  - Support for new PLD architectures
    - Hierarchical architectures
    - Heterogeneous architectures
  - Support for high-performance and high-density PLD designs
    - Layout-driven synthesis
    - Incremental synthesis
    - IP-based synthesis
• Many research and business opportunities
  - UCLA VLSI CAD Laboratory
  - Aplus Design Technologies, Inc.
Concluding Remarks
Slide 54
PLD Synthesis Research at UCLA
• Advanced synthesis algorithms
  - Synthesis for heterogeneous architectures
  - Synthesis for sequential circuits with simultaneous mapping, retiming, and pipelining
  - Layout-driven synthesis
  - IP-based synthesis
  - Synthesis/compilation techniques for FPSOC …
  - Software prototype: RASP system
• Architecture evaluation
  - Evaluation of PLB architecture
  - Evaluation of heterogeneous architectures
  - Evaluation of hierarchical architectures …
  - Software prototype: fpgaEva tool
• URL: http://cadlab.cs.ucla.edu/~xfpga
Slide 55
UCLA RASP Synthesis System for LUT-Based FPGAs
[Figure: flow diagram. HDL design / EDIF netlist -> internal netlist -> LUT Mapping Engine -> LUT netlist -> PLB Mapping Engine -> vendor-specific netlist (Xilinx, Altera, ORCA) -> placement and routing -> chip programming information]
Slide 56
FPGA Architecture Evaluation
Slide 57
Aplus Design Technologies, Inc.
• A new start-up in PLD synthesis
  - Based in Los Angeles (near UCLA)
• Objective: provide Advanced Programmable Logic Unified Solution (APLUS)
  - Unify architecture and synthesis
  - Unify synthesis and layout
• Products & Services
  - Next-generation synthesis tool for high-density, high-performance PLDs
  - Architecture evaluation tool kits and services
• Has already established strategic partnerships with several major PLD vendors
• URL: http://www.aplus-dt.com
THANK YOU!
J. Cong and S. Xu
Slide 59
The Typical Design Flow Using LPMs