Word Power -- outer space & space exploration space exploration Space exploration.
Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize...
Transcript of Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize...
![Page 1: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/1.jpg)
Luigi Nardi, Ph.D.Stanford University
@Google, July 30, 2019
Design Space Exploration - and Its Application to Computer Hardware Engineering -
![Page 2: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/2.jpg)
Luigi Nardi, Ph.D. - Stanford
Main Papers Behind This Talk
2
1. Nardi, et al., "Practical Design Space Exploration", MASCOTS, 2019.
2. Koeplinger, et al., "Spatial: A Language and Compiler for Application Accelerators", PLDI, 2018.
3. Nardi, et al., "Algorithmic performance-accuracy trade-off in 3D vision applications using HyperMapper", iWAPT-IPDPS, 2017.
4. Saeedi, et al., "Application-oriented design space exploration for SLAM algorithms", ICRA, 2017.
5. Bodin, et al., "Integrating algorithmic parameters into benchmarking and design space exploration in 3D scene understanding", PACT, 2016.
![Page 3: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/3.jpg)
Luigi Nardi, Ph.D. - Stanford
Outline
3
1. Design Space Exploration– Motivation Example – Problem Setting – Important Features– Taxonomy
2.The HyperMapper Framework– Pareto and Hypervolumes– Prior Distribution– Pareto-based Active Learning
3.Experimental Results
![Page 4: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/4.jpg)
Luigi Nardi - Stanford
4
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
[Optional]DesignTuning
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
• Goal of Spatial [Koeplinger, et al.]: design of application accelerators• On reconfigurable architectures FPGAs and CGRAs• Spatial compiler lowers user programs into synthesizable Chisel [Bachrach, et al.] [Koeplinger, et al.] "Spatial: a language and compiler for application accelerators." PLDI 2018[Bachrach, et al.] "Chisel: constructing hardware in a Scala embedded language." DAC 2012.
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
[Optional]DesignTuning
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
![Page 5: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/5.jpg)
Luigi Nardi - Stanford
4
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
[Optional]DesignTuning
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
• Goal of Spatial [Koeplinger, et al.]: design of application accelerators• On reconfigurable architectures FPGAs and CGRAs• Spatial compiler lowers user programs into synthesizable Chisel [Bachrach, et al.] [Koeplinger, et al.] "Spatial: a language and compiler for application accelerators." PLDI 2018[Bachrach, et al.] "Chisel: constructing hardware in a Scala embedded language." DAC 2012.
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
[Optional]DesignTuning
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
Performs basic hardware optimizations and estimates a domain for each design parameter
![Page 6: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/6.jpg)
Luigi Nardi - Stanford
4
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
[Optional]DesignTuning
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
• Goal of Spatial [Koeplinger, et al.]: design of application accelerators• On reconfigurable architectures FPGAs and CGRAs• Spatial compiler lowers user programs into synthesizable Chisel [Bachrach, et al.] [Koeplinger, et al.] "Spatial: a language and compiler for application accelerators." PLDI 2018[Bachrach, et al.] "Chisel: constructing hardware in a Scala embedded language." DAC 2012.
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
[Optional]DesignTuning
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
Computes loop pipeline schedules and on-chip memory layouts for some given value for each parameter
![Page 7: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/7.jpg)
Luigi Nardi - Stanford
4
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
[Optional]DesignTuning
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
• Goal of Spatial [Koeplinger, et al.]: design of application accelerators• On reconfigurable architectures FPGAs and CGRAs• Spatial compiler lowers user programs into synthesizable Chisel [Bachrach, et al.] [Koeplinger, et al.] "Spatial: a language and compiler for application accelerators." PLDI 2018[Bachrach, et al.] "Chisel: constructing hardware in a Scala embedded language." DAC 2012.
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
[Optional]DesignTuning
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
Estimates the amount of hardware resources and the runtime of the application
![Page 8: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/8.jpg)
Luigi Nardi - Stanford
4
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
[Optional]DesignTuning
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
• Goal of Spatial [Koeplinger, et al.]: design of application accelerators• On reconfigurable architectures FPGAs and CGRAs• Spatial compiler lowers user programs into synthesizable Chisel [Bachrach, et al.] [Koeplinger, et al.] "Spatial: a language and compiler for application accelerators." PLDI 2018[Bachrach, et al.] "Chisel: constructing hardware in a Scala embedded language." DAC 2012.
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
[Optional]DesignTuning
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
Unrolls loops, retimes pipelines, and performs on-chip memory layout. The optimizations are computed based on the analyses of the previous phase
![Page 9: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/9.jpg)
Luigi Nardi - Stanford
4
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
[Optional]DesignTuning
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
• Goal of Spatial [Koeplinger, et al.]: design of application accelerators• On reconfigurable architectures FPGAs and CGRAs• Spatial compiler lowers user programs into synthesizable Chisel [Bachrach, et al.] [Koeplinger, et al.] "Spatial: a language and compiler for application accelerators." PLDI 2018[Bachrach, et al.] "Chisel: constructing hardware in a Scala embedded language." DAC 2012.
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
[Optional]DesignTuning
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
Generates a Chisel design which can be synthesized and run on the target FPGA
![Page 10: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/10.jpg)
Luigi Nardi - Stanford
4
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
[Optional]DesignTuning
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
• Goal of Spatial [Koeplinger, et al.]: design of application accelerators• On reconfigurable architectures FPGAs and CGRAs• Spatial compiler lowers user programs into synthesizable Chisel [Bachrach, et al.] [Koeplinger, et al.] "Spatial: a language and compiler for application accelerators." PLDI 2018[Bachrach, et al.] "Chisel: constructing hardware in a Scala embedded language." DAC 2012.
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
[Optional]DesignTuning
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
![Page 11: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/11.jpg)
Luigi Nardi - Stanford
5
TheSpatialCompilerSpatialIR
ControlScheduling
Mem.Banking/Buffering
AccessPatternAnalysis
ControlInference
PipelineUnrolling
PipelineRetiming
UserParameters
HyperMapper DSE
ModifiedParameters
HostResourceAllocation
ControlSignalInference
Chisel CodeGeneration
Area/RuntimeAnalysis
IntermediateRepresentation
DesignParameters
IRTransformation
IRAnalysis
CodeGeneration
Legend
DesignSpaceExploration
Two objectives: 1.Minimize clock cycles (runtime)2.Minimize FPGA logic utilization
• Useful for fitting multiple applications• Proxy for energy consumption
One constraint: design must fit in the chip
This talk
![Page 12: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/12.jpg)
Luigi Nardi, Ph.D. - Stanford
• Benchmark: SLAMBench 1.0 runtime response surface is: non-linear, multi-modal and non-smooth
6
Motivation - Mono-objectiveRu
ntim
e
Parameter 2 Parameter 1
![Page 13: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/13.jpg)
Luigi Nardi, Ph.D. - Stanford
7
Input space (a.k.a. search or design space)
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
Derivative-Free Optimization (DFO) 3-parameters and 2-objective Pictorial
Derivatives are unavailable: e.g., categorical variables
![Page 14: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/14.jpg)
Luigi Nardi, Ph.D. - Stanford
7
Input space (a.k.a. search or design space)
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
Derivative-Free Optimization (DFO) 3-parameters and 2-objective Pictorial
Output space (a.k.a. objective space)
Derivatives are unavailable: e.g., categorical variables
![Page 15: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/15.jpg)
Luigi Nardi, Ph.D. - Stanford
7
Input space (a.k.a. search or design space)
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
Derivative-Free Optimization (DFO) 3-parameters and 2-objective Pictorial
Output space (a.k.a. objective space)
Derivatives are unavailable: e.g., categorical variables
![Page 16: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/16.jpg)
Luigi Nardi, Ph.D. - Stanford
7
Input space (a.k.a. search or design space)
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
Derivative-Free Optimization (DFO) 3-parameters and 2-objective Pictorial
Output space (a.k.a. objective space)
Derivatives are unavailable: e.g., categorical variables
![Page 17: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/17.jpg)
Luigi Nardi, Ph.D. - Stanford
7
Input space (a.k.a. search or design space)
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
Derivative-Free Optimization (DFO) 3-parameters and 2-objective Pictorial
Output space (a.k.a. objective space)
Derivatives are unavailable: e.g., categorical variables
![Page 18: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/18.jpg)
Luigi Nardi, Ph.D. - Stanford
7
Input space (a.k.a. search or design space)
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
Derivative-Free Optimization (DFO) 3-parameters and 2-objective Pictorial
Output space (a.k.a. objective space)
Derivatives are unavailable: e.g., categorical variables
![Page 19: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/19.jpg)
Luigi Nardi, Ph.D. - Stanford
7
Input space (a.k.a. search or design space)
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
Min
Min
Derivative-Free Optimization (DFO) 3-parameters and 2-objective Pictorial
Output space (a.k.a. objective space)
Derivatives are unavailable: e.g., categorical variables
![Page 20: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/20.jpg)
Luigi Nardi, Ph.D. - Stanford
7
Input space (a.k.a. search or design space)
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
Min
Min
1
2
3
4
5
6
7
True False
1.5
6.3
Categorical
Real
Ordinal
Min
Min
Max
Max
Derivative-Free Optimization (DFO) 3-parameters and 2-objective Pictorial
Output space (a.k.a. objective space)
Derivatives are unavailable: e.g., categorical variables
![Page 21: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/21.jpg)
Luigi Nardi, Ph.D. - Stanford
8
Practical DSE: Important Features
1.Real, integer, ordinal and categorical variables (RIOC var.)
• Example: tile size is an ordinal, parallelism is a categorical
![Page 22: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/22.jpg)
Luigi Nardi, Ph.D. - Stanford
8
Practical DSE: Important Features
1.Real, integer, ordinal and categorical variables (RIOC var.)
• Example: tile size is an ordinal, parallelism is a categorical
2.User prior knowledge in the search (Prior)
• Example: # threads on a CPU with 4 cores? Gaussian-shaped around 4
![Page 23: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/23.jpg)
Luigi Nardi, Ph.D. - Stanford
8
Practical DSE: Important Features
1.Real, integer, ordinal and categorical variables (RIOC var.)
• Example: tile size is an ordinal, parallelism is a categorical
2.User prior knowledge in the search (Prior)
• Example: # threads on a CPU with 4 cores? Gaussian-shaped around 4
3.Unknown feasibility constraints (Constr.)
• Example: does a design fit in the FPGA?
![Page 24: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/24.jpg)
Luigi Nardi, Ph.D. - Stanford
8
Practical DSE: Important Features
1.Real, integer, ordinal and categorical variables (RIOC var.)
• Example: tile size is an ordinal, parallelism is a categorical
2.User prior knowledge in the search (Prior)
• Example: # threads on a CPU with 4 cores? Gaussian-shaped around 4
3.Unknown feasibility constraints (Constr.)
• Example: does a design fit in the FPGA?
4.Multi-objective optimization (Multi)
• Example: trade-off runtime and area
![Page 25: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/25.jpg)
Luigi Nardi, Ph.D. - Stanford
9
DFO Tools TaxonomyWe introduce a new framework dubbed HyperMapper
None of the tools available support all these DSE features
[Bodin, et al.] "Integrating algorithmic parameters into benchmarking and design space exploration in 3D scene understanding”, PACT, 2016.[Nardi, et al.] "Algorithmic performance-accuracy trade-off in 3D vision applications using HyperMapper”, IPDPS-iWAPT, 2017.[Nardi, et al., 2019] “Practical Design Space Exploration”, MASCOTS, 2019.
![Page 26: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/26.jpg)
Luigi Nardi, Ph.D. - Stanford
Outline
10
1. Design Space Exploration– Motivation Example – Problem Setting – Important Features– Taxonomy
2.The HyperMapper Framework– Pareto and Hypervolumes– Prior Distribution– Pareto-based Active Learning
3.Experimental Results
![Page 27: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/27.jpg)
Luigi Nardi, Ph.D. - Stanford
Multi-Objective Goal: Pareto Front
11
Pareto Front
Feasible Samples
Objective 1
Obj
ective
2 Obj
2 T
hres
hold
Obj 1 Threshold
Infeasible Samples
![Page 28: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/28.jpg)
Luigi Nardi, Ph.D. - Stanford
12
HyperVolume Indicator (HVI)
Objective 1
Obj
ective
2
Pareto Front 2
Reference PointPareto Front 1
y1
y3
y4
y2
HVI1
HVI2
• HVI is a hypervolume • with respect to a reference
• It is always a scalar • even with multiple objectives
• Used to compare 2 Paretos• The lower the better:HVItotal - HVIi
![Page 29: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/29.jpg)
Luigi Nardi, Ph.D. - Stanford
Injecting Prior Knowledge in the DSE
13
• Need a probability distribution that: 1.Has a finite domain2.Can flexibly model shapes including:
1. Bell-shape, 2. U-shape and 3. J-shape • Beta distribution - parameters alpha and beta• In addition, need of a discrete distribution for categorical variables
PDF given by:
for and ,
where is the Gamma function
↵,� > 0
�
x 2 [0, 1]
f(x|↵,�) = �(↵+ �)
�(↵)�(�)x↵�1(1� x)��1
![Page 30: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/30.jpg)
Luigi Nardi, Ph.D. - Stanford
Surrogate Model
14
• Predicts an outcome given an input x
![Page 31: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/31.jpg)
Luigi Nardi, Ph.D. - Stanford
Surrogate Model
14
• Predicts an outcome given an input x
• One surrogate model per objective
• Random forests (RF) for regression
![Page 32: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/32.jpg)
Luigi Nardi, Ph.D. - Stanford
Surrogate Model
14
• Predicts an outcome given an input x
• One surrogate model per objective
• Random forests (RF) for regression
• One surrogate model per feasibility constraint
• Random forests for classification
![Page 33: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/33.jpg)
Luigi Nardi, Ph.D. - Stanford
Surrogate Model
14
• Predicts an outcome given an input x
• One surrogate model per objective
• Random forests (RF) for regression
• One surrogate model per feasibility constraint
• Random forests for classification
• Intuition of why RF is a good model:
• Good at non-linearity, multi-modality and non-smoothness
![Page 34: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/34.jpg)
Luigi Nardi, Ph.D. - Stanford
Surrogate Model
14
• Predicts an outcome given an input x
• One surrogate model per objective
• Random forests (RF) for regression
• One surrogate model per feasibility constraint
• Random forests for classification
• Intuition of why RF is a good model:
• Good at non-linearity, multi-modality and non-smoothness
• Has not to be perfect (interested in optima not the best model)
• Coefficient of determination (R2) is the wrong indicator of success
![Page 35: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/35.jpg)
Luigi Nardi, Ph.D. - Stanford
15
The HyperMapper Framework
Prior
SearchSpace
Declaration[ ]
Obj 1 Constraint....
Obj q Constraint
Machine Learning
RunPredicted
Pareto
SurrogateFeasibility
Constraints
New Samples
Active Learning
[ ]
Pareto Front
Processing Output
Objective 1....
Objective n[ ] Predict
Feasible
Pareto
SurrogateObjectives
Warm-upSampling
Random orLatin Hypercube
Sampling
Input Warm-up
![Page 36: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/36.jpg)
Luigi Nardi, Ph.D. - Stanford
Outline
16
1. Design Space Exploration– Motivation Example – Problem Setting – Important Features– Taxonomy
2.The HyperMapper Framework– Pareto and Hypervolumes– Prior Distribution– Pareto-based Active Learning
3.Experimental Results
![Page 37: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/37.jpg)
Luigi Nardi, Ph.D. - Stanford
Spatial Search Spaces (Optimization Knobs)
17
InputCompiler automatically provides params:• Tile size (ordinal)• Inner and outer loop pipelining (ordinal)• Meta-pipe (categorical)• Unrolling factor (ordinal)• Memory banking (ordinal)• Parallelism (categorical)
OutputCompiler evaluation provides:• Clock cycles (runtime): objective 1• FPGA logic utilization: objective 2• Feasibility constraint:
• true if design fits in the chip
![Page 38: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/38.jpg)
Luigi Nardi, Ph.D. - Stanford
18
Feasibility Constraint Model Performance
Valid and invalid mean that a design does or does not fit in the FPGA
GDA Benchmark
![Page 39: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/39.jpg)
Luigi Nardi, Ph.D. - Stanford
18
Feasibility Constraint Model Performance
Valid and invalid mean that a design does or does not fit in the FPGA
HyperMapper not using the model for the feasibility constraint
GDA Benchmark
![Page 40: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/40.jpg)
Luigi Nardi, Ph.D. - Stanford
18
Feasibility Constraint Model Performance
Valid and invalid mean that a design does or does not fit in the FPGA
HyperMapper using the model for the feasibility constraint
GDA Benchmark
![Page 41: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/41.jpg)
Luigi Nardi, Ph.D. - Stanford
18
Feasibility Constraint Model Performance
Valid and invalid mean that a design does or does not fit in the FPGA
About 2x in Cycles,10% difference in LogicGDA Benchmark
![Page 42: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/42.jpg)
Luigi Nardi, Ph.D. - Stanford
18
Feasibility Constraint Model Performance
Valid and invalid mean that a design does or does not fit in the FPGA
Wastedsamples
GDA Benchmark
![Page 43: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/43.jpg)
Luigi Nardi, Ph.D. - Stanford
19
Small benchmarks - Cycles/Logic Utilization
Black Scholes Outer Product
Dot Product
• 3 small benchmarks: compute real Pareto• Real Pareto is exhaustive search• Exhaustive: 6 to 12 hours on 16 cores• Approximated Pareto is HyperMapper
![Page 44: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/44.jpg)
Luigi Nardi, Ph.D. - Stanford
20
HVI Cycles/Logic Utilization • Hypervolume indicator (HVI): scalar to compare Paretos• Baseline is a pruning (based on heuristics) and 100K random sampling
![Page 45: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/45.jpg)
Luigi Nardi, Ph.D. - Stanford
21
• Goal: HyperMapper is a multi-objective optimization framework
• Demonstrated on: hardware design, CV applications and robotics
• HyperMapper provides Intelligence Augmentation (IA):
• White-box surrogate model: easy to understand and interpret
• Prior: non-blind search - grey-box optimization approach
• Future:
• Use of prior knowledge in the multi-objective case
• For the moment only used in the DoE warm-up phase
• Application of HyperMapper to the popular Halide CV language
• Support other Black-box optimization algorithms
• Support multi-fidelities to increase efficiency on DNNs and CV
• Support batch evaluation for when a cluster is available
Conclusion and Future Work
![Page 46: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/46.jpg)
Luigi Nardi, Ph.D. - Stanford
Info on HyperMapper
22
• Join HyperMapper on Slack: hypermapper.slack.com
• Tutorials: to come
• Repo: https://github.com/luinardi/hypermapper
• Wiki: https://github.com/luinardi/hypermapper/wiki
•Early adopters:
•Microsoft (DBMS) •Stanford (hardware design) •UCSD (Spector benchmarks) •UT Austin (Capri approximate computing) •ICL (computer vision and robotics) •Etc.
![Page 47: Design Space Exploration - GitHub Pages · Design Space Exploration Two objectives: 1.Minimize clock cycles (runtime) 2.Minimize FPGA logic utilization • Useful for fitting multiple](https://reader034.fdocuments.us/reader034/viewer/2022042311/5ed97a7e1b54311e7967a341/html5/thumbnails/47.jpg)
Luigi Nardi - Stanford
Demo HyperMapper
23