Adaptive System on a Chip (aSoC) for Low-Power Signal Processing
Andrew Laffely, Jian Liang, Prashant Jain, Ning Weng, Wayne Burleson, Russell Tessier
Department of Electrical and Computer Engineering
University of Massachusetts, Amherst
{alaffely, jliang, pjain, nweng, burleson, tessier}@ecs.umass.edu
This material is based upon work supported by the National Science Foundation under Grant No. 9988238. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Overview
• Motivation
  • Video processing
• Architecture
• Dynamic power management
  • Core, interconnect, and clock
Problem
• Wireless video processing requires:
  • High throughput
  • Low power
  • Flexibility
System on a Chip Solutions
• Take advantage of parallelism
  • Possibly improved performance
• Allow use and reuse of existing integrated components
• If:
  • The application can be partitioned
  • The appropriate architecture is used
Proposed Architecture: aSoC
• High throughput
  • Heterogeneous processor elements: use the right tool for the job
  • Fast and predictable interconnect
• Flexible
  • Runtime reconfiguration of cores and interconnect
• Power consumption
  • Implement power-saving features in both cores and interconnect
  • Use reconfiguration to dynamically control power consumption
aSoC: adaptive System on a Chip
• Tiled SoC architecture
[Figure: tiled layout with DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, and Motion Estimation and Compensation cores]
aSoC: adaptive System on a Chip
• Tiled SoC architecture
• Supports the use of independently developed heterogeneous cores
  • Pick and place the cores that best perform the given application
  • Increase performance
  • Save power
• Cores may be any number of tiles in size
[Figure: tiled layout with DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, and Motion Estimation and Compensation cores]
aSoC: adaptive System on a Chip
• Tiled SoC architecture
• Supports the use of independently developed heterogeneous cores
• Connected with an interconnect mesh
  • Restricted to nearest-neighbor communication, which creates a pipeline and decreases cycle time
[Figure: tiled layout with DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, and Motion Estimation and Compensation cores]
aSoC: adaptive System on a Chip
• Tiled SoC architecture
• Supports the use of independently developed heterogeneous cores
• Connected with a fixed interconnect mesh
• Uses a communication interface (CI) to manage data
  • A network port (coreport) for each core
  • Each CI uses a memory and an FSM to repeatedly process a predefined schedule of communications
  • Crossbar
[Figure: tiled layout with DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, and Motion Estimation and Compensation cores]
Stream Control
• Instruction memory
  • Holds the predetermined schedule of communications
• PC
  • Selects and synchronizes the communications
• Decoder
  • Sets the crossbar
• Controller
  • Sets the PC
  • Interprets incoming configuration commands
• Crossbar
  • Connects any input to any set of outputs
[Figure: CI block diagram with crossbar inputs and outputs (North, South, East, West, Core), decoder/controller, PC, instruction memory, and local config]
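The stream-control loop above can be sketched as a small behavioral model. This is an illustration under our own assumptions, not the aSoC hardware: each instruction-memory word is modeled as a dict mapping an output port to the input port that drives it, the PC simply wraps modulo the schedule length, the controller's configuration handling is left out, and the names `CommunicationInterface` and `step` are hypothetical.

```python
from typing import Dict, List, Optional

PORTS = ("North", "South", "East", "West", "Core")

# One instruction-memory word: output port -> input port that drives it.
# The same input may appear under several outputs (crossbar fan-out).
Instruction = Dict[str, str]


class CommunicationInterface:
    """Hypothetical behavioral model of one tile's CI, not the aSoC RTL."""

    def __init__(self, schedule: List[Instruction]) -> None:
        if not schedule:
            raise ValueError("schedule must contain at least one instruction")
        for word in schedule:
            for out_port, in_port in word.items():
                if out_port not in PORTS or in_port not in PORTS:
                    raise ValueError(f"unknown port in {word}")
        self.imem = list(schedule)  # instruction memory: the predetermined schedule
        self.pc = 0                 # PC selects the current instruction

    def step(self, inputs: Dict[str, int]) -> Dict[str, Optional[int]]:
        """One cycle: decode the word at PC, set the crossbar, advance the PC."""
        word = self.imem[self.pc]
        # Each output listed in the word copies the value on its selected input
        # (None if nothing arrived on that input this cycle).
        outputs = {out: inputs.get(src) for out, src in word.items()}
        # The schedule is processed repetitively, so the PC wraps around.
        self.pc = (self.pc + 1) % len(self.imem)
        return outputs


ci = CommunicationInterface([
    {"East": "Core"},                  # cycle 0: core word leaves on East
    {"East": "West", "Core": "West"},  # cycle 1: West input fans out to East and Core
])
print(ci.step({"Core": 7}))  # {'East': 7}
print(ci.step({"West": 3}))  # {'East': 3, 'Core': 3}
print(ci.pc)                 # 0, back at the start of the schedule
```

Modeling each word as an output-to-input map makes the "any input to any set of outputs" property fall out naturally: fan-out is just the same input named under several outputs.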
Example: Communication
[Figure: Core A, Core B, and Core C tiles; stream label "Stream A-D"]
• A given application requires periodic communication from Core A to Core C
• aSoC uses a prescheduled communication STREAM
  • Core A places the data in a dedicated STREAM between the two tiles
  • Core C pulls the data from that STREAM
• The tile-to-tile communication takes 3 cycles
Example: Stream
[Figure: tiles C, B, A]
Cycle 1: Core to East
Example: Stream
[Figure: tiles C, B, A; stream label "Stream A-D"]
Cycle 2: West to East
Example: Stream
[Figure: tiles C, B, A]
Cycle 3: West to Core
Example: Stream
[Figure: tiles C, B, A; stream label "Stream A-D"]
Cycle 1: Core to East
Cycle 2: West to East
Cycle 3: West to Core
LoopBack
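The three hops can be traced cycle by cycle in a short sketch. Assumptions, all ours: tiles A, B, and C sit in one row with B east of A and C east of B (consistent with the Core to East, West to East, West to Core hops, though the slide layout itself is not recoverable from this transcript), one hop completes per cycle, and `ROUTE`, `LINKS`, and `trace_stream` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical schedule entries for the stream from Core A to Core C:
# (cycle, tile, crossbar input, crossbar output).
ROUTE: List[Tuple[int, str, str, str]] = [
    (1, "A", "Core", "East"),  # Core A injects the word toward B
    (2, "B", "West", "East"),  # B passes it straight through
    (3, "C", "West", "Core"),  # C hands it to its core
]

# Assumed layout: a single row A - B - C, west to east.
# Maps (tile, output port) to the (tile, input port) at the other end of the link.
LINKS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("A", "East"): ("B", "West"),
    ("B", "East"): ("C", "West"),
}


def trace_stream() -> List[str]:
    """Follow one word along ROUTE, checking that every hop is physically connected."""
    location = ("A", "Core")  # the word starts at Core A's coreport
    log = []
    for cycle, tile, in_port, out_port in ROUTE:
        if location != (tile, in_port):
            raise RuntimeError(f"cycle {cycle}: word is at {location}, expected {(tile, in_port)}")
        log.append(f"cycle {cycle}: tile {tile} {in_port} -> {out_port}")
        # Crossing the link moves the word to the neighbor's input (Core stays local).
        location = LINKS.get((tile, out_port), (tile, out_port))
    if location != ("C", "Core"):
        raise RuntimeError("word did not reach Core C")
    return log


for line in trace_stream():
    print(line)
```

If LoopBack is read as the schedule restarting, the same three slots would carry a new word every schedule period, which is what makes the stream periodic.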
Static Scheduled Communications
• Provides scalability by “eliminating” network congestion
• Many interconnect segments managed with time-division multiplexing
  • Lots of bandwidth
• Improves SoC performance by up to a factor of 8
[Figure: tiled layout with DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, and Motion Estimation and Compensation cores]
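One way to see why static scheduling sidesteps congestion is a compile-time check: if every (tile, output port, cycle) slot belongs to at most one stream, no two streams can contend for a link at run time. The sketch below assumes our own slot format, stream names, and function name (`find_conflicts`); it illustrates the idea and is not the aSoC scheduler.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A slot is one output port of one tile in one cycle of the schedule.
Slot = Tuple[str, str, int]  # (tile, output port, cycle)


def find_conflicts(streams: Dict[str, List[Slot]], schedule_len: int) -> Dict[Slot, List[str]]:
    """Return every slot claimed by more than one stream (empty dict = conflict-free)."""
    owners: Dict[Slot, List[str]] = defaultdict(list)
    for name, slots in streams.items():
        for tile, port, cycle in slots:
            # Time-division multiplexing: the slot pattern repeats every schedule_len cycles.
            owners[(tile, port, cycle % schedule_len)].append(name)
    return {slot: names for slot, names in owners.items() if len(names) > 1}


streams = {
    "s1": [("A", "East", 0), ("B", "East", 1), ("C", "Core", 2)],
    "s2": [("A", "East", 1), ("B", "East", 2), ("C", "Core", 3)],  # same links, later cycles
}
print(find_conflicts(streams, schedule_len=10))  # {}

streams["s3"] = [("B", "East", 1)]  # collides with s1 on B's East port in cycle 1
print(find_conflicts(streams, schedule_len=10))  # {('B', 'East', 1): ['s1', 's3']}
```

Streams s1 and s2 share the same physical links but never the same cycle, which is the sense in which one segment carries many streams.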
Power Consumption?
• Provide reconfiguration methods for cores and CI
• Develop programmable clocking systems at each tile
Power-Aware Core
• Custom motion estimation core with a selectable search method
  • Full search: 600-960 mW (varies with bit width and pel sub-sampling)
  • Spiral search: 76 mW
  • Three-step search: 25 mW
Data taken with Synopsys™ Power Compiler at the RTL level
aSoC Support
• Multiple streams in and out through dedicated coreports
  • Easy to manage on both sides of the port
• Schedule configuration streams in with the data
  • Stream A: input frame
  • Stream B: configuration (choose search mode and size)
  • Stream C: motion vectors
[Figure: Motion Estimation core with coreports in1, in2, out1, out2; Streams A and B enter, Stream C leaves]
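The coreport arrangement can be sketched as a core that reads pixel words from one dedicated port (Stream A) and mode words from another (Stream B), so no demultiplexing is needed inside the core. The power values are the RTL estimates from the Power-Aware Core slide, using the upper end of the full-search range; the class name, mode strings, power-on default, and queue-based ports are all hypothetical, and the "size" part of the configuration is not modeled.

```python
from collections import deque
from typing import Deque, Dict

# RTL power estimates (mW) from the Power-Aware Core slide. Full search is
# quoted as a range; its upper end is used here.
SEARCH_POWER_MW: Dict[str, float] = {
    "full": 960.0,
    "spiral": 76.0,
    "three_step": 25.0,
}


class MotionEstimationCore:
    """Hypothetical model of a core fed by two dedicated input coreports."""

    def __init__(self) -> None:
        self.in_data: Deque[int] = deque()    # Stream A: input-frame words
        self.in_config: Deque[str] = deque()  # Stream B: configuration words
        self.mode = "full"                    # assumed power-on default

    def apply_config(self) -> None:
        """Drain the configuration port; each word replaces the current mode."""
        while self.in_config:
            mode = self.in_config.popleft()
            if mode not in SEARCH_POWER_MW:
                raise ValueError(f"unsupported search mode: {mode}")
            self.mode = mode

    def process_block(self, n_words: int) -> int:
        """Apply pending configuration, then consume up to n_words data words."""
        self.apply_config()
        consumed = 0
        while consumed < n_words and self.in_data:
            self.in_data.popleft()  # the motion search itself is not modeled
            consumed += 1
        return consumed

    def power_mw(self) -> float:
        return SEARCH_POWER_MW[self.mode]


core = MotionEstimationCore()
core.in_data.extend(range(16))       # pixel words arriving on Stream A
core.in_config.append("three_step")  # configuration word scheduled on Stream B
print(core.process_block(8), core.mode, core.power_mw())  # 8 three_step 25.0
```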
Reconfigurable Interconnect
• P-frame
• I-frame
[Figure: P-frame path: input frame through ME, MC, and a summation node (+/-) to the DCT; I-frame path: input frame directly to the DCT]
aSoC Support
• ME, MC, and summation lumped into one double-size core
[Figure: DCT core and Motion Estimation & Compensation core]
aSoC Support: P-Frame
[Figure: input frame (Stream A) enters the Motion Estimation & Compensation core; the difference frame (Stream B) is sent to the DCT]
aSoC Support: Schedule Change
[Figure: input frame (Stream A), DCT and Motion Estimation & Compensation cores, difference frame (Stream B), and configuration streams C and D]
aSoC Support: Schedule Change
[Figure: input frame (Stream A), DCT and Motion Estimation & Compensation cores, difference frame (Stream B), configuration Stream C, and instruction memory holding Schedule 1 and Schedule 2, selected by the PC]
aSoC Support: Schedule Change
[Figure: input frame (Stream A), DCT and Motion Estimation & Compensation cores, configuration Stream D, and Schedule 1 and Schedule 2, selected by the PC]
aSoC Support: Schedule Change
[Figure: input frame now on Stream A’, DCT and Motion Estimation & Compensation cores, configuration Stream D, and Schedule 1 and Schedule 2, selected by the PC]
aSoC Support: I-Frame
[Figure: input frame (Stream A’) routed to the DCT; Motion Estimation & Compensation core OFF]
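The schedule-change sequence can be sketched as a controller that stores two schedules back to back in one instruction memory and, when a configuration command arrives, moves the PC to the start of the other schedule. Assumptions, all ours: the command format `("switch", index)`, the switch taking effect on the following cycle, and schedule entries written as plain route descriptions modeled on the P-frame and I-frame paths rather than real crossbar words.

```python
from typing import List, Optional, Tuple

Command = Tuple[str, int]  # hypothetical format: ("switch", schedule index)


class ScheduleController:
    """Hypothetical tile controller with several schedules in one instruction memory."""

    def __init__(self, schedules: List[List[str]]) -> None:
        self.imem: List[str] = []
        self.bounds: List[Tuple[int, int]] = []  # (start, end) address of each schedule
        for sched in schedules:
            if not sched:
                raise ValueError("empty schedule")
            start = len(self.imem)
            self.imem.extend(sched)
            self.bounds.append((start, len(self.imem)))
        self.active = 0
        self.pc = 0

    def configure(self, command: Command) -> None:
        """Interpret a configuration command; only schedule switching is modeled."""
        op, index = command
        if op != "switch" or not 0 <= index < len(self.bounds):
            raise ValueError(f"bad configuration command: {command}")
        self.active = index
        self.pc = self.bounds[index][0]  # jump to the new schedule's first word

    def step(self, command: Optional[Command] = None) -> str:
        """Emit this cycle's routing, advance the PC, then apply any command."""
        word = self.imem[self.pc]
        start, end = self.bounds[self.active]
        # Loop back to the start of the active schedule after its last word.
        self.pc = start if self.pc + 1 == end else self.pc + 1
        if command is not None:
            self.configure(command)  # takes effect from the next cycle
        return word


p_frame = ["Stream A: input frame -> ME&C", "Stream B: difference frame -> DCT"]
i_frame = ["Stream A': input frame -> DCT"]
ctl = ScheduleController([p_frame, i_frame])
print(ctl.step())               # Stream A: input frame -> ME&C
print(ctl.step(("switch", 1)))  # Stream B: difference frame -> DCT
print(ctl.step())               # Stream A': input frame -> DCT
print(ctl.step())               # Stream A': input frame -> DCT (schedule 2 loops)
```

Once the PC is confined to Schedule 2, nothing is routed to the motion estimation core, which is what allows it to be switched off for I-frames.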
Operating Frequency?
• Interconnect is synchronous
  • H-tree clock distribution
• Core frequencies depend on each core's critical path
  • The tile provides a clock reference
  • The coreport provides an asynchronous boundary
• Dynamic core configuration requires dynamic clock configuration
  • The aSoC clock reference provides multiples of the interconnect clock (… 4x, 2x, 1x, 0.5x, 0.25x, …)
  • Configured through the tile controller
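Choosing a core clock from that reference can be sketched as picking the slowest power-of-two multiple of the interconnect clock that still meets the core's required frequency. Assumptions: the multiples are exact powers of two within a hypothetical range of 1/64x to 4x, and the 100 MHz interconnect clock in the example is illustrative rather than a figure from the slides; the required frequencies are the optimal values from the frequency table that follows.

```python
from typing import Tuple


def pick_clock_multiple(required_mhz: float, interconnect_mhz: float,
                        min_exp: int = -6, max_exp: int = 2) -> Tuple[float, float]:
    """Return (multiple, core_mhz) for the slowest 2**k multiple meeting required_mhz.

    The exponent range (1/64x .. 4x by default) is an assumption for illustration.
    """
    for k in range(min_exp, max_exp + 1):  # try the slowest multiple first
        multiple = 2.0 ** k
        core_mhz = interconnect_mhz * multiple
        if core_mhz >= required_mhz:
            return multiple, core_mhz
    raise ValueError(f"{required_mhz} MHz exceeds {2.0 ** max_exp}x the interconnect clock")


for need in (9.9, 2.75, 105.0):
    print(need, pick_clock_multiple(need, interconnect_mhz=100.0))
# 9.9 (0.125, 12.5)
# 2.75 (0.03125, 3.125)
# 105.0 (2.0, 200.0)
```

Rounding up to the next power of two always meets the deadline but can leave up to 2x headroom, so the chosen clock is generally faster than the true optimum.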
Mixed vs. Fixed Core Frequencies
• Cores not designed with clock gating
• Core power from Synopsys RTL simulation
• Interconnect power from SPICE
• Assumes a 10-cycle schedule, 4 pixels/word

| Core: Mode | Optimal independent frequency (MHz) | Optimal independent power (mW) | Fixed worst case (105 MHz) power (mW) |
|---|---|---|---|
| ME: Full Search | 105 | 973 | 973 |
| ME: Spiral | 9.9 | 76 | 659 |
| ME: Three Step Search | 2.75 | 25 | 580 |
| DCT | 9.6 | 54 | 349 |
| Interconnect | 6.34 | 0.14 | 0.81 |
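The savings implied by the table can be computed directly. The snippet only restates the table's numbers as the percentage by which optimal independent clocking lowers power relative to the fixed 105 MHz clock; `TABLE` and `percent_reduction` are illustrative names.

```python
# (optimal independent power mW, fixed worst-case power mW), from the table above
TABLE = {
    "ME: Full Search": (973.0, 973.0),
    "ME: Spiral": (76.0, 659.0),
    "ME: Three Step Search": (25.0, 580.0),
    "DCT": (54.0, 349.0),
    "Interconnect": (0.14, 0.81),
}


def percent_reduction(optimal_mw: float, fixed_mw: float) -> float:
    """Power saved by independent clocking, as a percentage of the fixed-clock power."""
    return 100.0 * (1.0 - optimal_mw / fixed_mw)


for name, (opt, fixed) in TABLE.items():
    print(f"{name}: {percent_reduction(opt, fixed):.1f}% lower")
# ME: Full Search: 0.0% lower
# ME: Spiral: 88.5% lower
# ME: Three Step Search: 95.7% lower
# DCT: 84.5% lower
# Interconnect: 82.7% lower
```

Full search gains nothing because it already needs the worst-case 105 MHz clock; the lighter modes benefit most.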
Current Density and Clocking
• Red: fixed worst-case clocking
  • Short spikes of high current
• Green: optimal independent clocking
  • Slow and low
• Optimal clocking eliminates current spikes, which improves battery life
[Figure: current vs. time from process start to deadline for ME: Full Search, ME: Spiral, ME: Three Step Search, and DCT]
Configuration Overhead
• Configuration adds up to 2 streams per tile
  • Only 2 are required for data
• Total bandwidth = 5 × T × N
  • 5 streams per (cycle, tile)
  • T tiles
  • N cycles in the schedule
• A single tile can support up to 50 different streams in a 10-cycle schedule
[Figure: DCT core with input frame (Stream B), transform frame (Stream D), and configuration streams]
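The bandwidth expression is simple enough to check numerically. The sketch below just evaluates 5 × T × N; the function name is ours, and the 16-tile array in the second call is a hypothetical size, not one from the slides.

```python
def stream_capacity(tiles: int, schedule_cycles: int, streams_per_cycle_tile: int = 5) -> int:
    """Total stream slots: 5 streams per (cycle, tile) x T tiles x N schedule cycles."""
    if tiles < 0 or schedule_cycles < 0:
        raise ValueError("tiles and schedule_cycles must be non-negative")
    return streams_per_cycle_tile * tiles * schedule_cycles


print(stream_capacity(tiles=1, schedule_cycles=10))   # 50, the single-tile figure above
print(stream_capacity(tiles=16, schedule_cycles=10))  # 800 for a hypothetical 16-tile array
```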
Configuration Power Overhead
• Configuration streams are used infrequently
  • Once per macroblock or once per frame
• The architecture disables unused streams
  • Uses the data valid bit already present for flow control
• Only 4-9% of interconnect power is due to configuration streams
Conclusion
• aSoC supports dynamic power management through reconfiguration of:
  • Cores
  • Interconnect
  • Clocks
• Low configuration overhead in both:
  • Communication bandwidth
  • Power
Future Work
• Add reconfigurable voltage supplies at each tile
• Finish test chip
• Import larger applications
Questions
aSoC: adaptive System on a Chip
[Figure: tiled aSoC with cores (DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, Motion Estimation and Compensation); labels for cores, interconnect, interface, and tile]
Example: Stream
[Figure: tiles C, B, A; stream label "Stream A-D"]
Partitioning
• Automated partitioning is a nontrivial problem
• For small signal processing systems, user-defined partitioning may be possible
• Key: perfectly partitioning the system may not be possible
  • How can the SoC mitigate the penalty?