L35: Lower Power Voltage Scaling
SungKyunKwan Univ.
VADA Lab.
August 1999
Sungkyunkwan University, Jun-Dong Cho, http://vada.skku.ac.kr
Voltage Scaling
• Merely changing a processor's clock frequency is not an effective technique for reducing energy consumption. Reducing the clock frequency reduces the power consumed by a processor, but it does not reduce the energy required to perform a given task, because the task then takes proportionally longer.
• Lowering the voltage along with the clock actually alters the energy-per-operation of the microprocessor, reducing the energy required to perform a fixed amount of work.
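The two bullets above can be checked with the usual first-order CMOS model, in which switching energy per cycle is C·V² and average power is C·V²·f. The following is only an illustrative sketch: the effective capacitance, the cycle count, and the voltage/frequency pairs are assumed values (the capacitance is picked so that 5 V gives 40 nJ per cycle), and leakage and idle power are ignored.

```python
def task_energy(cycles, c_eff, vdd):
    """Energy in joules to run `cycles` cycles, using E = C * V^2 per cycle.

    Note that the clock frequency does not appear: in this model the energy
    of a fixed amount of work depends only on the supply voltage.
    """
    return cycles * c_eff * vdd ** 2


def average_power(c_eff, vdd, f_clk):
    """Average switching power in watts, using P = C * V^2 * f."""
    return c_eff * vdd ** 2 * f_clk


C_EFF = 1.6e-9          # effective switched capacitance in farads (assumed)
CYCLES = 1_000_000_000  # fixed amount of work (assumed)

# name -> (supply voltage in volts, clock frequency in Hz); all assumed values
CASES = {
    "full speed": (5.0, 50e6),
    "half clock only": (5.0, 25e6),
    "half clock + lower V": (2.5, 25e6),
}

results = {}
for name, (vdd, f_clk) in CASES.items():
    power = average_power(C_EFF, vdd, f_clk)
    energy = task_energy(CYCLES, C_EFF, vdd)
    results[name] = (power, energy)
    print(f"{name:22s} power = {power:5.2f} W   energy = {energy:5.1f} J")
```

Halving only the clock halves the power but leaves the task energy unchanged, while lowering the voltage together with the clock reduces the energy of the same work.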
Dynamic Voltage Scaling (DVS)
Processor Usage Model
OS: Voltage Scaling
Scale Supply Voltage with fCLK
Adaptive Power Supply Voltages
Variable Supply Voltage Block Diagram
Typical MPEG IDCT Histogram
Voltage scheduling under timing constraints
– Energy consumption of a processor:
• 10nJ/cycle at 2.5V
• 25nJ/cycle at 4V
• 40nJ/cycle at 5V
– Maximum clock frequencies:
• 50MHz at 5V, 40MHz at 4V, 25MHz at 2.5V
– The application needs 1000M cycles to finish, and the timing constraint is 25 sec.
Different Voltage Schedules
[Figure: three voltage schedules plotted as energy consumption (Vdd²) versus time (0 to 25 sec), with the timing constraint at 25 sec]
(A) 1000M cycles at 50MHz, 5.0V: 40J
(B) 750M cycles at 50MHz, 5.0V, then 250M cycles at 25MHz, 2.5V: 32.5J
(C) 1000M cycles at 40MHz, 4.0V: 25J
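The energy figures for schedules (A), (B), and (C) follow directly from the per-cycle energies and maximum clock frequencies given on the previous slide. The sketch below recomputes them; the function and variable names are illustrative and not part of the slides.

```python
# Operating points from the slides: Vdd -> (max clock in Hz, energy per cycle in J)
OPERATING_POINTS = {
    5.0: (50e6, 40e-9),
    4.0: (40e6, 25e-9),
    2.5: (25e6, 10e-9),
}

DEADLINE_S = 25.0  # timing constraint from the slides


def evaluate(schedule):
    """Return (total_time_s, total_energy_J) for a list of (vdd, cycles) segments.

    Each segment is assumed to run at the maximum clock for its voltage.
    """
    total_time = 0.0
    total_energy = 0.0
    for vdd, cycles in schedule:
        f_max, energy_per_cycle = OPERATING_POINTS[vdd]
        total_time += cycles / f_max
        total_energy += cycles * energy_per_cycle
    return total_time, total_energy


SCHEDULES = {
    "A": [(5.0, 1000e6)],
    "B": [(5.0, 750e6), (2.5, 250e6)],
    "C": [(4.0, 1000e6)],
}

for name, schedule in SCHEDULES.items():
    t, e = evaluate(schedule)
    meets = t <= DEADLINE_S + 1e-9
    print(f"({name}) time = {t:.1f} s, energy = {e:.1f} J, meets deadline: {meets}")
```

Schedule (A) finishes early and then idles, (B) uses the slack by dropping to 2.5V for the last 250M cycles, and (C) spreads the work evenly over the whole 25 sec at 4V, which gives the lowest energy of the three.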
Example of Variable Supply
DVS Implementation
Variable Supply Voltage Block Diagram
• Computational workload varies with time. One way to reduce the energy consumption of such systems beyond shutdown is to adjust the supply voltage dynamically, based on the computational workload.
• The basic idea is to lower the power supply when the workload is reduced, instead of working at a fixed supply and idling for some fraction of the time.
• The supply voltage and clock rate are increased during high-workload periods.
Data Driven Signal Processing
The basic idea of averaging: two samples are buffered and their workloads are averaged.
The averaged workload is then used as the effective workload to drive the power supply.
Using a ping-pong buffering scheme, data samples I_{n+2} and I_{n+3} are being buffered while I_n and I_{n+1} are being processed.
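Why averaging helps can be seen with a small numerical sketch. It rests on simplifying assumptions that are not stated in the slides: workloads are normalized so that 1.0 fills one sample period at full speed, the clock speed is set equal to the (averaged) workload, and the supply voltage is taken as proportional to clock speed, so the relative energy of a sample is workload × speed². The workload values are made up.

```python
def relative_energy(workloads, speeds):
    """Relative energy: cycles (workload) times V^2, with V assumed proportional to speed."""
    return sum(w * s ** 2 for w, s in zip(workloads, speeds))


def per_sample_speeds(workloads):
    """No buffering: each sample runs at exactly the speed its own workload demands."""
    return list(workloads)


def pair_averaged_speeds(workloads):
    """Two-sample buffering: both samples of a pair run at the pair's average workload.

    Running two sample periods at the average speed completes exactly the
    combined work of the pair, so the throughput requirement is still met.
    """
    speeds = []
    for i in range(0, len(workloads) - 1, 2):
        avg = (workloads[i] + workloads[i + 1]) / 2
        speeds += [avg, avg]
    if len(workloads) % 2:  # odd sample left over: nothing to average it with
        speeds.append(workloads[-1])
    return speeds


workloads = [0.2, 0.8, 0.5, 0.5, 1.0, 0.1]  # made-up normalized workloads

fixed = relative_energy(workloads, [1.0] * len(workloads))       # fixed supply, then idle
unbuffered = relative_energy(workloads, per_sample_speeds(workloads))
buffered = relative_energy(workloads, pair_averaged_speeds(workloads))

print(f"fixed supply:        {fixed:.3f}")
print(f"per-sample scaling:  {unbuffered:.3f}")
print(f"pair-averaged:       {buffered:.3f}")
```

Because the assumed energy grows faster than linearly with speed, smoothing the speed over a pair of samples never costs more than tracking each sample individually, and it costs less whenever the two workloads differ. The price is the extra latency and memory of the buffer.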
Example of Buffering
Graphical Interpretation
Buffering Example: MPEG Decoder
DVS
DVS Scheduling Framework
[Figure: processor speed versus time for two units of work, each running between its start and its deadline]
– Idle time represents wasted energy
– Lower speed, lower voltage, lower energy
– Energy ~ Work • Speed
• Use a real-time framework to constrain task voltage scheduling
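DVS Simulation
[Figure: processor speed versus time for three tasks with start times S1, S2, S3 and deadlines D1, D3, D2]
– Figure labels: Theory, Implementation, Reality; Intercom
– Factors listed: Task Variance, Weather, Interrupts, User Input, Cache Behavior, Scheduling Overhead
• Simulate the run-time scheduler to fully understand voltage-scaling behavior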
Simulation Infrastructure
[Figure: block diagram of the simulation environment]
– Blocks: GUI, Run-time Scheduler, Voltage Scheduler, Application support libraries, Windowing, Cryptography, I/O Support, lpARM, MPEG
– Task priorities: MPEG Priority 80, GUI Priority 23
– Labels: Speed, Priority
– Application code: { Frame_Start(deadline); Decode_MPEG_Frame(); Frame_Finish(); }
• Develop a support environment to model the complete software system
Run-Time Voltage Scaling
[Figure: bar chart of total system energy (0% to 100%, normalized to a 3.3V fixed-voltage processor) for Audio, GUI, MPEG, and Audio & MPEG (a combination of independent benchmarks), comparing DVS Simulation against Post-Trace Optimal. Bar labels in extraction order: 73%, 58%, 25%, 16%, 65%, 46%, 15%, 20%]
• Dynamic Voltage Scaling significantly reduces energy dissipation!
Run-Time Performance Analysis
[Figure: Frame Computation Histogram of fixed-voltage frame execution time for Audio, GUI, and MPEG; horizontal axis from 0 to 2x deadline, normalized to the deadline at maximum processor speed; vertical axis 0% to 100%]
[Figure: DVS System Energy, showing total system energy (0% to 100%) for Audio, MPEG, and GUI under the Basic Algorithm, the Adjusted Algorithm, and Post-Trace Optimal]
– Software can automatically recognize and adjust for a bi-modal GUI distribution
• Application characteristics strongly affect voltage scaling performance
Compute ASAP + System Shutdown
Another Approach: Reduce Clock Frequency
Voltage Scheduling II
Evaluation: Algorithms
AVG<weight>
• Computes an exponential moving average of the run-percent over previous intervals. At each interval, the run-percent from the previous interval is combined with the previous running average, forming a long-term prediction of system behavior. <weight> is the weighting of past intervals relative to the current interval (a larger value puts greater weight on the past), using the equation (weight × old + new)/(weight + 1). A weight of 3 can be used.
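The update rule (weight × old + new)/(weight + 1) is short enough to sketch directly. The function name, the initial average of 0.0, and the sample run-percent trace below are assumptions made for illustration; how the prediction is then mapped to a clock speed is not covered here.

```python
def avg_n(run_percents, weight=3, initial=0.0):
    """Return the running AVG<weight> prediction after each interval.

    run_percents: observed fraction of each interval the CPU was busy (0.0 to 1.0).
    weight:       relative weight of the past average versus the newest interval.
    initial:      starting value of the running average (assumed to be 0.0).
    """
    predictions = []
    average = initial
    for run_percent in run_percents:
        # (weight * old + new) / (weight + 1), as given on the slide
        average = (weight * average + run_percent) / (weight + 1)
        predictions.append(average)
    return predictions


# Made-up trace: a burst of full load followed by idle intervals.
trace = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
for observed, predicted in zip(trace, avg_n(trace, weight=3)):
    print(f"observed {observed:.2f} -> predicted {predicted:.3f}")
```

With a weight of 3 the prediction moves a quarter of the way toward each new observation, so it lags behind sudden changes in load; a larger weight smooths more and reacts more slowly.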
OS: Voltage Scheduling
Run-Time Scheduling Dynamics
[Figure: processor speed versus time, comparing the run-time schedule with the optimal schedule E(work)]
– Initial speed estimate: workload calculated to be the average of previous frames
– Thread accomplishing more than expected: reduce speed
– Deadline exceeded: increase speed
– Higher-priority task: run faster to make up lost time
• Periodically re-evaluate the schedule to adjust for unforeseen events
Vertical Layering
Optimal Scheduling
• For a region spanned by a given task specification, each point in time will either be scheduled at the minimum speed spanned by that task or else the task will not be scheduled to run at that point.
Algorithm
• n tasks to schedule
• O(n) speed settings to consider for each task
• O(n) linked tasks requiring adjustment for each setting
• Total complexity: O(n^3) time
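The slide does not name the algorithm it describes. One well-known offline method that produces a minimum-energy speed schedule for tasks with start times and deadlines is the critical-interval algorithm of Yao, Demers, and Shenker; the sketch below implements a naive version of that method, and treating it as the algorithm on the slide is an assumption. It also assumes continuously variable speed, preemptive earliest-deadline-first execution, unique task names, and start < deadline for every task. No claim is made that this naive version reaches the O(n^3) bound quoted above.

```python
def compress(x, s, t):
    """Map a time point onto the timeline with the interval [s, t] cut out."""
    if x <= s:
        return x
    if x >= t:
        return x - (t - s)
    return s


def critical_interval_speeds(tasks):
    """Assign a speed to every task.

    tasks: list of (name, start, deadline, work) tuples.
    Returns {name: speed}, where speed is in units of work per unit time.
    """
    pending = [list(task) for task in tasks]
    speeds = {}
    while pending:
        # Candidate interval endpoints are the start times and deadlines.
        points = sorted({p for _, start, deadline, _ in pending
                         for p in (start, deadline)})
        best = None
        for i, s in enumerate(points):
            for t in points[i + 1:]:
                # Tasks that must run entirely inside [s, t].
                inside = [task for task in pending
                          if s <= task[1] and task[2] <= t]
                if not inside:
                    continue
                intensity = sum(task[3] for task in inside) / (t - s)
                if best is None or intensity > best[0]:
                    best = (intensity, s, t, inside)
        intensity, s, t, inside = best
        # The densest (critical) interval fixes the speed of its tasks.
        for task in inside:
            speeds[task[0]] = intensity
        # Remove those tasks and cut the interval out of the timeline.
        pending = [task for task in pending if task[0] not in speeds]
        for task in pending:
            task[1] = compress(task[1], s, t)
            task[2] = compress(task[2], s, t)
    return speeds


# Made-up example: B is a short, urgent task nested inside A's window.
example = [("A", 0, 10, 5), ("B", 2, 4, 4)]
print(critical_interval_speeds(example))
```

In the example, task B alone forms the densest interval and runs at speed 2.0 during [2, 4]; task A then has the remaining 8 time units for its 5 units of work and runs at speed 0.625.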
Scheduling step0
Scheduling step1
Scheduling step2
Scheduling step3
Scheduling step4
Scheduling step5