The Curiosity Rover Landing lu/cse467s/slides/mid_review.pdf · 2014-03-05 ·...
The Curiosity Rover Landing
Ø http://www.youtube.com/watch?v=a4YqNoLkmxE
Chenyang Lu 1
Landing a Spacecraft on Mars
Ø The control software onboard the spacecraft consists of about 3 MLOC. Mostly in C, with a small portion (mostly for surface navigation) in C++.
Ø The code runs on a radiation hardened CPU. The CPU is a version of an IBM PowerPC 750, called RAD750. It has 4 GB of flash memory, 128 MB of RAM, and runs at 133 MHz.
Ø ~75% of the code is autogenerated from formalisms, e.g., state-machine descriptions and XML files. The remainder was handwritten, in many cases building on code from earlier Mars missions.
[Figure: Code size, exponential growth trend. Y axis: lines of code in thousands (KLOC), log scale from 1 to 10,000. X axis: year, 1970 to 2015. Labeled points: Viking, Pathfinder, Mars exploration rovers, Phoenix, MSL Rover.]
FIGURE A. The amount of flight code that is flown to land spacecraft on Mars has grown exponentially in the last 36 years. Its Compound Annual Growth Rate comes out at roughly 1.20—close to the median value of 1.16 from previous columns.
Holzmann, Gerard J., "Landing a Spacecraft on Mars," IEEE Software, 30(2), pp. 83, 86, March-April 2013
Midterm Demo
Ø 3/26, in class.
Ø 20 min/team (including 3 min for discussion).
Ø Must show something real!
Ø Test and set up your demo in advance.
Ø Email Rahav a summary of your demo and progress
q Deadline: 11:59pm, 3/26.
q Clearly state the contribution of each team member.
Midterm Exam
Ø Open book, notes, papers.
Ø Bring your calculator!
Ø Lecture slides override textbook when inconsistent.
Scope
Ø Introduction
Ø Power management
Ø Program optimization
Ø TinyOS
Embedded Systems
Ø Non-functional constraints
q Real-time
q Power
q Energy
q Memory
q Cost
q Size
Ø Designed to tight deadlines by small teams
Alternative Technologies
Ø Application-Specific Integrated Circuits (ASIC)
Ø Microprocessor
Ø Field-Programmable Gate Arrays (FPGA)
ASIC
✓ Performance
✓ Power: Fewer logic elements → low power
✗ Development cost: Very high
q $2 million to start production of a new ASIC
q Needs a long time and a large team
✗ Reprogrammability: None!
q Difficult to upgrade systems
q Single-purpose devices
Microprocessor
– Performance
✗ Programmable architecture is fundamentally slow!
• Fetch, decode instructions
✓ Highly optimized architecture and manufacturing process
• Pipeline; cache; clock frequency; circuit density; multi-core…
✗ Power
q Processors perform poorly in terms of performance/watt!
q Power management can alleviate the power problem.
✓ Flexibility, development cost and time
q Let software do the work!
State of the Practice
Ø Microprocessor is the dominant player
q Flexibility + low development cost >> low performance/watt
q Power management is crucial.
Ø Microprocessor + ASIC is common
q Ex: cell phone
Ø FPGA is expected to improve
Power vs. Energy
Ø Power = energy consumption per unit time
Ø Power → Heat
Ø Energy → Battery life
Hardware Support
Ø Clock gating
Ø Shut down power supply
Ø Dynamic Voltage Scaling
Requirements
Ø Minimize power under performance constraints
q Real-time applications
Ø Optimize performance under power constraints
q Battery lifetime constraint
Ø Different tradeoff points in design space
Factors in Dynamic Power Management
Ø Device: Power State Machine (PSM)
Ø Workload: distribution of active and idle intervals
SA-1100 Power State Machine
[Figure: power state machine with three states: run (P_ON = 400 mW), idle (P_OFF = 50 mW), and sleep (P_OFF = 0.16 mW). Transition arcs are labeled 10 µs, 10 µs, 90 µs, 90 µs, and 160 ms; P_TR = P_ON.]
Analysis
Ø Inherent exploitability
q No performance penalty
q Assume full knowledge of workload in advance
Ø Actual energy saving and performance penalty under a practical policy
Break-Even Time T_BE
Ø Entering an inactive state is beneficial only if the idle time is longer than the break-even time
q P_TR ≤ P_ON: T_BE = T_TR = T_ON,OFF + T_OFF,ON
q P_TR > P_ON: Larger T_BE to compensate for the energy cost of the transition
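The two cases above can be folded into one helper. This is a minimal sketch, not taken from the slides: for the P_TR > P_ON case it assumes the energy-balance formula commonly given in the dynamic power management literature, T_BE = T_TR + T_TR (P_TR - P_ON) / (P_ON - P_OFF). The function name `break_even_time` and its parameter names are hypothetical.

```c
/* Break-even time for entering an inactive state (illustrative sketch).
 *   t_tr  : total transition time, T_ON,OFF + T_OFF,ON, in seconds
 *   p_tr  : average power drawn during the transitions, in watts
 *   p_on  : power in the active state, in watts
 *   p_off : power in the inactive state, in watts (must be below p_on)
 */
double break_even_time(double t_tr, double p_tr, double p_on, double p_off)
{
    if (p_tr <= p_on) {
        /* The transition costs no extra energy, so only its duration counts. */
        return t_tr;
    }
    /* Assumed formula: the extra transition energy is paid back at a rate
     * of (p_on - p_off) while the device stays in the inactive state. */
    return t_tr + t_tr * (p_tr - p_on) / (p_on - p_off);
}
```

For the SA-1100 sleep state P_TR = P_ON, so T_BE equals the round-trip transition time. Assuming 90 µs to enter sleep and 160 ms to leave it, `break_even_time(0.16009, 0.4, 0.4, 0.00016)` returns about 160 ms.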
Inherent Exploitability
Metrics
Ø (Performance) Safety: Prob(p|o)
q If an observed event happens → the probability that T_idle > T_BE
q Overprediction → lower safety → higher performance penalty
Ø (Energy) Efficiency: Prob(o|p)
q If T_idle > T_BE → the probability of successfully predicting it
q Underprediction → lower efficiency → more wasted energy
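Both metrics can be estimated from a log of idle periods. The sketch below assumes a hypothetical record layout (`struct idle_period`) that stores, for each idle period, whether the policy shut the device down and how long the period really lasted. None of these names come from the slides.

```c
#include <stddef.h>

/* One logged idle period (hypothetical record layout). */
struct idle_period {
    int    shut_down;  /* nonzero if the policy decided to shut down */
    double t_idle;     /* actual idle time, in seconds */
};

/* Safety: of the periods where the policy shut down, the fraction whose
 * idle time really exceeded t_be. Returns 1.0 if it never shut down. */
double safety(const struct idle_period *periods, size_t n, double t_be)
{
    size_t fired = 0, justified = 0;
    for (size_t i = 0; i < n; i++) {
        if (periods[i].shut_down) {
            fired++;
            if (periods[i].t_idle > t_be)
                justified++;
        }
    }
    return fired ? (double)justified / (double)fired : 1.0;
}

/* Efficiency: of the periods longer than t_be, the fraction the policy
 * actually exploited. Returns 1.0 if no period was long enough. */
double efficiency(const struct idle_period *periods, size_t n, double t_be)
{
    size_t long_periods = 0, exploited = 0;
    for (size_t i = 0; i < n; i++) {
        if (periods[i].t_idle > t_be) {
            long_periods++;
            if (periods[i].shut_down)
                exploited++;
        }
    }
    return long_periods ? (double)exploited / (double)long_periods : 1.0;
}
```

A policy that shuts down too eagerly scores high on efficiency and low on safety; a cautious one does the opposite.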
Fixed Timeout Policy
Ø Enter the inactive state when the system has remained idle for T_TO.
Ø Wake up in response to activity.
Ø Premise: If a system has been idle for T_TO → it will remain idle for > T_BE.
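A rough way to see the cost of the timeout is to replay a trace of idle periods through the policy. The sketch below is a simplified model under stated assumptions, not the analysis used in the course: the device draws `p_on` while waiting for the timeout, `p_tr` for the whole round-trip transition time `t_tr`, and `p_off` for whatever idle time is left. The `struct psm` layout and all names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical device description: powers in watts, time in seconds. */
struct psm {
    double p_on;   /* active (or waiting) power */
    double p_off;  /* inactive-state power */
    double p_tr;   /* average power during transitions */
    double t_tr;   /* round-trip transition time */
};

struct timeout_result {
    double energy;     /* joules spent across all idle periods */
    size_t shutdowns;  /* how many times the timeout expired */
};

/* Replay idle periods through a fixed timeout policy with threshold t_to. */
struct timeout_result simulate_fixed_timeout(const double *idle, size_t n,
                                             double t_to, struct psm dev)
{
    struct timeout_result r = { 0.0, 0 };
    for (size_t i = 0; i < n; i++) {
        if (idle[i] <= t_to) {
            r.energy += dev.p_on * idle[i];   /* timeout never expires */
            continue;
        }
        r.shutdowns++;
        r.energy += dev.p_on * t_to;          /* waiting for the timeout */
        r.energy += dev.p_tr * dev.t_tr;      /* going down and coming back */
        double rest = idle[i] - t_to - dev.t_tr;
        if (rest > 0.0)
            r.energy += dev.p_off * rest;     /* time actually spent inactive */
    }
    return r;
}
```

Running the same trace with several values of `t_to` shows the trade-off: a short timeout wastes less energy while waiting but shuts down more often on periods that turn out to be too short.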
Impact of Timeout Threshold
Possible Improvement
Ø Predictive shutdown: shut down immediately when the processor becomes idle.
q Avoid wasting energy before reaching timeout threshold
q More efficient, less safe
Ø Predictive wakeup: wake up before activity occurs.
q Reduce performance penalty for wake up
q Less efficient, safer
Advanced Configuration and Power Interface
Open standard for power management.
[Figure: ACPI software stack. Labels: applications; OS kernel; power management; device drivers; ACPI BIOS; hardware platform (devices, processor, chipset).]
ACPI System Power States
Used as contract between hardware and OS vendors
Power Consumers
Ø Instruction execution (CPU)
Ø Cache (instruction, data)
Ø Main memory
Ø Storage
Ø Display
Ø Network interface
Ø I/O devices
Energy Efficiency of Memory Operations
Relative energy per operation: register < cache < memory
Ø memory transfer: 33
Ø external I/O: 10
Ø SRAM write: 9
Ø SRAM read: 4.4
Ø multiply: 3.6
Ø add: 1
Power Optimizations
Ø Reduce memory footprint
q Reduce code and data size
q Analyze footprint to find right size
Ø Find correct cache size
q Analyze cache behavior (size of working set)
Ø Minimize memory and cache access
q Use registers efficiently → fewer cache accesses
q Eliminate cache conflicts → fewer memory accesses
Ø Shorter execution time → more idle time
Program Optimization and Analysis
Ø Performance
Ø Memory footprint
for (i=0; i<N; i++)
    for (j=0; j<M; j++)
        z[i][j] = b[i][j];

zptr = &z[0][0]; bptr = &b[0][0];
for (i=0; i<N; i++)
    for (j=0; j<M; j++) {
        zind = i*M+j; bind = i*M+j;
        *(zptr+zind) = *(bptr+bind);
    }

induction var elimination:
zptr = &z[0][0]; bptr = &b[0][0];
for (i=0; i<N; i++)
    for (j=0; j<M; j++) {
        zbind = i*M+j;
        *(zptr+zbind) = *(bptr+zbind);
    }

strength reduction:
zptr = &z[0][0]; bptr = &b[0][0]; zbind = 0;
for (i=0; i<N; i++)
    for (j=0; j<M; j++) {
        *(zptr+zbind) = *(bptr+zbind);
        zbind++;  /* increment after the copy so element 0 is not skipped */
    }
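The first and last forms of the loop can be checked against each other. The sketch below is a self-contained restatement in C that treats the arrays as flat row-major buffers. The function names `copy_indexed` and `copy_reduced` are hypothetical. Note that the shared index is incremented after the copy; incrementing it first would skip element 0 and run one element past the end.

```c
#include <stddef.h>

/* Reference version: recomputes i*m + j for every element. */
void copy_indexed(int *z, const int *b, size_t n, size_t m)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            z[i * m + j] = b[i * m + j];
}

/* After induction variable elimination and strength reduction:
 * one shared index, advanced by addition instead of a multiply-add. */
void copy_reduced(int *z, const int *b, size_t n, size_t m)
{
    size_t zbind = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++) {
            z[zbind] = b[zbind];
            zbind++;  /* increment after the copy, so index 0 is used */
        }
}
```

An optimizing compiler usually performs both transformations on its own; writing them out by hand mainly helps when reading generated code or targeting a weak compiler.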
Array Conflicts in Cache
[Figure: arrays a and b in main memory and their mapping into the cache. a[0,0] is at address 1024 and b[0,0] at address 4099; the figure also carries the label 256.]
for (i=0; i<N; i++)
    for (j=0; j<M; j++)
        a[i][j] = a[i][j] + b[i][j];
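To see why the two arrays can collide, it helps to compute which cache line each address maps to. The sketch below assumes a direct-mapped cache with 256 lines of 4 words each and treats 1024 and 4099 as word addresses. This geometry is an assumption made for illustration; the slide does not state it. The names `cache_line`, `WORDS_PER_LINE`, and `NUM_LINES` are hypothetical.

```c
/* Assumed direct-mapped cache geometry (not stated on the slide). */
enum { WORDS_PER_LINE = 4, NUM_LINES = 256 };

/* Cache line that a given word address maps to. */
unsigned cache_line(unsigned word_addr)
{
    return (word_addr / WORDS_PER_LINE) % NUM_LINES;
}
```

Under these assumptions `cache_line(1024)` and `cache_line(4099)` are both 0, so a[0][0] and b[0][0] compete for the same line and each access in the loop can evict the other array's data. Padding b so that it starts one line later (for example at word address 4103, which maps to line 1) removes that particular conflict.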
More
Ø Function Inlining
q Cost of function calls
q Code size vs. performance
Ø Register allocation
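The function inlining trade-off listed above can be illustrated with a small sketch in C. The function names are hypothetical, and `inline` is only a hint in standard C: the compiler may inline the ordinary function anyway or decline to inline the marked one.

```c
/* Ordinary function: each call costs argument setup, a branch, and a return. */
int scale_call(int x, int k)
{
    return x * k;
}

/* Inlining candidate: the body may be pasted into every call site,
 * removing the call overhead at the price of larger code. */
static inline int scale_inline(int x, int k)
{
    return x * k;
}

/* Both paths compute the same sum; only call overhead and code size differ. */
int sum_scaled(const int *v, int n, int k, int use_inline)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += use_inline ? scale_inline(v[i], k) : scale_call(v[i], k);
    return sum;
}
```

On a memory-constrained target the larger inlined code can cost more, through extra instruction cache misses, than the calls it removes, which is why the slide frames it as code size vs. performance.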
Nested Function Calls (ARM)
int main() { f1(x); }
void f1(int a) { f2(a); }

; f1 is called by main()
LDR r0, [r13]        ; load para. into r0 from stack
STR r14, [r13]       ; store f1's return addr.
; f1 calls f2()
STR r0, [r13, #4]!   ; push para. for f2 to stack
BL f2                ; branch and link to f2
; f1 receives return from f2()
SUB r13, #4          ; pop f2's para. off stack
; f1 returns to main()
LDR r15, [r13]       ; restore register and return
Execution Time Analysis
Ø Execution time is affected by both program path and instruction timing
q Program path depends on input data values.
q Instruction timing depends on pipelining, cache behavior…
Ø Accurate execution time is unknown a priori
Ø Compile-time analysis vs. measurement
Reducing Code Size
Ø Function inlining?
Ø Avoid loop unrolling.
Ø Use processors with dense instruction sets.
Ø Use compact instruction set.
Ø Hardware support for code compression.
TinyOS Two-level Scheduling
Ø Tasks do intensive computations
q Non-preemptive FIFO scheduling
q Bounded number of pending tasks
Ø Events handle interrupts
q Interrupts trigger lowest level events
q Events can signal events, call commands, or post tasks
Ø Two priorities
q Event/command
q Tasks
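The task level of this scheme can be sketched as a bounded FIFO queue of function pointers. The code below is plain C for illustration only; it is not nesC and not the TinyOS implementation. `MAX_TASKS`, `post_task`, `run_next_task`, and the two toy tasks are hypothetical names.

```c
#include <stddef.h>

#define MAX_TASKS 8                  /* hypothetical bound on pending tasks */

typedef void (*task_fn)(void);

static task_fn task_queue[MAX_TASKS];
static size_t q_head, q_count;

/* Called by events or commands. Returns 0 when the queue is full.
 * A real system would disable interrupts around this update. */
int post_task(task_fn t)
{
    if (q_count == MAX_TASKS)
        return 0;
    task_queue[(q_head + q_count) % MAX_TASKS] = t;
    q_count++;
    return 1;
}

/* Run the oldest pending task to completion. Returns 0 when none is pending. */
int run_next_task(void)
{
    if (q_count == 0)
        return 0;
    task_fn t = task_queue[q_head];
    q_head = (q_head + 1) % MAX_TASKS;
    q_count--;
    t();  /* tasks never preempt each other; only events can interrupt them */
    return 1;
}

/* Two toy tasks that record the order in which they ran. */
static char run_order[MAX_TASKS + 1];
static size_t run_len;

static void record(char c)
{
    if (run_len < MAX_TASKS)
        run_order[run_len++] = c;
}

static void task_a(void) { record('a'); }
static void task_b(void) { record('b'); }
```

Posting `task_a` and then `task_b` and calling `run_next_task()` until it returns 0 runs them in posting order. A main loop would typically put the processor to sleep whenever the queue is empty and let the next interrupt wake it.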
[Figure: two-level scheduling over time. Labels: hardware interrupts, events, commands, FIFO tasks, POST, preempt; horizontal axis: time.]
Good luck! :)