EECS 753: Embedded Real-Time Systems
Heechul Yun
Welcome to EECS753
• About the course
About Instructor
• Heechul Yun
– Assistant Prof., Dept. of EECS, University of Kansas (Aug. ’13 ~ )
– Office: 3040 Eaton, 236 Nichols
– Email: [email protected]
• Education
– Ph.D. (CS), University of Illinois at Urbana-Champaign
– M.S. (CS) and B.S. (CS), KAIST
• Professional Experience
– Senior software engineer @ Samsung Electronics
• Research Areas
– Operating systems, embedded/real-time systems
• More Information
– http://ittc.ku.edu/~heechul
About This Class
• Topics
– Embedded Real-Time Systems, Cyber-Physical Systems
• Prerequisites
– EECS 645 Computer Architecture
– EECS 678 Introduction to Operating Systems
• No textbook!
• Course website
– http://ittc.ku.edu/~heechul/courses/eecs753
• Audience
– Grad students (and senior undergraduates) who are interested in research
• Times
– Lecture: M/W/F 10:00 – 11:00, LEA 1131
– Office hours: M/F 11:00 – 11:50 @ 3040 Eaton
About This Class
• Seminar style
– You and I will present (more on this later)
• Goals
– Learn and discuss advanced topics in real-time embedded systems
– Improve your research skills
• Research skills
– Learning skills
• Quickly learn from papers, books, and the internet!
– Communication skills
• Written form = paper, oral form = presentation
– Programming skills
• Need to build some “interesting” things
Topics
• Introduction to Real-Time Systems, CPS
• CPS Applications: Intelligent Vehicles
• Real-time Hardware Architecture
• Real-time OS and Middleware
• Fault tolerance and Security
Image: Amazon Prime Air
Methodology
• Learning by reading research papers
• Learning by building an actual system
Reading Papers
• You are expected to
– Read assigned papers (~two/week)
– Summarize them
• Reading papers well is an important skill
– A good reference: “How to Read a Paper”
Written Summary
• Each summary should include:
– Summary of main ideas
– What you liked
– What you disliked
• Submit via Blackboard
Written Summary: Example
• [Summary] This paper presents a kernel level page allocator which is DRAM Bank-Aware. This allocator is able to allocate pages across cores in a way that causes banks to be shared or partitioned depending on user configuration. This can be used to provide more predictable memory access to multicore software. The authors implemented their memory allocator in a recent version of the Linux Kernel and compared its performance with the existing buddy allocator.
• [The good] This paper is well written. The issue of DRAM banks was not familiar to me at the time of reading but was well explained which motivated the rest of the paper well. The algorithm used is quite straightforward and the explanation is easy to follow.
• [The bad] While the authors acknowledge that the approach they take bears similarity to multi-core page coloring[1,2,3,4] the novelty of their work is not well established. This work appears to be a relatively straightforward application of rudimentary page coloring techniques. The related work section touches on these similarities but does not establish any particular novelty aside from the fact that this paper is addressing the problem of shared DRAM banks for the sake of isolation and not shared caches.
Lecture Organization
• Typical week
– Mon: Lecture on the week’s topic
– Wed/Fri: Paper presentations
• Paper presentation
– I will introduce the paper
– I (or you) will present the paper
– We will discuss the paper
• You are required to present
– One (or two) papers per semester
– May change depending on the class size
Reading List
• Posted on the class website
– Subject to change
– Mostly recent papers and some classic ones
• Sign-up process
– Email me the paper from the list that you want to present
– I will update the schedule on a first-come, first-served (FCFS) basis
Paper Presentation & Discussion
• Suggested structure (30 min)
– Motivation & Background
• Ask: why did the authors write this paper?
– Explain the main ideas
• From your perspective. Be careful about their assumptions
– Discussion topics
• Questions: “I don’t understand XXX.”
• Critiques: “This approach seems bad because …”
• Submission
– Draft: by 5:00 p.m. the day before your presentation
– Final version: before the class begins
Final Exam
• No midterm exam
• Early final exam on April 15
Homework & Mini Projects
• Plan
– Two homework assignments on Linux PC
– Two mini projects using a Raspberry Pi 3
• About
– Basic real-time scheduling
– Basic AI and self-driving car
• Use a Convolutional Neural Network (CNN) to drive a car.
• Trained with human driving data
• Could successfully drive a car on public roads without human control
DAVE-2 CNN: 9 layers, ~250K parameters, ~27M connections
Source: https://devblogs.nvidia.com/deep-learning-self-driving-cars/
NVIDIA DAVE-2 Self-Driving Car
Video: https://www.youtube.com/watch?v=NJU9ULQUwng
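As a rough check on the ~250K parameter figure for DAVE-2, the sketch below counts weights and biases layer by layer. The layer shapes are not given on the slide: a 66x200x3 input, five convolutional layers without padding, and fully connected layers of 100, 50, 10, and 1 units are assumed here from NVIDIA's published description of the network, and the normalization layer is assumed to have no trainable parameters.

```python
# Parameter count for a DAVE-2-style CNN.
# All layer shapes below are assumptions (not stated on the slide).

def conv_out(size, kernel, stride):
    """Output size of a convolution without padding along one dimension."""
    return (size - kernel) // stride + 1

# (output channels, kernel size, stride) for the five convolutional layers
CONV_LAYERS = [(24, 5, 2), (36, 5, 2), (48, 5, 2), (64, 3, 1), (64, 3, 1)]
# Fully connected layers; the last unit is the steering output
FC_LAYERS = [100, 50, 10, 1]

def count_parameters(height=66, width=200, channels=3):
    total = 0
    for out_ch, k, s in CONV_LAYERS:
        total += k * k * channels * out_ch + out_ch  # weights + biases
        height, width = conv_out(height, k, s), conv_out(width, k, s)
        channels = out_ch
    inputs = height * width * channels  # flattened feature map
    for units in FC_LAYERS:
        total += inputs * units + units  # weights + biases
        inputs = units
    return total

if __name__ == "__main__":
    print(count_parameters())
```

Under these assumed shapes the count comes to about 252K, which is consistent with the ~250K quoted on the slide.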
Term Project
• 2nd half of the semester
• DeepPicar Competition
– Build a self-driving car
– Based on DeepPicar
– Competition format
DeepPicar Competition
• Goal
– Safely drive autonomously on a given track
– Using camera and Deep Neural Network (DNN)
• Metrics
– Distance and time
• Your tasks
– Build a car (instructions and materials will be given)
– Develop/tune the AI (basic code will be given)
(Tentative) Project Schedule
• 3/18: Materials ready (build start)
• 4/22: Manual driving check
• 4/29: Autonomous driving check
• 5/06: Competition
• 5/15: Final report due
– 5 pages
– Must be written using LaTeX
LaTeX
• Everybody in CS uses it to write papers
– Final report must be prepared using LaTeX
• Overleaf
– https://www.overleaf.com
• Ubuntu
– Install texlive-full
• Windows
– Install MiKTeX
Grading
• Paper summaries (20%)
• Student presentations (10%)
• Final exam (30%)
• Mini projects (5%)
• Homework (5%)
• Project (30%)
– Competition: 20%
– Final report: 10%
Grading
• 90+ : A
• 80-89: B
• 70-79: C
• 50-69: D
• 0-49: F
Office Hours
• M/F 11:00 – 11:50 at 3040 Eaton
• By appointment at 236 Nichols
Introduce Yourselves
• Name
• Status: grad/undergrad, year
• Relevant background
• Interests
– What do you want to learn in this class?
Today
• Course overview
Embedded Systems
• Computing systems designed for a specific purpose
• Embedded systems are everywhere
Today’s Car
• Quiz. How many embedded processors are in a car?
– A: ~100s
Simon Fürst, BMW, EMCC2015 Munich, adapted from OSPERT2015 keynote
Future Automotive Systems
A. Hamann. “Industrial challenges: Moving from classical to high performance real-time systems.” In International Workshop on Analysis Tools and Methodologies for Embedded and Real-time Systems (WATERS), July 2018
Trends
• More powerful and cheaper computing
• More connected
Internet of Things (IoT)
• IoT ~= Internet connected embedded systems
Cyber-Physical Systems (CPS)
• Cyber system (Computer) + Physical system (Plant)
• Still embedded systems, but integration of physical systems is emphasized.
Real-Time Systems
• The correctness of the system depends not only on the logical result of the computation but also on the time at which the results are produced
• A correct value at the wrong time is a fault.
• CPS are often real-time systems
– Because physical processes depend on time
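The definition above can be sketched in a few lines: a result counts as correct only if it has the right value and is produced before its deadline. The function name `run_with_deadline` and the example deadlines are hypothetical and chosen only for illustration; a real system would enforce deadlines in the scheduler rather than check them after the fact.

```python
import time

def run_with_deadline(task, expected, relative_deadline_s):
    """Run task() and judge it on both logical and temporal correctness.

    Returns (correct, elapsed_seconds). 'correct' is True only when the
    value matches 'expected' AND the task finished within the deadline.
    """
    start = time.monotonic()
    result = task()
    elapsed = time.monotonic() - start
    logically_correct = (result == expected)
    on_time = (elapsed <= relative_deadline_s)
    return (logically_correct and on_time), elapsed

def slow_answer():
    time.sleep(0.05)  # simulate a computation that takes too long
    return 42

if __name__ == "__main__":
    print(run_with_deadline(lambda: 42, 42, 1.0))   # right value, on time
    print(run_with_deadline(slow_answer, 42, 0.01))  # right value, too late
```

In the second call the value is right but arrives after the deadline, so it is treated as a fault.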
CPS Requirements
• Real-time performance
– Meet deadlines in processing large amounts of real-time data from various sensors (e.g., autonomous cars)
– Many constraints: size, weight, and power (SWaP); cost
• Safety
– Interact with the environment and humans in real time
– Can hurt humans, destroy things, blow up (e.g., nuclear plants)
– Need both logical and temporal (time) correctness
• Security
– Communicate over the internet (cloud servers, etc.)
– Remote software updates (bug fixes, …)
– Run untrusted 3rd-party software (e.g., Apple CarPlay)
Performance
• Many cyber-physical systems (CPS) need:
– More performance
– Less cost, size, weight, and power
CMU’s “Boss” self-driving car, circa 2007: 10 dual-processor blade servers in the trunk
Audi’s zFAS platform, 2016-2018: a single-board computer with multiple CPUs, GPU, FPGA
Audi A8
Compute Performance Demand
Intel, “Technology and Computing Requirements for Self-Driving Cars”
Real-Time Data
• Real-time data from many sensors needs powerful computers
Source: http://on-demand.gputechconf.com/gtc/2015/presentation/S5870-Daniel-Lipinski.pdf
Size, Weight, and Power (SWaP) Constraints
• Maximum performance with minimal resources
– Cannot afford too many or too power hungry ECUs
Figure source: OSPERT 2015 Keynote by Leibinger
Mobileye EyeQ4
• Real-time vision processor w/ DNN
• 2.5 teraflops @ 3 W
• 8 cameras @ 36 fps
• Tesla uses EyeQ3
• 14 cores
– 4 MIPS cores
– 10 vector cores
Nvidia’s Drive PX2 Platform
• 12 CPU + 2 GPU
– 8 teraflops @ 250 W
• Real-time processing of
– Up to 12 cameras, radar, ..
– Deep Neural Network (DNN) for detection, classification
http://www.nvidia.com/object/drive-px.html
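To put the SWaP trade-off in numbers, the sketch below computes performance per watt from the two figures quoted on the slides (2.5 teraflops at 3 W, and 8 teraflops at 250 W). These are peak vendor numbers that may count different operation types and precisions, so the result is only a rough comparison, not a benchmark.

```python
# Peak throughput (TFLOPS) and power (W) as quoted on the slides.
PLATFORMS = {
    "Mobileye EyeQ4": (2.5, 3.0),
    "NVIDIA Drive PX2": (8.0, 250.0),
}

def tflops_per_watt(tflops, watts):
    """Energy efficiency as peak TFLOPS divided by power draw."""
    return tflops / watts

def efficiency_table(platforms):
    """Map each platform name to its TFLOPS-per-watt figure."""
    return {name: tflops_per_watt(t, w) for name, (t, w) in platforms.items()}

if __name__ == "__main__":
    for name, eff in efficiency_table(PLATFORMS).items():
        print(f"{name}: {eff:.3f} TFLOPS/W")
```

With these inputs the low-power vision processor comes out roughly 26 times more efficient per watt, while the larger platform offers about 3 times the raw throughput.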
Safety Failures
Therac-25
• Computer-controlled medical X-ray treatments
• Six people died or were injured due to massive overdoses (1985-1987)
• Caused by synchronization mistakes
Ariane 5
• 7 billion dollar rocket was destroyed after 40 secs (6/4/1996)
• “caused by the complete loss of guidance and attitude information”
• Caused by 64-bit floating point to 16-bit integer conversion
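The 64-bit floating point to 16-bit integer conversion mentioned for Ariane 5 can be illustrated with a small sketch. It is not a reconstruction of the flight software: the input values are hypothetical, and the two helper functions are made up here to contrast an unchecked conversion (which silently wraps) with a checked one (which reports the problem).

```python
import ctypes

INT16_MIN, INT16_MAX = -32768, 32767

def unchecked_to_int16(value: float) -> int:
    """Truncate to an integer and keep only the low 16 bits (two's complement)."""
    return ctypes.c_int16(int(value)).value

def checked_to_int16(value: float) -> int:
    """Convert only if the value fits; otherwise raise instead of wrapping."""
    n = int(value)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return n

if __name__ == "__main__":
    print(unchecked_to_int16(1000.0))   # fits: value is preserved
    print(unchecked_to_int16(40000.0))  # too large: wraps to a negative number
```

Either behavior is dangerous if the caller is not prepared for it: a wrapped value is silently wrong, and an exception that nobody handles can take down the whole component.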
Air France 447 (2009)
• Airbus A330 crashed into the Atlantic Ocean in 2009
• Caused in part by computer’s misguidance
– Pitot tube (speed sensor) failure → Flight Director (FD) malfunction (shows “head up”) → pilots follow the faulty FD → enter stall
http://www.spiegel.de/international/world/experts-say-focus-on-manual-flying-skills-needed-after-air-france-crash-a-843421.html
http://www.slate.com/blogs/the_eye/2015/06/25/air_france_flight_447_and_the_safety_paradox_of_airline_automation_on_99.html
Figure labels: Stall, Normal
Lion Air Flight 610 (2018)
• Boeing 737 crashed into the Java Sea in 2018
• Caused by stall prevention system (MCAS)
– Sensor error (plane is in “stall”) → nose down (to the ocean)
Tesla Autopilot (2016)
http://www.nytimes.com/interactive/2016/07/01/business/inside-tesla-accident.html
• Tesla Autopilot failed to recognize a trailer, resulting in the death of the driver
NHTSA Report
• Both the radar and camera sub-systems are designed for front-to-rear collision prediction mitigation or avoidance.
• The system requires agreement from both sensor systems to initiate automatic braking.
• The camera system uses Mobileye’s EyeQ3 processing chip which uses a large dataset of the rear images of vehicles to make its target classification decisions.
• Complex or unusual vehicle shapes may delay or prevent the system from classifying certain vehicles as targets/threats
https://static.nhtsa.gov/odi/inv/2016/INCLA-PE16007-7876.PDF
NHTSA Report
• Object classification algorithms in the Tesla and peer vehicles with AEB technologies are designed to avoid false positive brake activations.
• The Florida crash involved a target image (side of a tractor trailer) that would not be a “true” target in the EyeQ3 vision system dataset and
• The tractor trailer was not moving in the same longitudinal direction as the Tesla, which is the vehicle kinematic scenario the radar system is designed to detect
https://static.nhtsa.gov/odi/inv/2016/INCLA-PE16007-7876.PDF
Uber Self-Driving Car (2018)
• Killed a pedestrian crossing a road in Arizona
https://www.nytimes.com/2018/03/19/technology/uber-driverless-fatality.html
NTSB Report
• The system first registered radar and LIDAR observations of the pedestrian about 6 seconds before impact
• Software classified the pedestrian as an unknown object, as a vehicle, and then as a bicycle with varying expectations of future travel path.
• At 1.3 seconds before impact, the system determined that an emergency braking maneuver was needed
• Emergency braking maneuvers are not enabled while the vehicle is under computer control, to reduce the potential for erratic vehicle behavior
https://www.ntsb.gov/investigations/AccidentReports/Reports/HWY18MH010-prelim.pdf
Failures in CPS have consequences
Security
• Interconnected CPS are open to attacks
• Examples
– Stuxnet: Iranian nuclear power plant hacking
– Vermont power grid hack by Russia
– Remote hack into cars (Jeep)
– Police drone hacking
– Sensor hacking: GPS spoofing, IMU spoofing
Challenges
• Time Predictability
• Complexity
• Reliability
• Security
Time Predictability
• At the low level, hardware has deterministic timing
• At higher levels, not so much: they ignore timing
– Pipeline, caches, out-of-order execution, speculation, ISA
– Process, thread, lock, interrupt
• Focus on the average case, not the worst case: no guarantees
– Fine in the cyber world
– The real world doesn’t work that way
Timing Predictability
• Q. Can you tell exactly how long a piece of code will take to execute on a computer?
– Used to be (relatively) easy to do so.
• Measure timing. Use the timing for analysis.
– Very difficult to answer on today’s computers
• Pipeline, cache, out-of-order and speculative execution, multicore, shared cache/DRAM → very high variance.
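The "measure timing" approach above can be sketched as follows: run the same code many times and compare the best, average, and worst observed times. The `workload` function is a hypothetical stand-in for a piece of time-sensitive code, and the run count is arbitrary.

```python
import statistics
import time

def measure(func, runs=1000):
    """Return a list of execution times of func(), in nanoseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        func()
        samples.append(time.perf_counter_ns() - start)
    return samples

def summarize(samples):
    """Best, average, and worst observed execution time."""
    return {
        "min": min(samples),
        "mean": statistics.mean(samples),
        "max": max(samples),
    }

def workload():
    # Hypothetical stand-in for the code being timed.
    return sum(i * i for i in range(10_000))

if __name__ == "__main__":
    print(summarize(measure(workload)))
```

On a modern machine the gap between the mean and the maximum is often large, and the observed maximum is only a lower bound on the true worst case, since a worse run may simply not have been sampled yet.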
Denial-of-Service Attack
• Delay execution time of time-sensitive code
– E.g., real-time control software of a car
– Observed >21X execution time increase on Odroid XU4 (*)
• Even after cache partitioning is applied
– Observed >10X increase on RPi 3 (**)
• Of a realistic DNN-based real-time control program
Figure: Core1 to Core4 sharing the LLC, running the bench and co-runner(s)
(*) Prathap Kumar Valsan, Heechul Yun, Farzad Farshchi. “Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems.” In RTAS, IEEE, 2016. Best Paper Award
(**) Michael Garrett Bechtel, Elise McEllhiney, Minje Kim, Heechul Yun. “DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car.” In RTCSA, IEEE, 2018
Denial-of-Service Attack
[C] Michael Garrett Bechtel and Heechul Yun. Denial-of-Service Attacks on Shared Cache in Multicore: Analysis and Prevention. IEEE Intl. Conference on Real-Time and Embedded Technology and Applications Symposium (RTAS), IEEE, 2019. (to appear)
> 300X slowdown !!!
Complexity
• Software complexity increases
Figure: Lines of Code in Typical GM Car (x-axis: Model Year, 1970 to 2010; y-axis: KLOC, ticks from 1 to 100000)
Figure: Growth in Software Size (x-axis: Flight Vehicle: Apollo 1968, Space Shuttle, Orion (est.); y-axis: KSLOC, ticks from 0 to 1400)
Figures are from NASA JPL. “Flight Software Complexity,” 2008
Linux Kernel Code Size
• Linux: > 15M SLOC, multithreaded
Software bugs are hard to weed out
https://www.quora.com/How-many-lines-of-code-are-in-the-Linux-kernel
Reliability
• Transient hardware faults (soft errors)
– Single event upset (SEU) in SRAM, logic
• Due to alpha particles, cosmic radiation
– Manifested as software failures
• Crashes, wrong output: silent data corruption
– Bigger problem in advanced CPUs
• Increased density, frequency → higher soft error rate
• Hardware bugs
– Pentium floating point bug (FDIV bug)
– Intel CPU bugs in 2015: http://danluu.com/cpu-bugs/
• “Certain Combinations of AVX Instructions May Cause Unpredictable System Behavior”
• “Processor May Experience a Spurious LLC-Related Machine Check During Periods of High Activity”
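To see why a single event upset can show up as silent data corruption, the sketch below flips one bit in the IEEE 754 representation of a 64-bit float. The helper name `flip_bit` and the chosen bit positions are only for illustration; a real soft error would hit an arbitrary bit in memory or logic.

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Return value with one bit (0 = least significant) of its 64-bit encoding flipped."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped

if __name__ == "__main__":
    print(flip_bit(1.0, 0))   # lowest mantissa bit: a tiny, easy-to-miss error
    print(flip_bit(1.0, 52))  # lowest exponent bit: the value is halved
    print(flip_bit(1.0, 63))  # sign bit: the value is negated
```

None of these flips causes a crash by itself. The program keeps running with a wrong number, which is what makes such faults hard to detect.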
http://www.cotsjournalonline.com/articles/view/102279
Security
https://meltdownattack.com/
Micro-Architectural Side-Channels
• Many micro-architectural components contain hidden state that leaks secrets
– Often via observable timing variations
• Known to exist in cache, DRAM bank, OoO speculation, branch predictor, etc.
• Logically correct, proven software is also vulnerable
Example: Spectre Attack
• Wrong branch is speculatively taken.
• x is maliciously chosen by the attacker.
• The attacker probes array2 to recover the secret: array1[x]
(Cache) Timing Channel Attack
• By measuring access timing differences of a memory location, an attacker can determine whether the memory is cached or not.
• This can be used to leak secret information
• Methods: Flush + Reload, Prime + Probe, etc.
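The Flush + Reload idea can be sketched with a toy cache model. This is a simulation only: a real attack relies on cache-flush and cycle-counter instructions that are not shown here, and the latency constants, the `ToyCache` class, and the `victim` function are all hypothetical.

```python
CACHE_HIT_NS = 40    # hypothetical latencies, for illustration only
CACHE_MISS_NS = 200
THRESHOLD_NS = 100   # anything faster than this is treated as a cache hit

class ToyCache:
    """Tracks which addresses are cached and reports a simulated latency."""

    def __init__(self):
        self.lines = set()

    def flush(self, addr):
        self.lines.discard(addr)

    def access(self, addr):
        """Return the simulated access latency and bring the line into the cache."""
        latency = CACHE_HIT_NS if addr in self.lines else CACHE_MISS_NS
        self.lines.add(addr)
        return latency

def victim(cache, secret):
    # The victim makes a memory access whose address depends on the secret.
    cache.access(secret)

def flush_reload(cache, candidates, run_victim):
    for addr in candidates:  # Flush: evict every candidate line
        cache.flush(addr)
    run_victim()             # let the victim run
    # Reload: a fast access means the victim touched that line
    return [a for a in candidates if cache.access(a) < THRESHOLD_NS]

if __name__ == "__main__":
    cache = ToyCache()
    print(flush_reload(cache, range(256), lambda: victim(cache, 42)))
```

The attacker never reads the secret directly. It is recovered purely from which reload was fast.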
Image source: M. Lipp et al., “Meltdown,” arXiv Prepr., 2018.
CPS: Related Areas
• CPS requires an interdisciplinary approach
– EECS
• Computer architecture
• Real-time systems
• Formal methods
• Software engineering
• Control
– Aerospace, and other engineering
• Physical systems (plant/actuator) modeling/control
Topics
• Introduction to Real-Time Systems, CPS
• CPS Applications
• Real-time multicore architecture
• Real-time OS and middleware
• Fault tolerance, safety, security
Topics
• Introduction to Real-Time Systems, CPS
– Background on Real-time scheduling theory, timing analysis, server, priority inversion
• CPS Applications
• Real-time architecture
• Real-time OS and middleware
• Fault tolerance, safety, security
Topics
• Introduction to Real-Time Systems, CPS
• CPS Applications
– More detailed look at individual CPS applications
– Intelligent vehicle development techniques
• Real-time architecture
• Real-time OS and middleware
• Fault tolerance, safety, security
Topics
• Introduction to Real-Time Systems, CPS
• CPS Applications
• Real-time architecture
– Real-time cache, DRAM controller designs
– Predictable microarchitecture designs
– Real-time support for GPU/FPGA
• Real-time OS and middleware
• Fault tolerance, safety, security
Topics
• Introduction to Real-Time Systems, CPS
• CPS Applications
• Real-time architecture
• Real-time OS and middleware
– RTOS, ARINC 653, AUTOSAR, ROS, DDS
• Fault tolerance, safety, security
Topics
• Introduction to Real-Time Systems, CPS
• CPS Applications
• Real-time architecture
• Real-time OS and middleware
• Fault tolerance, safety, security
– CPS specific security issues, case studies
– Simplex architecture
– CPS modeling and verification
APPENDIX
Links
• Linux kernel related
– PALLOC
– MemGuard
• Self-driving car related
– DeepTraffic: https://selfdrivingcars.mit.edu/deeptraffic/
– DeepTesla: https://selfdrivingcars.mit.edu/deeptesla/
Embedded Systems
• More embedded systems than PC/servers
– 10 billion ARM-based chips shipped in 2013
http://jbpress.ismedia.jp/articles/-/36814