EECS 753: Embedded Real-Time Systems -...

Post on 07-Oct-2020

EECS 753: Embedded Real-Time Systems

Heechul Yun

1

Welcome to EECS753

• About the course

2

About Instructor

• Heechul Yun

– Assistant Prof., Dept. of EECS, University of Kansas (Aug. ’13 ~ )

– Office: 3040 Eaton, 236 Nichols

– Email: heechul.yun@ku.edu

• Education

– Ph.D. (CS), University of Illinois at Urbana-Champaign

– M.S. (CS) and B.S. (CS), KAIST

• Professional Experience

– Senior software engineer @ Samsung Electronics

• Research Areas

– Operating systems, embedded/real-time systems

• More Information

– http://ittc.ku.edu/~heechul

3

About This Class

• Topics

– Embedded real-time systems, cyber-physical systems

• Prerequisites

– EECS 645 Computer Architecture

– EECS 678 Introduction to Operating Systems

• No textbook!

• Course website

– http://ittc.ku.edu/~heechul/courses/eecs753

• Audience

– Graduate students (and senior undergraduates) who are interested in research

• Times

– Lecture: M/W/F 10:00 – 11:00, LEA 1131

– Office hours: M/F 11:00 - 11:50 @ 3040 Eaton

4

About This Class

• Seminar style

– You and I will present (more on this later)

• Goals

– Learn and discuss advanced topics in real-time embedded systems

– Improve your research skills

• Research skills

– Learning skills

• Quickly learn from papers, books, and the internet!

– Communication skills

• Written form = paper, oral form = presentation

– Programming skills

• Need to build some “interesting” things

5

Topics

• Introduction to Real-Time Systems, CPS

• CPS Applications: Intelligent Vehicles

• Real-time Hardware Architecture

• Real-time OS and Middleware

• Fault tolerance and Security

6

Amazon prime air

Methodology

• Learning by reading research papers

• Learning by building an actual system

7

Reading Papers

• You are expected to

– Read assigned papers (~two/week)

– Summarize them

• Reading papers well is an important skill

– A good reference: “How to Read a Paper”

8

Written Summary

• Each summary should include:

– Summary of main ideas

– What you liked

– What you disliked

• Submit via Blackboard

9

Written Summary: Example

• [Summary] This paper presents a kernel level page allocator which is DRAM Bank-Aware. This allocator is able to allocate pages across cores in a way that causes banks to be shared or partitioned depending on user configuration. This can be used to provide more predictable memory access to multicore software. The authors implemented their memory allocator in a recent version of the Linux Kernel and compared its performance with the existing buddy allocator.

• [The good] This paper is well written. The issue of DRAM banks was not familiar to me at the time of reading but was well explained which motivated the rest of the paper well. The algorithm used is quite straightforward and the explanation is easy to follow.

• [The bad] While the authors acknowledge that the approach they take bears similarity to multi-core page coloring[1,2,3,4] the novelty of their work is not well established. This work appears to be a relatively straightforward application of rudimentary page coloring techniques. The related work section touches on these similarities but does not establish any particular novelty aside from the fact that this paper is addressing the problem of shared DRAM banks for the sake of isolation and not shared caches.

10

Lecture Organization

• Typical week

– Mon: Lecture on the week’s topic

– Wed/Fri: Paper presentations

• Paper presentation

– I will introduce the paper

– I (or you) will present the paper

– We will discuss the paper

• You are required to present

– One (or two) papers per semester

– May change depending on the class size

11

Reading List

• Posted on the class website

– Subject to change

– Mostly recent papers and some classic ones

• Sign-up process

– Email me the paper in the list you want to present

– I will update the schedule on a first-come, first-served (FCFS) basis

12

Paper Presentation & Discussion

• Suggested structure (30 min)

– Motivation & Background

• Ask: why did the authors write this paper?

– Explain the main ideas

• From your perspective. Be careful about their assumptions

– Discussion topics

• Questions: “I don’t understand XXX.”

• Critiques: “This approach seems bad because …”

• Submission

– Draft: by 5:00 p.m. the day before your presentation

– Final version: before the class begins

13

Final Exam

• No midterm exam

• Early final exam on April 15

14

Homework & Mini Projects

• Plan

– Two homework assignments on a Linux PC

– Two mini projects using a Raspberry Pi 3

• About

– Basic real-time scheduling

– Basic AI and self-driving car

15

• Use a Convolutional Neural Network (CNN) to drive a car.

• Trained with human driving data

• Could successfully drive a car on public roads w/o human

16

Source: https://devblogs.nvidia.com/deep-learning-self-driving-cars/

DAVE-2 CNN: 9 layers, ~250K parameters, ~27M connections

NVIDIA DAVE-2 Self-Driving Car

Video: https://www.youtube.com/watch?v=NJU9ULQUwng
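The "~250K parameters" figure for the DAVE-2 CNN can be sanity-checked with a short calculation. The sketch below is only an illustration: the layer shapes are an assumption based on one common reading of NVIDIA's published DAVE-2 description (five convolutional layers followed by fully connected layers on a 66x200x3 input) and are not listed on the slide. The names `CONV_LAYERS`, `FC_LAYERS`, and `total_params` are made up for this example.

```python
# Assumed layer shapes (not from the slide): one common reading of the
# published DAVE-2 architecture for a 66x200x3 input image.
CONV_LAYERS = [
    # (input channels, output channels, kernel size)
    (3, 24, 5),
    (24, 36, 5),
    (36, 48, 5),
    (48, 64, 3),
    (64, 64, 3),
]
FC_LAYERS = [
    # (inputs, outputs); 1152 = 64 channels x 1 x 18 final feature map
    (1152, 100),
    (100, 50),
    (50, 10),
    (10, 1),
]


def conv_params(c_in, c_out, k):
    """Weights of a k x k convolution plus one bias per output channel."""
    return k * k * c_in * c_out + c_out


def fc_params(n_in, n_out):
    """Weights of a fully connected layer plus one bias per output."""
    return n_in * n_out + n_out


def total_params():
    conv = sum(conv_params(*layer) for layer in CONV_LAYERS)
    fc = sum(fc_params(*layer) for layer in FC_LAYERS)
    return conv + fc


if __name__ == "__main__":
    print(total_params())
```

Under these assumed shapes the total is close to the ~250K quoted above, and most of the parameters sit in the first fully connected layer.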

Term Project

• 2nd half of the semester

• DeepPicar Competition

– Build a self-driving car

– Based on DeepPicar

– Competition format

17

DeepPicar Competition

• Goal

– Safely drive autonomously on a given track

– Using camera and Deep Neural Network (DNN)

• Metrics

– Distance and time

• Your tasks

– Build a car (instructions and materials will be given)

– Develop/tune the AI (basic code will be given)

18

(Tentative) Project Schedule

• 3/18: Materials ready (build start)

• 4/22: Manual driving check

• 4/29: Autonomous driving check

• 5/06: Competition

• 5/15: Final report due

– 5 pages

– Must be written using LaTeX

19

LaTeX

• Everybody in CS uses it to write papers

– Final report must be prepared using LaTeX

• Overleaf

– https://www.overleaf.com

• Ubuntu

– Install texlive-full

• Windows

– Install MiKTeX

20

Grading

• Paper summaries (20%)

• Student presentations (10%)

• Final exam (30%)

• Mini projects (5%)

• Homework (5%)

• Project (30%)– Competition: 20%

– Final report: 10%

21

Grading

• 90+ : A

• 80-89: B

• 70-79: C

• 50-69: D

• 0-49: F

22

Office Hours

• M/F 11:00 – 11:50 at 3040 Eaton

• By appointment at 236 Nichols

– heechul.yun@ku.edu

23

Introduce Yourselves

• Name

• Status: grad/undergrad, year

• Relevant background

• Interests

– What do you want to learn in this class?

24

Today

• Course overview

25

Embedded Systems

• Computing systems designed for a specific purpose.

• Embedded systems are everywhere

26

Today’s Car

• Quiz. How many embedded processors are in a car?

– A: ~100s

27

Simon Fürst, BMW, EMCC2015 Munich, adapted from OSPERT2015 keynote

Future Automotive Systems

28

A. Hamann. “Industrial challenges: Moving from classical to high performance real-time systems.” In International Workshop on Analysis Tools and Methodologies for Embedded and Real-time Systems (WATERS), July 2018

Trends

• More powerful and cheaper computing

• More connected

29

Internet of Things (IoT)

• IoT ~= Internet connected embedded systems

30

Cyber-Physical Systems (CPS)

• Cyber system (Computer) + Physical system (Plant)

• Still embedded systems, but integration of physical systems is emphasized.

31

Real-Time Systems

• The correctness of the system depends not only on the logical result of the computation but also on the time at which the results are produced

• A correct value at a wrong time is a fault.

• CPS are often real-time systems

– Because physical processes depend on time
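The definition above can be made concrete with a small sketch: a result counts as correct only if the value is right and it arrives by its deadline. This is a minimal illustration, not a real-time framework. The names `is_correct` and `run_with_deadline` and the deadline values are hypothetical.

```python
import time


def is_correct(value, expected, elapsed_s, deadline_s):
    """Real-time correctness: the right value AND produced by the deadline."""
    logically_correct = (value == expected)
    on_time = (elapsed_s <= deadline_s)
    return logically_correct and on_time


def run_with_deadline(job, expected, deadline_s):
    """Run job() once and judge it on both its value and its completion time."""
    start = time.monotonic()
    value = job()
    elapsed_s = time.monotonic() - start
    return is_correct(value, expected, elapsed_s, deadline_s)


def slow_job():
    """Computes the right answer, but takes about 50 ms to do so."""
    time.sleep(0.05)
    return 42


if __name__ == "__main__":
    print(run_with_deadline(lambda: 42, expected=42, deadline_s=1.0))
    print(run_with_deadline(slow_job, expected=42, deadline_s=0.01))
```

The second call computes the right value but misses its (assumed) 10 ms deadline, so it is treated as a fault.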

32

CPS Requirements

• Real-time performance

– Meet deadlines in processing large amounts of real-time data from various sensors (e.g., autonomous cars)

– Many constraints: size, weight, and power (SWaP); cost

• Safety

– Interact with the environment and humans in real time

– Can hurt humans, destroy things, blow up (e.g., nuclear plants)

– Need both logical and temporal (time) correctness

• Security

– Communicate over the internet (cloud servers etc.)

– Remote software update (fix bugs, …)

– Run untrusted 3rd party software (e.g., Apple CarPlay)

33

Performance

• Many cyber-physical systems (CPS) need:

– More performance

– Less cost, size, weight, and power

34

CMU’s “Boss” self-driving car, circa 2007: 10 dual-processor blade servers in the trunk

Audi’s zFAS platform, 2016-2018: a single-board computer with multiple CPUs, GPU, FPGA

Audi A8

Compute Performance Demand

35

Intel, “Technology and Computing Requirements for Self-Driving Cars”

Real-Time Data

• Real-time data from many sensors needs powerful computers

36

Source: http://on-demand.gputechconf.com/gtc/2015/presentation/S5870-Daniel-Lipinski.pdf

Size, Weight, and Power (SWaP) Constraints

• Maximum performance with minimal resources

– Cannot afford too many or too power hungry ECUs

37

Figure source: OSPERT 2015 Keynote by Leibinger

Mobileye EyeQ4

• Real-time vision processor w/ DNN

• 2.5 teraflops @ 3W

• 8 cameras @ 36 fps

• Tesla uses EyeQ3

• 14 cores

– 4 MIPS cores

– 10 vector cores

38

Nvidia’s Drive PX2 Platform

• 12 CPU + 2 GPU

– 8 teraflops @ 250W

• Real-time processing of

– Up to 12 cameras, radar, ..

– Deep Neural Network (DNN) for detection, classification

39

http://www.nvidia.com/object/drive-px.html
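The SWaP trade-off between the two platforms above can be put in numbers by dividing the quoted throughput by the quoted power. This is only back-of-the-envelope arithmetic on the slide figures (2.5 teraflops at 3 W and 8 teraflops at 250 W). The two chips are not measured on the same workload or numeric precision, so the ratio is indicative at best. `PLATFORMS` and `tflops_per_watt` are names made up for this sketch.

```python
# (teraflops, watts) as quoted on the slides; not measured on the same workload.
PLATFORMS = {
    "Mobileye EyeQ4": (2.5, 3.0),
    "NVIDIA Drive PX2": (8.0, 250.0),
}


def tflops_per_watt(tflops, watts):
    """Throughput per unit of power, a simple SWaP efficiency metric."""
    return tflops / watts


def efficiency_table(platforms):
    return {name: tflops_per_watt(*spec) for name, spec in platforms.items()}


if __name__ == "__main__":
    for name, eff in efficiency_table(PLATFORMS).items():
        print(f"{name}: {eff:.3f} TFLOPS/W")
```

On these figures the lower-power vision processor delivers roughly 26 times more throughput per watt, while the larger platform delivers about 3 times more absolute throughput.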

Safety Failures

40

Therac-25

• Computer-controlled medical X-ray treatments

• Six people died or were injured due to massive overdoses (1985-1987)

• Caused by synchronization mistakes

Ariane 5

• 7 billion dollar rocket was destroyed after 40 secs (6/4/1996)

• “Caused by the complete loss of guidance and attitude information”

• Root cause: 64-bit floating-point to 16-bit integer conversion
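Why a 64-bit floating-point value may not survive a conversion to a 16-bit integer can be shown in a few lines. The sketch below only illustrates the range problem. It does not model how the actual flight software reacted, and the value 40000.0 and the function names are assumptions chosen for this example.

```python
import ctypes

INT16_MIN, INT16_MAX = -32768, 32767


def unchecked_to_int16(x: float) -> int:
    """Truncate toward zero, then keep only the low 16 bits (two's complement).

    ctypes integer types do no overflow checking, so out-of-range values
    silently wrap around.
    """
    return ctypes.c_int16(int(x)).value


def checked_to_int16(x: float) -> int:
    """Same conversion, but refuse values that do not fit in 16 bits."""
    n = int(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x} does not fit in a signed 16-bit integer")
    return n


if __name__ == "__main__":
    print(unchecked_to_int16(1234.5))   # fits
    print(unchecked_to_int16(40000.0))  # out of range: wraps to a negative value
    try:
        checked_to_int16(40000.0)
    except OverflowError as err:
        print("rejected:", err)
```

Either outcome (a silently wrong value or an unhandled error) is dangerous in a control loop, which is why such conversions need an explicit range check and a defined recovery path.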

Air France 447 (2009)

• Airbus A330 crashed into the Atlantic Ocean in 2009

• Caused in part by computer’s misguidance

– Pitot tube (speed sensor) failure → Flight Director (FD) malfunction (shows “head up”) → pilots follow the faulty FD → enter stall

41

http://www.spiegel.de/international/world/experts-say-focus-on-manual-flying-skills-needed-after-air-france-crash-a-843421.html

http://www.slate.com/blogs/the_eye/2015/06/25/air_france_flight_447_and_the_safety_paradox_of_airline_automation_on_99.html

Stall

Normal

Lion Air Flight 610 (2018)

• Boeing 737 crashed into the Java Sea in 2018

• Caused by stall prevention system (MCAS)

– Sensor error (plane is in “stall”) → nose down (into the ocean)

42

Tesla Autopilot (2016)

43

http://www.nytimes.com/interactive/2016/07/01/business/inside-tesla-accident.html

• Tesla Autopilot failed to recognize a trailer, resulting in the death of the driver

NHTSA Report

• Both the radar and camera sub-systems are designed for front-to-rear collision prediction mitigation or avoidance.

• The system requires agreement from both sensor systems to initiate automatic braking.

• The camera system uses Mobileye’s EyeQ3 processing chip which uses a large dataset of the rear images of vehicles to make its target classification decisions.

• Complex or unusual vehicle shapes may delay or prevent the system from classifying certain vehicles as targets/threats

44

https://static.nhtsa.gov/odi/inv/2016/INCLA-PE16007-7876.PDF
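The requirement that both sensor systems agree before braking is a design trade-off that a short calculation can illustrate. The sketch below is not the vendor's algorithm. It assumes the two sensors err independently, and all rates are hypothetical numbers chosen for illustration.

```python
def should_brake(radar_sees_target: bool, camera_sees_target: bool) -> bool:
    """AND fusion: brake only when both sensor systems report a target."""
    return radar_sees_target and camera_sees_target


def and_fusion_rates(fp_radar, fp_camera, miss_radar, miss_camera):
    """False-positive and miss rates of AND fusion, assuming independent errors.

    A false brake needs both sensors to be wrong at once; a missed target
    needs only one of them to miss.
    """
    false_positive = fp_radar * fp_camera
    miss = 1.0 - (1.0 - miss_radar) * (1.0 - miss_camera)
    return false_positive, miss


if __name__ == "__main__":
    # Hypothetical per-sensor rates: 10% false positives, 10% misses.
    fp, miss = and_fusion_rates(0.1, 0.1, 0.1, 0.1)
    print(f"false positive rate: {fp:.2f}, miss rate: {miss:.2f}")
```

Requiring agreement lowers false brake activations but raises the chance that a real obstacle is ignored when either sensor fails to classify it.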

NHTSA Report

• Object classification algorithms in the Tesla and peer vehicles with AEB technologies are designed to avoid false positive brake activations.

• The Florida crash involved a target image (side of a tractor trailer) that would not be a “true” target in the EyeQ3 vision system dataset.

• The tractor trailer was not moving in the same longitudinal direction as the Tesla, which is the vehicle kinematic scenario the radar system is designed to detect

45

https://static.nhtsa.gov/odi/inv/2016/INCLA-PE16007-7876.PDF

Uber Self-Driving Car (2018)

46

• Killed a pedestrian crossing a road in Arizona

https://www.nytimes.com/2018/03/19/technology/uber-driverless-fatality.html

NTSB Report

• The system first registered radar and LIDAR observations of the pedestrian about 6 seconds before impact

• Software classified the pedestrian as an unknown object, as a vehicle, and then as a bicycle with varying expectations of future travel path.

• At 1.3 seconds before impact, the system determined that an emergency braking maneuver was needed

• Emergency braking maneuvers are not enabled while the vehicle is under computer control, to reduce the potential for erratic vehicle behavior

47

https://www.ntsb.gov/investigations/AccidentReports/Reports/HWY18MH010-prelim.pdf

Failures in CPS have consequences

Security

• Interconnected CPS are open to attacks

• Examples

– Stuxnet: Iranian nuclear facility hacking

– Vermont power grid hack by Russia

– Remote hack into cars (Jeep)

– Police drone hacking

– Sensor hacking: GPS spoofing, IMU spoofing

48

Challenges

• Time Predictability

• Complexity

• Reliability

• Security

49

Time Predictability

• At the low level, hardware has deterministic timing

• At higher levels, not so much: abstractions ignore timing

– Pipeline, caches, out-of-order execution, speculation, ISA

– Process, thread, lock, interrupt

• Focus on average case, not worst case. No guarantees

– Fine in the cyber world

– The real world doesn’t work that way

50

Timing Predictability

• Q. Can you tell exactly how long a piece of code will take to execute on a computer?

– Used to be (relatively) easy to do so.

• Measure timing. Use the timing for analysis.

– Very difficult to answer in today’s computers

• Pipeline, cache, out-of-order and speculative execution, multicore, shared cache/DRAM → very high variance.
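The measurement-based approach mentioned above (measure timing, then use it for analysis) can be tried directly. The sketch below times the same piece of code many times and reports the spread. `measure` and `workload` are hypothetical names. The observed maximum is only the worst case seen in this run, not a guaranteed worst-case execution time.

```python
import statistics
import time


def workload():
    """A small, fixed amount of computation to be timed."""
    return sum(i * i for i in range(1000))


def measure(func, runs=1000):
    """Time func() repeatedly and summarize the observed execution times."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter_ns()
        func()
        samples.append(time.perf_counter_ns() - t0)
    return {
        "min_ns": min(samples),
        "mean_ns": statistics.mean(samples),
        "max_ns": max(samples),
    }


if __name__ == "__main__":
    stats = measure(workload)
    print(stats)
    print("max/min ratio:", stats["max_ns"] / max(stats["min_ns"], 1))
```

On a general-purpose OS the maximum is often several times the minimum even for identical inputs, which is the variance the slide refers to.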

51

Denial-of-Service Attack

• Delay execution time of time-sensitive code

– E.g., real-time control software of a car

– Observed >21X execution time increase on Odroid XU4 (*)

• Even after cache partitioning is applied

– Observed >10X increase on RPi 3 (**)

• Of a realistic DNN-based real-time control program

52

LLC

Core1 Core2 Core3 Core4

bench co-runner(s)

(*) Prathap Kumar Valsan, Heechul Yun, Farzad Farshchi. “Taming Non-blocking Caches to Improve Isolation in Multicore Real-Time Systems.” In RTAS, IEEE, 2016. Best Paper Award

(**) Michael Garrett Bechtel, Elise McEllhiney, Minje Kim, Heechul Yun. “DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car.” In RTCSA, IEEE, 2018

Denial-of-Service Attack

53

[C] Michael Garrett Bechtel and Heechul Yun. Denial-of-Service Attacks on Shared Cache in Multicore: Analysis and Prevention. IEEE Intl. Conference on Real-Time and Embedded Technology and Applications Symposium (RTAS), IEEE, 2019. (to appear)

> 300X slowdown !!!

Complexity

• Software complexity increases

54

[Figure: Lines of Code in Typical GM Car. x-axis: Model Year (1970 to 2010); y-axis: KLOC (log scale, 1 to 100,000)]

[Figure: Growth in Software Size. x-axis: Flight Vehicle (Apollo 1968, Space Shuttle, Orion (est.)); y-axis: KSLOC (0 to 1,400)]

Figures are from NASA JPL. “Flight Software Complexity,” 2008

Linux Kernel Code Size

• Linux: > 15M SLOC, multithreaded

Software bugs are hard to weed out

55

https://www.quora.com/How-many-lines-of-code-are-in-the-Linux-kernel

Reliability

• Transient hardware faults (soft errors)

– Single event upset (SEU) in SRAM, logic

• Due to alpha particles, cosmic radiation

– Manifested as software failures

• Crashes, wrong output: silent data corruption

– Bigger problem in advanced CPUs

• Increased density, frequency → higher soft error rate

• Hardware bugs

– Pentium floating point bug (FDIV bug)

– Intel CPU bugs in 2015: http://danluu.com/cpu-bugs/

• “Certain Combinations of AVX Instructions May Cause Unpredictable System Behavior”

• “Processor May Experience a Spurious LLC-Related Machine Check During Periods of High Activity”
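A single event upset can be pictured as one flipped bit in a stored word. The sketch below flips a bit and shows how a stored parity bit would catch it. Parity is used here only as the simplest detection scheme and is an assumption of this example, not something the slide prescribes. The stored value and function names are hypothetical.

```python
def parity(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of 1 bits, else 0."""
    return bin(word).count("1") % 2


def flip_bit(word: int, bit: int) -> int:
    """Model a single event upset by inverting one bit of the word."""
    return word ^ (1 << bit)


def upset_detected(original: int, observed: int) -> bool:
    """Compare the parity stored with the original against the observed word."""
    return parity(observed) != parity(original)


if __name__ == "__main__":
    stored = 0b10110010
    one_flip = flip_bit(stored, 5)
    two_flips = flip_bit(one_flip, 0)
    print(bin(one_flip), upset_detected(stored, one_flip))
    print(bin(two_flips), upset_detected(stored, two_flips))
```

A single flip changes the parity and is detected. Two flips cancel out and pass unnoticed, which is one way silent data corruption can slip through a simple check.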

56

http://www.cotsjournalonline.com/articles/view/102279

Security

57

https://meltdownattack.com/

Micro-Architectural Side-Channels

• Many micro-architectural components contain hidden state that can leak secrets

– Often via observable timing variations

• Known to exist in cache, DRAM bank, OoO speculation, branch predictor, etc.

• Logically correct, proven software is also vulnerable

58

Example: Spectre Attack

• Wrong branch is speculatively taken.

• x is maliciously chosen by the attacker.

• The attacker probes array2 to recover the secret: array1[x]
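The three steps above can be walked through with a toy model. The sketch below is not an exploit and involves no real speculation or timing: the cache is just a Python set, and a `mispredict` flag stands in for the branch predictor taking the wrong path. `ToyMachine`, `STRIDE`, and the memory layout are all hypothetical.

```python
STRIDE = 4096  # assumed spacing so each possible byte value maps to its own line


class ToyMachine:
    """Toy model: memory holds array1 followed by data the victim must not leak."""

    def __init__(self, memory: bytes, array1_size: int):
        self.memory = memory
        self.array1_size = array1_size
        self.cached_lines = set()  # offsets into array2 that are "in cache"

    def flush(self):
        """Attacker empties the cache state for array2."""
        self.cached_lines.clear()

    def victim(self, x: int, mispredict: bool = False):
        """Bounds-checked read of array1[x] followed by a dependent array2 access."""
        in_bounds = x < self.array1_size
        if in_bounds or mispredict:
            # With a mispredicted branch this load runs speculatively; its
            # result is discarded, but the cache footprint remains.
            value = self.memory[x]
            self.cached_lines.add(value * STRIDE)

    def probe(self):
        """Attacker checks which array2 lines are cached and decodes the values."""
        return [offset // STRIDE for offset in sorted(self.cached_lines)]


if __name__ == "__main__":
    array1 = bytes([1, 2, 3, 4])
    secret = b"K"
    machine = ToyMachine(array1 + secret, array1_size=len(array1))
    machine.flush()
    machine.victim(x=len(array1), mispredict=True)  # out-of-bounds x chosen by attacker
    print([chr(v) for v in machine.probe()])
```

In a real attack the probe step cannot inspect the cache directly. It infers which line is cached by timing accesses to array2.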

59

(Cache) Timing Channel Attack

• By measuring access timing differences of a memory location, an attacker can determine whether the memory is cached or not.

• This can be used to leak secret information

• Methods: Flush + Reload, Prime + Probe, etc.
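A Flush + Reload round can be sketched with a simulated cache that returns a latency for each access. The latencies and threshold below are hypothetical round numbers, not measurements, and `ToyCache` and `flush_reload` are names invented for this example. A real attack would time actual memory accesses to a line shared with the victim.

```python
HIT_NS = 40         # assumed latency of a cached access
MISS_NS = 200       # assumed latency of an access served from DRAM
THRESHOLD_NS = 100  # anything faster than this is classified as "cached"


class ToyCache:
    """Simulated cache: remembers which addresses are resident."""

    def __init__(self):
        self.lines = set()

    def flush(self, addr):
        self.lines.discard(addr)

    def access(self, addr):
        """Return the simulated latency, then leave the line cached."""
        latency = HIT_NS if addr in self.lines else MISS_NS
        self.lines.add(addr)
        return latency


def flush_reload(cache, addr, victim):
    """One round: flush the shared line, let the victim run, time the reload."""
    cache.flush(addr)
    victim()
    return cache.access(addr) < THRESHOLD_NS


if __name__ == "__main__":
    cache = ToyCache()
    shared = 0x1000
    print(flush_reload(cache, shared, lambda: cache.access(shared)))  # victim touched it
    print(flush_reload(cache, shared, lambda: None))                  # victim did not
```

A fast reload tells the attacker that the victim touched the shared line during the round, which is the single bit of information each round leaks.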

60

Image source: M. Lipp et al., “Meltdown,” arXiv Prepr., 2018.

CPS: Related Areas

• CPS requires an interdisciplinary approach

– EECS

• Computer architecture

• Real-time systems

• Formal methods

• Software engineering

• Control

– Aerospace, and other engineering

• Physical systems (plant/actuator) modeling/control

61

Topics

• Introduction to Real-Time Systems, CPS

• CPS Applications

• Real-time multicore architecture

• Real-time OS and middleware

• Fault tolerance, safety, security

62

Amazon prime air

Topics

• Introduction to Real-Time Systems, CPS

– Background on Real-time scheduling theory, timing analysis, server, priority inversion

• CPS Applications

• Real-time architecture

• Real-time OS and middleware

• Fault tolerance, safety, security
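As a small taste of the real-time scheduling theory background listed above, the sketch below applies the classic Liu and Layland utilization bound for rate-monotonic scheduling. It assumes independent periodic tasks on one processor with deadlines equal to periods. The task sets are hypothetical, and the test is sufficient but not necessary, so a "False" result means "not proven schedulable by this bound", not "unschedulable".

```python
def rm_bound(n: int) -> float:
    """Liu and Layland utilization bound for n tasks: n * (2^(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)


def utilization(tasks):
    """Total utilization of tasks given as (execution_time, period) pairs."""
    return sum(c / t for c, t in tasks)


def rm_schedulable(tasks) -> bool:
    """Sufficient test: total utilization does not exceed the bound."""
    return utilization(tasks) <= rm_bound(len(tasks))


if __name__ == "__main__":
    light = [(1, 4), (2, 8)]  # utilization 0.5
    heavy = [(2, 4), (3, 6)]  # utilization 1.0
    for tasks in (light, heavy):
        print(tasks, round(utilization(tasks), 3),
              round(rm_bound(len(tasks)), 3), rm_schedulable(tasks))
```

The bound falls from 1.0 for one task toward about 0.693 as the number of tasks grows, which is why such analysis reasons about worst-case demand instead of average load.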

63

Topics

• Introduction to Real-Time Systems, CPS

• CPS Applications

– More detailed look at individual CPS applications

– Intelligent vehicle development techniques

• Real-time architecture

• Real-time OS and middleware

• Fault tolerance, safety, security

64

Topics

• Introduction to Real-Time Systems, CPS

• CPS Applications

• Real-time architecture

– Real-time cache, DRAM controller designs

– Predictable microarchitecture designs

– Real-time support for GPU/FPGA

• Real-time OS and middleware

• Fault tolerance, safety, security

65

Topics

• Introduction to Real-Time Systems, CPS

• CPS Applications

• Real-time architecture

• Real-time OS and middleware

– RTOS, ARINC 653, AUTOSAR, ROS, DDS

• Fault tolerance, safety, security

66

Topics

• Introduction to Real-Time Systems, CPS

• CPS Applications

• Real-time architecture

• Real-time OS and middleware

• Fault tolerance, safety, security

– CPS specific security issues, case studies

– Simplex architecture

– CPS modeling and verification

67

APPENDIX

68

69

Links

• Linux kernel related

– PALLOC

– MemGuard

• Self-driving car related

– DeepTraffic: https://selfdrivingcars.mit.edu/deeptraffic/

– DeepTesla: https://selfdrivingcars.mit.edu/deeptesla/

70

Embedded Systems

• More embedded systems than PC/servers

– 10 billion chips in 2013 by ARM

71

http://jbpress.ismedia.jp/articles/-/36814