CSE 466 – Fall 2000 - Introduction - 1
Physical Layer API (same as Lab)
enq(), deq(), scan()
Periodically: if not currently bus master, select a non-empty outgoing queue and send a start condition (capture the bus)
On interrupt: if bus master and not at end of frame, send the next byte, otherwise send a stop condition; if bus slave, get the byte and process header information, or place it into the appropriate incoming queue based on previous header information
Deal w/ flow control somehow (control frames for acknowledgement)
Deal w/ security
Frame header fields: dst, port
Timer Interrupt and External Interrupt Routines
Function Calls
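The enq()/deq() bookkeeping above can be sketched as a simple ring buffer per outgoing queue. This is a minimal illustration only; the queue size, struct layout, and return conventions are assumptions, not the lab's actual API.

```c
#include <stdint.h>
#include <stdbool.h>

/* Minimal ring-buffer sketch of the enq()/deq() layer described above.
   QUEUE_SIZE and the struct layout are assumptions for illustration. */
#define QUEUE_SIZE 16

typedef struct {
    uint8_t buf[QUEUE_SIZE];
    int head, tail, count;
} byte_queue;

bool enq(byte_queue *q, uint8_t b) {
    if (q->count == QUEUE_SIZE) return false;  /* queue full: caller must block or drop */
    q->buf[q->tail] = b;
    q->tail = (q->tail + 1) % QUEUE_SIZE;
    q->count++;
    return true;
}

bool deq(byte_queue *q, uint8_t *b) {
    if (q->count == 0) return false;           /* nothing left: master sends stop condition */
    *b = q->buf[q->head];
    q->head = (q->head + 1) % QUEUE_SIZE;
    q->count--;
    return true;
}
```

A master's interrupt handler would call deq() for the next byte of the frame and issue a stop condition when it returns false.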
CSE 466 – Fall 2000 - Introduction - 2
A Complete Development Environment
Components: Kernel, File I/O, Device I/O, Networking, Debugging
Kernel: task scheduling, memory management, resource protection, interrupt handling, task-to-task message passing, task event management
File I/O: file creation and deletion, file I/O, directory structures and navigation
Device I/O: device drivers
Networking: protocols and device driver support for data communication; task-to-task message passing over a network
Debugging: simulation, monitor-based debugging, hardware emulator
CSE 466 – Fall 2000 - Introduction - 3
Debugging
Simulation: you've seen the limits of that approach.
In-System Debug (w/ a Monitor)
Ability to download code and upload state information (RAM, regs)
Breakpoints: replace an instruction w/ SWI or JSR; the SWI handler waits for user commands; on "continue" the monitor executes the trapped instruction and returns control to the app
Not good for final test unless the monitor is part of the shipped system
Requires system resources: interrupts, RAM, ROM.
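The replace-with-SWI mechanism can be sketched in C against a simulated code memory. The array, addresses, and the SWI opcode value here are assumptions for illustration; a real monitor patches the target's actual code space.

```c
#include <stdint.h>

/* Sketch of monitor-style breakpoints: patch an SWI opcode over an
   instruction, and restore the original byte on "continue".
   code_mem and the SWI_OPCODE value are illustrative assumptions. */
#define SWI_OPCODE 0xFF

static uint8_t code_mem[256];   /* stands in for the target's code space */
static uint8_t saved_op;        /* original instruction byte at the breakpoint */
static uint16_t bp_addr;

void set_breakpoint(uint16_t addr) {
    bp_addr = addr;
    saved_op = code_mem[addr];   /* remember the trapped instruction */
    code_mem[addr] = SWI_OPCODE; /* execution now traps into the monitor */
}

void monitor_continue(void) {
    code_mem[bp_addr] = saved_op; /* restore so the real instruction runs */
    /* a real monitor would single-step this instruction, re-arm the
       breakpoint, then return control to the application */
}
```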
HW Emulation: a circuit board w/ a socket connector that acts just like the real processor. Allows debugging without a monitor.
[Diagram: monitor program on the target, serial interface to the debugging host; R/W access to code and data space; SW-based single stepping]
Single Stepping in the 8051 – ISR for External Interrupt 0:
JNB P3.2,$   ;Wait here till INT0 goes high
JB  P3.2,$   ;Now wait here till it goes low
RETI         ;Go back and execute one instruction
CSE 466 – Fall 2000 - Introduction - 4
Projects/Reports
Option 1: A final lab assignment … to be negotiated
Some improvement on the music player (I think we can do 20K music at 11 MHz anyway)
Maybe streaming from NT through one player to another player over I2C. No upload needed.
Chat
A PCB for the Music Player w/ I2C and serial ports
Option 2: A report/presentation on design approaches for some existing system
fuzzy logic controller design
automotive apps and networking (ECU, airbag, CAN)
embedded TCP (IP?/I2C?)
embedded web servers
robot architectures
convergence devices
object technology for 8-bit microcontrollers
or hw that we didn't cover:
motors
watchdog circuits
DAC, ADC
audio codecs
displays…vga/lcd
radios
memory devices and interfaces
CSE 466 – Fall 2000 - Introduction - 5
Safety
Examples
Terms and Concepts
Safety Architectures
Safe Design Process
Software Specific Stuff
Sources: Hard Time by Bruce Powell Douglass, which references Safeware by Nancy Leveson
CSE 466 – Fall 2000 - Introduction - 6
What is a Safe System?
[Diagram: Brake Pedal → Pedal Sensor → Processor → Bus → Brake w/ local controller and Engine w/ local controller]
Is it safe?
What does "safe" mean?
How can we make it safe?
Add electronic watchdog between brake and bus
Add mechanical linkage from a separate brake pedal directly to the brake
Add a third mechanical linkage….
CSE 466 – Fall 2000 - Introduction - 7
Reliability of component i can be expressed as the probability that component i is still functioning at some time t.
Is system reliability Ps(t) = ∏i Pi(t)?
Assuming that all components have the same component reliability, is a system w/ fewer components always more reliable?
Does component failure ⇒ system failure?
Terms and Concepts
[Figure: Pi(t), the probability of being operational at time t, plotted against time. The failure rate is high during the burn-in period, then nearly constant; a low, constant failure rate means 1/(failure rate) = MTBF.]
CSE 466 – Fall 2000 - Introduction - 8
A Safety System
A system is safe if its deployment involves assuming an acceptable amount of risk…acceptable to whom?
Risk factors
Probability of something bad happening
Consequences of something bad happening (Severity)
Examples
Airplane travel – high severity, low probability
Electric shock from battery powered devices – high probability, low severity
[Figure: severity vs. probability. The airplane autopilot sits in the danger zone; the mp3 player and PC sit in the safe zone. We don't all have the same risk tolerance!]
CSE 466 – Fall 2000 - Introduction - 9
More Precise Terminology
Accident or Mishap: (unintended) damage to property or harm to persons. Economic impact of failure to meet warranted performance is outside the scope of safety.
Hazard: a state of the system that will inevitably lead to an accident or mishap
Release of energy
Release of toxins
Interference with life support functions
Supplying misleading information to safety personnel or control systems. This is the desktop PC nightmare scenario.
Bad information
Failure to alarm when hazardous conditions exist
CSE 466 – Fall 2000 - Introduction - 10
Faults
A fault is an “unsatisfactory system condition or state”. A fault is not necessarily a hazard. In fact, assessments of safety are based on the notion of fault tolerance.
Systemic faults
Design errors (includes process errors such as failure to test or failure to apply a safety design process)
Faults due to software bugs are systemic
Security breach
Random faults
Random events that can cause permanent or temporary damage to the system. Includes EMI and radiation, component failure, power supply problems, wear and tear.
CSE 466 – Fall 2000 - Introduction - 11
Component v. System
Reliability is a component issue
Safety and Availability are system issues
A system can be safe even if it is unreliable!
If a system has lots of redundancy, the likelihood of a component failure (a fault) increases, but the safety and availability of that system may increase as well.
Safety and Availability are different and sometimes at odds. Safety may require the shutdown of a system that is still able to perform its function. A backup system that can fully operate a nuclear power plant might always shut it down in the event of a failure of the primary system. The plant could remain available, but it is unsafe to continue operation.
CSE 466 – Fall 2000 - Introduction - 12
Single Fault Tolerance (for safety)
The existence of any single fault does not result in a hazard
Single fault tolerant systems are generally considered to be safe, but more stringent requirements may apply to high risk cases…airplanes, power plants, etc.
[Diagram: Main H2 Valve Control and Backup H2 Valve Control connected by a watchdog protocol]
If the handshake fails, then either one or both can shut off the gas supply. Is this a single fault tolerant system?
CSE 466 – Fall 2000 - Introduction - 13
Is This?
[Diagram: Main H2 Valve Control and Backup H2 Valve Control connected by a watchdog handshake]
Common mode failures
CSE 466 – Fall 2000 - Introduction - 14
Now Safe?
[Diagram: Main H2 Valve Control and Backup H2 Valve Control connected by a watchdog handshake]
Separate clock source
Power fail-safe (non-latching) valves
What about a power spike that confuses both processors at the same time? Maybe the watchdog can't be software based.
Does it ever end?
CSE 466 – Fall 2000 - Introduction - 15
Time is a Factor
The TUV Fault Assessment Flow Chart captures time, and the notion of "latent faults"
T0: tolerance time for the first fault
T1: time after which a second fault is likely, based on MTBF data
Safety requires that Ttest < T0 < T1
[Flow chart: 1st fault → hazard after T0? If yes, system unsafe. If no → fault detected before T1? If yes, system safe. If no → 2nd fault → hazard? If yes, system unsafe; if no, system safe.]
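One reading of the flow chart expressed as code. The boolean inputs and the exact branch order are my reconstruction from the slide, not an official TUV definition:

```c
#include <stdbool.h>

/* Sketch of the TUV fault-assessment flow chart as a decision function.
   The inputs model the chart's questions; the branch structure is a
   reconstruction from the slide, not a normative TUV rule. */
typedef enum { SYSTEM_SAFE, SYSTEM_UNSAFE } verdict;

verdict assess(bool hazard_after_t0,       /* 1st fault: hazard within T0? */
               bool detected_before_t1,    /* fault found before a 2nd fault is likely? */
               bool hazard_with_2nd_fault) /* 2nd fault arrives: hazard now? */
{
    if (hazard_after_t0) return SYSTEM_UNSAFE;   /* the first fault alone is hazardous */
    if (detected_before_t1) return SYSTEM_SAFE;  /* caught and handled in time */
    /* the fault stayed latent long enough for a second fault to accumulate */
    return hazard_with_2nd_fault ? SYSTEM_UNSAFE : SYSTEM_SAFE;
}
```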
CSE 466 – Fall 2000 - Introduction - 16
Latent Faults
Any fault that is not detectable by the system during operation has a probability of 1 – it doesn't count in a single fault tolerance assessment
Detection need not mean diagnosis. If the system can detect a secondary effect of a device failure before a hazard arises, then this could be considered safe.
[Diagram: Main H2 Valve Control and Backup H2 Valve Control connected by a watchdog handshake]
Stuck valves could be latent if the controllers cannot check their state. May as well assume that they are stuck!
CSE 466 – Fall 2000 - Introduction - 17
Fail-Safe Design (just an example)
On reset the processor checks status. If bad, enter "safe mode":
power off
reduced/altered functionality
alarm
restart
Safe mode is application dependent
[Diagram: Processor and Watchdog linked by a protocol; on protocol failure the watchdog asserts reset, and the processor reads a status flag]
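A minimal sketch of the reset-time status check, assuming a status flag the watchdog hardware would latch across reset; the flag name and the safe-mode actions are placeholders, since safe mode is application dependent:

```c
#include <stdbool.h>

/* Fail-safe reset handler sketch. watchdog_tripped stands in for a
   status flag latched by the watchdog across reset; the safe-mode
   actions are application-dependent placeholders. */
static bool watchdog_tripped;
static bool in_safe_mode;

void enter_safe_mode(void) {
    in_safe_mode = true;   /* e.g. power off outputs, raise alarm, restrict function */
}

void on_reset(void) {
    if (watchdog_tripped)       /* previous run died mid-protocol: don't resume normally */
        enter_safe_mode();
    else
        in_safe_mode = false;   /* normal startup */
}
```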
CSE 466 – Fall 2000 - Introduction - 18
Safety Architectures
Self Checking (Single Channel Protected Design)
Redundancy
Diversity or Heterogeneity
[Diagram: Brake Pedal → Pedal Sensor → two Computers joined by a watchdog protocol → Bus (parity/crc) → Brake and Engine Control. Periodic internal CRC/checksum computation guards against code/data corruption.]
CSE 466 – Fall 2000 - Introduction - 19
Single Channel Protection
Self Checking: perform periodic checksums on code and data. How long does this take? Is T0 < T1?
Feasibility of Self Checking depends on:
Fault tolerance time
Speed of the processor
Amount of ROM/RAM
Redundant memory subsystems: special hw
Recurring cost v. development cost tradeoff
[Diagram: a single Computer (subject to code corruption) on the Bus with Brake and Engine Control; parity/crc on the bus]
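The periodic code checksum can be sketched as below. The memory array, the stored golden value, and the simple additive checksum are illustrative assumptions; a real system would use a CRC over its actual ROM:

```c
#include <stdint.h>
#include <stddef.h>

/* Single-channel self-check sketch: periodically recompute a checksum over
   "code memory" and compare against a stored golden value. The array, its
   size, and the additive checksum are assumptions for illustration. */
#define CODE_SIZE 64

static uint8_t code_rom[CODE_SIZE];
static uint16_t golden_checksum;    /* computed at build/flash time */

uint16_t checksum(const uint8_t *p, size_t n) {
    uint16_t sum = 0;
    while (n--) sum += *p++;        /* simple additive sum; real systems prefer CRC */
    return sum;
}

/* Returns 1 if code memory still matches the golden value, 0 on corruption. */
int self_check(void) {
    return checksum(code_rom, CODE_SIZE) == golden_checksum;
}
```

Whether this is feasible comes down to the slide's list: the check must complete within the fault tolerance time given the processor speed and the amount of ROM/RAM to cover.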
CSE 466 – Fall 2000 - Introduction - 20
Redundancy
Homogeneous Redundancy
Low development cost…just duplicate
High recurring cost
No protection against systemic faults
[Diagram: two identical Computers drive the Bus through voting, to Brake and Engine Control; code corruption can still affect both copies]
Voting could be implemented similar to collision detection
CSE 466 – Fall 2000 - Introduction - 21
Multi-Channel Protection
Heterogeneous Redundancy
Protects against random and some systemic faults
Best if implementation teams are kept separated
Space shuttle: five computers, 4 the same, 1 different
[Diagram: Proc/SW 1 and Proc/SW 2 (independent implementations) drive the Bus through voting, to Brake and Engine Control]
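Voting among redundant channels can be sketched as a bitwise majority function. Three channels are assumed here for a clean majority; the slide's two-channel scheme can only detect a disagreement and fail safe, not mask a fault:

```c
#include <stdint.h>

/* Triple-modular-redundancy voter sketch: each output bit takes the
   majority value across three redundant channels. Three channels are an
   assumption; two channels can detect, but not outvote, a single fault. */
uint8_t vote(uint8_t a, uint8_t b, uint8_t c) {
    return (a & b) | (a & c) | (b & c);   /* bitwise majority */
}
```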
CSE 466 – Fall 2000 - Introduction - 22
Design for Safety
1. Hazard Identification and Fault Tree Analysis
2. Risk Assessment
3. Define Safety Measures
4. Create Safe Requirements
5. Implement Safety
6. Assure Safety Process
7. Test, Test, Test, Test, Test
CSE 466 – Fall 2000 - Introduction - 23
1. Hazard Identification – Ventilator Example
Hazard | Severity | Tolerance Time | Fault Example | Likelihood | Detection Time | Mechanism | Exposure Time
Hypoventilation | Severe | 5 min | Vent fails | Rare | 30 sec | Indep. pressure sensor w/ alarm | 40 sec
Esophageal intubation | Medium | 30 sec | | | | CO2 sensor alarm | 40 sec
User mis-attaches breathing hoses | | | | Never | N/A | Different mechanical fittings for intake and exhaust | N/A
Over-pressurization | Severe | 0.05 sec | Release valve failure | Rare | 0.01 sec | Secondary valve opens | 0.01 sec
Note: the alarm-based mechanisms put a human in the loop.
CSE 466 – Fall 2000 - Introduction - 24
FMEA Example – notably missing the time factor!
(Columns: System | Fault | Hazard | Consequence | Design Considerations)
1. Secondary main hydrogen solenoid (non-latching)
(A) Hydrogen leaks → Fire; vents into chassis. Design: added anaerobic sealant, chassis venting
(B) Stuck open → no hazard. Design: primary solenoid backup
(C) Stuck closed → no hazard; system doesn't come online. Design: none
2. Primary main hydrogen solenoid (latching)
(A) Hydrogen leaks → Fire; vents into chassis. Design: chassis venting
(B) Stuck open → no hazard. Design: secondary solenoid backup
(C) Stuck closed → no hazard; system doesn't come online. Design: none
3. MEA breach: hydrogen leaks → Fire; hydrogen enters airstream. Design: low voltage detection closes solenoid, extended toray paper
4. Bleed hose (inside main electronics bay): hydrogen leaks → Fire; vents into chassis. Design: hydrogen sensors, chassis venting
5. Bleed hose (outside main electronics bay): hydrogen leaks → Fire; vents into chassis. Design: none
6. Bleed solenoid
(A) Stuck open → no hazard; continual bleed of hydrogen. Design: none
(B) Stuck closed → no hazard; long-term system degradation. Design: none
(C) Hydrogen leaks → Fire; vents into chassis. Design: chassis venting
7. Stepper motor / baffles
(A) Too open → no hazard; loss of efficiency. Design: home/not-home detection, temp monitoring
(B) Too closed → no hazard; loss of efficiency
8. Home sensor: stuck on/off → no hazard; no change in state of baffles detected. Design: software
9. Air filter / exhaust / inlet: gradual clogging or blockage → no hazard; loss of efficiency, no or reduced cooling air, overheating. Design: thermistors
10. Hydrogen sensor: (A) high impedance, (B) low impedance, (C) poisoned, (D) heater failure, (E) disconnection → no hazard or consequence listed. Design: valid high and low impedance levels
CSE 466 – Fall 2000 - Introduction - 25
Fault Tree Analysis – working backwards
Pacemaker Example
[Figure: fault tree, worked backwards from the hazard; the analysis exposes a single fault hazard]
CSE 466 – Fall 2000 - Introduction - 26
FMEA – Working Forward
Failure Mode: how a device can fail
Battery: never a voltage spike, only low voltage
Valve: stuck open? stuck closed?
Motor controller: stuck fast, stuck slow?
Hydrogen sensor: will the failure be latent, or mimic the presence of hydrogen?
FMEA: for each mode of each device, perform hazard analysis as in the previous flow chart
Huge search space
CSE 466 – Fall 2000 - Introduction - 27
2. Risk Assessment
Determine how risky your system is:
S: Extent of damage – slight injury, single death, several deaths, catastrophe
E: Exposure time – infrequent … continuous
G: Preventability – possible, impossible
W: Probability – low, medium, high
[Figure: TUV risk graph. Each path through the S1–S4, E1–E2, and G1–G2 parameters selects a row; the W3, W2, and W1 columns then give a risk level from 1 (lowest) to 8 (highest), with higher probability yielding higher risk.]
CSE 466 – Fall 2000 - Introduction - 28
Example Risk Assessment
Device | Hazard | Extent of Damage | Exposure Time | Hazard Prevention | Probability | TUV Risk Level
Microwave oven | Irradiation | S2 | E2 | G2 | W3 | 5
Pacemaker | Pacing too slowly / pacing too fast | S2 | E2 | G2 | W3 | 5
Power station burner control | Explosion | S3 | E1 | -- | W3 | 6
Airliner | Crash | S4 | E2 | G2 | W2 | 8
CSE 466 – Fall 2000 - Introduction - 29
3. Define the Safety Measures
Obviation: make it physically impossible (mechanical hookups, etc.)
Education: educate users to prevent misuse or dangerous use
Alarming: inform the users/operators, or higher level automatic monitors, of hazardous conditions
Interlocks: take steps to eliminate the hazard when conditions exist (shut off power, fuel supply, explode, etc.)
Restrict access: high voltage sources should be in compartments that require tools to access, w/ proper labels
Labeling
Consider tolerance time
Consider supervision of the system: constant, occasional, unattended. Airport people movers have to be designed to a much higher level of safety than attended trains, even if they both have fully automated control.
CSE 466 – Fall 2000 - Introduction - 30
4. Create Safe Requirements: Specifications
Document the safety functionality, e.g. "The system shall NOT pass more than 10mA through the ECG lead." Typically the use of NOT implies a much more general requirement about functionality…in ALL CASES.
Create Safe Designs
Start w/ a safe architecture
Keep hazard/risk analysis up to date
Search for common mode failures
Assign responsibility for safe design…hire a safety engineer
Design systems that check for latent faults
Use safe design practices…this is very domain specific; we will talk about software
CSE 466 – Fall 2000 - Introduction - 31
5. Implement Safety – Safe Software
Language Features
Compile Time Checking
Run Time Checking
Exception Handling
Re-use
Objects
Operating Systems
Protocols
Testing
Regression Testing
Exception Testing (Fault Seeding)
CSE 466 – Fall 2000 - Introduction - 32
Language Features
Compile Time Checking: lint-like tools. No overhead at run time (code size, execution time). Pascal is an example of a strongly typed language:

Program WontCompile1;
type
  MySubRange = 10 .. 20;
  Day = (Mo, Tu, We, Th, Fr, Sa, Su);
var
  MyVar: MySubRange;
  MyDate: Day;
begin
  MyVar := 9;   {will not compile – range error}
  MyDate := 0;  {will not compile – wrong type}
end.
But compile-time checking can't catch range errors that only arise from values computed at run time