Page 1

Self-* Networks of Unmanned Vehicles

CSE 597c, Fall 2006

Introduction to Self-* Systems
Sept. 14, 2006

Bhuvan Urgaonkar

Page 2

Definition

Self-* systems: the “*” is a wildcard, as in a regular expression
Self-tuning, self-configuring, self-healing, self-stabilizing, …

Autonomic computing [IBM]: inspired by the autonomic nervous system of living organisms
In humans and other vertebrates, the part of the nervous system that regulates the involuntary activity of the heart, intestines, and glands

Page 3

Some History

What do you think the first self-* system was?

The wind/water mill?

Emergence of (semi-)autonomous systems starting with the Industrial Revolution
Steam engine, printing press, car, …
Could carry out certain tasks without human intervention
Development of feedback-control theory and signal processing

Thermostat (Albert Butz of the Thermo-Electric Regulator Co., Minneapolis, 1885)

Cruise control

Early 20th century onwards: major advances in engineering and the emergence of computing
Now you could program a mechanical/electrical/… system

More complex autonomous systems

Page 4

More History

Artificial Intelligence
Make a machine/computer do what a (smart/able) human can do
Learn the way a human does
Sometimes easy, very often not!

The Turing Test
A computer that can pose as a human passes the Turing test
A definition of self-*-ness?

Would imitating human behavior alone be enough?

Page 5

Complexity of Modern Systems

Computer systems grew in complexity
Other systems did as well, but let's talk about CS
NYTimes: “All science is computer science”
Complex h/w and s/w, distributed systems, heterogeneity, …
Unlike WWII-era machinery, these systems can't be run by a layperson handed a manual!
IBM's DB2 database server has about 80 parameters!
Modern systems operate under highly dynamic conditions
Operation based on human intervention is often infeasible:
Error-prone
Slow
Expensive
…

Page 6

Operating Environments that Prohibit Human Participation

Robots or machines operating in mines, under the ocean, in volcanic areas, …
They must “take care” of themselves

Page 7

Defining Self-*-ness

The Turing test doesn't quite capture self-*-ness
Sometimes we want better than what even the smartest/fastest human can do!
Not quite the same as the original AI goal, and not a superset of it
Some intersection, but also some orthogonal requirements

Page 8

Outline
Motivation and history
Examples
Self-* networks/distributed systems
Relevant areas/useful techniques
Summary

Page 9

Example 1: General-purpose Operating Systems

CPU scheduling and memory management
The first computers did batch processing of jobs
A human would schedule the jobs

Multiprogramming then emerged
Dynamically changing set of processes
Interleaving of computation and I/O
Response-time-sensitive processes such as editors
The CPU scheduler had to adapt to these dynamics

Self-tuning behavior was desired (see the sketch below)
The same holds for the memory manager
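To make the scheduler's self-tuning concrete, here is a minimal sketch (my illustration, not from the lecture) of a multilevel-feedback-queue-style policy: a process that burns its entire time slice is treated as CPU-bound and demoted, while one that blocks early (e.g., an editor waiting on keystrokes) is promoted. The level count, quanta, and function names are illustrative.

```python
# Hypothetical sketch of a multilevel-feedback-queue-style priority adjustment.
# A process that burns its whole quantum is likely CPU-bound -> demote it;
# a process that blocks early is likely interactive -> promote it.

NUM_LEVELS = 3             # 0 = highest priority, 2 = lowest
QUANTUM_MS = [10, 20, 40]  # longer slices for lower-priority (CPU-bound) work

def adjust_priority(level: int, ran_ms: int, quantum_ms: int) -> int:
    """Return the new queue level after a process runs (or blocks)."""
    if ran_ms >= quantum_ms:                 # used the full slice: CPU-bound
        return min(level + 1, NUM_LEVELS - 1)
    return max(level - 1, 0)                 # blocked early: interactive

# Example: an editor that blocks after 1 ms climbs back to level 0,
# while a number-cruncher drifts down to level 2.
print(adjust_priority(1, 1, QUANTUM_MS[1]))   # -> 0
print(adjust_priority(1, 20, QUANTUM_MS[1]))  # -> 2
```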

Page 10

Self-tuning Systems

[Figure: the external environment (including inputs) drives the system components, which produce the system output (e.g., performance); a feedback path carries the output back into the system]

Keep output within desired bounds even when the external environment is changing
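As a concrete illustration of the loop sketched above, here is a toy proportional controller, assuming a made-up response-time model: it observes the output, compares it to a target, and adjusts an internal knob (server count) as the external load changes. It is a sketch of the general idea, not any particular system.

```python
# Toy proportional feedback controller keeping response time near a target
# by adjusting capacity (number of servers). All numbers are illustrative.

TARGET_MS = 100.0   # desired response time
GAIN = 0.02         # proportional gain: how aggressively to react to error

def measured_response_ms(load_rps: float, servers: int) -> float:
    # Stand-in for the real system: more load -> slower, more servers -> faster.
    return 1000.0 * load_rps / (servers * 50.0)

servers = 4
for load in [100, 150, 220, 300]:                   # external environment changes
    rt = measured_response_ms(load, servers)        # observe the output
    error = rt - TARGET_MS                          # feedback signal
    servers = max(1, servers + round(GAIN * error)) # tune internal state
    print(f"load={load} rps  response={rt:.0f} ms  -> servers={servers}")
```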

Page 11

Example 2: Mission-critical Operating Systems

OSes running on spacecraft
The system had to discover errors and recover on its own

Self-healing systems
Initial/simple solutions: a high degree of redundancy
Introduce redundancy to deal with failures
Implement mechanisms to quickly discover failures
OK for a spacecraft, but not for a more “down-to-earth” system
Could be very expensive
How can a system self-heal without excessive redundancy?

Later: software became very complex
S/w failures became a far more serious problem than h/w failures!
Software engineering, programming languages

Page 12

Self-healing Systems

Keep output within reasonable bounds even when internal components fail

What's different from a self-tuning system?
Failures are internal events; changes in the operating environment are external events
Note: failures might be induced by external events

[Figure: the same feedback loop as before (external environment → system → output → feedback), but now with a component failure inside the system]
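A minimal, hypothetical sketch of the self-healing loop described on this slide: a monitor uses health checks (the feedback path) to detect failed components and pulls replacements from a small pool of spares so the output stays within bounds. Component names and the failure probability are made up.

```python
import random

# Toy self-healing loop: detect failed workers and replace them from spares,
# so the service keeps meeting its capacity target. Purely illustrative.

CAPACITY_TARGET = 3          # workers needed to keep output within bounds
workers = ["w1", "w2", "w3"]
spares = ["s1", "s2"]

def is_healthy(worker: str) -> bool:
    # Stand-in for a real health check; here a worker fails 20% of the time.
    return random.random() > 0.2

for tick in range(5):
    workers = [w for w in workers if is_healthy(w)]   # detect failures
    while len(workers) < CAPACITY_TARGET and spares:  # heal: pull in spares
        workers.append(spares.pop())
    status = "OK" if len(workers) >= CAPACITY_TARGET else "DEGRADED"
    print(f"tick {tick}: {len(workers)} workers -> {status}")
```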

Page 13

Self-Stabilization

[Figure: state space with green = good states, blue = bad states]
Guaranteed to return to a good state eventually, on its own
Related to fault tolerance

How?
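One classic answer to the “How?” question is Dijkstra's K-state self-stabilizing token ring (added here as an illustration; it is not on the slide): whatever state the counters start in, the ring converges on its own to exactly one circulating privilege.

```python
# Sketch of Dijkstra's K-state self-stabilizing mutual exclusion on a ring.
# n processes hold counters x[0..n-1] with K > n. From ANY initial state the
# system converges to exactly one "privileged" process at a time.

n, K = 4, 5
x = [3, 1, 4, 1]   # arbitrary (possibly illegitimate) starting state

def privileged(i):
    if i == 0:
        return x[0] == x[n - 1]
    return x[i] != x[i - 1]

for step in range(12):
    holders = [i for i in range(n) if privileged(i)]
    print(f"step {step}: x={x} privileged={holders}")
    i = holders[0]                     # let one privileged process move
    if i == 0:
        x[0] = (x[0] + 1) % K
    else:
        x[i] = x[i - 1]
```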

Page 14

Classification of Self-* Systems

Self-tuning: performance
Self-healing: failure handling
Self-stabilizing: convergence

Is this a good classification?
Note: not necessarily a non-intersecting classification

Page 15

Defining Self-*-ness (contd.)

First, define it for each member of our classification

Page 16

Quantifying Self-tunability

How good is the system at meeting performance targets under dynamic operating conditions?
E.g., can the system ensure that response-time degradation is always at most proportional to the increase in request arrivals?
Note: the system can change its internal state (e.g., increase its capacity dynamically) to achieve its goal
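One hedged way to turn the question above into a number (my formulation, not the lecture's): take the worst-case ratio of relative response-time growth to relative load growth over a set of measurements; a score of at most 1 means degradation stayed at most proportional to the load increase.

```python
# Hypothetical "self-tunability score": worst-case ratio of relative
# response-time growth to relative load growth. <= 1.0 means degradation
# stayed at most proportional to the load increase.

def tunability_score(samples):
    """samples: list of (load_rps, response_ms), ordered by increasing load."""
    worst = 0.0
    base_load, base_rt = samples[0]
    for load, rt in samples[1:]:
        rel_load = (load - base_load) / base_load
        rel_rt = (rt - base_rt) / base_rt
        if rel_load > 0:
            worst = max(worst, rel_rt / rel_load)
    return worst

# Load doubles (+100%) while response time rises 60%: score 0.6 -> acceptable.
print(tunability_score([(100, 50), (150, 65), (200, 80)]))
```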

Page 17

Quantifying the Goodness of a Self-healing System

How good is the system at maintaining functionality under failures?
E.g. 1: Can the system continue functioning even after N failures?
E.g. 2: Can the system continue to offer the same response time even after N failures?
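A standard numeric companion to “can the system survive N failures?”: with k independent replicas, each up with probability p, the service survives as long as at least one replica is up, with probability 1 - (1 - p)^k. The values below are illustrative.

```python
# Probability that a service with k independent replicas stays up, assuming
# each replica is available with probability p (values are illustrative).

def service_availability(p: float, k: int) -> float:
    return 1.0 - (1.0 - p) ** k

for k in (1, 2, 3):
    print(f"{k} replica(s), p=0.99 -> availability {service_availability(0.99, k):.6f}")
# 1 replica: 0.99, 2 replicas: 0.9999, 3 replicas: 0.999999
```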

Page 18

Quantifying the Goodness of a Self-stabilizing System

How long does it take the system to return to a good state after a perturbation?

Page 19

Defining Self-*-ness (contd.)

One approach: define a vector whose individual elements characterize self-tunability, goodness of self-healing, and self-stabilization

E.g., <ST=excellent, SH=poor, SS=good>

Conflicting goals!
E.g., maintaining performance might require fewer components; dealing with failures might require redundancy

Need to understand what is more important
Context dependent: the relative importance of the various self-* properties varies across systems
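One hypothetical way to make the vector and its context-dependence concrete (not from the slides): score each self-* property and combine the scores with context-specific weights, so the same system can look good in one deployment and poor in another.

```python
# Hypothetical combination of a self-* vector with context-dependent weights.
scores = {"ST": 0.9, "SH": 0.3, "SS": 0.7}   # self-tuning, self-healing, self-stabilizing

contexts = {
    "web cache":     {"ST": 0.6, "SH": 0.2, "SS": 0.2},  # performance matters most
    "spacecraft OS": {"ST": 0.1, "SH": 0.7, "SS": 0.2},  # surviving failures matters most
}

for name, weights in contexts.items():
    overall = sum(weights[k] * scores[k] for k in scores)
    print(f"{name}: overall self-*-ness = {overall:.2f}")
# Same system, different verdicts: ~0.74 for the cache, ~0.44 for the spacecraft OS.
```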

Page 20

Outline
Motivation and history
Examples
Self-* networks/distributed systems
Relevant areas/useful techniques
Summary

Page 21

Distributed Systems

How do things change?

Cons: problems associated with a distributed system
Data consistency
Larger communication delays
Heterogeneity
More failures, and more kinds of failures
…

Pros:
More sources of redundancy might mean better self-healing
More resources might mean more options to self-tune
Any more?

Page 22

Example 3: Networking: TCP/IP

Simple AIMD-based congestion control
Decentralized, implemented only at the end points
Has worked pretty well!
Scaled to the current Internet
I consider TCP a good self-tuning protocol (see the AIMD sketch below)

What about link failures and how IP handles them?
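To make the AIMD mention concrete, here is a minimal sketch of additive-increase/multiplicative-decrease as used in TCP congestion avoidance: the congestion window grows by one segment per loss-free round trip and is halved when a loss is detected. It omits slow start, timeouts, and everything else in the real protocol.

```python
# Minimal AIMD (additive increase, multiplicative decrease) sketch, as used by
# TCP congestion avoidance. Not the full protocol: no slow start, timeouts, etc.

cwnd = 4.0   # congestion window, in segments

def on_rtt(loss_detected: bool) -> None:
    """Update the window once per round trip."""
    global cwnd
    if loss_detected:
        cwnd = max(1.0, cwnd / 2.0)    # multiplicative decrease
    else:
        cwnd += 1.0                    # additive increase

# Simulate a few RTTs: loss in rounds 5 and 9.
for rtt in range(1, 11):
    on_rtt(loss_detected=rtt in (5, 9))
    print(f"RTT {rtt}: cwnd = {cwnd:.1f}")
```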

Page 23

Example 4: Enterprise/Utility Computing

Varying workloads, complex applications
Human management is infeasible and error-prone

How to manage resources to maximize revenue while meeting client requirements?
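A toy illustration of the resource-management question (all application names, minimums, and revenue figures are invented): first satisfy each client's minimum requirement, then hand remaining servers greedily to the application with the highest revenue per extra server.

```python
# Toy utility-computing allocator: satisfy each client's minimum first, then
# give remaining servers to whoever earns the most revenue per extra server.
# All applications, minimums, and revenue figures are hypothetical.

apps = {             # app -> (minimum servers, revenue per extra server)
    "web-store":  (2, 120),
    "analytics":  (1, 80),
    "batch-jobs": (1, 30),
}
TOTAL_SERVERS = 8

alloc = {name: need for name, (need, _) in apps.items()}   # meet requirements
spare = TOTAL_SERVERS - sum(alloc.values())

while spare > 0:
    # Greedy step: the next server goes to the highest marginal-revenue app.
    best = max(apps, key=lambda a: apps[a][1])
    alloc[best] += 1
    spare -= 1

print(alloc)   # -> {'web-store': 6, 'analytics': 1, 'batch-jobs': 1}
```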

Page 24

Example 5: Search Engine: Google

Web content is highly dynamic
Self-tuning:
How good is the search engine at keeping up with changes in Web content?

Self-healing:
Thousands of servers and disks in their data centers, failures every few hours!
Does google.com keep working despite these failures?
How much human intervention does this need?
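A back-of-the-envelope check of the “failures every few hours” claim, using assumed numbers (the fleet size and per-server MTBF are guesses, not figures from the slide): the aggregate failure rate scales with the number of machines.

```python
# Back-of-the-envelope: expected time between server failures in a large fleet,
# assuming independent failures. Fleet size and per-server MTBF are guesses.

SERVERS = 10_000
MTBF_HOURS = 3 * 365 * 24        # assume each server fails about once in 3 years

fleet_failures_per_hour = SERVERS / MTBF_HOURS
hours_between_failures = 1 / fleet_failures_per_hour
print(f"~{fleet_failures_per_hour:.2f} failures/hour, "
      f"i.e. one every ~{hours_between_failures:.1f} hours")
# -> roughly one failure every 2.6 hours
```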

Page 25

Outline
Motivation and history
Examples
Self-* networks/distributed systems
Relevant areas/useful techniques
Summary

Page 26

Relevant areas/useful techniques

Multi-criteria optimization techniques (economics)
Analytical modeling (e.g., to infer the resource needs of an application)
Measurement techniques
Feedback-control theory (reactive)
Statistical techniques for prediction and learning (reactive + proactive)
Biological, ecological, and social networks
How do termites with pinhead-sized brains build air-conditioned colonies?

Theoretical CS: online algorithms, approximation algorithms
Distributed computing
Systems issues
Efficient and bug-free software, prototyping, simulation, experiment design

Page 27

Outline
Motivation and history
Examples
Self-* networks/distributed systems
Relevant areas/useful techniques
Summary

Page 28

Summary: Key Principles

Keep it simple, silly!

Occam’s razor
E.g., partial automation vs. complete automation

Understand and define system goals clearly
Which self-* properties are essential, and which are not?

Understand system properties, operating environments

One size may not fit all
Measurements
Prediction, classification, learning, feedback control

Design for agility (assuming online operation)
Efficient algorithms and systems mechanisms