Smart Home Technologies Decision Making. Motivation Intelligent Environments are aimed at improving...

Smart Home Technologies

Decision Making

Motivation

- Intelligent environments are aimed at improving the inhabitants’ experience and task performance
  - Provide appropriate information
  - Automate functions in the home
- Prediction techniques can only determine what would happen next, not what should happen next
  - Automated functions can be different from inhabitant actions
  - The computer has to determine the actions that would optimize the inhabitant experience

Decision Making

- Decision making attempts to determine the actions the system should take in the current situation
  - Should a function be automated?
  - What should be done next?
- Decisions should be based on the current context and the requirements of the inhabitants
  - Programmed timers alone are not sufficient for automation
  - The decision maker has to take into account the stream of data

Decision Making in Intelligent Environments

Example decision making tasks in intelligent environments:

- Automation of physical devices
  - Turn on lights
  - Regulate heating and air conditioning
  - Control media devices
  - Automate lawn sprinklers
  - Automate robotic components (vacuum cleaner, etc.)
- Control of information devices
  - Provide recipe services in the kitchen
  - Construct shopping lists
  - Decide which types of alarms to display (and where)

Decision Making in Intelligent Environments

- Objectives of decision making:
  - Optimize inhabitant productivity
  - Minimize operating costs
  - Maximize inhabitant comfort
- The decision making process has to be safe
  - Decisions made can never endanger inhabitants or cause damage
  - Decisions should be within the range accepted by the inhabitants

Example Task

Should a light be turned on?

- Decision factors:
  - Inhabitant’s location (current and future)
  - Inhabitant’s task
  - Inhabitant’s preferences
  - Time of the day
  - Other inhabitants
  - Energy efficiency
  - Security
- Possible decisions:
  - Turn on
  - Do not automate

Decision Making Approaches

- Pre-programmed decisions
  - Timer-based automation
- Reactive decision making systems
  - Decisions are based on condition-action rules
  - Decisions are driven by the available facts
- Goal-based decision making systems
  - Decisions are made in order to achieve a particular outcome
- Utility-based decision making systems
  - Decisions are made in order to maximize a given performance measure

Reactive Decision Making

Goal-Based Decision Making

Utility-Based Decision Making

Qualities of a Decision Maker

- Ideal
  - Complete: always makes a decision
  - Correct: decision is always right
  - Natural: knowledge easily expressed
  - Efficient
- Rational
  - Decisions made to maximize performance

Decision-Making Techniques

- Reactive decision making
  - Rule-based expert system
- Goal-based decision making
  - Planning
- Decision-theoretic decision making
  - Belief networks
  - Markov decision process
- Learning techniques
  - Neural networks
  - Reinforcement learning

Rule-Based Decision Making

- Decisions are made based on rules and facts
  - Facts represent the state of the environment
    - Represented in first-order predicate logic
  - Condition-action rules represent heuristic knowledge about what to do
    - Rules represent implications that imply actions from logic sentences about facts
- Inference mechanism
  - Deduction: $\{A,\ A \Rightarrow B\} \vdash B$
  - The left-hand sides of rules are matched against the set of facts
  - Rules whose left-hand side matches are active

Rule-Based Inference

- Rules define what actions should be executed for a given set of conditions (facts)
  - Actions can be either external actions (“automation”) or internal updates of the set of facts (“state update”)
  - Rules are often heuristics provided by an expert
- Multiple rules can be active at any given time
  - Conflict resolution decides which rule to fire
  - Active rules are scheduled to perform a sequence of actions

Example

- Facts:
  - CurrentTime = 6:30
  - Location(CurrentTime,bedroom)
  - CurrentDay = Monday
- Rules:
  - Internal actions:
    (CurrentDay=Monday)^(CurrentTime>6:00)^(CurrentTime<7:00)^(Location(CurrentTime,bedroom)) -> Set(Location(NextTime,bathroom))
  - External actions:
    (Location(NextTime,X)) -> Action(TurnOnLight,X)
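The example facts and rules can be traced with a very small forward-chaining loop. The sketch below is only an illustration in Python, not how CLIPS or JESS work internally: the fact encoding (a dictionary keyed by predicate), the function names `morning_rule`, `light_rule` and `forward_chain`, and the "fire everything that matches" strategy are all assumptions made here, and there is no conflict resolution or agenda.

```python
from datetime import time

# Working memory: the three facts from the example (encoding is an assumption).
facts = {
    "CurrentTime": time(6, 30),
    "CurrentDay": "Monday",
    ("Location", "CurrentTime"): "bedroom",
}

def morning_rule(f):
    """Internal rule: Monday, between 6:00 and 7:00, in the bedroom -> bathroom next."""
    if (f.get("CurrentDay") == "Monday"
            and time(6, 0) < f["CurrentTime"] < time(7, 0)
            and f.get(("Location", "CurrentTime")) == "bedroom"):
        return [("set", ("Location", "NextTime"), "bathroom")]
    return []

def light_rule(f):
    """External rule: turn on the light at the predicted next location X."""
    room = f.get(("Location", "NextTime"))
    return [("action", "TurnOnLight", room)] if room else []

def forward_chain(initial_facts, rules):
    """Fire all matching rules until neither the facts nor the action list change."""
    f = dict(initial_facts)
    actions = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for kind, key, value in rule(f):
                if kind == "set" and f.get(key) != value:
                    f[key] = value          # internal action: state update
                    changed = True
                elif kind == "action" and (key, value) not in actions:
                    actions.append((key, value))  # external action: automation
                    changed = True
    return f, actions

final_facts, actions = forward_chain(facts, [morning_rule, light_rule])
print(actions)
```

The internal rule first adds Location(NextTime,bathroom) to the facts; only then does the external rule match and request TurnOnLight for the bathroom.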

Rule-Based Expert Systems

- Intended to simulate (and automate) the human reasoning process
- Domain is modeled in first-order logic
  - State is represented by a set of facts
  - Internal rules model the behavior of the environment
- Experts provide sets of heuristic condition-action rules
  - Rules with internal actions can model the reasoning process
  - Rules with external actions indicate decisions the expert would make
- The system can optionally be provided with queries by including them in the fact set

Internal Rules

- Internal rules have to model the behavior of the system
  - Persistence over time
    E.g.: (Location(CurrentTime,X))^(NoMove(CurrentTime)) -> Set(Location(NextTime,X))
  - Dynamic behavior of devices
    E.g.: (Temperature(CurrentTime,X))^(HeatingOn) -> Set(Temperature(NextTime,X+2))
  - Behavior of the inhabitants
    E.g.: (Location(CurrentTime,bedroom))^(CurrentTime>23:00)^(LightOn(CurrentTime,bedroom)) -> Action(TurnOffLight,bedroom)

Rule-Based Expert Systems

[Figure: Rule-Based Expert System Architecture. Components: working memory (facts), rule base, execution engine, inference engine, pattern matcher, agenda.]

Logic Inference Systems and Expert System Shells

- Logic programming systems provide inference capabilities. Examples:
  - Prolog
  - OTTER
- Expert system shells provide the infrastructure to build complete expert systems. Examples:
  - CLIPS (for C)
  - JESS (for Java)

Example System: IRoom [Kul02]

- Initial versions of the MIT IRoom project used JESS as an inference engine to make decisions about activating devices
  - For example: if a person enters the room and the room is empty, then turn on the light
- Rules are programmed by the system designer before the room is used and then refined based on experience

[Kul02] Ajay Kulkarni. Design Principles of a Reactive Behavioral System for the Intelligent Room. 2002.

Rule-Based Decision Making

- Characteristics
  - Complete and correct (given complete rules)
  - Natural (given expert-specified rules)
- Advantages
  - Permits the system to be programmed relatively efficiently by an expert
  - Can address relatively complex systems
- Problems
  - Quality of the rules is essential
  - Behavior of the environment mimics the expert
  - Anticipating all possible contexts is difficult

Planning Decisions

- A planning system searches for a sequence of actions that can achieve a defined goal
  - States can be represented as logic sentences
  - Actions are defined as operators (symbolic representations of the effects and conditions of actions), which contain:
    - Preconditions of actions
    - Effects of actions
  - A goal is a set of states
- The planning system uses constraints to search efficiently for a sequence of operators that leads from the start state to a goal state

Example

- Initial state: (Location(bedroom))^(Light(bathroom,off))
- Goal: Happy(Inhabitant)
- Action 1: MakeHappy
  - Precondition: (Location(X))^(Light(X,on))
  - Effect: Add: Happy(Inhabitant)
- Action 2: TurnOnLight(X)
  - Precondition: Light(X,off)
  - Effect: Delete: Light(X,off), Add: Light(X,on)
- Action 3: Move(X, Y)
  - Precondition: (Location(X))^(Light(Y,on))
  - Effect: Delete: Location(X), Add: Location(Y)
- Plan: Action 2, Action 3, Action 1
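The plan in this example can be reproduced with a plain breadth-first search over states. This is a minimal sketch, assuming states are sets of ground facts and the operators are grounded over two rooms. It is a simple state-space search, not the partial-order or graph-based planners named elsewhere in these slides, and names such as `ROOMS`, `ground_actions` and `find_plan` are made up for the illustration.

```python
from collections import deque
from itertools import permutations

ROOMS = ["bedroom", "bathroom"]  # assumption: the only two rooms in the example

def ground_actions():
    """Instantiate the three operators as (name, preconditions, delete list, add list)."""
    actions = []
    for x in ROOMS:
        actions.append((f"MakeHappy({x})",
                        {("Location", x), ("Light", x, "on")},
                        set(),
                        {("Happy", "Inhabitant")}))
        actions.append((f"TurnOnLight({x})",
                        {("Light", x, "off")},
                        {("Light", x, "off")},
                        {("Light", x, "on")}))
    for x, y in permutations(ROOMS, 2):
        actions.append((f"Move({x},{y})",
                        {("Location", x), ("Light", y, "on")},
                        {("Location", x)},
                        {("Location", y)}))
    return actions

def find_plan(initial, goal):
    """Breadth-first search from the initial state to any state containing the goal."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    visited = {start}
    actions = ground_actions()
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:                      # all goal facts hold
            return steps
        for name, pre, delete, add in actions:
            if pre <= state:                   # operator is applicable
                nxt = frozenset((state - delete) | add)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None

initial = {("Location", "bedroom"), ("Light", "bathroom", "off")}
goal = {("Happy", "Inhabitant")}
steps = find_plan(initial, goal)
print(steps)
```

The search returns TurnOnLight, then Move, then MakeHappy, which is the order Action 2, Action 3, Action 1 given in the example.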

Example

[Figure: plan graph for the example. Start: Location(bedroom), Light(bathroom,off). TurnOnLight: Light(bathroom,off) to Light(bathroom,on). Move: Location(bedroom), Light(bathroom,on) to Location(bathroom). MakeHappy: Location(bathroom), Light(bathroom,on) to Happy(Inhabitant). Finish: Happy(Inhabitant).]

Example Planning Systems

- Partial order planners
  - Derive plans without having to find the actions in sequence
  - SNLP (Univ. of Washington)
  - GraphPlan (CMU)
    - Builds and prunes a graph of possible plans
- Conditional planners
  - Derive plans under uncertainty by constructing plans that work under given conditions
  - UCPOP (Univ. of Washington)
    - Partial Order Planner with Universal quantification and Conditional effects
  - CPOP
  - Sensory GraphPlan (CMU)

Planning Decisions

- Characteristics
  - Complete and correct (given complete rules)
  - Relatively natural formulation
- Advantages
  - Permits a sequence of actions to be found that performs a given task
  - Goals can be defined easily
- Problems
  - Requires a complete description of the system
  - Uncertainty is difficult to handle
  - Planning is generally very complex

Decision Theory

- Decision theory addresses rational decision making under uncertainty
  - Uncertainty is represented using probabilities
    - Uncertainty due to incomplete observability
    - Uncertainty due to nondeterministic action outcomes
    - Uncertainty due to nondeterministic system behavior
  - Utility theory is used to achieve rational decisions
    - Utility is a measure of the expected “value” of a given situation or decision
    - Rational decisions are the ones that yield the highest expected utility in the current situation

Modeling Uncertainty

- The current situation can be represented as a belief state, i.e. as a probability distribution over the states indicating the likelihood that any given state xi is the current state:
  {(x1, P(x1)), (x2, P(x2)), …, (xn, P(xn))}
  - The probability of a state can be expressed as the probability of all state attributes: P(x) = P(a1, a2, …, an)
- Uncertainties from incomplete observability and nondeterminism can be modeled as conditional probabilities
  - State transition model: $P(x_i^{t+1} \mid x_j^{t}, d)$
  - Observation model: $P(o \mid x)$

Bayes Rule

- Bayes rule makes it possible to invert cause and effect when calculating probabilities:

  $$P(c \mid e) = \frac{P(e \mid c)\, P(c)}{P(e)}$$

  - It is often easier to estimate P(e | c)
- The probability of a state given a set of sensor readings, P(x | o), can be calculated knowing the observation probabilities P(o | x)
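As a small worked illustration of computing P(x | o) from P(o | x), the sketch below applies Bayes rule to a two-state belief. The state names, the prior and the sensor probabilities are invented numbers for illustration only, and `bayes_update` is a hypothetical helper, not part of any library.

```python
def bayes_update(prior, likelihood):
    """Return P(x | o) for every state x, given P(x) and P(o | x) for one observation o."""
    unnormalized = {x: likelihood[x] * p for x, p in prior.items()}
    evidence = sum(unnormalized.values())        # P(o), the normalizing constant
    return {x: v / evidence for x, v in unnormalized.items()}

# Assumed numbers: the room is equally likely to be occupied or empty a priori,
# and a motion sensor fires with probability 0.9 if occupied and 0.2 if empty.
prior = {"occupied": 0.5, "empty": 0.5}
p_motion_given_state = {"occupied": 0.9, "empty": 0.2}

posterior = bayes_update(prior, p_motion_given_state)
print(posterior)
```

With these numbers, P(occupied | motion) = 0.45 / (0.45 + 0.10), i.e. the sensor reading raises the belief that the room is occupied from 0.5 to about 0.82.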

Utility Theory

- Utilities U(A) represent the “value” of a given situation or decision A and model preferences
  - The utility function for a particular system is not unique
  - Only relative differences between utility values are important
    - U(A) > U(B): A is preferred to B
    - U(A) = U(B): the agent is indifferent between A and B
- Utilities for uncertain situations can be calculated as the expected value of the utility of all possibilities:

  $$U(\{(x_1, P(x_1)), \ldots, (x_n, P(x_n))\}) = \sum_i P(x_i)\, U(x_i)$$

Rational Decisions

- The rational decision is the one that leads to the highest utility:

  $$d = \arg\max_b \sum_i P(x_i \mid o^{(0)}, \ldots, o^{(t)}, b)\, U(x_i)$$

  where b ranges over the candidate decisions
- Rational decisions in decision theory require:
  - A complete causal model of the environment, P(xi | xj, d)
  - Complete knowledge of the observation (sensor) model, P(o | xi)
  - Knowledge of the utility function for all states, U(xi)
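The selection of a rational decision can be sketched for the light example as follows. This is an illustration under assumptions: the belief over the current state is taken as already conditioned on the sensor readings, the transition model is deterministic, and all state names, utilities and the helper `rational_decision` are invented for the example.

```python
def rational_decision(belief, transition, utility, decisions):
    """Pick the decision with the highest expected utility over outcome states."""
    def expected_utility(d):
        total = 0.0
        for xj, p_xj in belief.items():                       # current state x_j
            for xi, p_xi in transition[(xj, d)].items():      # outcome state x_i
                total += p_xj * p_xi * utility[xi]
        return total
    return max(decisions, key=expected_utility)

# Assumed causal model P(x_i | x_j, d): the decision only changes the light.
transition = {
    ("occupied", "turn_on"): {"lit_occupied": 1.0},
    ("occupied", "do_nothing"): {"dark_occupied": 1.0},
    ("empty", "turn_on"): {"lit_empty": 1.0},
    ("empty", "do_nothing"): {"dark_empty": 1.0},
}
# Assumed utilities: light where needed is good, darkness with a person is bad,
# lighting an empty room wastes energy.
utility = {"lit_occupied": 1.0, "dark_occupied": -1.0, "lit_empty": -0.5, "dark_empty": 0.0}
decisions = ["turn_on", "do_nothing"]

likely_occupied = rational_decision({"occupied": 0.8, "empty": 0.2}, transition, utility, decisions)
likely_empty = rational_decision({"occupied": 0.1, "empty": 0.9}, transition, utility, decisions)
print(likely_occupied, likely_empty)
```

With a belief of 0.8 that the room is occupied, turning the light on has expected utility 0.7 against -0.8 for doing nothing; with a belief of 0.1 the comparison is -0.35 against -0.1, so the system does not automate.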

Markov Decision Processes

- Markov Decision Processes (MDPs) form a probabilistic model of all possible system behavior
- MDPs can be described by a tuple <S, A, T, R> representing states, actions, transition probabilities, and reinforcements
  - The system has to obey the Markov assumption:
    $P(x_{t+1} \mid x_t, d_t, x_{t-1}, d_{t-1}, \ldots, x_0) = P(x_{t+1} \mid x_t, d_t)$
  - Reinforcement represents the instantaneous change in utility obtained in a given state
    - Models costs and payoffs
    - Is generally sparse and delayed

Utility Function for MDPs

- In an MDP, the utility of a state under a given policy can be defined as the expected sum of discounted reinforcements:

  $$U^{\pi}(x) = E\left[\sum_{t} \gamma^{t} R(x_t)\right]$$

- The optimal utility function U* can be computed using value iteration:

  $$U_{t+1}(x_i) = R(x_i) + \gamma \max_d \sum_{x_j} P(x_j \mid x_i, d)\, U_t(x_j)$$

- The optimal policy (decision strategy) can be extracted from the utility function:

  $$\pi^*(x_i) = \arg\max_d \left( R(x_i) + \gamma \sum_{x_j} P(x_j \mid x_i, d)\, U^*(x_j) \right)$$

MDP Example

- S = {(1,1), (1,2), …, (4,3)}
- A = {↑, ↓, ←, →}
- T: P(intended direction) = 0.8, P(each right angle to the intended direction) = 0.1
- R: +1 at goal, -1 at trap, -0.04 in all other states
- γ = 1

MDP Example

[Figure: optimal policy and optimal utilities for the example.]
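Value iteration on this example can be sketched in a few lines. Several details are assumptions, since the slide does not state them: the layout follows the common 4x3 textbook grid (a blocked cell at (2,2), the goal at (4,3), the trap at (4,2)), goal and trap are terminal, bumping into a wall leaves the state unchanged, and the step reward is taken as -0.04 (with a positive step reward and no discounting the utilities would not stay finite).

```python
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
WALL = (2, 2)                              # assumption: blocked cell
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}    # assumption: goal and trap positions
STATES = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) != WALL]
STEP_REWARD, GAMMA = -0.04, 1.0

def reward(x):
    return TERMINALS.get(x, STEP_REWARD)

def move(x, delta):
    """Deterministic move; leaving the grid or hitting the wall keeps the state."""
    nxt = (x[0] + delta[0], x[1] + delta[1])
    return nxt if nxt in STATES else x

def transitions(x, d):
    """0.8 in the intended direction, 0.1 for each direction at a right angle."""
    dx, dy = ACTIONS[d]
    return [(0.8, move(x, (dx, dy))),
            (0.1, move(x, (dy, dx))),
            (0.1, move(x, (-dy, -dx)))]

def expected_utility(x, d, U):
    return sum(p * U[xj] for p, xj in transitions(x, d))

def value_iteration(tol=1e-6):
    U = {x: 0.0 for x in STATES}
    while True:
        new_U = {}
        for x in STATES:
            if x in TERMINALS:
                new_U[x] = reward(x)       # terminal utility is its reinforcement
            else:
                new_U[x] = reward(x) + GAMMA * max(
                    expected_utility(x, d, U) for d in ACTIONS)
        if max(abs(new_U[x] - U[x]) for x in STATES) < tol:
            return new_U
        U = new_U

U = value_iteration()
policy = {x: max(ACTIONS, key=lambda d: expected_utility(x, d, U))
          for x in STATES if x not in TERMINALS}
for row in (3, 2, 1):
    print([round(U[(c, row)], 3) if (c, row) in U else None for c in range(1, 5)])
```

The policy dictionary is extracted exactly as in the argmax formula: for each non-terminal state it picks the decision with the highest expected utility of the successor states.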

Markov Decision Processes

- Characteristics
  - Complete and correct
- Advantages
  - Takes into account transition uncertainty
  - Makes optimal decisions
  - Automatically calculates the utility function
- Problems
  - Requires a complete probabilistic description of the system
  - Requires complete observability of the state

Partially Observable MDPs

- Partially Observable Markov Decision Processes (POMDPs) extend MDPs by permitting states to be only partially observable
- Systems can be represented by a tuple <S, A, T, R, O, V>, where <S, A, T, R> is an MDP and O, V relate observations to the probabilities of the states that produce them
  - O = {oi} is the set of observations
  - V: V(x, o) = P(o | x)
- To determine an optimal policy, an optimal utility function for the belief states has to be computed
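Because a POMDP policy is defined over belief states, the belief has to be updated after every decision and observation. The sketch below shows one such update, combining a prediction step through T with a Bayes correction through V. The rooms, the probabilities and the helper `belief_update` are assumptions for illustration; computing the optimal utility function over beliefs is not attempted here.

```python
def belief_update(belief, decision, observation, T, V):
    """One filtering step: b'(x_i) is proportional to V(x_i, o) * sum_j T(x_i | x_j, d) * b(x_j)."""
    states = list(belief)
    # Prediction: push the current belief through the transition model.
    predicted = {
        xi: sum(T[(xj, decision)].get(xi, 0.0) * belief[xj] for xj in states)
        for xi in states
    }
    # Correction: weight by the observation model and renormalize.
    unnormalized = {xi: V[(xi, observation)] * predicted[xi] for xi in states}
    total = sum(unnormalized.values())
    return {xi: v / total for xi, v in unnormalized.items()}

# Assumed transition model for the decision "wait".
T = {
    ("bedroom", "wait"): {"bedroom": 0.7, "bathroom": 0.3},
    ("bathroom", "wait"): {"bathroom": 0.8, "bedroom": 0.2},
}
# Assumed observation model V(x, o) = P(o | x) for a bathroom motion sensor.
V = {
    ("bedroom", "bathroom_motion"): 0.1,
    ("bathroom", "bathroom_motion"): 0.9,
}

belief = {"bedroom": 1.0, "bathroom": 0.0}
new_belief = belief_update(belief, "wait", "bathroom_motion", T, V)
print(new_belief)
```

Starting from certainty that the inhabitant is in the bedroom, the prediction step gives (0.7, 0.3) and the motion reading shifts the belief to 0.07 : 0.27, i.e. about 0.79 for the bathroom.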

POMDPs

- Characteristics
  - Complete and correct
- Advantages
  - Takes into account all uncertainty
  - Makes optimal decisions
- Problems
  - Requires a complete probabilistic description of the system
  - The optimal solution is so far intractable (dynamic decision networks and approximation techniques exist and work for small state spaces)

Learning Decisions

- Learning techniques permit decisions to be learned from past experience and feedback from the inhabitants or the environment
- Supervised learning
  - Requires the desired decision to be specified during training
- Reinforcement learning
  - Learns by experimentation from scalar reward feedback
    - Inhabitant feedback (e.g. device interactions)
    - Explicit environment feedback (e.g. energy consumption)
    - Implicit feedback (e.g. prediction of comfort of inhabitant)

Feedforward Neural Networks

- Neural networks are a supervised learning mechanism that can be trained to make decisions based on a set of training examples
- Training for reactive decisions involves the presentation of a set of examples (xi, d(xi)), where d(xi) is the desired decision to be made in configuration xi
- Training for goal-based or utility-based decisions involves learning a model that maps the input (xi, d(xi)) to the outcome of the action, f(xi, d(xi)), and then selecting the decision with the best outcome

Example System: Regulation in the Adaptive House [DLRM94]

- A neural network learns to regulate the lights in the house to maintain a given light intensity

1. Learns a network that predicts the light intensity if a given set of lights is turned on
   - Input:
     - The current light device levels (7 inputs)
     - The current light sensor levels (4 inputs)
     - The new light device levels (7 inputs)
   - Output:
     - The new light sensor levels (4 outputs)

[DLRM94] Dodier, R. H., Lukianow, D., Ries, J., & Mozer, M. C. (1994). A comparison of neural net and conventional techniques for lighting control. Applied Mathematics and Computer Science, 4, 447-462.


Example System: Regulation in the Adaptive House continued

2. Decisions are made by comparing the output of the network for all possible decisions (i.e. combinations of lights to be turned on) with the desired light intensity and taking the decision that most closely matches it.

Decision:

$$d(x_i) = \arg\min_d \lVert p - f(x_i, d) \rVert$$

where xi is the state, d a candidate decision, f(xi, d) the network's prediction, and p the set point.
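The decision step can be sketched as an exhaustive search over light combinations. In this illustration the trained network is replaced by a hand-written linear stand-in, `predict`, and the problem is shrunk to three on/off lights and two sensors; the gains, the ambient levels and the set point are invented numbers, not values from the Adaptive House.

```python
from itertools import product
from math import dist

# Assumed contribution of each light to each of the two sensors when switched on.
GAINS = [(30.0, 5.0), (10.0, 20.0), (5.0, 30.0)]

def predict(state, decision):
    """Stand-in for the learned model f(x_i, d): predicted sensor levels."""
    return tuple(
        level + sum(gain[k] * on for gain, on in zip(GAINS, decision))
        for k, level in enumerate(state)
    )

def choose_decision(state, set_point):
    """Return the on/off combination whose prediction is closest to the set point."""
    candidates = product((0, 1), repeat=len(GAINS))
    return min(candidates, key=lambda d: dist(set_point, predict(state, d)))

ambient = (10.0, 10.0)        # current sensor levels with all lights off
set_point = (50.0, 35.0)      # desired sensor levels p
best = choose_decision(ambient, set_point)
print(best, predict(ambient, best))
```

Enumerating all combinations is feasible here because there are only 2^3 candidates; with seven multi-level devices as in the slide, the candidate set grows quickly and the same comparison would be made over a larger set.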

Neural Networks

- Characteristics
  - Efficient
- Advantages
  - Can learn arbitrary decision functions from training data
  - Generalizes to new situations
  - Fast decision making
- Problems
  - Requires training data that contains the desired decision or a goal/objective
  - Requires the design of a sufficient input representation

Reinforcement Learning

- Reinforcement learning learns an optimal decision strategy from trial and error and sparse reward feedback
  - On-line method to solve Markov Decision Processes (or, with extensions, POMDPs)
  - Reward, R, is a signal encoding the instantaneous feedback to the system
  - The system learns a mapping from states to decisions, π*(xi), which optimizes the expected utility

Q-Learning

- Q-learning is the most popular reinforcement learning technique for MDPs
  - Learns a utility function for state-action pairs, Q(x, d)
  - Utility: U(x) = max_d Q(x, d)
- Learns by experimentation
  - Updates Q(xi, d) after each observed transition from state xi by comparing the expected utility of (xi, d) with the expectation computed after observing the actual outcome xj:

    $$Q(x_i, d) \leftarrow Q(x_i, d) + \alpha \left( R(x_i) + \gamma \max_{d'} Q(x_j, d') - Q(x_i, d) \right)$$

    where α is the learning rate and γ the discount factor
- Decisions are made to optimize Q-values:

  $$\pi(x) = \arg\max_d Q(x, d)$$
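The update rule can be tried on a toy problem. The sketch below is a tabular Q-learner on a made-up four-state corridor with a terminal goal at one end. The environment, the epsilon-greedy exploration, the parameter values and the treatment of the terminal state (its utility is taken to be its reward) are all assumptions for illustration.

```python
import random

STATES = [0, 1, 2, 3]            # hypothetical corridor; state 3 is the terminal goal
ACTIONS = [-1, +1]               # move left or right
GOAL = 3
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2

def reward(x):
    """Reinforcement R(x) received in state x."""
    return 1.0 if x == GOAL else -0.04

def step(x, d):
    """Deterministic transition, clipped at both ends of the corridor."""
    return min(max(x + d, 0), GOAL)

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = {(x, d): 0.0 for x in STATES for d in ACTIONS}
    for _ in range(episodes):
        x = 0
        for _ in range(100):                       # cap on episode length
            if x == GOAL:
                break
            if rng.random() < EPSILON:             # explore
                d = rng.choice(ACTIONS)
            else:                                  # exploit current Q-values
                d = max(ACTIONS, key=lambda a: Q[(x, a)])
            xj = step(x, d)
            # Utility of the observed outcome: reward if terminal, else max_d' Q(xj, d').
            future = reward(xj) if xj == GOAL else max(Q[(xj, a)] for a in ACTIONS)
            Q[(x, d)] += ALPHA * (reward(x) + GAMMA * future - Q[(x, d)])
            x = xj
    return Q

Q = q_learning()
policy = {x: max(ACTIONS, key=lambda a: Q[(x, a)]) for x in STATES if x != GOAL}
print(policy)
```

The learner never sees the transition model; it only observes transitions and rewards. The greedy policy is read off the learned table with the same argmax used for decision making.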

Example System: Regulation in the Adaptive House [Moz98]

- Neural network regulators can control lighting and heating to achieve a given set point
- The set point is learned using reinforcement
  - Energy usage
  - Inhabitant interactions with light switches or thermostats

[Moz98] Mozer, M. C. The neural network house: An environment that adapts to its inhabitants. In Proc. AAAI Spring Symposium on Intelligent Environments (pp. 110-114). Menlo Park, CA, 1998.

Example System: MavHome

- Uses Q-learning on a state space including device status and the Active LeZi prediction
  - State st at time t: st = (xt, pt)
  - Reinforcement includes multiple metrics
    - Energy usage
    - Number of inhabitant-device interactions
  - Decisions are device interactions and an action representing the decision not to perform an action
- The system is event-driven, making a decision every time an event happens
- The learner is pre-trained using the Active LeZi predictor
- Example task: getting up in the morning and taking a shower
  - The home learns to automate light activations so as to minimize energy usage without increasing the number of inhabitant interactions

Reinforcement Learning

- Characteristics
  - Optimal policies (given enough training)
- Advantages
  - Can learn optimal decision strategies without explicit training
  - Can deal with multiple objectives
- Problems
  - Trial-and-error learning can lead to spurious actions and thus to potential safety issues
  - Requires complete state space representations
  - Can be very complex

Conclusions

- Decision making is an integral component of intelligent environments
  - Automates devices
  - Determines which information to present to inhabitants
- Different decision making approaches apply to different conditions, based on the available information
  - Reactive / goal-based / utility-based
  - Programmed / learning
- Decision-making approaches can be “mixed”
- Many open issues remain:
  - How to deal with the complexity of intelligent environments? (Hierarchical systems, multi-agent systems, etc.)
  - How to assure the safety and acceptability of learning decision makers?
