Smart Home Technologies Decision Making. Motivation Intelligent Environments are aimed at improving...
Smart Home Technologies
Decision Making
Motivation
  Intelligent Environments are aimed at improving the inhabitants’ experience and task performance
    Provide appropriate information
    Automate functions in the home
  Prediction techniques can only determine what would happen next, not what should happen next
    Automated functions can be different from inhabitant actions
    The computer has to determine the actions that would optimize the inhabitant experience
Decision Making
  Decision making attempts to determine the actions the system should take in the current situation
    Should a function be automated?
    What should be done next?
  Decisions should be based on the current context and the requirements of the inhabitants
    Programmed timers alone are not sufficient for automation
    The decision maker has to take into account the stream of data
Decision Making in Intelligent Environments
  Example decision making tasks in Intelligent Environments:
    Automation of physical devices
      Turn on lights
      Regulate heating and air conditioning
      Control media devices
      Automate lawn sprinklers
      Automate robotic components (vacuum cleaner, etc.)
    Control of information devices
      Provide recipe services in the kitchen
      Construct shopping lists
      Decide which types of alarms to display (and where)
Decision Making in Intelligent Environments
  Objectives of decision making:
    Optimize inhabitant productivity
    Minimize operating costs
    Maximize inhabitant comfort
  The decision making process has to be safe
    Decisions made can never endanger inhabitants or cause damage
    Decisions should be within the range accepted by the inhabitants
Example Task
  Should a light be turned on?
  Decision factors:
    Inhabitant’s location (current and future)
    Inhabitant’s task
    Inhabitant’s preferences
    Time of the day
    Other inhabitants
    Energy efficiency
    Security
  Possible decisions:
    Turn on
    Do not automate
Decision Making Approaches
  Pre-programmed decisions
    Timer-based automation
  Reactive decision making systems
    Decisions are based on condition-action rules
    Decisions are driven by the available facts
  Goal-based decision making systems
    Decisions are made in order to achieve a particular outcome
  Utility-based decision making systems
    Decisions are made in order to maximize a given performance measure
Reactive Decision Making
Goal-Based Decision Making
Utility-Based Decision Making
Qualities of a Decision Maker
  Ideal
    Complete: always makes a decision
    Correct: decision is always right
    Natural: knowledge easily expressed
    Efficient
  Rational
    Decisions made to maximize performance
Decision-Making Techniques
  Reactive decision making
    Rule-based expert systems
  Goal-based decision making
    Planning
  Decision-theoretic decision making
    Belief networks
    Markov decision processes
  Learning techniques
    Neural networks
    Reinforcement learning
Rule-Based Decision Making
  Decisions are made based on rules and facts
    Facts represent the state of the environment
      Represented as first-order predicate logic
    Condition-action rules represent heuristic knowledge about what to do
      Rules represent implications that derive actions from logic sentences about facts
  Inference mechanism
    Deduction: $\{A,\ A \rightarrow B\} \vdash B$
    The left-hand sides of rules are matched against the set of facts
    Rules whose left-hand side matches are active
Rule-Based Inference
  Rules define what actions should be executed for a given set of conditions (facts)
    Actions can either be external actions (“automation”) or internal updates of the set of facts (“state update”)
    Rules are often heuristics provided by an expert
  Multiple rules can be active at any given time
    Conflict resolution decides which rule to fire
    Active rules are scheduled to perform a sequence of actions
Example
  Facts:
    CurrentTime = 6:30
    Location(CurrentTime,bedroom)
    CurrentDay = Monday
  Rules:
    Internal actions:
      (CurrentDay=Monday) ^ (CurrentTime>6:00) ^ (CurrentTime<7:00) ^ (Location(CurrentTime,bedroom)) -> Set(Location(NextTime,bathroom))
    External actions:
      (Location(NextTime,X)) -> Action(TurnOnLight,X)
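The example above can be traced with a small forward-chaining loop. This is only a sketch under stated assumptions: the tuple encoding of facts and the helper names (internal_rule, external_rule, run) are hypothetical and are not taken from CLIPS, JESS, or any other expert system shell.

```python
from datetime import time

# Working memory: a hypothetical tuple encoding of the facts in the example.
facts = {
    ("CurrentDay", "Monday"),
    ("CurrentTime", time(6, 30)),
    ("Location", "CurrentTime", "bedroom"),
}

def current_time(facts):
    """Look up the value of the CurrentTime fact."""
    return next(f[1] for f in facts if f[0] == "CurrentTime")

def internal_rule(facts):
    """Monday, between 6:00 and 7:00, inhabitant in bedroom
    -> Set(Location(NextTime, bathroom))."""
    t = current_time(facts)
    if (("CurrentDay", "Monday") in facts
            and time(6, 0) < t < time(7, 0)
            and ("Location", "CurrentTime", "bedroom") in facts):
        return {("Location", "NextTime", "bathroom")}
    return set()

def external_rule(facts):
    """Location(NextTime, X) -> Action(TurnOnLight, X)."""
    return [("TurnOnLight", f[2]) for f in facts
            if f[0] == "Location" and f[1] == "NextTime"]

def run(facts):
    """Fire the internal rule until no new facts appear, then collect actions."""
    facts = set(facts)
    while True:
        new_facts = internal_rule(facts) - facts
        if not new_facts:
            break
        facts |= new_facts
    return facts, external_rule(facts)

final_facts, actions = run(facts)
print(actions)  # [('TurnOnLight', 'bathroom')]
```

The internal rule only updates the fact set, while the external rule is the one that produces an automation decision, which mirrors the split between state updates and external actions.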
Rule-Based Expert Systems
  Intended to simulate (and automate) the human reasoning process
  Domain is modeled in first-order logic
    State is represented by a set of facts
    Internal rules model the behavior of the environment
  Experts provide sets of heuristic condition-action rules
    Rules with internal actions can model the reasoning process
    Rules with external actions indicate decisions the expert would make
  The system can optionally be provided with queries by including them in the fact set
Internal Rules
  Internal rules have to model the behavior of the system
    Persistence over time
      E.g.: (Location(CurrentTime,X)) ^ (NoMove(CurrentTime)) -> Set(Location(NextTime,X))
    Dynamic behavior of devices
      E.g.: (Temperature(CurrentTime,X)) ^ (HeatingOn) -> Set(Temperature(NextTime,X+2))
    Behavior of the inhabitants
      E.g.: (Location(CurrentTime,bedroom)) ^ (CurrentTime>23:00) ^ (LightOn(CurrentTime,bedroom)) -> Action(TurnOffLight,bedroom)
Rule-Based Expert Systems
  (Figure: Rule-Based Expert System Architecture. Components shown: Working Memory (Facts), Rule Base, Execution Engine, Inference Engine, Pattern Matcher, Agenda)
Logic Inference Systems and Expert System Shells
  Logic programming systems provide inference capabilities
    Examples: Prolog, OTTER
  Expert system shells provide the infrastructure to build complete expert systems
    Examples: CLIPS (for C), JESS (for Java)
Example System: IRoom [Kul02]
  Initial versions of the MIT IRoom project used JESS as an inference engine to make decisions about activating devices
    For example: if a person enters the room and the room is empty, then turn on the light
  Rules are programmed by the system designer before the room is used and then refined based on experience
[Kul02] Ajay Kulkarni. Design Principles of a Reactive Behavioral System for the Intelligent Room. 2002.
Rule-Based Decision Making
  Characteristics
    Complete and correct (given complete rules)
    Natural (given expert-specified rules)
  Advantages
    Permits the system to be programmed relatively efficiently by an expert
    Can address relatively complex systems
  Problems
    Quality of the rules is essential
    Behavior of the environment mimics the expert
    Anticipating all possible contexts is difficult
Planning Decisions
  A planning system searches for a sequence of actions which can achieve a defined goal
    States can be represented as logic sentences
    Actions are defined as operators (symbolic representations of the effects and conditions of actions) which contain:
      Preconditions of actions
      Effects of actions
    A goal is a set of states
  The planning system uses constraints to efficiently search for a sequence of operators that leads from the start state to a goal state
Example
  Initial state: (Location(bedroom)) ^ (Light(bathroom,off))
  Goal: Happy(Inhabitant)
  Action 1: MakeHappy
    Precondition: (Location(X)) ^ (Light(X,on))
    Effect: Add: Happy(Inhabitant)
  Action 2: TurnOnLight(X)
    Precondition: Light(X,off)
    Effect: Delete: Light(X,off), Add: Light(X,on)
  Action 3: Move(X, Y)
    Precondition: (Location(X)) ^ (Light(Y,on))
    Effect: Delete: Location(X), Add: Location(Y)
  Plan: Action 2, Action 3, Action 1
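As an illustration, the three operators can be grounded over two rooms and searched with a plain breadth-first forward search. This is a minimal sketch and not how SNLP, GraphPlan, or UCPOP work; the tuple encoding of facts and the names ROOMS, operators, and plan are assumptions made for the example.

```python
from collections import deque
from itertools import product

ROOMS = ["bedroom", "bathroom"]  # assumed set of rooms for grounding X and Y

def operators():
    """Ground the three operators of the example over ROOMS.
    Each operator is (name, preconditions, delete list, add list)."""
    ops = []
    for x in ROOMS:
        ops.append(("MakeHappy",
                    {("Location", x), ("Light", x, "on")},
                    set(),
                    {("Happy", "Inhabitant")}))
        ops.append((f"TurnOnLight({x})",
                    {("Light", x, "off")},
                    {("Light", x, "off")},
                    {("Light", x, "on")}))
    for x, y in product(ROOMS, ROOMS):
        if x != y:
            ops.append((f"Move({x},{y})",
                        {("Location", x), ("Light", y, "on")},
                        {("Location", x)},
                        {("Location", y)}))
    return ops

def plan(initial, goal):
    """Breadth-first forward search; returns a shortest list of operator names."""
    start = frozenset(initial)
    queue = deque([(start, [])])
    seen = {start}
    ops = operators()
    while queue:
        state, steps = queue.popleft()
        if goal <= state:          # every goal fact holds in this state
            return steps
        for name, pre, delete, add in ops:
            if pre <= state:       # operator is applicable
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None

initial = {("Location", "bedroom"), ("Light", "bathroom", "off")}
goal = {("Happy", "Inhabitant")}
found = plan(initial, goal)
print(found)  # ['TurnOnLight(bathroom)', 'Move(bedroom,bathroom)', 'MakeHappy']
```

The result corresponds to the plan Action 2, Action 3, Action 1. Exhaustive search like this grows quickly with the number of facts and operators, which is why practical planners rely on constraints and smarter search.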
Example
  (Figure: plan graph from Start to Finish)
    Start: Location(bedroom), Light(bathroom,off)
    TurnOnLight: Light(bathroom,off) -> Light(bathroom,on)
    Move: Location(bedroom), Light(bathroom,on) -> Location(bathroom)
    MakeHappy: Location(bathroom), Light(bathroom,on) -> Happy(Inhabitant)
    Finish: Happy(Inhabitant)
Example Planning Systems
  Partial Order Planners
    Derive plans without requiring actions to be found in sequence
    SNLP (Univ. of Washington)
    GraphPlan (CMU)
      Builds and prunes a graph of possible plans
  Conditional Planners
    Derive plans under uncertainty by constructing plans that work under given conditions
    UCPOP (Univ. of Washington)
      Partial Order Planner with Universal quantification and Conditional effects
    CPOP
    Sensory GraphPlan (CMU)
Planning Decisions
  Characteristics
    Complete and correct (given complete rules)
    Relatively natural formulation
  Advantages
    Permits a sequence of actions to be found that performs a given task
    Goals can be defined easily
  Problems
    Requires a complete description of the system
    Uncertainty is difficult to handle
    Planning is generally very complex
Decision Theory
  Decision theory addresses rational decision making under uncertainty
    Uncertainty is represented using probabilities
      Uncertainty due to incomplete observability
      Uncertainty due to nondeterministic action outcomes
      Uncertainty due to nondeterministic system behavior
  Utility theory is used to achieve rational decisions
    Utility is a measure of the expected “value” of a given situation or decision
    Rational decisions are the ones that yield the highest expected utility in the current situation
Modeling Uncertainty
  The current situation can be represented as a belief state, i.e. as a probability distribution over the states indicating the likelihood that any given state xi is the current state
    $\{(x_1, P(x_1)), (x_2, P(x_2)), \ldots, (x_n, P(x_n))\}$
    The probability of a state can be expressed as the joint probability of all state attributes: $P(x) = P(a_1, a_2, \ldots, a_n)$
  Uncertainties from incomplete observability and nondeterminism can be modeled as conditional probabilities
    State transition model: $P(x_i^{t} \mid x_j^{t-1}, d)$
    Observation model: $P(o \mid x)$
Bayes Rule
  Bayes rule permits inverting cause and effect when calculating probabilities
    $P(c \mid e) = \frac{P(e \mid c)\, P(c)}{P(e)}$
    It is often easier to estimate P(e | c)
  The probability of a state given a set of sensor readings, P(x | o), can be calculated knowing the observation probabilities P(o | x)
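A small numeric sketch of this inversion: given a prior P(x) over two locations and an observation model P(o | x), Bayes rule yields P(x | o). The state names, the sensor reading "motion_bath", and all probabilities are made-up illustration values, not data from any of the systems discussed here.

```python
def bayes_update(prior, likelihood, observation):
    """Return P(x | o) for every state x, given P(x) and P(o | x)."""
    unnormalized = {x: likelihood[x][observation] * p for x, p in prior.items()}
    evidence = sum(unnormalized.values())  # P(o), the normalizing constant
    return {x: v / evidence for x, v in unnormalized.items()}

# Hypothetical prior over the inhabitant's location.
prior = {"bedroom": 0.5, "bathroom": 0.5}

# Hypothetical observation model P(o | x) for a bathroom motion sensor.
likelihood = {
    "bedroom": {"motion_bath": 0.1, "no_motion": 0.9},
    "bathroom": {"motion_bath": 0.8, "no_motion": 0.2},
}

posterior = bayes_update(prior, likelihood, "motion_bath")
print({x: round(p, 3) for x, p in posterior.items()})
# {'bedroom': 0.111, 'bathroom': 0.889}
```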
Utility Theory
  Utilities U(A) represent the “value” of a given situation or decision A and model preferences
    The utility function for a particular system is not unique
    Only relative differences between utility values are important
      U(A) > U(B): A is preferred to B
      U(A) = U(B): the agent is indifferent between A and B
  Utilities for uncertain situations can be calculated as the expected value of the utility of all possibilities
    $U(\{(x_1, P(x_1)), \ldots, (x_n, P(x_n))\}) = \sum_i P(x_i)\, U(x_i)$
Rational Decisions
  The rational decision is the one that leads to the highest expected utility, where b ranges over the candidate decisions
    $d = \arg\max_{b} \sum_i P(x_i \mid o^{(0)}, \ldots, o^{(t)}, b)\, U(x_i)$
  Rational decisions in decision theory require
    A complete causal model of the environment: P(xi | xj, d)
    Complete knowledge of the observation (sensor) model: P(o | xi)
    Knowledge of the utility function for all states: U(xi)
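A minimal sketch of choosing the decision with the highest expected utility, using the light example from earlier in the deck. The outcome states, their probabilities, and the utility values are invented for illustration; in a real system they would come from the causal model, the observation model, and the utility function listed above.

```python
def expected_utility(outcome_probs, utility):
    """Sum of P(x_i | observations, decision) * U(x_i) over outcome states."""
    return sum(p * utility[x] for x, p in outcome_probs.items())

def rational_decision(outcomes_by_decision, utility):
    """Return the decision whose outcome distribution maximizes expected utility."""
    return max(outcomes_by_decision,
               key=lambda d: expected_utility(outcomes_by_decision[d], utility))

# Hypothetical utilities of the resulting situations.
utility = {
    "lit_occupied": 1.0,    # light on where the inhabitant is
    "lit_empty": -0.5,      # wasted energy
    "dark_occupied": -1.0,  # inhabitant left in the dark
    "dark_empty": 0.0,
}

# Hypothetical P(outcome | observations, decision): 70% chance the room is occupied.
outcomes = {
    "turn_on": {"lit_occupied": 0.7, "lit_empty": 0.3},
    "do_not_automate": {"dark_occupied": 0.7, "dark_empty": 0.3},
}

best = rational_decision(outcomes, utility)
print(best)  # turn_on
```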
Markov Decision Processes
  Markov Decision Processes (MDPs) form a probabilistic model of all possible system behavior
    MDPs can be described by a tuple <S, A, T, R> representing states, actions, transition probabilities, and reinforcements
    The system has to obey the Markov assumption:
      $P(x_{t+1} \mid x_t, d_t, x_{t-1}, d_{t-1}, \ldots, x_0) = P(x_{t+1} \mid x_t, d_t)$
  Reinforcement represents the instantaneous change in utility obtained in a given state
    Models costs and payoffs
    Is generally sparse and delayed
Utility Function for MDPs
  In an MDP, the utility of a state under a given policy can be defined as the expected sum of discounted reinforcements, starting from $x_0 = x$
    $U^{\pi}(x) = E\left[\sum_{t} \gamma^{t} R(x_t)\right]$
  The optimal utility function U* can be computed using value iteration
    $U_{t+1}(x_i) = R(x_i) + \gamma \max_{d} \sum_{x_j} P(x_j \mid x_i, d)\, U_t(x_j)$
  The optimal policy (decision strategy) can be extracted from the utility function
    $\pi^{*}(x_i) = \arg\max_{d} \left( R(x_i) + \gamma \sum_{x_j} P(x_j \mid x_i, d)\, U^{*}(x_j) \right)$
MDP Example
  S = {(1,1), (1,2), …, (4,3)}
  A = {↑, ↓, ←, →}
  T: P(intended direction) = 0.8, P(right angle to intended) = 0.1 for each of the two perpendicular directions
  R: +1 at goal, -1 at trap, -0.04 in all other states
  γ = 1
MDP Example
  (Figures: Optimal Policy; Optimal Utilities)
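Value iteration on a grid world of this kind can be sketched as follows. Several details are assumptions rather than facts from the slides: the goal is placed at (4,3) and the trap at (4,2), there are no blocked cells, moves into the boundary leave the state unchanged, the reward in ordinary states is taken to be a small step cost of -0.04, and the discount factor is 1.

```python
COLS, ROWS = 4, 3
GOAL, TRAP = (4, 3), (4, 2)        # assumed positions, not given in the slides
STEP_REWARD, GAMMA = -0.04, 1.0    # assumed step cost and discount factor
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
SIDEWAYS = {"up": ("left", "right"), "down": ("left", "right"),
            "left": ("up", "down"), "right": ("up", "down")}
STATES = [(c, r) for c in range(1, COLS + 1) for r in range(1, ROWS + 1)]

def reward(s):
    return 1.0 if s == GOAL else -1.0 if s == TRAP else STEP_REWARD

def step(s, move):
    """Deterministic effect of a move; bumping into the boundary keeps the state."""
    c, r = s[0] + MOVES[move][0], s[1] + MOVES[move][1]
    return (c, r) if 1 <= c <= COLS and 1 <= r <= ROWS else s

def transitions(s, d):
    """(probability, next state) pairs: 0.8 intended, 0.1 for each right angle."""
    a, b = SIDEWAYS[d]
    return [(0.8, step(s, d)), (0.1, step(s, a)), (0.1, step(s, b))]

def expected(U, s, d):
    return sum(p * U[nxt] for p, nxt in transitions(s, d))

def value_iteration(tol=1e-9, max_sweeps=10_000):
    U = {s: 0.0 for s in STATES}
    for _ in range(max_sweeps):
        new_U = {}
        for s in STATES:
            if s in (GOAL, TRAP):   # terminal states keep their reward as utility
                new_U[s] = reward(s)
            else:
                new_U[s] = reward(s) + GAMMA * max(expected(U, s, d) for d in MOVES)
        delta = max(abs(new_U[s] - U[s]) for s in STATES)
        U = new_U
        if delta < tol:
            break
    return U

def policy(U):
    """Greedy policy; R(x_i) and GAMMA are constant over d, so they drop out of the argmax."""
    return {s: max(MOVES, key=lambda d: expected(U, s, d))
            for s in STATES if s not in (GOAL, TRAP)}

U = value_iteration()
pi = policy(U)
print(pi[(3, 3)], round(U[(3, 3)], 3))
```

The utilities printed here depend on the assumed layout, so they need not match the figures of the original example.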
Markov Decision Processes
  Characteristics
    Complete and correct
  Advantages
    Takes into account transition uncertainty
    Makes optimal decisions
    Automatically calculates the utility function
  Problems
    Requires a complete probabilistic description of the system
    Requires complete observability of the state
Partially Observable MDPs
  Partially Observable Markov Decision Processes (POMDPs) extend MDPs by permitting states to be only partially observable
    Systems can be represented by a tuple <S, A, T, R, O, V>, where <S, A, T, R> is an MDP and O, V relate observations to the states that produce them
      O = {oi} is the set of observations
      V: V(x, o) = P(o | x)
  To determine an optimal policy, an optimal utility function for the belief states has to be computed
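Before a utility function over belief states can be computed, the belief state itself has to be tracked. A common way to do this, sketched here under assumptions, is to predict with the transition model T and then correct with the observation model V using Bayes rule. The dictionary layout, the state names, and all probabilities below are hypothetical.

```python
def belief_update(belief, d, o, T, V):
    """Update a belief state after taking decision d and observing o.

    belief: {x: P(x)}
    T[(x_i, d)]: {x_j: P(x_j | x_i, d)}   (transition model)
    V[(x, o)]: P(o | x)                   (observation model)
    """
    # Prediction: push the belief through the transition model.
    predicted = {
        x_j: sum(T[(x_i, d)].get(x_j, 0.0) * p for x_i, p in belief.items())
        for x_j in belief
    }
    # Correction: weight by the observation likelihood and renormalize.
    unnormalized = {x: V[(x, o)] * p for x, p in predicted.items()}
    total = sum(unnormalized.values())
    return {x: v / total for x, v in unnormalized.items()}

# Hypothetical two-state model: is the inhabitant home or away?
T = {
    ("home", "wait"): {"home": 0.9, "away": 0.1},
    ("away", "wait"): {"home": 0.2, "away": 0.8},
}
V = {("home", "motion"): 0.7, ("away", "motion"): 0.1}

belief = belief_update({"home": 0.5, "away": 0.5}, "wait", "motion", T, V)
print({x: round(p, 3) for x, p in belief.items()})
# {'home': 0.895, 'away': 0.105}
```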
POMDPs
  Characteristics
    Complete and correct
  Advantages
    Takes into account all uncertainty
    Makes optimal decisions
  Problems
    Requires a complete probabilistic description of the system
    Optimal solution is so far intractable (dynamic decision networks and approximation techniques exist and work for small state spaces)
Learning Decisions
  Learning techniques permit decisions to be learned from past experience and feedback from the inhabitants or the environment
    Supervised learning
      Requires the desired decision to be specified during training
    Reinforcement learning
      Learns by experimentation from scalar reward feedback
        Inhabitant feedback (e.g. device interactions)
        Explicit environment feedback (e.g. energy consumption)
        Implicit feedback (e.g. prediction of comfort of inhabitant)
Feedforward Neural Networks
  Neural networks are a supervised learning mechanism that can be trained to make decisions based on a set of training examples
    Training for reactive decisions involves the presentation of a set of examples (xi, d(xi)), where d(xi) is the desired decision to be made in configuration xi
    Training for goal-based or utility-based decisions involves learning a model that maps input (xi, d(xi)) to the outcome of the action f(xi, d(xi)) and then selecting the decision with the best outcome
Example System: Regulation in the Adaptive House [DLRM94]
  A neural network learns to regulate the lights in the house to maintain a given light intensity.
  1. Learns a network that predicts the light intensity if a given set of lights is turned on
    Input:
      The current light device levels (7 inputs)
      The current light sensor levels (4 inputs)
      The new light device levels (7 inputs)
    Output:
      The new light sensor levels (4 outputs)
[DLRM94] Dodier, R. H., Lukianow, D., Ries, J., & Mozer, M. C. (1994). A comparison of neural net and conventional techniques for lighting control. Applied Mathematics and Computer Science, 4, 447-462.
Example System: Regulation in the Adaptive House (continued)
  2. Decisions are made by comparing the output of the network for all possible decisions (i.e. combinations of lights to be turned on) with the desired light intensity and taking the decision that most closely matches it.
  (Figure labels: state xi, decision d, prediction f(xi, d), set point p)
  Decision:
    $d(x_i) = \arg\min_{d} \lVert p - f(x_i, d) \rVert$
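The selection step can be sketched as an exhaustive search over light combinations. Everything concrete here is an assumption for illustration: predict_sensors is a hand-written linear stand-in for the trained network f(xi, d), the example uses 3 on/off lights and 2 sensors instead of 7 devices and 4 sensors, and the gain values are invented.

```python
from itertools import product
from math import dist

# Hypothetical contribution of each light (rows) to each sensor (columns).
GAIN = [
    (30.0, 5.0),
    (10.0, 20.0),
    (5.0, 30.0),
]

def predict_sensors(state, decision):
    """Stand-in for the network f(x_i, d): ambient level plus per-light gains."""
    return tuple(
        state[s] + sum(on * GAIN[light][s] for light, on in enumerate(decision))
        for s in range(len(state))
    )

def choose_decision(state, set_point):
    """Return the light combination whose predicted sensor levels are closest
    (Euclidean distance) to the set point p."""
    candidates = product((0, 1), repeat=len(GAIN))
    return min(candidates, key=lambda d: dist(set_point, predict_sensors(state, d)))

ambient = (5.0, 10.0)     # current sensor levels with all lights off
set_point = (45.0, 45.0)  # desired sensor levels
best = choose_decision(ambient, set_point)
print(best, predict_sensors(ambient, best))  # (1, 0, 1) (40.0, 45.0)
```

Enumerating all combinations is feasible for a handful of on/off lights; with 7 devices and several levels per device the candidate set grows quickly, so a practical controller would likely need a more selective search.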
Neural Networks
  Characteristics
    Efficient
  Advantages
    Can learn arbitrary decision functions from training data
    Generalizes to new situations
    Fast decision making
  Problems
    Requires training data that contains the desired decision or a goal/objective
    Requires design of a sufficient input representation
Reinforcement Learning
  Reinforcement learning learns an optimal decision strategy from trial and error and sparse reward feedback
    On-line method to solve Markov Decision Processes (or, with extensions, POMDPs)
    Reward, R, is a signal encoding the instantaneous feedback to the system
    The system learns a mapping from states to decisions, $\pi^{*}(x_i)$, which optimizes the expected utility
Q-Learning
  Q-learning is one of the most popular reinforcement learning techniques for MDPs
    Learns a utility function for state-action pairs, Q(x, d)
      Utility: $U(x) = \max_{d} Q(x, d)$
  Learns by experimentation
    Update Q(xi, d) after each observed transition from state xi by comparing the expected utility of (xi, d) with the expectation computed after observing the actual outcome xj
      $Q(x_i, d) \leftarrow Q(x_i, d) + \alpha \left( R(x_i) + \gamma \max_{d'} Q(x_j, d') - Q(x_i, d) \right)$
    Decisions are made to optimize Q-values
      $\pi(x) = \arg\max_{d} Q(x, d)$
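A tabular sketch of the update and of greedy decision making. The learning rate alpha, the discount factor gamma, the state names, and the reward values are illustration choices; a practical learner would also need an exploration strategy (for example, occasionally taking a random decision), which is left out here.

```python
from collections import defaultdict

def q_update(Q, x_i, d, reward, x_j, decisions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update for the observed transition x_i --d--> x_j."""
    best_next = max(Q[(x_j, d2)] for d2 in decisions)
    Q[(x_i, d)] += alpha * (reward + gamma * best_next - Q[(x_i, d)])
    return Q[(x_i, d)]

def greedy_decision(Q, x, decisions):
    """Pick the decision with the highest Q-value in state x."""
    return max(decisions, key=lambda d: Q[(x, d)])

DECISIONS = ["turn_on_light", "do_nothing"]   # hypothetical decision set
Q = defaultdict(float)                        # all Q-values start at 0

# First observed transition: reward 1.0, all successor Q-values still 0.
first = q_update(Q, "dark_occupied", "turn_on_light", 1.0, "lit_occupied", DECISIONS)

# Suppose the successor state has meanwhile been found to be valuable.
Q[("lit_occupied", "do_nothing")] = 2.0
second = q_update(Q, "dark_occupied", "turn_on_light", 1.0, "lit_occupied", DECISIONS)

print(first, round(second, 2))                           # 0.5 1.65
print(greedy_decision(Q, "dark_occupied", DECISIONS))    # turn_on_light
```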
Example System: Regulation in the Adaptive House [Moz98]
  Neural network regulators can control lighting and heating to achieve a given set point
  The set point is learned using reinforcement
    Energy usage
    Inhabitant interactions with light switches or thermostats
[Moz98] Mozer, M. C. The neural network house: An environment that adapts to its inhabitants. In Proc. AAAI Spring Symposium on Intelligent Environments (pp. 110-114). Menlo Park, CA, 1998.
Example System: MavHome
  Uses Q-learning on a state space including device status and the Active LeZi prediction
    State st at time t: st = (xt, pt)
  Reinforcement includes multiple metrics
    Energy usage
    Number of inhabitant-device interactions
  Decisions are device interactions and an action representing the decision not to perform an action
  The system operates in an event-driven manner, making a decision every time an event happens
  The learner is pre-trained using the Active LeZi predictor
Example System: MavHome
Example task: getting up in the morning and taking a shower.
Example System: MavHome
  The home learns to automate light activations so as to minimize energy usage without increasing the number of inhabitant interactions
Reinforcement Learning
  Characteristics
    Optimal policies (given enough training)
  Advantages
    Can learn optimal decision strategies without explicit training
    Can deal with multiple objectives
  Problems
    Trial and error learning can lead to spurious actions and thus to potential safety issues
    Requires complete state space representations
    Can be very complex
Conclusions
  Decision making is an integral component of intelligent environments
    Automates devices
    Determines what information to provide to inhabitants
  Different decision making approaches apply to different conditions based on the available information
    Reactive / Goal-based / Utility-based
    Programmed / Learning
  Decision-making approaches can be “mixed”
  Many open issues remain:
    How to deal with the complexity of intelligent environments? (Hierarchical systems, multi-agent systems, etc.)
    How to assure safety and acceptability of learning decision makers?