Dynamic Optimization and Learning for Renewal Systems
Michael J. Neely, University of Southern California
Asilomar Conference on Signals, Systems, and Computers, Nov. 2010
PDF of paper at: http://ee.usc.edu/stochastic-nets/docs/renewal-systems-asilomar2010.pdf
Sponsored in part by NSF Career grant CCF-0747525 and the ARL Network Science Collaborative Technology Alliance.
[Figure: timeline of frames T[0], T[1], T[2]; a network of transmitter/receiver (T/R) nodes with a Network Coordinator dispatching Tasks 1, 2, 3]
A General Renewal System

[Figure: timeline t with frames T[0], T[1], T[2] and per-frame penalty vectors y[0], y[1], y[2]]

•Renewal Frames r in {0, 1, 2, …}.
•π[r] = Policy chosen on frame r.
•P = Abstract policy space (π[r] in P for all r).
•Policy π[r] affects the frame size and penalty vector on frame r. These are random functions of π[r] (their distribution depends on π[r]):
    •y[r] = [y0(π[r]), y1(π[r]), …, yL(π[r])] = Penalty Vector
    •T[r] = T(π[r]) = Frame Duration
•Sample realizations (from the slide builds): y[r] = [1.2, 1.8, …, 0.4] with T[r] = 8.1; y[r] = [0.0, 3.8, …, -2.0] with T[r] = 12.3; y[r] = [1.7, 2.2, …, 0.9] with T[r] = 5.6.
Example 1: Opportunistic Scheduling
•All Frames = 1 Slot.
•S[r] = (S1[r], S2[r], S3[r]) = Channel States for Slot r.
•Policy π[r]: On frame r, first observe S[r], then choose a channel to serve (i.e., one of {1, 2, 3}).
•Example Objectives: throughput, energy, fairness, etc.
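As a toy illustration of this opportunistic structure, here is a minimal sketch. All names and the ON/OFF channel model are invented for illustration; the talk's actual policies would be chosen via the drift-plus-penalty machinery developed later.

```python
import random

def opportunistic_schedule(num_slots=1000, num_channels=3, seed=0):
    """Toy opportunistic scheduler: each frame is one slot; observe the
    random ON/OFF channel states S[r], then serve one channel.
    Greedy rule: serve any ON channel (maximizes instantaneous throughput)."""
    rng = random.Random(seed)
    served = 0
    for r in range(num_slots):
        S = [rng.random() < 0.5 for _ in range(num_channels)]  # channel states
        on_channels = [i for i, s in enumerate(S) if s]
        if on_channels:
            chosen = rng.choice(on_channels)  # channel served this slot
            served += 1
    return served / num_slots  # time-average throughput

rate = opportunistic_schedule()  # close to 1 - 0.5**3 = 0.875
```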
Example 2: Markov Decision Problems
•M(t) = Recurrent Markov Chain (continuous or discrete time).
•Renewals are defined as returns to state 1.
•T[r] = random inter-renewal frame size (frame r).
•y[r] = penalties incurred over frame r.
•π[r] = policy that affects the transition probabilities over frame r.
•Objective: Minimize the time average of one penalty subject to time average constraints on the others.
[Figure: example Markov chain with states 1, 2, 3, 4]
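The renewal structure of this example is easy to see in simulation: run a recurrent chain and cut the sample path at each return to state 1. A minimal sketch (the 4-state chain with uniform transitions is invented for illustration):

```python
import random

def renewal_frames(num_steps=100_000, seed=4):
    """Simulate a recurrent Markov chain on states {1, 2, 3, 4} (next state
    uniform over all states) and record the inter-renewal frame sizes T[r],
    where a renewal is a return to state 1."""
    rng = random.Random(seed)
    frames, steps_in_frame = [], 0
    for _ in range(num_steps):
        state = rng.randint(1, 4)
        steps_in_frame += 1
        if state == 1:            # renewal: chain returns to state 1
            frames.append(steps_in_frame)
            steps_in_frame = 0
    return frames

frames = renewal_frames()
mean_T = sum(frames) / len(frames)  # stationary prob of state 1 is 1/4, so E{T} = 4
```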
Example 3: Task Processing over Networks
[Figure: network of T/R nodes with a Network Coordinator]
•Infinite Sequence of Tasks (e.g., query sensors and/or perform computations).
•Renewal Frame r = Processing Time for Frame r.
•Policy Types:
    •Low Level: {Specify transmission decisions over the network}
    •High Level: {Backpressure1, Backpressure2, Shortest Path}
•Example Objective: Maximize quality of information per unit time subject to per-node power constraints.
Quick Review of Renewal-Reward Theory (Pop Quiz Next Slide!)
Define the frame-average for y0[r]:
    ȳ0[R] = (1/R) Σ_{r=0}^{R-1} y0[r]
The time-average for y0[r] is then:
    (Total y0)/(Total Time) = (Σ_{r=0}^{R-1} y0[r]) / (Σ_{r=0}^{R-1} T[r]) = ȳ0[R] / T̄[R]
*If i.i.d. over frames, by the LLN this is the same as E{y0}/E{T}.
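This renewal-reward fact is easy to check numerically. A minimal sketch (the distributions of T[r] and y0[r] are invented for illustration):

```python
import random

def time_average_penalty(num_frames=200_000, seed=1):
    """Simulate i.i.d. frames with random duration T[r] and penalty y0[r];
    the time average sum(y0)/sum(T) should approach E{y0}/E{T}."""
    rng = random.Random(seed)
    total_y0 = total_T = 0.0
    for _ in range(num_frames):
        T = rng.uniform(1.0, 3.0)    # frame length, E{T} = 2.0
        y0 = rng.uniform(0.0, 4.0)   # frame penalty, E{y0} = 2.0
        total_y0 += y0
        total_T += T
    return total_y0 / total_T

avg = time_average_penalty()  # approaches E{y0}/E{T} = 1.0
```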
Pop Quiz: (10 points)
•Let y0[r] = Energy expended on frame r.
•Time avg. power = (Total Energy Used)/(Total Time).
•Suppose (for simplicity) behavior is i.i.d. over frames.
To minimize time average power, which one should we minimize?
    (a) E{y0[r]}/E{T[r]}        (b) E{y0[r]/T[r]}
Answer: (a). By renewal-reward theory, time average power equals E{y0}/E{T}, not the expectation of the per-frame ratio.
Two General Problem Types:
1) Minimize time average subject to time average constraints:
2) Maximize concave function φ(x1, …, xL) of time average:
Solving the Problem (Type 1):
Define a “Virtual Queue” for each inequality constraint:
[Figure: virtual queue Zl[r] with arrival process yl[r] and service process clT[r]]
Zl[r+1] = max[Zl[r] – clT[r] + yl[r], 0]
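The virtual-queue update above is one line of code. A minimal sketch (names and constants invented; the constants are chosen so that the constraint holds and the queue stays bounded):

```python
def update_virtual_queue(Z, y, c, T):
    """One-step virtual queue update enforcing the constraint
    (time average of y_l) <= c_l * (time average of T):
        Z_l[r+1] = max(Z_l[r] - c_l*T[r] + y_l[r], 0)."""
    return max(Z - c * T + y, 0.0)

# If y[r] <= c*T[r] on average, the queue stays bounded (here: at zero).
Z = 0.0
for r in range(1000):
    Z = update_virtual_queue(Z, y=0.4, c=0.25, T=2.0)  # c*T = 0.5 >= y
```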
Lyapunov Function and “Drift-Plus-Penalty Ratio”:
•Scalar measure of queue sizes (Lyapunov function):
    L[r] = Z1[r]^2 + Z2[r]^2 + … + ZL[r]^2
•Frame-based Lyapunov drift:
    Δ(Z[r]) = E{L[r+1] – L[r] | Z[r]}
•Algorithm Technique: Every frame r, observe Z1[r], …, ZL[r]. Then choose a policy π[r] in P to minimize the "Drift-Plus-Penalty Ratio":
    [Δ(Z[r]) + V·E{y0[r] | Z[r]}] / E{T[r] | Z[r]}
The Algorithm Becomes:
•Observe Z[r] = (Z1[r], …, ZL[r]). Choose π[r] in P to minimize:
    [Δ(Z[r]) + V·E{y0[r] | Z[r]}] / E{T[r] | Z[r]}
•Then update the virtual queues:
    Zl[r+1] = max[Zl[r] – clT[r] + yl[r], 0]
Theorem: Assume the constraints are feasible. Then under this algorithm, which minimizes the DPP Ratio
    [Δ(Z[r]) + V·E{y0[r] | Z[r]}] / E{T[r] | Z[r]}
on every frame r in {1, 2, 3, …}, we achieve:
(a) All time-average constraints are satisfied.
(b) The time-average of y0 is within O(1/V) of the optimal value.
Solving the Problem (Type 2):
We reduce it to a problem with the structure of Type 1 via:
• Auxiliary Variables γ[r] = (γ1[r], …, γL[r]).
• The following variation on Jensen's Inequality:
For any concave function φ(x1, …, xL) and any (arbitrarily correlated) vector of random variables (X1, X2, …, XL, T), where T > 0, we have:
    E{T·φ(X1, …, XL)} / E{T} ≤ φ( E{T·X1}/E{T}, …, E{T·XL}/E{T} )
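The inequality is easy to sanity-check numerically with a concave φ (here the square root) and correlated (X, T). This sketch is purely illustrative; the distributions are invented.

```python
import math
import random

def check_weighted_jensen(trials=100_000, seed=2):
    """Empirically check E{T*phi(X)}/E{T} <= phi(E{T*X}/E{T})
    for concave phi = sqrt and correlated random variables X, T > 0."""
    rng = random.Random(seed)
    sum_T = sum_T_phiX = sum_TX = 0.0
    for _ in range(trials):
        T = rng.uniform(0.5, 2.0)
        X = T + rng.uniform(0.0, 1.0)   # X is correlated with T
        sum_T += T
        sum_TX += T * X
        sum_T_phiX += T * math.sqrt(X)
    lhs = sum_T_phiX / sum_T
    rhs = math.sqrt(sum_TX / sum_T)
    return lhs, rhs

lhs, rhs = check_weighted_jensen()  # lhs <= rhs, as the inequality predicts
```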
The Algorithm (Type 2) Becomes:
•On frame r, observe Z[r] = (Z1[r], …, ZL[r]).
•(Auxiliary Variables) Choose γ1[r], …, γL[r] to maximize the deterministic problem below:
•(Policy Selection) Choose π[r] in P to minimize:
•Then update the virtual queues:
    Zl[r+1] = max[Zl[r] – clT[r] + yl[r], 0]
    Gl[r+1] = max[Gl[r] + γl[r]T[r] – yl[r], 0]
Example Problem – Task Processing:
[Figure: network of T/R nodes with a Network Coordinator and a queue of Tasks 1, 2, 3]
•Every task reveals random task parameters η[r]: η[r] = [(qual1[r], T1[r]), (qual2[r], T2[r]), …, (qual5[r], T5[r])].
•Choose π[r] = [which node to transmit, how much idle time] in {1, 2, 3, 4, 5} × [0, Imax].
•Transmissions incur power.
•We use a quality distribution that tends to be better for higher-numbered nodes.
•Maximize quality/time subject to pav ≤ 0.25 for all nodes.

[Figure: structure of frame r = Setup | Transmit | Idle I[r]]
Minimizing the Drift-Plus-Penalty Ratio:
•Minimizing a pure expectation, rather than a ratio of expectations, is typically easier (see Bertsekas & Tsitsiklis, Neuro-Dynamic Programming).
•Define:
•“Bisection Lemma”:
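The bisection idea can be sketched for the simplest case, where each policy has known expected numerator and denominator: define f(θ) = min over π of E{num(π) – θ·den(π)}, so f(θ) ≥ 0 exactly when θ is at or below the optimal ratio θ*. A hedged sketch (the policy values are invented):

```python
def minimize_ratio_by_bisection(policies, lo=0.0, hi=100.0, iters=60):
    """Bisection sketch for theta* = min over policies of num/den (den > 0).
    f(theta) = min over policies of (num - theta*den) satisfies
    f(theta) >= 0 iff theta <= theta*, so we bisect on theta."""
    for _ in range(iters):
        theta = 0.5 * (lo + hi)
        f = min(num - theta * den for num, den in policies)
        if f >= 0:
            lo = theta  # no policy's ratio is below theta: theta* >= theta
        else:
            hi = theta  # some policy's ratio is below theta: theta* < theta
    return 0.5 * (lo + hi)

# Policies given as (expected numerator, expected denominator) pairs.
policies = [(6.0, 2.0), (5.0, 4.0), (9.0, 3.0)]  # ratios: 3.0, 1.25, 3.0
theta_star = minimize_ratio_by_bisection(policies)  # converges to 1.25
```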
Learning via Sampling from the past:
•Suppose the randomness is characterized by {η1, η2, …, ηW} (past random samples).
•Want to compute (over unknown random distribution of η):
•Approximate this via W samples from the past:
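The sample-average approximation is just an empirical mean over the W stored samples. A sketch (the function f and the distribution of η are invented for illustration):

```python
import random

def empirical_expectation(f, samples):
    """Approximate E{f(eta)} by the empirical average over past samples."""
    return sum(f(eta) for eta in samples) / len(samples)

# W = 10,000 past samples of eta ~ Uniform(0, 2); true E{eta^2} = 4/3.
rng = random.Random(3)
samples = [rng.uniform(0.0, 2.0) for _ in range(10_000)]
approx = empirical_expectation(lambda eta: eta * eta, samples)
```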
Simulation:
[Figure: Quality of Information / Unit Time vs. Sample Size W, comparing two algorithms]
• Drift-Plus-Penalty Ratio Alg. with Bisection
• Alternative Alg. with Time Averaging
Concluding Sims (values for W=10):
Quick Advertisement: New Book:
M. J. Neely, Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool, 2010.
http://www.morganclaypool.com/doi/abs/10.2200/S00271ED1V01Y201006CNT007
• PDF also available from the "Synthesis Lecture Series" (on digital library).
• Covers Lyapunov optimization theory (including these renewal system problems).
• Detailed examples and problem set questions.