Multimedia Communications and Systems Laboratory
Online Learning for Energy-Efficient Multimedia Systems
Nick Mastronarde
PhD Defense
May 6, 2011
• Old: Higher multimedia quality is better
– Optimize rate-distortion performance
• H.264/AVC
– Minimize delay
– Minimize distortion
– …
• New: Quality costs power
Applications: surveillance, video conferencing, sensor networks, data centers, and in-home use
Resource-intensive multimedia applications are booming across a variety of resource-constrained networks and systems
Trade-off: delay and distortion vs. energy
My focus: energy-efficient resource management
Performance Metrics and High-level System Model
• Performance metric depends on the system and application
– Minimize energy subject to QoS constraint
– Optimize QoS subject to energy budget
– …
• For example:
– E[Cost] = E[Energy] + µE[Delay]
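The weighted objective E[Cost] = E[Energy] + µE[Delay] acts as a scalarization: each value of µ selects a different point on the energy-delay trade-off. Below is a minimal Python sketch of this idea. The three (power, delay) operating points and the helper `best_operating_point` are hypothetical illustrations and do not come from this work.

```python
from typing import List, Tuple

def best_operating_point(points: List[Tuple[float, float]], mu: float) -> int:
    """Index of the (energy, delay) pair minimizing E[Energy] + mu * E[Delay]."""
    costs = [energy + mu * delay for energy, delay in points]
    return min(range(len(points)), key=lambda k: costs[k])

# Hypothetical operating points: (average power in mW, average delay in ms).
points = [(80.0, 40.0), (160.0, 15.0), (320.0, 5.0)]

for mu in (0.0, 5.0, 100.0):
    k = best_operating_point(points, mu)
    print(f"mu = {mu:5.1f} -> power {points[k][0]} mW, delay {points[k][1]} ms")
```

With µ = 0 the lowest-power point is selected. As µ grows, the selection shifts toward the lowest-delay point.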
[Figure: high-level system model. A source with source adaptation produces multimedia data (I, P, and B frames) that enters a buffer; scheduling passes the data to a server with service adaptation; QoS is measured by delay and distortion.]
Two types of optimization objectives
• Myopic:
– Minimize expected immediate cost
• Foresighted:
– Minimize expected immediate cost + expected future cost
– Why?
• Power & Delay: The time taken to transmit the current packet impacts the time available (and the power required) to transmit future packets before their deadlines
• Multimedia Utility: Scheduling decisions at the current time impact future scheduling decisions because of source-coding dependencies
Example objective: E[Cost] = E[Energy] + µE[Delay]
Myopic optimization is suboptimal; foresighted optimization is my focus.
Foresighted Optimization
• How does foresighted optimization work?
– In time slot n, take the transmission action a that minimizes
$$\underbrace{c(s,a)}_{\text{current cost}} + \underbrace{\mathbb{E}_w\big[V\big(f(s,a,w)\big)\big]}_{\text{expected future cost}}$$
[Figure: at time n the system is in state s (channel, buffer backlog, multimedia data state) and takes action a (scheduling, adaptive modulation and coding (AMC)); the dynamics w (channel, data arrivals, transmission errors) move it to state s′ = f(s, a, w) at time n+1.]
Myopic solutions are suboptimal because they ignore the expected future cost.
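The gap between the two objectives shows up even in a toy buffer-control problem. The Python sketch below is an illustration under stated assumptions and is not the system studied here. It assumes a 5-packet buffer, a hypothetical arrival pmf, a convex energy cost z², a holding weight, and an overflow penalty. The myopic policy minimizes c(s, a) alone. The foresighted policy minimizes c(s, a) + γE[V(f(s, a, w))], using a V obtained by value iteration. Both policies are then scored by their exact discounted cost.

```python
import numpy as np

B = 5                                  # buffer capacity in packets (hypothetical)
ARRIVALS = {0: 0.3, 1: 0.4, 2: 0.3}    # hypothetical i.i.d. arrival pmf p(l)
MU, ETA, GAMMA = 1.5, 10.0, 0.9        # holding weight, overflow penalty, discount

def cost(b, z):
    """Expected immediate cost: convex energy z**2 + holding + expected overflow."""
    overflow = sum(p * max(b - z + l - B, 0) for l, p in ARRIVALS.items())
    return z ** 2 + MU * (b - z) + ETA * overflow

def next_dist(b, z):
    """Distribution of the next backlog b' = min(b - z + l, B)."""
    dist = np.zeros(B + 1)
    for l, p in ARRIVALS.items():
        dist[min(b - z + l, B)] += p
    return dist

def actions(b):
    return range(b + 1)                # transmit at most the current backlog

def value_iteration(tol=1e-9):
    V = np.zeros(B + 1)
    while True:
        V_new = np.array([min(cost(b, z) + GAMMA * next_dist(b, z) @ V for z in actions(b))
                          for b in range(B + 1)])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def evaluate(policy):
    """Exact discounted cost of a stationary policy: solve (I - gamma P) V = c."""
    P = np.array([next_dist(b, policy[b]) for b in range(B + 1)])
    c = np.array([cost(b, policy[b]) for b in range(B + 1)])
    return np.linalg.solve(np.eye(B + 1) - GAMMA * P, c)

V_star = value_iteration()
myopic = [min(actions(b), key=lambda z: cost(b, z)) for b in range(B + 1)]
foresighted = [min(actions(b), key=lambda z: cost(b, z) + GAMMA * next_dist(b, z) @ V_star)
               for b in range(B + 1)]

print("myopic      policy:", myopic, "cost:", evaluate(myopic).round(2))
print("foresighted policy:", foresighted, "cost:", evaluate(foresighted).round(2))
```

The foresighted policy's discounted cost is, by construction, never higher than the myopic one in any state. The size of the gap depends on the hypothetical cost parameters.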
Challenges
• Challenge 1: Unknown dynamic environments
– Dynamic traffic and channel conditions
– Lack of statistical knowledge of dynamics
– Fast learning algorithms
• Challenge 2: Heterogeneous multimedia data
– Different deadlines, priorities, dependencies
• Challenge 3: Multi-user
– Coupling due to shared resources
– Curse of dimensionality
Existing Solutions (1/2)
• Cross-layer optimization in multimedia communications and systems
– Myopic: Ignore the impact of current decisions on the future performance. [Nahrstedt 2006, 2007, He 2005, Sachs 2003, Mohapatra 2005, van der Schaar 2003, 2007]
• Single-layer optimizations
– Hardware layer (dynamic power management) [Benini 1999, Chung 2002, Marculescu 2005]
• Learning solutions require too much memory or are too complex
– Physical layer (transmission power control)
• Optimal solutions require statistical knowledge of the dynamics [Berry 2002]
• Learning solutions are slow to converge [Borkar 2008]
– Application layer (multimedia rate control) [Ortega 1994]
• Rate-distortion characteristics are assumed to be known
Existing solutions (2/2)
• Multi-user network optimization
– Network utility maximization [Chiang 2007]
• Static utility function
• Ignores network dynamics
• Ignores packet deadlines, priorities, and dependencies
• No learning for unknown environments
– Stability-constrained optimization [Neely 2006]
• Guarantees queue stability, but achieves suboptimal power consumption in the low-delay region
• Ignores packet deadlines, priorities, and dependencies
Improvement over state-of-the-art
The proposed framework achieves...
Problem setting | Previous state of the art | Achieved improvement
Point-to-point energy-efficient wireless communication [Mastronarde 2011b] | Heuristic policy [Nahrstedt 2007] | Reduce power by up to 33% for the same delay (in a non-stationary environment)
Point-to-point energy-efficient wireless communication [Mastronarde 2011b] | Reinforcement learning [Borkar, 2008] | Reduce delay and power by up to 50% and 23%, respectively, after 3000 learning steps
Cooperative multi-user video transmission [Mastronarde 2011a] | Non-cooperative multi-user video transmission [Fu, van der Schaar, 2010] | Improve PSNR by 5–10 dB for nodes with feeble direct signals
Cross-layer multimedia system optimization* [Mastronarde 2010, 2009b] | Cross-layer adaptation [Nahrstedt 2005] | Improve PSNR by up to 7 dB and reduce power by 21%
*Prior work presented during the Qualifying Exam
Overview
• Part I: Fast reinforcement learning for energy-efficient wireless communication [Mastronarde, 2011b]
– Post-decision state learning
– Virtual experience learning
• Part II: A distributed cross-layer approach to cooperative video transmission [Mastronarde, 2011a]
– Multi-user Markov decision process formulation
– Mitigating the curse of dimensionality
The Energy-Efficient Wireless Communication Problem (1/2)
• Point-to-point time-slotted wireless communication system
• Minimize power consumption subject to buffer delay constraint
– Little’s law: Average buffer delay is proportional to average buffer occupancy
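[Figure: system diagram with data arrivals $l_n$, buffer occupancy $b_n$, channel state $h_n$, bit-error probability $BEP_n$, power management state $x_n$ and action $y_n$, packet throughput $z_n$, and goodput $f_n$]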
The Energy-Efficient Wireless Communication Problem (2/2)
• System variables
– Buffer occupancy state: $b_n \in \{0, \ldots, B\}$
– Channel state: $h_n$ -- finite-state Markov chain (e.g., Rayleigh fading)
– Power management state: $x_n \in \{\text{on}, \text{off}\}$
– Data arrivals: $l_n$ -- i.i.d.
• Decision variables (actions)
– Packet throughput: $z_n$, $0 \le z_n \le b_n$
– Bit-error probability: $BEP_n$
– Power management action: $y_n \in \{\text{s\_on}, \text{s\_off}\}$
• Resulting goodput: $f_n$, $0 \le f_n \le z_n$
Buffer Model
• Buffer state: $b_n \in \mathcal{B}$, $n = 0, 1, \ldots$, where $\mathcal{B} = \{0, 1, \ldots, B\}$
– Buffer recursion:
$$b_0 = b^{\text{init}}, \qquad b_{n+1} = \min\big(b_n - f_n(BEP_n, z_n) + l_n,\; B\big)$$
– Controlled Markov chain with transition probabilities:
$$p_b\big(b' \mid [b,h,x], BEP, y, z\big) = \begin{cases} \sum_{f=0}^{z} p_l\big(b' - [b - f]\big)\, p_f(f \mid BEP, z), & \text{if } b' < B, \\ \sum_{f=0}^{z} \sum_{l = B - [b - f]}^{\infty} p_l(l)\, p_f(f \mid BEP, z), & \text{if } b' = B. \end{cases}$$
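The two-case transition probability can be computed directly by enumerating goodput and arrivals. The Python sketch below makes two assumptions that the slide does not state. First, each of the z transmitted packets is lost independently with a packet loss rate `plr` set by BEP, so goodput is binomial. Second, arrivals have finite support. The function names are hypothetical. Arrivals that would push the buffer above B are dropped, which reproduces the b′ = B case.

```python
from math import comb

def goodput_pmf(z, plr):
    """Hypothetical p_f(f | BEP, z): each of z packets is lost independently w.p. plr."""
    return [comb(z, f) * (1 - plr) ** f * plr ** (z - f) for f in range(z + 1)]

def buffer_transition_row(b, z, plr, arrival_pmf, B):
    """Return [p_b(b' | b, BEP, z) for b' = 0..B]; arrival_pmf[l] = p_l(l)."""
    assert 0 <= z <= b <= B
    row = [0.0] * (B + 1)
    for f, pf in enumerate(goodput_pmf(z, plr)):
        remaining = b - f                               # backlog after transmission
        for l, pl in enumerate(arrival_pmf):
            row[min(remaining + l, B)] += pf * pl       # overflowing arrivals are dropped
    return row

row = buffer_transition_row(b=3, z=2, plr=0.04, arrival_pmf=[0.2, 0.5, 0.3], B=4)
print([round(p, 5) for p in row])
```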
Power Management Model
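• Power management state: $x_n \in \mathcal{X}$, $n = 0, 1, \ldots$, where $\mathcal{X} = \{\text{on}, \text{off}\}$
– Controlled Markov chain with transition probabilities [Benini 1999]:
$$\mathbf{P}^{y} = \big[p_x(x' \mid x, y)\big]_{x, x'}$$
$$\mathbf{P}^{\text{s\_on}} = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \ (\text{switch "on"}), \qquad \mathbf{P}^{\text{s\_off}} = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix} \ (\text{switch "off"})$$
(rows: current state $x \in \{\text{on}, \text{off}\}$; columns: next state $x' \in \{\text{on}, \text{off}\}$)
• Switching the wireless card "on" or "off"
– Incurs a transition power penalty $P_{\text{tr}}$ (watts)
– Incurs an expected transition delay $\Delta t$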
Costs
Goal: minimize power consumption subject to a buffer (delay) constraint
• Power cost:
$$\rho\big([h,x], BEP, y, z\big) = \begin{cases} P_{\text{on}} + P_{\text{tx}}(h, BEP, z), & \text{if } x = \text{on},\ y = \text{s\_on}, \\ P_{\text{off}}, & \text{if } x = \text{off},\ y = \text{s\_off}, \\ P_{\text{tr}}, & \text{otherwise}, \end{cases} \qquad P_{\text{tr}} \ge P_{\text{on}} > P_{\text{off}} \ge 0$$
• Buffer cost:
$$g\big([b,x], BEP, y, z\big) = \underbrace{\big(b - \mathbb{E}_f[f]\big)}_{\text{holding cost}} + \eta\, \underbrace{\mathbb{E}_{f,l}\big[\max(b - f + l - B,\, 0)\big]}_{\text{overflow cost}}$$
– The holding cost is proportional to the delay (by Little's law)
– The overflow cost provides an incentive to transmit packets instead of dropping them
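The stage cost splits naturally into two small functions. In the Python sketch below, the transmission power P_tx(h, BEP, z) is passed in as a plain number, because its exact form (which depends on the QAM constellation and the channel) is not given on this slide. The goodput and arrival pmfs, the value of η, and the function names are hypothetical. P_on = 80 mW and P_tr = P_on follow the simulation table.

```python
def power_cost(x, y, p_tx, P_on, P_off, P_tr):
    """rho([h, x], BEP, y, z); p_tx = P_tx(h, BEP, z) is supplied by the caller."""
    if x == "on" and y == "s_on":
        return P_on + p_tx
    if x == "off" and y == "s_off":
        return P_off
    return P_tr                                   # any on/off switch

def buffer_cost(b, goodput_pmf, arrival_pmf, B, eta):
    """g([b, x], BEP, y, z) = (b - E[f]) + eta * E[max(b - f + l - B, 0)]."""
    mean_f = sum(f * pf for f, pf in enumerate(goodput_pmf))
    overflow = sum(pf * pl * max(b - f + l - B, 0)
                   for f, pf in enumerate(goodput_pmf)
                   for l, pl in enumerate(arrival_pmf))
    return (b - mean_f) + eta * overflow

def lagrangian_cost(power, buffer, mu):
    """c(s, a) = rho + mu * g."""
    return power + mu * buffer

# p_tx = 50 mW, the pmfs, and eta = 10 are hypothetical values.
rho = power_cost("on", "s_on", p_tx=0.05, P_on=0.08, P_off=0.0, P_tr=0.08)
g = buffer_cost(b=4, goodput_pmf=[0.0, 1.0], arrival_pmf=[0.0, 0.0, 1.0], B=4, eta=10.0)
print(rho, g, lagrangian_cost(rho, g, mu=0.01))
```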
Formulation as Markov Decision Process (MDP)
• State: $s \triangleq (b, h, x)$
• Action: $a \triangleq (BEP, y, z)$
• Policy: $\pi: s \mapsto a$
• Cost:
$$c(s,a) = \underbrace{\rho\big([h,x], BEP, y, z\big)}_{\text{power cost}} + \mu\, \underbrace{g\big([b,x], BEP, y, z\big)}_{\text{buffer cost}}$$
• Transition probability:
$$p(s' \mid s, a) = \underbrace{p_b\big(b' \mid [b,h,x], BEP, y, z\big)}_{\text{buffer state}}\; \underbrace{p_h(h' \mid h)}_{\text{channel state}}\; \underbrace{p_x(x' \mid x, y)}_{\text{power state}}$$
Value Functions
• State-value function:
$$V^{\pi}(s) = \underbrace{c\big(s, \pi(s)\big)}_{\text{current cost}} + \underbrace{\gamma \sum_{s' \in \mathcal{S}} p\big(s' \mid s, \pi(s)\big)\, V^{\pi}(s')}_{\text{expected future cost}}$$
• Optimal state-value function:
$$V^{*}(s) = \min_{a \in \mathcal{A}} \underbrace{\Big\{ c(s,a) + \gamma \sum_{s' \in \mathcal{S}} p(s' \mid s, a)\, V^{*}(s') \Big\}}_{Q^{*}(s,a)}$$
• Optimal policy:
$$\pi^{*}(s) = \arg\min_{a \in \mathcal{A}} Q^{*}(s,a), \quad \forall s \in \mathcal{S}$$
If $c(s,a)$ and $p(s' \mid s, a)$ are known, this is a simple numerical problem…
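When c(s, a) and p(s′|s, a) are known, value iteration solves the Bellman equation numerically. The Python sketch below applies it to a small, randomly generated MDP. The sizes and random seed are arbitrary illustrative choices, while γ = 0.98 matches the simulation settings. The function returns V*, Q*, and the greedy policy π*.

```python
import numpy as np

def value_iteration(c, P, gamma, tol=1e-10, max_iter=100_000):
    """Solve V*(s) = min_a {c(s,a) + gamma * sum_s' p(s'|s,a) V*(s')}.

    c: array [S, A] of expected costs; P: array [S, A, S] of transition probabilities.
    Returns (V*, Q*, pi*)."""
    V = np.zeros(c.shape[0])
    for _ in range(max_iter):
        V_new = (c + gamma * P @ V).min(axis=1)     # Bellman backup
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    Q = c + gamma * P @ V
    return V, Q, Q.argmin(axis=1)

rng = np.random.default_rng(0)
S, A = 4, 3
c = rng.uniform(0.0, 1.0, size=(S, A))
P = rng.uniform(size=(S, A, S))
P /= P.sum(axis=2, keepdims=True)                   # each P[s, a, :] sums to one
V, Q, pi = value_iteration(c, P, gamma=0.98)
print("V* =", V.round(3), "pi* =", pi)
```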
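Conventional Reinforcement Learning Algorithm: Q-Learning
• Initialization at time n = 0: $Q_0(s,a)$, $\forall s \in \mathcal{S}, \forall a \in \mathcal{A}$
• Take action (exploration vs. exploitation):
$$a_n = \begin{cases} \arg\min_{a \in \mathcal{A}} Q_n(s_n, a), & \text{with probability } 1 - \varepsilon, \\ \text{rand}(\mathcal{A}), & \text{with probability } \varepsilon \end{cases}$$
• Observe experience: $\sigma_n = (s_n, a_n, c_n, s_{n+1})$, where $\mathbb{E}[c_n] = c(s_n, a_n)$ and $s_{n+1} \sim p(\cdot \mid s_n, a_n)$
• Update the action-value function:
$$Q_{n+1}(s_n, a_n) \leftarrow (1 - \alpha_n)\, Q_n(s_n, a_n) + \alpha_n \Big[ c_n + \gamma \min_{a' \in \mathcal{A}} Q_n(s_{n+1}, a') \Big]$$
• Update the Lagrange multiplier:
$$\mu_{n+1} = \Lambda\big[\mu_n + \beta_n (g_n - \delta)\big]$$
where $\delta$ is the average delay constraint, $\beta_n \in (0,1)$ is the learning rate, and $\Lambda$ is the projection onto $[0, \mu_{\max}]$
• Set n = n+1 and repeat from "Take action"
Problems: exploration is required, and known information about the system dynamics is not exploited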
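Below is a minimal Python sketch of the loop above, run on a randomly generated MDP. The environment, the separate per-stage power and holding costs, ε = 0.1, δ = 2, µ_max = 10, and the step-size schedules are all illustrative assumptions, not the settings used in the experiments. The schedules are α_n = n^-0.6 for Q and β_n = n^-0.8 for µ, so the multiplier moves on the slower timescale.

```python
import numpy as np

def q_learning_step(Q, s, a, cost, s_next, alpha, gamma):
    """Q(s,a) <- (1 - alpha) Q(s,a) + alpha [c + gamma * min_a' Q(s', a')]."""
    target = cost + gamma * Q[s_next].min()
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
    return Q

def epsilon_greedy(Q, s, eps, rng):
    """Explore uniformly with probability eps, otherwise take argmin_a Q(s, a)."""
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))
    return int(Q[s].argmin())

def lagrange_step(mu, g, delta, beta, mu_max):
    """mu <- projection onto [0, mu_max] of mu + beta * (g - delta)."""
    return float(np.clip(mu + beta * (g - delta), 0.0, mu_max))

# Toy environment (hypothetical): random MDP with separate power and holding costs.
rng = np.random.default_rng(1)
S, A = 6, 3
P = rng.uniform(size=(S, A, S))
P /= P.sum(axis=2, keepdims=True)
power = rng.uniform(0.0, 1.0, size=(S, A))     # stands in for rho(s, a)
hold = rng.uniform(0.0, 5.0, size=(S, A))      # stands in for g(s, a)

Q, mu, s = np.zeros((S, A)), 0.0, 0
for n in range(1, 20_001):
    a = epsilon_greedy(Q, s, eps=0.1, rng=rng)
    s_next = int(rng.choice(S, p=P[s, a]))
    c = power[s, a] + mu * hold[s, a]          # Lagrangian cost rho + mu * g
    Q = q_learning_step(Q, s, a, c, s_next, alpha=n ** -0.6, gamma=0.98)
    mu = lagrange_step(mu, hold[s, a], delta=2.0, beta=n ** -0.8, mu_max=10.0)
    s = s_next
print("mu =", round(mu, 3))
```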
Post-Decision State Definition
Definition: An intermediate state after the known dynamics take place, but before the unknown dynamics take place.
[Figure: timeline from state $s_n$ at time n, through the post-decision state $\tilde{s}_n$ at time n, to state $s_{n+1}$ at time n+1; the unknown dynamics $p_l(l)$ and $p_h(h' \mid h)$ act between $\tilde{s}_n$ and $s_{n+1}$]
State: $s_n = (b_n, h_n, x_n)$; action: $a_n = (BEP_n, y_n, z_n)$
$$\tilde{s}_n = (\tilde{b}_n, \tilde{h}_n, \tilde{x}_n) = \big([b_n - f_n],\ h_n,\ x_{n+1}\big)$$
$$s_{n+1} = (b_{n+1}, h_{n+1}, x_{n+1}) = \big([b_n - f_n + l_n],\ h_{n+1},\ x_{n+1}\big)$$
Type | Known | Unknown
Deterministic | PM state transition; power cost | N/A
Stochastic | Goodput distribution; holding cost | Traffic arrival distribution; channel state distribution; overflow cost
Post-Decision State Generalization
• Transition probability function:
$$p(s' \mid s, a) = \sum_{\tilde{s}} p_u(s' \mid \tilde{s}, a)\, p_k(\tilde{s} \mid s, a)$$
• Cost function:
$$c(s,a) = c_k(s,a) + \sum_{\tilde{s}} p_k(\tilde{s} \mid s, a)\, c_u(\tilde{s}, a)$$
The transition $s \to \tilde{s}$ is known (subscript k); the transition $\tilde{s} \to s'$ is unknown (subscript u).
• In a large class of wireless systems, the unknown components do not depend on the action: $p_u(s' \mid \tilde{s}, a) = p_u(s' \mid \tilde{s})$ and $c_u(\tilde{s}, a) = c_u(\tilde{s})$
Post-Decision State Value Function
[Figure: state $s_n = (b_n, h_n, x_n)$ at time n, action $a_n = (BEP_n, y_n, z_n)$, post-decision state $\tilde{s}_n = (\tilde{b}_n, \tilde{h}_n, \tilde{x}_n)$, and state $s_{n+1}$ at time n+1]
• Known part: $p_k(\tilde{s} \mid s, a) = p_x(\tilde{x} \mid x, y)\, p(\tilde{b} \mid b, BEP, z)\, I(\tilde{h} = h)$ and $c_k(s,a) = \rho([h,x], BEP, y, z) + \mu \sum_f p_f(f \mid BEP, z)\,(b - f)$
• Unknown part: $p_u(s' \mid \tilde{s})$ and $c_u(\tilde{s})$
(a) Known:
$$V^{*}(s) = \min_{a \in \mathcal{A}} \Big[ c_k(s,a) + \sum_{\tilde{s}} p_k(\tilde{s} \mid s, a)\, \tilde{V}^{*}(\tilde{s}) \Big]$$
(b) Unknown:
$$\tilde{V}^{*}(\tilde{s}) = c_u(\tilde{s}) + \gamma \sum_{s'} p_u(s' \mid \tilde{s})\, V^{*}(s')$$
The PDS value function must be learned
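Equations (a) and (b) reduce to the standard Bellman equation once p(s′|s, a) and c(s, a) are composed as on the previous slide. The Python sketch below checks this numerically on a random factored MDP; the sizes, seed, and discount are arbitrary illustrative choices. Iterating (a) and (b) yields the same V* as ordinary value iteration on the composed model.

```python
import numpy as np

rng = np.random.default_rng(2)
S, A, S_PDS, GAMMA = 5, 3, 4, 0.9
p_k = rng.uniform(size=(S, A, S_PDS))
p_k /= p_k.sum(axis=2, keepdims=True)          # known   p_k(s~ | s, a)
p_u = rng.uniform(size=(S_PDS, S))
p_u /= p_u.sum(axis=1, keepdims=True)          # unknown p_u(s' | s~)
c_k = rng.uniform(size=(S, A))                 # known cost c_k(s, a)
c_u = rng.uniform(size=S_PDS)                  # unknown cost c_u(s~)

# Iterate (a) and (b): no discount between s and s~, which lie in the same slot.
V_tilde = np.zeros(S_PDS)
for _ in range(2000):
    V_pds_form = (c_k + p_k @ V_tilde).min(axis=1)     # (a)
    V_tilde = c_u + GAMMA * p_u @ V_pds_form           # (b)

# Standard value iteration on the composed model.
P = p_k @ p_u                                  # p(s'|s,a) = sum_s~ p_u(s'|s~) p_k(s~|s,a)
c = c_k + p_k @ c_u                            # c(s,a) = c_k(s,a) + sum_s~ p_k(s~|s,a) c_u(s~)
V_std = np.zeros(S)
for _ in range(2000):
    V_std = (c + GAMMA * P @ V_std).min(axis=1)

print("max |V_std - V_pds| =", np.max(np.abs(V_std - V_pds_form)))
```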
Post-Decision State Learning
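• Initialization at time n = 0: $V_0(s)$ and $\tilde{V}_0(\tilde{s})$ for all states
• Take action (greedy with respect to the known dynamics):
$$a_n = \arg\min_{a \in \mathcal{A}} \Big[ c_k(s_n, a) + \sum_{\tilde{s}} p_k(\tilde{s} \mid s_n, a)\, \tilde{V}_n(\tilde{s}) \Big]$$
• Observe experience: $\tilde{\sigma}_n = (s_n, a_n, \tilde{s}_n, c^u_n, s_{n+1})$, where $\mathbb{E}[c^u_n] = c_u(\tilde{s}_n)$ and $s_{n+1} \sim p_u(\cdot \mid \tilde{s}_n)$
• Update the PDS value function:
$$V_n(s_{n+1}) = \min_{a \in \mathcal{A}} \Big[ c_k(s_{n+1}, a) + \sum_{\tilde{s}} p_k(\tilde{s} \mid s_{n+1}, a)\, \tilde{V}_n(\tilde{s}) \Big]$$
$$\tilde{V}_{n+1}(\tilde{s}_n) \leftarrow (1 - \alpha_n)\, \tilde{V}_n(\tilde{s}_n) + \alpha_n \big[ c^u_n + \gamma V_n(s_{n+1}) \big]$$
• Update the Lagrange multiplier:
$$\mu_{n+1} = \Lambda\big[\mu_n + \beta_n (g_n - \delta)\big]$$
where $\delta$ is the average delay constraint, $\beta_n \in (0,1)$ is the learning rate, and $\Lambda$ is the projection onto $[0, \mu_{\max}]$
No exploration!
Integrates known information!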
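The PDS learning loop can be sketched in Python as follows. The learner uses the known model (c_k, p_k) both to select actions and to evaluate V(s_{n+1}), and it only samples the unknown part (p_u, c_u). The random toy model, the step size α_n = n^-0.7, and the function names are illustrative assumptions. The Lagrange-multiplier update is the same as in the Q-learning case and is omitted here.

```python
import numpy as np

def greedy_action(s, c_k, p_k, V_pds):
    """a_n = argmin_a [c_k(s,a) + sum_s~ p_k(s~|s,a) V~(s~)]; no exploration step."""
    return int((c_k[s] + p_k[s] @ V_pds).argmin())

def pds_update(V_pds, s_pds, c_u, s_next, c_k, p_k, alpha, gamma):
    """V~(s~_n) <- (1 - alpha) V~(s~_n) + alpha [c_u + gamma V(s_{n+1})]."""
    v_next = (c_k[s_next] + p_k[s_next] @ V_pds).min()   # V(s_{n+1}) via the known model
    V_pds[s_pds] = (1 - alpha) * V_pds[s_pds] + alpha * (c_u + gamma * v_next)
    return V_pds

# Toy run: (c_k, p_k) are known to the learner; (p_u, c_u_true) are only sampled.
rng = np.random.default_rng(3)
S, A, S_PDS, GAMMA = 5, 3, 4, 0.9
p_k = rng.uniform(size=(S, A, S_PDS))
p_k /= p_k.sum(axis=2, keepdims=True)
p_u = rng.uniform(size=(S_PDS, S))
p_u /= p_u.sum(axis=1, keepdims=True)
c_k = rng.uniform(size=(S, A))
c_u_true = rng.uniform(size=S_PDS)

V_pds, s = np.zeros(S_PDS), 0
for n in range(1, 5001):
    a = greedy_action(s, c_k, p_k, V_pds)
    s_pds = int(rng.choice(S_PDS, p=p_k[s, a]))   # known dynamics take place
    s_next = int(rng.choice(S, p=p_u[s_pds]))      # unknown dynamics take place
    V_pds = pds_update(V_pds, s_pds, c_u_true[s_pds], s_next, c_k, p_k,
                       alpha=n ** -0.7, gamma=GAMMA)
    s = s_next
print("learned V~ =", V_pds.round(3))
```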
Virtual Experience Learning
• Problem: PDS learning only updates one PDS in each time slot
• Observation: the unknown dynamics $p_l(l)$ and $p_h(h' \mid h)$ are independent of the buffer and power management states
– Learn about all buffer and power management states in each time slot!
– Improve adaptation speed at the expense of increased complexity
• Actual PDS experience tuple: $\tilde{\sigma}_n = (s_n, a_n, \tilde{s}_n, c^u_n, s_{n+1})$
• Set of virtual experience tuples:
$$\tilde{\Sigma}_n = \Big\{ \big(s_n, a_n, \underbrace{(\tilde{b}, h_n, \tilde{x})}_{\text{current VE state}}, c_u(\tilde{b}; l_n), \underbrace{(\tilde{b} + l_n, h_{n+1}, \tilde{x})}_{\text{next VE state}}\big) : \forall (\tilde{b}, \tilde{x}) \in \mathcal{B} \times \mathcal{X} \Big\}$$
• VE cost: $c_u(\tilde{b}; l) = \eta \max(\tilde{b} + l - B,\, 0)$
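The Python sketch below applies one observed pair (l_n, h_{n+1}) to every (b̃, x̃) ∈ B × X, as in the set of virtual experience tuples above. For illustration, it replaces the real known model with a hypothetical one in which sending z packets costs z²/gain, the holding cost is b − z, and goodput is deterministic. It caps the next VE buffer state at B, as in the buffer recursion, and uses arbitrary sizes and random dynamics. All names and numbers here are assumptions.

```python
import itertools
import numpy as np

B, H, X = 4, 2, 2                # buffer 0..B, channel states, PM states (hypothetical)
ETA, GAMMA = 5.0, 0.9
CHANNEL_GAIN = [0.5, 1.0]        # hypothetical channel quality per channel state

def state_value(V_pds, b, h, x):
    """Hypothetical known model: send z <= b packets at energy z**2 / gain, pay holding
    cost b - z, and land deterministically in the PDS (b - z, h, x)."""
    return min(z ** 2 / CHANNEL_GAIN[h] + (b - z) + V_pds[b - z, h, x] for z in range(b + 1))

def virtual_experience_update(V_pds, h_now, arrivals, h_next, alpha):
    """Apply one observed (l_n, h_{n+1}) to every PDS (b~, h_n, x~), (b~, x~) in B x X."""
    V_old = V_pds.copy()                                  # all updates read V~_n
    for b_pds, x_pds in itertools.product(range(B + 1), range(X)):
        c_u = ETA * max(b_pds + arrivals - B, 0)          # VE cost: overflow only
        b_next = min(b_pds + arrivals, B)                 # next VE buffer state
        target = c_u + GAMMA * state_value(V_old, b_next, h_next, x_pds)
        V_pds[b_pds, h_now, x_pds] = (1 - alpha) * V_old[b_pds, h_now, x_pds] + alpha * target
    return V_pds

rng = np.random.default_rng(4)
V_pds, h = np.zeros((B + 1, H, X)), 0
for n in range(1, 2001):
    arrivals = int(rng.integers(0, 3))    # unknown i.i.d. arrivals (hypothetical pmf)
    h_next = int(rng.integers(0, H))      # unknown channel transition (hypothetical)
    V_pds = virtual_experience_update(V_pds, h, arrivals, h_next, alpha=n ** -0.7)
    h = h_next
print(V_pds[:, :, 0].round(2))
```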
Comparison of Learning Algorithms
Algorithm | Action selection complexity | Learning update complexity
Q-learning | $O(|\mathcal{A}|)$ | $O(|\mathcal{A}|)$
PDS learning | $O(|\tilde{\mathcal{S}}||\mathcal{A}|)$ | $O(|\tilde{\mathcal{S}}||\mathcal{A}|)$
Virtual experience learning | $O(|\tilde{\mathcal{S}}||\mathcal{A}|)$ | $O(|\Sigma||\tilde{\mathcal{S}}||\mathcal{A}|)$
where $|\tilde{\mathcal{S}}| = |\mathcal{B}|$ and $|\Sigma| = |\mathcal{B} \times \mathcal{X}|$
Simulation Setup
• PHY layer: QAM square constellations + Gray code
• Unknown channel transition and packet arrival distributions
Simulation Parameters
Parameter | Value
Arrival rate $\lambda$ | 200 packets/second
Buffer size $B$ | 25 packets
Channel states $h \in \mathcal{H}$ | {-18.82, -13.79, -11.23, -9.37, -7.80, -6.30, -4.68, -2.08} dB
Holding cost constraint | 4 packets
"Off" power $P_{\text{off}}$ | 0 watts
"On" power $P_{\text{on}}$ | 80 mW, 160 mW, or 320 mW
Transition power $P_{\text{tr}}$ | Set equal to $P_{\text{on}}$
Packet loss rates (PLR) | {1, 2, 4, 8, 16} %
Power management actions $y \in \mathcal{Y}$ | {s_on, s_off}
Power management states $x \in \mathcal{X}$ | {on, off}
Time slot duration $\Delta t$ | 10 ms
Transmission actions* $z \in \mathcal{Z}$ | {0, 1, 2, … , 10} packets/time slot
Discount factor $\gamma$ | 0.98
Noise power spectral density $N_0$ | $2 \times 10^{-11}$ watts/Hz
*Symbol rate $1/T_s = 500 \times 10^3$ symbols/s; packet size = 5000 bits; bits per symbol $\beta \in \{1, 2, \ldots, 10\}$
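Here is a quick unit check of the table, written as a Python sketch. It assumes the garbled symbol-rate entry reads 1/T_s = 500 × 10³ symbols/s. Under that reading, one 10 ms slot carries exactly β packets of 5000 bits, which matches the transmission actions z ∈ {0, …, 10}. The sketch also converts the 4-packet holding cost constraint into an average delay using Little's law.

```python
SYMBOL_RATE = 500e3     # 1/T_s in symbols/s (assumed reading of the table entry)
SLOT = 10e-3            # time slot duration in seconds
PACKET_BITS = 5000      # packet size in bits
ARRIVAL_RATE = 200      # packets/second
HOLDING_LIMIT = 4       # holding cost constraint in packets

def packets_per_slot(bits_per_symbol):
    """Packets carried in one slot when each symbol carries `bits_per_symbol` bits."""
    return SYMBOL_RATE * SLOT * bits_per_symbol / PACKET_BITS

for beta in (1, 5, 10):
    print(f"beta = {beta:2d} bits/symbol -> {packets_per_slot(beta):.1f} packets/slot")

# Little's law: average delay = average buffer occupancy / arrival rate.
delay_s = HOLDING_LIMIT / ARRIVAL_RATE
print(f"arrivals per slot: {ARRIVAL_RATE * SLOT:.1f}; "
      f"delay bound: {delay_s * 1e3:.0f} ms = {delay_s / SLOT:.1f} slots")
```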
Learning Algorithm Performance Comparison (1/2)
[Figure: holding cost (packets), power (mW), and $\theta_{\text{off}}$ versus time slot $n$ (0 to $6 \times 10^4$), comparing PDS learning, PDS learning (no DPM), Q-learning [Borkar, 2008], and PDS + virtual experience with update periods 1 and 10]
Learning Algorithm Performance Comparison (2/2)
[Figure: holding cost (packets), power (mW), and $\theta_{\text{off}}$ versus time slot $n$ (0 to $6 \times 10^4$), comparing PDS learning and PDS + virtual experience with update periods 1, 10, 25, and 125]
Comparison to State-of-the-Art
• Threshold-k [Nahrstedt, ’07]
– If the backlog exceeds k, turn on the wireless card and transmit all packets
– After transmitting, turn the card off
– Ignores channel conditions
• Non-stationary dynamics
– Markov-modulated arrival process driven by an unobservable 5-state Markov chain
– Time-varying channel transition probabilities
[Figure: power (mW) versus holding cost (packets, 1 to 8) for the proposed approach and Threshold-k; the proposed approach achieves an 11%–33% improvement for the same holding cost]
*Update period for the proposed approach: T = 50 time slots
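For reference, the Threshold-k baseline fits in a few lines of Python. This toy version ignores transmission errors, power-state transition delays, and buffer limits, and its arrival pmf is hypothetical. It only illustrates how a larger k trades a higher average backlog for fewer wake-ups of the card.

```python
import random

def threshold_k_policy(backlog, k):
    """Threshold-k heuristic: returns (power-management action, packets to send).
    Channel conditions are ignored by design."""
    if backlog > k:
        return "s_on", backlog        # wake the card and flush the whole buffer
    return "s_off", 0                 # otherwise keep (or put) the card to sleep

def simulate(k, arrival_pmf, slots, seed=0):
    """Return (average backlog, fraction of slots with the card switched on)."""
    rng = random.Random(seed)
    values, weights = zip(*arrival_pmf.items())
    backlog, total_backlog, on_slots = 0, 0, 0
    for _ in range(slots):
        action, z = threshold_k_policy(backlog, k)
        on_slots += action == "s_on"
        backlog = backlog - z + rng.choices(values, weights)[0]
        total_backlog += backlog
    return total_backlog / slots, on_slots / slots

for k in (1, 3, 5):
    print(k, simulate(k, {0: 0.2, 1: 0.4, 2: 0.4}, slots=10_000))
```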
Summary: Fast learning for energy-efficient wireless communications
• Proposed the first unified power management framework for delay-sensitive wireless communication
– Integrates system-level and physical-layer-centric power management
• Exploited the structure of the problem to improve learning performance
– Post-decision state
• Separation of known and unknown dynamics
• Eliminates the need for exploration
– Virtual experience learning
• Independence of the unknown dynamics from the buffer and power management components of the state
Multi-user Wireless Video Network With Cooperation
• Direct mode: transmit at data rate $\beta^{i,0}_t$ (bits/s/Hz) for $R x^i_t$ (s)
• Cooperative mode:
– Phase I: transmit at data rate $\beta^{i,1}_t$ (bits/s/Hz) for $R x^i_t \rho^i_t$ (s)
– Phase II: transmit at data rate $\beta^{i,2}_t$ (bits/s/Hz) for $R x^i_t (1 - \rho^i_t)$ (s)
– Cooperative data rate: $\beta^{i,\text{coop}}_t = \rho^i_t \beta^{i,1}_t + (1 - \rho^i_t)\, \beta^{i,2}_t$ (bits/s/Hz)
• Cooperative phase II uses a randomized space-time block coding rule
where $R$ is the time slot duration (seconds), $x^i_t \in [0,1]$ is the transmission time fraction, and $\rho^i_t \in [0,1]$ is the Phase I time fraction
[Figure: example in which video user 1 transmits either directly (rate $\beta^{1,0}_t$ for $R x^1_t$) or cooperatively in Phase I (rate $\beta^{1,1}_t$ for $R \rho^1_t x^1_t$) and Phase II (rate $\beta^{1,2}_t$ for $R (1 - \rho^1_t) x^1_t$), with relays among nodes 10, 12, 13, 20, and 30; the channel state $\mathbf{H}_t$ collects pairwise channel gains such as $h^{1,2}_t$, $h^{2,3}_t$, and $h^{20,30}_t$]
Prior Work
• Throughput-maximizing (opportunistic) multiple access policies [Knopp 1995], [Viswanath 2002], [Tse 2005]
– Schedule nodes with good fades
– Ignore delay deadlines, priorities, and dependencies
• Cross-layer solutions [Katsaggelos 2007, 2008], [Su 2007], [van der Schaar 2010], [Melodia 2010]
– Balance between scheduling easy nodes (good channels) and the most important nodes (high-priority data)
– Underlying inefficiency in network resource usage
• Users with high-priority data, but worse fades, get access to the channel
Cooperation reduces this inefficiency!
It enables users with feeble direct signals, but high-priority data, to exploit channel diversity
A Sophisticated Traffic Model for Video
• Traffic state: $\mathcal{T}^i_t = (\mathcal{F}^i_t, \mathbf{b}^i_t)$
– Schedulable frame set: $\mathcal{F}^i_t = \{ j \mid d^i_j \in \{t, t+1, \ldots, t+W\} \}$
– Buffer state: $\mathbf{b}^i_t = (b^i_{t,j} \mid j \in \mathcal{F}^i_t)$
[Figure: simple IBPB IBPB… GOP structure and an illustrative traffic state]
Traffic State Transition Illustration
Time | Schedulable frames $\mathcal{F}$ | Buffer state $\mathbf{b}$ (traffic state) | Scheduling action $\mathbf{y}$
t | (1, 2, 3) | (4, 3, 2) | (4, 0, 0)
t+1 | (2, 3, 1, 4) | (3, 2, 6, 1) | (0, 0, 2, 0)
t+2 | (1, 4, 2, 3) | (4, 1, 4, 1) | (4, 0, 1, 0)
Multi-User Markov Decision Process Formulation
• States
– Channel state (i.i.d.): $\mathbf{H}_t = \{h^{i,j}_t : i \neq j,\ i, j \in \{0, 1, 2, \ldots, M\}\}$
– Traffic state: $\mathcal{T}^i_t = (\mathcal{F}^i_t, \mathbf{b}^i_t)$
• Actions
– Scheduling action: $\mathbf{y}^i_t = (y^i_{t,j} \mid j \in \mathcal{F}^i_t)$
– Cooperation decision: $z^i_t \in \{0 \text{ (direct)}, 1 \text{ (cooperative)}\}$
• Utility and transition probability
$$u^i(s^i_t, \mathbf{y}^i_t) = \sum_{j \in \mathcal{F}^i_t} q^i_j\, y^i_{t,j}$$
where $q^i_j$ is the distortion reduction for packets belonging to frame $j$
$$p(\mathbf{s}_{t+1} \mid \mathbf{s}_t, \mathbf{y}_t) = p(\mathbf{H}_{t+1}) \prod_{i=1}^{M} p(\mathcal{T}^i_{t+1} \mid \mathcal{T}^i_t, \mathbf{y}^i_t)$$
Feasible Scheduling Actions
• Constraint set: $\mathbf{y}^i_t \in \mathcal{P}(\mathbf{H}_t, \mathcal{T}^i_t, z^i_t)$
– Buffer constraint: $0 \le y^i_{t,j} \le b^i_{t,j}$
– Packet constraint: $\|\mathbf{y}^i_t\|_1 \le \dfrac{R\, \beta^i_t(z^i_t)}{P\, T_s}$
– Dependency constraint: if $k \prec j$ and $b^i_{t,k} - y^i_{t,k} > 0$, then $y^i_{t,j} = 0$
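The constraint set can be checked mechanically. In the Python sketch below, the frame dependencies, per-packet utilities, and rate β are hypothetical. P = 8000 bits and 1/T_s = 625000 symbols/s follow the simulation table, while the slot duration R = 0.05 s is an assumed value because none is given. The sketch also evaluates the utility u = Σ_j q_j y_j from the previous slide.

```python
def is_feasible(y, b, beta, R, P, Ts, parents):
    """Check one user's scheduling action against the three constraints.

    y, b    : dict frame -> packets scheduled / packets buffered
    beta    : rate beta^i_t(z^i_t) in bits per symbol (bits/s/Hz)
    R, Ts   : time slot duration and symbol duration (s); P: packet size (bits)
    parents : dict frame j -> set of frames k that precede j in decoding order
    """
    # Buffer constraint: 0 <= y_j <= b_j
    if any(not 0 <= y[j] <= b.get(j, 0) for j in y):
        return False
    # Packet constraint: ||y||_1 <= R * beta / (P * Ts)
    if sum(y.values()) > R * beta / (P * Ts):
        return False
    # Dependency constraint: a frame cannot be sent while a parent still has packets left
    for j, packets in y.items():
        if packets > 0 and any(b.get(k, 0) - y.get(k, 0) > 0 for k in parents.get(j, ())):
            return False
    return True

def utility(y, q):
    """u = sum_j q_j * y_j, with q_j the distortion reduction per packet of frame j."""
    return sum(q[j] * y[j] for j in y)

# Hypothetical IBPB-style example: frame 3 (P) depends on frame 1 (I); frame 2 (B) on 1 and 3.
b = {1: 4, 2: 3, 3: 2}
parents = {2: {1, 3}, 3: {1}}
print(is_feasible({1: 4, 2: 0, 3: 0}, b, beta=2.0, R=0.05, P=8000, Ts=1 / 625000, parents=parents))
```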
Optimization Objective
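• Decision variables:
– Scheduling action: $\mathbf{y}_t = (\mathbf{y}^1_t, \mathbf{y}^2_t, \ldots, \mathbf{y}^M_t)$
– Cooperation decision: $\mathbf{z}_t = (z^1_t, z^2_t, \ldots, z^M_t)$
• Dynamic programming equation:
$$U^{*}(\mathbf{s}) = \max_{\mathbf{y}, \mathbf{z}} \Big\{ \sum_{i=1}^{M} u^i(s^i, \mathbf{y}^i) + \alpha \sum_{\mathbf{s}' \in \mathcal{S}} p(\mathbf{H}') \prod_{i=1}^{M} p(\mathcal{T}^{i\prime} \mid \mathcal{T}^i, \mathbf{y}^i)\, U^{*}(\mathbf{s}') \Big\}, \quad \forall \mathbf{s} \in \mathcal{S}$$
• Subject to $\mathbf{y}^i \in \mathcal{P}(\mathbf{H}, \mathcal{T}^i, z^i)$ and $\sum_{i=1}^{M} x^i \le 1$, where $x^i = \dfrac{P\, T_s\, \|\mathbf{y}^i\|_1}{R\, \beta^i(z^i)}$
• Challenges:
– Complexity is quadratic in $|\mathcal{S}|$, which scales exponentially in $M$ and $M^2$
– Traffic state information is local to users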
Mitigating the Curse of Dimensionality
• Problem 1: Complexity scales exponentially in $M^2$
– Theorem: The cooperation decision that maximizes immediate throughput is long-term optimal [Mastronarde, 2011a]
– Implications of the theorem:
• Instead of tracking $\mathbf{H}_t$, track the maximum transmission rates $\beta^{i*} = \max_{z^i} \{\beta^i(\mathbf{H}, z^i)\}$
• Use an opportunistic cooperation scheme for the cooperation decision
• The packet constraint becomes $\|\mathbf{y}^i_t\|_1 \le \dfrac{R\, \beta^{i*}_t}{P\, T_s}$, so that $x^i_t = \dfrac{P\, T_s\, \|\mathbf{y}^i_t\|_1}{R\, \beta^{i*}_t}$
• Problem 2: Complexity scales exponentially in $M$
– Solution [Fu, van der Schaar, 2010]: Lagrangian relaxation with a resource price $\lambda$
• The resulting MU-MDP can be decomposed into one local MDP per user
• The optimal resource price can be determined using the subgradient method
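The two remedies fit together. Each user picks its rate opportunistically and then solves its own problem given a resource price λ, while λ is adjusted by a subgradient step on the shared time constraint Σ_i x^i ≤ 1. The Python sketch below is a deliberately simplified, myopic version: it drops the future-value term and the dependency constraints of the full local MDPs. The utilities, rates, slot duration R = 0.05 s, and step sizes are hypothetical; P and 1/T_s follow the simulation table.

```python
def best_rate(beta_direct, beta_coop):
    """Opportunistic cooperation: choose the mode with the larger immediate rate.
    Returns (z, beta*) with z = 0 (direct) or 1 (cooperative)."""
    return (1, beta_coop) if beta_coop > beta_direct else (0, beta_direct)

def local_decision(q_sorted, beta_star, price, R, P, Ts):
    """Simplified (myopic) local problem: send the n most valuable packets,
    maximizing sum(q) - price * x with x = P * Ts * n / (R * beta*) <= 1."""
    best_n, best_val = 0, 0.0
    for n in range(1, len(q_sorted) + 1):
        x = P * Ts * n / (R * beta_star)
        if x > 1:
            break
        val = sum(q_sorted[:n]) - price * x
        if val > best_val:
            best_n, best_val = n, val
    return best_n, P * Ts * best_n / (R * beta_star)

def subgradient_price(users, R, P, Ts, steps=200, step0=5.0):
    """Dual subgradient method for the shared constraint sum_i x^i <= 1."""
    price = 0.0
    for k in range(1, steps + 1):
        total_x = 0.0
        for q, beta_direct, beta_coop in users:
            _, beta_star = best_rate(beta_direct, beta_coop)
            _, x = local_decision(sorted(q, reverse=True), beta_star, price, R, P, Ts)
            total_x += x
        price = max(0.0, price + (step0 / k) * (total_x - 1.0))  # raise price if over-subscribed
    return price

# Hypothetical users: (per-packet utilities q_j, direct rate, cooperative rate).
congested = [([1, 2, 3, 4, 5, 6], 2.0, 1.0), ([1, 2, 3, 4, 5, 6], 1.0, 2.0)] * 2
print(subgradient_price(congested, R=0.05, P=8000, Ts=1 / 625000))
```

In this toy, the price stays at zero when the users' combined demand fits within the slot and rises when it does not.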
Simulation Setup
• Scenarios:
– Homogeneous
• Foreman (CIF, 30 Hz, 1.5 Mb/s)
– Heterogeneous 1
• Coastguard (CIF, 30 Hz, 1.5 Mb/s)
• Mobile (CIF, 30 Hz, 2.0 Mb/s)
• Foreman (CIF, 30 Hz, 1.5 Mb/s)
– Heterogeneous 2
• Coastguard (CIF, 30 Hz, 1.5 Mb/s)
• Foreman (CIF, 30 Hz, 1.5 Mb/s)
• Mobile (CIF, 30 Hz, 2.0 Mb/s)
Parameter | Description | Value
$L$ | Length of the STBC | 2
$R_c$ | Rate of the orthogonal STBC rule | 1
$\xi$ | Self-selection parameter | 0.20
$P$ | Packet size | 8000 bits
$BEP$ | Bit-error probability target | $10^{-3}$
$\delta$ | Path loss exponent | 3
$R_{\text{cell}}$ | WLAN coverage radius (5 dB SNR at the boundary) | 100 m
$M$ | Number of nodes (excluding the AP) | 50
$\alpha$ | Discount factor | 0.80
$1/T_s$ | Symbol rate (symbols per second) | 625000 or 1250000
Relay self-selection: node c self-selects itself as a cooperative relay when a rate condition on its links, governed by the self-selection parameter $\xi$, is satisfied.
Network Topology
[Figure: network topology in three panels (x and y axes from -100 m to 100 m) showing the AP, the video sources, and the potential relays]
Source | Distance to AP | Angle
1 | 20 m | 25°
2 | 45 m | -30°
3 | 80 m | 0°
Transmission Rates
• A: Feeble direct
• B: Strong direct
• C: Cooperative gains
• D: Homogeneous allocation
• E: Heterogeneous allocation
[Figure: average transmission rate (Kbps, 0 to 1800) for video users 1, 2, and 3 in three panels (Homogeneous: Foreman; Heterogeneous 1: Coastguard, Mobile, Foreman; Heterogeneous 2: Coastguard, Foreman, Mobile), comparing cooperative and direct transmission under low and high congestion]
Video Quality Comparison
• A: Feeble direct → video undecodable at the receiver
• B: Cooperation achieves 5-10 dB PSNR improvement for nodes with feeble direct signals
• C: Cooperation minimally impacts nodes with strong direct signals
PSNR under low / high congestion (--- : undecodable)
Transmission mode | Video user 1 @ 20 m (Low / High) | Video user 2 @ 45 m (Low / High) | Video user 3 @ 80 m (Low / High)
Homogeneous | Foreman | Foreman | Foreman
Direct | 36.82 dB / 36.51 dB | 35.85 dB / 30.20 dB | 29.89 dB / --- dB
Cooperative | 36.69 dB / 35.82 dB | 36.58 dB / 34.83 dB | 36.04 dB / 27.12 dB
Change | -0.13 dB / -0.69 dB | 0.73 dB / 4.63 dB | 6.15 dB / --- dB
Heterogeneous 1 | Coastguard | Mobile | Foreman
Direct | 32.30 dB / 31.09 dB | 26.74 dB / 24.53 dB | 25.94 dB / --- dB
Cooperative | 31.94 dB / 30.89 dB | 27.14 dB / 25.8 dB | 35.69 dB / 27.12 dB
Change | -0.36 dB / -0.20 dB | 0.4 dB / 1.27 dB | 9.75 dB / --- dB
Heterogeneous 2 | Coastguard | Foreman | Mobile
Direct | 31.91 dB / 31.72 dB | 35.16 dB / 32.75 dB | 21.85 dB / --- dB
Cooperative | 31.56 dB / 30.97 dB | 35.72 dB / 32.39 dB | 26.53 dB / 22.03 dB
Change | -0.35 dB / -0.75 dB | 0.56 dB / -0.36 dB | 4.68 dB / --- dB
Video Quality Example
• Video user 3 @ 80 m
• Low congestion
[Figure: original frame compared with decoded frames from direct transmission (26.9 dB PSNR) and cooperative transmission (34.7 dB PSNR)]
Optimal Resource Price
Streaming scenario | Transmission mode | Resource price (Low / High)
Homogeneous | Direct | 45.79 / 42.97
Homogeneous | Cooperative | 38.72 / 52.56
Homogeneous | Change | -6.93 / 9.59
Heterogeneous 1 | Direct | 51.01 / 53.17
Heterogeneous 1 | Cooperative | 48.02 / 71.94
Heterogeneous 1 | Change | -2.99 / 18.77
Heterogeneous 2 | Direct | 68.24 / 41.48
Heterogeneous 2 | Cooperative | 62.61 / 72.86
Heterogeneous 2 | Change | -5.63 / 31.38
Summary: Multi-user cooperative video transmission
• Multi-user MDP-based approach
– Enables high-priority nodes to exploit the diversity of channel fading states in the network
– Improves the video quality of feeble (distant) nodes by 5-10 dB PSNR
• Reduces the quality of nodes with strong direct signals by < 1 dB
– Resource price for managing congestion
• Increases in congested networks
• Decreases in uncongested networks
• Mitigates complexity
– Opportunistic cooperation is long-term optimal
– Decompose the problem into local MDPs for each user
Impact in Industry
Company | Impact
Sanyo | (i) Energy-efficient point-to-point wireless communication; (ii) cooperative video transmission
Intel | Optimal video encoder mode decisions
IBM | Learning for data exploration
Skype | Rigorous modeling and optimization using MDPs and reinforcement learning
Thank you!
http://www.ee.ucla.edu/~nhmastro/
http://medianetlab.ee.ucla.edu/
My Journal Papers
1. [Mastronarde, 2011b] N. Mastronarde and M. van der Schaar, “Fast reinforcement learning for energy efficient wireless communications,” in review.
2. [Mastronarde, 2011a] N. Mastronarde, F. Verde, D. Darsena, A. Scaglione, and M. van der Schaar, “Transmitting important bits and sailing high radio waves: a decentralized cross-layer approach to cooperative video transmission,” in review.
3. [Mastronarde, 2010] N. Mastronarde and M. van der Schaar, “Online reinforcement learning for dynamic multimedia systems,” IEEE Trans. on Image Processing, vol. 19, no. 2, pp. 290-305, Feb. 2010.
4. [Mastronarde, 2009c] N. Mastronarde and M. van der Schaar, “Designing autonomous layered video coders,” Elsevier Journal Signal Processing: Image Communication – Special Issue on Scalable Coded Media Beyond Compression, vol. 24, no. 6, pp. 417-436, July 2009.
5. [Mastronarde, 2009b] N. Mastronarde and M. van der Schaar, “Towards a General Framework for Cross-Layer Decision Making in Multimedia Systems,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 19, no. 5, pp. 719-732, May 2009.
6. [Mastronarde, 2009a] N. Mastronarde and M. van der Schaar, “Automated bidding for media services at the edge of a content delivery network,” IEEE Trans. on Multimedia, vol. 11, no. 3, pp. 543-555, Apr. 2009.
7. [Mastronarde, 2008] N. Mastronarde and M. van der Schaar, “A bargaining theoretic approach to quality-fair system resource allocation for multiple decoding tasks,” IEEE Trans. Circuits and Systems for Video Technology, vol. 18, no. 3, Mar. 2008.
8. [Mastronarde, 2007b] N. Mastronarde and M. van der Schaar, "A queuing-theoretic approach to task scheduling and processor selection for video decoding applications," IEEE Trans. Multimedia, vol. 8, no. 7, pp. 1493-1507, Nov. 2007.
9. [Mastronarde, 2007a] N. Mastronarde, D. S. Turaga, and M. van der Schaar. “Collaborative resource exchanges for peer-to-peer video streaming over wireless mesh networks,” IEEE J. on Select. Areas in Communications Peer-to-peer Communications and Applications, vol. 25, no. 1, pp. 108-118, Jan. 2007.
10. [Mastronarde, 2006] Y. Andreopoulos, N. Mastronarde, and M. van der Schaar, “Cross-layer optimized video streaming over wireless multi-hop mesh networks,” IEEE J. on Select. Areas in Communications Multi-Hop Wireless Mesh Networks, vol. 24, no. 11, pp. 2104-2115, Nov. 2006.
References
• Myopic Cross-Layer (Multimedia systems and communications)
– [He, 2005] Z. He, Y. Liang, L. Chen, I. Ahmad, and D. Wu, “Power-rate-distortion analysis for wireless video communication under energy constraints,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 5, pp. 645-658, May 2005.
– [Sachs, 2003] D. G. Sachs, S. Adve, D. L. Jones, “Cross-layer adaptive video coding to reduce energy on general-purpose processors,” in Proc. International Conference on Image Processing, vol. 3, pp. III-109-112 vol. 2, Sept. 2003.
– [Nahrstedt, 2006] W. Yuan, K. Nahrstedt, S. V. Adve, D. L. Jones, R. H. Kravets, “GRACE-1: cross-layer adaptation for multimedia quality and battery energy,” IEEE Trans. on Mobile Computing, vol. 5, no. 7, pp. 799-815, July 2006.
– [Nahrstedt, 2007] K. Nahrstedt, W. Yuan, S. Shah, Y. Xue, and K. Chen, “QoS support in multimedia wireless environments,” in Multimedia Over IP and Wireless Networks, ed. M. van der Schaar and P. Chou, Academic Press, 2007.
– [Mohapatra, 2005] S. Mohapatra, R. Cornea, H. Oh, K. Lee, M. Kim, N. Dutt, R. Gupta, A. Nicolau, S. Shukla, N. Venkatasubramanian, “A cross-layer approach for power-performance optimization in distributed mobile systems,” 19th IEEE International Parallel and Distributed Processing Symposium, 2005.
– [Pillai, 2003] P. Pillai, H. Huang, and K.G. Shin, “Energy-Aware Quality of Service Adaptation,” Technical Report CSE-TR-479-03, Univ. of Michigan, 2003.
– [van der Schaar 2003] M. van der Schaar, S. Krishnamachari, S. Choi, and X. Xu, “Adaptive cross-layer protection strategies for robust scalable video transmission over 802.11 WLANs,” IEEE JSAC, vol. 21, no. 10, pp. 1752-1763.
– [van der Schaar 2007] M. van der Schaar, Y. Andreopoulos, and Z. Hu, “Optimized scalable video streaming over 802.11 a/e HCCA wireless networks under delay constraints,” IEEE Trans. on Mobile Computing, vol. 5, no. 6, pp. 755-768, June 2006.
• Foresighted Single-Layer (no learning, or heuristic)
– [Benini, 1999] L. Benini, A. Bogliolo, G. A. Paleologo, G. D. Micheli, “Policy optimization for dynamic power management,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 6, pp. 813-833, June 1999.
– [Ortega, 1994] A. Ortega, K. Ramchandran, M. Vetterli, “Optimal trellis-based buffered compression and fast approximations,” IEEE Trans. on Image Processing, vol. 3, no. 1, pp. 26-40, Jan. 1994.
– [Berry, 2002] R. Berry and R. G. Gallager, “Communications over fading channels with delay constraints,” IEEE Trans. Info. Theory, vol. 48, no. 5, pp. 1135-1149, May 2002.
• Foresighted Single Layer (with learning)
– [Chung, 2002] E.-Y. Chung, L. Benini, A. Bogliolo, Y.-H. Lu, and G. De Micheli, “Dynamic power management for nonstationary service requests,” IEEE Trans. on Computers, vol. 51, no. 11, Nov. 2002.
– [Marculescu, 2005] Z. Ren, B. H. Krogh, R. Marculescu, “Hierarchical adaptive dynamic power management,” IEEE Trans. on Computers, vol. 54, no. 4, Apr. 2005.
– [Borkar, 2008] N. Salodkar, A. Bhorkar, A. Karandikar, V. S. Borkar, “An on-line learning algorithm for energy efficient delay constrained scheduling over a fading channel,” IEEE JSAC, vol. 26, no. 4, pp. 732-742, Apr. 2008.
– [Krishnamurthy] M. H. Ngo and V. Krishnamurthy, “Monotonicity of constrained optimal transmission policies in correlated fading channels with ARQ,” IEEE Trans. on Signal Processing, vol. 58, no. 1, pp. 438-451, Jan. 2010.
• Multiuser network optimization
– [Neely, 2010] L. Huang, S. Moeller, M. J. Neely and B. Krishnamachari, “LIFO-Backpressure Achieves Near Optimal Utility-Delay Tradeoff,” Aug. 2010, ArXiv Technical Report, arXiv:1008.4895v1.
– [Neely, 2009] M. J. Neely and R. Urgaonkar, "Optimal Backpressure Routing in Wireless Networks with Multi-Receiver Diversity," Ad Hoc Networks (Elsevier), vol. 7, no. 5, pp. 862-881, July 2009.
– [Neely, 2006] M. J. Neely, "Energy Optimal Control for Time Varying Wireless Networks," IEEE Trans. on Information Theory, vol. 52, no. 7, pp. 2915-2934, July 2006.
– [Fu, van der Schaar, 2010] F. Fu and M. van der Schaar, “A systematic framework for dynamically optimizing multi-user video transmission,” IEEE JSAC, vol. 28, pp. 308-320, Apr. 2010.
– [Chiang 2007] M. Chiang, S. H. Low, A. R. Calderbank, and J.C. Doyle, “Layering as optimization decomposition: A mathematical theory of network architectures,” Proc. of IEEE, vol. 95, no. 1, 2007.
• Other
– [Katsaggelos 2008] J. Huang, Z. Li, M. Chiang, and A.K. Katsaggelos, “Joint Source Adaptation and Resource Allocation for Multi-User Wireless Video Streaming,” IEEE Trans. Circuits and Systems for Video Technology, vol. 18, no. 5, pp. 582-595, May 2008.
– [Katsaggelos 2007] E. Maani, P. Pahalawatta, R. Berry, T.N. Pappas, and A.K. Katsaggelos, “Resource Allocation for Downlink Multiuser Video Transmission over Wireless Lossy Networks,” IEEE Trans. on Image Processing, vol. 17, no. 9, pp. 1663-1671, Sept. 2008.
– [Su 2007] G.-M. Su, Z. Han, M. Wu, and K.J.R. Liu, “Joint Uplink and Downlink Optimization for Real-Time Multiuser Video Streaming Over WLANs,” IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 2, pp. 280-294, Aug. 2007.
– [Knopp 1995] R. Knopp and P. A. Humblet, “Information capacity and power control in single-cell multiuser communications,” Proc. IEEE ICC, 1995.
– [Viswanath 2002] P. Viswanath, D. N. C. Tse, R. Laroia, “Opportunistic beamforming using dumb antennas,” IEEE Trans. on Information Theory, vol. 48, no. 6, June 2002.
– [Tse 2005] D. N. C. Tse and P. Viswanath, Fundamentals of wireless communication. Cambridge, U.K.: Cambridge Univ. Press, 2005.
– [Alay 2009] O. Alay, P. Liu, Z. Guo, L. Wang, Y. Wang, E. Erkip, and S. Panwar, “Cooperative layered video multicast using randomized distributed space time codes,” IEEE INFOCOM Workshops 2009, Rio de Janeiro, Brazil, Oct. 2009, pp. 1–6.
– [Laneman 2003] J.N. Laneman and G.W. Wornell, “Distributed space-time block coded protocols for exploiting cooperative diversity in wireless networks,” IEEE Trans. Inf. Theory, vol. 49, pp. 2415–2425, Oct. 2003.
– [Sendonaris 2003] A. Sendonaris, E. Erkip, and B. Aazhang, “User cooperation diversity – Part I & II,” IEEE Trans. Commun., vol. 51, pp. 1927–1948, Nov. 2003.
– [Melodia, 2010] T. Melodia and W. Heinzelmann, “Cross-layer optimization in video sensor networks,” IEEE COMSOC MMTC E-Letter, vol. 5, no. 3, May 2010.
Supplementary Slides
Multimedia Application Characteristics
• Characteristics
– Stringent delay constraints
– Sophisticated source-coding dependency structures
– Mixed priorities
– Intense resource requirements
Figure: (a) Sequential dependencies; (b) Typical hybrid coder dependencies (MPEG-2, H.264/AVC) [Chou, 2006]; (c) Scalable coding dependencies
Figure: Decoding complexity of the Silent sequence (complexity profile over time for decoding four layers, CIF at 1.5 Mb/s); x-axis: time (seconds), 0 to 9; y-axis: normalized processor ticks, 0 to 8000
Reinforcement Learning Architecture
Figure: Block diagram connecting the Policy, the Traffic and Channel Dynamics, and the Value Function (signals: state, action, cost, error)
Conventional Reinforcement Learning Algorithm
• Q-learning
– Experience tuple: $\sigma_n = (s_n, a_n, c_n, s_{n+1})$, where $E[c_n] = c(s_n, a_n)$ and $s_{n+1} \sim p(s' \mid s_n, a_n)$
– Q-learning update:
$$\underbrace{Q(s_n, a_n)}_{\text{revised estimate}} \leftarrow (1 - \alpha_n)\,\underbrace{Q(s_n, a_n)}_{\text{starting estimate}} + \alpha_n \underbrace{\left[c_n + \gamma \min_{a' \in \mathcal{A}} Q(s_{n+1}, a')\right]}_{\text{new sample}}$$
• Exploration vs. Exploitation
– Given state $s_n$, how do we choose the action $a_n$?
– Exploitation: Take the greedy action w.r.t. the action-value function estimate
• Prevents the discovery of potentially better actions
– Exploration: Take a suboptimal action w.r.t. the action-value function estimate
• Sacrifices immediate performance for possibly improved future performance
(Three elements of this conventional algorithm are flagged as problems on the original slide.)
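The following sketch shows the update rule and the exploration/exploitation trade-off as tabular Q-learning with ε-greedy exploration. The toy environment is a hypothetical assumption made for illustration and is not from the slides. In it, the backlog is b ≤ B, the action is the number of packets sent, and the cost is a convex energy term plus μ times the backlog. The function names and all parameter values are also invented. Only the update line mirrors the equation above.

```python
import random

# HYPOTHETICAL toy problem (not from the slides): backlog b in {0, ..., B},
# action a = number of packets to send, cost = energy + MU * backlog.
B, MU, GAMMA = 5, 1.0, 0.9
ACTIONS = range(3)  # send 0, 1, or 2 packets


def step(b, a, rng):
    """Environment: returns (c_n, s_{n+1}); its statistics are unknown to the learner."""
    sent = min(a, b)
    cost = sent ** 2 + MU * b            # convex energy cost plus holding cost
    arrivals = rng.choice((0, 1, 2))
    return cost, min(b - sent + arrivals, B)


def q_learning(num_slots=50_000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(b, a): 0.0 for b in range(B + 1) for a in ACTIONS}
    visits = dict.fromkeys(Q, 0)
    b = 0
    for _ in range(num_slots):
        # Exploration w.p. epsilon, otherwise exploitation (greedy w.r.t. Q).
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)
        else:
            a = min(ACTIONS, key=lambda act: Q[(b, act)])
        cost, b_next = step(b, a, rng)          # experience tuple (s_n, a_n, c_n, s_{n+1})
        visits[(b, a)] += 1
        alpha = visits[(b, a)] ** -0.6          # decaying step size alpha_n
        sample = cost + GAMMA * min(Q[(b_next, act)] for act in ACTIONS)
        Q[(b, a)] = (1 - alpha) * Q[(b, a)] + alpha * sample
        b = b_next
    return Q
```

Note that each experience tuple updates only the one visited (s, a) pair. This is one reason why conventional Q-learning can be slow to converge.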
Partially Known Dynamics
• Known dynamics
– Goodput: $p^f(f_n \mid BEP_n, z_n) = \text{bin}(z_n, 1 - PLR_n)$
– PM state transition distribution
– Power cost
– Holding cost
• Unknown dynamics
– Packet arrival distribution: $p^l(l)$
– Channel state transition: $p^h(h' \mid h)$
– Overflow cost
• Post-decision state
– An intermediate state
• After known dynamics take place
• Before unknown dynamics take place
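The known goodput distribution is binomial: each of the $z_n$ transmitted packets survives with probability $1 - PLR_n$. The sketch below assumes independent bit errors, so that $PLR = 1 - (1 - BEP)^L$ for an L-bit packet. The slide does not say how PLR is obtained from BEP, so this mapping, the function names, and the numbers are illustrative assumptions.

```python
from math import comb


def packet_loss_rate(bep, packet_bits):
    """PLR of an L-bit packet, ASSUMING independent bit errors."""
    return 1.0 - (1.0 - bep) ** packet_bits


def goodput_pmf(z, plr):
    """p^f(f | BEP, z) = bin(z, 1 - PLR): probability that f of the z packets succeed."""
    q = 1.0 - plr
    return [comb(z, f) * q ** f * plr ** (z - f) for f in range(z + 1)]


plr = packet_loss_rate(bep=1e-5, packet_bits=1000)         # about 0.00995
pmf = goodput_pmf(z=4, plr=plr)
expected_goodput = sum(f * p for f, p in enumerate(pmf))   # equals z * (1 - PLR)
```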
Decomposition into Known and Unknown Components
• Known and unknown transition probabilities
– Known: $p^k(\tilde{s} \mid s, a) = p^x(\tilde{x} \mid x, y)\, p^f(b - \tilde{b} \mid BEP, z)\, I(\tilde{h} = h)$
– Unknown: $p^u(s' \mid \tilde{s}) = p^h(h' \mid \tilde{h})\, p^l(b' - \tilde{b})\, I(x' = \tilde{x})$
• Known and unknown costs
– Known: $c^k(s, a) = \underbrace{\rho([h, x], [BEP, y, z])}_{\text{power cost}} + \mu \underbrace{\sum_{f=0}^{z} p^f(f \mid BEP, z)\,(b - f)}_{\text{holding cost}}$
– Unknown: $c^u(\tilde{s}) = \underbrace{\frac{\gamma}{1 - \gamma} \sum_{l=0}^{\infty} p^l(l) \max(\tilde{b} + l - B, 0)}_{\text{overflow cost}}$
• The unknown components do not depend on the action
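The holding and overflow cost terms can be computed as in the sketch below. The power cost ρ is omitted because its form is not given here. The function names, the truncated arrival pmf, and the example values are hypothetical. Note that the overflow cost takes only the post-decision backlog $\tilde{b}$ as input, which reflects that the unknown components do not depend on the action.

```python
from math import comb


def holding_cost(b, z, plr, mu):
    """Known term: mu * sum_{f=0}^{z} p^f(f | BEP, z) * (b - f), with p^f = bin(z, 1 - PLR)."""
    q = 1.0 - plr
    return mu * sum(comb(z, f) * q ** f * plr ** (z - f) * (b - f) for f in range(z + 1))


def overflow_cost(b_tilde, arrival_pmf, buffer_size, gamma):
    """Unknown term: gamma / (1 - gamma) * sum_l p^l(l) * max(b~ + l - B, 0).
    Depends only on the post-decision backlog b~, never on the action."""
    expected_drops = sum(p * max(b_tilde + l - buffer_size, 0)
                         for l, p in enumerate(arrival_pmf))
    return gamma / (1.0 - gamma) * expected_drops
```

For example, with $\tilde{b} = 4$, B = 5, γ = 0.9, and a hypothetical arrival pmf (0.5, 0.25, 0.25) over l = 0, 1, 2, only l = 2 overflows (by one packet). The overflow cost is therefore 9 × 0.25 = 2.25.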
Post-Decision State Learning
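• Post-decision state learning (online):
– PDS experience tuple: $\tilde{\sigma}_n = (s_n, a_n, \tilde{s}_n, c^u_n, s_{n+1})$
(a) $\tilde{V}^*(\tilde{s}) = c^u(\tilde{s}) + \gamma \sum_{s'} p^u(s' \mid \tilde{s})\, V^*(s')$
(b) $V^*(s) = \min_{a \in \mathcal{A}} \left\{ c^k(s, a) + \sum_{\tilde{s}} p^k(\tilde{s} \mid s, a)\, \tilde{V}^*(\tilde{s}) \right\}$
– The PDS value function must be learned:
$$\underbrace{\tilde{V}_{n+1}(\tilde{s}_n)}_{\text{revised estimate}} \leftarrow (1 - \alpha_n)\,\underbrace{\tilde{V}_n(\tilde{s}_n)}_{\text{starting estimate}} + \alpha_n \underbrace{\left[c^u_n + \gamma V_n(s_{n+1})\right]}_{\text{new sample}}$$
where $E[c^u_n] = c^u(\tilde{s}_n)$ and $s_{n+1} \sim p^u(s' \mid \tilde{s}_n)$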
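The sketch below illustrates the online PDS update on a hypothetical toy model. The state is (b, h), with backlog b and a two-state channel h, and the action z is the number of packets sent. The known dynamics are taken to be deterministic, $\tilde{s} = (b - z, h)$, with no packet losses or power management. Random arrivals and channel changes play the role of the unknown dynamics, and the overflow cost plays the role of $c^u$. All numbers and function names are invented. The learner never uses the arrival or channel probabilities: it only observes $c^u_n$ and $s_{n+1}$, and it acts greedily through equation (b).

```python
import random

# HYPOTHETICAL toy model: state (b, h), b in {0..B}, h in {0 (bad), 1 (good)}.
B, MU, GAMMA = 6, 1.0, 0.9
ENERGY = {0: 4.0, 1: 1.0}             # energy per packet^2 in each channel state
PDS_STATES = [(b, h) for b in range(B + 1) for h in (0, 1)]


def known_cost(b, h, z):
    """c^k(s, a): transmission energy plus MU * post-transmission backlog."""
    return ENERGY[h] * z ** 2 + MU * (b - z)


def greedy(V_tilde, b, h):
    """Eq. (b) with deterministic known dynamics s~ = (b - z, h); returns (V(s), z*)."""
    return min((known_cost(b, h, z) + V_tilde[(b - z, h)], z) for z in range(b + 1))


def unknown_step(b_tilde, h_tilde, rng):
    """Unknown dynamics and cost: arrivals, channel change, overflow cost c^u."""
    arrivals = rng.choice((0, 1, 2))
    h_next = h_tilde if rng.random() < 0.8 else 1 - h_tilde
    c_u = GAMMA / (1 - GAMMA) * max(b_tilde + arrivals - B, 0)
    return c_u, (min(b_tilde + arrivals, B), h_next)


def pds_learning(num_slots=20_000, seed=1):
    rng = random.Random(seed)
    V_tilde = dict.fromkeys(PDS_STATES, 0.0)
    visits = dict.fromkeys(PDS_STATES, 0)
    b, h = 0, 1
    for _ in range(num_slots):
        _, z = greedy(V_tilde, b, h)                 # act w.r.t. the current estimate
        s_tilde = (b - z, h)                         # known dynamics
        c_u, (b, h) = unknown_step(*s_tilde, rng)    # observe c^u_n and s_{n+1}
        v_next, _ = greedy(V_tilde, b, h)            # V_n(s_{n+1}) from eq. (b)
        visits[s_tilde] += 1
        alpha = visits[s_tilde] ** -0.7
        V_tilde[s_tilde] = (1 - alpha) * V_tilde[s_tilde] + alpha * (c_u + GAMMA * v_next)
    return V_tilde
```

Because the known part is handled exactly inside `greedy`, only the post-decision values are estimated from samples.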
Comparison to Prior Work using Post-Decision States [N. Salodkar, 2010]*
Feature                 Salodkar            Proposed
DPM                     No                  Yes
AMC                     Yes                 Yes
Power control           Yes                 Yes
Packet losses           No                  Yes
Post-decision state     Deterministic       Stochastic
Costs                   Known only          Known and unknown
State transitions       Known and unknown   Known and unknown
Optimization criteria   Undiscounted        Discounted
Virtual experience      No                  Yes
* Differences in the proposed work are highlighted in red
Learning Algorithm Performance Comparison
Figure: (a) Holding cost, (b) power (mW), and (e) $\theta_{\text{off}}$ versus time slot n (0 to 6 × 10^4) for PDS + virtual experience (update period = 1, 10, 25, and 125), PDS learning, PDS learning (no DPM), and Q-learning
* [Borkar, 2008]
Comparison to Optimal Policy With Imperfect Statistics
Figure: Holding cost and power (mW) versus time slot n (logarithmic axis, 10^0 to 10^5) for PDS + virtual experience (update period = 1) and for the optimal policy computed with imperfect statistics
Non-Stationary Arrivals
Figure: Expected arrival rate (packets/s, 0 to 400) versus time slot n (0 to 6 × 10^4)
• Unobservable 5-state Markov modulated process
– States
• Expected arrival rate for a Poisson arrival process
• (0, 100, 200, 300, 400) packets/s
– Stationary distribution
• (0.0188, 0.3755, 0.0973, 0.4842, 0.0242).
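The five rates and the stationary distribution above give a long-run mean arrival rate of 0·0.0188 + 100·0.3755 + 200·0.0973 + 300·0.4842 + 400·0.0242 = 211.95 packets/s. The sketch below checks this value and generates per-slot Poisson arrivals from a hidden 5-state chain. The slide does not give the chain's transition matrix, so the switching rule, slot length, and function names are hypothetical. The switching rule is to stay with probability 0.999 and otherwise jump uniformly to another state. Its stationary distribution is uniform, not the one listed above.

```python
import math
import random

RATES = (0, 100, 200, 300, 400)                        # packets/s (from the slide)
STATIONARY = (0.0188, 0.3755, 0.0973, 0.4842, 0.0242)  # from the slide


def long_run_rate(rates, pi):
    """Expected arrival rate under the stationary distribution: sum_k pi_k * lambda_k."""
    return sum(r * p for r, p in zip(rates, pi))


def poisson(lam, rng):
    """Knuth's multiplication method; fine for the small per-slot means used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(num_slots, slot_s=0.01, stay=0.999, seed=0):
    """Hidden (unobservable) 5-state chain modulating Poisson arrivals per slot.
    HYPOTHETICAL switching rule: stay w.p. `stay`, else jump uniformly to another state."""
    rng = random.Random(seed)
    state = rng.choices(range(len(RATES)), weights=STATIONARY)[0]
    arrivals = []
    for _ in range(num_slots):
        if rng.random() > stay:
            state = rng.choice([s for s in range(len(RATES)) if s != state])
        arrivals.append(poisson(RATES[state] * slot_s, rng))
    return arrivals
```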
Non-Stationary Channel Transitions
• Channel state transition probabilities vary over time as an AR(1) process.
Figure: Self-transition probabilities of the channel states over time (white indicates a relatively high self-transition probability)
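The slide states only that the channel transition probabilities evolve as an AR(1) process. The sketch below shows one hypothetical way to generate such matrices. Each state's self-transition probability follows $x_t = \phi x_{t-1} + \varepsilon_t$ around a mean and is clipped to a valid range. The remaining probability mass is spread uniformly over the other states. The mean, φ, noise level, clipping range, and the uniform spreading are all invented for illustration.

```python
import random


def ar1_transition_matrices(num_states, num_steps, mean_self=0.8, phi=0.99,
                            noise_std=0.01, seed=0):
    """HYPOTHETICAL: per-state self-transition probability = mean_self + x_t,
    with x_t = phi * x_{t-1} + Gaussian noise (AR(1)), clipped to [0.05, 0.99]."""
    rng = random.Random(seed)
    x = [0.0] * num_states
    matrices = []
    for _ in range(num_steps):
        x = [phi * xi + rng.gauss(0.0, noise_std) for xi in x]
        P = []
        for k in range(num_states):
            p_self = min(max(mean_self + x[k], 0.05), 0.99)
            off = (1.0 - p_self) / (num_states - 1)   # spread remaining mass uniformly
            P.append([p_self if j == k else off for j in range(num_states)])
        matrices.append(P)
    return matrices
```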
Information-Theoretic Power Cost
$$c(h, z) = \frac{\sigma^2}{h^2}\left(2^{z} - 1\right)$$
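The sketch below evaluates a power cost of the form $c(h, z) = (\sigma^2 / h^2)(2^z - 1)$ and shows that it is convex in the throughput z. Each additional unit of z costs $(\sigma^2/h^2)\,2^z$, so the marginal cost doubles at every step. The function name and test values are hypothetical.

```python
def info_theoretic_power(h, z, noise_var):
    """c(h, z) = (sigma^2 / h^2) * (2^z - 1) for channel state h and throughput z."""
    return noise_var / h ** 2 * (2.0 ** z - 1.0)


# Marginal cost of one more unit of throughput at unit gain and noise: 1, 2, 4, ...
marginal = [info_theoretic_power(1.0, z + 1, 1.0) - info_theoretic_power(1.0, z, 1.0)
            for z in range(3)]
```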
Physical Layer: Adaptive Modulation and Power Control
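• Transmission rate: $\beta_n / T_s$ (bits/s)
– bits per symbol: $\beta_n \ge 1$
– packet length (bits): $L$
– packet rate (packets/s): $r_n = \beta_n / (L T_s)$
• Bit-error probability (BEP): $BEP_n \le 4\,Q\!\left(\sqrt{\frac{3 h_n \gamma_n}{2^{\beta_n} - 1}}\right)$, where $Q(x) \triangleq \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2}\, du$
• SNR: $\gamma_n = P^{\text{tx}}_n T_s / N_0$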
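The rate and BEP expressions above can be evaluated directly, as in the sketch below. It uses $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$, which is equivalent to the integral definition. The function names and example values are hypothetical. Note that the 4Q(·) expression is an upper bound and can exceed 1 at very low SNR.

```python
from math import erfc, sqrt


def q_function(x):
    """Gaussian tail Q(x) = (1/sqrt(2*pi)) * integral_x^inf exp(-u^2/2) du."""
    return 0.5 * erfc(x / sqrt(2.0))


def packet_rate(beta, packet_bits, symbol_period):
    """r_n = beta_n / (L * T_s), in packets per second."""
    return beta / (packet_bits * symbol_period)


def bep_upper_bound(beta, h, p_tx, symbol_period, n0):
    """BEP_n <= 4 Q(sqrt(3 h_n gamma_n / (2^beta_n - 1))), with gamma_n = P_tx T_s / N_0."""
    if beta < 1:
        raise ValueError("beta_n must be at least 1 bit per symbol")
    snr = p_tx * symbol_period / n0
    return 4.0 * q_function(sqrt(3.0 * h * snr / (2.0 ** beta - 1.0)))
```

For example, β = 2 bits/symbol, L = 1000 bits, and $T_s$ = 1 µs give r = 2000 packets/s.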
Physical Layer: Adaptive Modulation and Power Control
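• Variables of interest
– Packet throughput (packets/time slot): $z_n = \beta_n \Delta t / (L T_s)$
– Bits per symbol: $\beta_n$
– Transmission power (watts): $P^{\text{tx}}_n \ge \frac{N_0 \left(2^{\beta_n} - 1\right)}{3 h_n T_s} \left[Q^{-1}\!\left(\frac{BEP_n}{4}\right)\right]^2$
– Bit-error probability: $BEP_n$
• Decision variable 1 (adaptive modulation): packet throughput $z_n$, which fixes $\beta_n = z_n L T_s / \Delta t$
• Decision variable 2 (power control): bit-error probability $BEP_n$, which, given $\beta_n$ and $h_n$, fixes the minimum transmission power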
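The sketch below maps the two decision variables to a modulation and a minimum transmit power. Decision variable 1, $z_n$, is converted to $\beta_n$. Decision variable 2, $BEP_n$, is then converted to the smallest $P^{\text{tx}}_n$ that meets the bound. It uses $Q^{-1}(p) = -\Phi^{-1}(p)$ from Python's `statistics.NormalDist`. The function names and numbers are hypothetical. A round-trip check through the BEP bound confirms that the power exactly meets the target.

```python
from math import sqrt
from statistics import NormalDist


def q_inverse(p):
    """Q^{-1}(p): the x with Q(x) = p, i.e. minus the standard-normal p-quantile."""
    return -NormalDist().inv_cdf(p)


def modulation_for_throughput(z, packet_bits, symbol_period, slot_length):
    """Decision variable 1: beta_n = z_n * L * T_s / dt (inverting z_n = beta_n dt / (L T_s))."""
    return z * packet_bits * symbol_period / slot_length


def min_tx_power(z, bep, h, packet_bits, symbol_period, slot_length, n0):
    """Decision variable 2: smallest P_tx with 4 Q(sqrt(3 h gamma / (2^beta - 1))) <= BEP."""
    beta = modulation_for_throughput(z, packet_bits, symbol_period, slot_length)
    return n0 * (2.0 ** beta - 1.0) / (3.0 * h * symbol_period) * q_inverse(bep / 4.0) ** 2


def bep_bound_at_power(p_tx, beta, h, symbol_period, n0):
    """Evaluate the bound 4 Q(x) = 4 Phi(-x) at gamma = P_tx T_s / N_0."""
    x = sqrt(3.0 * h * p_tx * symbol_period / (n0 * (2.0 ** beta - 1.0)))
    return 4.0 * NormalDist().cdf(-x)
```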
Initializing the PDS Value Function
• PDS value iteration:
$$\tilde{V}_k(\tilde{s}) = c^u(\tilde{s}) + \gamma \sum_{s'} p^u(s' \mid \tilde{s})\, V_k(s')$$
$$V_{k+1}(s) = \min_{a \in \mathcal{A}} \left\{ c^k(s, a) + \sum_{\tilde{s}} p^k(\tilde{s} \mid s, a)\, \tilde{V}_k(\tilde{s}) \right\}$$
• Initialization:
– Define reasonable estimates: $\hat{c}^u(\tilde{s})$ and $\hat{p}^u(s' \mid \tilde{s})$
– Perform PDS value iteration with the estimates
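The sketch below runs PDS value iteration to initialize $\tilde{V}$ on a hypothetical toy model. The state is (b, h), the action z is the number of packets sent, and the known dynamics are $\tilde{s} = (b - z, h)$. The estimates $\hat{p}^u$ and $\hat{c}^u$ assume a deterministic number of arrivals per slot and a constant channel. The resulting $\tilde{V}$ would serve as the starting point for online PDS learning. The model, constants, and function names are invented.

```python
# HYPOTHETICAL toy model: state (b, h), action z = packets sent, s~ = (b - z, h).
B, MU, GAMMA = 6, 1.0, 0.9
ENERGY = {0: 4.0, 1: 1.0}
STATES = [(b, h) for b in range(B + 1) for h in (0, 1)]


def known_cost(b, h, z):
    """c^k(s, a): transmission energy plus MU * post-transmission backlog."""
    return ENERGY[h] * z ** 2 + MU * (b - z)


def initialize_pds_value(assumed_arrivals=1, tol=1e-9, max_iters=10_000):
    """PDS value iteration under estimates p^u_hat, c^u_hat:
    arrivals ASSUMED deterministic, channel ASSUMED constant."""
    V = dict.fromkeys(STATES, 0.0)
    V_tilde = dict.fromkeys(STATES, 0.0)
    for _ in range(max_iters):
        # V~_k(s~) = c^u_hat(s~) + gamma * V_k(s'), s' deterministic under p^u_hat
        for b_t, h_t in STATES:
            c_u_hat = GAMMA / (1 - GAMMA) * max(b_t + assumed_arrivals - B, 0)
            V_tilde[(b_t, h_t)] = c_u_hat + GAMMA * V[(min(b_t + assumed_arrivals, B), h_t)]
        # V_{k+1}(s) = min_z { c^k(s, z) + V~_k(b - z, h) }
        V_next = {(b, h): min(known_cost(b, h, z) + V_tilde[(b - z, h)] for z in range(b + 1))
                  for b, h in STATES}
        converged = max(abs(V_next[s] - V[s]) for s in STATES) < tol
        V = V_next
        if converged:
            break
    return V_tilde
```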
Impact of Initial Conditions
• Initial arrival rate is assumed deterministic or uniform
• Channel state is assumed constant
Figure: Power (mW, 180 to 240) versus holding cost (packets, 3.9 to 4.2) for initialized arrival rates of 100, 200, 300, 400, 500, and 600 packets/s and for a uniform initialized arrival rate
Traffic State Transition Illustration
Figure: Traffic state ($F$, $\mathbf{b}$) and scheduling action ($\mathbf{y}$) at times t, t+1, and t+2
– Time t: $F_t = (1,2,3)$, $\mathbf{b}_t = (4,3,2)$, $\mathbf{y}_t = (4,0,0)$
– Time t+1: $F_{t+1} = (2,3,1,4)$, $\mathbf{b}_{t+1} = (3,2,6,1)$, $\mathbf{y}_{t+1} = (0,0,2,0)$
– Time t+2: $F_{t+2} = (1,4,2,3)$, $\mathbf{b}_{t+2} = (4,-1,4,1)$, $\mathbf{y}_{t+2} = (4,0,1,0)$
Phase I Time Fraction
• K and Q symbols have to be transmitted in phase I and phase II, respectively
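$$Q\,\rho_t^i\,\beta_t^{i,1} = K\,(1 - \rho_t^i)\,\beta_t^{i,2}, \quad \text{where } R_c = K/Q \le 1 \text{ is the rate of the orthogonal STBC}$$
$$\Rightarrow\; \rho_t^i = \frac{R_c\,\beta_t^{i,2}}{\beta_t^{i,1} + R_c\,\beta_t^{i,2}}, \qquad \rho_t^i\,\beta_t^{i,1} > \beta_t^{i,0} \;\Rightarrow\; \frac{1}{\beta_t^{i,1}} + \frac{1}{R_c\,\beta_t^{i,2}} < \frac{1}{\beta_t^{i,0}}$$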
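The sketch below gives a small numeric example of the phase I time fraction. It assumes that all rates are in bits per symbol and uses the relation $Q\rho\beta^{i,1} = K(1-\rho)\beta^{i,2}$ with $R_c = K/Q$. Solving for ρ gives $\rho = R_c\beta^{i,2}/(\beta^{i,1} + R_c\beta^{i,2})$. The effective cooperative rate is therefore $\rho\beta^{i,1} = 1/(1/\beta^{i,1} + 1/(R_c\beta^{i,2}))$, and it beats the direct rate $\beta^{i,0}$ exactly when $1/\beta^{i,1} + 1/(R_c\beta^{i,2}) < 1/\beta^{i,0}$. The function names and example numbers are hypothetical.

```python
def phase1_fraction(beta1, beta2, r_c):
    """rho = R_c beta2 / (beta1 + R_c beta2), from Q rho beta1 = K (1 - rho) beta2, R_c = K/Q."""
    if not 0 < r_c <= 1:
        raise ValueError("R_c = K/Q must lie in (0, 1]")
    return r_c * beta2 / (beta1 + r_c * beta2)


def cooperative_rate(beta1, beta2, r_c):
    """Effective rate rho * beta1 = 1 / (1/beta1 + 1/(R_c beta2))."""
    return phase1_fraction(beta1, beta2, r_c) * beta1


def cooperation_helps(beta0, beta1, beta2, r_c):
    """True iff 1/beta1 + 1/(R_c beta2) < 1/beta0 (cooperation beats direct transmission)."""
    return 1.0 / beta1 + 1.0 / (r_c * beta2) < 1.0 / beta0
```

With β^{i,1} = 4, β^{i,2} = 2, and R_c = 1, phase I occupies ρ = 1/3 of the time, and the cooperative rate is 4/3 bits/symbol. Cooperation therefore helps when β^{i,0} = 1 but not when β^{i,0} = 2.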
Reformulated Multi-user Markov Decision Process
• Global state
• Decision variables:
– Scheduling action
• Dynamic programming equation
• Constraints
• Challenges:
– Complexity is proportional to $|\mathcal{S}|^M$, which scales exponentially in the number of users $M$
– Traffic state information is local to users
Optimization Decomposition
• The multi-user optimization can be decomposed into local MDPs satisfying:
• Requires message exchanges between users and the AP
– Users → AP: Discounted infinite horizon resource consumption
– AP → Users: Uniform resource price to manage congestion
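The sketch below illustrates the price-based message exchange in a deliberately simplified, static form. It is a generic dual (subgradient) price update, not the dissertation's algorithm. Each user reports the resource it would consume at the current price. The AP raises the uniform price when total demand exceeds capacity and lowers it otherwise. The local cost $a_i / r + \lambda r$, the capacity, the step size, and the function names are all hypothetical. With this cost, a user's best response is $r_i = \sqrt{a_i / \lambda}$, and the market-clearing price is $\lambda^* = (\sum_i \sqrt{a_i} / C)^2$.

```python
from math import sqrt


def user_best_response(a_i, price):
    """HYPOTHETICAL local problem: minimize a_i / r + price * r over r > 0."""
    return sqrt(a_i / price)


def ap_price_update(weights, capacity, price=1.0, step=0.5, iters=200):
    """Subgradient step on the uniform price; raise it when demand exceeds capacity."""
    for _ in range(iters):
        demand = sum(user_best_response(a, price) for a in weights)   # Users -> AP
        price = max(1e-6, price + step * (demand - capacity))          # AP -> Users
    return price
```

For weights (1, 4, 9) and capacity 3, the price converges to λ* = (6/3)² = 4. At that price the users request 0.5, 1, and 1.5 units, which exactly fills the capacity.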
Proposed Opportunistic Cooperation Protocol (1/2)
• When is cooperative transmission better than direct transmission?
$$\beta_t^{i,\text{coop}} > \beta_t^{i,0} \;\Rightarrow\; \frac{1}{\beta_t^{i,1}} + \frac{1}{R_c\,\beta_t^{i,2}} < \frac{1}{\beta_t^{i,0}}$$
• Candidate cooperative nodes can self-select themselves:
[Equation: candidate set $\mathcal{C}_t^i$ defined through a threshold $\xi_t^i$ involving $\beta_t^{i,0}$ and $R_c$]
• AP verifies fulfillment of the following condition:
[Equation: condition on $\beta_t^{i,2}$ involving $\xi_t^i$, $R_c$, and $\beta_t^{i,0}$]
• If satisfied, then cooperation is better; otherwise, choose direct transmission
Proposed Opportunistic Cooperation Protocol (2/2)
• RTS
– Request to send
• CRS
– Cooperative recruitment signal
• HTS
– Help to send
• CTS
– Clear to send
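Figure: Message exchange (RTS, CRS, HTS, CTS) annotated with $h_t^i,\ \forall i \in \{0, \dots, M\}$; $\beta_t^{i,0}$ to candidates; $z_t^i$, $\beta_t^{i,2}$, and $\lambda$
*Dual channels, i.e., the same channel state $h_t^i$ applies in both link directions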
Computation of Transmission Rates
• Direct rate
• Phase I rate
• Phase II rate
Cooperation Statistics