
Slides: medianetlab.ee.ucla.edu/data/slides/Defense_Slides_Nick.pdf

Multimedia Communications and Systems Laboratory 1

Online Learning for Energy-Efficient Multimedia Systems

Nick Mastronarde

[email protected]

PhD Defense

May 6, 2011


• Old: Higher multimedia quality is better

– Optimize rate-distortion performance

• H.264/AVC

– Minimize delay

– Minimize distortion

– …

• New: Quality costs power

• Example applications: surveillance, video conferencing, sensor networks, data centers, in-home

Resource-intensive multimedia applications are booming over a variety of resource-constrained networks and systems

[Figure: the traditional objectives (delay, distortion) must now be traded off against energy]

My focus: energy-efficient resource management


Performance Metrics and High-level System Model

• Performance metric depends on the system and application

– Minimize energy subject to QoS constraint

– Optimize QoS subject to energy budget

– …

• For example:

– E[Cost] = E[Energy] + µE[Delay]

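[Figure: high-level system model. A source (with source adaptation) produces multimedia data (an I P B P B frame sequence) that enters a buffer; scheduling passes the data to a server (with service adaptation); QoS is measured by delay and distortion.]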


Two types of optimization objectives

• Myopic (suboptimal):

– Minimize expected immediate cost

• Foresighted (my focus):

– Minimize expected immediate cost + expected future cost

– Why?

• Power & delay: the time taken to transmit the current packet affects the time available (and the power required) to transmit future packets before their deadlines

• Multimedia utility: scheduling decisions at the current time affect future scheduling decisions because of source-coding dependencies

• Example cost: E[Cost] = E[Energy] + µE[Delay]
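To make the myopic/foresighted distinction concrete, here is a toy two-slot sketch with hypothetical numbers; the constants MU, DROP_PENALTY, H1, H2_DIST and the convex energy (2^k - 1)/h are assumptions for illustration, not the thesis model. Two packets must leave within two slots: the myopic rule minimizes only the slot-1 cost, while the foresighted rule adds the expected slot-2 cost.

```python
# Toy two-slot transmission problem with hypothetical numbers (not from the slides).
# Two packets are queued; anything still queued after slot 2 is dropped with a penalty.

MU = 0.5              # weight on the holding (delay) cost paid in slot 1
DROP_PENALTY = 10.0   # cost per packet still queued at the deadline
H1 = 1.0              # channel gain in slot 1 (observed)
H2_DIST = {0.5: 0.5, 2.0: 0.5}  # slot-2 channel gain -> probability (bad or good)


def energy(k: int, h: float) -> float:
    """Convex energy for sending k packets over a channel with gain h."""
    return (2 ** k - 1) / h


def immediate_cost(k: int, backlog: int) -> float:
    """Slot-1 cost: transmission energy plus MU times the packets left waiting."""
    return energy(k, H1) + MU * (backlog - k)


def final_slot_cost(backlog: int, h: float) -> float:
    """Best slot-2 cost once h is observed: energy plus penalty for drops."""
    return min(energy(k, h) + DROP_PENALTY * (backlog - k) for k in range(backlog + 1))


def expected_future_cost(backlog: int) -> float:
    """Average the optimal slot-2 cost over the random slot-2 channel."""
    return sum(p * final_slot_cost(backlog, h) for h, p in H2_DIST.items())


def total_cost(k: int, backlog: int) -> float:
    """Immediate cost plus expected future cost of sending k packets in slot 1."""
    return immediate_cost(k, backlog) + expected_future_cost(backlog - k)


BACKLOG = 2
actions = range(BACKLOG + 1)
myopic_k = min(actions, key=lambda k: immediate_cost(k, BACKLOG))
foresighted_k = min(actions, key=lambda k: total_cost(k, BACKLOG))

print("myopic:", myopic_k, total_cost(myopic_k, BACKLOG))                  # myopic: 0 4.75
print("foresighted:", foresighted_k, total_cost(foresighted_k, BACKLOG))   # foresighted: 1 2.75
```

Under these assumptions the myopic rule idles in slot 1, pushing both packets into a slot whose channel may be bad; its expected total cost is 4.75, versus 2.75 for the foresighted rule, which sends one packet early.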


Foresighted Optimization

• How does foresighted optimization work?

– In time slot n, take transmission action to minimize:

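$$\underbrace{c(s, a)}_{\text{current cost}} + \underbrace{\mathbb{E}_w\Big[V\big(f(s, a, w)\big)\Big]}_{\text{expected future cost}}$$

where the state s at time n comprises the channel, the buffer backlog, and the multimedia (MM) data state; the action a comprises scheduling and adaptive modulation and coding (AMC); and the dynamics w (channel, data arrivals, transmission errors) drive the transition to the next state s' = f(s, a, w) at time n+1.

Myopic solutions are suboptimal because they ignore the expected future cost.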


Challenges

• Challenge 1: Unknown dynamic environments

– Dynamic traffic and channel conditions

– Lack of statistical knowledge of dynamics

– Fast learning algorithms

• Challenge 2: Heterogeneous multimedia data

– Different deadlines, priorities, dependencies

• Challenge 3: Multi-user

– Coupling due to shared resources

– Curse of dimensionality


Existing Solutions (1/2)

• Cross-layer optimization in multimedia communications and systems

– Myopic: Ignore the impact of current decisions on the future performance. [Nahrstedt 2006, 2007, He 2005, Sachs 2003, Mohapatra 2005, van der Schaar 2003, 2007]

• Single-layer optimizations

– Hardware layer (dynamic power management): [Benini 1999, Chung 2002, Marculescu 2005]

• Learning solutions require too much memory or are too complex

– Physical layer (transmission power control)

• Optimal solutions require statistical knowledge of the dynamics [Berry 2002]

• Learning solutions are slow to converge [Borkar 2008]

– Application layer (multimedia rate-control) [Ortega 1994]

• Rate-distortion characteristics are assumed to be known


Existing solutions (2/2)

• Multi-user network optimization

– Network utility maximization [Chiang 2007]

• Static utility function

• Ignores network dynamics

• Ignores packet deadlines, priorities, and dependencies

• No learning for unknown environments

– Stability-constrained optimization [Neely 2006]

• Guarantees queue stability, but achieves suboptimal power consumption in the low-delay region

• Ignores packet deadlines, priorities, and dependencies


Improvement over state-of-the-art

Improvement achieved by the proposed framework, by problem setting:

• Point-to-point energy-efficient wireless communication [Mastronarde 2011b]

– vs. heuristic policy [Nahrstedt 2007]: reduces power by up to 33% for the same delay (in a non-stationary environment)

– vs. reinforcement learning [Borkar, 2008]: reduces delay and power by up to 50% and 23%, respectively, after 3000 learning steps

• Cooperative multi-user video transmission [Mastronarde 2011a]

– vs. non-cooperative multi-user video transmission [Fu, van der Schaar, 2010]: improves PSNR by 5-10 dB for nodes with feeble direct signals

• Cross-layer multimedia system optimization* [Mastronarde 2010, 2009b]

– vs. cross-layer adaptation [Nahrstedt 2005]: improves PSNR by up to 7 dB and reduces power by 21%

*Prior work presented during the Qualifying Exam


Overview

• Part I: Fast reinforcement learning for energy-efficient wireless communication [Mastronarde, 2011b]

– Post-decision state learning

– Virtual experience learning

• Part II: A distributed cross-layer approach to cooperative video transmission [Mastronarde, 2011a]

– Multi-user Markov decision process formulation

– Mitigating the curse of dimensionality



The Solved Energy-efficient Wireless Communication Problem (1/2)

• Point-to-point time-slotted wireless communication system

• Minimize power consumption subject to buffer delay constraint

– Little’s law: Average buffer delay is proportional to average buffer occupancy

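[Figure: point-to-point system with buffer state $b_n$, data arrivals $l_n$, channel state $h_n$, bit-error probability $BEP_n$, power management state $x_n$ and action $y_n$, throughput $z_n$, and goodput $f_n$]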
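A quick worked example of the Little's-law argument, as a sketch: it borrows the arrival rate (200 packets/s), the holding-cost constraint (4 packets), and the 10 ms slot from the simulation setup later in the deck, and assumes no overflow drops, so that the admitted arrival rate equals λ.

```python
def littles_law_delay(avg_backlog_packets: float, arrival_rate_pps: float) -> float:
    """Average time a packet spends in the buffer, W = L / lambda (seconds)."""
    if arrival_rate_pps <= 0:
        raise ValueError("arrival rate must be positive")
    return avg_backlog_packets / arrival_rate_pps


# Holding-cost constraint and arrival rate taken from the simulation setup.
delay_s = littles_law_delay(avg_backlog_packets=4, arrival_rate_pps=200)
slot_s = 0.010  # 10 ms time slots
print(f"{delay_s * 1e3:.1f} ms, about {delay_s / slot_s:.0f} time slots")  # 20.0 ms, about 2 time slots
```

If packets are dropped on overflow, the same relation holds for the admitted packets only, so λ should then be the accepted (not offered) arrival rate.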


The Solved Energy-efficient Wireless Communication Problem (2/2)

• System variables

– Buffer occupancy state: $b_n \in \{0, \ldots, B\}$

– Channel state: $h_n$, a finite-state Markov chain (e.g., Rayleigh fading)

– Power management state: $x_n \in \{\text{on}, \text{off}\}$

– Data arrivals: $l_n$, i.i.d.

• Decision variables (actions)

– Packet throughput: $z_n$, with $0 \le z_n \le b_n$

– Bit-error probability: $BEP_n$

– Power management action: $y_n \in \{\text{s\_on}, \text{s\_off}\}$

• Goodput: $f_n$, with $0 \le f_n \le z_n$


Buffer Model

• Buffer state: $b_n \in \mathcal{B} = \{0, 1, \ldots, B\}$, $n = 0, 1, \ldots$

– Buffer recursion:

$$b_0 = b^{\text{init}}, \qquad b_{n+1} = \min\big(b_n - f_n(BEP_n, z_n) + l_n,\; B\big)$$

– Controlled Markov chain with transition probabilities:

$$p^b\big(b' \mid [b, h, x], BEP, y, z\big) = \begin{cases} \sum_{f=0}^{z} p^l\big(b' - [b - f]\big)\, p^f(f \mid BEP, z), & \text{if } b' < B \\ \sum_{f=0}^{z} \sum_{l = B - [b - f]}^{\infty} p^l(l)\, p^f(f \mid BEP, z), & \text{if } b' = B \end{cases}$$
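The two-case transition formula can be checked numerically. This sketch assumes, purely for illustration, Binomial(z, 1 - PLR) goodput (independent packet losses) and Poisson arrivals with mean λΔt = 2 packets per slot; the slides only state that arrivals are i.i.d., and in the learning setting both distributions are unknown to the controller. The overflow case b' = B collects the tail of the arrival distribution.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    return math.exp(-mean) * mean ** k / math.factorial(k)


def binom_pmf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def buffer_transition_row(b: int, z: int, plr: float, arrival_mean: float, B: int) -> list:
    """Return [p^b(b' | b, z) for b' = 0..B].

    Assumed (illustrative) models:
      goodput  f ~ Binomial(z, 1 - plr)   (independent packet losses)
      arrivals l ~ Poisson(arrival_mean)  (i.i.d. across slots)
    """
    if not 0 <= z <= b <= B:
        raise ValueError("need 0 <= z <= b <= B")
    row = [0.0] * (B + 1)
    for f in range(z + 1):
        p_f = binom_pmf(f, z, 1 - plr)
        base = b - f  # backlog after transmission, before arrivals
        # b' < B: exactly b' - base packets arrived.
        for b_next in range(base, B):
            row[b_next] += poisson_pmf(b_next - base, arrival_mean) * p_f
        # b' = B: at least B - base arrivals; the excess overflows.
        tail = 1.0 - sum(poisson_pmf(l, arrival_mean) for l in range(B - base))
        row[B] += tail * p_f
    return row


# lambda * dt = 200 packets/s * 10 ms = 2 packets per slot on average
row = buffer_transition_row(b=5, z=3, plr=0.08, arrival_mean=2.0, B=25)
print(round(sum(row), 12))  # 1.0
```

Because $f \le z$, the next backlog can never fall below $b - z$, so the first entries of the row (here $b' = 0, 1$) are exactly zero.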


Power Management Model

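• Power management state: $x_n \in \mathcal{X} = \{\text{on}, \text{off}\}$, $n = 0, 1, \ldots$

– Controlled Markov chain with transition probabilities [Benini 1999]:

$$\mathbf{P}^x(y) = \big[p^x(x' \mid x, y)\big]_{x, x'}, \qquad \mathbf{P}^x(\text{s\_on}) = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}, \qquad \mathbf{P}^x(\text{s\_off}) = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix}$$

(rows: current state x; columns: next state x'; both ordered on, off)

• Switching the wireless card "on" or "off"

– Incurs a transition power penalty (watts): $P^{\text{tr}}$

– Incurs an expected transition delay: $\Delta t$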


Costs

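Goal: minimize power consumption subject to a buffer (delay) constraint

• Power cost:

$$\rho\big([h, x], BEP, y, z\big) = \begin{cases} P^{\text{on}} + P^{\text{tx}}(h, BEP, z), & \text{if } x = \text{on},\ y = \text{s\_on} \\ P^{\text{off}}, & \text{if } x = \text{off},\ y = \text{s\_off} \\ P^{\text{tr}}, & \text{otherwise} \end{cases} \qquad P^{\text{tr}} \ge P^{\text{on}} > P^{\text{off}} \ge 0$$

• Buffer cost:

$$g\big([b, x], BEP, y, z\big) = \mathbb{E}_{f, l}\Big[\underbrace{b - f}_{\text{holding cost}} + \eta \underbrace{\max(b - f + l - B,\, 0)}_{\text{overflow cost}}\Big]$$

– Holding cost: proportional to the delay (by Little's law)

– Overflow cost: provides an incentive to transmit packets instead of dropping them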
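A minimal sketch of the two cost terms and their Lagrangian combination. P_ON uses the 320 mW setting from the simulation setup and P_TR = P_ON as stated there; the transmit power $P^{\text{tx}}(h, BEP, z)$ is left as a caller-supplied number (p_tx), and the overflow weight eta, the multiplier mu, and the example pmfs are hypothetical.

```python
P_ON = 0.320   # watts (one of the "on" powers in the simulation setup)
P_OFF = 0.0    # watts
P_TR = P_ON    # transition power set equal to P_on, as in the setup


def power_cost(x: str, y: str, p_tx: float) -> float:
    """rho([h, x], BEP, y, z); p_tx stands in for P^tx(h, BEP, z), supplied by the caller."""
    if x == "on" and y == "s_on":
        return P_ON + p_tx
    if x == "off" and y == "s_off":
        return P_OFF
    return P_TR  # switching on or off


def buffer_cost(b: int, f_pmf: dict, l_pmf: dict, B: int, eta: float) -> float:
    """g = E_{f,l}[(b - f) + eta * max(b - f + l - B, 0)] for finite-support pmfs."""
    total = 0.0
    for f, p_f in f_pmf.items():
        for l, p_l in l_pmf.items():
            holding = b - f
            overflow = max(b - f + l - B, 0)
            total += p_f * p_l * (holding + eta * overflow)
    return total


def cost(rho: float, g: float, mu: float) -> float:
    """c(s, a) = rho + mu * g (Lagrangian weight mu on the buffer cost)."""
    return rho + mu * g


g = buffer_cost(b=3, f_pmf={2: 1.0}, l_pmf={0: 0.5, 4: 0.5}, B=4, eta=10.0)
print(g)  # 6.0
print(cost(power_cost("on", "s_on", p_tx=0.1), g, mu=0.05))
```

In the example, one packet remains after transmission; with no arrivals it only incurs holding cost 1, while four arrivals overflow the buffer by one packet and add eta = 10, giving an expected buffer cost of 6.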


Formulation as Markov Decision Process (MDP)

• State: $s \triangleq (b, h, x)$

• Action: $a \triangleq (BEP, y, z)$

• Policy: $\pi: s \mapsto a$

• Cost (power cost plus weighted buffer cost):

$$c(s, a) = \rho\big([h, x], BEP, y, z\big) + \mu\, g\big([b, x], BEP, y, z\big)$$

• Transition probability (buffer state × channel state × power state):

$$p(s' \mid s, a) = p^b\big(b' \mid [b, h, x], BEP, y, z\big)\, p^h(h' \mid h)\, p^x(x' \mid x, y)$$


Value Functions

• State-value function:

$$V^{\pi}(s) = \underbrace{c\big(s, \pi(s)\big)}_{\text{current cost}} + \underbrace{\gamma \sum_{s' \in \mathcal{S}} p\big(s' \mid s, \pi(s)\big) V^{\pi}(s')}_{\text{expected future cost}}$$

• Optimal state-value function:

$$V^{*}(s) = \min_{a \in \mathcal{A}} \underbrace{\Big\{ c(s, a) + \gamma \sum_{s' \in \mathcal{S}} p(s' \mid s, a) V^{*}(s') \Big\}}_{Q^{*}(s, a)}$$

• Optimal policy:

$$\pi^{*}(s) = \arg\min_{a \in \mathcal{A}} Q^{*}(s, a), \quad \forall s \in \mathcal{S}$$

If $c(s, a)$ and $p(s' \mid s, a)$ are known, this is a simple numerical problem…
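When $c(s, a)$ and $p(s' \mid s, a)$ are known, the Bellman equation above can be solved by value iteration. The sketch below (plain Python, on a hypothetical two-state buffer with idle/transmit actions) shows one standard way to do it; it is not necessarily the solver used in the thesis.

```python
def value_iteration(cost, trans, gamma, tol=1e-10, max_iter=100_000):
    """Solve V*(s) = min_a {c(s,a) + gamma * sum_s' p(s'|s,a) V*(s')}.

    cost[s][a] = c(s, a); trans[s][a][s2] = p(s2 | s, a).
    Returns (V, policy, Q) with policy[s] = argmin_a Q[s][a].
    """
    n_s, n_a = len(cost), len(cost[0])
    V = [0.0] * n_s
    for _ in range(max_iter):
        # Bellman backup of every (s, a) pair using the current V.
        Q = [[cost[s][a] + gamma * sum(trans[s][a][s2] * V[s2] for s2 in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        V_new = [min(q_row) for q_row in Q]
        converged = max(abs(v1 - v0) for v1, v0 in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    policy = [min(range(n_a), key=q_row.__getitem__) for q_row in Q]
    return V, policy, Q


# Hypothetical 2-state example: s = 0 (empty buffer) or 1 (one packet waiting);
# a = 0 (idle) or 1 (transmit). A new packet arrives w.p. 0.5 unless the
# buffer is full and nothing was sent.
COST = [[0.0, 1.0],   # empty: idling is free, powering the radio costs 1
        [2.0, 2.0]]   # one packet: holding costs 2, transmitting costs 2
TRANS = [[[0.5, 0.5], [0.5, 0.5]],
         [[0.0, 1.0], [0.5, 0.5]]]
V, policy, _ = value_iteration(COST, TRANS, gamma=0.9)
print([round(v, 6) for v in V], policy)  # [9.0, 11.0] [0, 1]
```

The optimal policy idles when the buffer is empty and transmits when a packet is waiting, since holding the packet forever would cost 2/(1 - 0.9) = 20.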


Conventional Reinforcement Learning Algorithm: Q-Learning

• Initialization at time n = 0: $Q_0(s, a),\ \forall s \in \mathcal{S}, a \in \mathcal{A}$

• Take action (exploration vs. exploitation):

$$a_n = \begin{cases} \arg\min_{a \in \mathcal{A}} Q_n(s_n, a), & \text{with probability } 1 - \epsilon \\ \text{rand}(\mathcal{A}), & \text{with probability } \epsilon \end{cases}$$

• Observe experience: $\sigma_n = (s_n, a_n, c_n, s_{n+1})$, where $\mathbb{E}[c_n] = c(s_n, a_n)$ and $s_{n+1} \sim p(\cdot \mid s_n, a_n)$

• Update the action-value function:

$$Q_{n+1}(s_n, a_n) \leftarrow (1 - \alpha_n)\, Q_n(s_n, a_n) + \alpha_n \Big[ c_n + \gamma \min_{a' \in \mathcal{A}} Q_n(s_{n+1}, a') \Big]$$

• Update the Lagrange multiplier:

$$\mu_{n+1} = \Lambda\big[\mu_n + \beta_n (g_n - \delta)\big]$$

where $\delta$ is the average delay constraint, $\beta_n \in (0, 1)$ is the learning rate, and $\Lambda$ is the projection onto $[0, \mu^{\max}]$

• Set n = n + 1 and take the next action
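The Q-learning loop above can be sketched in tabular form as follows. The ε-greedy rule, the Q update, and the projected multiplier step mirror the slide's equations; the two-state environment, the step size α = (visits)^-0.7, and ε = 0.2 are assumptions chosen for illustration. Because the toy cost is fixed, the multiplier step is shown as a separate function rather than wired into the loop.

```python
import random


def epsilon_greedy(Q, s, epsilon, rng):
    """Explore with probability epsilon, otherwise pick argmin_a Q[s][a]."""
    n_a = len(Q[s])
    if rng.random() < epsilon:
        return rng.randrange(n_a)
    return min(range(n_a), key=Q[s].__getitem__)


def q_update(Q, s, a, c, s_next, alpha, gamma):
    """Q(s,a) <- (1 - alpha) Q(s,a) + alpha [c + gamma min_a' Q(s',a')]."""
    target = c + gamma * min(Q[s_next])
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * target


def multiplier_update(mu, g, delta, beta, mu_max):
    """mu <- projection onto [0, mu_max] of mu + beta (g - delta)."""
    return min(max(mu + beta * (g - delta), 0.0), mu_max)


# Hypothetical 2-state environment (0 = empty buffer, 1 = one packet;
# action 0 = idle, 1 = transmit). The learner only sees sampled costs and
# next states, never the transition probabilities.
COST = [[0.0, 1.0], [2.0, 2.0]]


def env_step(s, a, rng):
    if s == 1 and a == 0:
        return 1  # packet stays queued, buffer full
    return 1 if rng.random() < 0.5 else 0  # a new packet arrives w.p. 0.5


rng = random.Random(0)
Q = [[0.0, 0.0], [0.0, 0.0]]
visits = [[0, 0], [0, 0]]
s = 0
for _ in range(100_000):
    a = epsilon_greedy(Q, s, epsilon=0.2, rng=rng)
    s_next = env_step(s, a, rng)
    visits[s][a] += 1
    q_update(Q, s, a, COST[s][a], s_next, alpha=visits[s][a] ** -0.7, gamma=0.9)
    s = s_next

greedy = [min(range(2), key=row.__getitem__) for row in Q]
print(greedy)
```

After training, the greedy policy should idle in the empty state and transmit when a packet is waiting. Note that every action has to be tried repeatedly (the ε-random branch) before its Q value becomes reliable.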


Post-Decision State Definition

Definition: An intermediate state after the known dynamics take place, but before the unknown dynamics take place.

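• State (time n): $s_n = (b_n, h_n, x_n)$

• Action (time n): $a_n = (BEP_n, y_n, z_n)$

• Post-decision state (time n), reached through the known dynamics:

$$\tilde{s}_n = (\tilde{b}_n, \tilde{h}_n, \tilde{x}_n) = \big(b_n - f_n,\; h_n,\; x_{n+1}\big)$$

• State (time n+1), reached through the unknown dynamics $p^l(l)$ and $p^h(h' \mid h)$:

$$s_{n+1} = (b_{n+1}, h_{n+1}, x_{n+1}) = \big(\min(b_n - f_n + l_n, B),\; h_{n+1},\; x_{n+1}\big)$$

Classification of the dynamics and costs:

• Known, deterministic: power management state transition; power cost

• Known, stochastic: goodput distribution; holding cost

• Unknown, deterministic: N/A

• Unknown, stochastic: traffic arrival distribution; channel state distribution; overflow cost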


Post-Decision State Generalization

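• Transition probability function (known part $s \to \tilde{s}$, unknown part $\tilde{s} \to s'$):

$$p(s' \mid s, a) = \sum_{\tilde{s}} p_u(s' \mid \tilde{s}, a)\, p_k(\tilde{s} \mid s, a)$$

• Cost function (known cost plus expected unknown cost):

$$c(s, a) = c_k(s, a) + \sum_{\tilde{s}} p_k(\tilde{s} \mid s, a)\, c_u(\tilde{s}, a)$$

In a large class of wireless systems, the unknown components do not depend on the action:

$$p_u(s' \mid \tilde{s}, a) = p_u(s' \mid \tilde{s}), \qquad c_u(\tilde{s}, a) = c_u(\tilde{s})$$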


Post-Decision State Value Function

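• State (time n): $s_n = (b_n, h_n, x_n)$, with action $a_n = (BEP_n, y_n, z_n)$

• Known dynamics and cost ($s_n \to \tilde{s}_n$):

$$p_k(\tilde{s} \mid s, a) = p^x(\tilde{x} \mid x, y)\, p(\tilde{b} \mid b, BEP, z)\, I(\tilde{h} = h), \qquad c_k(s, a) = \rho\big([h, x], BEP, y, z\big) + \mu \sum_{f} p(f \mid BEP, z)\,(b - f)$$

• Post-decision state (time n): $\tilde{s}_n = (\tilde{b}_n, \tilde{h}_n, \tilde{x}_n)$

• Unknown dynamics and cost ($\tilde{s}_n \to s_{n+1}$): $p_u(s' \mid \tilde{s})$ and $c_u(\tilde{s})$

• (a) Known part:

$$V^{*}(s) = \min_{a \in \mathcal{A}} \Big\{ c_k(s, a) + \sum_{\tilde{s}} p_k(\tilde{s} \mid s, a)\, \tilde{V}^{*}(\tilde{s}) \Big\}$$

• (b) Unknown part:

$$\tilde{V}^{*}(\tilde{s}) = c_u(\tilde{s}) + \gamma \sum_{s'} p_u(s' \mid \tilde{s})\, V^{*}(s')$$

Equation (a) involves only known quantities, but equation (b) depends on the unknown dynamics, so the PDS value function $\tilde{V}^{*}$ must be learned.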


Post-Decision State Learning

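• Initialization: $\tilde{V}_0(\tilde{s}),\ \forall \tilde{s} \in \tilde{\mathcal{S}}$

• Take action greedily, integrating the known information (no exploration!):

$$a_n = \arg\min_{a \in \mathcal{A}} \Big\{ c_k(s_n, a) + \sum_{\tilde{s}} p_k(\tilde{s} \mid s_n, a)\, \tilde{V}_n(\tilde{s}) \Big\}$$

• Observe the PDS experience $\tilde{\sigma}_n = (s_n, a_n, \tilde{s}_n, c_u^n, s_{n+1})$, where $\mathbb{E}[c_u^n] = c_u(\tilde{s}_n)$ and $s_{n+1} \sim p_u(\cdot \mid \tilde{s}_n)$

• Evaluate the next state using the known information:

$$V_n(s_{n+1}) = \min_{a \in \mathcal{A}} \Big\{ c_k(s_{n+1}, a) + \sum_{\tilde{s}} p_k(\tilde{s} \mid s_{n+1}, a)\, \tilde{V}_n(\tilde{s}) \Big\}$$

• Update the PDS value function:

$$\tilde{V}_{n+1}(\tilde{s}_n) \leftarrow (1 - \alpha_n)\, \tilde{V}_n(\tilde{s}_n) + \alpha_n \big[ c_u^n + \gamma V_n(s_{n+1}) \big]$$

• Update the Lagrange multiplier: $\mu_{n+1} = \Lambda\big[\mu_n + \beta_n (g_n - \delta)\big]$, where $\delta$ is the average delay constraint, $\beta_n \in (0, 1)$ is the learning rate, and $\Lambda$ is the projection onto $[0, \mu^{\max}]$

Problem: only one PDS, $\tilde{s}_n$, is updated in each time slot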
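A tabular sketch of the PDS learning loop, under stated assumptions: the known model ($c_k$, $p_k$) is given as tables, the learner never sees the arrival process, and action selection is purely greedy, as on the slide. The one-packet toy model, the zero unknown cost $c_u$, and the step size are hypothetical.

```python
import random


def q_known(s, a, c_k, p_k, V_pds):
    """c_k(s,a) + sum_{s~} p_k(s~|s,a) V~(s~): uses only the known part of the model."""
    return c_k[s][a] + sum(p * V_pds[pds] for pds, p in p_k[s][a].items())


def greedy_action(s, c_k, p_k, V_pds):
    """a_n = argmin_a q_known(s, a); no exploration step is needed."""
    return min(range(len(c_k[s])), key=lambda a: q_known(s, a, c_k, p_k, V_pds))


def state_value(s, c_k, p_k, V_pds):
    """V(s) = min_a q_known(s, a)."""
    return min(q_known(s, a, c_k, p_k, V_pds) for a in range(len(c_k[s])))


def pds_update(V_pds, pds, c_u, s_next, c_k, p_k, alpha, gamma):
    """V~(s~_n) <- (1 - alpha) V~(s~_n) + alpha [c_u + gamma V(s_{n+1})]."""
    target = c_u + gamma * state_value(s_next, c_k, p_k, V_pds)
    V_pds[pds] = (1 - alpha) * V_pds[pds] + alpha * target


# Hypothetical toy model of a 1-packet buffer: state = backlog before
# transmission, PDS = backlog after transmission (before arrivals).
C_K = [[0.0, 1.0], [2.0, 2.0]]                       # known cost c_k(s, a)
P_K = [[{0: 1.0}, {0: 1.0}], [{1: 1.0}, {0: 1.0}]]    # known p_k(s~ | s, a)


def unknown_step(pds, rng):
    """Unknown to the learner: an arrival w.p. 0.5; a full buffer stays full."""
    if pds == 1:
        return 1
    return 1 if rng.random() < 0.5 else 0


rng = random.Random(1)
V_pds = [0.0, 0.0]
visits = [0, 0]
s = 0
for _ in range(50_000):
    a = greedy_action(s, C_K, P_K, V_pds)
    pds = rng.choices(list(P_K[s][a]), weights=list(P_K[s][a].values()))[0]
    s_next = unknown_step(pds, rng)
    visits[pds] += 1
    pds_update(V_pds, pds, 0.0, s_next, C_K, P_K, alpha=visits[pds] ** -0.7, gamma=0.9)
    s = s_next

print([round(state_value(st, C_K, P_K, V_pds), 2) for st in (0, 1)])
```

For this toy model the Bellman fixed point is V(0) = 9 and V(1) = 11, and the learned state values should land close to it even though every action is chosen greedily.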


Virtual Experience Learning

• Problem: PDS learning only updates one PDS in each time slot

• Observation: the unknown dynamics $p^l(l)$ and $p^h(h' \mid h)$ are independent of the buffer and power management states

– Learn about all buffer and power management states in each time slot!

– Improve adaptation speed at the expense of increased complexity

• Actual PDS experience tuple: $\tilde{\sigma}_n = (s_n, a_n, \tilde{s}_n, c_u^n, s_{n+1})$

• Set of virtual experience (VE) tuples, one per $(\tilde{b}, \tilde{x})$:

$$\Sigma_n = \Big\{ \big(s_n, a_n, \underbrace{(\tilde{b}, h_n, \tilde{x})}_{\text{current VE state}}, c_u(\tilde{b}; l_n), \underbrace{(\min(\tilde{b} + l_n, B), h_{n+1}, \tilde{x})}_{\text{next VE state}}\big) \;\Big|\; \forall (\tilde{b}, \tilde{x}) \in \mathcal{B} \times \mathcal{X} \Big\}$$

• VE cost: $c_u(\tilde{b}; l) = \eta \max(\tilde{b} + l - B, 0)$
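One virtual-experience sweep can be sketched as below. It reuses the observed arrivals $l_n$ and channel transition $h_n \to h_{n+1}$ for every $(\tilde{b}, \tilde{x}) \in \mathcal{B} \times \mathcal{X}$, with the VE cost $\eta \max(\tilde{b} + l - B, 0)$. The `state_value` callable (which in the full algorithm would evaluate the minimum over actions with the known model), the channel labels, and the numbers are hypothetical. Computing all targets before writing any of them is a design choice of this sketch, so that the result does not depend on loop order.

```python
def virtual_experience_update(V_pds, l_n, h_n, h_next, B, X, eta, gamma, alpha, state_value):
    """One virtual-experience sweep over every PDS (b~, h_n, x~) with (b~, x~) in B x X.

    The observed arrivals l_n and channel transition h_n -> h_next are reused for
    every buffer and power-management value, because the unknown dynamics do not
    depend on them. `state_value(s)` is a caller-supplied (hypothetical) function
    returning V(s) from the known model and the current V_pds.
    """
    targets = {}
    for b_pds in range(B + 1):
        for x_pds in X:
            c_u = eta * max(b_pds + l_n - B, 0)               # VE overflow cost
            s_next = (min(b_pds + l_n, B), h_next, x_pds)     # next VE state
            targets[(b_pds, h_n, x_pds)] = c_u + gamma * state_value(s_next)
    # Write after computing all targets so the sweep does not depend on loop order.
    for pds, target in targets.items():
        V_pds[pds] = (1 - alpha) * V_pds.get(pds, 0.0) + alpha * target
    return len(targets)


V_pds = {}
n_updated = virtual_experience_update(
    V_pds, l_n=2, h_n="h3", h_next="h4", B=3, X=("on", "off"),
    eta=5.0, gamma=0.98, alpha=1.0, state_value=lambda s: 0.0)  # dummy V(s) for illustration
print(n_updated, V_pds[(3, "h3", "on")])  # 8 10.0
```

With B = 3 and two power management states, a single observation updates all |B × X| = 8 post-decision states instead of one.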


Comparison of Learning Algorithms

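Algorithm | Action selection complexity | Learning update complexity

Q-learning | $O(|\mathcal{A}|)$ | $O(|\mathcal{A}|)$

PDS learning | $O(|\tilde{\mathcal{S}}||\mathcal{A}|)$ | $O(|\tilde{\mathcal{S}}||\mathcal{A}|)$

Virtual experience learning | $O(|\tilde{\mathcal{S}}||\mathcal{A}|)$ | $O(|\Sigma||\tilde{\mathcal{S}}||\mathcal{A}|)$

where $|\tilde{\mathcal{S}}| = |\mathcal{B}|$ and $|\Sigma| = |\mathcal{B} \times \mathcal{X}|$ (the number of virtual experience tuples per time slot)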
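To get a feel for these orders of magnitude, the snippet plugs in the simulation setup (b ∈ {0, …, 25}, x ∈ {on, off}). It assumes, for illustration only, that the action set is the Cartesian product of the 5 PLR targets, 2 power management actions, and 11 throughput values, and it ignores constant factors.

```python
# Per-slot operation counts implied by the complexity table, using the
# simulation setup: b in {0..25}, x in {on, off}, and an action set assumed
# to be the Cartesian product of 5 PLR targets, 2 PM actions, and 11 throughputs.
B = 25
n_buffer = B + 1
n_pm_states = 2
n_actions = 5 * 2 * 11

n_pds = n_buffer                       # |S~| = |B|
n_virtual = n_buffer * n_pm_states     # |Sigma| = |B x X|

costs = {
    "Q-learning": (n_actions, n_actions),
    "PDS learning": (n_pds * n_actions, n_pds * n_actions),
    "Virtual experience": (n_pds * n_actions, n_virtual * n_pds * n_actions),
}
for name, (select_ops, update_ops) in costs.items():
    print(f"{name:20s} select ~{select_ops:>7,d}  update ~{update_ops:>9,d}")
```

Running the virtual-experience sweep only every T slots, which is how we read the "update period" in the later plots, would divide its average per-slot update cost by T.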


Simulation Setup

• PHY layer: QAM square constellations + Gray code

• Unknown channel transition and packet arrival distributions

Simulation parameters:

Parameter | Value

Arrival rate $\lambda$ | 200 packets/second

Buffer size $B$ | 25 packets

Channel states $h \in \mathcal{H}$ | {-18.82, -13.79, -11.23, -9.37, -7.80, -6.30, -4.68, -2.08} dB

Holding cost constraint | 4 packets

"Off" power $P^{\text{off}}$ | 0 watts

"On" power $P^{\text{on}}$ | 80 mW, 160 mW, or 320 mW

Transition power $P^{\text{tr}}$ | Set equal to $P^{\text{on}}$

Packet loss rates (PLR) | {1, 2, 4, 8, 16} %

Power management actions $y \in \mathcal{Y}$ | {s_on, s_off}

Power management states $x \in \mathcal{X}$ | {on, off}

Time slot duration $\Delta t$ | 10 ms

Transmission actions* $z \in \mathcal{Z}$ | {0, 1, 2, …, 10} packets/time slot

Discount factor $\gamma$ | 0.98

Noise power spectral density $N_0$ | $2 \times 10^{-11}$ watts/Hz

*Symbol rate $1/T_s = 500 \times 10^3$ symbols/s; packet size 5000 bits; bits per symbol $\beta \in \{1, 2, \ldots, 10\}$
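A sanity check on the starred parameters, reading the symbol rate as $1/T_s = 500 \times 10^3$ symbols/s: one 10 ms slot then holds 5000 symbols, so β bits/symbol carries exactly β packets of 5000 bits, which is consistent with the throughput range z ∈ {0, …, 10}.

```python
SYMBOL_RATE = 500_000   # 1/T_s, symbols per second
SLOT_MS = 10            # time slot duration, milliseconds
PACKET_BITS = 5000


def packets_per_slot(bits_per_symbol: int) -> float:
    """Packets that fit in one slot at a given number of bits per symbol."""
    symbols_per_slot = SYMBOL_RATE * SLOT_MS / 1000
    return symbols_per_slot * bits_per_symbol / PACKET_BITS


print([packets_per_slot(beta) for beta in (1, 2, 10)])   # [1.0, 2.0, 10.0]
print(200 * SLOT_MS / 1000, "packets arrive per slot on average")
```

At λ = 200 packets/s, the average load is 2 packets per slot, so the highest-order constellation (β = 10) offers five times the mean arrival rate.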


Learning Algorithm Performance Comparison (1/2)

[Figure: holding cost (packets), power (mW), and $\theta_{\text{off}}$ versus time slot n (0 to $6 \times 10^4$) for PDS learning, PDS learning (no DPM), Q-learning [Borkar, 2008], PDS + virtual experience (update period = 1), and PDS + virtual experience (update period = 10)]


Learning Algorithm Performance Comparison (2/2)

[Figure: holding cost (packets), power (mW), and $\theta_{\text{off}}$ versus time slot n (0 to $6 \times 10^4$) for PDS learning and for PDS + virtual experience with update periods 1, 10, 25, and 125]


Comparison to State-of-the-Art

• Threshold-k [Nahrstedt, ’07]

– If backlog exceeds k, then turn on wireless card and transmit all packets

– After transmitting, turn card off

– Ignore channel conditions

• Non-stationary dynamics

– Markov modulated arrival process using unobservable 5-state Markov chain

– Time-varying channel transition probabilities

[Figure: power (mW) versus holding cost (packets, 1 to 8) for the proposed approach and Threshold-k]

The proposed approach uses 11%-33% less power than Threshold-k for the same holding cost

*Update period for the proposed approach: T = 50 time slots


Summary: Fast learning for energy-efficient wireless communication

• Proposed the first unified power management framework for delay-sensitive wireless communication

– Integrates system-level and physical-layer-centric power management

• Exploited the structure of the problem to improve learning performance

– Post-decision state

• Separation of known and unknown dynamics

• Eliminates the need for exploration

– Virtual experience learning

• Exploits the independence of the unknown dynamics from the buffer and power management components of the state


Overview

• Part I: Fast reinforcement learning for energy-efficient wireless communication [Mastronarde, 2011b]

– Post-decision state learning

– Virtual experience learning

• Part II: A distributed cross-layer approach to cooperative video transmission [Mastronarde, 2011a]

– Multi-user Markov decision process formulation

– Mitigating the curse of dimensionality


Multi-user Wireless Video Network With Cooperation

• Direct mode: transmit at data rate $\beta^i_{t,0}$ (bits/s/Hz) for $x^i_t R$ (s)

• Cooperative mode:

– Phase I: transmit at data rate $\beta^i_{t,1}$ (bits/s/Hz) for $\rho^i_t x^i_t R$ (s)

– Phase II: transmit at data rate $\beta^i_{t,2}$ (bits/s/Hz) for $(1 - \rho^i_t) x^i_t R$ (s); cooperative Phase II uses a randomized space-time block coding rule

– Cooperative data rate: $\beta^i_{t,\text{coop}} = \rho^i_t \beta^i_{t,1} + (1 - \rho^i_t) \beta^i_{t,2}$ (bits/s/Hz)

where $R$ is the time slot duration (seconds), $x^i_t \in [0, 1]$ is the transmission time fraction, and $\rho^i_t \in [0, 1]$ is the Phase I time fraction

[Figure: example network with nodes labeled 10th, 12th, 13th, 20th, and 30th, showing direct transmission and cooperative Phases I and II for user 1, with pairwise channel states such as $h^{1,2}_t$, $h^{2,3}_t$, and $h^{20,30}_t$]


Prior Work

• Throughput-maximizing (opportunistic) multiple access policies [Knopp 1995], [Viswanath 2002], [Tse 2005]

– Schedule nodes with good fades

– Ignore delay deadlines, priorities, and dependencies

• Cross-layer solutions [Katsaggelos 2007, 2008], [Su 2007], [van der Schaar 2010], [Melodia 2010]

– Balance between scheduling nodes with good channels ("easy" nodes) and nodes with the most important data

– Underlying inefficiency in network resource usage

• Users with high-priority data but worse fades get access to the channel

Cooperation reduces inefficiency!

Enables users with feeble direct signals, but high priority data, to exploit channel diversity


A Sophisticated Traffic Model for Video

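• Traffic state: $\mathcal{T}^i_t = (\mathcal{F}^i_t, \mathbf{b}^i_t)$

– Schedulable frame set: $\mathcal{F}^i_t = \{ j \mid d^i_j \in \{t, t+1, \ldots, t+W\} \}$, i.e., the frames whose deadlines $d^i_j$ fall within the current scheduling window of W time slots

– Buffer state: $\mathbf{b}^i_t = (b^i_{t,j} \mid j \in \mathcal{F}^i_t)$

[Figure: simple IBPB IBPB... GOP structure and an illustrative traffic state]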


Traffic State Transition Illustration

Traffic state and scheduling action over three consecutive time slots:

• Time t: $\mathcal{F}_t = (1, 2, 3)$, $\mathbf{b}_t = (4, 3, 2)$, $\mathbf{y}_t = (4, 0, 0)$

• Time t+1: $\mathcal{F}_{t+1} = (2, 3, 1, 4)$, $\mathbf{b}_{t+1} = (3, 2, 6, 1)$, $\mathbf{y}_{t+1} = (0, 0, 2, 0)$

• Time t+2: $\mathcal{F}_{t+2} = (1, 4, 2, 3)$, $\mathbf{b}_{t+2} = (4, 1, 4, 1)$, $\mathbf{y}_{t+2} = (4, 0, 1, 0)$


Multi-User Markov Decision Process Formulation

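• States

– Channel state (i.i.d.): $\mathbf{H}_t = \{ h^{ij}_t \}$, for $i \neq j \in \{0, 1, 2, \ldots, M\}$

– Traffic state: $\mathcal{T}^i_t = (\mathcal{F}^i_t, \mathbf{b}^i_t)$

• Actions

– Scheduling action: $\mathbf{y}^i_t = (y^i_{t,j} \mid j \in \mathcal{F}^i_t)$

– Cooperation decision: $z^i_t \in \{0 \text{ (direct)}, 1 \text{ (cooperative)}\}$

• Utility and transition probability

$$u^i(s^i_t, \mathbf{y}^i_t) = \sum_{j \in \mathcal{F}^i_t} q^i_j\, y^i_{t,j}, \qquad p(\mathbf{s}_{t+1} \mid \mathbf{s}_t, \mathbf{y}_t) = p(\mathbf{H}_{t+1}) \prod_{i=1}^{M} p(\mathcal{T}^i_{t+1} \mid \mathcal{T}^i_t, \mathbf{y}^i_t)$$

where $q^i_j$ is the distortion reduction for packets belonging to frame j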


Feasible Scheduling Actions

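• Constraint set: $\mathbf{y}^i_t \in \mathcal{P}(\mathcal{T}^i_t, \mathbf{H}_t, z^i_t)$

– Buffer constraint: $0 \le y^i_{t,j} \le b^i_{t,j}$

– Packet constraint (at most the packets that fit in one time slot): $\|\mathbf{y}^i_t\|_1 \le \dfrac{\beta^i_t(z^i_t)\, R}{P\, T_s}$

– Dependency constraint (frame j depends on frame k): if $k \prec j$ and $b^i_{t,k} - y^i_{t,k} > 0$, then $y^i_{t,j} = 0$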
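The three constraints can be checked mechanically. This sketch represents y and b as dicts keyed by frame index; the dependency list (frames 2 and 3 depend on frame 1), the utilities $q_j$, β = 6, and the 10 ms slot duration are hypothetical, while the packet size (8000 bits) and symbol rate (625000 symbols/s) come from the simulation table.

```python
def packet_budget(beta: float, slot_s: float, symbol_rate: float, packet_bits: int) -> float:
    """beta * R / (P * T_s): packets that fit in one slot (R = slot_s, 1/T_s = symbol_rate)."""
    return beta * slot_s * symbol_rate / packet_bits


def is_feasible(y: dict, b: dict, budget: float, deps: list) -> bool:
    """Check y (frame -> packets scheduled) against the three constraints.

    b:    frame -> packets buffered (the schedulable frame set is b's keys)
    deps: (k, j) pairs meaning k precedes j (j cannot be sent while k is unfinished)
    """
    if not set(y) <= set(b):
        return False
    # Buffer constraint: 0 <= y_j <= b_j
    if any(not 0 <= y.get(j, 0) <= b[j] for j in b):
        return False
    # Packet constraint: ||y||_1 <= budget
    if sum(y.values()) > budget:
        return False
    # Dependency constraint: if k precedes j and k still has packets left, y_j = 0
    for k, j in deps:
        if k in b and b[k] - y.get(k, 0) > 0 and y.get(j, 0) > 0:
            return False
    return True


def utility(y: dict, q: dict) -> float:
    """u = sum_j q_j * y_j (distortion reduction per packet times packets sent)."""
    return sum(q[j] * n for j, n in y.items())


b = {1: 4, 2: 3, 3: 2}              # frames 1, 2, 3 as in the traffic illustration
deps = [(1, 2), (1, 3)]             # hypothetical: frames 2 and 3 depend on frame 1
budget = packet_budget(beta=6, slot_s=0.01, symbol_rate=625_000, packet_bits=8000)
print(round(budget, 4))                                   # 4.6875
print(is_feasible({1: 4}, b, budget, deps))               # True
print(is_feasible({1: 2, 2: 1}, b, budget, deps))         # False (frame 1 unfinished)
```

The action $\mathbf{y}_t = (4, 0, 0)$ from the traffic illustration passes all three checks under these assumptions, whereas adding one packet of frame 2 would exceed the packet budget.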


Optimization Objective

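• Decision variables:

– Scheduling action: $\mathbf{y}_t = (\mathbf{y}^1_t, \mathbf{y}^2_t, \ldots, \mathbf{y}^M_t)^T$

– Cooperation decision: $\mathbf{z}_t = (z^1_t, z^2_t, \ldots, z^M_t)^T$

• Dynamic programming equation:

$$U^{*}(\mathbf{s}) = \max_{\mathbf{y}, \mathbf{z}} \Big\{ \sum_{i=1}^{M} u^i(s^i, \mathbf{y}^i) + \alpha \sum_{\mathbf{s}' \in \mathcal{S}} p(\mathbf{H}') \prod_{i=1}^{M} p(\mathcal{T}^{i\prime} \mid \mathcal{T}^i, \mathbf{y}^i)\, U^{*}(\mathbf{s}') \Big\}, \quad \forall \mathbf{s} \in \mathcal{S}$$

• Subject to $\mathbf{y}^i \in \mathcal{P}(\mathcal{T}^i, \mathbf{H}, z^i)$ and $\sum_{i=1}^{M} x^i \le 1$, where $x^i = \dfrac{P\, T_s\, \|\mathbf{y}^i\|_1}{\beta^i(z^i)\, R}$

• Challenges:

– Complexity is quadratic in $|\mathcal{S}|$, which scales exponentially in $M$ and $M^2$

– Traffic state information is local to users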


Mitigating the Curse of Dimensionality

• Problem 1: Complexity scales exponentially in $M^2$ (through the channel state $\mathbf{H}_t$)

– Theorem: The cooperation decision that maximizes immediate throughput is long-term optimal [Mastronarde, 2011a]

– Implications of the theorem:

• Instead of tracking $\mathbf{H}_t$, track the maximum transmission rates $\beta^{i*} = \max_{z^i} \{ \beta^i(\mathbf{H}, z^i) \}$

• Use an opportunistic cooperation scheme for the cooperation decision, so that the packet constraint and the transmission time fraction become

$$\|\mathbf{y}^i_t\|_1 \le \frac{\beta^{i*} R}{P\, T_s} \quad \Rightarrow \quad x^i_t = \frac{P\, T_s\, \|\mathbf{y}^i_t\|_1}{\beta^{i*} R}$$

• Problem 2: Complexity scales exponentially in $M$

– Solution [Fu, van der Schaar, 2010]: Lagrangian relaxation with a resource price $\lambda$

• The resulting MU-MDP can be decomposed into one local MDP per user

• The optimal resource price can be determined using the subgradient method
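A simplified sketch of price-based decomposition with a subgradient update. Assumptions: each user's local problem is a one-slot choice of how many packets to send (standing in for the per-user local MDP), the marginal utilities are non-increasing and hypothetical, and τ is the channel-time fraction per packet, $P T_s / (\beta^{i*} R)$. The price rises while total demand $\sum_i x^i$ exceeds 1 and falls (down to 0) otherwise.

```python
def best_response(marginals, tau, price):
    """Local (one-slot) problem of a user: choose k to maximize
    sum(marginals[:k]) - price * tau * k, with non-increasing marginals."""
    k = 0
    for m in marginals:
        if m <= price * tau:
            break  # later packets are worth even less, so stop here
        k += 1
    return k


def subgradient_price(users, steps=200, step0=20.0):
    """users: list of (marginal utilities, tau) with tau = channel-time fraction per packet.
    Projected subgradient on the dual: price <- max(0, price + step_n * (sum_i x_i - 1))."""
    price, best = 0.0, None
    for n in range(1, steps + 1):
        alloc = [best_response(m, tau, price) for m, tau in users]
        load = sum(k * tau for k, (_, tau) in zip(alloc, users))   # sum_i x^i
        util = sum(sum(m[:k]) for k, (m, _) in zip(alloc, users))
        if load <= 1.0 and (best is None or util > best[0]):
            best = (util, price, alloc)   # best feasible allocation seen so far
        price = max(0.0, price + (step0 / n) * (load - 1.0))
    return price, best


users = [([10.0, 6.0, 2.0], 0.25), ([8.0, 3.0], 0.5)]   # hypothetical users
price, best = subgradient_price(users)
print(price, best)  # 15.0 (24.0, 15.0, [2, 1])
```

In this example, one step raises the price to 15, at which point the users request 2 and 1 packets, exactly filling the slot ($\sum_i x^i = 1$) with total utility 24, which is also the best feasible allocation for these numbers. Each user only needs the price and its own local information, which matches the point that traffic state information is local to users.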


Simulation Setup

• Scenarios:

– Homogeneous

• Foreman (CIF, 30 Hz, 1.5 Mb/s)

– Heterogeneous 1

• Coastguard (CIF, 30 Hz, 1.5 Mb/s)

• Mobile (CIF, 30 Hz, 2.0 Mb/s)

• Foreman (CIF, 30 Hz, 1.5 Mb/s)

– Heterogeneous 2

• Coastguard (CIF, 30 Hz, 1.5 Mb/s)

• Foreman (CIF, 30 Hz, 1.5 Mb/s)

• Mobile (CIF, 30 Hz, 2.0 Mb/s)

Parameter | Description | Value

$L$ | Length of the STBC | 2

$R_c$ | Rate of the orthogonal STBC rule | 1

$\xi$ | Self-selection parameter | 0.20

$P$ | Packet size | 8000 bits

$BEP$ | Bit-error probability target | $10^{-3}$

$\delta$ | Path loss exponent | 3

$R_{\text{cell}}$ | WLAN coverage radius (5 dB SNR at the boundary) | 100 m

$M$ | Number of nodes (excluding the AP) | 50

$\alpha$ | Discount factor | 0.80

$1/T_s$ | Symbol rate (symbols per second) | 625000 or 1250000

Relay self-selection: node c self-selects as a cooperative relay for source i when a rate condition involving $\beta$, $R_c$, and the self-selection parameter $\xi$ holds


Network Topology

[Figure: network topology (three panels, x and y axes from -100 m to 100 m) showing the AP, the video sources, and the potential relays]

Source | Distance to AP | Angle

1 | 20 m | 25º

2 | 45 m | -30º

3 | 80 m | 0º


Transmission Rates

• A: Feeble direct

• B: Strong direct

• C: Cooperative gains

• D: Homogeneous allocation

• E: Heterogeneous allocation

[Figure: average transmission rate (Kbps, 0 to 1800) of video sources 1, 2, and 3 in the Homogeneous (Foreman), Heterogeneous 1 (Coastguard, Mobile, Foreman), and Heterogeneous 2 (Coastguard, Foreman, Mobile) scenarios, comparing cooperative and direct transmission under low and high congestion]


Video Quality Comparison

• A: Feeble direct � video undecodable at receiver

• B: Cooperation achieves 5-10 dB PSNR improvement for nodes with feeble direct signals

• C: Cooperation minimally impacts nodes with strong direct signals

PSNR, low / high congestion ("---" = video undecodable at the receiver)

Streaming scenario   Transmission mode   Video user 1 @ 20 m   Video user 2 @ 45 m   Video user 3 @ 80 m
Homogeneous          (sequence)          Foreman               Foreman               Foreman
                     Direct              36.82 / 36.51 dB      35.85 / 30.20 dB      29.89 / --- dB
                     Cooperative         36.69 / 35.82 dB      36.58 / 34.83 dB      36.04 / 27.12 dB
                     Change              -0.13 / -0.69 dB      0.73 / 4.63 dB        6.15 / --- dB
Heterogeneous 1      (sequence)          Coastguard            Mobile                Foreman
                     Direct              32.30 / 31.09 dB      26.74 / 24.53 dB      25.94 / --- dB
                     Cooperative         31.94 / 30.89 dB      27.14 / 25.80 dB      35.69 / 27.12 dB
                     Change              -0.36 / -0.20 dB      0.40 / 1.27 dB        9.75 / --- dB
Heterogeneous 2      (sequence)          Coastguard            Foreman               Mobile
                     Direct              31.91 / 31.72 dB      35.16 / 32.75 dB      21.85 / --- dB
                     Cooperative         31.56 / 30.97 dB      35.72 / 32.39 dB      26.53 / 22.03 dB
                     Change              -0.35 / -0.75 dB      0.56 / -0.36 dB       4.68 / --- dB
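Each Change row is the cooperative PSNR minus the direct PSNR. As a quick check, the sketch below recomputes the Change row for the Heterogeneous 2 scenario; representing an undecodable video ("---") as None is an assumption of this sketch, not something the slide defines.

```python
from typing import Optional

# PSNR in dB as (low congestion, high congestion); None marks an undecodable video ("---")
HETEROGENEOUS_2 = {
    "Coastguard @ 20 m": {"direct": (31.91, 31.72), "coop": (31.56, 30.97)},
    "Foreman @ 45 m": {"direct": (35.16, 32.75), "coop": (35.72, 32.39)},
    "Mobile @ 80 m": {"direct": (21.85, None), "coop": (26.53, 22.03)},
}

def psnr_change(direct: Optional[float], coop: Optional[float]) -> Optional[float]:
    """Cooperative minus direct PSNR (dB); undefined if either video is undecodable."""
    if direct is None or coop is None:
        return None
    return round(coop - direct, 2)

for user, modes in HETEROGENEOUS_2.items():
    changes = [psnr_change(d, c) for d, c in zip(modes["direct"], modes["coop"])]
    print(user, changes)
```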


Video Quality Example

• Video user 3 @ 80 m

• Low congestion

[Figure: original frame compared with the frame received via direct transmission (26.9 dB PSNR) and via cooperative transmission (34.7 dB PSNR)]
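PSNR, the quality metric used throughout these results, compares a decoded frame with the original through the mean squared error (MSE). The sketch below assumes 8-bit samples (peak value 255), a common convention for video that the slides do not state explicitly; the function name is hypothetical.

```python
import math

def psnr(original: list[int], decoded: list[int], peak: int = 255) -> float:
    """PSNR in dB between two equally sized sample sequences (e.g., flattened 8-bit luma frames)."""
    if not original or len(original) != len(decoded):
        raise ValueError("frames must be non-empty and equally sized")
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)

print(round(psnr([52, 55, 61, 66], [54, 55, 60, 70]), 2))
```

Because PSNR is logarithmic in the MSE, the 7.8 dB gap in this example (26.9 dB vs. 34.7 dB) corresponds to roughly a sixfold reduction in MSE, since 10^(7.8/10) ≈ 6.0.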


Optimal Resource Price

Resource price, low / high congestion

Streaming scenario   Transmission mode   Resource price
Homogeneous          Direct              45.79 / 42.97
                     Cooperative         38.72 / 52.56
                     Change              -6.93 / 9.59
Heterogeneous 1      Direct              51.01 / 53.17
                     Cooperative         48.02 / 71.94
                     Change              -2.99 / 18.77
Heterogeneous 2      Direct              68.24 / 41.48
                     Cooperative         62.61 / 72.86
                     Change              -5.63 / 31.38


Summary: Multi-user cooperative video transmission

• Multi-user MDP-based approach

– Enables high-priority nodes to exploit the diversity of channel fading states in the network

– Improves the video quality of nodes with feeble direct signals (distant nodes) by 5-10 dB PSNR

• Reduces the quality of nodes with strong direct signals by < 1 dB

– Resource price for managing congestion

• Increases in congested networks

• Decreases in uncongested networks

• Mitigate complexity

– Opportunistic cooperation is long-term optimal

– Decompose the problem into local MDPs for each user


Impact in Industry

Company   Impact
Sanyo     (i) Energy-efficient point-to-point wireless communication; (ii) Cooperative video transmission
Intel     Optimal video encoder mode decisions
IBM       Learning for data exploration
Skype     Rigorous modeling and optimization using MDP and reinforcement learning


Thank you!

http://www.ee.ucla.edu/~nhmastro/

http://medianetlab.ee.ucla.edu/


My Journal Papers

1. [Mastronarde, 2011b] N. Mastronarde and M. van der Schaar, “Fast reinforcement learning for energy efficient wireless communications,” in review.

2. [Mastronarde, 2011a] N. Mastronarde, F. Verde, D. Darsena, A. Scaglione, and M. van der Schaar, “Transmitting important bits and sailing high radio waves: a decentralized cross-layer approach to cooperative video transmission,” in review.

3. [Mastronarde, 2010] N. Mastronarde and M. van der Schaar, “Online reinforcement learning for dynamic multimedia systems,” IEEE Trans. on Image Processing, vol. 19, no. 2, pp. 290-305, Feb. 2010.

4. [Mastronarde, 2009c] N. Mastronarde and M. van der Schaar, “Designing autonomous layered video coders,” Elsevier Journal Signal Processing: Image Communication – Special Issue on Scalable Coded Media Beyond Compression, vol. 24, no. 6, pp. 417-436, July 2009.

5. [Mastronarde, 2009b] N. Mastronarde and M. van der Schaar, “Towards a General Framework for Cross-Layer Decision Making in Multimedia Systems,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 19, no. 5, pp. 719-732, May 2009.

6. [Mastronarde, 2009a] N. Mastronarde and M. van der Schaar, “Automated bidding for media services at the edge of a content delivery network,” IEEE Trans. on Multimedia, vol. 11, no. 3, pp. 543-555, Apr. 2009.

7. [Mastronarde, 2008] N. Mastronarde and M. van der Schaar, “A bargaining theoretic approach to quality-fair system resource allocation for multiple decoding tasks,” IEEE Trans. Circuits and Systems for Video Technology, vol. 18, no. 3, Mar. 2008.

8. [Mastronarde, 2007b] N. Mastronarde and M. van der Schaar, "A queuing-theoretic approach to task scheduling and processor selection for video decoding applications," IEEE Trans. Multimedia, vol. 9, no. 7, pp. 1493-1507, Nov. 2007.

9. [Mastronarde, 2007a] N. Mastronarde, D. S. Turaga, and M. van der Schaar. “Collaborative resource exchanges for peer-to-peer video streaming over wireless mesh networks,” IEEE J. on Select. Areas in Communications Peer-to-peer Communications and Applications, vol. 25, no. 1, pp. 108-118, Jan. 2007.

10. [Mastronarde, 2006] Y. Andreopoulos, N. Mastronarde, and M. van der Schaar, “Cross-layer optimized video streaming over wireless multi-hop mesh networks,” IEEE J. on Select. Areas in Communications Multi-Hop Wireless Mesh Networks, vol. 24, no. 11, pp. 2104-2115, Nov. 2006.


References

• Myopic Cross-Layer (Multimedia systems and communications)

– [He, 2005] Z. He, Y. Liang, L. Chen, I. Ahmad, and D. Wu, “Power-rate-distortion analysis for wireless video communication under energy constraints,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 15, no. 5, pp. 645-658, May 2005.

– [Sachs, 2003] D. G. Sachs, S. Adve, D. L. Jones, “Cross-layer adaptive video coding to reduce energy on general-purpose processors,” in Proc. International Conference on Image Processing, vol. 3, pp. III-109-112 vol. 2, Sept. 2003.

– [Nahrstedt, 2006] W. Yuan, K. Nahrstedt, S. V. Adve, D. L. Jones, R. H. Kravets, “GRACE-1: cross-layer adaptation for multimedia quality and battery energy,” IEEE Trans. on Mobile Computing, vol. 5, no. 7, pp. 799-815, July 2006.

– [Nahrstedt, 2007] K. Nahrstedt, W. Yuan, S. Shah, Y. Xue, and K. Chen, “QoS support in multimedia wireless environments,” in Multimedia Over IP and Wireless Networks, ed. M. van der Schaar and P. Chou, Academic Press, 2007.

– [Mohapatra, 2005] S. Mohapatra, R. Cornea, H. Oh, K. Lee, M. Kim, N. Dutt, R. Gupta, A. Nicolau, S. Shukla, N. Venkatasubramanian, “A cross-layer approach for power-performance optimization in distributed mobile systems,” 19th IEEE International Parallel and Distributed Processing Symposium, 2005.

– [Pillai, 2003] P. Pillai, H. Huang, and K.G. Shin, “Energy-Aware Quality of Service Adaptation,” Technical Report CSE-TR-479-03, Univ. of Michigan, 2003.

– [van der Schaar 2003] M. van der Schaar, S. Krishnamachari, S. Choi, and X. Xu, “Adaptive cross-layer protection strategies for robust scalable video transmission over 802.11 WLANs,” IEEE JSAC, vol. 21, no. 10, pp. 1752-1763, Dec. 2003.

– [van der Schaar 2007] M. van der Schaar, Y. Andreopoulos, and Z. Hu, “Optimized scalable video streaming over 802.11 a/e HCCA wireless networks under delay constraints,” IEEE Trans. on Mobile Computing, vol. 5, no. 6, pp. 755-768, June 2006.

• Foresighted Single-Layer (no learning, or heuristic)

– [Benini, 1999] L. Benini, A. Bogliolo, G. A. Paleologo, G. De Micheli, “Policy optimization for dynamic power management,” IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 18, no. 6, pp. 813-833, June 1999.

– [Ortega, 1994] A. Ortega, K. Ramchandran, M. Vetterli, “Optimal trellis-based buffered compression and fast approximations,” IEEE Trans. on Image Processing, vol. 3, no. 1, pp. 26-40, Jan. 1994.

– [Berry, 2002] R. Berry and R. G. Gallager, “Communications over fading channels with delay constraints,” IEEE Trans. Info. Theory, vol. 48, no. 5, pp. 1135-1149, May 2002.


References

• Foresighted Single Layer (with learning)

– [Chung, 2002] E.-Y. Chung, L. Benini, A. Bogliolo, Y.-H. Lu, and G. De Micheli, “Dynamic power management for nonstationary service requests,” IEEE Trans. on Computers, vol. 51, no. 11, Nov. 2002.

– [Marculescu, 2005] Z. Ren, B. H. Krogh, R. Marculescu, “Hierarchical adaptive dynamic power management,” IEEE Trans. on Computers, vol. 54, no. 4, Apr. 2005.

– [Borkar, 2008] N. Salodkar, A. Bhorkar, A. Karandikar, V. S. Borkar, “An on-line learning algorithm for energy efficient delay constrained scheduling over a fading channel,” IEEE JSAC, vol. 26, no. 4, pp. 732-742, Apr. 2008.

– [Krishnamurthy] M. H. Ngo and V. Krishnamurthy, “Monotonicity of constrained optimal transmission policies in correlated fading channels with ARQ,” IEEE Trans. on Signal Processing, vol. 58, no. 1, pp. 438-451, Jan. 2010.

• Multiuser network optimization

– [Neely, 2010] L. Huang, S. Moeller, M. J. Neely and B. Krishnamachari, “LIFO-Backpressure Achieves Near Optimal Utility-Delay Tradeoff,” Aug. 2010, ArXiv Technical Report, arXiv:1008.4895v1.

– [Neely, 2009] M. J. Neely and R. Urgaonkar, "Optimal Backpressure Routing in Wireless Networks with Multi-Receiver Diversity," Ad Hoc Networks (Elsevier), vol. 7, no. 5, pp. 862-881, July 2009.

– M. J. Neely, "Energy Optimal Control for Time Varying Wireless Networks", IEEE Trans. On Information Theory, vol. 52, no. 7, pp. 2915-2934, July 2006.

– [Fu, van der Schaar, 2010] F. Fu and M. van der Schaar, “A systematic framework for dynamically optimizing multi-user video transmission,” IEEE JSAC, vol. 28, pp. 308-320, Apr. 2010.

– [Chiang 2007] M. Chiang, S. H. Low, A. R. Calderbank, and J.C. Doyle, “Layering as optimization decomposition: A mathematical theory of network architectures,” Proc. of IEEE, vol. 95, no. 1, 2007.


References

• Other

– [Katsaggelos 2008] J. Huang, Z. Li, M. Chiang, and A.K. Katsaggelos, “Joint Source Adaptation and Resource Allocation for Multi-User Wireless Video Streaming,” IEEE Trans. Circuits and Systems for Video Technology, vol. 18, no. 5, pp. 582-595, May 2008.

– [Katsaggelos 2007] E. Maani, P. Pahalawatta, R. Berry, T.N. Pappas, and A.K. Katsaggelos, “Resource Allocation for Downlink Multiuser Video Transmission over Wireless Lossy Networks,” IEEE Transactions on Image Processing, vol. 17, no. 9, pp. 1663-1671, Sept. 2008.

– [Su 2007] G.-M. Su, Z. Han, M. Wu, and K.J.R. Liu, “Joint Uplink and Downlink Optimization for Real-Time Multiuser Video Streaming Over WLANs,” IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 2, pp. 280-294, August 2007.

– [Knopp 1995] R. Knopp and P. A. Humblet, “Information capacity and power control in single-cell multiuser communications,” Proc. IEEE ICC, 1995.

– [Viswanath 2002] P. Viswanath, D. N. C. Tse, R. Laroia, “Opportunistic beamforming using dumb antennas,” IEEE Trans. on Information Theory, vol. 48, no. 6, June 2002.

– [Tse 2005] D. N. C. Tse and P. Viswanath, Fundamentals of wireless communication. Cambridge, U.K.: Cambridge Univ. Press, 2005.

– [Alay 2009] O. Alay, P. Liu, Z. Guo, L. Wang, Y. Wang, E. Erkip, and S. Panwar, “Cooperative layered video multicast using randomized distributed space time codes,” IEEE INFOCOM Workshops 2009, Rio de Janeiro, Brazil, Oct. 2009, pp. 1–6.

– [Laneman 2003] J.N. Laneman and G.W. Wornell, “Distributed space-time block coded protocols for exploiting cooperative diversity in wireless networks,” IEEE Trans. Inf. Theory, vol. 49, pp. 2415–2425, Oct. 2003.

– [Sendonaris 2003] A. Sendonaris, E. Erkip, and B. Aazhang, “User cooperation diversity – Part I & II,” IEEE Trans. Commun., vol. 51, pp. 1927–1948, Nov. 2003.

– [Melodia, 2010] T. Melodia and W. Heinzelman, “Cross-layer optimization in video sensor networks,” IEEE COMSOC MMTC E-Letter, vol. 5, no. 3, May 2010.


Supplementary Slides


Multimedia Application Characteristics

• Characteristics

– Stringent delay constraints

– Sophisticated source-coding dependency structures

– Mixed priorities

– Intense resource requirements

[Figure: (a) sequential dependencies; (b) typical hybrid coder dependencies (MPEG-2, H.264/AVC) [Chou, 2006]; (c) scalable coding dependencies; decoding complexity of the Silent sequence: normalized processor ticks (0 to 8000) versus time (0 to 9 s) for decoding four layers of Silent.CIF at 1.5 Mb/s]


Reinforcement Learning Architecture

[Diagram: reinforcement learning loop connecting the policy, the traffic and channel dynamics, and the value function; signals labeled state, action, cost, and error]


Conventional Reinforcement Learning Algorithm

• Q-learning

– Experience tuple: $\sigma_n = (s_n, a_n, c_n, s_{n+1})$

– Q-learning update:

$Q(s_n, a_n) \leftarrow (1 - \alpha_n)\, Q(s_n, a_n) + \alpha_n \left[ c_n + \gamma \min_{a' \in \mathcal{A}} Q(s_{n+1}, a') \right]$

where $Q(s_n, a_n)$ on the right is the starting estimate, the bracketed term is the new sample, and the left-hand side is the revised estimate; $E[c_n] = c(s_n, a_n)$ and $s_{n+1} \sim p(\cdot \mid s_n, a_n)$.

• Exploration vs. Exploitation

– Given state $s_n$, how do we choose the action $a_n$?

– Exploitation: Take the greedy action w.r.t. the action-value function estimate

• Prevents the discovery of potentially better actions

– Exploration: Take a suboptimal action w.r.t. the action-value function estimate

• Sacrifice immediate performance for possibly improved future performance

Problem

Problem

Problem
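A minimal tabular Q-learning loop is sketched below, using epsilon-greedy action selection as one common way to trade off exploration and exploitation. The two-state, two-action environment and all parameter values (gamma, alpha, epsilon) are hypothetical and chosen only so the result can be checked by hand; the slides do not specify an exploration rule.

```python
import random

# Hypothetical 2-state, 2-action toy problem (not from the slides):
# taking action a moves the system to state a and incurs cost COST[s][a].
COST = [[2.0, 1.0],
        [0.5, 3.0]]
N_STATES, N_ACTIONS = 2, 2

def step(state: int, action: int) -> tuple[float, int]:
    """Return (cost, next_state) for the toy environment."""
    return COST[state][action], action

def q_learning(n_steps=20_000, gamma=0.9, alpha=0.1, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    s = 0
    for _ in range(n_steps):
        # Exploration vs. exploitation: epsilon-greedy w.r.t. the current estimate
        if rng.random() < epsilon:
            a = rng.randrange(N_ACTIONS)                       # explore
        else:
            a = min(range(N_ACTIONS), key=lambda b: q[s][b])   # exploit (costs are minimized)
        c, s_next = step(s, a)                                 # experience tuple (s, a, c, s')
        new_sample = c + gamma * min(q[s_next])
        q[s][a] = (1 - alpha) * q[s][a] + alpha * new_sample   # revised estimate
        s = s_next
    return q

q = q_learning()
policy = [min(range(N_ACTIONS), key=lambda b: q[s][b]) for s in range(N_STATES)]
print(policy)  # expected: [1, 0]
```

For this toy problem the optimal policy alternates between the states (action 1 in state 0, action 0 in state 1), with Q*(0, 1) = 1.45 / 0.19 ≈ 7.63 for gamma = 0.9. Note that even here the learner must occasionally try the costlier actions to confirm they are worse, which is the exploration cost the slide describes.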


Partially Known Dynamics

• Known dynamics

– Goodput: $p_f(f_n \mid BEP_n, z_n) = \mathrm{bin}(z_n, 1 - PLR_n)$

– PM state transition distribution

– Power cost

– Holding cost

• Unknown dynamics

– Packet arrival distribution: $p_l(l)$

– Channel state transition: $p_h(h' \mid h)$

– Overflow cost

• Post-decision state

– An intermediate state

• After known dynamics take place

• Before unknown dynamics take place
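The goodput model above says that, of the z_n packets transmitted, the number received f_n is binomial with success probability 1 − PLR_n. The sketch below evaluates that distribution. It additionally assumes, hypothetically, independent bit errors so that PLR_n = 1 − (1 − BEP_n)^L for an L-bit packet; the slide does not specify how PLR_n follows from BEP_n, and the function names are illustrative.

```python
from math import comb

def packet_loss_rate(bep: float, packet_bits: int) -> float:
    """Hypothetical PLR model: a packet is lost if any of its bits is in error (independent errors)."""
    return 1.0 - (1.0 - bep) ** packet_bits

def goodput_pmf(z: int, plr: float) -> list[float]:
    """p_f(f | BEP, z) = bin(z, 1 - PLR): probability that f of z packets arrive, for f = 0..z."""
    success = 1.0 - plr  # per-packet success probability
    return [comb(z, f) * success ** f * plr ** (z - f) for f in range(z + 1)]

plr = packet_loss_rate(bep=1e-5, packet_bits=1000)
pmf = goodput_pmf(z=4, plr=plr)
print(f"PLR = {plr:.4f}, expected goodput = {sum(f * p for f, p in enumerate(pmf)):.3f} packets")
```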


Decomposition into Known and Unknown Components

• Known and unknown transition probabilities

$p_k(\tilde{s} \mid s, a) = p_x(\tilde{x} \mid x, y)\, p_f(b - \tilde{b} \mid BEP, z)\, I(\tilde{h} = h)$ (known)

$p_u(s' \mid \tilde{s}) = p_h(h' \mid \tilde{h})\, p_l(b' - \tilde{b})\, I(x' = \tilde{x})$ (unknown)

• Known and unknown costs

$c_k(s, a) = \underbrace{\sum_{f=0}^{z} p_f(f \mid BEP, z)\,(b - f)}_{\text{holding cost}} + \underbrace{\mu\, \rho([h, x], [BEP, y, z])}_{\text{power cost}}$ (known)

$c_u(\tilde{s}) = \underbrace{\frac{\gamma}{1 - \gamma} \sum_{l=0}^{\infty} p_l(l) \max(\tilde{b} + l - B, 0)}_{\text{overflow cost}}$ (unknown)

The unknown components do not depend on the action
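The overflow cost c_u charges gamma/(1 − gamma) for every packet that the arrivals push past the buffer size B. The sketch below evaluates it for a hypothetical Poisson arrival distribution; the true p_l is unknown to the learner (which is why it sits in the unknown component), so the Poisson choice, the truncation point, and the function names are assumptions for illustration.

```python
import math

def poisson_pmf_table(mean: float, max_arrivals: int) -> list[float]:
    """Hypothetical arrival model p_l(l): Poisson(mean), tabulated for l = 0..max_arrivals."""
    pmf = [math.exp(-mean)]
    for l in range(1, max_arrivals + 1):
        pmf.append(pmf[-1] * mean / l)  # p(l) = p(l - 1) * mean / l
    return pmf

def overflow_cost(b_tilde: int, buffer_size: int, gamma: float, arrival_pmf: list[float]) -> float:
    """c_u(b~) = gamma / (1 - gamma) * sum_l p_l(l) * max(b~ + l - B, 0)."""
    expected_overflow = sum(p * max(b_tilde + l - buffer_size, 0) for l, p in enumerate(arrival_pmf))
    return gamma / (1.0 - gamma) * expected_overflow

pmf = poisson_pmf_table(mean=3.0, max_arrivals=100)
for b in (0, 5, 10):
    print(b, round(overflow_cost(b, buffer_size=10, gamma=0.95, arrival_pmf=pmf), 4))
```

With the post-decision buffer already full (b̃ = B) and a mean of 3 arrivals, the expected overflow is 3 packets, so c_u = (0.95 / 0.05) × 3 = 57.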


Post-Decision State Learning

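• Post-decision state learning (online):

– PDS experience tuple: $\tilde{\sigma}_n = (s_n, a_n, \tilde{s}_n, c_u^n, s_{n+1})$

(a) $\tilde{V}^*(\tilde{s}) = c_u(\tilde{s}) + \gamma \sum_{s'} p_u(s' \mid \tilde{s})\, V^*(s')$

(b) $V^*(s) = \min_{a \in \mathcal{A}} \left[ c_k(s, a) + \sum_{\tilde{s}} p_k(\tilde{s} \mid s, a)\, \tilde{V}^*(\tilde{s}) \right]$

The PDS value function must be learned:

$\tilde{V}_{n+1}(\tilde{s}_n) \leftarrow (1 - \alpha_n)\, \tilde{V}_n(\tilde{s}_n) + \alpha_n \left[ c_u^n + \gamma V_n(s_{n+1}) \right]$

where $\tilde{V}_n(\tilde{s}_n)$ on the right is the starting estimate, the bracketed term is the new sample, and the left-hand side is the revised estimate; $V_n(s_{n+1})$ is obtained from (b) using the current estimate $\tilde{V}_n$, $E[c_u^n] = c_u(\tilde{s}_n)$, and $s_{n+1} \sim p_u(\cdot \mid \tilde{s}_n)$.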
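The PDS learning update can be sketched as follows: the known part of the dynamics is used to compute V(s_{n+1}) via (b), and only the post-decision value function is updated from the observed unknown cost and next state. The list-based array layout, the constant step size, the single-state sanity check, and the function names are assumptions; in a real system s̃_n and c_u^n are observed rather than simulated, and this sketch acts greedily without a separate exploration step.

```python
import random

def greedy_value(s, v_pds, c_k, p_k):
    """Equation (b): min over a of c_k(s, a) + sum_ps p_k(ps | s, a) * V~(ps).

    Returns (value, minimizing action)."""
    best_v, best_a = float("inf"), None
    for a, cost in enumerate(c_k[s]):
        v = cost + sum(p * v_pds[ps] for ps, p in enumerate(p_k[s][a]))
        if v < best_v:
            best_v, best_a = v, a
    return best_v, best_a

def pds_learning(c_k, p_k, sample_unknown, n_pds, s0=0, n_steps=2000, gamma=0.5, alpha=0.1, seed=0):
    """Online PDS learning sketch.

    c_k[s][a] and p_k[s][a][ps] are the known cost and transition (hypothetical layout);
    sample_unknown(ps, rng) -> (c_u, s_next) stands in for the unknown dynamics."""
    rng = random.Random(seed)
    v_pds = [0.0] * n_pds
    s = s0
    for _ in range(n_steps):
        _, a = greedy_value(s, v_pds, c_k, p_k)                # act greedily on known dynamics
        ps = rng.choices(range(n_pds), weights=p_k[s][a])[0]    # simulated known transition s -> s~
        c_u, s_next = sample_unknown(ps, rng)                   # unknown cost and transition s~ -> s'
        v_next, _ = greedy_value(s_next, v_pds, c_k, p_k)       # V(s_{n+1}) from equation (b)
        # Revised estimate = (1 - alpha) * starting estimate + alpha * new sample
        v_pds[ps] = (1 - alpha) * v_pds[ps] + alpha * (c_u + gamma * v_next)
        s = s_next
    return v_pds

# Single-state sanity check (hypothetical numbers): c_k = 1, c_u = 1, gamma = 0.5,
# so V~ = 1 + 0.5 * (1 + V~), whose solution is V~ = 3.
v = pds_learning(c_k=[[1.0]], p_k=[[[1.0]]], sample_unknown=lambda ps, rng: (1.0, 0), n_pds=1)
print(round(v[0], 4))  # 3.0
```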


Comparison to Prior Work using Post-Decision States [N. Salodkar, 2010]*

Feature                 Salodkar             Proposed
DPM                     No                   Yes
AMC                     Yes                  Yes
Power control           Yes                  Yes
Packet losses           No                   Yes
Post-decision state     Deterministic        Stochastic
Costs                   Known only           Known and unknown
State transitions       Known and unknown    Known and unknown
Optimization criteria   Undiscounted         Discounted
Virtual experience      No                   Yes

* Differences in the proposed work are highlighted in red


Learning Algorithm Performance Comparison

[Figure: (a) holding cost, (b) power (mW), and (e) θ_off versus time slot n (0 to 6 × 10^4) for PDS + virtual experience (update period = 1, 10, 25, 125), PDS learning, PDS learning (no DPM), and Q-learning]

*[Borkar, 2008]


Comparison to Optimal Policy With Imperfect Statistics

[Figure: holding cost and power (mW) versus time slot n (logarithmic axis, 10^0 to 10^5) for PDS + virtual experience (update period = 1) and the optimal policy computed with imperfect statistics]


Non-Stationary Arrivals

[Figure: expected arrival rate (packets/s, 0 to 400) versus time slot n (0 to 6 × 10^4)]

• Unobservable 5-state Markov-modulated process

– States

• Expected arrival rate for a Poisson arrival process

• (0, 100, 200, 300, 400) packets/s

– Stationary distribution

• (0.0188, 0.3755, 0.0973, 0.4842, 0.0242)
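Because the modulating state is unobservable, the learner effectively faces an arrival process whose long-run average is the stationary-weighted mean of the per-state Poisson rates. The arithmetic below computes it from the numbers on the slide; it does not need the (unstated) transition matrix, which governs how quickly the rate changes but not its long-run mean.

```python
# Per-state Poisson arrival rates (packets/s) and stationary distribution, from the slide
RATES = [0, 100, 200, 300, 400]
STATIONARY = [0.0188, 0.3755, 0.0973, 0.4842, 0.0242]

assert abs(sum(STATIONARY) - 1.0) < 1e-9  # a valid probability distribution
long_run_rate = sum(pi * r for pi, r in zip(STATIONARY, RATES))
print(f"long-run mean arrival rate = {long_run_rate:.2f} packets/s")  # 211.95 packets/s
```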


Non-Stationary Channel Transitions

• Channel state transition probabilities vary over time as an AR(1) process.

[Figure: self-transition probabilities (white indicates a relatively high self-transition probability)]


Information-Theoretic Power Cost

$c(h, z) = \frac{\sigma^2}{h^2}\left(2^z - 1\right)$
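This cost follows from inverting the Shannon capacity z = log2(1 + h^2 P / σ^2) for the transmit power P. The sketch below evaluates it; treating z as a spectral efficiency in bits per channel use and h as a real-valued channel gain are assumptions about the slide's units, and the function name is hypothetical.

```python
import math

def info_theoretic_power(h: float, z: float, noise_var: float) -> float:
    """c(h, z) = (sigma^2 / h^2) * (2^z - 1): power needed to reach rate z over channel gain h."""
    return noise_var / h ** 2 * (2.0 ** z - 1.0)

p = info_theoretic_power(h=0.8, z=3.0, noise_var=0.1)
# Round trip: plugging the power back into the capacity formula recovers z
print(p, math.log2(1.0 + 0.8 ** 2 * p / 0.1))
```

For large z each additional bit of throughput roughly doubles the required power (the 2^z term), which is what makes this power cost convex in z.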


Physical Layer: Adaptive Modulation and Power Control

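• Transmission rate: $\beta_n / T_s$ bits/s

– bits per symbol: $\beta_n \geq 1$

– packet length (bits): $L$

– packet rate (packets/s): $r_n = \beta_n / (L T_s)$

• Bit-error probability (BEP): $BEP_n \leq \frac{4}{\beta_n} Q\left(\sqrt{\frac{3 h_n \gamma_n}{2^{\beta_n} - 1}}\right)$, where $Q(x) \triangleq \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2}\, du$

• SNR: $\gamma_n = \frac{P^{tx}_n T_s}{N_0}$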


Physical Layer: Adaptive Modulation and Power Control

• Variables of interest

– Packet throughput (packets/time slot): $z_n = \beta_n \Delta t / (L T_s)$

– Bits per symbol: $\beta_n$

– Transmission power (watts): $P^{tx}_n \geq \frac{N_0 (2^{\beta_n} - 1)}{3 h_n T_s} \left[Q^{-1}\left(\frac{\beta_n BEP_n}{4}\right)\right]^2$

– Bit-error probability: $BEP_n$

• Decision variable 1: packet throughput $z_n$ (adaptive modulation)

• Decision variable 2: bit-error probability $BEP_n$ (power control)
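The sketch below ties the two decision variables to the transmission power: z_n fixes β_n through z_n = β_n Δt / (L T_s), and the target BEP_n then fixes the minimum power through the inverse of the BEP bound. It uses Python's `math.erfc` for Q and `statistics.NormalDist` for Q^{-1}; all numeric parameter values and function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)

def q_func(x: float) -> float:
    """Gaussian Q-function: Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_inv(p: float) -> float:
    """Inverse Q-function for 0 < p < 1, using Q^{-1}(p) = -Phi^{-1}(p)."""
    return -_STD_NORMAL.inv_cdf(p)

def bits_per_symbol(z: float, packet_bits: int, symbol_time: float, slot: float) -> float:
    """Invert z = beta * dt / (L * Ts): bits per symbol needed for throughput z (packets/slot)."""
    return z * packet_bits * symbol_time / slot

def required_power(beta, bep, h, n0, symbol_time):
    """Minimum P_tx = N0 (2^beta - 1) / (3 h Ts) * [Q^{-1}(beta * BEP / 4)]^2."""
    return n0 * (2.0 ** beta - 1.0) / (3.0 * h * symbol_time) * q_inv(beta * bep / 4.0) ** 2

def bep_bound(beta, p_tx, h, n0, symbol_time):
    """BEP bound (4 / beta) * Q(sqrt(3 h gamma / (2^beta - 1))), with SNR gamma = P_tx Ts / N0."""
    snr = p_tx * symbol_time / n0
    return 4.0 / beta * q_func(math.sqrt(3.0 * h * snr / (2.0 ** beta - 1.0)))

# Hypothetical numbers: 40 packets/slot, 1000-bit packets, 1 us symbols, 10 ms slots
beta = bits_per_symbol(z=40, packet_bits=1000, symbol_time=1e-6, slot=0.01)
p = required_power(beta, bep=1e-5, h=1.0, n0=1e-9, symbol_time=1e-6)
print(beta, p, bep_bound(beta, p, h=1.0, n0=1e-9, symbol_time=1e-6))
```

Plugging the returned power back into the BEP bound recovers the target BEP, which is a quick consistency check on the two formulas.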


Initializing the PDS Value Function

• PDS value iteration:

$\tilde{V}^k(\tilde{s}) = c_u(\tilde{s}) + \gamma \sum_{s'} p_u(s' \mid \tilde{s})\, V^k(s')$

$V^{k+1}(s) = \min_{a \in \mathcal{A}} \left\{ c_k(s, a) + \sum_{\tilde{s}} p_k(\tilde{s} \mid s, a)\, \tilde{V}^k(\tilde{s}) \right\}$

• Initialization:

– Define reasonable estimates: $\hat{c}_u(\tilde{s})$ and $\hat{p}_u(s' \mid \tilde{s})$

– Perform PDS value iteration with the estimates
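Offline initialization by PDS value iteration can be sketched as below: the designer's estimates ĉ_u and p̂_u stand in for the unknown dynamics, while c_k and p_k are the known components. The list-based layout, the fixed iteration count (instead of a convergence test), the single-state sanity check, and the function name are assumptions of this sketch.

```python
def pds_value_iteration(c_k, p_k, c_u_hat, p_u_hat, gamma, n_iters=1000):
    """Initialize the PDS value function by value iteration on estimated unknown dynamics.

    c_k[s][a]        known cost
    p_k[s][a][ps]    known transition probability s -> post-decision state ps
    c_u_hat[ps]      estimated unknown cost (supplied by the designer)
    p_u_hat[ps][s2]  estimated unknown transition probability ps -> s2
    """
    n_states, n_pds = len(c_k), len(c_u_hat)
    v_pds = [0.0] * n_pds
    v = [0.0] * n_states
    for _ in range(n_iters):
        # State values from the current PDS values: min over actions of the known part
        v = [min(c_k[s][a] + sum(p * v_pds[ps] for ps, p in enumerate(p_k[s][a]))
                 for a in range(len(c_k[s])))
             for s in range(n_states)]
        # PDS values from the new state values, using the estimated unknown dynamics
        v_pds = [c_u_hat[ps] + gamma * sum(p * v[s2] for s2, p in enumerate(p_u_hat[ps]))
                 for ps in range(n_pds)]
    return v_pds, v

# Single-state sanity check (hypothetical numbers): V = 1 + V~ and V~ = 1 + 0.5 V, so V = 4, V~ = 3
v_pds, v = pds_value_iteration([[1.0]], [[[1.0]]], [1.0], [[1.0]], gamma=0.5)
print(v_pds, v)
```

The resulting v_pds would then serve as the starting estimate for the online PDS learning update.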


Impact of Initial Conditions

• The initial arrival rate is assumed to be either deterministic or uniformly distributed

• The channel state is assumed constant

[Figure: power (mW, 180 to 240) versus holding cost (packets, 3.9 to 4.2) for initial arrival rates of 100, 200, 300, 400, 500, and 600 packets/s and for a uniformly distributed initial arrival rate]


Traffic State Transition Illustration

Traffic state ($F$, $b$) and scheduling action ($y$) at times t, t+1, and t+2:

Time t:     $F_t = (1, 2, 3)$, $b_t = (4, 3, 2)$, $y_t = (4, 0, 0)$
Time t+1:   $F_{t+1} = (2, 3, 1, 4)$, $b_{t+1} = (3, 2, 6, 1)$, $y_{t+1} = (0, 0, 2, 0)$
Time t+2:   $F_{t+2} = (1, 4, 2, 3)$, $b_{t+2} = (4, -1, 4, 1)$, $y_{t+2} = (4, 0, 1, 0)$


Phase I Time Fraction

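• K and Q symbols have to be transmitted in phase I and phase II, respectively

$\rho^i_t = \frac{1}{1 + \beta^{i,1}_t / (R_c\, \beta^{i,2}_t)}$, so that $\rho^i_t \beta^{i,1}_t > \beta^{i,0}_t \;\Rightarrow\; \frac{1}{\beta^{i,1}_t} + \frac{1}{R_c\, \beta^{i,2}_t} < \frac{1}{\beta^{i,0}_t}$

$\rho^i_t\, \beta^{i,1}_t\, Q = (1 - \rho^i_t)\, \beta^{i,2}_t\, K$

$R_c = K/Q \leq 1$ is the rate of the orthogonal STBC rule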
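The phase I fraction and the cooperation condition can be checked numerically. With ρ chosen as above, ρβ^{i,1}_t acts as the effective cooperative rate, and comparing it with the direct rate β^{i,0}_t is equivalent to the reciprocal test. The bits-per-symbol values and the K, Q values below are hypothetical, as are the function names.

```python
import math

def phase_one_fraction(beta1: float, beta2: float, rc: float) -> float:
    """rho = 1 / (1 + beta1 / (Rc * beta2)): fraction of the slot spent in phase I."""
    return 1.0 / (1.0 + beta1 / (rc * beta2))

def cooperation_helps(beta0: float, beta1: float, beta2: float, rc: float) -> bool:
    """Cooperation beats direct transmission when 1/beta1 + 1/(Rc*beta2) < 1/beta0,
    which is equivalent to rho * beta1 > beta0."""
    return 1.0 / beta1 + 1.0 / (rc * beta2) < 1.0 / beta0

# Hypothetical example: K = 1 and Q = 2 symbols, so Rc = K / Q = 0.5
K, Q = 1, 2
rc = K / Q
rho = phase_one_fraction(beta1=4.0, beta2=4.0, rc=rc)
# The slide's relation rho * beta1 * Q = (1 - rho) * beta2 * K holds for this rho
print(rho, math.isclose(rho * 4.0 * Q, (1 - rho) * 4.0 * K))
print(cooperation_helps(beta0=1.0, beta1=4.0, beta2=4.0, rc=1.0))  # feeble direct link: True
print(cooperation_helps(beta0=4.0, beta1=4.0, beta2=4.0, rc=1.0))  # strong direct link: False
```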


Reformulated Multi-user Markov Decision Process

• Global state:

• Decision variables:

– Scheduling action:

• Dynamic programming equation:

• Constraints:

• Challenges:

– Complexity is proportional to $|\mathcal{S}|$, which scales exponentially in $M$

– Traffic state information is local to users


Optimization Decomposition

• The multi-user optimization can be decomposed into local MDPs satisfying:

• Requires message exchanges between users and the AP

– Users → AP: Discounted infinite-horizon resource consumption

– AP → Users: Uniform resource price to manage congestion
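One common way to realize this kind of price-based decomposition is a dual subgradient update at the AP: users report their resource consumption at the current price, and the AP raises the price when total demand exceeds capacity and lowers it otherwise. This is a generic sketch, not necessarily the update used in this work; each user's local MDP is replaced by a hypothetical one-shot best response, the "discounted infinite-horizon resource consumption" is reduced to a single number per user, and all names and numbers are illustrative.

```python
import math

def best_response(a_i: float, price: float, x_max: float = 10.0) -> float:
    """Hypothetical stand-in for user i's local MDP: choose resource x in [0, x_max]
    minimizing a_i / (1 + x) + price * x (closed form)."""
    x = math.sqrt(a_i / price) - 1.0
    return min(max(x, 0.0), x_max)

def update_price(weights, capacity, price=1.0, step=0.1, n_iters=2000):
    """Dual subgradient update at the AP: raise the price when total demand
    exceeds capacity (congestion) and lower it otherwise."""
    for _ in range(n_iters):
        demand = sum(best_response(a, price) for a in weights)  # users -> AP: resource consumption
        price = max(1e-6, price + step * (demand - capacity))     # AP -> users: uniform price
    return price

lam = update_price(weights=[4.0, 9.0, 16.0], capacity=3.0)
print(round(lam, 3))  # 2.25
```

Here the price settles where total demand equals capacity (9/√λ − 3 = 3, i.e., λ = 2.25); a smaller capacity, i.e., a more congested network, would drive the price higher, matching the qualitative behavior summarized earlier.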


Proposed Opportunistic Cooperation Protocol (1/2)

• When is cooperative transmission better than direct transmission?

• Candidate cooperative nodes can self-select:

• The AP verifies that the following condition is fulfilled:

• If it is satisfied, cooperation is better; otherwise, direct transmission is chosen

$\beta^{i,\mathrm{coop}}_t > \beta^{i,0}_t \;\Rightarrow\; \frac{1}{\beta^{i,1}_t} + \frac{1}{R_c\, \beta^{i,2}_t} < \frac{1}{\beta^{i,0}_t}$

0

, where 1

:1

0i

i t c ct t ti

c ct

R R

R R

βξ ξ

β

= < ≤

+

+≤C

( ),2 0

111 c

ti it t

β β

+< −


Proposed Opportunistic Cooperation Protocol (2/2)

• RTS

– Request to send

• CRS

– Cooperative recruitment signal

• HTS

– Help to send

• CTS

– Clear to send


{ }, , 0 ,it Mh ∀ ∈ …

*Dual channels, i.e. i it th h=

0 to candidatesitβ

,2itRh

,2 and and i it tz β λ


Computation of Transmission Rates

• Direct rate

• Phase I rate

• Phase II rate



Cooperation Statistics