Efficient and Tractable System Identification through Supervised Learning

AHMED HEFNY
MACHINE LEARNING DEPARTMENT
CARNEGIE MELLON UNIVERSITY

JOINT WORK WITH GEOFFREY GORDON, CARLTON DOWNEY, BYRON BOOTS (GEORGIA TECH), ZITA MARINHO, WEN SUN

8/1/2017 1

Outline

- Problem Statement:
β—¦ Learning Dynamical Systems
β—¦ Solution Properties

- Formulation:
β—¦ A Taxonomy of Dynamical System Models
β—¦ Predictive State Models: Formulation and Learning
β—¦ Connection to Recurrent Networks

- Extensions:
β—¦ Controlled Systems

β—¦ Reinforcement Learning

8/1/2017 2



Learning Dynamical Systems

[Diagram: the latent system state s_t evolves to s_{t+1} through the system dynamics and emits the observation o_t through the observation model; e.g., the latent state could be a position & speed.]

8/1/2017 7

Learning Dynamical Systems

Belief state: q_t ≑ P(s_t | o_{1:t-1})

The belief q_t summarizes the history o_{1:t-1} and suffices to predict the future o_{t:∞}:

P(o_{t+Ο„} | o_{1:t-1}) ≑ P(o_{t+Ο„} | q_t)

8/1/2017 8

Learning a Recursive Filter

Given: training sequences (o_1, o_2, ..., o_T)

Output:
β—¦ Initial belief q_1
β—¦ Filtering function f: q_{t+1} = f(q_t, o_t)
β—¦ Observation function g: E[o_t | o_{1:t-1}] = g(q_t)

8/1/2017 9

Learning a Recursive Filter

Desired properties:
β—¦ System: non-linear, partially observable, controlled
β—¦ Algorithm: theoretical guarantees, scalability

8/1/2017 10
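As a concrete illustration of this interface, here is a minimal sketch (hypothetical function names) of running a learned filter f, observation function g, and initial belief q_1 over a test sequence:

```python
# A minimal sketch of the recursive-filter interface described above (hypothetical names).
# f updates the belief after seeing o_t; g predicts the next observation from the belief.
import numpy as np

def filter_and_predict(q1, f, g, observations):
    """Run the learned filter over a sequence and return one-step predictions."""
    q = q1
    predictions = []
    for o in observations:
        predictions.append(g(q))   # E[o_t | o_{1:t-1}] = g(q_t)
        q = f(q, o)                # q_{t+1} = f(q_t, o_t)
    return np.array(predictions)
```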

Outline

- Problem Statement:
β—¦ Learning Dynamical Systems
β—¦ Solution Properties

- Formulation:
β—¦ A Taxonomy of Dynamical System Models
β—¦ Predictive State Models: Formulation and Learning
β—¦ Connection to Recurrent Networks

- Extensions:
β—¦ Controlled Systems

β—¦ Reinforcement Learning

8/1/2017 11

Learning a Recursive Filter

Recall the goal: learn the initial belief q_1, the filtering function f (q_{t+1} = f(q_t, o_t)), and the observation function g (E[o_t | o_{1:t-1}] = g(q_t)).

[Diagram: the recursive filter unrolled as a recurrent structure; f maps (q_t, o_t) to q_{t+1}, and g maps the state to the predicted next observation o_{t+1}.]

A taxonomy of approaches:

                                    Learn g (Latent State)        Fix g (Predictive State)
Learn a model, then derive f        HMM [EM, Tensor Decomp.]      Predictive State Models [Method of moments: two-stage regression]
Directly learn f                    RNN [BPTT]                    PSIM [DAgger]

In the predictive-state column, the state is defined as a prediction of the future:
q_t ≑ E[ψ(o_{t:∞}) | o_{1:t-1}], i.e., state = E[sufficient future statistics].

8/1/2017 15

Why restrict f and g?

* Predictive State (a.k.a. Observable Representation):
β—¦ The state is a prediction of future observation statistics.
β—¦ Future statistics are noisy estimates of the state.
β—¦ This gives a reduction to supervised learning.

* Additional assumptions on the dynamics facilitate an efficient algorithm with provable guarantees.

* Local improvement is still possible.

[Recap of the taxonomy: PSM, PSIM, RNN.]

8/1/2017 16

Predictive State Model (Formulation)

Predictive State: q_t ≑ P(o_{t:t+k-1} | o_{1:t-1}) ≑ E[ψ_t | o_{1:t-1}]
Extended Predictive State: p_t ≑ P(o_{t:t+k} | o_{1:t-1}) ≑ E[ΞΎ_t | o_{1:t-1}]
Linear Dynamics: p_t = W q_t   (W is learned)
Filtering: f(q_t, o_t) = f_filter(p_t, o_t) = f_filter(W q_t, o_t)   (f_filter is fixed)

[Timeline: the future statistics ψ_t are features of o_t, ..., o_{t+k-1}; the extended future statistics ΞΎ_t are features of o_t, ..., o_{t+k}.]

Examples:
β—¦ Discrete systems: ψ_t is an indicator vector, so q_t is a probability vector and p_t a joint probability table; f_filter is Bayes rule, P(o_{t+1:t+k} | o_{1:t}) ∝ P(o_{t:t+k} | o_{1:t-1}).
β—¦ Gaussian systems: ψ_t collects 1st and 2nd moments, so q_t and p_t describe Gaussian distributions; f_filter computes the Gaussian conditional mean and covariance.

Why a linear W? It is crucial to the consistency of the learning algorithm.
Why this particular filtering formulation? It matches existing models (HMM, Kalman filter, PSR).

8/1/2017 19
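To make the Bayes rule step concrete, here it is written out for the discrete example (my own rendering; it is not shown in this form on the slides):

q_{t+1}(o_{t+1:t+k}) = P(o_{t+1:t+k} | o_{1:t}) = P(o_{t:t+k} | o_{1:t-1}) / P(o_t | o_{1:t-1}) = p_t(o_t, o_{t+1:t+k}) / Ξ£_{o'} p_t(o_t, o'),

where the sum runs over all future assignments o' = o'_{t+1:t+k}. This is exactly the "select the slice of p_t that matches o_t and renormalize" operation used in the discrete example later.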

Predictive State Model (Learning)

ψ_t and ΞΎ_t are unbiased estimates of q_t and p_t:
β€’ ψ_t = q_t + Ξ΅_t
β€’ ΞΎ_t = p_t + Ξ½_t

Naive learning procedure:
β€’ q̂_1 = (1/N) Ξ£_i ψ_i
β€’ Learn W using linear regression with examples (ψ_t, ΞΎ_t)

Problem: Ξ΅_t and Ξ½_t are correlated, so Cov(ψ_t, ΞΎ_t) β‰  Cov(q_t, p_t) and the regression estimate of W is biased.

8/1/2017 20

Predictive State Model (Learning), continued

Corrected learning procedure:
β€’ q̂_1 = (1/N) Ξ£_i ψ_i
β€’ Denoise the examples (ψ_t, ΞΎ_t) to obtain (q̂_t, p̂_t)
β€’ Learn W using linear regression with examples (q̂_t, p̂_t)

How to denoise? Since p_t = W q_t, taking conditional expectations given a window of past observations gives

E[ΞΎ_t | o_{t-k:t-1}] = W E[ψ_t | o_{t-k:t-1}],

so both conditional expectations can be estimated by regression on history features; W is then fit so that p̂_t β‰ˆ W q̂_t.

8/1/2017 21

Learning Dynamical Systems Using Instrumental Regression

[Pipeline diagram: from each time step, extract history features h_t, future features ψ_t, and extended future features ΞΎ_t.]

β€’ S1A regression: regress ψ_t on h_t to get the denoised future E[q_t | h_t], i.e., the estimated state q̂_t.
β€’ S1B regression: regress ΞΎ_t on h_t to get the denoised extended future E[p_t | h_t], i.e., the estimated extended state p̂_t.
β€’ S2 regression: learn W from the pairs (q̂_t, p̂_t), i.e., apply W so that p̂_t β‰ˆ W q̂_t.
β€’ To filter, condition p_t on o_t to obtain q_{t+1}; to predict, marginalize o_t to obtain q_{t+1|t-1}.

8/1/2017 22
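A minimal sketch of this pipeline with ridge regression in all stages (my own illustration; variable names and the regularization constant are assumptions, and because rows are time steps here, W appears transposed relative to the slides' p_t = W q_t convention):

```python
# Two-stage regression with linear S1 regressions, using history features as instruments.
import numpy as np

def two_stage_regression(H, Psi, Xi, reg=1e-6):
    """H: history features, Psi: future features, Xi: extended future features (rows = time steps)."""
    def ridge(X, Y, lam):
        # Solve min_W ||X W - Y||^2 + lam ||W||^2, so that Y β‰ˆ X W.
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

    W_psi = ridge(H, Psi, reg)           # S1A: estimate E[psi_t | h_t]
    W_xi = ridge(H, Xi, reg)             # S1B: estimate E[xi_t | h_t]
    Q_hat, P_hat = H @ W_psi, H @ W_xi   # denoised states q̂_t and extended states p̂_t
    W = ridge(Q_hat, P_hat, reg)         # S2: fit p̂_t β‰ˆ q̂_t W
    q1 = Psi.mean(axis=0)                # initial belief estimate
    return q1, W
```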

In a nutshell

* Predictive State:
β—¦ State is a prediction of future observations.
β—¦ Future observations are noisy estimates of the state.

* Two-stage regression:
β—¦ Use history features to β€œdenoise” states (S1 regression).
β—¦ Use denoised states to learn dynamics (S2 regression).

8/1/2017 23

What do we gain?

* A better understanding of existing algorithms:
β—¦ Spectral algorithms for learning HMMs, Kalman filters, and PSRs are two-stage regression algorithms with linear regression in all stages.

* Theoretical results (asymptotic and finite-sample):
β—¦ The error in estimating W is Γ•(1/√N) under mild assumptions.
β—¦ The exact rate depends on the S1 regression error.

* New flavors of dynamical system learning algorithms:
β—¦ HMMs with logistic regression.
β—¦ Online learning of linear dynamical systems (Sun et al., 2015).
β—¦ Linear dynamical systems with sparse dynamics (Hefny et al., 2015; Gus Xia, 2016).

8/1/2017 24

Predictive State Models as RNNs

[Diagram: a PSM unrolled as a recurrent cell. The state q_t is mapped by W to p_t, which is combined with the observation o_t by f_filter to produce q_{t+1}; each state also predicts the future features ψ_t.]

Predictive state models define RNNs that are easy to initialize!

8/1/2017 26
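A sketch of one step of such a cell for the discrete case (my own illustration, assuming p_t is laid out with the current observation o_t indexing the first axis of the table):

```python
# One step of a PSM viewed as an RNN cell, discrete case: p_t = W q_t is reshaped into a table,
# and f_filter selects the slice matching the observed o_t and renormalizes (Bayes rule).
import numpy as np

def psm_cell(q, o_index, W, n_obs):
    p = W @ q                      # extended predictive state p_t = W q_t
    table = p.reshape(n_obs, -1)   # rows: current observation o_t, columns: future observations
    row = table[o_index]           # condition on the observed o_t (the slides draw this as picking a column)
    return row / row.sum()         # renormalize to get q_{t+1}
```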

Back to the Special Case: Modeling Discrete Systems with Indicator Vectors

Assume the discrete case: o_t is an indicator vector.
Let ψ_t = o_t and ΞΎ_t = o_t βŠ— o_{t+1}. Then:

q_t: probability vector
p_t: joint probability table
f_filter(p_t, o_t): choose the column of p_t corresponding to o_t, then renormalize

8/1/2017 27
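A small sketch of how the supervised examples for this special case could be constructed (hypothetical helper names; k = 1 as on this slide):

```python
# Build psi_t = indicator(o_t) and xi_t = indicator(o_t) βŠ— indicator(o_{t+1}), flattened row-major.
import numpy as np

def indicator(o, n_obs):
    v = np.zeros(n_obs)
    v[o] = 1.0
    return v

def future_features(observations, n_obs):
    """observations: list of integer observation indices from one training sequence."""
    psi, xi = [], []
    for t in range(len(observations) - 1):
        psi_t = indicator(observations[t], n_obs)
        xi_t = np.outer(psi_t, indicator(observations[t + 1], n_obs)).ravel()  # o_t βŠ— o_{t+1}
        psi.append(psi_t)
        xi.append(xi_t)
    return np.array(psi), np.array(xi)
```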

Back to the Special Case: Modeling Discrete Systems with Indicator Vectors

[Worked example: the state q_t, e.g., (0.2, 0.5, 0.3), is mapped by W to the joint probability table p_t; the observed indicator o_t, e.g., (0, 1, 0), selects the corresponding slice of the table, and the normalization block || . || rescales it to give q_{t+1}.]

Written compactly, the update has the form

q_{t+1} ∝ C^⊀ (A q_t ∘ B o_t),

i.e., a multiplicative unit combining the state and the observation.

8/1/2017 30

Predictive State Models as RNNs

Predictive units have a multiplicative structure, similar to LSTMs and GRUs.

8/1/2017 31
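A sketch of that multiplicative update (A, B, and C are assumed to be learned matrices of compatible shapes):

```python
# The multiplicative update q_{t+1} ∝ C^T (A q_t ∘ B o_t), with ∘ the elementwise product.
import numpy as np

def multiplicative_update(q, o, A, B, C):
    z = (A @ q) * (B @ o)          # the multiplicative unit combining state and observation
    q_next = C.T @ z
    return q_next / q_next.sum()   # normalization (the || . || block in the diagrams)
```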

What about the continuous case ?

Mean Maps for Continuous Observations

Mean maps provide a powerful tool to model non-parametric distributions using the feature map of a universal kernel. A discrete distribution is a special case that uses the delta kernel and the indicator feature map; continuous distributions can be modeled using, e.g., the RBF kernel.

Discrete Case                        | General Case                                     | RFF Approximation
Indicator vector                     | Kernel feature map Ο†(x)                          | RFF feature vector
Joint probability table P(X,Y)       | Covariance operator C_{XY}                       | Covariance matrix
Conditional probability table        | Conditional operator C_{X|Y}                     | Conditional matrix
Normalization P(X,Y) β†’ P(X|Y)        | Kernel Bayes rule C_{X|Y} = C_{XY} C_{YY}^{-1}   | Solve a linear system

8/1/2017 33
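For reference, a standard random Fourier feature (RFF) map for the RBF kernel, which is the approximation referred to in the rightmost column (the feature dimension D and bandwidth Οƒ are assumed hyperparameters):

```python
# Random Fourier features approximating the RBF kernel: phi(x)Β·phi(y) β‰ˆ k(x, y).
import numpy as np

def rff_map(X, D=200, sigma=1.0, seed=0):
    """Map rows of X to D random Fourier features."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    Omega = rng.normal(scale=1.0 / sigma, size=(d, D))   # spectral frequencies of the RBF kernel
    b = rng.uniform(0, 2 * np.pi, size=D)                # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ Omega + b)
```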

Results

8/1/2017 34

[Plots: character prediction accuracy on English text; error predicting handwriting trajectories.]

Outline

- Problem Statement:
β—¦ Learning Dynamical Systems
β—¦ Solution Properties

- Formulation:
β—¦ A Taxonomy of Dynamical System Models
β—¦ Predictive State Models: Formulation and Learning
β—¦ Connection to Recurrent Networks

- Extensions:
β—¦ Controlled Systems

β—¦ Reinforcement Learning

8/1/2017 35

Extension to Controlled Systems

In controlled systems we have observations and actions (e.g., car velocity and pressure on the pedals).

[Graphical model: the latent state s_t evolves to s_{t+1} under the action a_t and emits the observation o_t.]

Recursive filter:
E[o_t | q_t, a_t] = g(q_t, a_t)
E[q_{t+1} | q_t, o_t, a_t] = f(q_t, a_t, o_t)

8/1/2017 36
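A minimal sketch (hypothetical names) of how the controlled recursive filter above would be used at test time:

```python
# Controlled recursive filter: g predicts the observation given state and action; f updates the state.
import numpy as np

def controlled_filter(q1, f, g, actions, observations):
    q = q1
    predictions = []
    for a, o in zip(actions, observations):
        predictions.append(g(q, a))   # E[o_t | q_t, a_t] = g(q_t, a_t)
        q = f(q, a, o)                # q_{t+1} = f(q_t, a_t, o_t)
    return np.array(predictions)
```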

Extension to Controlled Systems

Same principle: P_t = W(Q_t)

This time, the predictive state Q_t is a linear operator encoding the conditional distribution of future observations given future actions.

Example: think of Q_t and P_t as conditional probability tables.

This requires appropriate modifications to S1 regression; S2 regression remains the same.

8/1/2017 37

Extension to Controlled Systems

Two-stage regression: it is all about finding E[Q_t | o_{t-k:t-1}, a_{t-k:t-1}].

[Illustration: Q_t as a conditional probability table P(observation | action), with rows indexed by observation and columns by action.]

8/1/2017 38

Extension to Controlled Systems

Two-stage regression: it is all about finding E[Q_t | o_{t-k:t-1}, a_{t-k:t-1}].

Problem: at each time step we observe a noisy version of only a single slice of Q_t (the column of the action that was actually taken).

[Illustration: a table in which one column is observed, e.g., (1, 0, 0), and all other entries are unknown.]

8/1/2017 39

Extension to Controlled Systems

Two-stage regression: it is all about finding E[Q_t | o_{t-k:t-1}, a_{t-k:t-1}] = Q(o_{t-k:t-1}, a_{t-k:t-1}).

Problem: at each time step we observe a noisy version of only a slice of Q_t.

Solution 1 (Joint Modeling):
β—¦ Predict the joint distribution of observations and actions.
β—¦ Manually convert it to a conditional table (e.g., normalize the columns).

[Illustration: each time step contributes an indicator joint table with a single 1 at the (observation, action) pair that occurred.]

8/1/2017 40

Extension to Controlled Systems

Two-stage regression: it is all about finding E[Q_t | o_{t-k:t-1}, a_{t-k:t-1}] = Q(o_{t-k:t-1}, a_{t-k:t-1}).

Problem: at each time step we observe a noisy version of only a slice of Q_t.

Solution 2 (Conditional Modeling):
Train a regression model to fit the observed slice of Q_t:

min_Q Ξ£_t || Q(o_{t-k:t-1}, a_{t-k:t-1}) ψ_t^a βˆ’ ψ_t^o ||Β²

[Illustration: entries of Q_t outside the observed slice are "don't care" (D/C) in this objective.]

8/1/2017 41
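A sketch of Solution 1 (joint modeling) for the discrete case; this is my own illustration, not the authors' code, and the S1 regression here is a plain ridge regression on history features:

```python
# Regress the joint (observation, action) indicator table on history features, then normalize
# each action column to obtain the conditional table Q_t = P(observation | action).
import numpy as np

def joint_modeling_state(H, O_ind, A_ind, reg=1e-6):
    """H: history features (T x d); O_ind, A_ind: indicator matrices (T x n_obs, T x n_act)."""
    T = H.shape[0]
    joint = np.stack([np.outer(O_ind[t], A_ind[t]).ravel() for t in range(T)])  # noisy joint slices
    W = np.linalg.solve(H.T @ H + reg * np.eye(H.shape[1]), H.T @ joint)        # S1 regression
    J = (H @ W).reshape(T, O_ind.shape[1], A_ind.shape[1])                      # estimated joint tables
    J = np.clip(J, 1e-12, None)
    return J / J.sum(axis=1, keepdims=True)                                     # normalize columns: P(obs | action)
```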

Results

8/1/2017 42

Results

Predicting the pose of a swimming robot:
β€’ Ground truth
β€’ Linear ARX
β€’ RFFPSR (our model)

8/1/2017 43

Reinforcement Learning with Predictive State Policy Networks

8/1/2017 44

[Architecture: a PSRNN filter tracks the predictive state from the inputs (o_t, a_t); a feed-forward policy maps the state to the next action a_{t+1}.]

β€’ Can be initialized (e.g., by two-stage regression).
β€’ States have a meaning: they can be trained to match observations.
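A rough sketch of this architecture under assumed shapes (none of these parameter names come from the talk): a PSRNN-style multiplicative filter update over the concatenated (o_t, a_t) input, followed by a linear feed-forward policy:

```python
# One step of a predictive state policy network: filter update, then policy readout.
import numpy as np

def psp_forward(q, o, a, A, B, C, policy_W, policy_b):
    z = (A @ q) * (B @ np.concatenate([o, a]))    # multiplicative filter update from (o_t, a_t)
    q_next = C.T @ z
    q_next = q_next / np.linalg.norm(q_next)      # normalization of the predictive state
    return q_next, policy_W @ q_next + policy_b   # next state and scores for the next action a_{t+1}
```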

Preliminary Results

8/1/2017 45

Conclusions

- Predictive State Models for filtering in dynamical systems:
β—¦ Predictive State: the state is a prediction of future observations.
β—¦ Two-Stage Regression: learning predictive state models can be reduced to supervised learning.

- Predictive State Models are a special type of recurrent network.

- They can be extended to controlled systems and employed in reinforcement learning. Can we build upon that to develop a principled and efficient approach for learning to predict and act in continuous systems?

8/1/2017 46

Thank you!

* Hefny, Downey, and Gordon, β€œSupervised Learning for Dynamical System Learning” (NIPS 2015), https://arxiv.org/abs/1505.05310

* Hefny, Downey, and Gordon, β€œSupervised Learning for Controlled Dynamical System Learning”, https://arxiv.org/abs/1702.03537

* Downey, Hefny, Li, Boots, and Gordon, β€œPredictive State Recurrent Neural Networks”, https://arxiv.org/abs/1705.09353

8/1/2017 47