PowerPoint 簡報speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2018/Lecture/Reward (v3).pdf · Title...

Sparse Reward

Hung-yi Lee

Sparse Reward: Reward Shaping

Reward Shaping

Take "Play": $r_{t+1} = 1$, $r_{t+100} = -100$

Take "Study": $r_{t+1} = -1$, $r_{t+100} = 100$

$r_{t+1} = 1$
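
The play/study example can be made concrete with a small sketch. Only the four reward values come from the example above; the undiscounted sum, the one-step "myopic" chooser, the function names, and the shaping bonus of 3 are assumptions made for illustration.

```python
# Rewards taken from the example: (immediate r_{t+1}, delayed r_{t+100}).
REWARDS = {"Play": (1, -100), "Study": (-1, 100)}


def total_return(action):
    """Undiscounted sum of the two rewards that follow the action."""
    immediate, delayed = REWARDS[action]
    return immediate + delayed


def myopic_choice(shaping_bonus=None):
    """Pick the action with the largest immediate reward only.

    shaping_bonus maps an action to an extra immediate reward
    (hypothetical values chosen by the designer).
    """
    shaping_bonus = shaping_bonus or {}

    def shaped_immediate(action):
        return REWARDS[action][0] + shaping_bonus.get(action, 0)

    return max(REWARDS, key=shaped_immediate)


print(total_return("Play"), total_return("Study"))  # -99 99
print(myopic_choice())                               # Play
print(myopic_choice({"Study": 3}))                   # Study
```

Without shaping, an agent that only sees the next reward picks "Play" even though "Study" has the higher total return. An extra immediate reward on "Study" flips that choice.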

Reward Shaping

https://openreview.net/pdf?id=Hk3mPK5gg

Get reward when closer

Need domain knowledge

VizDoom: https://openreview.net/forum?id=Hk3mPK5gg&noteId=Hk3mPK5gg

Curiosity https://arxiv.org/abs/1705.05363

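[Figure: interaction loop. Env gives $s_1$; the Actor outputs $a_1$; Env returns $s_2$; the Actor outputs $a_2$; Env returns $s_3$; and so on. A Reward block gives $r_1$ for $(s_1, a_1)$ and $r_2$ for $(s_2, a_2)$. The Actor is marked "updated".]

$R(\tau) = \sum_{t=1}^{T} r_t$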

ICM = intrinsic curiosity module. It adds an intrinsic reward $+ r_t^i$ at each step.

[Figure: one ICM block per step, producing $r_1^i$ and $r_2^i$.]

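Intrinsic Curiosity Module

[Figure: Network 1 takes $s_t$ and $a_t$ and predicts $\hat{s}_{t+1}$; the diff between $\hat{s}_{t+1}$ and the actual $s_{t+1}$ gives $r_t^i$.]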

Large reward if $s_{t+1}$ is hard to predict

Encourages taking risks (exploring)

Some states are hard to predict but not important, e.g. leaves fluttering.
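
A minimal sketch of the prediction-error reward, assuming a one-dimensional toy environment with dynamics $s_{t+1} = s_t + a_t$ and a linear model standing in for Network 1. The dynamics, the learning rate, and all function names are illustrative assumptions, not the ICM implementation from the paper.

```python
import random


def predict(weights, state, action):
    """Network 1 stand-in: a linear model that predicts s_{t+1} from (s_t, a_t)."""
    w_s, w_a, b = weights
    return w_s * state + w_a * action + b


def intrinsic_reward(weights, state, action, next_state):
    """Squared difference between the predicted and the actual next state."""
    return (predict(weights, state, action) - next_state) ** 2


def sgd_step(weights, state, action, next_state, lr=0.05):
    """One gradient step on the squared prediction error."""
    w_s, w_a, b = weights
    err = predict(weights, state, action) - next_state
    return (w_s - lr * 2 * err * state,
            w_a - lr * 2 * err * action,
            b - lr * 2 * err)


rng = random.Random(0)
weights = (0.0, 0.0, 0.0)

# Reward on one transition before the model has seen any data.
before = intrinsic_reward(weights, 1.0, 1.0, 2.0)

# Train on the deterministic toy dynamics s_{t+1} = s_t + a_t.
for _ in range(2000):
    s = rng.uniform(-1.0, 1.0)
    a = rng.choice([-1.0, 1.0])
    weights = sgd_step(weights, s, a, s + a)

# Same transition after training: the reward has almost vanished.
after = intrinsic_reward(weights, 1.0, 1.0, 2.0)

# "Leaf-like" transitions: unpredictable noise keeps the reward high forever.
noisy_rewards = []
for _ in range(1000):
    s = rng.uniform(-1.0, 1.0)
    a = rng.choice([-1.0, 1.0])
    noisy_rewards.append(intrinsic_reward(weights, s, a, s + a + rng.gauss(0.0, 1.0)))
noisy_avg = sum(noisy_rewards) / len(noisy_rewards)

print(before, after < 1e-6, noisy_avg > 0.5)
```

Predictable transitions stop paying out once the model has learned them, which pushes the agent toward new states. The noisy transitions keep paying out even though nothing can be learned from them, which is the problem with unimportant but unpredictable states.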

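Intrinsic Curiosity Module

[Figure: a Feature Ext block maps $s_t$ to $\phi(s_t)$ and $s_{t+1}$ to $\phi(s_{t+1})$. Network 1 takes $a_t$ and $\phi(s_t)$ and predicts $\hat{\phi}(s_{t+1})$; the diff from $\phi(s_{t+1})$ gives $r_t^i$. Network 2 takes $\phi(s_t)$ and $\phi(s_{t+1})$ and predicts $\hat{a}_t$, which is compared with $a_t$.]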

$\phi$ extracts useful features, i.e. those related to actions
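
The role of Network 2 can be illustrated with a toy comparison. This sketch assumes a state made of a position (moved by the action) and a "leaf" value (pure noise), and it compares two hand-written feature extractors instead of training one. The environment, both extractors, and the sign-based guess standing in for Network 2 are hypothetical.

```python
import random

rng = random.Random(0)


def step(state, action):
    """Toy environment: the position moves by the action, the leaf value is noise."""
    position, _leaf = state
    return (position + action, rng.gauss(0.0, 1.0))


def phi_position(state):
    """Candidate feature that keeps the action-related part of the state."""
    return state[0]


def phi_leaf(state):
    """Candidate feature that keeps only the distracting part of the state."""
    return state[1]


def inverse_model_accuracy(phi, n=1000):
    """Network 2 stand-in: guess a_t from the sign of phi(s_{t+1}) - phi(s_t)."""
    correct = 0
    state = (0.0, 0.0)
    for _ in range(n):
        action = rng.choice([-1.0, 1.0])
        next_state = step(state, action)
        guess = 1.0 if phi(next_state) - phi(state) > 0 else -1.0
        correct += guess == action
        state = next_state
    return correct / n


acc_position = inverse_model_accuracy(phi_position)
acc_leaf = inverse_model_accuracy(phi_leaf)
print(acc_position, acc_leaf)
```

The action can be recovered from the position feature but not from the leaf feature, so training $\phi$ through the action-prediction task favours features like the former and gives the module no reason to encode the leaves.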

Reward from Auxiliary Task

https://arxiv.org/abs/1611.05397

Sparse Reward: Curriculum Learning

Curriculum Learning

• Start from simple training examples, then make them progressively harder.

VizDoom

Reverse Curriculum Generation

➢ Given a goal state $s_g$.

➢ Sample some states $s_1$ "close" to $s_g$.

➢ Start from the states $s_1$; each trajectory has reward $R(s_1)$.

[Figure: states $s_1$ sampled around the goal state $s_g$.]

Reverse Curriculum Generation

➢ Delete $s_1$ whose reward is too large (already learned) or too small (too difficult at this moment).

➢ Sample $s_2$ from $s_1$, then start from $s_2$.

[Figure: goal state $s_g$, states $s_1$, and new states $s_2$.]
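
One round of the steps above can be sketched on a one-dimensional chain of states. Everything here is an assumption made for illustration: the goal at position 0, the thresholds `R_MIN` and `R_MAX`, and above all `estimate_reward`, a hypothetical stand-in for running rollouts from a start state and averaging their reward.

```python
import random

GOAL = 0
R_MIN, R_MAX = 0.1, 0.9  # hypothetical "too small" / "too large" thresholds


def estimate_reward(start, skill):
    """Hypothetical stand-in for R(s): success rate of rollouts from `start`.

    Assumes the agent reliably solves starts within `skill` steps of the goal
    and gets steadily worse beyond that distance.
    """
    distance = abs(start - GOAL)
    return max(0.0, min(1.0, 1.0 - (distance - skill) / 4.0))


def keep_intermediate(starts, skill):
    """Delete starts that are already learned or still too difficult."""
    return [s for s in starts if R_MIN <= estimate_reward(s, skill) <= R_MAX]


def sample_nearby(starts, rng, n=20):
    """Sample new start states one step away from the surviving ones."""
    return sorted({rng.choice(starts) + rng.choice([-1, 1]) for _ in range(n)})


rng = random.Random(0)
s1 = [1, 2, 3, 4, 5, 6, 7]             # states sampled "close" to the goal
kept = keep_intermediate(s1, skill=2)  # [3, 4, 5]
s2 = sample_nearby(kept, rng)          # next round of start states
print(kept, s2)
```

States 1 and 2 are dropped as already learned and states 6 and 7 as too difficult, so the next batch of start states is drawn around the ones the agent can currently learn from.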

Sparse Reward: Hierarchical Reinforcement Learning

Hierarchical RL

➢ If a lower agent cannot achieve the goal, the upper agent gets a penalty.

➢ If an agent reaches the wrong goal, assume the original goal was that wrong one.
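
The two rules above can be sketched as two small functions. The tuple layout `(state, goal, action)`, the penalty value of -1.0, and the function names are assumptions for illustration and are not taken from the slides.

```python
def upper_agent_reward(subgoal, reached, penalty=-1.0):
    """Rule 1: the upper agent is penalised when the lower agent misses the subgoal."""
    return 0.0 if reached == subgoal else penalty


def relabel(transition, reached):
    """Rule 2: pretend the state actually reached was the goal all along."""
    state, _original_goal, action = transition
    return (state, reached, action)


# The lower agent was told to reach "B" from "A" but ended up in "C".
transition = ("A", "B", "move")
print(upper_agent_reward("B", "C"))  # -1.0
print(relabel(transition, "C"))      # ('A', 'C', 'move')
```

After relabelling, the failed attempt becomes a valid example of how to reach "C", so the experience is not wasted.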

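The following example is purely fictional and has nothing to do with reality.

[Figure: University president (provide goal) → Professor (provide subgoal) → Graduate student (action).]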


Acknowledgement

• Thanks to Dr. 芮祥麟 for spotting spelling errors on the course web page.
