Augmented Random Search
Presenters:
1. Gautam Suryawanshi
2. Prajit Krisshna Kumar
Reinforcement Learning – CSE 510
Model-free RL algorithms
• Provide controllers for dynamical systems without requiring an explicit physical model
• Such systems have successfully learned to play video games and board games such as Go and chess
• They are rarely deployed in real-world physical systems
Problems
• We need methods that are simple, efficient, and reliable
• Studies indicate that several RL methods are not robust to changes in their hyperparameters
• Small parameter changes can significantly degrade their performance
• Such methods cannot be trusted in real-world systems that must control hundreds of motors
New directions
• Evolution Strategies – a derivative-free optimization method with parallelized training
• Natural policy gradients for training linear policies
• ARS combines ideas from both
History
• Published in March 2018 by a team from the University of California, Berkeley
• ARS is an enhanced version of Basic Random Search (BRS)
BRS:
• Policy = π𝜃
• Perturb the existing policy weights by +𝛎𝜹 and −𝛎𝜹, where 𝛎 < 1 is the exploration noise scale and 𝜹 is a random vector sampled from a standard normal distribution
• Apply the resulting actions and collect the rewards
• Update the policy using 𝜃ʲ⁺¹ = 𝜃ʲ + 𝝰·Δ
• where Δ = (1/N) Σ [r(𝜃+𝛎𝜹) − r(𝜃−𝛎𝜹)]·𝜹
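The BRS update above can be sketched in NumPy. This is a minimal sketch, not the paper's implementation; `rollout` is an assumed helper that plays one episode with the given weights and returns its total reward.

```python
import numpy as np

def brs_update(theta, alpha, nu, num_deltas, rollout):
    """One Basic Random Search update step.

    theta: current policy weights (np.ndarray)
    alpha: step size
    nu: exploration noise scale (nu < 1)
    rollout: assumed helper mapping weights -> episode reward
    """
    # Sample N random directions from a standard normal distribution.
    deltas = [np.random.randn(*theta.shape) for _ in range(num_deltas)]
    step = np.zeros_like(theta)
    for delta in deltas:
        r_pos = rollout(theta + nu * delta)  # r(theta + nu*delta)
        r_neg = rollout(theta - nu * delta)  # r(theta - nu*delta)
        step += (r_pos - r_neg) * delta
    # Delta = (1/N) * sum of reward-difference-weighted directions
    return theta + alpha * step / num_deltas
```

Note that only reward differences are used, never gradients of the policy, which is what makes the method derivative-free.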
Motivation
One of the most striking algorithms in reinforcement learning:
up to 15 times faster than competing algorithms, with higher rewards, on specific benchmark tasks.
Does not require deep learning.
Algorithm
Simplified Explanation
• Add random noise (𝛿) to the weights ϴ.
• Run a test episode.
• If the reward improves, keep the new weights.
• Otherwise, discard them.
Method of Finite Differences
• Generate random noise (𝛿) of the same shape as the weights (ϴ)
• Clone two versions of the weights
• Add the noise to get ϴ[+], subtract it to get ϴ[−]
• Test both versions for one episode each and collect rewards r[+] and r[−]
• Update the weights: ϴ += α(r[+] − r[−])·𝛿
• Test and repeat until performance stops improving
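The steps above can be sketched as a single finite-difference step. This is a minimal sketch assuming, as before, a `rollout` helper that plays one episode and returns its reward.

```python
import numpy as np

def finite_difference_step(theta, alpha, rollout):
    # Generate random noise of the same shape as the weights.
    delta = np.random.randn(*theta.shape)
    # Clone two versions: add the noise to one, subtract it from the other.
    theta_plus = theta + delta
    theta_minus = theta - delta
    # Test both versions for one episode each.
    r_plus = rollout(theta_plus)
    r_minus = rollout(theta_minus)
    # Move along delta, scaled by the reward difference and step size.
    return theta + alpha * (r_plus - r_minus) * delta
```

If the positive perturbation earns a higher reward than the negative one, the weights move toward it; otherwise they move the opposite way, so both outcomes are informative.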
Training Loop
• Generate num_deltas deltas and evaluate the positive and negative perturbation for each.
• Run an episode with each positive and negative variation.
• Collect the rollouts as (r[+], r[−], delta) tuples.
• Calculate the standard deviation of all collected rewards (sigma_rewards).
• Sort the rollouts by their maximum reward and select the best num_best_deltas rollouts.
• step = sum((r[+] − r[−]) * delta) over the best rollouts.
• theta += learning_rate / (num_best_deltas * sigma_rewards) * step
• Evaluate: play an episode with the new weights to measure improvement.
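One iteration of this training loop can be sketched as a single update function. The `rollout` helper, the hyperparameter names, and their default values here are illustrative, not the paper's exact settings.

```python
import numpy as np

def ars_step(theta, rollout, num_deltas=16, num_best_deltas=8,
             learning_rate=0.02, noise=0.03):
    """One ARS update: evaluate num_deltas directions, keep the best
    num_best_deltas, and normalize the step by the reward std-dev."""
    deltas = [np.random.randn(*theta.shape) for _ in range(num_deltas)]
    # Run episodes with the positive and negative variation of each delta.
    rollouts = []
    for delta in deltas:
        r_pos = rollout(theta + noise * delta)
        r_neg = rollout(theta - noise * delta)
        rollouts.append((r_pos, r_neg, delta))
    # Standard deviation of all 2 * num_deltas collected rewards.
    all_rewards = [r for r_pos, r_neg, _ in rollouts for r in (r_pos, r_neg)]
    sigma_rewards = np.std(all_rewards)
    # Sort by maximum reward and keep the best num_best_deltas rollouts.
    rollouts.sort(key=lambda t: max(t[0], t[1]), reverse=True)
    best = rollouts[:num_best_deltas]
    # Sum the reward-difference-weighted deltas over the best rollouts.
    step = sum((r_pos - r_neg) * delta for r_pos, r_neg, delta in best)
    return theta + learning_rate / (num_best_deltas * sigma_rewards) * step
```

Dividing by sigma_rewards keeps the effective step size stable across tasks whose reward scales differ by orders of magnitude.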
Results
• Episode 1
Results
• Episode 1000
Comparison with DDPG
• At episode 1000
Comparison with PPO
• At episode 1000
Comparing rewards
(Figure: reward curves for ARS, DDPG, and PPO)
• Time for ARS to run: 5630 seconds
• Time for PPO to run: 2142 seconds
• Time for DDPG to run: 9270 seconds
Humanoid
Bipedal Walker
Results of Swimmer
References
1. https://towardsdatascience.com/introduction-to-augmented-random-search-d8d7b55309bd
2. https://arxiv.org/pdf/1803.07055.pdf
3. Code for DDPG and PPO from https://github.com/Anjum48/rl-examples/