Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James...
Transcript of Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James...
![Page 1: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/1.jpg)
Target-driven Visual Navigation in Indoor Scenes Using Deep Reinforcement Learning
[Zhu et al. 2017]
A. James E. [email protected]
University of WaterlooJune 27, 2018
(Note: Videos in the slide deck used in presentation have been replaced with YouTube links).
REINFORCEMENT LEARNING FOR ROBOTICS
![Page 2: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/2.jpg)
TARGET-DRIVEN VISUAL NAVIGATION IN INDOOR SCENES USING DEEP REINFORCEMENT LEARNING – Paper Authors
Yuke Zhu1
Roozbeh Mottaghi2Eric Kolve2
Joseph J. Lim1
Abhinav Gupta2,3
Li Fei-Fei1Ali Farhadi2,4
1Stanford University2Allen Institute for AI3Carnegie Mellon University4University of Washington
![Page 3: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/3.jpg)
MOTIVATION – Navigating the Grid World
Free Space:Occupied Space: Agent: Goal: Danger:
![Page 4: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/4.jpg)
MOTIVATION – Navigating the Grid World – Assign Numerical Rewards
Free Space:Occupied Space: Agent: Goal: Danger:
Can assign numerical rewards to the goal and the dangerous grid cells.
+1
-100
![Page 5: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/5.jpg)
MOTIVATION – Navigating the Grid World – Learn a Policy
Free Space:Occupied Space: Agent: Goal: Danger:
Can assign numerical rewards to the goal and the dangerous grid cells.
Then learn a policy.
![Page 6: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/6.jpg)
MOTIVATION – Visual Navigation Applications: Treasure Hunting
• Robotic visual navigation using robots has many applications.
• Treasure hunting.
![Page 7: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/7.jpg)
MOTIVATION – Visual Navigation Applications: Robot Soccer
• Robotic visual navigation using robots has many applications.
• Treasure hunting.• Soccer robots getting to
the ball first.
![Page 8: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/8.jpg)
MOTIVATION – Visual Navigation Applications: Pizza Delivery
• Robotic visual navigation using robots has many applications.
• Treasure hunting.• Soccer robots getting to
the ball first.• Drones delivering pizzas
to your lecture hall.
![Page 9: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/9.jpg)
MOTIVATION – Visual Navigation Applications: Self-Driving Cars
• Robotic visual navigation using robots has many applications.
• Treasure hunting.• Soccer robots getting to
the ball first.• Drones delivering pizzas
to your lecture hall.• Autonomous cars driving
people to their homes.
![Page 10: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/10.jpg)
MOTIVATION – Visual Navigation Applications: Search and Rescue
• Robotic visual navigation using robots has many applications.
• Treasure hunting.• Soccer robots getting to
the ball first.• Drones delivering pizzas
to your lecture hall.• Autonomous cars driving
people to their homes.• Search and rescue robots
finding missing people.
![Page 11: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/11.jpg)
MOTIVATION – Visual Navigation Applications: Domestic Robots
• Robotic visual navigation using robots has many applications.
• Treasure hunting.• Soccer robots getting to
the ball first.• Drones delivering pizzas
to your lecture hall.• Autonomous cars driving
people to their homes.• Search and rescue robots
finding missing people.• Domestic robots
navigating their way around houses to help with chores.
![Page 12: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/12.jpg)
PROBLEM – Domestic Robot: Setup
• Multiple locations in an indoor scene that our robot must navigate to.
• Actions consist of moving forwards and backwards and turning left and right.
![Page 13: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/13.jpg)
PROBLEM – Domestic Robot: Problems
• Multiple locations in an indoor scene that our robot must navigate to.
• Actions consist of moving forwards and backwards and turning left and right.
• Problem 1: Navigating to multiple targets.
• Problem 2: Using high-dimensional visual inputs is challenging and time consuming to train.
• Problem 3: Training on a real robot is expensive.
![Page 14: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/14.jpg)
PROBLEM – Multi Target: Can’t We Just Use Q-Learning?
• We can already navigate grid mazes using Q-learning by assigning rewards for finding a target.
• Assigning rewards to multiple locations on the grid does not allow specification of different targets.
![Page 15: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/15.jpg)
PROBLEM – Multi Target: Can’t We Just Use Q-Learning? Let’s Try It
• We can already navigate grid mazes using Q-learning by assigning rewards for finding a target.
• Assigning rewards to multiple locations on the grid does not allow specification of different targets.
+1+1
+1
![Page 16: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/16.jpg)
PROBLEM – Multi Target: Can’t We Just Use Q-Learning? Uh-oh
• We can already navigate grid mazes using Q-learning by assigning rewards for finding a target.
• Assigning rewards to multiple locations on the grid does not allow specification of different targets.
• Would end up at a target, but not any specific target.
+1+1
+1
![Page 17: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/17.jpg)
PROBLEM – Multi Target: A Policy for Every Target
• We can already navigate grid mazes using Q-learning by assigning rewards for finding a target.
• Assigning rewards to multiple locations on the grid does not allow specification of different targets.
• Would end up at a target, but not any specific target.
• Could train multiple policies, but that wouldn’t scale with the number of targets.
![Page 18: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/18.jpg)
PROBLEM – From Navigation to Visual Navigation
Image Action
Sensor Image
Target Image
Step and Turn
![Page 19: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/19.jpg)
Image PlanningWorld ModellingPerception Action
PROBLEM – Visual Navigation Decomposition: Overview
“Autonomous Mobile Robots” by Roland Siegwart and Illah R. Nourbakhsh (2011)
The visual navigation problem can be broken up into pieces with specialized
algorithms solving each piece.
![Page 20: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/20.jpg)
PROBLEM – Visual Navigation Decomposition: Image
Image PlanningWorld ModellingPerception Action
![Page 21: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/21.jpg)
PROBLEM – Visual Navigation Decomposition: Perception – Localization and Mapping
Image PlanningWorld ModellingPerception Action
![Page 22: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/22.jpg)
PROBLEM – Visual Navigation Decomposition: Perception – Object Detection
Image PlanningWorld ModellingPerception Action
![Page 23: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/23.jpg)
PROBLEM – Visual Navigation Decomposition: World Modelling
Image PlanningWorld ModellingPerception Action
![Page 24: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/24.jpg)
PROBLEM – Visual Navigation Decomposition: Planning – Choosing the End Position
Image PlanningWorld ModellingPerception Action
![Page 25: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/25.jpg)
PROBLEM – Visual Navigation Decomposition: Planning – Searching for a Path
Image PlanningWorld ModellingPerception Action
![Page 26: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/26.jpg)
PROBLEM – Visual Navigation Decomposition: Action
Image PlanningWorld ModellingPerception Action
![Page 27: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/27.jpg)
PROBLEM – Visual Navigation Decomposition: Result
Image PlanningWorld ModellingPerception Action
This decomposition is effective but each step requires a different algorithm.
![Page 28: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/28.jpg)
PROBLEM – Visual Navigation with Reinforcement Learning
Design a deep reinforcement learning architecture to handle visual navigation
from raw pixels.
Image Reinforcement Learning Action
![Page 29: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/29.jpg)
PROBLEM – Robot Learning: Reinforcement Learning with a Robot
Image Reinforcement Learning Action
![Page 30: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/30.jpg)
PROBLEM – Robot Learning: Data Efficiency and Transfer Learning
Idea: Train in simulation first, then fine tune learned policy on real robot.
Image Reinforcement Learning Action
![Page 31: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/31.jpg)
PROBLEM – Robot Learning: Goal
Design a deep reinforcement learning architecture to handle visual navigation
from raw pixels with high data-efficiency.
Image Reinforcement Learning Action
![Page 32: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/32.jpg)
IMPLEMENTATION DETAILS – Architecture Overview
• Similar to actor-critic A3C method which outputs a policy and value running multiple threads.
• Train a different target on each thread rather than copies of same target.
• Use fixed ResNet-50 pretrained on ImageNet to generate embedding for observation and target.
• Fuse the embeddings into a feature vector to get an action and a value.
![Page 33: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/33.jpg)
IMPLEMENTATION DETAILS – AI2-THOR Simulation Environment
• The House Of inteRactions (THOR)
• 32 scenes of household environments.
• Can freely move around a 3D environment.
• High fidelity compared to other simulators.
AI2-THORVirtual KITTISynthia
Video: https://youtu.be/SmBxMDiOrvs?t=18
![Page 34: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/34.jpg)
IMPLEMENTATION DETAILS – Handling Multiple Scenes
• Single layer for all scenes might not generalize as well.
• Train a different set of weights for every different scene in a vine-like model.
• Discretized the scenes with constant step length of 0.5 m and turning angle of 90°, effectively a grid when training and running for simplicity.
⋮
![Page 35: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/35.jpg)
EXPERIMENTS – Navigation Results: Performance Of Our Method Against Baselines
TABLE I
![Page 36: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/36.jpg)
EXPERIMENTS – Navigation Results: Data Efficiency – Target-driven Method is More Data Efficient
![Page 37: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/37.jpg)
EXPERIMENTS – Navigation Results: t-SNE Embedding – It Learned an Implicit Map of the Environment
Bird’s eye view t-SNE Visualization of Embedding Vector
![Page 38: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/38.jpg)
EXPERIMENTS – Generalization Across Targets: The Method Generalizes Across Targets
![Page 39: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/39.jpg)
EXPERIMENTS – Generalization Across Scenes: Training On More Scenes Reduces Trajectory Length Over Training Frames
![Page 40: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/40.jpg)
EXPERIMENTS – Method Works In Continuous World
• The method was evaluated with continuous action spaces.
• Robot could now collide with things and its movement had noise no longer always aligning with a grid.
• Result: Required 50M more training frames to train on a single target.
• Could reach door in average of 15 steps.
• Random agent took 719 steps.
Video: https://youtu.be/SmBxMDiOrvs?t=169
![Page 41: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/41.jpg)
EXPERIMENTS – Robot Experiment: Video
Video: https://youtu.be/SmBxMDiOrvs?t=180
![Page 42: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/42.jpg)
EXPERIMENTS – Summary of Contributions and Results
• Designed new deep reinforcement learning architecture using siamese network with scene-specific layers.
• Generalizes to multiple scenes and targets.
• Works with continuous actions.
• The trained policy in simulation can be run in the real world.
Video: https://youtu.be/SmBxMDiOrvs?t=98
![Page 43: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/43.jpg)
CONCLUSIONS – Future Work
• Physical interaction and object manipulation.
• Situation: Moving obstacles out of path.
• Situation: Opening containers for objects.
• Situation: Turning on the lights when the robot enters a dark room and can’t see.
![Page 44: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/44.jpg)
CREDIT
Thanks to “Twemoji” from Twitter used under license CC-BY 4.0.• “School” image changed to carry University of Waterloo logo.• “Quadcopter” created from “Helicopter” and “Robot Face” images.• “Goose” created from “Duck” image.• “Red Robot” created from “Robot” image.• “Dumbbell” created from “Nuts and Bolt” image.
Thank You
![Page 45: Learning from Experienceppoupart/teaching/cs885-spring...Learning from Experience Author James Cagalawan Created Date 6/27/2018 1:34:22 PM ...](https://reader034.fdocuments.us/reader034/viewer/2022050307/5f6f6db4687fa65a87173e0e/html5/thumbnails/45.jpg)
RECAP
• Motivation• Navigating the Grid World, Assign Numerical Rewards, Extract a Policy• Applications: Treasure Hunting, Robot Soccer, Pizza Delivery, Self-Driving Cars, Search and Rescue,
Domestic Robots• Problem
• Domestic Robot• Multi Target: Can’t We Just Use Q-Learning?• From Navigation to Visual Navigation• Visual Navigation Decomposition: Image, Perception, World Modelling, Planning, Action, Results• Goal of Visual Navigation• Robot Learning Considerations
• Neural Network Architecture• Embedding of Scene and Target using ResNet• Siamese Network• Scene-Specific Layers
• AI2-THOR Simulator• 32 household scenes with interactions, Trained on discretized representation with no interaction required
• Experiments• Shorter Average Trajectory Length, More Data Efficient, Training on More Targets Improves Success,
Training on More Scenes Reduces Trajectory Length, Works in Continuous World, Transfer Learning Works• Future Work
• More Sophisticated Interaction