Control of a Quadrotor with Reinforcement Learning

Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter

Robotic Systems Lab, ETH Zurich

Presented by Nicole McNabb

University of Waterloo

June 27, 2018

Overview

1 Introduction

2 The Method

3 Empirical Results

4 Summary and Future Work

Introduction

What is a quadrotor?

An aerial vehicle lifted and steered by four independently driven rotors.

Figure: Quadrotor [1]

High-level goal:

Train the quadrotor to perform tasks with varying initializations

A policy optimization problem.
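Stated slightly more formally (the notation here is mine, not from the slides): with a deterministic policy pi_theta, the problem is to pick parameters theta that maximize the expected discounted return over the distribution rho_0 of initial states, which is where the varying initializations enter:

```latex
\theta^{\star} = \arg\max_{\theta} \; J(\theta), \qquad
J(\theta) = \mathbb{E}_{s_0 \sim \rho_0}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r\big(s_t, \pi_\theta(s_t)\big)\right]
```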

Related Approaches

Deep Deterministic Policy Gradient (DDPG):

Actor-critic architecture

Off-policy, model-free

Deterministic

Insufficient exploration

Very slow (if any) convergence

Trust Region Policy Optimization (TRPO):

Actor-critic architecture

On-policy, model-free

Stochastic

Computationally intensive

Slow, unreliable convergence

A New Approach

Goal: A deterministic policy with

Fast and stable convergence

Model-free training

Extensive exploration

Solution: A method combining the actor-critic architecture with an on-policy deterministic policy gradient algorithm and a new exploration strategy.

The Method

Setup

Continuous State-Action Space

State Space

18-D state vector, consisting of:

Orientation (rotation matrix, 9 values)

Position

Linear velocity of system

Angular velocity of system

Action Space

4-D actions, specifying the thrust of each rotor (see the sketch below)
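A minimal sketch of how such a state vector could be assembled; the 9 + 3 + 3 + 3 layout matches the 18-D count above, but the ordering and the helper names are my assumptions rather than the paper's:

```python
import numpy as np

def make_state(R, p, v, omega):
    """Assemble the 18-D quadrotor state (layout assumed: 9 rotation-matrix
    entries, 3 position, 3 linear velocity, 3 angular velocity)."""
    return np.concatenate([np.asarray(R).reshape(9), p, v, omega])

# Example: hover state at the origin, identity orientation, at rest.
s = make_state(np.eye(3), np.zeros(3), np.zeros(3), np.zeros(3))
assert s.shape == (18,)

# Actions are 4-D: one thrust command per rotor.
a = np.array([2.5, 2.5, 2.5, 2.5])  # hypothetical per-rotor thrusts
```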

Exploration

Figure: Exploration Strategy [2]
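The figure shows the paper's branching scheme: short noisy "branch" rollouts started from junction states along a noise-free on-policy trajectory. A hedged sketch of that idea; the env.step_from API, the Gaussian noise model, and all parameter names are assumptions for illustration:

```python
import numpy as np

def rollout(env, policy, s0, horizon, noise_std=0.0):
    """Roll out the deterministic policy, optionally perturbing actions
    with Gaussian noise."""
    states, actions, rewards = [s0], [], []
    s = s0
    for _ in range(horizon):
        a = policy(s)
        if noise_std > 0:
            a = a + np.random.normal(0.0, noise_std, size=a.shape)
        s, r = env.step_from(s, a)  # hypothetical deterministic simulator API
        actions.append(a); rewards.append(r); states.append(s)
    return states, actions, rewards

def explore(env, policy, s0, horizon, n_branches, branch_len, noise_std):
    """Noise-free main trajectory plus noisy branches from junction states."""
    main = rollout(env, policy, s0, horizon)
    junctions = np.random.choice(horizon, size=n_branches, replace=False)
    branches = [rollout(env, policy, main[0][j], branch_len, noise_std)
                for j in junctions]
    return main, branches
```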

Network Training

Figure: Value Network [2]

Value function training: Approximate with Monte-Carlo samples obtained from the current trajectory

Figure: Policy Network [2]

Policy optimization: Same idea as TRPO, but replacing the KL-divergence with a Mahalanobis metric
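A minimal sketch of the Monte-Carlo value targets described above, computed backwards over one trajectory; the discount factor and the zero terminal bootstrap are assumptions:

```python
def mc_value_targets(rewards, gamma=0.99, terminal_value=0.0):
    """Discounted Monte-Carlo return target for each visited state,
    accumulated backwards along a single trajectory."""
    targets, ret = [], terminal_value
    for r in reversed(rewards):
        ret = r + gamma * ret
        targets.append(ret)
    return targets[::-1]

# The value network is then regressed onto these targets, e.g. by
# minimizing the squared error between V(s_t) and targets[t].
```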

Learning Algorithm

Algorithm 1 Policy optimization

1: Input: Initial value function approximation, initial policy
2: for j = 1, 2, ... do
3:   Perform exploration, take action
4:   Compute MC estimates from current trajectory
5:   Do approximate value function update
6:   Do policy gradient update
7: end for
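Putting the steps together, a hedged skeleton of the loop above. It reuses the explore and mc_value_targets sketches from earlier slides; value_fn.fit and policy.update are hypothetical stand-ins, and the paper's step 6 is a Mahalanobis-constrained trust-region update rather than the plain call indicated here:

```python
def train(env, policy, value_fn, sample_s0, iters, horizon=500):
    """Skeleton of Algorithm 1: explore, build MC targets, update both networks."""
    for j in range(iters):
        # 3: noise-free on-policy rollout plus noisy branch trajectories
        main, branches = explore(env, policy, sample_s0(), horizon,
                                 n_branches=32, branch_len=50, noise_std=0.1)
        states, actions, rewards = main
        # 4: Monte-Carlo value estimates along the current trajectory
        targets = mc_value_targets(rewards)
        # 5: approximate value function update (regression onto MC targets);
        #    value_fn.fit is a hypothetical stand-in
        value_fn.fit(states[:-1], targets)
        # 6: policy gradient update; the paper constrains this step with a
        #    Mahalanobis metric (TRPO-style trust region), elided here
        policy.update(main, branches, value_fn)
    return policy, value_fn
```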

Empirical Results

Training done in simulation

Testing on two main tasks was done on a real quadrotor

Summary and Future Work

Summary

Primary contributions:

A new deterministic, model-free method for training a neural network policy to control a quadrotor

Stable and reliable performance on hard tasks, even under harsh initial conditions

Future Research

Also compare the method against PPO

Introduce a more accurate model of the system into simulation

Train an RNN to adapt to model errors automatically

References

[1] Crazyflie 2.0 quadrotor. https://www.seeedstudio.com/Crazyflie-2.0-p-2103.html

[2] Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter. Control of a Quadrotor with Reinforcement Learning. IEEE Robotics and Automation Letters, June 2017.

Questions?
