Verve: A General Purpose Open Source
Reinforcement Learning Toolkit
Tyler Streeter, James Oliver, & Adrian Sannier
ASME IDETC & CIE, September 13, 2006
Motivation
• Intelligent agents are becoming increasingly important
• Most intelligent agents today are carefully designed for very specific tasks
• Ideally, we could avoid a lot of work by letting the agents train themselves
• Goal: provide a general purpose agent implementation based on reinforcement learning
• Target audience: Application developers (especially roboticists and game developers)
Reinforcement Learning
• Learning how to behave in order to maximize a numerical reward signal
• Very general: many real-world problems can be formulated as reinforcement learning problems (the standard objective is formalized below)
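
For background (this is the standard RL formulation, not anything Verve-specific), the agent seeks to maximize the expected discounted return:

G_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \qquad \gamma \in [0, 1)

where r is the reward signal and \gamma the discount factor. Verve's "reward discounting time constant" (listed under Free Parameters below) presumably determines \gamma given the agent's update rate.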
• Typical challenges:
– Temporal credit assignment
– Structural credit assignment
– Exploration vs. exploitation
– Continuous state spaces
• Solutions:
– TD learning with value function and policy represented as single-layer neural networks
– Eligibility traces for connection weights
– Softmax action selection (see the sketch below)
– Function approximation with Gaussian radial basis functions
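
To make the softmax item above concrete, here is a minimal, self-contained sketch of softmax (Boltzmann) action selection. The temperature parameter and random-number handling are illustrative assumptions, not Verve's actual implementation:

#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <vector>

// Returns an action index sampled with probability proportional to
// exp(value / temperature); higher temperature means more exploration.
unsigned int softmaxSelect(const std::vector<double>& actionValues,
                           double temperature)
{
    // Subtract the max value before exponentiating for numerical stability.
    double maxValue = *std::max_element(actionValues.begin(), actionValues.end());
    std::vector<double> weights(actionValues.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < actionValues.size(); ++i)
    {
        weights[i] = std::exp((actionValues[i] - maxValue) / temperature);
        sum += weights[i];
    }

    // Sample an action from the resulting probability distribution.
    double r = sum * (std::rand() / (RAND_MAX + 1.0));
    double cumulative = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i)
    {
        cumulative += weights[i];
        if (r < cumulative)
        {
            return static_cast<unsigned int>(i);
        }
    }
    return static_cast<unsigned int>(weights.size() - 1);
}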
RL Agent Implementation
• Value function: maps states to “values”
• Policy: maps states to actions
• State representation converts observations to features (allows linear function approximation methods for value function and policy)
• Temporal difference (TD) prediction errors train value function and policy (see the update sketch below)
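
As a concrete illustration of the scheme listed above (a TD error plus eligibility traces over linear function approximation), here is a minimal sketch. This is textbook TD(lambda), not Verve's actual internals, and the parameter names are illustrative:

#include <vector>

// One TD(lambda) update of a linear value function V(s) = w . phi(s).
// prevFeatures are the features phi(s_t) of the state being corrected.
void tdUpdate(std::vector<double>& weights,
              std::vector<double>& traces,
              const std::vector<double>& prevFeatures,
              double reward,     // r_{t+1}
              double prevValue,  // V(s_t)
              double value,      // V(s_{t+1})
              double gamma,      // discount factor
              double lambda,     // eligibility trace decay
              double alpha)      // learning rate
{
    // TD prediction error: how much better (or worse) the outcome was
    // than the value function predicted.
    double tdError = reward + gamma * value - prevValue;

    for (std::size_t i = 0; i < weights.size(); ++i)
    {
        // Decay old traces and mark the features that were just active.
        traces[i] = gamma * lambda * traces[i] + prevFeatures[i];
        // Credit each weight in proportion to its eligibility.
        weights[i] += alpha * tdError * traces[i];
    }
}

In the actor-critic arrangement the slides describe (a value function plus a policy), the same TD error would also scale the policy-weight updates.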
RBF State Representation
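
The original slide illustrated the RBF state representation with a figure that is not reproduced here. As a rough sketch of the idea (center placement and width are illustrative assumptions, not Verve's exact scheme), each continuous sensor value is expanded into a vector of Gaussian RBF activations:

#include <cmath>
#include <vector>

// Each feature i responds most strongly when the input x is near its
// center c_i: phi_i(x) = exp(-(x - c_i)^2 / (2 * sigma^2)).
std::vector<double> rbfFeatures(double x,
                                const std::vector<double>& centers,
                                double sigma)
{
    std::vector<double> phi(centers.size());
    for (std::size_t i = 0; i < centers.size(); ++i)
    {
        double d = x - centers[i];
        phi[i] = std::exp(-(d * d) / (2.0 * sigma * sigma));
    }
    return phi;
}

With, say, 10 evenly spaced centers per sensor, this presumably corresponds to the "continuous sensor resolution" of 10 used in the code sample below.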
Verve Software Library
• Cross-platform library written in C++ with Python bindings
• License: BSD or LGPL
• Unit-tested, heavily-commented source code
• Complete API documentation
• Widely applicable: user-defined sensors, actuators, sensor resolution, and reward function
• Optimized to reduce computational requirements (e.g., a dynamically-growing RBF array; see the sketch below)
http://verve-agents.sourceforge.net
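
As one hypothetical way a dynamically-growing RBF array can save computation (the slides do not specify Verve's actual criterion), new centers are allocated only when no existing center responds strongly to the current input, so memory and per-step cost stay proportional to the visited region of state space:

#include <algorithm>
#include <cmath>
#include <vector>

struct GrowingRBFArray
{
    std::vector<double> centers;
    double sigma;      // width shared by all centers
    double threshold;  // minimum acceptable peak activation, e.g. 0.5

    void observe(double x)
    {
        // Find the strongest response among the existing centers.
        double best = 0.0;
        for (std::size_t i = 0; i < centers.size(); ++i)
        {
            double d = x - centers[i];
            best = std::max(best, std::exp(-(d * d) / (2.0 * sigma * sigma)));
        }
        // If no center covers this input well, allocate a new one there.
        if (best < threshold)
        {
            centers.push_back(x);
        }
    }
};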
Free Parameters
• Inputs
– Number of sensors
– Choice of discrete or continuous (RBF)
– Continuous sensor resolution
– Circular continuous sensors
• Number of outputs
• Reward function
• Agent update rate (step size)
• Learning rates
• Eligibility trace decay time constant
• Reward discounting time constant
C++ Code Sample (1/3)

// Define an AgentDescriptor.
verve::AgentDescriptor agentDesc;
agentDesc.addDiscreteSensor(4); // Use 4 possible values.
agentDesc.addContinuousSensor();
agentDesc.addContinuousSensor();
agentDesc.setContinuousSensorResolution(10);
agentDesc.setNumOutputs(3); // Use 3 actions.

// Create the Agent and an Observation initialized to fit this Agent.
verve::Agent agent(agentDesc);
verve::Observation obs;
obs.init(agent);

// Set the initial state of the world.
initEnvironment();

C++ Code Sample (2/3)

// Loop forever (or until some desired learning performance is achieved).
while (1)
{
    // Set the Agent and environment update rate to 10 Hz.
    verve::real dt = 0.1;

    // Update the Observation based on the current state of the world.
    // Each sensor is accessed via an index.
    obs.setDiscreteValue(0, computeDiscreteInput());
    obs.setContinuousValue(0, computeContinuousInput0());
    obs.setContinuousValue(1, computeContinuousInput1());

    // Compute the current reward, which is application-dependent.
    verve::real reward = computeReward();

    // Update the Agent with the Observation and reward.
    unsigned int action = agent.update(reward, obs, dt);

C++ Code Sample (3/3)

    // Apply the chosen action to the environment.
    switch (action)
    {
        case 0:
            performAction0();
            break;
        case 1:
            performAction1();
            break;
        case 2:
            performAction2();
            break;
        default:
            break;
    }

    // Simulate the environment ahead by 'dt' seconds.
    updateEnvironment(dt);
}
Examples
• 2D Maze Task
• Pendulum Swing-Up Task (including the pendulum neural networks)
• Cart-Pole/Inverted Pendulum Task
Experimental Feature - Planning
• Planning: training the value function and policy from a learned model of the environment (i.e. reinforcement learning from simulated experiences)
• Reduces training time significantly (a minimal sketch of the idea follows)
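
Here is a minimal Dyna-style sketch of this idea, simplified to tabular state values only; all names are illustrative, and this is not Verve's actual planning code:

#include <cstdlib>
#include <vector>

struct Transition
{
    int state;
    int action;
    double reward;
    int nextState;
};

// A trivial learned "model": remembered transitions that can be replayed.
std::vector<Transition> model;

// After each real step, replay simulated transitions drawn from the model
// so the value function keeps training without further real experience.
void planningSteps(std::vector<double>& values, int numSteps,
                   double gamma, double alpha)
{
    if (model.empty()) return;
    for (int n = 0; n < numSteps; ++n)
    {
        // Pick a remembered transition at random (a simulated experience).
        const Transition& t = model[std::rand() % model.size()];
        double tdError = t.reward + gamma * values[t.nextState] - values[t.state];
        values[t.state] += alpha * tdError;
    }
}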
2D Maze Task with Planning
Experimental Feature - Curiosity
• Curiosity: an intrinsic drive to explore unfamiliar states
• Provide extra rewards proportional to uncertainty or “learning progress”
• Drives agents to improve mental models of the environment (used for planning); see the sketch below
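
As a minimal sketch of such a curiosity bonus, the extra reward can be made proportional to how badly the agent's learned model predicted the latest observation. The weighting and error measure here are illustrative assumptions, not Verve's exact scheme:

#include <cmath>

// Larger prediction errors mean less familiar states, so they earn a
// larger intrinsic bonus, drawing the agent toward them.
double curiosityReward(double predictedObservation,
                       double actualObservation,
                       double curiosityWeight)
{
    double error = actualObservation - predictedObservation;
    return curiosityWeight * error * error;
}

// Usage sketch: totalReward = externalReward + curiosityReward(pred, actual, 0.1);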
Multiple Rewards Task with Curiosity
Future Work
• The exhaustive RBF state representation is too slow for high-dimensional state spaces. Possible solutions: dimensionality reduction (e.g., using PCA or ICA), hierarchical state and action representations, and focused attention
• Temporal state representation (e.g., tapped delay lines)