Improved Adaptive-Reinforcement Learning Control for Morphing Unmanned Air Vehicles
James Doebbler, Monish D. Tandale, John Valasek, Texas A&M University
Andrew J. Meade, Rice University
AIAA Infotech@Aerospace Conference, 26-29 Sep 2005, Washington, DC
Doebbler, Tandale, Valasek, & Meade AIAA-2005-7159-2
Overview
• Brief Introduction to Morphing Air Vehicles
• Adaptive-Reinforcement Learning Control Architecture
• Simplified Model of a Morphing Vehicle
• The Old Way
– Reinforcement Learning Module
– Structured Adaptive Model Inversion Control
– Numerical Example
• The New Way
– Numerical Example
• Results & Conclusions
• Future Work
Student Research Team, 2005-2006
Aerospace Vehicle Systems Technology Program: Advanced Concept Evolution
[Figure: technology progress versus time, with the following concepts in order]
• SOA Metal Supercritical Wing
• AE Tailored Wet Composite Wing
• Composite Blended Wing
• Composite Wing & Fuselage VLA
• Active Aero & Structures
• Low-Boom Supersonic Transport(s)
• Intelligent Personal Air Vehicles
• Bio/Nano Self-Optimizing Aircraft
Revolutionary Vehicles: Technologies
Today’s Challenges:
• Develop light, strong, and structurally efficient air vehicles.
• Improve aerodynamic efficiency.
• Design fuel-efficient, low-emission propulsion systems.
• Develop safe, fault-tolerant vehicle systems.
Technology Solutions:
• Nanostructures: 100 times stronger than steel at 1/6 the weight
• Active flow control
• Distributed propulsion
• Electric propulsion, advanced fuel cells, high-efficiency electric motors
• Integrated advanced control systems and information technology
• Central “nervous system” and adaptive vehicle control
[Figure panels: Fuel Cell Propulsion (electric circuit, anode catalyst, cathode catalyst, polymer electrolyte membrane, fuel, air, exhaust); Active Flow Control; Adaptive Control; Nanotube]
WHEN to reconfigure
HOW to reconfigure
LEARNING to reconfigure
Big Picture Research Goals
Which Morphing?
Morphing for Mission Adaptation
– Large-scale, relatively slow, in-flight shape change to enable a single vehicle to perform multiple diverse mission profiles
as opposed to:
Morphing for Control
– In-flight physical or virtual shape change to achieve multiple control objectives (maneuvering, flutter suppression, load alleviation, active separation control)
[Figure panels: Mission Adaptation; Control]
Source: John Davidson, NASA Langley, AFRL Morphing Controls Workshop – Feb 2004
Our Approach
Make Full Use Of The Physical Knowledge Of The Problem
– Adaptive Control: adapting parameters in a known functional relationship
– Machine Learning: learning the reconfiguration policy
Learn When to Reconfigure With Machine Learning
Synthetic Jets for Virtual Shaping and Separation Control
MultiSensor MEMS Arrays for Flow Control Feedback
Adaptive Controller
Sensed Information Aggregation
Control Information Distribution
Reconfiguration Command Generation
System Performance Evaluation
Knowledge Base
Environment
Control Architecture
Adaptive-Reinforcement Learning Control (A-RLC)
Conceptual Control Architecture for Reconfigurable Aircraft
SAMI: Structured Adaptive Model Inversion (Traditional Control)
– Flight controller to handle wide variation in dynamic properties due to shape change
RL: Reinforcement Learning (Intelligent Control)
– Learn the morphing dynamics and the optimal shape at every flight condition in real-time
Morphing Vehicle Model
Lockheed Martin
NextGen
Morphing Vehicle Evolution
• 2003: 2-D Plate
• 2004: Rectangular Block
• 2005: Ellipsoid
• 2006: Delta Wing
• Final Objective
Morphing Vehicle - TiiMY
Shape
– Ellipsoidal shape with varying axis dimensions
– Constant volume (V) during morphing
– 2 independent variables: y and z; dependent dimension: $x = \dfrac{6V}{\pi y z}$
Morphing Dynamics
– Smart material: carbon nanotubes or shape memory alloy
– Morphing dynamics: simple nonlinear differential equations
y-dimension 2z-dimension 3
y
z
y yy Vz zz V+ =+ =
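As a small illustration of the constant-volume constraint on this slide, the sketch below computes the dependent dimension x from the two independent dimensions y and z. It assumes that x, y, and z are the full axis lengths of the ellipsoid, so that V = πxyz/6; the function name and the sample numbers are hypothetical and not taken from the presentation.

```python
import math

def dependent_dimension(volume, y, z):
    """Return x such that an ellipsoid with axis lengths (x, y, z) has the given volume.

    Assumes V = pi * x * y * z / 6, i.e. x, y, z are full axis lengths.
    """
    if y <= 0 or z <= 0:
        raise ValueError("y and z must be positive")
    return 6.0 * volume / (math.pi * y * z)

# Illustrative values only: the volume stays fixed while y and z morph.
V = 10.0
x_before = dependent_dimension(V, y=2.0, z=2.0)
x_after = dependent_dimension(V, y=4.0, z=3.0)
print(x_before, x_after)
```

Growing y and z at constant volume shortens x, which is the coupling the RL module has to live with when it commands shape changes.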
Shape Morphing Animation: TiiMY
Morphing Time Histories
Optimal Shapes at Various Flight Conditions
Optimality is defined by identifying a cost function: J = J(current shape, flight condition)
$$J = J_y + J_z = (y - S_y(F))^2 + (z - S_z(F))^2$$
$$S_y(F) = 3 + \cos\left(\tfrac{\pi}{2}F\right) \quad\text{and}\quad S_z(F) = 2 + 2e^{-0.5F}$$
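The cost function can be evaluated directly once the optimal-shape relations are fixed. The sketch below assumes the relations read S_y(F) = 3 + cos(πF/2) and S_z(F) = 2 + 2e^(-0.5F), which is a reading of the garbled slide text that keeps both optimal dimensions inside the [2, 4] range used later in the numerical example; the function names are hypothetical.

```python
import math

def optimal_shape(F):
    """Optimal (y, z) for flight condition F, under the assumed relations."""
    s_y = 3.0 + math.cos(math.pi * F / 2.0)
    s_z = 2.0 + 2.0 * math.exp(-0.5 * F)
    return s_y, s_z

def cost(y, z, F):
    """J = J_y + J_z: squared distance of the current shape from the optimal shape."""
    s_y, s_z = optimal_shape(F)
    j_y = (y - s_y) ** 2
    j_z = (z - s_z) ** 2
    return j_y + j_z

for F in (0.0, 1.0, 2.0):
    s_y, s_z = optimal_shape(F)
    print(F, round(s_y, 3), round(s_z, 3), round(cost(3.0, 3.0, F), 3))
```

The cost is zero only when the vehicle sits exactly at the optimal shape for the current flight condition, which is what the learned policy is supposed to drive toward.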
6-DOF Mathematical Model for Dynamic Behavior
Variables
$$p_c = [d_x\ d_y\ d_z]^T,\quad v_c = [u\ v\ w]^T$$
$$\sigma = [\phi\ \theta\ \psi]^T,\quad \omega = [p\ q\ r]^T$$
Nonlinear 6-DOF Equations
– Kinematic level: $\dot{p}_c = J_l v_c$; $\dot{\sigma} = J_a \omega$
– Acceleration level:
$$m\dot{v}_c + \tilde{\omega}\, m v_c = F + F_d$$
$$I\dot{\omega} + \tilde{\omega} I \omega + \dot{I}\omega = M + M_d$$
where $\tilde{\omega}$ is the cross-product matrix of $\omega$ and $\dot{I}\omega$ is the additional dynamics due to morphing
Drag Force
– Function of air density, square of velocity along axis, and projected area of the vehicle perpendicular to the axis
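The drag description on this slide can be sketched as follows. This is only an illustration under stated assumptions: the standard ½·ρ·C_d·A·v² form is used, the drag coefficient `c_d` is a hypothetical placeholder (the slides give no value), and the projected area is taken to be the ellipse spanned by the two ellipsoid dimensions perpendicular to the axis, treated as full axis lengths.

```python
import math

def projected_area(dim_a, dim_b):
    """Area of the ellipse seen along one body axis.

    dim_a and dim_b are the full axis lengths perpendicular to that axis.
    """
    return math.pi * dim_a * dim_b / 4.0

def drag_along_axis(rho, v_axis, area, c_d=1.0):
    """Drag component along one axis.

    Opposes the motion and scales with air density, velocity squared, and
    projected area. c_d is a hypothetical drag coefficient.
    """
    return -0.5 * rho * c_d * area * v_axis * abs(v_axis)

rho = 1.225                    # sea-level air density, kg/m^3
x, y, z = 3.0, 2.0, 4.0        # illustrative ellipsoid dimensions
v_c = (10.0, -2.0, 0.0)        # illustrative body-axis velocities (u, v, w)
F_d = (
    drag_along_axis(rho, v_c[0], projected_area(y, z)),
    drag_along_axis(rho, v_c[1], projected_area(x, z)),
    drag_along_axis(rho, v_c[2], projected_area(x, y)),
)
print(F_d)
```

Because the projected areas depend on the current shape, every morphing action changes the drag force, which is one way the shape change feeds back into the flight dynamics.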
Reinforcement Learning
Reinforcement Learning: self training
1. Actor takes action based upon states and preference function
2. Critic updates state value function, and evaluates action
3. Actor updates preference function
Learning is done repetitively, by subjecting the agent to different scenarios
Reinforcement Learning: Actor-Critic Method
– On-policy TD method
– The actor selects actions
Preference of an action: $p(s,a)$
Greedy policy: $\pi_t(s,a) = \arg\max_a p(s,a)$
Gibbs softmax policy: $\pi_t(s,a) = \Pr\{a_t = a \mid s_t = s\} = \dfrac{e^{p(s,a)}}{\sum_b e^{p(s,b)}}$
– The critic criticizes the actions
State value function: $V(s_t)$
TD error: $\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$
– Strengthen or weaken the tendency to select an action: $p(s_t,a_t) \leftarrow p(s_t,a_t) + \beta\delta_t$
[Diagram: the Actor (policy) sends actions (Vy, Vz) to the Environment; the Environment returns states (F, y, z) and cost (Jy, Jz); the Critic (value function) sends the TD error to the Actor]
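A minimal sketch of the actor-critic loop described on this slide, on a deliberately tiny task. The task (one state, one step per episode, fixed reward per action), the critic step size `alpha`, and all numeric values are assumptions made for illustration; only the softmax policy, the TD error, and the preference update follow the formulas on the slide.

```python
import math
import random

def softmax_policy(prefs):
    """Gibbs softmax over the action preferences p(s, a) of one state."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]   # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def run_actor_critic(rewards, episodes=500, alpha=0.1, beta=0.5, gamma=0.9, seed=0):
    """Hypothetical one-state task: action a earns rewards[a], then the episode ends."""
    rng = random.Random(seed)
    prefs = [0.0] * len(rewards)   # actor: p(s, a)
    value = 0.0                    # critic: V(s)
    terminal_value = 0.0           # value of the terminal state is zero
    for _ in range(episodes):
        probs = softmax_policy(prefs)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        delta = reward + gamma * terminal_value - value   # TD error
        value += alpha * delta                            # critic update
        prefs[action] += beta * delta                     # actor update
    return prefs, value

prefs, value = run_actor_critic(rewards=[-1.0, 0.0, -1.0])
policy = softmax_policy(prefs)
print([round(p, 3) for p in policy])
```

Actions that do better than the critic expects get a positive TD error and a higher preference; actions that do worse are weakened, so the softmax policy gradually concentrates on the lowest-cost action.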
Q-Learning
• Only has to learn the action value function Q(s,a)
– How well the agent performs action a in state s under policy π
• Q-learning is an off-policy temporal difference method
• Proven convergence
Q-Learning()
    Initialize Q(s,a) arbitrarily
    Repeat (for each episode):
        Initialize s
        Repeat (for each step of episode):
            Choose a from s using policy derived from Q (e.g., ε-greedy)
            Take action a, observe r, s′
            Q(s,a) ← Q(s,a) + α[r + γ max_a′ Q(s′,a′) − Q(s,a)]
            s ← s′
        until s is terminal
    return Q(s,a)
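The Q-learning procedure on this slide can be sketched in a few lines of tabular Python. The environment here is a hypothetical five-state chain (not the morphing vehicle): the agent starts at state 0, moves left or right, and receives a reward of 1 on reaching the last state. Step sizes, episode count, and random tie-breaking among equal Q-values are illustrative choices.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = (0, 1)                 # 0 = step left, 1 = step right
    goal = n_states - 1              # terminal state
    Q = [[0.0, 0.0] for _ in range(n_states)]   # initialize Q(s, a)

    def step(s, a):
        s_next = max(0, s - 1) if a == 0 else s + 1
        reward = 1.0 if s_next == goal else 0.0
        return reward, s_next

    def choose(s):
        # epsilon-greedy policy derived from Q, ties broken at random
        if rng.random() < epsilon:
            return rng.choice(actions)
        best = max(Q[s])
        return rng.choice([a for a in actions if Q[s][a] == best])

    for _ in range(episodes):
        s = 0
        while s != goal:
            a = choose(s)
            r, s_next = step(s, a)
            # off-policy update: bootstrap from the best action in s_next
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q

Q = q_learning_chain()
greedy_policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(4)]
print(greedy_policy)
```

The update always uses the maximum over next actions, regardless of which action the ε-greedy behavior policy actually takes next, which is what makes the method off-policy.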
Function Approximation: K Nearest Neighbor Policy Iteration (KNNPI)
• The shape of the vehicle is on continuous domains
• Use the K-nearest neighbors method to approximate the action-value function Q(s,a)
– Collect a set of state-action pair samples
– Compute optimal action-values of these samples using Q-learning
– The action-value of a new state-action pair is the interpolation of those of its K nearest neighbors
– Takes a weighted average of the K nearest neighbors in the sampled state space
• Susceptible to absence of accurate information near the desired point
Procedure:
1. Collect samples $\{(s_i, a_i)\}$
2. Learn $Q_{sample}(s_i, a_i)$
3. For $(s_0, a_0)$, find $\{(s_i, a_i)\}$, the set of its K nearest neighbors
4. $Q(s_0, a_0) = \sum_{i=1}^{K} \mathrm{Distance}(s_0, a_0, s_i, a_i)\, Q_{sample}(s_i, a_i)$
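A hedged sketch of the K-nearest-neighbor interpolation step. The slides do not spell out the weighting function, so normalized inverse-distance weights are assumed here; the sample points and Q-values are made up for illustration, and `knn_q_estimate` is a hypothetical name.

```python
import math

def knn_q_estimate(query, samples, k=3, eps=1e-9):
    """Estimate Q at `query` = (s, a) from stored (point, q_value) samples.

    Uses normalized inverse-distance weights over the k nearest samples;
    this weighting is an assumption, not taken from the slides.
    """
    ranked = sorted(samples, key=lambda item: math.dist(query, item[0]))
    nearest = ranked[:k]
    weights = [1.0 / (math.dist(query, point) + eps) for point, _ in nearest]
    total = sum(weights)
    return sum(w * q for w, (_, q) in zip(weights, nearest)) / total

# Illustrative (state, action) sample points with learned action-values.
samples = [
    ((2.0, 0.0), -4.0),
    ((3.0, 0.5), -1.0),
    ((4.0, 1.0), 0.0),
    ((3.0, 1.5), -2.0),
]
q_hat = knn_q_estimate((3.1, 0.6), samples, k=3)
print(round(q_hat, 3))
```

The estimate can only be as good as the nearby samples: if none of the K neighbors carries accurate information near the query point, the interpolated value inherits that error, which is the weakness noted on the slide.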
Structured Adaptive Model Inversion Control
Adaptive Control: model reference adaptive control
[Block diagram: the command r drives both the Controller and the Reference model; the Controller output u drives the Plant, which produces y; the error e between y and the reference model output $y_m$ feeds the Adaptation Law, which passes estimated parameters to the Controller]
Structured Model Reference Adaptive Control
Akella, Schaub, Junkins (Texas A&M)
Dynamics: 2nd order differential equations, $F = m a$
– Exact kinematic relationship between position and velocity: $\dot{x} = v$
– Acceleration level relationships between forces and system parameters: $\dot{v} = a = \dfrac{F}{m}$
Structured Adaptive Model Inversion
Subbarao, Junkins (Texas A&M)
Features
– Dynamic inversion inner-loop, with an MRAC outer-loop to handle system uncertainties.
– Controls are solved for explicitly:
System model: $\dot{x} = A f(x) + B u$
Reference trajectory: $x_r,\ \dot{x}_r$
Control law: $u = B^{-1}(\dot{x}_r - A f(x) - \lambda e)$, with $e = x - x_r$, so that the error dynamics become $\dot{e} = -\lambda e$
– Undesirable dynamics are cancelled and replaced with user-specified desired dynamics.
– Easily applicable to nonlinear systems.
– Error dynamics can be specified.
– Shown to be very effective for a wide variety of systems.
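To see the inversion idea at work, the sketch below simulates a scalar system with perfectly known parameters and checks that the tracking error decays. The values of A and B, the nonlinearity f(x) = x³, the sinusoidal reference, and the forward-Euler integration are all assumptions chosen for illustration; the error is taken as e = x − x_r.

```python
import math

def simulate_dynamic_inversion(lam=2.0, dt=1e-3, t_end=5.0, x0=1.0):
    A, B = -1.0, 2.0              # hypothetical parameters, assumed perfectly known
    f = lambda x: x ** 3          # hypothetical nonlinearity
    x, t = x0, 0.0
    errors = []
    for _ in range(int(round(t_end / dt))):
        x_r, x_r_dot = math.sin(t), math.cos(t)   # reference trajectory
        e = x - x_r
        errors.append(e)
        u = (x_r_dot - A * f(x) - lam * e) / B    # control law u = B^{-1}(...)
        x += dt * (A * f(x) + B * u)              # forward-Euler step of the plant
        t += dt
    errors.append(x - math.sin(t))
    return errors

errors = simulate_dynamic_inversion()
print(errors[0], errors[-1])
```

Substituting the control law into the model cancels A f(x) and leaves a tracking error that decays at the rate λ chosen by the designer. That cancellation is exact only if A and B are known, which is why the structured adaptive version estimates the uncertain parameters online.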
Structured Adaptive Model Inversion Features
Trajectory Tracking for Dynamic Systems
Plant:
– Nonlinear in states, affine in control; uncertain parameters appear linearly.
Control:
– Dynamic Inversion and Sliding Mode Control.
– Dynamic Inversion requires knowledge of system parameters, which are inherently uncertain.
Adaptive Learning Parameters:
– Updated in real-time, and used for the Dynamic Inversion
Adaptation Mechanism:
– Driven by the error between the actual plant trajectory and the reference trajectory
Stability Analysis:
– Guarantees that the plant trajectory asymptotically converges to the reference trajectory in the presence of parametric uncertainties and initial condition errors.
Structured Model & Minimal Parameterization
Kinematic Level Model: $\dot{p}_c = J_l v_c$; $\dot{\sigma} = J_a \omega$
Acceleration Level Model:
$$m\dot{v}_c + \tilde{\omega}\, m v_c = F + F_d$$
$$I\dot{\omega} + \tilde{\omega} I \omega + \dot{I}\omega = M + M_d$$
Attitude Control:
$$I_a^*(\sigma)\ddot{\sigma} + C_a^*(\sigma,\dot{\sigma})\dot{\sigma} = P_a^T(\sigma) M$$
Minimal Parametrization of the Inertia Matrix:
$$\begin{bmatrix} I_{11} & I_{12} & I_{13} \\ I_{12} & I_{22} & I_{23} \\ I_{13} & I_{23} & I_{33} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} a_1 & 0 & 0 & a_2 & a_3 & 0 \\ 0 & a_2 & 0 & a_1 & 0 & a_3 \\ 0 & 0 & a_3 & 0 & a_1 & a_2 \end{bmatrix} \begin{bmatrix} I_{11} \\ I_{22} \\ I_{33} \\ I_{12} \\ I_{13} \\ I_{23} \end{bmatrix}$$
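The minimal parametrization identity is easy to check numerically: multiplying a symmetric inertia matrix by a vector gives the same result as multiplying a 3×6 regressor built from that vector by the six independent inertia entries. The matrix and vector values below are arbitrary test numbers, and the helper names are hypothetical.

```python
def inertia_times_vector(I, a):
    """Ordinary 3x3 matrix-vector product I a."""
    return [sum(I[r][c] * a[c] for c in range(3)) for r in range(3)]

def regressor(a):
    """3x6 matrix built from a, so that I a = regressor(a) @ theta."""
    a1, a2, a3 = a
    return [
        [a1, 0.0, 0.0, a2, a3, 0.0],
        [0.0, a2, 0.0, a1, 0.0, a3],
        [0.0, 0.0, a3, 0.0, a1, a2],
    ]

def inertia_parameters(I):
    """The six independent entries [I11, I22, I33, I12, I13, I23]."""
    return [I[0][0], I[1][1], I[2][2], I[0][1], I[0][2], I[1][2]]

I = [[4.0, 0.3, -0.2],
     [0.3, 5.0, 0.1],
     [-0.2, 0.1, 6.0]]
a = [1.0, -2.0, 0.5]

lhs = inertia_times_vector(I, a)
theta = inertia_parameters(I)
rhs = [sum(row[j] * theta[j] for j in range(6)) for row in regressor(a)]
print(lhs, rhs)
```

Rewriting the product this way makes the unknown inertia entries appear linearly, which is the form the adaptive update law needs.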
Control Law & Update Law
Using the Minimal Parametrization of the Inertia Matrix:
$$I_a^*(\sigma)\ddot{\sigma} + C_a^*(\sigma,\dot{\sigma})\dot{\sigma} = Y_a(\sigma,\dot{\sigma},\ddot{\sigma})\,\theta$$
where $\theta$ is the vector of unknown parameters.
With the control law
$$M = P_a^{-T}\{Y_a(\sigma,\dot{\sigma},\dot{\sigma}_r,\ddot{\sigma}_r)\hat{\theta} - C_{da}\dot{\varepsilon} - K_{da}\varepsilon\}$$
the closed loop dynamics take the form
$$I_a^*\ddot{\varepsilon} + \{C_{da} + C_a^*(\sigma,\dot{\sigma})\}\dot{\varepsilon} + K_{da}\varepsilon = Y_a\tilde{\theta},\qquad \tilde{\theta} = \hat{\theta} - \theta$$
and, together with the adaptive law
$$\dot{\hat{\theta}} = -\Gamma\, Y_a^T(\sigma,\dot{\sigma},\dot{\sigma}_r,\ddot{\sigma}_r)\,\dot{\varepsilon}$$
this guarantees asymptotic stability of the tracking errors.
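The structure of the control and update laws can be illustrated on a single-axis toy problem. Everything specific here is an assumption for illustration: the plant is reduced to I·σ̈ = M with one unknown inertia (so the regressor is just the reference acceleration), the gains and reference are arbitrary, the update is driven by the error rate, and forward Euler is used for integration.

```python
import math

def simulate_adaptive_tracking(t_end=60.0, dt=1e-3):
    inertia = 2.0                        # true parameter theta, unknown to the controller
    theta_hat = 1.0                      # initial parameter estimate
    K_d, C_d, gamma = 4.0, 4.0, 5.0      # hypothetical gains
    sigma, sigma_dot, t = 0.5, 1.0, 0.0  # initial state (initial tracking error 0.5)
    for _ in range(int(round(t_end / dt))):
        ref, ref_dot, ref_ddot = math.sin(t), math.cos(t), -math.sin(t)
        eps, eps_dot = sigma - ref, sigma_dot - ref_dot
        Y = ref_ddot                                   # regressor for I * sigma_ddot = M
        M = Y * theta_hat - C_d * eps_dot - K_d * eps  # control law
        theta_hat += dt * (-gamma * Y * eps_dot)       # adaptive law
        sigma_ddot = M / inertia                       # plant
        sigma += dt * sigma_dot
        sigma_dot += dt * sigma_ddot
        t += dt
    return sigma - math.sin(t), theta_hat

final_error, theta_hat = simulate_adaptive_tracking()
print(final_error, theta_hat)
```

In this reduced setting the closed loop becomes I·ε̈ + C_d·ε̇ + K_d·ε = Y·(θ̂ − θ), the scalar counterpart of the closed-loop form on the slide. The tracking error dies out as the estimate settles.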
Adaptive–Reinforcement Learning Control
A-RLC Architecture
Numerical Example
Objective
– Demonstrate optimal shape morphing for multiple specified flight conditions
Method
– For every flight condition, learn optimal policy that commands voltage producing the optimal shape
– Minimize total cost over the entire flight trajectory
– Evaluate the learning performance after 200 learning episodes
[Figure: flight condition (0 to 6) versus flight path distance (0 to 100)]
RL Module is Completely Ignorant of Optimality Relations and Morphing Control Functions: It Must Learn On Its Own, From Scratch
Numerical Example: reinforcement learning definitions
Agent: Morphing Air Vehicle Reinforcement Learning Module
Environment: Various flight conditions
Goal: Fly in optimal shape that minimizes cost
States: Flight condition; shape of vehicle
– State set: $S = \{ (y, F) \in [2,4] \times [0,5];\ (z, F) \in [2,4] \times [0,5] \}$
Actions: Discrete voltages applied to change shape of vehicle
– Action set: $A = \{ (V_y, V_z) : V_y \in \{0, 0.5, 1, \ldots, 5\};\ V_z \in \{0, 0.5, 1, \ldots, 4\} \}$
Rewards: Determined by cost functions
Optimal control policy: Mapping of the state to the voltage leading to the optimal shape
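The state and action sets defined on this slide can be written down concretely. The sketch below assumes the voltage grids run from 0 in steps of 0.5 up to 5 (for V_y) and 4 (for V_z), and that the state bounds are y, z in [2, 4] and F in [0, 5]; the helper names are hypothetical.

```python
def grid(start, stop, step):
    """Evenly spaced values from start to stop inclusive."""
    count = int(round((stop - start) / step))
    return [start + i * step for i in range(count + 1)]

V_Y_ACTIONS = grid(0.0, 5.0, 0.5)   # discrete voltages for the y-dimension
V_Z_ACTIONS = grid(0.0, 4.0, 0.5)   # discrete voltages for the z-dimension

Y_RANGE = Z_RANGE = (2.0, 4.0)      # admissible shape dimensions
F_RANGE = (0.0, 5.0)                # admissible flight conditions

def in_state_space(y, z, F):
    """True if (y, F) and (z, F) both lie inside the state set S."""
    return (Y_RANGE[0] <= y <= Y_RANGE[1]
            and Z_RANGE[0] <= z <= Z_RANGE[1]
            and F_RANGE[0] <= F <= F_RANGE[1])

print(len(V_Y_ACTIONS), len(V_Z_ACTIONS), in_state_space(3.0, 2.5, 1.0))
```

The actions are discrete while the states are continuous, which is why a function approximator is needed to represent Q(s,a) between sampled states.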
Learning Process
[Figure: error in the action preference function (Error Vy) plotted over flight condition F and y dimension, after 20, 60, 100, and 200 episodes]
Example: Old Way
Comparison of True Optimal Shape and Learned Shape
KNN learns poorly for several flight conditions
Time Histories of Angular States
Time Histories of Adaptive Parameters
Trajectory Tracking Controls
Morphing Control Voltages
What Happened? (1)
F Still Equals d(mv)/dt
– Stiffness, frequency, damping ratio, and cost function have a major effect on learning times and learning performance.
– Slowest element in system is critical
Some Assembly Required: Tuning
– Flight condition transitions, morphing dynamics, learning performance, and adaptive control must be balanced to achieve good performance.
– Coordination and timing are everything
What Happened? (2)
Function Approximation
– Errors remained which could not be eliminated with additional training.
– Use Galerkin-based Sequential Function Approximation (SFA) to approximate the action-value function Q(s,a)
Galerkin-Based Sequential Function Approximation (SFA)
• Ideal for approximating sparse multi-dimensional scattered data
• Adaptive, no ad-hoc user parameters to adjust
• Provides information on the sensitivity to the inputs
• Matrix construction and evaluations are avoided
• Computational cost is reduced while efficiency is enhanced by the low-dimensional unconstrained optimization
Example: New Way
SFA learns optimal shape well
Results: Normalized RMS error
| Method | Y dimension | Z dimension |
|---|---|---|
| KNN | 1.42 | 0.821 |
| SFA | 1.27 | 0.661 |
| Reduction | 10% | 20% |
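The quoted reductions follow directly from the normalized RMS errors. The short check below recomputes them; the dictionary layout and function name are just a convenient illustration.

```python
def percent_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline

rms = {
    "KNN": {"y": 1.42, "z": 0.821},
    "SFA": {"y": 1.27, "z": 0.661},
}
reduction_y = percent_reduction(rms["KNN"]["y"], rms["SFA"]["y"])
reduction_z = percent_reduction(rms["KNN"]["z"], rms["SFA"]["z"])
print(round(reduction_y, 2), round(reduction_z, 2))
```

The computed values are about 10.6% for the Y dimension and about 19.5% for the Z dimension, which the slide reports as roughly 10% and 20%.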
Conclusions
Improved function approximation provided large improvement in performance
– Galerkin-based Sequential Function Approximation method learns optimal shape well
Shape changes for Mission Morphing can be treated as piecewise constant parameter changes
– SAMI is a favorable method for trajectory tracking control
Morphing for Control will require a different control strategy
– Piecewise constant approximation no longer valid
Future Research 1
Modify the morphing dynamics to represent SMA actuators
– Hysteretic behavior
Investigate control methodologies to handle faster shape changes
– Linear Parameter Varying (LPV) control
Investigate other function approximation methods
– Radial Basis Functions (GLO-MAP)
Modify the simulation to include a more advanced aircraft model
– Wing-Body, Wing-Body-Empennage, etc.
Future Research 2: Morphing Space Based Antenna
STATE OF THE ART: Multiple antennas must be mounted on spacecraft to accommodate various ground station signals
RESEARCH GOALS: Demonstrate feasibility of a reconfigurable antenna design that utilizes Reinforcement Learning to independently achieve optimal shape through use of SMA actuators
BENEFITS: A single antenna capable of altering its geometry to achieve world-wide compatibility between receivers and transmitters
Future Research 3
Directly learn voltage inputs required to achieve certain position states with time dependency.
– Skips mathematical modeling and computer simulation steps
– Avoids modeling and simulation errors
Questions?
Reinforcement Learning: function approximation
Parameterized functional form
$$V_t^{\pi}(s) = \sum_{j=1}^{N} \theta_{v,t,j}\,\Phi_j(s), \qquad p_{a,t}(s) = \sum_{j=1}^{N} \theta_{p_a,t,j}\,\Phi_j(s)$$
Gradient descent learning, with $H_t$ denoting the vector of basis functions $\Phi_j(s_t)$:
$$\theta_{v,t} = \theta_{v,t-1} + \alpha\,(\tilde{y}_t - \theta_{v,t-1}H_t)\,H_t^T = \theta_{v,t-1} + \alpha\,\delta_t H_t^T$$
$$\theta_{p_a,t} = \theta_{p_a,t-1} + \alpha\,\big((p_{a,t-1}(s_t) + \beta\delta_t) - \theta_{p_a,t-1}H_t\big)\,H_t^T = \theta_{p_a,t-1} + \alpha\beta\,\delta_t H_t^T$$
The temporal difference (TD) error $\delta_t$ drives all learning in both actor and critic
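A minimal sketch of one gradient-descent step with a linear-in-parameters approximator, in the spirit of this backup slide. The Gaussian basis functions, their centers, and the step sizes are assumptions (the slides do not specify the basis); the critic parameters move by α·δ·H and the actor parameters for the taken action move by α·β·δ·H.

```python
import math

def basis(s, centers, width=1.0):
    """Hypothetical Gaussian basis functions Phi_j(s); the basis is not given in the slides."""
    return [math.exp(-((s - c) / width) ** 2) for c in centers]

def predict(theta, H):
    """Linear combination of basis functions: sum_j theta_j * Phi_j(s)."""
    return sum(th * h for th, h in zip(theta, H))

def td_step(theta_v, theta_p, s, r, s_next, centers, alpha=0.1, beta=0.5, gamma=0.9):
    """One actor-critic update driven by the TD error."""
    H, H_next = basis(s, centers), basis(s_next, centers)
    delta = r + gamma * predict(theta_v, H_next) - predict(theta_v, H)      # TD error
    theta_v = [th + alpha * delta * h for th, h in zip(theta_v, H)]          # critic
    theta_p = [th + alpha * beta * delta * h for th, h in zip(theta_p, H)]   # actor
    return theta_v, theta_p, delta

centers = [0.0, 1.0, 2.0]
theta_v, theta_p = [0.0] * 3, [0.0] * 3
new_v, new_p, delta = td_step(theta_v, theta_p, s=1.0, r=-1.0, s_next=1.5, centers=centers)
print(delta, new_v, new_p)
```

Both parameter vectors move along the same feature direction and differ only by the factor β, so a single scalar error signal shapes the value estimate and the action preference at once.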