UAIG: Second Fall 2013 Meeting

Transcript of the UAIG: Second Fall 2013 Meeting slides.


Agenda

Introductory Icebreaker

How to get Involved with UAIG?

Discussion: Reinforcement Learning

Free Discussion


Introductory Icebreaker

• Say your name and answer at least one of these questions:
• If you were to change your name, what would you change it to? Why?
• Are you spring, summer, fall, or winter? Please share why.
• What's your favorite material object that you already own?
• What item that you don't already own would you most like to have?
• If you were to create a slogan for your life, what would it be?


How to get Involved with UAIG?

• Come to our biweekly meetings.

• Take charge of one of our meetings by presenting your own research, an interesting paper you've read, or anything else you think is relevant (talk to us if you have ideas!).

• Organize an AI coding challenge or event.

• If you do item 2 or 3, we will appoint you as a "Project Manager" and you will join the ranks of UAIG execs! ^_^


Discussion: RL

Reading for today’s meeting: “Reinforcement Learning: A Tutorial” by Harmon and Harmon. http://www.cs.toronto.edu/~zemel/documents/411/rltutorial.pdf

"The purpose of this tutorial is to provide an introduction to reinforcement learning (RL) at a level easily understood by students and researchers in a wide range of disciplines."


Definitions in the reading

Reinforcement learning is not a type of neural network, nor is it an alternative to neural networks. Rather, it is an orthogonal approach that addresses a different, more difficult question. Reinforcement learning combines the fields of dynamic programming and supervised learning to yield powerful machine-learning systems.


Definitions in the reading

• Dynamic programming is a field of mathematics that has traditionally been used to solve problems of optimization and control.

• Supervised learning is a general method for training a parameterized function approximator, such as a neural network, to represent functions.


Definitions in the reading

• V*(x_t) is the optimal value function.
• x_t is the state vector.
• V(x_t) is the approximation of the value function.
• γ is a discount factor in the range [0, 1] that causes immediate reinforcement to have more importance (weighted more heavily) than future reinforcement.
• e(x_t) is the error in the approximation of the value of the state occupied at time t.
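To make these symbols concrete, here is a minimal Python sketch (not from the reading; the reward numbers are made up) of the discounted sum of future reinforcement that the optimal value function V*(x_t) measures:

```python
# Hypothetical sketch: the discounted return r_t + γ·r_{t+1} + γ²·r_{t+2} + ...
# that V*(x_t) measures. The reward sequence is made up for illustration.

def discounted_return(rewards, gamma):
    """Accumulate rewards back-to-front so each step is discounted by gamma."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [1.0, 0.0, 2.0]                # reinforcement at t, t+1, t+2
print(discounted_return(rewards, 0.9))   # 1 + 0.9*0 + 0.81*2 = 2.62
```

With γ near 0 only immediate reinforcement matters; with γ near 1 future reinforcement counts almost as much as immediate reinforcement.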


Definitions in the reading

• T is the terminal state. The true value of this state is known a priori. In other words, the error in the approximation of the terminal state, e(T), is 0 by definition.

• u is the action performed in state x_t; it causes a transition to state x_{t+1}.
• r(x_t, u) is the reinforcement received when performing action u in state x_t.
• Δw_t is the change in the weights at time t.
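A hedged sketch of how these pieces fit together, using a tabular analogue of the weight update Δw_t (the states, rewards, and constants below are invented; the reading's actual update adjusts network weights):

```python
# Tabular analogue of the update: V(x_t) is nudged toward the target
# r(x_t, u) + γ·V(x_{t+1}) at learning rate α. The terminal state T is
# never updated, so its error e(T) stays 0 by definition.

def td0_update(V, x_t, x_next, r, alpha, gamma):
    target = r + gamma * V[x_next]
    V[x_t] += alpha * (target - V[x_t])

V = {"A": 0.0, "B": 0.0, "T": 0.0}       # made-up three-state problem
td0_update(V, "A", "B", r=1.0, alpha=0.5, gamma=0.9)
print(V["A"])   # 0.0 + 0.5*(1.0 + 0.9*0.0 - 0.0) = 0.5
```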


Definitions in the reading

• One might use a neural network for the approximation V(x_t, w_t) of V*(x), where w_t is the parameter vector.

• A deterministic Markov decision process is one in which the state transitions are deterministic (an action performed in state x_t always transitions to the same successor state x_{t+1}).

• Alternatively, in a nondeterministic Markov decision process, a probability distribution function defines a set of potential successor states for a given action in a given state.

• α is the learning rate.
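The deterministic/nondeterministic distinction can be sketched with made-up transition tables (every state, action, and probability here is invented for illustration):

```python
import random

# Deterministic MDP: (state, action) maps to exactly one successor state.
det_transitions = {("x", "u"): "y"}

# Nondeterministic MDP: (state, action) maps to a distribution over successors.
nondet_transitions = {("x", "u"): [("y", 0.8), ("z", 0.2)]}

def step_deterministic(state, action):
    return det_transitions[(state, action)]

def step_nondeterministic(state, action, rng=random):
    successors, probs = zip(*nondet_transitions[(state, action)])
    return rng.choices(successors, weights=probs, k=1)[0]

print(step_deterministic("x", "u"))      # always "y"
print(step_nondeterministic("x", "u"))   # "y" about 80% of the time, else "z"
```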


Definitions in the reading

• For the state/action pair (x_t, u_t), an advantage A(x_t, u_t) is defined as the sum of the value of the state and the utility (advantage) of performing action u rather than the action currently considered best.

• For optimal actions this utility is zero, meaning the value of the action is also the value of the state.

• For sub-optimal actions the utility is negative, representing the degree of sub-optimality relative to the optimal action.
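Reading that definition literally, A(x, u) = V(x) + utility(x, u), where the best action's utility is 0; a toy numeric sketch (all values invented):

```python
# Sketch of the advantage definition above: A(x, u) = V(x) + utility(x, u).
# The optimal action's utility is 0, so its advantage equals the state value;
# sub-optimal actions get negative utility. Numbers are made up.
V_x = 3.0
utility = {"left": -2.0, "right": 0.0}   # "right" is the optimal action here
A = {u: V_x + d for u, d in utility.items()}
print(A)   # {'left': 1.0, 'right': 3.0}; A(x, u*) == V(x)
```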


Definitions in the reading

• K is a time unit scaling factor.

• ⟨·⟩ represents the expected value over all possible results of performing action u in state x_t to receive immediate reinforcement r and to transition to a new state x_{t+1}.

• g is the sum of past gradients in equation (20).
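The expectation ⟨·⟩ can be illustrated by averaging over a made-up outcome distribution for a single action (probabilities, rewards, and values below are invented):

```python
# Hypothetical outcomes of performing action u in state x_t, written as
# (probability, immediate reinforcement r, value of the successor state).
gamma = 0.9
outcomes = [(0.8, 1.0, 2.0), (0.2, 0.0, 5.0)]

# ⟨ r + γ·V(x_{t+1}) ⟩ over all possible results of the action.
expected = sum(p * (r + gamma * v) for p, r, v in outcomes)
print(expected)   # 0.8*(1 + 1.8) + 0.2*(0 + 4.5) ≈ 3.14
```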


More terms in the reading

Google these if you don’t understand them:

• Markov chain
• Markov decision process
• Mean squared error
• Monte Carlo rollout
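Of those terms, a Monte Carlo rollout is the easiest to show in code: simulate whole episodes from a state and average the discounted returns (the two-state chain below is invented for illustration):

```python
import random

# Made-up chain: from "A", with probability 0.5 we end immediately with
# reward 1; otherwise we pass through "B" (reward 0) and end with reward 2.
def rollout(gamma=0.9, rng=random):
    state, g, discount = "A", 0.0, 1.0
    while state != "T":          # "T" is the terminal state
        if state == "A":
            state, r = ("T", 1.0) if rng.random() < 0.5 else ("B", 0.0)
        else:                    # state == "B"
            state, r = "T", 2.0
        g += discount * r
        discount *= gamma
    return g

random.seed(0)
estimate = sum(rollout() for _ in range(10000)) / 10000
print(estimate)   # ≈ 0.5*1 + 0.5*(0 + 0.9*2) = 1.4
```

Averaging many rollouts approximates the value of state "A"; this is the same quantity the function approximator V(x_t) is trained to predict.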


Free Discussion

^_^