UAIG: Second Fall 2013 Meeting

Transcript of the UAIG: Second Fall 2013 Meeting slides.


Agenda

Introductory Icebreaker

How to get Involved with UAIG?

Discussion: Reinforcement Learning

Free Discussion


Introductory Icebreaker

• Say your name and answer at least one of these questions:
• If you were to change your name, what would you change it to? Why?
• Are you spring, summer, fall, or winter? Please share why.
• What's your favorite material object that you already own?
• What item that you don't already own would you most like to have?
• If you were to create a slogan for your life, what would it be?


How to get Involved with UAIG?

• Come to our biweekly meetings.

• Take charge of one of our meetings by presenting your own research, an interesting paper you've read, or anything else you think is relevant (talk to us if you have ideas!).

• Organize an AI coding challenge or event.

• If you do item 2 or 3, we will appoint you as a "Project Manager" and you will join the ranks of UAIG execs! ^_^


Discussion: RL

Reading for today’s meeting: “Reinforcement Learning: A Tutorial” by Harmon and Harmon. http://www.cs.toronto.edu/~zemel/documents/411/rltutorial.pdf

"The purpose of this tutorial is to provide an introduction to reinforcement learning (RL) at a level easily understood by students and researchers in a wide range of disciplines."


Definitions in the reading

Reinforcement learning is not a type of neural network, nor is it an alternative to neural networks. Rather, it is an orthogonal approach that addresses a different, more difficult question. Reinforcement learning combines the fields of dynamic programming and supervised learning to yield powerful machine-learning systems.


Definitions in the reading

• Dynamic programming is a field of mathematics that has traditionally been used to solve problems of optimization and control.

• Supervised learning is a general method for training a parameterized function approximator, such as a neural network, to represent functions.


Definitions in the reading

• V*(x_t) is the optimal value function.
• x_t is the state vector.
• V(x_t) is the approximation of the value function.
• γ is a discount factor in the range [0, 1] that causes immediate reinforcement to have more importance (weighted more heavily) than future reinforcement.
• e(x_t) is the error in the approximation of the value of the state occupied at time t.
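To make these symbols concrete, here is a minimal Python sketch (not from the reading; the reward numbers are made up) of the discounted sum of future reinforcement that the optimal value function V*(x_t) measures:

```python
# Hypothetical sketch: the discounted return r_t + γ·r_{t+1} + γ²·r_{t+2} + ...
# that V*(x_t) measures. The reward sequence is made up for illustration.

def discounted_return(rewards, gamma):
    """Accumulate rewards back-to-front so each step is discounted by gamma."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [1.0, 0.0, 2.0]                # reinforcement at t, t+1, t+2
print(discounted_return(rewards, 0.9))   # 1 + 0.9*0 + 0.81*2 = 2.62
```

With γ near 0 only immediate reinforcement matters; with γ near 1 future reinforcement counts almost as much as immediate reinforcement.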


Definitions in the reading

• T is the terminal state. The true value of this state is known a priori. In other words, the error in the approximation of the terminal state, e(T), is 0 by definition.

• u is the action performed in state x_t; it causes a transition to state x_{t+1}.
• r(x_t, u) is the reinforcement received when performing action u in state x_t.
• Δw_t is the change in the weights at time t.
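A hedged sketch of how these pieces fit together, using a tabular analogue of the weight update Δw_t (the states, rewards, and constants below are invented; the reading's actual update adjusts network weights):

```python
# Tabular analogue of the update: V(x_t) is nudged toward the target
# r(x_t, u) + γ·V(x_{t+1}) at learning rate α. The terminal state T is
# never updated, so its error e(T) stays 0 by definition.

def td0_update(V, x_t, x_next, r, alpha, gamma):
    target = r + gamma * V[x_next]
    V[x_t] += alpha * (target - V[x_t])

V = {"A": 0.0, "B": 0.0, "T": 0.0}       # made-up three-state problem
td0_update(V, "A", "B", r=1.0, alpha=0.5, gamma=0.9)
print(V["A"])   # 0.0 + 0.5*(1.0 + 0.9*0.0 - 0.0) = 0.5
```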


Definitions in the reading

• One might use a neural network for the approximation V(x_t, w_t) of V*(x), where w_t is the parameter vector.

• A deterministic Markov decision process is one in which the state transitions are deterministic (an action performed in state x_t always transitions to the same successor state x_{t+1}).

• Alternatively, in a nondeterministic Markov decision process, a probability distribution function defines a set of potential successor states for a given action in a given state.

• α is the learning rate.
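The deterministic/nondeterministic distinction can be sketched with made-up transition tables (every state, action, and probability here is invented for illustration):

```python
import random

# Deterministic MDP: (state, action) maps to exactly one successor state.
det_transitions = {("x", "u"): "y"}

# Nondeterministic MDP: (state, action) maps to a distribution over successors.
nondet_transitions = {("x", "u"): [("y", 0.8), ("z", 0.2)]}

def step_deterministic(state, action):
    return det_transitions[(state, action)]

def step_nondeterministic(state, action, rng=random):
    successors, probs = zip(*nondet_transitions[(state, action)])
    return rng.choices(successors, weights=probs, k=1)[0]

print(step_deterministic("x", "u"))      # always "y"
print(step_nondeterministic("x", "u"))   # "y" about 80% of the time, else "z"
```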


Definitions in the reading

• For the state/action pair (x_t, u_t), an advantage A(x_t, u_t) is defined as the sum of the value of the state and the utility (advantage) of performing action u rather than the action currently considered best.

• For optimal actions this utility is zero, meaning the value of the action is also the value of the state.

• For sub-optimal actions the utility is negative, representing the degree of sub-optimality relative to the optimal action.
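Reading that definition literally, A(x, u) = V(x) + utility(x, u), where the best action's utility is 0; a toy numeric sketch (all values invented):

```python
# Sketch of the advantage definition above: A(x, u) = V(x) + utility(x, u).
# The optimal action's utility is 0, so its advantage equals the state value;
# sub-optimal actions get negative utility. Numbers are made up.
V_x = 3.0
utility = {"left": -2.0, "right": 0.0}   # "right" is the optimal action here
A = {u: V_x + d for u, d in utility.items()}
print(A)   # {'left': 1.0, 'right': 3.0}; A(x, u*) == V(x)
```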


Definitions in the reading

• K is a time unit scaling factor.

• ⟨·⟩ represents the expected value over all possible results of performing action u in state x_t to receive immediate reinforcement r and to transition to a new state x_{t+1}.

• g is the sum of past gradients in equation (20).
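The expectation ⟨·⟩ can be illustrated by averaging over a made-up outcome distribution for a single action (probabilities, rewards, and values below are invented):

```python
# Hypothetical outcomes of performing action u in state x_t, written as
# (probability, immediate reinforcement r, value of the successor state).
gamma = 0.9
outcomes = [(0.8, 1.0, 2.0), (0.2, 0.0, 5.0)]

# ⟨ r + γ·V(x_{t+1}) ⟩ over all possible results of the action.
expected = sum(p * (r + gamma * v) for p, r, v in outcomes)
print(expected)   # 0.8*(1 + 1.8) + 0.2*(0 + 4.5) ≈ 3.14
```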


More terms in the reading

Google these if you don’t understand them:

• Markov chain
• Markov decision process
• Mean squared error
• Monte Carlo rollout
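Of those terms, a Monte Carlo rollout is the easiest to show in code: simulate whole episodes from a state and average the discounted returns (the two-state chain below is invented for illustration):

```python
import random

# Made-up chain: from "A", with probability 0.5 we end immediately with
# reward 1; otherwise we pass through "B" (reward 0) and end with reward 2.
def rollout(gamma=0.9, rng=random):
    state, g, discount = "A", 0.0, 1.0
    while state != "T":          # "T" is the terminal state
        if state == "A":
            state, r = ("T", 1.0) if rng.random() < 0.5 else ("B", 0.0)
        else:                    # state == "B"
            state, r = "T", 2.0
        g += discount * r
        discount *= gamma
    return g

random.seed(0)
estimate = sum(rollout() for _ in range(10000)) / 10000
print(estimate)   # ≈ 0.5*1 + 0.5*(0 + 0.9*2) = 1.4
```

Averaging many rollouts approximates the value of state "A"; this is the same quantity the function approximator V(x_t) is trained to predict.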


Free Discussion

^_^