Reinforcement Learning in Real-Time Strategy Games
Nick Imrei
Supervisors: Matthew Mitchell & Martin Dick
Outline
- Reasons
  - What this research is about
  - Motivation and Aim
- Background
  - RTS games
  - Reinforcement Learning explained
  - Applying RL to RTS
- This project
  - Methodology
  - Evaluation
  - Summary
Motivation and Aims
- Problem:
  - AI has been a neglected area: game developers have adopted the “not broken so why fix it” philosophy
  - Internet thrashing (my own experience)
- Aim:
  - Use learning to develop a human-like player
  - Simulate beginner → intermediate level play
  - Use RL and A-life-like techniques
    - E.g. Black and White, Pengi [Scott]
RTS Games – The Domain
- Two or more teams of individuals/cohorts in a war-like situation on a series of battlefields
  - E.g. Command & Conquer, Starcraft, Age of Empires, Red Alert, Empire Earth
- Teams can have a variety of:
  - Weapons
  - Units
  - Resources
  - Buildings
- Players are required to manage all of the above to achieve the end goal (destroy all units, capture the flag, etc.)
Challenges offered in RTS games
- Real-time constraints on actions
- High-level strategies combined with low-level tactics
- Multiple goals and choices
The Aim and Approach
- Create a human-like opponent
  - Realistic
  - Diverse behavior (not boring)
  - This is difficult to do!
- Tactics and Strategy
  - Agents will be reactive to the environment
  - Learn rather than code: Reinforcement learning
The Approach Part 1 – Reinforcement Learning
- Reward and Penalty
  - Action Rewards / Penalties
    - Penalize being shot
    - Reward killing a player on the other team
  - Strategic Rewards / Penalties
    - Securing / occupying a certain area
    - Staying in certain group formations
    - Destroying all enemy units
- Aim to receive maximum reward over time
- Problem: Credit assignment
  - What rewards should be given to which behaviors?
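A minimal sketch of how the action and strategic signals listed on this slide could be folded into one scalar reward per time step. The event names and every numeric weight are assumptions made for illustration; the slides name the events but give no magnitudes.

```python
from dataclasses import dataclass

# Hypothetical weights: the slides list these events without values.
ACTION_REWARDS = {"was_shot": -1.0, "killed_enemy": 1.0}
STRATEGIC_REWARDS = {
    "occupies_target_area": 0.5,
    "in_formation": 0.2,
    "all_enemies_destroyed": 10.0,
}


@dataclass
class StepEvents:
    """Events observed for one agent during a single time step."""
    was_shot: bool = False
    killed_enemy: bool = False
    occupies_target_area: bool = False
    in_formation: bool = False
    all_enemies_destroyed: bool = False


def reward(events: StepEvents) -> float:
    """Sum the weights of every event that occurred this step."""
    weights = {**ACTION_REWARDS, **STRATEGIC_REWARDS}
    return sum(w for name, w in weights.items() if getattr(events, name))
```

Choosing these weights is the practical side of the credit assignment question: a large strategic weight can drown out the per-action signals, and the reverse is also possible.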
The Approach Part 2 – Credit Assignment
- States and actions
  - Decide on a state space and action space
  - Assign values to
    - States, or
    - States and Actions
  - Train the agent in this space
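As a sketch of what "assign values to states and actions" can look like in code, the block below keeps a table of Q(s, a) values and nudges one entry after each observed step. The slides do not say which update rule was used; standard one-step Q-learning is assumed here, and the state names, step size `alpha` and discount `gamma` are illustrative.

```python
from collections import defaultdict

ACTIONS = ["left", "forward", "right", "back"]  # illustrative action space


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One-step Q-learning: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
q_update(q_table, "s0", "forward", 1.0, "s1")
```

Repeating this update over many steps is what lets reward earned late in a game flow back to the earlier states and actions that led to it.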
Reinforcement Learning example
Why use Reinforcement Learning?
- Well suited to problems where there is a delayed reward (tactics and strategy)
- The trained agent moves in (worst case) linear time (reactive)
- Problems:
  - Large state spaces (state aggregation)
  - Long training times (experience replay and shaping)
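To make the delayed-reward point concrete, here is a small sketch of the discounted return, the quantity an RL agent tries to maximise. The discount factor `gamma` and the example reward sequence are assumptions; the slides give no values.

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum_t gamma**t * rewards[t], accumulated from the last step back."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, `discounted_return([0, 0, 0, 1])` is about 0.729: a reward that arrives three steps late still counts toward the first decision, only at a reduced weight.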
The Approach Part 3 – Getting Diversity
Agent
Agent state space
A-life-like behavior using aggregated state spaces
Research Summary:
- Investigate this approach using a simple RTS game
- Issues:
  - Empirical Research
    - Applying RL in a novel way
    - Not using entire state space
    - Need to investigate
      - Appropriate reward functions
      - Appropriate state spaces
  - Problems with Training
    - Will need lots of trials (the propagation problem)
    - No. of trials can be reduced using Shaping [Mahadevan] and Experience Replay [Lin]
    - Self-play; other possibilities include A* and human opponents
      - Tesauro, Samuel
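Experience replay, mentioned above as a way to cut the number of trials, can be sketched as a buffer of past transitions that are fed through the learning update again. This is a generic illustration and not the project's code: the class and function names, the buffer capacity and the one-step Q-learning update are all assumptions.

```python
import random
from collections import defaultdict, deque

ACTIONS = ["left", "forward", "right", "back"]  # illustrative action space


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity=1000, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop off first
        self.rng = random.Random(seed)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)


def replay(q, buffer, batch_size=32, alpha=0.1, gamma=0.9):
    """Re-apply the one-step Q-learning update to stored transitions."""
    for s, a, r, s_next in buffer.sample(batch_size):
        best_next = max(q[(s_next, b)] for b in ACTIONS)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
```

Each stored transition can be reused many times, so a single game contributes more than one update per step, which is the sense in which replay reduces the number of fresh trials needed.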
Methodology
- Hypothesis:
  - “The combination of RL and reduced state spaces in a rich (RTS) environment will lead to human-like gameplay”
- Empirical investigation to test hypothesis
  - Evaluate system behavior
  - Analyze the observed results
  - Describe interesting phenomena
Evaluation
- Measure the diversity of strategies
  - How big a change (and what type) is required to change the behaviour: a qualitative analysis of this
- Success of strategies
  - I.e. what level of gameplay does it achieve
  - Time to win, points scored, resembles humans
- Compare to human strategies
  - “10 requirements of a challenging and realistic opponent” [Scott]
Summary thus far…
- Interested in a human-level game program
- Want to avoid brittle, predictable programmed solutions
- Search program space for most diverse solutions using RL to direct search
  - Allows specification of results, without needing to specify how this can be achieved
- Evaluate the results
The Game – Maps and Terrain
- 2 armies of equal size on an n*n map.
- Terrain:
  - Grass, Trees, Boundary Squares and Swamp
  - All units can move on these squares; however, different terrain types each affect a soldier’s attributes in a different way
The Game – Soldiers
- Soldier attributes include:
  - Sight Range
  - Weapon Range
  - Fatigue
  - Speed
  - Health
  - Direction
  - Relation Lines
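One possible way to represent a soldier and the terrain effects from the previous slide is sketched below. The field types, the degree-based heading and all terrain multipliers are hypothetical; the slides only state that each terrain type affects a soldier's attributes differently. "Relation Lines" is left out because the slides do not define it.

```python
from dataclasses import dataclass
from enum import Enum


class Terrain(Enum):
    GRASS = "grass"
    TREES = "trees"
    BOUNDARY = "boundary"
    SWAMP = "swamp"


# Hypothetical multipliers per terrain: (speed, sight_range).
TERRAIN_EFFECTS = {
    Terrain.GRASS: (1.0, 1.0),
    Terrain.TREES: (0.8, 0.5),
    Terrain.BOUNDARY: (1.0, 1.0),
    Terrain.SWAMP: (0.5, 1.0),
}


@dataclass
class Soldier:
    sight_range: float
    weapon_range: float
    fatigue: float
    speed: float
    health: float
    direction: int  # heading in degrees; this encoding is an assumption

    def effective_speed(self, terrain: Terrain) -> float:
        """Base speed scaled by the terrain's speed multiplier."""
        return self.speed * TERRAIN_EFFECTS[terrain][0]

    def effective_sight(self, terrain: Terrain) -> float:
        """Base sight range scaled by the terrain's sight multiplier."""
        return self.sight_range * TERRAIN_EFFECTS[terrain][1]
```

Keeping the multipliers in one table makes it easy to vary terrain effects between experiments without touching the soldier logic.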
Experiments Part 1: Hand-coded Strategies
- Create 8 different hand-coded strategies
  - Incl. Horde, Disperse, Central Defense, etc.
- Test their effectiveness based on:
  - Time taken to win
  - Time taken to eliminate an enemy once spotted
  - Damage sustained when victorious
Results of Experiments Part 1
- Units deployed closer resulted in quicker games.
- No strategy was consistently successful against all others.
- The 3 most successful were:
  - Occupy
  - Horde
  - Central Defense
- Strategies meant nothing once army sizes were > 150 on an 80*80 map.
Experiments Part 2: Control Architectures
- Centralized
  - All units are controlled by one entity
  - Only do what is commanded (no auto-behavior)
  - View area = central controller's viewscreen
  - Group Formation
  - Unit Selection
  - Unit Commanding
Experiments Part 2: Control Architectures
- Localized
  - Units are independently maneuvered and controlled, à la Artificial-life
  - Viewing space is only what they see individually
  - Formation; Cohorts
  - Unit Selection & Movement done via an A-life State Machine
Experiments Part 2: Control Architectures
- Testing:
  - Given the best 3 techniques from part 1,
  - Program them in a centralized and localized manner
  - Base their effectiveness on criteria from part 1
  - Observe the realism of the 6 new hand-coded strategies
Results of Experiments Part 2
- As individual unit sight and weapon range increased, localized performed better.
- A-life performed better on rougher terrain, whereas centralized often got stuck.
- Centralized formation takes less time, hence it did better in situations where the ArmySize : MapSize ratio increased.
Results of Experiments Part 2
- Realism Evaluation:
  - Localized more closely resembles a group of soldiers
  - Centralized better resembles human gameplay.
- Given its success, a local framework is used as a template for the learning agent
Learning Agents - Architecture
- Each agent will work off the same learning table.
  - This is expected to speed up learning, by learning from everyone’s mistakes rather than just your own
- Agents are trained against all opponents from parts 1 and 2
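The shared learning table can be sketched as several agent objects that all hold a reference to one dictionary, so an update made by any unit is immediately visible to the rest. The class name, state labels and the one-step Q-learning update are assumptions for illustration.

```python
from collections import defaultdict

ACTIONS = ["left", "forward", "right", "back"]  # illustrative action space


class LearningAgent:
    """Agent that reads and writes a Q-table shared with its teammates."""

    def __init__(self, name, q_table):
        self.name = name
        self.q = q_table  # same object for every agent on the team

    def learn(self, s, a, r, s_next, alpha=0.1, gamma=0.9):
        best_next = max(self.q[(s_next, b)] for b in ACTIONS)
        self.q[(s, a)] += alpha * (r + gamma * best_next - self.q[(s, a)])

    def best_action(self, s):
        return max(ACTIONS, key=lambda a: self.q[(s, a)])


shared_q = defaultdict(float)
team = [LearningAgent(f"unit{i}", shared_q) for i in range(3)]
team[0].learn("enemy_ahead", "forward", 1.0, "clear")
```

After the single update by `team[0]`, every other unit already prefers `"forward"` in the `"enemy_ahead"` state, which is the mechanism behind learning from everyone's mistakes.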
Learning Agents – Representing the world
- States
  - Divide sight range up into sections
  - Each section can have a combination of an ally, a health spot, an enemy or none.
  - (On or off a health spot) 288 possible world states.
- Actions
  - Move & Shoot (left, forward, right, back)
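A rough sketch of how such an aggregated observation could be turned into a key for a lookup table. The number of sections and the encoding are assumptions; the slides do not describe either, and this encoding makes no attempt to reproduce the 288-state count.

```python
CONTENTS = ("ally", "health_spot", "enemy")
NUM_SECTIONS = 3  # assumed; the slides do not say how many sections are used


def encode_state(sections, on_health_spot):
    """Map per-section contents plus the health-spot flag to a hashable key.

    `sections` is a sequence of sets, one per sight section, each holding
    zero or more of CONTENTS (an empty set means "none").
    """
    if len(sections) != NUM_SECTIONS:
        raise ValueError("unexpected number of sections")
    bits = tuple(
        tuple(item in section for item in CONTENTS) for section in sections
    )
    return bits, bool(on_health_spot)
```

With these assumed settings there are 8 possible contents per section, so 8 × 8 × 8 × 2 = 1024 keys. The smaller figure on the slide suggests the project's encoding rules out some combinations or uses a different sectioning.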
Learning Agents – Representing the world
- Rewards:
  - Positive:
    - Shooting an enemy
    - Moving to a health spot
  - Negative:
    - Being shot / killed
    - Being on a health spot when health is full
- Reinforcement: $Q(s,a) = R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} Q(s',a')$
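The reinforcement rule on this slide has the shape of a one-step Bellman backup: immediate reward plus the discounted, probability-weighted value of the successor states. The sketch below assumes the successor term is the best Q-value available in each next state, as in the standard Bellman optimality form. The function name, argument layout and example numbers are illustrative.

```python
def bellman_backup(reward, transitions, q, actions, gamma=0.9):
    """Return R(s,a) + gamma * sum_s' P(s'|s,a) * max_a' Q(s',a').

    `transitions` maps each successor state s' to its probability P(s'|s,a).
    `q` maps (state, action) pairs to current estimates; missing pairs count as 0.
    """
    expected_next = sum(
        p * max(q.get((s_next, a), 0.0) for a in actions)
        for s_next, p in transitions.items()
    )
    return reward + gamma * expected_next


q_estimates = {("safe", "forward"): 2.0, ("danger", "back"): -1.0}
value = bellman_backup(
    1.0, {"safe": 0.5, "danger": 0.5}, q_estimates, ["forward", "back"]
)
```

In a game where the transition probabilities are unknown, the same target is usually approximated from sampled transitions instead of being computed from an explicit model.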
Results of Experiments Part 3
- Learning of behaviors was achieved in only a few simulations
- Agents developed the following behaviors:
  - Shoot when an enemy is seen, unless health is low
  - If health is low, move to a health spot
  - Units form a health-spot queue
  - Diversion of a centralized opponent's attention
- Learning agents were consistently successful against all others bar the centralized Horde strategy.
- Agents are told what to do, not how to do it
- Human testing didn’t prove too successful!
Conclusions
- A localized approach was found to be more successful overall than a centralized one.
- Given that the game has no base or resource element, the all-out aggressive strategies fared the best
- Learning strategies were successful against most programmed ones
- Diversion and health-spot sharing behaviors were observed
Future Work
- Extending the RTS game so it has:
  - Resources and resource gathering
  - Different unit types
  - Base building and maintenance
- Testing the RL/A-life framework in other game genres, including Role Playing Games, Sim games and Sports.
References
Bob Scott. The illusion of intelligence. AI Game Programming Wisdom, pages 16–20, 2002.
Sridhar Mahadevan and Jonathan Connell. Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence 55, pages 311–364, 1992.
L. Lin. Reinforcement learning for robots using neural networks. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, USA, 1993.
Mark Bishop Ring. Continual Learning in Reinforcement Environments. MIT Press, 1994.
Stay Tuned!
For more information, see http://www.csse.monash.edu.au/~ngi/
Thanks for listening!