ut s OpenAI + DotA 2kanmy/courses/6101_1820/s13.pdf · ut s OpenAI Rapid[1] … a general-purpose...

Wismut Labs OpenAI + DotA 2

Page 1:

OpenAI + DotA 2

Page 2:

DotA (Defense of the Ancients) 2 Gameplay

Page 3:

DotA and StarCraft II Challenges [1, 2, 3]

Page 4:

OpenAI DotA Progress Summary [1]

Page 5:

OpenAI DotA Approach [1]

γ

Page 6:

OpenAI DotA ‘Cheats’ [1, 3]

Page 7:

OpenAI Five Network Architecture [1, 6]

Page 8:

OpenAI Five Network Architecture [1, 6]

Page 9:

OpenAI Five Model Structure [1]

Page 10:

OpenAI Five Exploration [1]

Page 11:

OpenAI Rapid [1]… a general-purpose RL training system

Page 12:

Proximal Policy Optimization [1, 4, 5]

Page 13:

Proximal Policy Optimization [1, 4, 5]

Page 14:

Proximal Policy Optimization [1, 4, 5]

Page 15:

Proximal Policy Optimization [1, 4, 5]

Page 16:

Proximal Policy Optimization [1, 4, 5]

Page 17:

Proximal Policy Optimization [1, 4, 5]

Page 18:

Proximal Policy Optimization [1, 4, 5]
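
The body text of the Proximal Policy Optimization slides did not survive in this transcript, so the following is only a minimal sketch of the standard clipped surrogate objective that PPO is generally known for. It is not taken from the slides and is not OpenAI Five's actual training code. The function name `ppo_clip_objective`, the list-based inputs, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    new_logp, old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same actions.
    clip_eps: clipping range; 0.2 is an assumed, commonly used value.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Ratio of 2 with a positive advantage is capped at 1 + clip_eps.
    print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the policy far from the one that collected the data when that would raise the objective, while still penalizing moves that make it worse.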

Page 19:

Transfer Learning for RL [1]

Page 20:

Open Challenges & Moving Forward