Benchmarks and performance measures in artificial intelligence
Anders Jonsson
Artificial Intelligence and Machine Learning Group
Universitat Pompeu Fabra
HUMAINT workshop
6 March 2018
Anders Jonsson
Benchmarks in AI
Sequential decision making
Perform a sequence of actions to achieve a given objective
Each decision has an impact on future decisions!
Sequential decision making
Reinforcement learning:
Effect of actions initially unknown
Intervention: perform actions to test hypotheses about them
Aim: in a given state s, estimate the value Q(s, a) of an action a and/or a policy π(·|s) for action selection
AI planning:
System has a model of the actions
Aim: compute a sequence of actions in advance
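As a rough illustration of the reinforcement learning aim above (estimating Q(s, a) when the effect of actions is initially unknown), here is a minimal sketch of tabular Q-learning. The chain environment, the reward of 1 at the last state, and all parameter values are assumptions made for this example and do not come from the talk.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain of states 0..n_states-1.

    Actions: 0 = move left, 1 = move right. Reaching the last state
    yields reward 1 and ends the episode; every other step yields 0.
    """
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            # The learner only observes the outcome of the action it tried.
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # The goal state is terminal, so it has no future value.
            target = r if s_next == goal else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q

Q = q_learning()
# Greedy policy derived from the learned values (non-terminal states only).
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(len(Q) - 1)]
print("Q-values:", [[round(v, 3) for v in row] for row in Q])
print("Greedy policy:", policy)
```

An AI planner, by contrast, would be given the transition rule of the chain up front and could compute the action sequence without interacting with the environment.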
Evaluation criteria
Theoretical analysis:
Performance bounds: how far from optimal is an AI algorithm?
Time complexity: how fast is it?
Memory complexity: how much memory does it use?
Empirical evaluation:
How does an AI algorithm perform in practice?
Empirical performance measures
What do we measure?
Winning
Scoring points
?
What do we compare to?
Optimal or near-optimal
Human
Other algorithms
?
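One way to make "what do we compare to?" concrete is to rescale a raw score against a baseline and a reference, for example a random policy and a human player. The sketch below follows a convention commonly used for Atari results; the function name and the numbers are illustrative assumptions, not results from the talk.

```python
def normalized_score(agent, baseline, reference):
    """Rescale a raw score so that baseline maps to 0 and reference to 100.

    `baseline` might be a random policy and `reference` a human player
    or a (near-)optimal solution; which one to use is an evaluation choice.
    """
    if reference == baseline:
        raise ValueError("reference and baseline scores must differ")
    return 100.0 * (agent - baseline) / (reference - baseline)

# Made-up scores for illustration only.
print(normalized_score(agent=50, baseline=0, reference=100))
```

A value above 100 then means the algorithm outscored the reference on that instance, which says nothing by itself about how the reference score was obtained.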
Problems with empirical evaluation
Strong incentive to boost performance of one's own algorithm
Practices in reinforcement learning [Henderson et al. 2017]:
Run X trials, report average of 3 best runs
Omit network architecture, random seeds, hyperparameters
Implement own version of other researchers’ algorithms
Claims of human-level performance
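To see why reporting only the best runs is a problem, the following sketch simulates X trials of a single hypothetical algorithm and compares the mean over all trials with the mean of the 3 best. The Gaussian returns, the true mean of 100, and the noise level are assumptions chosen for illustration.

```python
import random
import statistics

def simulate_returns(n_trials, true_mean=100.0, noise=20.0, seed=0):
    """Draw one final return per trial from a hypothetical noisy algorithm."""
    rng = random.Random(seed)
    return [rng.gauss(true_mean, noise) for _ in range(n_trials)]

def mean_of_best(returns, k=3):
    """Average of the k highest returns, i.e. the cherry-picked statistic."""
    return statistics.mean(sorted(returns, reverse=True)[:k])

returns = simulate_returns(n_trials=20)
honest = statistics.mean(returns)
cherry_picked = mean_of_best(returns, k=3)
print(f"mean over all 20 trials: {honest:.1f}")
print(f"mean of the 3 best:      {cherry_picked:.1f}")
```

The mean of the best 3 can never be lower than the mean over all trials, and the gap grows with the number of trials, so the reported figure reflects X as much as it reflects the algorithm.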
Benchmarks
Set of instances that are representative of problem difficulty
More unbiased comparison
Difficult to artificially boost the performance of an algorithm
More likely that results generalize
Benchmarks
Atari
OpenAI Gym
Project Malmo
Competitions
International Planning Competition
General Video Game AI Competition
AIIDE StarCraft AI Competition
Image classification
Problems with benchmarks
Might not accurately reflect real-world problems
Might require large amounts of computational power
Excessive focus on winning leads to algorithms that do not really advance the state-of-the-art (e.g. portfolio algorithms)
AI in the real world
Clear and relevant performance criteria
Appropriate, publicly available benchmarks
Proper statistical comparisons
Independent verification and reproduction
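As one possible reading of "proper statistical comparisons", the sketch below computes a percentile bootstrap confidence interval for the difference in mean score between two algorithms, each run with several random seeds. The bootstrap is one choice among many (a t-test or a rank test would be alternatives), and the score lists are invented for illustration.

```python
import random
import statistics

def bootstrap_diff_ci(a, b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(a) - mean(b).

    `a` and `b` hold one score per independent run (e.g. per random seed).
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each set of runs with replacement, keeping its size.
        resampled_a = [rng.choice(a) for _ in a]
        resampled_b = [rng.choice(b) for _ in b]
        diffs.append(statistics.mean(resampled_a) - statistics.mean(resampled_b))
    diffs.sort()
    low_index = int((1 - level) / 2 * n_boot)
    high_index = int((1 + level) / 2 * n_boot) - 1
    return diffs[low_index], diffs[high_index]

# Invented scores from five seeds per algorithm.
scores_a = [212.0, 198.0, 240.0, 205.0, 221.0]
scores_b = [190.0, 215.0, 201.0, 188.0, 207.0]
low, high = bootstrap_diff_ci(scores_a, scores_b)
print(f"95% interval for the difference in means: [{low:.1f}, {high:.1f}]")
```

If the interval includes 0, the runs do not support a claim that one algorithm outperforms the other; publishing the per-seed scores alongside it lets others verify the comparison.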
Lifelong learning
System that operates for long periods of time
Task is not fixed but changes, some tasks initially unknown
New objects and actions become available over time
Questions