A Hierarchical Bayesian Approach to Optimal Experimental Design: A Case of Adaptive Vision Testing

Jay Myung
Computational Cognition Lab, Ohio State University

Joint work with Mark Pitt, Woojae Kim, Hairong Gu, and Zhong-Lin Lu

Based on Psychonomic Society Meeting Presentation (Nov 18, 2016: Boston, MA)

Scientific Inquiry Requires Use of Tools

• Tools help us answer questions like “How do children learn?”

• A large chunk of science is spent on developing tools

• Automation is the next frontier in tool development
• Less involvement of humans in the drudgery of science

Cognitive Science of Old

Memory drum

Oscilloscope

Reel-to-reel tape deck

Significant human intervention in preparation, collection, and analysis of data

Cognitive Science of Today

Traditional Experimentation

• Suboptimal designs
• Non-adaptive (no feedback loop)

[Diagram: Theory/model -> Heuristic designs -> Experiment -> Inferences]

“Not All Experimental Designs Are Equally Informative”: Optimal Experimental Design (OED)

http://www.ml.inf.ethz.ch/research/cb_projects

Optimizing Experimental Design (OED) in Substantive Fields

• Statistics (Lindley, 1956; Kiefer, 1959)
• Vision science (Lesmes et al., 2010)
• Neuroscience (Lewi et al., 2009)
• Economics (Atkinson & Donev, 1992)
• Engineering (Allen et al., 2003)
• Systems biology (Kreutz & Timmer, 2009)
• Clinical drug trials (Wathen & Thall, 2008)
• Nano-materials (Nikolaev et al., 2014)
• Cognitive science (Myung & Pitt, 2009)

Autonomous Research System (ARES) for Carbon Nanotubes Synthesis at AFRL

(Courtesy of the Air Force Research Lab)

• Adaptively designed experiments
– Run a typical behavioral experiment as a sequence of mini-experiments/trials
– Optimize the design of the next mini-experiment/trial on the fly, based on observed outcomes from the previous mini-experiments, so as to accelerate inference

Adaptive Design Optimization (ADO) (Computational Cognition Lab at Ohio State)

Myung & Pitt (2009), Psychological Review
Cavagnaro, Myung, Pitt & Kujala (2010), Neural Computation
Myung, Cavagnaro & Pitt (2013), Journal of Mathematical Psychology

Adaptive Design Optimization (ADO)

Autonomous Experimentation System (closed-loop)

ADO formulated within a Bayesian decision theoretic framework

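[Diagram: closed loop, repeated with t <- t+1: Prior -> Design Optimization -> Optimal Design -> Experiment -> Observed Outcome -> Bayesian Updating -> Posterior, which serves as the prior for the next trial]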

Traditional vs. ADO Experimentation

• Optimized designs
• Adaptive (feedback loop)

[Diagram: Traditional: Theory/model -> Heuristic designs -> Experiment -> Inferences. ADO: Parametric model -> Design optimization -> Experiment -> Inferences]

Technical Details of ADO

Mutual information as the utility of design d:
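In a common formulation of ADO (e.g., Myung & Pitt, 2009), the utility of a design d is the mutual information between the model parameter and the outcome that d would produce. The symbols y (outcome), θ (parameter), and p (probability density) are notation assumed here for illustration:

```latex
U(d) = I(\Theta; Y \mid d)
     = \int\!\!\int p(y \mid \theta, d)\, p(\theta)\,
       \log \frac{p(\theta \mid y, d)}{p(\theta)} \, dy \, d\theta ,
\qquad
d^{*} = \arg\max_{d} U(d).
```

The design d* with the largest expected information gain is the one presented on the next trial.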

Next Trial/mini experiment

Threshold Estimation of Psychometric Function
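A minimal sketch of how threshold estimation with ADO might look on a discrete grid. The logistic psychometric function, its fixed slope, guess, and lapse values, the grid ranges, and all function names are assumptions made for illustration, not the implementation used in the work presented here.

```python
import numpy as np

def p_correct(x, threshold, slope=2.0, guess=0.5, lapse=0.02):
    """Assumed logistic psychometric function: P(correct | stimulus intensity x)."""
    core = 1.0 / (1.0 + np.exp(-slope * (x - threshold)))
    return guess + (1.0 - guess - lapse) * core

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli outcome with success probability p."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def mutual_information(prior, lik):
    """lik[i, j] = P(correct | theta_i, d_j); returns U(d_j) for every design."""
    marginal = prior @ lik                      # P(correct | d_j)
    return binary_entropy(marginal) - prior @ binary_entropy(lik)

def run_ado(true_threshold, n_trials=60, seed=0):
    rng = np.random.default_rng(seed)
    thetas = np.linspace(-5.0, 5.0, 201)        # candidate thresholds
    designs = np.linspace(-5.0, 5.0, 101)       # candidate stimulus intensities
    lik = p_correct(designs[None, :], thetas[:, None])
    prior = np.full(thetas.size, 1.0 / thetas.size)   # non-informative prior
    for _ in range(n_trials):
        j = int(np.argmax(mutual_information(prior, lik)))        # design optimization
        y = rng.random() < p_correct(designs[j], true_threshold)  # simulated experiment
        post = prior * (lik[:, j] if y else 1.0 - lik[:, j])      # Bayesian updating
        prior = post / post.sum()               # posterior becomes the next prior
    return float(thetas @ prior), prior

estimate, posterior = run_ado(true_threshold=1.3)
print(f"posterior mean threshold: {estimate:.2f}")
```

Each pass through the loop picks the stimulus with the highest mutual information, simulates a response, and updates the posterior, which mirrors the closed-loop structure of ADO.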

Video Demo

Current Work: Hierarchical Extension of ADO

[Diagram: the ADO loop (Prior -> Design Optimization -> Experiment -> Bayesian Updating -> Posterior -> Next Trial), with knowledge from other participants' data feeding into the prior]

• Typically, ADO starts with non-informative priors
• To achieve even greater efficiency, ADO can be extended to take advantage of data collected from other individuals in the same task
• Basic idea: Why not use the other individuals’ data as an “informative” prior for a new individual?

Hierarchical Adaptive Design Optimization (HADO) (Kim, Pitt, Lu, Steyvers & Myung, 2014 NC)

• HADO combines the advantages of ADO and hierarchical Bayes modeling (HBM) to make judicious experimental designs from the very first trial.

1. ADO: Optimize based on responses from earlier trials of the current experiment

2. HBM: Utilize responses collected from other individuals from previously run experiments
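The effect of combining the two ingredients can be sketched in a toy example. It is a strong simplification: a normal distribution fitted to earlier participants' point estimates stands in for the full hierarchical Bayes model, and the logistic likelihood, the data values, and all names are made up for illustration.

```python
import numpy as np

def informative_prior(grid, past_estimates, min_sd=0.1):
    """Discretized normal prior fitted to earlier participants' estimates
    (a simplified stand-in for a hierarchical Bayes prior)."""
    mu = np.mean(past_estimates)
    sd = max(np.std(past_estimates, ddof=1), min_sd)
    dens = np.exp(-0.5 * ((grid - mu) / sd) ** 2)
    return dens / dens.sum()

def posterior_after(prior, grid, xs, ys, slope=2.0):
    """Grid-based Bayesian updating with an assumed logistic likelihood."""
    post = prior.copy()
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + np.exp(-slope * (x - grid)))
        post *= p if y else 1.0 - p
        post /= post.sum()
    return post

grid = np.linspace(-5.0, 5.0, 201)
past = np.array([1.1, 0.8, 1.5, 1.2, 0.9, 1.4])   # made-up estimates from earlier participants
flat = np.full(grid.size, 1.0 / grid.size)        # ADO: non-informative prior
hado = informative_prior(grid, past)              # HADO-style: informative prior

xs, ys = [1.0, 1.0, 1.0, 1.0], [True, False, True, True]   # made-up early trials
results = {}
for name, prior in [("ADO (flat prior)", flat), ("HADO-style prior", hado)]:
    post = posterior_after(prior, grid, xs, ys)
    mean = float(grid @ post)
    sd = float(np.sqrt(grid**2 @ post - mean**2))
    results[name] = (mean, sd)
    print(f"{name}: mean={mean:.2f}, sd={sd:.2f}")
```

After the same four trials, the posterior that started from the informative prior is narrower, which is the sense in which designs can be judicious from the very first trial.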

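HADO Framework

[Diagram: for each individual, an ADO loop (Prior -> Design Optimization -> Experiment -> Bayesian Updating -> Posterior -> Next Trial); across individuals, the posterior of the (hyper-)parameters is updated and provides the prior for the next individual]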

HADO: Does It Work in Practice?

Empirical Validation of HADO in Adaptive Vision Testing

Testbed of HADO: Adaptive Estimation of Contrast Sensitivity Function (CSF)

(Lesmes, Lu, Baek & Albright, 2010 JOV)

[Figure: contrast sensitivity plotted against spatial frequency; the CSF curve separates visible from invisible stimuli]

CSF modeled by 4 free parameters

CSF Parameterization

θ = (δ, β, γ, f), reparametrized as (AULCSF, cutSF)
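As a rough illustration of a four-parameter CSF, the sketch below assumes the truncated log-parabola form described by Lesmes et al. (2010), reading γ as peak gain, f as peak frequency, β as bandwidth in octaves, and δ as the low-frequency truncation level. That mapping, the numerical definitions of AULCSF (area under the log CSF) and cutSF (frequency at which sensitivity falls to 1), the integration range, and the parameter values are all assumptions for illustration.

```python
import numpy as np

def log_csf(freq, gain_max, freq_max, bandwidth, truncation):
    """log10 contrast sensitivity under an assumed truncated log-parabola."""
    freq = np.asarray(freq, dtype=float)
    kappa = np.log10(2.0)
    beta_prime = np.log10(2.0 * bandwidth)
    parabola = np.log10(gain_max) - kappa * (
        (np.log10(freq) - np.log10(freq_max)) / (beta_prime / 2.0)) ** 2
    plateau = np.log10(gain_max) - truncation
    # Below the peak frequency, sensitivity is not allowed to drop under the plateau.
    low_side = (freq < freq_max) & (parabola < plateau)
    return np.where(low_side, plateau, parabola)

def summarize(params, f_lo=0.5, f_hi=64.0, n=4001):
    """Return (AULCSF, cutSF), computed numerically on a log-frequency grid."""
    log_f = np.linspace(np.log10(f_lo), np.log10(f_hi), n)
    log_s = log_csf(10.0 ** log_f, *params)
    positive = np.clip(log_s, 0.0, None)
    # Trapezoid rule over log10 frequency, counting only sensitivity above 1.
    aulcsf = float(np.sum((positive[1:] + positive[:-1]) / 2.0 * np.diff(log_f)))
    above = np.nonzero(log_s > 0.0)[0]
    cut_sf = float(10.0 ** log_f[above[-1]]) if above.size else float("nan")
    return aulcsf, cut_sf

params = (100.0, 2.0, 3.0, 0.5)   # hypothetical (gain_max, freq_max, bandwidth, truncation)
aulcsf, cut_sf = summarize(params)
print(f"AULCSF={aulcsf:.2f}, cutSF={cut_sf:.1f} cycles/deg")
```

Summaries such as these reduce the four-dimensional parameter vector to two quantities that are easier to compare across individuals and conditions.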

[Figure: CSF estimates (contrast sensitivity vs. spatial frequency, log axes) for Individual 1, Individual 2, ..., Individual n, each obtained over Trial 1, ..., Trial t]

Hierarchical Bayes Modeling of CSF Parameters

Parameter space: θ ~ D(θ|η), where η denotes the hyperparameters
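Written out, a generic hierarchical model of this kind treats each individual's parameter vector as a draw from a shared population distribution. The indices i and n, the data symbols y and d, and the predictive form of the prior below are standard hierarchical Bayes notation assumed here for illustration:

```latex
\theta_i \mid \eta \sim D(\theta \mid \eta), \qquad i = 1, \dots, n,
\qquad
y_i \mid \theta_i, d_i \sim p(y \mid \theta_i, d_i),
\\[4pt]
p(\theta_{n+1} \mid y_{1:n})
  = \int p(\theta_{n+1} \mid \eta)\, p(\eta \mid y_{1:n})\, d\eta .
```

The second line is the informative prior for a new individual n+1: the population distribution averaged over the posterior of the hyperparameters given the data from the first n individuals.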

HADO-based CSF Estimation

[Diagram: in each session (Session 1, Session 2, ..., Session n), an ADO loop (Design Optimization -> Experiment -> Individual Model Updating -> Next Trial) estimates one CSF (contrast sensitivity vs. spatial frequency). Hierarchical model updating over the sessions yields the HBM informative prior used in the next individual’s ADO.]

HADO Demonstration and Validation (Gu et al, 2016 JOV)

The benefits of HADO were demonstrated in a behavioral study in which CSFs were estimated with human participants:
• Phase I (Baseline Experiment): Collect data with which to build HADO-based informative priors (100 participants)

• Phase II (Two Validation Experiments): Demonstrate the superiority of HADO over ADO

Phase I (Baseline Experiment)

• ADO-based adaptive estimation of CSF with 100 participants
• Each participant is tested under three experimental conditions:
1. Normal: bare eyes
2. ND1: weak neutral density (wearing filtered goggles)
3. ND2: strong neutral density (wearing filtered goggles)

Phase II (Validation Experiment #1): How large must the sample size of the HADO prior be to yield a significant improvement over ADO?

HADO informative priors of different sample sizes (Normal condition):

Take-home:
• The clear advantage of HADO over ADO is demonstrated in both experiment and simulation.
• HADO seems to work well even with priors built on data from a small number (5-12) of participants.

What happens if a wrong prior is specified?

• How well can HADO take advantage of group membership differences to improve parameter estimation of an individual from a designated group?

• How much of a cost in estimation is incurred if group membership is misspecified?

Phase II (Validation Experiment #2): Effects of Prior Misspecification

[Figure: two contour plots of probability density over AULCSF and cutSF. One shows the samples (Sample_nor, Sample_nd1, Sample_nd2, Sample_mix); the other shows the priors (Diffuse, Normal, Nd1, Nd2, Mixture).]

HADO priors from different experimental conditions:

• Again, an informative prior accelerates parameter estimation.
• While the greatest savings are obtained with a correctly specified prior, a mixture prior is likely the best choice in practice, given its robustness against prior misspecification.
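The robustness argument can be illustrated with a toy calculation. Everything in it is hypothetical: a single summary parameter, normal group priors with made-up means and spreads, Gaussian measurement noise, and a participant who truly belongs to the ND2 group. None of the numbers come from the study.

```python
import numpy as np

grid = np.linspace(0.0, 4.0, 801)

def normal_on_grid(mu, sd):
    """Discretized normal density on the grid, normalized to sum to 1."""
    dens = np.exp(-0.5 * ((grid - mu) / sd) ** 2)
    return dens / dens.sum()

# Hypothetical group-level priors over one summary parameter (values made up).
priors = {
    "Normal": normal_on_grid(2.0, 0.2),
    "ND1": normal_on_grid(1.5, 0.2),
    "ND2": normal_on_grid(1.0, 0.2),
}
# Mixture prior: equal-weight average of the three group priors.
priors["Mixture"] = sum(priors[k] for k in ("Normal", "ND1", "ND2")) / 3.0

def posterior_mean(prior, observations, noise_sd=0.5):
    """Posterior mean on the grid under an assumed Gaussian measurement model."""
    log_lik = sum(-0.5 * ((y - grid) / noise_sd) ** 2 for y in observations)
    post = prior * np.exp(log_lik - np.max(log_lik))
    post /= post.sum()
    return float(grid @ post)

true_value = 1.0                    # the participant actually belongs to ND2
obs = [1.1, 0.8, 1.2, 0.9]          # made-up noisy measurements
errors = {name: abs(posterior_mean(p, obs) - true_value)
          for name, p in priors.items()}
for name, err in errors.items():
    print(f"{name:8s} prior: absolute error = {err:.3f}")
```

In this toy setting the correctly specified ND2 prior gives the smallest error, the misspecified Normal prior gives the largest, and the mixture prior stays close to the correct one because the data quickly reweight its components.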

Conclusions

• HADO provides a judicious way to exploit two complementary schemes of inference (with past and current data) to achieve even greater accuracy and efficiency than the standard ADO.

• Other current & future work in lab
– HADO for cognitive neuroscience (e.g., fMRI)
– HADO for delay discounting
– HADO for psychiatric diagnosis (e.g., OCD)
– HADO for nano-materials science (e.g., CNTs)

Thank You
