A Hierarchical Bayesian Approach to Optimal Experimental Design: A Case of Adaptive Vision Testing
Jay Myung
Computational Cognition Lab, Ohio State University
Joint work with Mark Pitt, Woojae Kim, Hairong Gu, and Zhong-Lin Lu
Based on Psychonomic Society Meeting Presentation (Nov 18, 2016: Boston, MA)
Scientific Inquiry Requires Use of Tools
• Tools help us answer questions like “How do children learn?”
• A large chunk of science is spent on developing tools
• Automation is the next frontier in tool development
• Less involvement of humans in the drudgery of science
Cognitive Science of Old
Memory drum
Oscilloscope
Reel-to-reel tape deck
Significant human intervention in preparation, collection, and analysis of data
Cognitive Science of Today
Traditional Experimentation
• Suboptimal designs
• Non-adaptive (no feedback loop)
Theory/model -> Heuristic designs -> Experiment -> Inferences
“Not All Experimental Designs Are Equally Informative”: Optimal Experimental Design (OED)
http://www.ml.inf.ethz.ch/research/cb_projects
Optimizing Experimental Design (OED) in Substantive Fields
• Statistics (Lindley, 1956; Kiefer, 1959)
• Vision science (Lesmes et al., 2010)
• Neuroscience (Lewi et al., 2009)
• Economics (Atkinson & Donev, 1992)
• Engineering (Allen et al., 2003)
• Systems biology (Kreutz & Timmer, 2009)
• Clinical drug trials (Wathen & Thall, 2008)
• Nano-materials (Nikolaev et al., 2014)
• Cognitive science (Myung & Pitt, 2009)
Autonomous Research System (ARES) for Carbon Nanotubes Synthesis at AFRL
(Courtesy of the Air Force Research Lab)
Adaptive Design Optimization (ADO) (Computational Cognition Lab at Ohio State)
• Adaptively designed experiments
– Run a typical behavioral experiment as a sequence of mini-experiments/trials
– Optimize the design of the next mini-experiment/trial on the fly, based on the observed outcomes of the previous mini-experiments, so as to accelerate inference
Myung & Pitt (2009) Psychological Review
Cavagnaro, Myung, Pitt & Kujala (2010) Neural Computation
Myung, Cavagnaro & Pitt (2013) Journal of Mathematical Psychology
Adaptive Design Optimization (ADO)
Autonomous Experimentation System (closed-loop)
ADO formulated within a Bayesian decision theoretic framework
[Diagram: closed loop. Prior -> Design Optimization -> Optimal Design -> Experiment -> Observed Outcome -> Bayesian Updating -> Posterior, which becomes the prior for the next trial (t <- t+1)]
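The closed loop can be sketched in a few lines of Python. This is a minimal grid-based illustration, not the lab's implementation: the logistic psychometric function, its slope, guess and lapse values, the grids, and the simulated observer are all assumptions made for the example.

```python
import numpy as np

def psychometric(x, threshold, slope=1.5, guess=0.5, lapse=0.02):
    """P(correct) at stimulus intensity x (hypothetical logistic form)."""
    p = 1.0 / (1.0 + np.exp(-slope * (x - threshold)))
    return guess + (1.0 - guess - lapse) * p

def entropy(p):
    """Binary entropy in nats, clipped so that p = 0 or 1 is safe."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def mutual_information(prior, lik):
    """Utility of each design; lik[d, k] = P(correct | theta_k, design d)."""
    marginal = lik @ prior                        # P(correct | design d)
    return entropy(marginal) - entropy(lik) @ prior

def run_ado(true_threshold, n_trials=80, seed=0):
    rng = np.random.default_rng(seed)
    thetas = np.linspace(-5, 5, 201)              # parameter grid
    designs = np.linspace(-5, 5, 41)              # candidate stimulus intensities
    lik = psychometric(designs[:, None], thetas[None, :])
    posterior = np.full(thetas.size, 1.0 / thetas.size)   # non-informative prior
    for _ in range(n_trials):
        # Design optimization: pick the design with the highest utility
        d = int(np.argmax(mutual_information(posterior, lik)))
        # Experiment: simulate the observer's response to that design
        correct = rng.random() < psychometric(designs[d], true_threshold)
        # Bayesian updating: the posterior becomes the prior for trial t+1
        posterior = posterior * (lik[d] if correct else 1.0 - lik[d])
        posterior /= posterior.sum()
    return thetas, posterior

thetas, posterior = run_ado(true_threshold=1.0)
estimate = float(thetas @ posterior)              # posterior-mean threshold
```

Each pass through the loop performs the three boxes of the diagram in order: design optimization, experiment, Bayesian updating.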
Traditional vs. ADO Experimentation
Traditional: Theory/model -> Heuristic designs -> Experiment -> Inferences
ADO: Parametric model -> Design optimization -> Experiment -> Inferences
• Optimized designs
• Adaptive (feedback loop)
Technical Details of ADO
Mutual information as the utility of design d:
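One standard way to write the mutual-information utility is given here for reference. The notation is assumed (θ for the model parameters, y for the outcome of a trial run with design d, p(θ) for the current prior); it is not taken from the slide.

```latex
U(d) = \int\!\!\int p(\theta)\, p(y \mid \theta, d)\,
       \log \frac{p(\theta \mid y, d)}{p(\theta)} \, dy \, d\theta,
\qquad
d^{*} = \arg\max_{d} U(d)
```

Under this reading, the optimal design d* is the one whose outcome is expected to change the belief about θ the most.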
Threshold Estimation of Psychometric Function
Video Demo
Current Work: Hierarchical Extension of ADO
[Diagram: the ADO loop (Design Optimization -> Optimal Design -> Experiment -> Observed Outcome -> Bayesian Updating -> Posterior -> Next Trial), with the prior informed by knowledge from other participants' data]
• Typically, ADO starts with non-informative priors
• To achieve even greater efficiency, ADO can be extended to take advantage of data collected from other individuals in the same task
• Basic idea: Why not use the other individuals' data as an "informative" prior for a new individual?
Hierarchical Adaptive Design Optimization (HADO) (Kim, Pitt, Lu, Steyvers & Myung, 2014 NC)
• HADO combines the advantages of ADO and hierarchical Bayes modeling (HBM) to make judicious experimental designs from the very first trial.
1. ADO: Optimize based on responses from earlier trials of the current experiment
2. HBM: Utilize responses collected from other individuals from previously run experiments
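A rough sketch of how data from earlier participants could seed the prior for a new one. This is a simplification and an assumption: a full hierarchical Bayes model infers a posterior over the hyperparameters, whereas the hypothetical helper `informative_prior` below only fits a Gaussian to made-up point estimates.

```python
import numpy as np

def informative_prior(past_estimates, grid, min_sd=0.1):
    """Grid prior from a Gaussian fitted to earlier participants' estimates.

    Crude stand-in for the HBM step: it uses point estimates only and
    ignores uncertainty about the population mean and spread.
    """
    mu = np.mean(past_estimates)
    sd = max(np.std(past_estimates, ddof=1), min_sd)
    density = np.exp(-0.5 * ((grid - mu) / sd) ** 2)
    return density / density.sum()

def prior_sd(prior, grid):
    """Standard deviation of a discrete distribution on the grid."""
    mean = grid @ prior
    return float(np.sqrt(((grid - mean) ** 2) @ prior))

grid = np.linspace(-5, 5, 201)                    # parameter grid (assumed)
diffuse = np.full(grid.size, 1.0 / grid.size)     # what plain ADO starts from
past = np.array([0.8, 1.3, 0.9, 1.1, 1.4, 0.7, 1.0, 1.2])   # made-up estimates
hado_prior = informative_prior(past, grid)

# The informative prior is far more concentrated than the diffuse one,
# so design optimization can be selective from the very first trial.
print(prior_sd(diffuse, grid), prior_sd(hado_prior, grid))
```

Passing `hado_prior` instead of `diffuse` as the starting distribution of an ADO loop is what gives the new participant a head start.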
HADO Framework
[Diagram: inner loop, ADO for each individual (Prior -> Design Optimization -> Optimal Design -> Experiment -> Observed Outcome -> Bayesian Updating -> Posterior -> Next Trial); outer loop, the posterior of the (hyper-)parameters supplies the prior for the next individual]
HADO: Does It Work in Practice?
Empirical Validation of HADO in Adaptive Vision Testing
Testbed of HADO: Adaptive Estimation of Contrast Sensitivity Function (CSF)
(Lesmes, Lu, Baek & Albright, 2010 JOV)
[Figure: contrast sensitivity function. Axes: spatial frequency (x), contrast sensitivity (y); regions labeled Visible and Invisible]
CSF Parameterization
CSF modeled by 4 free parameters: θ = (δ, β, γ, f), reparametrized as (AULCSF, cutSF)
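A sketch of a four-parameter CSF, assuming the truncated log-parabola form commonly used in this literature. The mapping of the slide's symbols (γ peak gain, f peak frequency, β bandwidth in octaves, δ low-frequency truncation) is an assumption, as are the function name and the example values.

```python
import numpy as np

def log_csf(freq, peak_gain, peak_freq, bandwidth_oct, truncation):
    """log10 contrast sensitivity at spatial frequency freq (cycles/deg).

    Truncated log-parabola: a parabola in log-log coordinates whose
    low-frequency side is flattened at (peak - truncation).
    """
    freq = np.asarray(freq, dtype=float)
    half_width = 0.5 * bandwidth_oct * np.log10(2.0)   # half bandwidth, log10 units
    x = (np.log10(freq) - np.log10(peak_freq)) / half_width
    parabola = np.log10(peak_gain) - np.log10(2.0) * x ** 2
    floor = np.log10(peak_gain) - truncation           # low-frequency plateau
    below_peak = freq < peak_freq
    return np.where(below_peak & (parabola < floor), floor, parabola)

# Hypothetical observer: peak sensitivity 100 at 2 c/deg, 3-octave bandwidth
freqs = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
sensitivity = 10.0 ** log_csf(freqs, peak_gain=100.0, peak_freq=2.0,
                              bandwidth_oct=3.0, truncation=0.5)
```

With this parameterization the sensitivity falls to half its peak at half a bandwidth above the peak frequency, which gives a quick sanity check on any set of parameter values.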
[Figure: estimated CSFs for Individual 1, Individual 2, ..., Individual n, each obtained over Trial 1 to Trial t. Axes: spatial frequency (0.5 to 16), contrast sensitivity (1 to 100)]
Hierarchical Bayes Modeling of CSF Parameters
Parameter space ~ D(θ|η)
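Written out, a hierarchical model of this kind has the structure below. The indices and the hyperprior p(η) are assumed notation (i for individuals, t for trials, d for designs); only D(θ|η) appears on the slide.

```latex
\begin{aligned}
y_{i,t} &\sim p(y \mid \theta_i, d_{i,t}) && \text{response of individual } i \text{ on trial } t \\
\theta_i &\sim D(\theta \mid \eta) && \text{individual-level parameters} \\
\eta &\sim p(\eta) && \text{hyperprior} \\
p(\theta_{n+1} \mid y_{1:n}) &= \int D(\theta_{n+1} \mid \eta)\, p(\eta \mid y_{1:n})\, d\eta
  && \text{prior for the next individual}
\end{aligned}
```

The last line is the informative prior: the population distribution averaged over what the first n individuals' data say about η.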
HADO-based CSF Estimation
[Diagram: Individual model updating (Design Optimization -> Optimal Design -> Experiment -> Observed Outcome -> Next Trial) nested within hierarchical model updating. CSF estimates from Session 1, Session 2, ..., Session n (axes: spatial frequency 0.5 to 16, contrast sensitivity 1 to 100) feed the HBM, which supplies an informative prior for the next individual's ADO]
HADO Demonstration and Validation (Gu et al, 2016 JOV)
The benefits of HADO were demonstrated in a behavioral study in which CSFs were estimated with human participants:
• Phase I (Baseline Experiment): Collect data with which to build HADO-based informative priors (100 participants)
• Phase II (Two Validation Experiments): Demonstrate the superiority of HADO over ADO
Phase I (Baseline Experiment)
• ADO-based adaptive estimation of CSF with 100 participants
• Each participant is tested in three experimental conditions:
1. Normal: bare eyes
2. ND1: weak neutral density (wearing filtered goggles)
3. ND2: strong neutral density (wearing filtered goggles)
Phase II (Validation Experiment #1): How large must the sample behind the HADO prior be to yield a significant improvement over ADO?
HADO informative priors of different sample sizes (Normal condition):
Take-home:
• The clear advantage of HADO over ADO is demonstrated in both experiment and simulation.
• HADO seems to work well even with priors built from the data of a small number (5-12) of participants.
What happens if a wrong prior is specified?
• How well can HADO take advantage of group membership differences to improve parameter estimation of an individual from a designated group?
• How much of a cost in estimation is incurred if group membership is misspecified?
Phase II (Validation Experiment #2): Effects of Prior Misspecification
HADO priors from different experimental conditions:
[Figure: two panels titled "Contour of Probability Density Function", with AULCSF and cutSF as axes. Panel 1 legend: Sample_nor, Sample_nd1, Sample_nd2, Sample_mix. Panel 2 legend: Diffuse, Normal, Nd1, Nd2, Mixture]
• Again, an informative prior accelerates parameter estimation.
• While the greatest savings are obtained with a correctly specified prior, a mixture prior is likely the best choice in practice, given its robustness against prior misspecification.
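The robustness argument can be illustrated with a toy calculation. Everything here is assumed for illustration: one-dimensional Gaussian group priors, made-up group means and spreads, and an equal-weight mixture. The quantity compared is how much prior mass each choice places near the true value of a participant who actually belongs to the ND2 group.

```python
import numpy as np

def gaussian_on_grid(grid, mu, sd):
    """Normalized Gaussian weights on a discrete grid."""
    w = np.exp(-0.5 * ((grid - mu) / sd) ** 2)
    return w / w.sum()

grid = np.linspace(0, 6, 301)            # e.g. AULCSF values (hypothetical range)
group_priors = {                         # made-up group means and spreads
    "normal": gaussian_on_grid(grid, 2.0, 0.2),
    "nd1": gaussian_on_grid(grid, 1.5, 0.2),
    "nd2": gaussian_on_grid(grid, 1.0, 0.2),
}
# Equal-weight mixture of the three group priors
mixture = sum(group_priors.values()) / len(group_priors)

def mass_near(prior, value, half_width=0.2):
    """Prior probability within half_width of the given value."""
    return float(prior[np.abs(grid - value) <= half_width].sum())

true_value = 1.0                         # participant really belongs to ND2
correct = mass_near(group_priors["nd2"], true_value)
wrong = mass_near(group_priors["normal"], true_value)
mixed = mass_near(mixture, true_value)
print(f"correct={correct:.3f} mixture={mixed:.3f} wrong={wrong:.6f}")
```

The mixture gives up some mass relative to the correctly specified prior but avoids the near-zero mass of the misspecified one, which is the trade-off the take-home point describes.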
Conclusions
• HADO provides a judicious way to exploit two complementary schemes of inference (with past and current data) to achieve even greater accuracy and efficiency than the standard ADO.
• Other current & future work in lab
– HADO for cognitive neuroscience (e.g., fMRI)
– HADO for delay discounting
– HADO for psychiatric diagnosis (e.g., OCD)
– HADO for nano-materials science (e.g., CNTs)
Thank You