VIRTEX Session 11.30-12 - KU Leuven · 2014-12-12 · VIRTEX Session 11.30-12.50 Introduction to...
VIRTEX Session 11.30-12.50
Introduction to VIRTEX applets: Kenneth Portier
Demonstration of VIRTEX applets: Kenneth Portier, Eddie Schrevens, Luc Duchateau
Acknowledgements
• Funding and programming support for the development
of the VIRTEX applets.
– Flemish Ministry of Education grant
– KU Leuven grant
– Steve Dufresne
– Bart Jacobs
– Many others…..
Introduction to VIRTEX Applets
K. Portier, American Cancer Society
Leuven Statistics Days 2014
“Design of experiments and computer simulations in
statistical education”
in memory of the late Professor Paul Darius
KU Leuven, December 4-
Outline
• Focus of the VIRTual EXperimentation
(VIRTEX) applets (also known as ENV2EXP)
• Key operations to be learned
• General applet approach & teaching
aspects
• Brief discussion of the currently available
applets.
• Transition to demonstrations of 3 applets.
VIRTEX Focus
• Provide students with the experience of designing experiments.
– Using virtual experiments that mimic real situations.
• Student-designed experiments generate data that are transferred to a standard statistical package for analysis.
– Using simulation to support increased understanding of experimental design concepts.
– Using standard statistical packages for analysis.
• Develop critical thinking skills.
– Report writing and expressing results in words and statistics.
Key Operations
1. Select experimental units from available
Random sampling
Representative sampling
Uniform (controlled variance) sampling
2. Add a blocking and/or treatment factor
3. Delete a blocking and/or treatment factor
4. Rename a blocking and/or treatment factor level
5. Randomize/link treatments to units
6. Replicate treatments
7. Fractionate factorial treatments
Confound treatment effects
8. Split units
9. Group units
Model fitting.
• Parameter estimation
• Effect estimation
• Formal tests
Graphics.
Computing.
Reporting.
General Applet Approach
• Must mimic a real situation of interest to the user.
• Must engage the user in collecting data to answer a
research question.
• The situation simulated must have a large enough set
of factors & responses to support interesting
experiments and interesting discussion.
• The virtual experiment must generate data that comes
from a realistic model of a process that has stochastic
(probabilistic) components.
• The data must be communicated to the user in a way
that facilitates input into a standard statistical analysis
computer package (SAS, SPSS, R, Minitab, etc.)
Teaching Aspects
• Applet can be used for:
– In-class demonstration.
– Personal exercises.
– Testing comprehension.
• Applet should:
– Demonstrate a broad range of concepts.
– Support multiple research hypotheses.
• Data generated should support:
– Estimation/testing of effects on continuous responses.
– Estimation/testing of effects on categorical (binary, multinomial) responses.
– Response surface modeling: sequential experimentation for process optimization.
– Fixed, random and mixed model effect estimation.
Applets not demonstrated
• Bottles – Quality control sampling.
• Train Track – Environmental/spatial sampling.
• Tomato – Product quality/characteristic sampling.
Sampling Concepts: Bottles
• Bottling line at a producer of carbonated drinks.
• Sampling for product uniformity, quality control.
Learning Goal
• When to draw a sample from a process?
• How to incorporate what you know about a process to inform your sampling scheme?
[Screenshot: Bottles applet, with bottles coming off the production line; sampling by random selection or by sequential pattern.]
Environmental Sampling: Train Track
• Aerial photograph: A section of train track.
• Scenario: Suspected diesel spill.
• What is the average level of diesel in the top 15 cm of soil?
• Are there hot spots?
Learning Goal
• Experience spatial/environmental sampling.
• Sampling to estimate mean concentration.
• Sampling to find hot spots.
Train Track Applet: a simple, multiple-use applet prototype.
A flexible environment for creating other sampling scenarios.
Product Sampling: Tomato Applet
Scenario and Challenge
Sampling thickness of tomato skin
Practice sampling
Basic descriptive statistics
Descriptive graphics
Report writing
Computing practice
Applets Demonstrated
• Greenhouse – Experimentation on tomatoes in a traditional greenhouse.
• Factory – Experimentation via a pilot plant of a factory process with further experimentation in the full plant.
• Mastitis – Farm and animal within farm sampling in a trial of mastitis vaccine.
Web-based software environments for
virtual experimentation
Demonstration of Virtex applets
The ‘Greenhouse’ applet
Practicing basic experimental designs.
One to three blocking factors
One treatment with user-assigned levels
Greenhouse Applet
Demonstrates
o Treatment factor (nitrogen level).
o Two control factors (light and heat).
o One covariate (initial plant weight).
o All classical row-column blocking designs.
Learning Goals
o Native variability in experimental units.
o Need for blocking. Restrictions on randomization.
o Blocking vs covariate.
o Adequate replication & randomization.
o Decision making in experimental design.
Generating data
• Observational unit: tomato seedling (144 available in 12 trays of 12).
• Initial weight of seedling known.
– Available for blocking or covariate.
• Complex (TOMGROW) growth model capable of producing multiple responses.
– Total plant dry weight, leaf DW, root mass DW, etc.
– Tomato fruit yield, by quality classes.
– Leaf area.
• Responses are a function of growing period, light and heating sources.
Applet Utility
What specific concepts could this applet be used
to demonstrate?
1. Basic variability of the experimental unit.
2. Basic treatment comparisons.
• How many treatment levels? Pairwise comparisons or dose response?
• How much randomization?
3. Controlling for one extraneous source of variation.
• Blocking or covariate adjustment based on initial dry weight (IDW).
4. Controlling for two sources of variation.
• Blocking on (light & IDW) or (heat & IDW) or (light & heat) (Latin Squares).
5. Controlling for three sources of variation.
• Blocking on light & heat & IDW (Latin Squares).
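As an illustration of the row-column blocking mentioned above, the following is a minimal sketch of how a randomized Latin square could be laid out, for instance with rows as light positions and columns as heat positions. The function name and the treatment labels are hypothetical and are not taken from the applet; the procedure (randomly permuting rows, columns and labels of a cyclic square) does not sample uniformly from all possible Latin squares.

```python
import random

def random_latin_square(treatments, seed=None):
    """Return an n x n Latin square with randomly permuted rows, columns and labels."""
    rng = random.Random(seed)
    n = len(treatments)
    # Start from the cyclic square: cell (r, c) holds symbol (r + c) mod n.
    square = [[(r + c) % n for c in range(n)] for r in range(n)]
    rng.shuffle(square)                      # permute the rows
    cols = list(range(n))
    rng.shuffle(cols)                        # permute the columns
    square = [[row[c] for c in cols] for row in square]
    labels = list(treatments)
    rng.shuffle(labels)                      # permute the treatment labels
    return [[labels[v] for v in row] for row in square]

# Hypothetical example: four nitrogen levels labelled A to D.
square = random_latin_square(["A", "B", "C", "D"], seed=7)
for row in square:
    print(" ".join(row))
```

Each treatment then occurs exactly once in every row and every column, so both blocking factors are balanced against the treatment.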
Demonstration
1. Understanding variability in the
response of the experimental
unit.
2. Completely randomized design.
3. Randomized block design.
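The difference between the completely randomized design and the randomized block design can be sketched in a few lines. This is only an illustration under assumptions: the dose labels, the function names and the choice of trays as blocks are hypothetical, and the applet itself may randomize differently.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Assign treatments to all units at random, with equal replication."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    labels = list(treatments) * (len(units) // len(treatments))
    rng.shuffle(labels)
    return dict(zip(units, labels))

def randomized_block(blocks, treatments, seed=None):
    """Randomize treatments separately within each block."""
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError("block size must be a multiple of the number of treatments")
        labels = list(treatments) * (len(units) // len(treatments))
        rng.shuffle(labels)                  # restricted randomization: within block only
        for unit, label in zip(units, labels):
            plan[unit] = (block, label)
    return plan

# 12 trays of 12 seedlings; each seedling is identified by (tray, position).
trays = {t: [(t, p) for p in range(1, 13)] for t in range(1, 13)}
doses = ["N0", "N1", "N2", "N3"]             # hypothetical nitrogen levels
all_units = [u for units in trays.values() for u in units]

crd = completely_randomized(all_units, doses, seed=1)
rbd = randomized_block(trays, doses, seed=1)
```

In the completely randomized plan a dose may by chance be over-represented in some trays; in the blocked plan every tray receives each dose exactly three times.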
Entry Screen
• 12 trays of 12 tomato seedlings.
• Ability to group units and create blocks.
• Create treatments, assign doses/levels.
• Establish growing period.
• Move plants from input trays to greenhouse floor.
• Harvest at the end of the growing period (for now, plant dry mass).
Variability of Experimental Units
• Move seedlings from the input flats to the floor of the greenhouse.
• Set the growing period.
• Grow, then view output.
View Output
• Location on floor of greenhouse
• Grouping factor
• Treatment factor
• Pre and post weight
All one treatment allows estimation of mean and variance of response.
Completely Randomized Design
Output
Randomized Block Design
Output
The ‘Factory’ applet
Experimental optimization
Response surface methodologies
Response Surfaces: Factory Applet
• Real time simulation of a pilot plant.
• Raw materials storage for 10 runs.
• Each raw material batch slightly different
• Set temperature, time, concentration of plant run.
• Runs take time.
Learning Goal
• Blocking (raw material batches).
• Fractionation of factors (temperature, time, concentration).
• Experimentation to establish optimum (response surface).
Pilot Plant Experiments inform Production Plant
When should I change settings to improve production?
Response surface methodologies
Definition
Fitting relationships between explanatory/
independent variables and response/dependent
variables with ‘simple’ models (polynomials), based
on designed experiments (Box & Wilson, 1951).
Objective
Experimental optimization to obtain the optimal
operating conditions of processes.
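As a minimal sketch of the definition above, a second order polynomial can be fitted by least squares to data from a designed experiment, and its stationary point used as a candidate optimum. Everything numerical here is an assumption for illustration: the 3 x 3 design, the "true" coefficients and the noise level are hypothetical and do not come from the Factory applet.

```python
import numpy as np

def quadratic_terms(x1, x2):
    """Model matrix for a full second order model in two coded factors."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

rng = np.random.default_rng(0)

# 3 x 3 factorial in coded units, replicated twice (18 runs).
levels = np.array([-1.0, 0.0, 1.0])
x1, x2 = [a.ravel() for a in np.meshgrid(levels, levels)]
x1, x2 = np.tile(x1, 2), np.tile(x2, 2)

# Hypothetical true surface: b0, b1, b2, b12, b11, b22.
true_beta = np.array([50.0, 3.0, -2.0, 1.0, -4.0, -3.0])
y = quadratic_terms(x1, x2) @ true_beta + rng.normal(0.0, 0.5, x1.size)

# Least squares estimate of the polynomial coefficients.
beta_hat, *_ = np.linalg.lstsq(quadratic_terms(x1, x2), y, rcond=None)

# Stationary point: set the gradient of the fitted surface to zero.
B = np.array([[2 * beta_hat[4], beta_hat[3]],
              [beta_hat[3], 2 * beta_hat[5]]])
stationary = np.linalg.solve(B, -beta_hat[1:3])
print(beta_hat.round(2), stationary.round(2))
```

Because both fitted quadratic coefficients are negative here, the stationary point is a maximum; in general the eigenvalues of B should be checked before treating it as an optimum.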
Approaches
Experimental optimization
I. Sub-optimal operating conditions are known
• A first order design and model leads to consecutive small experiments: ‘method of steepest ascent’.
• A second order design and model needs a small number of large experiments.
II. No or not enough a priori knowledge about the process
• In the first steps, screening experiments are used to separate important components from less important ones over a relatively large experimental region, thus involving relatively large ranges for each factor. Mostly first order models.
• Once the key factors are determined, approach I can be followed to determine the region of interest.
First order models in RSM
First order models with or without interaction
terms, are appropriate in three situations:
1. Screening experiments to select the important factors out of a set of possible factors of influence.
2. Experiments over ranges so narrow that the expected effect on the response variable can be assumed to be linear. This approach is especially suitable in the ‘Method of Steepest Ascent’.
3. When the real model is known to be first order linear.
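The method of steepest ascent mentioned in point 2 can be sketched as follows: fit a first order model to a small factorial experiment, then step away from the current operating point along the fitted gradient. The yields below are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# 2^2 factorial in coded units around the current (sub-optimal) operating point.
X = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
y = np.array([54.3, 60.1, 56.8, 63.0])        # hypothetical yields

# Fit y = b0 + b1*x1 + b2*x2 by least squares.
A = np.column_stack([np.ones(len(X)), X])
b0, b1, b2 = np.linalg.lstsq(A, y, rcond=None)[0]

# The direction of steepest ascent is the gradient of the fitted plane.
gradient = np.array([b1, b2])
direction = gradient / np.linalg.norm(gradient)

# Candidate follow-up runs along the path, in coded units.
path = [step * direction for step in (0.5, 1.0, 1.5, 2.0)]
for point in path:
    print(point.round(3))
```

In practice runs are made along this path until the response stops improving, after which a new first order (or a second order) experiment is centred on the best point found.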
Designs for first order models
For k experimental factors, these models need experiments with at least 2 levels per factor and at least k+1 treatments.
For screening experiments with a large number of factors, Plackett-Burman and fractional factorial designs are most appropriate, with or without the possibility of estimating first-order interactions.
12-treatment Plackett-Burman design
Experimental factors
Treatment   X1  X2  X3  X4  X5  X6  X7  X8  X9  X10  X11
 1          +   +   -   +   +   +   -   -   -   +    -
 2          +   -   +   +   +   -   -   -   +   -    +
 3          -   +   +   +   -   -   -   +   -   +    +
 4          +   +   +   -   -   -   +   -   +   +    -
 5          +   +   -   -   -   +   -   +   +   -    +
 6          +   -   -   -   +   -   +   +   -   +    +
 7          -   -   -   +   -   +   +   -   +   +    +
 8          -   -   +   -   +   +   -   +   +   +    -
 9          -   +   -   +   +   -   +   +   +   -    -
10          +   -   +   +   -   +   +   +   -   -    -
11          -   +   +   -   +   +   +   -   -   -    +
12          -   -   -   -   -   -   -   -   -   -    -
- = low level (-1)
+ = high level (+1)
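A 12-run Plackett-Burman design of this kind can be generated by cyclically shifting a single generating row and appending a row of all low levels. The sketch below uses the first row of the table as generator and checks that all 11 columns are balanced and mutually orthogonal; the variable names are illustrative only.

```python
import numpy as np

# Generating row: treatment 1 of the 12-run design (+1 = high, -1 = low).
generator = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])

# Treatments 1 to 11 are successive left shifts of the generator;
# treatment 12 sets every factor to its low level.
rows = [np.roll(generator, -shift) for shift in range(11)]
rows.append(-np.ones(11, dtype=int))
pb12 = np.vstack(rows)

# Orthogonality check: X'X should equal 12 times the identity matrix.
gram = pb12.T @ pb12
print(pb12)
print(np.array_equal(gram, 12 * np.eye(11, dtype=int)))
```

Orthogonal, balanced columns are what allow up to 11 main effects to be estimated independently from only 12 treatments.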
Fractional factorial designs
• These designs are also fractions of 2^k full factorial designs, but only a limited number of fractions is allowed, because these designs emphasise ‘balance’ in estimating factor effects. In other words, the estimate of each main effect should consist of geometrically balanced differences between the responses measured at the high and low levels of that factor.
• The reduction in the number of treatments makes it impossible to estimate some high-order interactions, depending on the fraction chosen.
[Figure: a 2^3 factorial design and a ½ fraction of a 2^3 factorial design.]
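A ½ fraction of a 2^3 design can be sketched by keeping only the runs that satisfy a defining relation. The choice I = X1·X2·X3 below is an assumption made for illustration (the slide does not say which half is shown); it yields a balanced fraction in which each main effect is aliased with a two-factor interaction.

```python
from itertools import product

# Full 2^3 factorial: all 8 combinations of low (-1) and high (+1) levels.
full = list(product((-1, +1), repeat=3))

# Half fraction defined by I = X1*X2*X3 (assumed defining relation).
half = [run for run in full if run[0] * run[1] * run[2] == +1]

for x1, x2, x3 in half:
    # Within this fraction, X1 always equals X2*X3: the two effects are aliased.
    print(x1, x2, x3, "X2*X3 =", x2 * x3)
```

The four remaining runs still have each factor at its high level twice and at its low level twice, which is the ‘balance’ property the fraction must preserve.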
Treatment   X1   X2   X3   X4   X5   X6   X7   X8
 1          -1   -1   -1   +1   +1   +1   -1   +1
 2          +1   -1   -1   -1   -1   +1   +1   +1
 3          -1   +1   -1   -1   +1   -1   +1   +1
 4          +1   +1   -1   +1   -1   -1   -1   +1
 5          -1   -1   +1   +1   -1   -1   +1   +1
 6          +1   -1   +1   -1   +1   -1   -1   +1
 7          -1   +1   +1   -1   -1   +1   -1   +1
 8          +1   +1   +1   +1   +1   +1   +1   +1
 9          +1   +1   +1   -1   -1   -1   +1   -1
10          -1   +1   +1   +1   +1   -1   -1   -1
11          +1   -1   +1   +1   -1   +1   -1   -1
12          -1   -1   +1   -1   +1   +1   +1   -1
13          +1   +1   -1   -1   +1   +1   -1   -1
14          -1   +1   -1   +1   -1   +1   +1   -1
15          +1   -1   -1   +1   +1   -1   +1   -1
16          -1   -1   -1   -1   -1   -1   -1   -1
16-treatment fractional factorial screening designs to fit a first order model in 6, 7 and 8 factors
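One standard way to obtain a 16-treatment design in 8 factors is to build a saturated 8-run design and then add its mirror image (fold-over). The sketch below assumes the generators X4 = X1·X2, X5 = X1·X3, X6 = X2·X3 and X7 = X1·X2·X3, which is what the first eight treatments of the table appear to follow; this reading is an inference, not something stated on the slide.

```python
import numpy as np
from itertools import product

# Base 2^3 factorial in X1, X2, X3, with X1 changing fastest.
base = np.array([(x1, x2, x3) for x3, x2, x1 in product((-1, 1), repeat=3)])
x1, x2, x3 = base.T

# Treatments 1 to 8: added factors are products of the base columns; X8 is held at +1.
half = np.column_stack([x1, x2, x3,
                        x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3,
                        np.ones(8, dtype=int)])

# Treatments 9 to 16: the fold-over, with every sign reversed.
design = np.vstack([half, -half])

# All 8 columns are balanced and mutually orthogonal.
print(np.array_equal(design.T @ design, 16 * np.eye(8, dtype=int)))
```

Folding over separates the main effects from the two-factor interactions, which is why such a design is suitable for fitting a first order model in 6, 7 or 8 factors.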
Designs for first order RSM, without interaction

Fractional factorials of 2^k
Number of factors   Possible design
<4                  Full factorial
4                   1/2 fraction of 2^4: 8 treatments
5                   1/4 fraction of 2^5: 8 treatments
6                   1/4 fraction of 2^6: 16 treatments
7                   1/8 fraction of 2^7: 16 treatments
8                   1/16 fraction of 2^8: 16 treatments

Plackett-Burman
Number of factors   Possible design
4 to <7             12 treatments (fractional factorial is better)
7 – 15              20 treatments
13 – 23             28 treatments
Designs for first order RSM, with interaction

Fractional factorials of 2^k (larger fraction)
Number of factors   Possible design
<5                  Full factorial
5                   1/2 fraction of 2^5: 16 treatments
6                   1/2 fraction of 2^6: 32 treatments
7                   1/2 fraction of 2^7: 64 treatments
8                   1/4 fraction of 2^8: 64 treatments

Reflected Plackett-Burman
Number of factors   Possible design
<7                  24 treatments (fractional factorial is better)
7 – 15              40 treatments
13 – 23             56 treatments
Designs for second order models
• The Box-Behnken and Central Composite designs are most commonly used. Typically, these designs are appropriate for second order models in two to eight factors.
• If more than eight factors are involved, these designs too become impractically large. In that case a preliminary screening experiment (first order model) is used to reduce the number of experimental factors.
3-factor Box-Behnken designs
• 12 centroids on the perimeter
• Centre point replicated 3 times
• 15 treatments
Box-Behnken designs are subsets of 3^k factorial designs.
3-factor Box-Behnken design
Treatment   x1   x2   x3
 1          +1   +1    0
 2          +1   -1    0
 3          -1   +1    0
 4          -1   -1    0
 5          +1    0   +1
 6          +1    0   -1
 7          -1    0   +1
 8          -1    0   -1
 9           0   +1   +1
10           0   +1   -1
11           0   -1   +1
12           0   -1   -1
13           0    0    0
14           0    0    0
15           0    0    0
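The structure of the 3-factor design can be sketched in code: every pair of factors is run through a 2^2 factorial while the remaining factor is held at its centre level, and replicated centre points are appended. The function name is hypothetical, and this pairwise construction only matches the usual Box-Behnken designs for a small number of factors (3 to 5); the designs for 6 and 7 factors vary three factors at a time.

```python
from itertools import combinations, product

def box_behnken(k, n_center):
    """Pairwise Box-Behnken construction in coded units (valid for k = 3, 4, 5)."""
    runs = []
    for i, j in combinations(range(k), 2):
        # 2^2 factorial in factors i and j, all other factors at the centre level 0.
        for a, b in product((+1, -1), repeat=2):
            run = [0] * k
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs.extend([(0,) * k] * n_center)       # replicated centre points
    return runs

design = box_behnken(3, 3)
for number, run in enumerate(design, start=1):
    print(number, run)
```

For three factors this gives 3 pairs x 4 runs = 12 perimeter points plus 3 centre points, 15 treatments in total.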
Box-Behnken designs are practical for 3 to 7 experimental factors.
Box-Behnken designs
Number of   Number of   Replication of     Total number of
factors     centroids   the centre point   treatments
3           12          3                  15
4           24          3                  27
5           40          6                  46
6           48          6                  54
7           56          6                  62
3-factor Central Composite designs
[Figure: 3-factor central composite design showing factorial points, star points and 6 centre point replicates; 20 treatments in total.]
These designs consist of a 2^k full factorial or a fractional factorial, augmented with 2k star points and n_c centre points.
3-factor Central Composite design
Treatment   X1           X2           X3
2^3 full factorial
 1           1.00         1.00        -1.00
 2          -1.00         1.00        -1.00
 3           1.00        -1.00        -1.00
 4          -1.00        -1.00        -1.00
 5           1.00         1.00         1.00
 6          -1.00         1.00         1.00
 7           1.00        -1.00         1.00
 8          -1.00        -1.00         1.00
Star points
 9           1.68 (α)     0.00         0.00
10          -1.68 (-α)    0.00         0.00
11           0.00         1.68 (α)     0.00
12           0.00        -1.68 (-α)    0.00
13           0.00         0.00         1.68 (α)
14           0.00         0.00        -1.68 (-α)
Centre points
15           0.00         0.00         0.00
16           0.00         0.00         0.00
17           0.00         0.00         0.00
18           0.00         0.00         0.00
19           0.00         0.00         0.00
20           0.00         0.00         0.00
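The three building blocks of a central composite design can be assembled as in the sketch below. It assumes the rotatable choice of the star distance, α = (number of factorial points)^(1/4), which gives 1.68 for three factors; the function name is hypothetical and only the full-factorial case is covered.

```python
from itertools import product

def central_composite(k, n_center):
    """Full-factorial central composite design in coded units (rotatable alpha assumed)."""
    alpha = (2 ** k) ** 0.25                 # rotatable star distance: F**(1/4), F = 2^k
    factorial = [tuple(float(v) for v in run) for run in product((-1, 1), repeat=k)]
    star = []
    for i in range(k):
        for sign in (+1, -1):
            point = [0.0] * k
            point[i] = sign * alpha          # one factor at +/- alpha, the rest at 0
            star.append(tuple(point))
    center = [(0.0,) * k] * n_center         # replicated centre points
    return alpha, factorial + star + center

alpha, design = central_composite(3, 6)
print(round(alpha, 2), len(design))
```

For k = 3 this yields 8 factorial points, 6 star points and 6 centre points, 20 treatments in total.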
Central Composite designs
Number of   Full or fractional   Value    Number of    Number of     Number of replicated   Total number
factors     factorial            for α    factorial    star points   centre points          of treatments
                                          treatments
3           full                 1.68       8           6             6                      20
4           full                 2         16           8             6                      30
5           full                 2.38      32          10             8                      50
5 (1/2)     ½ replication        2         16          10             8                      34
6           full                 2.83      64          12            10                      86
6 (1/2)     ½ replication        2.38      32          12            10                      54
7           full                 3.36     128          14            10                     152
7 (1/2)     ½ replication        2.83      64          14            10                      88
The ‘Factory’ virtual experimentation
environment
[Screenshot: the pilot plant and the production plant.]
Description of the Factory applet
• The user has to experiment with a pilot plant to find
optimized settings for the parameters of an industrial
production process.
• The experiment runs in real time (39 weeks = 120
minutes).
• The raw material for the pilot plant is stored in a tank,
which can contain enough material for 10 trials. When
the tank is empty (or upon request of the user) it will be
refilled, but the new raw material may have slightly
different characteristics.
• The temperature, reaction time and concentration are
the controlling factors of the yield of the process.
• Each experiment takes time!
• It takes 6 weeks to implement new optimized settings in
the production plant.
• The Profit window gives an overview of the current
situation of costs and benefits.
• At the end of the 39 weeks, the balance should be
positive and as large as possible.
The ‘Factory’ virtual experimentation
environment
Underlying model
Response surface model with added noise
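The slide gives only the form of the underlying model, so the following is a purely illustrative sketch of a response surface with added noise and a raw-material batch effect that changes whenever the tank is refilled after 10 trials. The coefficients, noise levels and function name are all hypothetical and are not those of the Factory applet.

```python
import random

def simulate_pilot_runs(settings, runs_per_tank=10, seed=None):
    """Simulate pilot plant yields for a list of (temp, time, conc) settings in coded units."""
    rng = random.Random(seed)
    results = []
    batch_effect = 0.0
    for index, (temp, time, conc) in enumerate(settings):
        if index % runs_per_tank == 0:
            # New tank of raw material: draw a new (hypothetical) batch effect.
            batch_effect = rng.gauss(0.0, 0.5)
        # Hypothetical quadratic surface with its maximum at (0.5, -0.3, 0.2).
        surface = (80.0 - 4.0 * (temp - 0.5) ** 2
                        - 3.0 * (time + 0.3) ** 2
                        - 2.0 * (conc - 0.2) ** 2)
        noise = rng.gauss(0.0, 1.0)          # run-to-run noise
        results.append({"tank": index // runs_per_tank + 1,
                        "yield": surface + batch_effect + noise})
    return results

# Twelve runs at the centre of the coded region: the 11th run starts a new tank.
settings = [(0.0, 0.0, 0.0)] * 12
runs = simulate_pilot_runs(settings, seed=3)
for run in runs:
    print(run["tank"], round(run["yield"], 2))
```

Because runs from the same tank share a batch effect, the tank is a natural blocking factor when an experiment spans more than 10 trials.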
History
The Factory applet was originally developed for and in
cooperation with Unilever
The learning experience
• All types of designs in Response Surface Methodology
can be tried out
• All types of one-factor-at-a-time experimentation can be
implemented
• Lucky shot strategies can be compared with systematic
designed approaches
• Additional problems, for which textbooks offer no
immediate solution
– When are the pilot plant results sufficiently convincing to justify the
cost for changing the production plant settings?
– Should one perform one large experiment and then decide, or do
many very small experiments with a possibility to make an
adjustment each time, or do something in between?
[Figure: ‘one-factor-at-a-time’ approach compared with the factorial approach.]
Method of steepest ascent
[Figure: starting from sub-optimal operating conditions, a 2^2 factorial design is used to find the direction of steepest ascent.]
The learning experience
• Choice of factors: obvious, but is the tank a block?
• Choice of levels: consequences
• Relation pilot plant results - factory results
• Confidence in a result: when to change factory settings?
• Sequential experimentation vs one-shot (or anything in
between)
• Cost of an experiment: power vs precision
• Need for randomization (hidden time trend)
• Time pressure
Demonstration
The ‘Mastitis’ applet
Animal Experimentation.
Random effects – Within and Between
Sampling Variation
Experimental Design: Vaccine Trial
• Simulation of the effect of a vaccine for E. coli mastitis (udder infection in dairy cows).
• Raw material: all cows from multiple farms (select which for treatment).
• Dose for each treated cow.
• Animal and Farm factors have impact on vaccine effect.
Learning Goal
• Demonstrate effects of proper/improper randomization.
• Two sources of variability (animal and farm).
• Treatment (vaccine dose).
• Goal: estimate variance components, reduce error variance.
Vaccine Applet
[Screenshot: 10 farms with cows nested within farms; treatment options are no vaccine or vaccine at low or high dose; randomization.]
Zebras, zebus, and holsteins
Effect of infusion dose E. coli
on milk reduction at 48 hours
Increase of Escherichia coli Inoculum Doses Induces Faster
Innate Immune Response in Primiparous Cows (2004).
F. Vangroenweghe, P. Rainard, M. Paape, L. Duchateau, C. Burvenich.
JDS 84, 4132-4144.
Effects of infusion dose and vaccine
Experimental units and randomisation
Data Generation
• Observational unit = cow
• Initial milk from real dataset on 10 farms, $s_{i(j)}$
• Model for milk reduction 48 h after infusion
$r_{i(j)} = \mu + f_j + \pi_{i(j)} + \gamma_{1i(j)} + \gamma_{2i(j)} + \delta_{i(j)} + c_{i(j)}$
• Milk production 48 h after infusion
$m_{i(j)} = s_{i(j)} \left(1 - r_{i(j)}\right)$
Values annotated on the model terms: 0.6, -0.3, -0.15, -0.3, 0; ~N(0, 0.006), ~N(0, 0.008)
Output/Response
• Feedback on the randomisation and on the
design proposed/used.
• Dataset on which an analysis can be
performed.
Examples
1. Bad: randomisation at farm level with 2 farms
2. Acceptable: one farm, randomising at cow level
3. Optimal: randomised complete block design
4. More complex: split plot design
Demonstration
Student Experiences at KUL
Over the last two years, students evaluated the
greenhouse applet as a practical exercise tool
for a basic design of experiments course.
Six questions were put to 128 students.
Magnitude estimation on a scale from 0 (I agree) to 100 (I do not agree).
Experiences at KUL
Question 1
With this applet I learned things whose importance I did not realize after the classroom teaching.
[Chart: responses to Question 1, on a scale from Agree to Do Not Agree.]
Experiences at KUL
Question 2
Obtaining the data with the applet was more difficult than analysing it with the statistics software.
[Chart: responses to Question 2.]
Experiences at KUL
Question 3
The applet shows a reasonable image of a real situation as it occurs in practice.
[Chart: responses to Question 3.]
Experiences at KUL
Question 4
The applet is sufficiently user friendly: it was not very difficult to specify the experiments precisely as I wanted.
[Chart: responses to Question 4.]
Experiences at KUL
Question 5
The practical exercise required too much of my time, compared to the things I learned from it.
[Chart: responses to Question 5.]
Experiences at KUL
Question 6
The experimental situation shown in the applet was needlessly complex.
[Chart: responses to Question 6.]
Conclusions
• Realistic/useful applets can be created.
• Applets can engage students in the learning of key
experimental design concepts beyond textbook
exercises.
• Need for easier-to-maintain programming
environments (Java engine upgrades).
• Need for richer data generating models/simulations
(full TOMGROW).
• Need to carefully integrate applet-based exercises
with traditional teaching methodologies.
• Need for an experimental design textbook that takes
the applet approach as its backbone.
VIRTEX index page
Available soon from:
http://www.biw.kuleuven.be/dtp/TQM/software/software.htm
Thank You