VIRTEX Session 11.30-12 - KU Leuven · 2014-12-12 · VIRTEX Session 11.30-12.50 Introduction to...

VIRTEX Session 11.30-12.50

Introduction to Virtex applets: Kenneth Portier

Demonstration of Virtex applets: Kenneth Portier, Eddie Schrevens, Luc Duchateau

Acknowledgements

• Funding and programming support for the development

of the VIRTEX applets.

– Flemish Ministry of Education grant

– KU Leuven grant

– Steve Dufresne

– Bart Jacobs

– Many others…

Introduction to VIRTEX Applets

K. Portier, American Cancer Society

Leuven Statistics Days 2014

“Design of experiments and computer simulations in

statistical education”

in memory of the late Professor Paul Darius

KU Leuven, December 4-

Outline

• Focus of the VIRTual EXperimentation

(VIRTEX) applets (aka – ENV2EXP)

• Key operations to be learned

• General applet approach & teaching

aspects

• Brief discussion of the currently available

applets.

• Transition to demonstrations of 3 applets.

VIRTEX Focus

• Provide students the experience of designing experiments.

– Using virtual experiments that mimic real situations.

• Student-designed experiments generate data that are transferred to a standard statistical package for analysis.

– Using simulation to support increased understanding of experimental design concepts.

– Using standard statistical packages for analysis.

• Develop critical thinking skills.

– Report writing & expressing results in words and statistics.

Key Operations

1. Select experimental units from available

Random sampling

Representative sampling

Uniform (controlled variance) sampling

2. Add a blocking and/or treatment factor

3. Delete a blocking and/or treatment factor

4. Rename a blocking and/or treatment factor level

5. Randomize/link treatments to units

6. Replicate treatments

7. Fractionate factorial treatments

Confound treatment effects

8. Split units

9. Group units

Model fitting.

• Parameter estimation

• Effect estimation

• Formal tests

Graphics.

Computing.

Reporting.

General Applet Approach

• Must mimic a real situation of interest to the user.

• Must engage the user in collecting data to answer a

research question.

• The situation simulated must have a large enough set

of factors & responses to support interesting

experiments and interesting discussion.

• The virtual experiment must generate data that comes

from a realistic model of a process that has stochastic

(probabilistic) components.

• The data must be communicated to the user in a way

that facilitates input into a standard statistical analysis

computer package (SAS, SPSS, R-Stat, MiniTab, etc.)

Teaching Aspects

• Applet can be used for:

– In-class demonstration.

– Personal exercises.

– Testing comprehension.

• Applet should:

– Demonstrate a broad range of concepts.

– Support multiple research hypotheses.

• Data generated should support:

– Estimation/testing of continuous response effects.

– Estimation/testing of categorical (binary, multinomial) responses.

– Response surface modeling – sequential experimentation for

process optimization.

– Fixed, random and mixed model effect estimation.

Applets not demonstrated

• Bottles – Quality control sampling.

• Train Track – Environmental/spatial sampling.

• Tomato – Product quality/characteristic sampling.

Sampling Concepts: Bottles

• Bottling line at a producer of carbonated drinks.

• Sampling for product uniformity, quality control.

• When to draw a sample from a process?

• How to incorporate what you know about a process to

inform your sampling scheme?

Learning Goal

Bottles Applet

Bottles coming off production line

Random selection.

Sequential pattern.

Environmental Sampling: Train Track

• Aerial photograph: A section of train track.

• Scenario: Suspected diesel spill.

• What is the average level of diesel in the top 15 cm of soil?

• Are there hot spots?

• Experience spatial/environmental sampling.

• Sampling to estimate mean concentration.

• Sampling to find hot spots.

Learning Goal

A flexible environment for creating other sampling scenarios.

Train Track Applet

Simple multiple-use applet prototype.

Product Sampling: Tomato Applet

Scenario and Challenge

Sampling thickness of tomato skin

Practice sampling

Basic descriptive statistics

Descriptive graphics

Report writing

Computing practice

Applets Demonstrated

• Greenhouse – Experimentation on tomatoes in a traditional greenhouse.

• Factory – Experimentation via a pilot plant of a factory process with further experimentation in the full plant.

• Mastitis – Farm and animal within farm sampling in a trial of mastitis vaccine.

Web-based software environments for

virtual experimentation

Demonstration of Virtex applets

The ‘Greenhouse’ applet

Practicing basic experimental designs.

One to three blocking factors

One treatment with user-assigned levels

Greenhouse Applet

Demonstrates

o Treatment factor (nitrogen level).

o Two control factors (light and heat).

o One covariate (initial plant weight)

o All classical row-col blocking designs

o Native variability in experimental units.

o Need for blocking. Restrictions on randomization.

o Blocking vs Covariate.

o Adequate replication & randomization.

o Decision making in experimental design.

Learning Goals


Generating data

• Observational unit – tomato seedling (144 available in 12 trays of 12)

• Initial weight of seedling known.

– Available for blocking or covariate.

• Complex (TOMGROW) growth model capable of producing multiple responses.

– Total plant dry weight, leaf DW, root mass DW, etc.

– Tomato fruit yield, by quality classes.

– Leaf area.

• Responses a function of growing period, light and heating sources.

Applet Utility

What specific concepts could this applet be used

to demonstrate?

1. Basic variability of the experimental unit.

2. Basic treatment comparisons.

• How many treatment levels? Pairwise comparisons or dose response?

• How much randomization?

3. Controlling for one extraneous source of variation.

• Blocking or covariate adjustment based on initial weight (IDW).

4. Controlling for two sources of variation.

• Blocking on (light & IDW) or (heat & IDW) or (light & heat) (Latin Squares).

5. Controlling for three sources of variation.

• Blocking on light & heat & IDW (Graeco-Latin Squares).
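The difference between "how much randomization" in a completely randomized design and in a randomized block design can be sketched in a few lines. This is only an illustration, not the applet's own code: the treatment labels N0, N1, N2 are hypothetical nitrogen levels, and using the 12 trays as blocks is an assumption made for the example (the applet also allows blocking on initial weight, light or heat).

```python
import random

def completely_randomized(units, treatments, seed=0):
    """Assign treatments to all units in one unrestricted shuffle."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    reps = len(units) // len(treatments)
    labels = [t for t in treatments for _ in range(reps)]
    rng.shuffle(labels)
    return dict(zip(units, labels))

def randomized_blocks(blocks, treatments, seed=0):
    """Shuffle treatments separately inside each block (restricted randomization)."""
    rng = random.Random(seed)
    plan = {}
    for block_id, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError("block size must be a multiple of the number of treatments")
        reps = len(units) // len(treatments)
        labels = [t for t in treatments for _ in range(reps)]
        rng.shuffle(labels)
        for unit, label in zip(units, labels):
            plan[unit] = (block_id, label)
    return plan

# 144 seedlings in 12 trays of 12, identified by (tray, position).
seedlings = [(tray, pos) for tray in range(12) for pos in range(12)]
treatments = ["N0", "N1", "N2"]  # hypothetical nitrogen levels

crd = completely_randomized(seedlings, treatments)

# Assumption for this sketch: each tray is one block.
blocks = {tray: [(tray, pos) for pos in range(12)] for tray in range(12)}
rcbd = randomized_blocks(blocks, treatments)
```

In the completely randomized plan a tray may receive a treatment more often than others by chance; in the blocked plan every tray holds each treatment exactly four times.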

Demonstration

1. Understanding variability in the

response of the experimental

unit.

2. Completely randomized design.

3. Randomized block design.


Entry Screen

Ability to group units, create blocks.

12 trays of 12 tomato seedlings.

Create treatments, assign doses/levels.

Establish growing period.

Harvest at the end of the growing period (for now plant dry mass).

Move plants from input trays to greenhouse floor.

Variability of Experimental Units

Move seedlings from the input flats to the floor of the greenhouse.

Set the growing period.

Grow, then View Output.

View Output

Output columns: location on floor of greenhouse, grouping factor, treatment factor, pre and post weight.

Assigning all units to one treatment allows estimation of the mean and variance of the response.

Completely Randomized Design

Output

Randomized Block Design

Output




The ‘Factory’ applet

Experimental optimization

Response surface methodologies

Response Surfaces: Factory Applet

• Real time simulation of a pilot plant.

• Raw materials storage for 10 runs.

• Each raw material batch slightly different

• Set temperature, time, concentration of plant run.

• Runs take time.

• Blocking (raw material batches).

• Fractionation of factors (temperature, time, concentration).

• Experimentation to establish optimum (response surface).

Learning Goal

Pilot Plant Experiments inform Production Plant

When should I change settings to improve production?

Response surface methodologies

Definition

Fitting relationships between explanatory/

independent variables and response/dependent

variables with ‘simple’ models (polynomials), based

on designed experiments (Box & Wilson, 1951).

Objective

Experimental optimization to obtain the optimal

operating conditions of processes.

Approaches

Experimental optimization

I. Sub-optimal operating conditions are known

• A first order design and model leads to consecutive small experiments: ‘method of steepest ascent’.

• A second order design and model needs a small number of large experiments.

II. No or not enough a priori knowledge about the process

• In the first steps screening experiments are used to separate important components from less important ones over a relatively large experimental region, thus involving relatively large ranges for each factor. Mostly first order models.

• Once the key factors are determined, the approach under I can be followed to determine the region of interest.

First order models in RSM

First order models, with or without interaction terms, are appropriate in three situations:

1. Screening experiments to select the important factors out of a set of possibly influential factors.

2. Experiments over ranges so narrow that the effect on the response variable can be assumed to be linear. This approach is especially suitable in the ‘Method of Steepest Ascent’.

3. When the real model is known to be first order linear.
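As a rough illustration of how a first order model feeds the method of steepest ascent, the sketch below fits the model by least squares and steps along the direction of the fitted slopes. The yields and the helper names `fit_first_order` and `steepest_ascent_path` are hypothetical, and everything is in coded units; it is not taken from the applets.

```python
import numpy as np

def fit_first_order(X, y):
    """Least-squares fit of y = b0 + sum_i b_i * x_i (coded factor levels)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1:]

def steepest_ascent_path(slopes, n_steps=5, step=1.0):
    """Points along the direction of the fitted slopes, starting from the design centre."""
    norm = np.linalg.norm(slopes)
    if norm == 0:
        raise ValueError("all slopes are zero: no ascent direction")
    direction = slopes / norm
    return np.array([i * step * direction for i in range(1, n_steps + 1)])

# 2^2 factorial around the current operating conditions, hypothetical yields.
X = [[-1, -1], [1, -1], [-1, 1], [1, 1]]
y = [60.0, 64.0, 62.0, 66.0]

b0, slopes = fit_first_order(X, y)
path = steepest_ascent_path(slopes, n_steps=3)
```

Each row of `path` is a candidate setting for the next small experiment; in practice one keeps moving until the response stops improving and then fits a new design there.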

Designs for first order models

For k experimental factors, these models need at least 2 levels per factor and at least k+1 treatments.
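The k+1 count follows from the number of parameters in the first order model without interaction. Written out (the symbols β, x and ε are standard notation, not taken from the slides):

```latex
y = \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \varepsilon ,
\qquad
p = k + 1 \ \text{parameters}
\;\Rightarrow\;
N \ge k + 1 \ \text{distinct treatments.}
```

With exactly k+1 treatments the design is saturated: all parameters can be estimated, but no degrees of freedom remain for estimating error unless runs are replicated.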

For screening experiments with a large number of factors, Plackett-Burman and fractional factorial designs are most appropriate, with or without the possibility of estimating first order interactions.

12-treatment Plackett-Burman design

Experimental factors

Treatment   X1  X2  X3  X4  X5  X6  X7  X8  X9  X10  X11
 1          +   +   -   +   +   +   -   -   -   +    -
 2          +   -   +   +   +   -   -   -   +   -    +
 3          -   +   +   +   -   -   -   +   -   +    +
 4          +   +   +   -   -   -   +   -   +   +    -
 5          +   +   -   -   -   +   -   +   +   -    +
 6          +   -   -   -   +   -   +   +   -   +    +
 7          -   -   -   +   -   +   +   -   +   +    +
 8          -   -   +   -   +   +   -   +   +   +    -
 9          -   +   -   +   +   -   +   +   +   -    -
10          +   -   +   +   -   +   +   +   -   -    -
11          -   +   +   -   +   +   +   -   -   -    +
12          -   -   -   -   -   -   -   -   -   -    -

- : low level (-1)

+ : high level (+1)
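The table above has a simple structure: treatments 2 to 11 are cyclic left shifts of treatment 1, and treatment 12 sets every factor low. A minimal sketch that rebuilds the design from its first row and checks that the columns are balanced and mutually orthogonal (the function name is ours, chosen for the example):

```python
import numpy as np

# First row of the 12-treatment design, copied from the table above.
GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]

def plackett_burman_12():
    """Rebuild the 12 x 11 design: 11 cyclic left shifts plus an all-low row."""
    g = np.array(GENERATOR)
    rows = [np.roll(g, -shift) for shift in range(11)]
    rows.append(-np.ones(11, dtype=int))  # treatment 12: every factor at -1
    return np.array(rows)

design = plackett_burman_12()

# Orthogonality check: X'X should be 12 times the identity matrix.
gram = design.T @ design
column_sums = design.sum(axis=0)
```

Orthogonal, balanced columns are what allow each main effect to be estimated independently of the others from only 12 runs.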

Fractional factorial designs

• These designs are also fractions of 2^k full factorial designs, but only a limited number of fractions is allowed because these designs emphasise ‘balance’ in estimating factor effects. In other words, the estimate of each main effect should consist of geometrically balanced differences between responses measured at the high and low factor levels.

• The reduction in the number of treatments makes it impossible to estimate high order interactions; which ones are lost depends on the fraction chosen.
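A small sketch makes the price of fractionation concrete. It takes the half fraction of a 2^3 design defined by x1·x2·x3 = +1 (one of the two possible half fractions; the choice is ours for illustration) and shows that the design stays balanced while x3 becomes indistinguishable from the x1·x2 interaction.

```python
from itertools import product

def half_fraction_2_3():
    """Runs of the 2^3 factorial that satisfy x1 * x2 * x3 = +1."""
    return [run for run in product((-1, 1), repeat=3)
            if run[0] * run[1] * run[2] == 1]

runs = half_fraction_2_3()

# Balance: every factor is at -1 and +1 equally often.
balanced = all(sum(run[i] for run in runs) == 0 for i in range(3))

# Aliasing: in every retained run the x3 column equals the x1*x2 column,
# so the main effect of x3 cannot be separated from that interaction.
aliased = all(x3 == x1 * x2 for x1, x2, x3 in runs)
```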

[Figure: a 2^3 factorial design and a ½ fraction of a 2^3 factorial design]

Treatment   X1   X2   X3   X4   X5   X6   X7   X8
 1          -1   -1   -1   +1   +1   +1   -1   +1
 2          +1   -1   -1   -1   -1   +1   +1   +1
 3          -1   +1   -1   -1   +1   -1   +1   +1
 4          +1   +1   -1   +1   -1   -1   -1   +1
 5          -1   -1   +1   +1   -1   -1   +1   +1
 6          +1   -1   +1   -1   +1   -1   -1   +1
 7          -1   +1   +1   -1   -1   +1   -1   +1
 8          +1   +1   +1   +1   +1   +1   +1   +1
 9          +1   +1   +1   -1   -1   -1   +1   -1
10          -1   +1   +1   +1   +1   -1   -1   -1
11          +1   -1   +1   +1   -1   +1   -1   -1
12          -1   -1   +1   -1   +1   +1   +1   -1
13          +1   +1   -1   -1   +1   +1   -1   -1
14          -1   +1   -1   +1   +1   +1   +1   -1
15          +1   -1   -1   +1   +1   -1   +1   -1
16          -1   -1   -1   -1   -1   -1   -1   -1

16-treatment fractional factorial screening designs to fit a first order model in 6, 7 and 8 factors
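One way to reproduce the 16-treatment table above, offered as a sketch of its structure rather than as the authors' construction: in treatments 1 to 8, X1 to X3 form a 2^3 full factorial, X4 to X7 equal the products X1·X2, X1·X3, X2·X3 and X1·X2·X3, and X8 is constant; treatments 9 to 16 are the same rows with every sign reversed (a fold-over).

```python
from itertools import product

import numpy as np

def folded_16_run_design():
    """Eight base runs with product columns, followed by their sign-reversed mirror."""
    # 2^3 full factorial with X1 changing fastest, as in the table.
    base = np.array([(x1, x2, x3) for x3, x2, x1 in product((-1, 1), repeat=3)])
    x1, x2, x3 = base.T
    half = np.column_stack([
        x1, x2, x3,
        x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3,
        np.ones(8, dtype=int),
    ])
    return np.vstack([half, -half])

design = folded_16_run_design()
gram = design.T @ design  # 16 * identity if all eight columns are orthogonal
```

For 6 or 7 factors one would use only the first 6 or 7 columns. The fold-over step is what keeps main effects clear of two-factor interactions in a design of this kind.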

Designs for first order RSM, without interaction

Fractional factorials of 2^k

Number of factors   Possible design
<4                  Full factorial
4                   ½ fraction of 2^4: 8 treatments
5                   1/4 fraction of 2^5: 8 treatments
6
7
8

Plackett-Burman

Number of exp factors   Possible design
4 < 7                   12 treatments (fractional factorial is better)
7 – 15                  20 treatments
13 – 23                 28 treatments

Designs for first order RSM, with interaction

Larger fraction: fractional factorials of 2^k

Number of factors   Possible design
<5                  Full factorial
5                   1/2 fraction of 2^5: 16 treatments
6                   treatments
7                   treatments
8                   treatments

Reflected Plackett-Burman

Number of factors   Possible design
<7                  24 treatments, fractional factorial is better
7-15                40 treatments
13-23               56 treatments

Designs for second order models

• The Box-Behnken and Central Composite designs are mostly used. Typically, these designs are appropriate for second order models in two to eight factors.

• If more than eight factors are involved, these designs also become impractically large. In this case a preliminary screening experiment (first order model) is considered to reduce the number of experimental factors.

3-factor Box-Behnken designs

12 centroids on the perimeter

Centre point replicated 3 times

15 treatments

Box-Behnken designs are subsets of 3^k factorial designs

Treatment   x1   x2   x3
 1          +1   +1    0
 2          +1   -1    0
 3          -1   +1    0
 4          -1   -1    0
 5          +1    0   +1
 6          +1    0   -1
 7          -1    0   +1
 8          -1    0   -1
 9           0   +1   +1
10           0   +1   -1
11           0   -1   +1
12           0   -1   -1
13           0    0    0
14           0    0    0
15           0    0    0

3-factor Box-Behnken design

Box-Behnken designs are practical for 3 to 7

experimental factors

Box-Behnken designs

Number of factors   Number of centroids   Replication of the centre point   Total number of treatments
3                   12                    3                                 15
4                   24                    3                                 27
5                   40                    6                                 46
6                   48                    6                                 54
7                   56                    6                                 62
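For 3, 4 and 5 factors the counts in this table can be reproduced by varying two factors at a time over a 2^2 factorial while holding the others at 0. The sketch below does only that; it is an assumption-laden illustration, and it does not cover 6 or 7 factors, where the tabulated 48 and 56 points appear to come from a construction that varies three factors at a time.

```python
from itertools import combinations, product

def box_behnken_pairs(k, n_centre):
    """Box-Behnken-style runs built from all pairs of factors (valid for k = 3, 4, 5)."""
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * k
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs.extend([(0,) * k] * n_centre)  # replicated centre point
    return runs

design_3 = box_behnken_pairs(3, n_centre=3)
totals = {
    k: len(box_behnken_pairs(k, n_centre))
    for k, n_centre in ((3, 3), (4, 3), (5, 6))
}
```

Every run keeps at least one factor at its middle level, so the design never visits the corners of the cube, which can be useful when extreme combinations are impractical.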

3-factor Central Composite designs

Star point

Factorial point

6 centre point replicates

20 treatments

These designs consist of a 2^k full factorial or a fractional factorial, augmented with 2k star points and n_c centre points.

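3-factor Central Composite design

Treatment   X1           X2           X3           Point type
 1           1.00         1.00        -1.00        2^3 full factorial
 2          -1.00         1.00        -1.00        2^3 full factorial
 3           1.00        -1.00        -1.00        2^3 full factorial
 4          -1.00        -1.00        -1.00        2^3 full factorial
 5           1.00         1.00         1.00        2^3 full factorial
 6          -1.00         1.00         1.00        2^3 full factorial
 7           1.00        -1.00         1.00        2^3 full factorial
 8          -1.00        -1.00         1.00        2^3 full factorial
 9           1.68 (α)     0.00         0.00        star point
10          -1.68 (-α)    0.00         0.00        star point
11           0.00         1.68 (α)     0.00        star point
12           0.00        -1.68 (-α)    0.00        star point
13           0.00         0.00         1.68 (α)    star point
14           0.00         0.00        -1.68 (-α)   star point
15           0.00         0.00         0.00        centre point
16           0.00         0.00         0.00        centre point
17           0.00         0.00         0.00        centre point
18           0.00         0.00         0.00        centre point
19           0.00         0.00         0.00        centre point
20           0.00         0.00         0.00        centre point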
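The three building blocks of the design (factorial points, star points, centre points) can be generated directly. This is a minimal sketch with a function name of our own; the run order differs from the table, and the star distance 8^(1/4) ≈ 1.68 is the usual rotatable choice, assumed here because it matches the tabulated value.

```python
from itertools import product

def central_composite(k, alpha, n_centre):
    """Full 2^k factorial + 2k axial (star) points at distance alpha + n_centre centre points."""
    factorial = [tuple(float(v) for v in run) for run in product((-1, 1), repeat=k)]
    star = []
    for i in range(k):
        for sign in (1, -1):
            run = [0.0] * k
            run[i] = sign * alpha
            star.append(tuple(run))
    centre = [(0.0,) * k] * n_centre
    return factorial + star + centre

design = central_composite(3, alpha=8 ** 0.25, n_centre=6)
```

The star points give each factor five levels (-α, -1, 0, +1, +α), which is what makes the pure quadratic terms of a second order model estimable.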

Central Composite designs

Number of   Full or fractional   Value   Number of factorial   Number of     Number of replicated   Total number
factors     factorial            for α   treatments            star points   centre points          of treatments
3           full                 1.68      8                    6             6                      20
4           full                 2        16                    8             6                      30
5           full                 2.38     32                   10             8                      50
5           ½ replication        2        16                   10             8                      34
6           full                 2.83     64                   12            10                      86
6           ½ replication        2.38     32                   12            10                      54
7           full                 3.36    128                   14            10                     152
7           ½ replication        2.83     64                   14            10                      88
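The counts in this table follow a simple pattern: factorial treatments (2^k, or half of that for a ½ replicate), plus 2k star points, plus the centre replicates. The sketch below recomputes a row under the assumption that α is the rotatable value, the fourth root of the number of factorial treatments; that assumption reproduces tabulated values such as 1.68, 2, 2.38 and 2.83. The number of centre replicates is taken from the table, not derived.

```python
def ccd_summary(k, n_centre, fraction=1):
    """Run counts and rotatable alpha for a central composite design.

    fraction=1 uses the full 2^k factorial, fraction=2 a half replicate.
    """
    n_factorial = 2 ** k // fraction
    n_star = 2 * k
    return {
        "alpha": round(n_factorial ** 0.25, 2),  # rotatable choice: F^(1/4)
        "factorial": n_factorial,
        "star": n_star,
        "centre": n_centre,
        "total": n_factorial + n_star + n_centre,
    }

three_factor = ccd_summary(3, n_centre=6)
five_half = ccd_summary(5, n_centre=8, fraction=2)
six_full = ccd_summary(6, n_centre=10)
```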

The ‘Factory’ virtual experimentation

environment

Pilot plant

Production plant

Description of the Factory applet

• The user has to experiment with a pilot plant to find

optimized settings for the parameters of an industrial

production process.

• The experiment runs in real time (39 weeks = 120

minutes).

• The raw material for the pilot plant is stored in a tank,

which can contain enough material for 10 trials. When

the tank is empty (or upon request of the user) it will be

refilled, but the new raw material may have slightly

different characteristics.

• The temperature, reaction time and concentration are

the controlling factors of the yield of the process.

Description of the Factory applet

• Each experiment takes time!

• It takes 6 weeks to implement new optimized settings in

the production plant.

• The Profit window gives an overview of the current

situation of costs and benefits.

• At the end of the 39 weeks, the balance should be

positive and as large as possible.

The ‘Factory’ virtual experimentation

environment

Underlying model

Response surface model with added noise

History

The Factory applet was originally developed for and in

cooperation with Unilever

The learning experience

• All types of designs in Response Surface Methodology

can be tried out

• All types of one-factor-at-a-time experimentation can be implemented

• Lucky shot strategies can be compared with systematic

designed approaches

• Additional problems, for which textbooks offer no

immediate solution

– When are the pilot plant results sufficiently convincing to justify the

cost for changing the production plant settings?

– Should one perform one large experiment and then decide, or do

many very small experiments with a possibility to make an

adjustment each time, or do something in between?

‘One-factor-at-a-time’ approach

Factorial approach

Method of steepest ascent

[Figure: sub-optimal operating conditions, a 2^2 factorial design around them, and the direction of steepest ascent]

The learning experience

• Choice of factors: obvious, but tank = block?

• Choice of levels: consequences

• Relation pilot plant results - factory results

• Confidence in a result: when to change factory settings?

• Sequential experimentation vs one-shot (or anything in

between)

• Cost of an experiment: power vs precision

• Need for randomization (hidden time trend)

• Time pressure

Demonstration




The ‘Mastitis’ applet

Animal Experimentation.

Random effects – Within and Between

Sampling Variation

Experimental Design: Vaccine Trial

• Simulation of the effect of a vaccine for E. coli mastitis (udder infection in dairy cows).

• Raw material: all cows from multiple farms (select which to treat).

• Dose for each treated cow.

• Animal and Farm factors have impact on vaccine effect.

• Demonstrate effects of proper/improper randomization.

• Two sources of variability (animal and farm).

• Treatment (vaccine dose).

• Goal: estimate variance components, reduce error

variance.

Learning Goal

Vaccine Applet

Farms (10)

Cows(Farms)

No Vaccine

Vaccine

Low dose

High dose

Randomization

Zebras, zebus, and holsteins


Effect of infusion dose E. coli on milk reduction at 48 hours

F. Vangroenweghe, P. Rainard, M. Paape, L. Duchateau, C. Burvenich (2004). Increase of Escherichia coli Inoculum Doses Induces Faster Innate Immune Response in Primiparous Cows. JDS 87, 4132-4144.

Effects of infusion dose and vaccine

Experimental units and randomisation

Data Generation

• Observational unit = cow

• Initial milk from real dataset on 10 farms, $s_{i(j)}$

• Model for milk reduction 48 h after infusion

$r_{i(j)} = \mu + f_j + \pi_{i(j)} + \gamma_{1i(j)} + \gamma_{2i(j)} + \delta_{i(j)} + c_{i(j)}$

• Milk production 48 h after infusion

$m_{i(j)} = s_{i(j)} \left(1 - r_{i(j)}\right)$

Values annotated on the model: 0.6, -0.3, -0.15, -0.3, 0; ~N(0, 0.006), ~N(0, 0.008)
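A data generator of this kind can be sketched as follows. Several things here are assumptions made only for illustration, since the slide does not say which annotated value belongs to which term: the mean reduction is taken as 0.6, the farm term as N(0, 0.006) and the cow term as N(0, 0.008) with the second number read as a variance, and the remaining fixed terms are collapsed into one hypothetical `fixed_shift` per cow. The initial milk yields are invented.

```python
import random

def simulate_milk(initial_milk, fixed_shift, mu=0.6,
                  var_farm=0.006, var_cow=0.008, seed=1):
    """Milk yield 48 h after infusion, m = s * (1 - r), with farm and cow random effects.

    initial_milk: dict farm -> list of initial yields s
    fixed_shift:  dict (farm, cow_index) -> sum of the fixed terms for that cow
                  (hypothetical stand-in for the vaccine and dose terms)
    """
    rng = random.Random(seed)
    production = {}
    for farm, yields in initial_milk.items():
        f_j = rng.gauss(0.0, var_farm ** 0.5)       # one draw per farm
        for i, s in enumerate(yields):
            c_ij = rng.gauss(0.0, var_cow ** 0.5)   # one draw per cow
            r = mu + f_j + fixed_shift.get((farm, i), 0.0) + c_ij
            production[(farm, i)] = s * (1.0 - r)
    return production

# Hypothetical example: two farms, first cow on each farm gets a shift of -0.3.
initial = {"farm_A": [30.0, 28.0], "farm_B": [25.0, 27.0]}
shifts = {("farm_A", 0): -0.3, ("farm_B", 0): -0.3}
simulated = simulate_milk(initial, shifts)
```

Because the farm draw is shared by all cows on a farm, randomising treatment at farm level with only two farms confounds the treatment effect with the farm effect, which is the point of the "bad" example in the applet.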

Output/Response

• Feedback on the randomisation and of the

design proposed/used.

• Dataset on which an analysis can be

performed.

Examples

1. Bad: randomisation at farm level with 2 farms

2. Acceptable: 1 farm randomizing at cow level

3. Optimal: randomised complete block design

4. More complex: split plot design

Demonstration

Student Experiences at KUL

Over the last two years, students evaluated the greenhouse applet as a practical exercise tool for a basic design of experiments course.

Six questions were put to 128 students.

Magnitude estimation scale: 0 = I agree, 100 = I do not agree.

Experiences at KUL

Question 1

With this applet I learned things of which I didn’t realize the importance after the classroom teaching.

[Figure: histogram of student responses to Question 1, on a scale from Agree to Do Not Agree]

Experiences at KUL

Question 2

Obtaining the data with the applet was more difficult than analysing it with the statistical software.

[Figure: histogram of student responses to Question 2]

Experiences at KUL

Question 3

The applet shows a reasonable image of a real situation as it occurs in practice.

[Figure: histogram of student responses to Question 3]

Experiences at KUL

Question 4

The applet is sufficiently user friendly: it was not very difficult to specify the experiments precisely as I wanted.

[Figure: histogram of student responses to Question 4]

Experiences at KUL

Question 5

The practical exercise required too much of my time, compared to the things I learned from it.

[Figure: histogram of student responses to Question 5]

Experiences at KUL

Question 6

The experimental situation shown in the applet was needlessly complex.

[Figure: histogram of student responses to Question 6]

Conclusions

• Realistic/useful applets can be created.

• Applets can engage students in the learning of key

experimental design concepts beyond textbook

exercises.

• Need for easier-to-maintain programming

environments (JAVA engine upgrades).

• Need for richer data generating models/simulations

(full TOMGROW).

• Need to carefully integrate applet-based exercises

with traditional teaching methodologies.

• Need for an experimental design textbook that takes

the applet approach as its backbone.

VIRTEX index page

Available soon from:

http://www.biw.kuleuven.be/dtp/TQM/software/software.htm

Thank You
