2012-02: Some things you learn after getting your certificate
Source: University of Minnesota Duluth (rregal/documents/5411_2010/...)
Things you learn on one road can be really helpful on later roads
Brainy CEO with a Philosophy of Fun
Minneapolis/St. Paul Business Journal, January 6, 2006
by Steve LeBeau, Managing Editor, http://www.bizjournals.com/twincities/stories/2006/01/09/focus1.html
Robert Senkler knows his business from the bottom up, because that’s where he started: as an actuarial trainee 31 years ago.
His career path first emerged in second grade, when he realized he was a math whiz. He attended the University of Minnesota Duluth, partly because it had a great mathematics department but also because it was close to some family land where he could hunt.
• After Senkler graduated from UMD, he
  – hired a writing coach
  – joined Toastmasters to work on his speaking skills
• Writing and speaking matter
• Read
  – Browse the current books
  – Reading effective writing helps guide your own writing
  – John Grisham or Stephen King work fine
• Speak
  – Take a speech class
  – Act in a play
Career Advice
“Be something you love and understand”
Ronnie Van Zant
Consulting Advice
John Wilder Tukey, Donner Professor Emeritus of Science at Princeton University and one of the most important contributors to the field of statistics, died 26 July 2000 in New Brunswick, New Jersey, following a heart attack.
The introduction of new terminology to capture distinctive concepts would become a Tukey trademark. For example, he coined the contraction “bit” for “binary digit.” Tukey is credited with the first printed use of the word “software” to refer to computer programs; he observed that the software might well prove to become more valuable than the hardware.
The saying that “an approximate solution of the exact problem is more useful than the exact solution of an approximate problem” has often been attributed to Tukey.
Understanding the Questions
• For example, if your interest is in biological applications:
  – Take real biology classes
  – Go to biology and medicine seminars
  – Read published reports
  – Work in a biology lab
    • Computational tasks
    • Web design
    • Volunteer
• Similarly if you have different areas of interest
Data Analysis
Plot the data
[Figure: Fetal Weight versus Placental Weight, both on logarithmic axes]
Computing
• Take CS or MIS classes
• Take other classes with computing
  – GIS (Geographic Information Systems) class
  – Bioinformatics
• Excel, SAS and R
• SAS Certification
  – The Little SAS Book
  – SAS Certification Guide
    • Base Programming
    • Advanced Programming
However
• If you do everything, or even everything as well as you can, you aren’t prioritizing
• “Everything in moderation, including moderation” (Lost Horizon)
• Have a few things that you have done thoroughly and well
  – See recommendations ↓
• Get to know at least three people well enough that they can give you good letters of recommendation
• Recommendations are more than grades
  – Are you a good team member?
  – Do you have leadership experience?
  – Are you able to solve problems effectively?
  – This may show up outside of class
Design and Data Analysis
• Plan carefully before collecting data
• Avoid bias and unexplained variability
  – Randomization
  – Blinding
  – Placebo, sham surgery
  – Blocking
  – Record covariates and other factors that might influence results
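Randomization and blocking are often combined by shuffling the treatment order separately within each block (a randomized complete block design). A minimal Python sketch of producing such a plan; the site and trap names are hypothetical placeholders, and the seed is arbitrary and only makes the plan reproducible:

```python
import random


def randomize_within_blocks(blocks, treatments, seed=None):
    """Return {block: treatments in random order}, shuffled independently per block.

    Every treatment appears exactly once in every block (randomized complete block design).
    """
    rng = random.Random(seed)  # a fixed seed makes the plan reproducible
    plan = {}
    for block in blocks:
        order = list(treatments)  # copy so the caller's list is not modified
        rng.shuffle(order)
        plan[block] = order
    return plan


# Hypothetical example: three insect-trap types placed within three sites (blocks).
plan = randomize_within_blocks(["site 1", "site 2", "site 3"], ["trap A", "trap B", "trap C"], seed=1)
for site, order in plan.items():
    print(site, order)
```

Shuffling within blocks, rather than over the whole experiment, keeps each site balanced so that site-to-site differences can be removed as blocks.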
• Understand the questions
• Know how the experiment was conducted
• Dependence between values
  – Pairing
  – Blocking
  – Fish in tanks, students in schools, …
  – Repeated measures for the same animal, person, …
• Check and clean the data
  – Plot individual data points
  – Check for out-of-bounds entries
    • Age = 240
  – Check consistency of entries
    • Not male and hysterectomy
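Both kinds of checks can be written as simple rules applied to every record. A minimal Python sketch; the field names (`age`, `sex`, `hysterectomy`) and the upper age bound of 120 are hypothetical choices, not taken from any particular data set:

```python
def check_record(record, max_age=120):
    """Return a list of problems found in one record (empty list if none found).

    max_age is a hypothetical plausibility bound; choose limits that fit your study.
    """
    problems = []
    age = record.get("age")
    # Out-of-bounds check on a single field.
    if age is not None and not 0 <= age <= max_age:
        problems.append(f"age out of bounds: {age}")
    # Consistency check across two fields.
    if record.get("sex") == "male" and record.get("hysterectomy"):
        problems.append("inconsistent: male with hysterectomy")
    return problems


records = [
    {"id": 1, "age": 240, "sex": "female", "hysterectomy": False},
    {"id": 2, "age": 54, "sex": "male", "hysterectomy": True},
    {"id": 3, "age": 61, "sex": "female", "hysterectomy": True},
]
for rec in records:
    for problem in check_record(rec):
        print(f"record {rec['id']}: {problem}")
```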
• Check assumptions
  – Independence: how was the experiment conducted?
  – Plot the original data points
  – Normality
    • PDP polymerase significant after log transformation
    • Insect traps significant after log transformation
  – Equal variances
• Fixing assumptions
  – Transformations (sometimes)
  – Other methods
    • Weibull models
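When the spread of a response grows in proportion to its mean, a log transformation can make the variances roughly equal. A small Python sketch with made-up multiplicative data (not the PDP polymerase or insect trap data):

```python
import math
import statistics

# Hypothetical multiplicative data: the "high" group is the "low" group times 10.
groups = {"low": [1.0, 2.0, 4.0], "high": [10.0, 20.0, 40.0]}

# Standard deviations on the original scale and after taking natural logs.
raw_sd = {name: statistics.stdev(vals) for name, vals in groups.items()}
log_sd = {name: statistics.stdev([math.log(v) for v in vals]) for name, vals in groups.items()}

for name in groups:
    print(f"{name:>4}: SD raw = {raw_sd[name]:.3f}, SD of log = {log_sd[name]:.3f}")
```

On the raw scale the SDs differ by a factor of 10; on the log scale they are equal, because multiplying by 10 only shifts every log value by the same constant.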
• Check assumptions
  – Plot residuals
    • Especially with more complicated data, where it’s hard to plot all factors at one time effectively
  – At least
    • Residuals versus predicted
    • Normal plot of residuals
    • Cook’s Distance
  – Additionally, it is helpful to plot residuals
    • vs factors in the model
    • vs other variables, such as technician
    • in time order of collection, to check for drift over time
• Cook’s Distance (see answers to Lab 6)
  – An extreme outlier may not have a large residual if it has a large influence on the fitted model
[Figure: scatter plot of Y versus X]
• Cook’s Distance (see answers to Lab 6)
  – Ross Garberich’s master’s project
    • The Economic Utilization of Patients with Refractory Angina
    • Here one patient had a very large influence on the results
    • The patient had 30 lifetime angioplasties
    • When the angioplasties were grouped into “5 or more angioplasties,” the Cook’s distance plot is just fine
  – Cook’s distance > 0.5 is often considered big
  – Cook’s distance > 1 is often considered a definite problem
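For simple linear regression (p = 2 parameters), Cook’s distance for point i is D_i = e_i² / (p s²) · h_ii / (1 − h_ii)², where e_i is the residual, s² the error mean square and h_ii = 1/n + (x_i − x̄)²/Sxx the leverage. A minimal Python sketch with hypothetical data: eight points lie exactly on y = 2x and one point sits far out at x = 16.

```python
def cooks_distance(x, y):
    """Residuals, leverages and Cook's distances for a straight-line least-squares fit."""
    n = len(x)
    p = 2  # intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # error mean square
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    cook = [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return resid, lev, cook


# Hypothetical data: the last point is far out in x and pulls the line toward itself.
x = [1, 2, 3, 4, 5, 6, 7, 8, 16]
y = [2, 4, 6, 8, 10, 12, 14, 16, 60]
resid, lev, cook = cooks_distance(x, y)
for xi, e, d in zip(x, resid, cook):
    flag = "definite problem" if d > 1 else ("big" if d > 0.5 else "")
    print(f"x={xi:>2}  residual={e:7.2f}  Cook's D={d:6.2f}  {flag}")
```

The point at x = 16 does not have the largest residual, yet its Cook’s distance is far above 1, because its high leverage lets it drag the fitted line toward itself.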
• Include factors in models that improve predictions
  – Include temperatures when comparing brands of syrups
  – Include the covariate Use when comparing keyboard pain
  – Include blocks = sites when comparing insect traps
• Don’t overfit
  – Hypothesis tests
  – Information criteria
    • Akaike Information Criterion (AIC)
    • AICC: corrected for small sample sizes
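AIC and AICC trade goodness of fit against the number of parameters; smaller is better. For a least-squares fit with normal errors, one common form (dropping an additive constant) is AIC = n ln(SSE/n) + 2k and AICC = AIC + 2k(k + 1)/(n − k − 1). A minimal Python sketch; the SSE values are hypothetical, and software such as SAS may count parameters or constants differently, so only compare values computed under one convention on the same data:

```python
import math


def aic_gaussian(n, sse, k):
    """AIC for a least-squares (normal errors) fit, dropping additive constants.

    n: number of observations; sse: error sum of squares; k: number of estimated parameters.
    """
    if sse <= 0:
        # An exact fit (Error SS = 0, e.g. a quartic through 5 points) makes the log-likelihood unbounded.
        raise ValueError("SSE must be positive")
    return n * math.log(sse / n) + 2 * k


def aicc_gaussian(n, sse, k):
    """Small-sample corrected AIC; only defined when n > k + 1."""
    if n <= k + 1:
        raise ValueError("AICC needs n > k + 1")
    return aic_gaussian(n, sse, k) + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical comparison on n = 10 observations: the bigger model lowers SSE only a little.
n = 10
for label, sse, k in [("2 parameters", 20.0, 2), ("5 parameters", 15.0, 5)]:
    print(f"{label}: AIC = {aic_gaussian(n, sse, k):.2f}, AICC = {aicc_gaussian(n, sse, k):.2f}")
```

Here the 5-parameter model has the smaller SSE but the larger AIC and AICC, and the small-sample correction penalizes it much more heavily.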
Overfitting Polynomial
[Figure: Y versus X with fitted polynomials]
• The fourth-order polynomial (in pink)
  – Fits the data exactly: Error SS = 0
  – But likely would not work well for predicting at x = 0.5 or x = 1.5
Using Regression Class Formulas
SE(ŷ) at x = 6, using x = 1, 2, 3, 4, 5
Model       Formula   Simulations (1000)
Linear      1.0       1.4
Quadratic   2.1       2.1
Cubic       4.9       4.8
Quartic     15.8      15.6
The formulas are simplest for orthogonal polynomials
• Inverting a diagonal matrix XᵀX is simple
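$$\hat{y} = \bar{y} + \hat{\beta}_1 P_1(x) + \hat{\beta}_2 P_2(x) + \hat{\beta}_3 P_3(x) + \hat{\beta}_4 P_4(x)$$
$$P_1(x) = x - \bar{x}$$
$$P_2(x) = (x - \bar{x})^2 - \frac{n^2 - 1}{12}$$
$$P_3(x) = (x - \bar{x})^3 - (x - \bar{x})\,\frac{3n^2 - 7}{20}$$
$$P_4(x) = (x - \bar{x})^4 - (x - \bar{x})^2\,\frac{3n^2 - 13}{14} + \frac{3(n^2 - 1)(n^2 - 9)}{560}$$
where n is the number of equally spaced x values, one unit apart (here x = 1, 2, 3, 4, 5, so n = 5 and x̄ = 3).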
Using these orthogonal polynomials
• Type I and Type III SS are the same
• SS for P2(x) is the SS explained by a quadratic term above and beyond the linear term
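Because the orthogonal columns make XᵀX diagonal, the variance of the fitted mean at a new point x₀ is just a sum of one term per column: Var(ŷ(x₀)) = σ² (1/n + Σ_j P_j(x₀)² / Σ_i P_j(x_i)²). A minimal Python sketch, assuming σ = 1 (as in the SAS simulation) and exact fractions for the polynomial coefficients; it should reproduce the regression-class formula values for SE(ŷ) at x = 6:

```python
import math
from fractions import Fraction


def orthogonal_polys(u, n):
    """P1..P4 for n equally spaced x values one unit apart, evaluated at u = x - xbar."""
    n2 = n * n
    p1 = u
    p2 = u**2 - Fraction(n2 - 1, 12)
    p3 = u**3 - u * Fraction(3 * n2 - 7, 20)
    p4 = u**4 - u**2 * Fraction(3 * n2 - 13, 14) + Fraction(3 * (n2 - 1) * (n2 - 9), 560)
    return [p1, p2, p3, p4]


def se_fit(x0, xs, sigma=1.0):
    """SE of the fitted mean at x0 for the linear, quadratic, cubic and quartic fits."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    columns = [orthogonal_polys(Fraction(x) - xbar, n) for x in xs]
    at_x0 = orthogonal_polys(Fraction(x0) - xbar, n)
    var = Fraction(1, n)  # contribution of the intercept (ybar)
    ses = []
    for j in range(4):
        # Diagonal X'X: each added polynomial term contributes P_j(x0)^2 / sum_i P_j(x_i)^2.
        var += at_x0[j] ** 2 / sum(col[j] ** 2 for col in columns)
        ses.append(sigma * math.sqrt(float(var)))
    return ses


xs = [1, 2, 3, 4, 5]
# Orthogonality check: each column sums to 0 and distinct columns have zero inner product.
cols = [orthogonal_polys(Fraction(x) - 3, 5) for x in xs]
for j in range(4):
    assert sum(c[j] for c in cols) == 0
    for k in range(j + 1, 4):
        assert sum(c[j] * c[k] for c in cols) == 0

for name, se in zip(["Linear", "Quadratic", "Cubic", "Quartic"], se_fit(6, xs)):
    print(f"{name:<9} {se:.1f}")
```

Each added term can only increase the variance at x = 6, which is why the SE grows so quickly with the polynomial order when extrapolating.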
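%let b0=5; %let b1=20; %let std=1; %let nsim=1000;
/* Simulate &nsim data sets from y = b0 + b1*time + N(0, std^2) error at time = 1 to 5 */
data polynomial;
  call streaminit(0);
  do isim=1 to &nsim;
    do time=1 to 5 by 1;
      mu=&b0+&b1*time;
      y=rand('normal', mu, &std);
      output;
    end;
    time=6; y=.; output;  /* missing y: not used in the fit, but gets a predicted value */
  end;
run;
ods listing close;
proc glm data=polynomial; model y = time; output out=outp p=pred; by isim; run;
ods listing;
/* The Std Dev of pred at time = 6 estimates the SE of the prediction there */
proc means data=outp; where time = 6; var pred; run;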
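The same simulation can be sketched in Python for readers without SAS. This is a port of the straight-line case only, assuming the same b0 = 5, b1 = 20, std = 1 and nsim = 1000; the seed 2012 is arbitrary. The simulated SD of ŷ(6) should land near the formula value √(1/5 + 3²/10) = √1.1:

```python
import random
import statistics


def simulate_se_at_6(b0=5.0, b1=20.0, std=1.0, nsim=1000, seed=2012):
    """Std dev of the straight-line prediction at x = 6 over nsim simulated data sets."""
    rng = random.Random(seed)
    xs = [1, 2, 3, 4, 5]
    xbar = sum(xs) / len(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)
    preds = []
    for _ in range(nsim):
        ys = [rng.gauss(b0 + b1 * x, std) for x in xs]  # true line plus N(0, std^2) error
        ybar = sum(ys) / len(ys)
        slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
        preds.append(ybar + slope * (6 - xbar))  # least-squares prediction at x = 6
    return statistics.stdev(preds)


print(f"simulated SE = {simulate_se_at_6():.3f}, formula SE = {1.1 ** 0.5:.3f}")
```

With 1000 simulations the simulated SD typically differs from the formula value only in the second decimal place; higher-order fits could be checked the same way by fitting more terms.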
For Balanced Factorial Models and/or Blocks
• The complexity of the comparison of treatment effects is the same whether we include another factor or blocks
  – It is the difference between means with the same number of values in the means either way
• The only downside is fewer df for estimating the variance
• Pay attention to interactions
  – Main effects need to be interpreted very carefully when there are interactions
  – In the KCT example, K does not have a significant main effect, but K definitely affects yield
    • But differently for low and high temperatures
    • Don’t summarize K effects with means over both temperatures
    • Summarize K effects separately for low and high temperatures
  – In the age at metamorphosis data on Assign 7
    • B and C do not have significant main effects
    • But based on significant interactions, both B and C do affect age at metamorphosis
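How a factor can matter while its main effect is zero is easy to see from cell means. A minimal Python sketch with hypothetical cell means for yield (not the KCT data): K raises yield by 10 at low temperature and lowers it by 10 at high temperature.

```python
# Hypothetical cell means for yield (NOT the KCT data).
cell_means = {
    ("low", "K0"): 50.0, ("low", "K1"): 60.0,
    ("high", "K0"): 60.0, ("high", "K1"): 50.0,
}
temps = ["low", "high"]

# Simple effects: the K1 - K0 difference within each temperature.
simple_effects = {t: cell_means[(t, "K1")] - cell_means[(t, "K0")] for t in temps}

# Main effect: the K1 - K0 difference after averaging over both temperatures.
main_effect = sum(simple_effects.values()) / len(temps)

print("K effect by temperature:", simple_effects)
print("K main effect (averaged over temperatures):", main_effect)
```

The main effect averages +10 and −10 to zero, so summarizing K over both temperatures would hide an effect that the separate summaries show clearly.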
Account for Unbalanced Data
• Use least squares means and Type III SS when appropriate
• Sometimes we are still interested in Type I SS or unadjusted comparisons
  – For the Rose-Hellekant PCR data, we could still be interested in tamoxifen vs placebo not adjusted for outcome, since the outcome is also partially a result of the treatment
• If we have correlated effects, possibly neither is significant with Type III SS while one or both could be significant with Type I SS
• There are also sound arguments for using Type II SS, which we didn’t cover
• Adjust effects of keyboard on pain according to how much the keyboards were used
• Adjust comparisons of male and female mice to account for different mixes of young and old mice
• Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
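The mice example shows why raw means can mislead with unbalanced data. A minimal Python sketch with hypothetical data: males are 2 units higher than females at each age, but most females are young and most males are old. The LS means here are the unweighted averages of the sex-by-age cell means, which is what a full sex × age model gives when every cell has data:

```python
from collections import defaultdict

# Hypothetical mice data (sex, age group, response); the sexes have different age mixes.
rows = ([("F", "young", 10.0)] * 8 + [("F", "old", 14.0)] * 2
        + [("M", "young", 12.0)] * 2 + [("M", "old", 16.0)] * 8)


def raw_means(data):
    """Ordinary means by sex: each sex is weighted by its own mix of young and old mice."""
    totals, counts = defaultdict(float), defaultdict(int)
    for sex, _, y in data:
        totals[sex] += y
        counts[sex] += 1
    return {sex: totals[sex] / counts[sex] for sex in totals}


def ls_means(data):
    """Unweighted average of cell means over age groups (assumes no empty cells)."""
    cells = defaultdict(list)
    for sex, age, y in data:
        cells[(sex, age)].append(y)
    cell_mean = {key: sum(v) / len(v) for key, v in cells.items()}
    sexes = sorted({sex for sex, _ in cell_mean})
    ages = sorted({age for _, age in cell_mean})
    return {s: sum(cell_mean[(s, a)] for a in ages) / len(ages) for s in sexes}


raw, ls = raw_means(rows), ls_means(rows)
print("raw M - F:", raw["M"] - raw["F"])
print("LS  M - F:", ls["M"] - ls["F"])
```

The raw difference (4.4) mixes the sex effect with the age effect; the LS means recover the within-age difference of 2.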
Account for Random Effects
• Don’t pretend that measurements for each fish in a tank are separate, independent pieces of information (independent replicates)
• Don’t pretend that testing the same paper helicopter more than once represents a separate, independent piece of information
  – This is the pseudo-replication mistake (Hurlbert 1984)
• If data are balanced, the analysis could be done with the means in each tank of fish
  – But don’t do this if data are unbalanced
• Also, estimates of the sizes of variances from different sources can be used to decide how to use resources optimally in the next experiment
  – At what point is it better to take the time to make the next paper helicopter rather than flying this helicopter again?
• See notes on random effects and nesting
• Don’t end up saying that a medication has an effect in the wider population of possible patients based on data from just two patients
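A small balanced example makes both points concrete. The Python sketch below uses hypothetical data (2 fish in each of 3 tanks): it compares the naive variance of the overall mean, which treats every fish as a replicate, with the tank-means version, then estimates variance components by the method of moments and uses them to compare two hypothetical future designs. Costs are not modeled here.

```python
import statistics

# Hypothetical balanced data: 2 fish measured in each of 3 tanks; the tank is the experimental unit.
tanks = {"A": [10.0, 12.0], "B": [14.0, 16.0], "C": [18.0, 20.0]}
t = len(tanks)  # number of tanks
n = 2           # fish per tank (balanced)

# Pseudo-replication: treating all 6 fish as independent replicates.
all_fish = [y for ys in tanks.values() for y in ys]
naive_var_mean = statistics.variance(all_fish) / len(all_fish)

# Balanced data: analyze one mean per tank instead.
tank_means = [statistics.mean(ys) for ys in tanks.values()]
var_mean = statistics.variance(tank_means) / t

# Method-of-moments variance components from the one-way ANOVA mean squares.
grand = statistics.mean(tank_means)
ms_within = sum((y - statistics.mean(ys)) ** 2 for ys in tanks.values() for y in ys) / (t * (n - 1))
ms_between = n * sum((m - grand) ** 2 for m in tank_means) / (t - 1)
var_tank = max(0.0, (ms_between - ms_within) / n)  # truncated at 0 if the estimate is negative
var_fish = ms_within


def var_of_mean(num_tanks, fish_per_tank):
    """Predicted variance of the overall mean for a future balanced design."""
    return var_tank / num_tanks + var_fish / (num_tanks * fish_per_tank)


print(f"naive variance of mean: {naive_var_mean:.3f}   tank-based: {var_mean:.3f}")
print(f"3 tanks x 10 fish: {var_of_mean(3, 10):.3f}   6 tanks x 2 fish: {var_of_mean(6, 2):.3f}")
```

The naive analysis understates the variance of the mean by more than half. Because tank-to-tank variation dominates here, more tanks help far more than more fish per tank, which is the same trade-off as building a new paper helicopter versus flying the old one again.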
Think about and check whether the results make sense
• Plot the data
  – In one case, a student was getting p-values of less than 1 in a million in their master’s project when the right p-values were more like 0.6
  – Plotting the data is the best protection for checking the p-values
  – Don’t decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
• Check results with alternative calculations
  – Check that the df in an ANOVA table add to the right value
  – Simulations
  – When using a new program, run it on a set of data where you know the right answer, to check that you are using the program correctly
    • Use unbalanced data to make the test more general
• Check that results make sense based on knowledge of the system
  – Check whether the estimate of slope for pain versus amount of keyboard use comes out negative
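The df check can be automated. A minimal Python sketch for a two-way ANOVA with interaction, assuming every cell of the hypothetical unbalanced 2 × 3 layout has at least one observation; compare these counts with the df your software reports:

```python
def two_way_anova_df(cell_counts):
    """Degrees of freedom for a two-way ANOVA with interaction.

    cell_counts maps (level of A, level of B) -> number of observations.
    This sketch assumes every A x B cell has at least one observation.
    """
    a = len({i for i, _ in cell_counts})
    b = len({j for _, j in cell_counts})
    if len(cell_counts) != a * b or min(cell_counts.values()) < 1:
        raise ValueError("every A x B cell needs at least one observation")
    n_total = sum(cell_counts.values())
    df = {"A": a - 1, "B": b - 1, "A*B": (a - 1) * (b - 1), "Error": n_total - a * b}
    df["Total"] = n_total - 1
    return df


# Hypothetical unbalanced 2 x 3 layout (levels a1, a2 of A and b1, b2, b3 of B).
counts = {("a1", "b1"): 3, ("a1", "b2"): 2, ("a1", "b3"): 4,
          ("a2", "b1"): 4, ("a2", "b2"): 3, ("a2", "b3"): 2}
df = two_way_anova_df(counts)
# The model and error df must add up to the total df.
assert df["A"] + df["B"] + df["A*B"] + df["Error"] == df["Total"]
print(df)
```

If the software reports a different error df, a cell may be empty, a factor may have been read as continuous, or observations may have been dropped for missing values.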
Cultivate Consideration, Concern and Compassion
“A little love and affection in everything you do
Will make the world a better place with or without you”
Neil Young, Greendale
Stop to Smell the Flowers and Dream
“You’ve got to have a dream
If you don’t have a dream
How you going to have a dream come true?”
South Pacific
- Slide Number 1
- Slide Number 2
- Slide Number 3
- Slide Number 4
- Slide Number 5
- Slide Number 6
- Slide Number 7
- Slide Number 8
- Slide Number 9
- Consulting Advice
- Slide Number 11
- Slide Number 12
- Slide Number 13
- Slide Number 14
- Computing
- Slide Number 16
- Slide Number 17
- Slide Number 18
- Slide Number 19
- Slide Number 20
- Slide Number 21
- Slide Number 22
- Slide Number 23
- Slide Number 24
- Slide Number 25
- Slide Number 26
- Slide Number 27
- Slide Number 28
- Slide Number 29
- Slide Number 30
- Slide Number 31
- Slide Number 32
- Slide Number 33
- Slide Number 34
- Slide Number 35
- Slide Number 36
- Slide Number 37
-
Brainy CEO with a Philosophy of Fun
Minneapolis St Paul Business Journal - January 6 2006
by Steve LeBeau Managing Editorhttpwwwbizjournalscomtwincitiesstories20060109focus1html
Robert Senkler knows his business from the bottom up because thats where he started -- as an actuarial trainee 31 years ago
His career path first emerged in second grade when he realized he was a math whiz He attended the University of Minnesota Duluth -- partly because it had a great mathematics department but also because it was close to some family land where he could hunt
bull After Senkler graduated from UMD hendash hired a writing coach ndash joined Toastmasters to work on his speaking
skills
bull Writing and speaking matter
bull Readndash Browse the current books ndash Reading effective writing helps guide your own writingndash John Grisham or Steven king work fine
bull Speakndash Take a speech classndash Act in a play
Career Advice
ldquoBe something you love and understandrdquo
Ronnie Van Zant
Consulting Advice
John Wilder Tukey Donner Professor Emeritus of Science at Princeton University and one of the most important contributors to the field of statistics died 26 July 2000 in New Brunswick New Jersey following a heart attack
The introduction of new terminology to capture distinctive concepts would become a Tukey trademark For example he coined the contraction bitldquo for binary digit Tukey is credited with the first printed use of the word software to refer to computer programs he observed that the software might well prove to become more valuable than the hardware
The saying that an approximate solution of the exact problem is more useful than the exact solution of an approximate problem has often been attributed to Tukey
Understanding the Questions bull For example if your interest is in biological applications
bull Take real biology classes
bull Go to biology and medicine seminars
bull Read published reports
bull Work in a biology lab
bull Computational tasks
bull Web design
bull Volunteer
bullSimilarly if you have different areas of interest
Data Analysis
Plot the data
01
1
10
100
1000
01 1 10 100
Fetal Weight
Placental Weight
Computingbull Take CS or MIS classesbull Take other classes with computing
ndash GIS Geographic and Information Systems classndash Bioinformatics
bull Excel SAS and Rbull SAS Certification
ndash The Little SAS Book ndash SAS Certification Guide
bull Base Programmingbull Advanced Programming
However
bull If you do everything or even everything as well as you can you arenrsquot prioritizing
bull ldquoEverything in moderation including moderationrdquo Lost Horizon
bull Have a few things that you have done thoroughly and wellbull See recommendations darr
bull Get to know at least three people well enough so that they can give you good letters of recommendation
bull Recommendations are more than gradesbull Are you a good team memberbull Do you have leadership experiencebull Are you able to solve problems effectively
bullThis may not be in class
Design and Data Analysisbull Plan carefully before collecting data
bull Avoid bias and unexplained variabilitybull Randomizationbull Blindingbull Placebo sham surgerybull Blockingbull Record covariates and other factors that might
influence results
bull Understand the questionsbull Know how the experiment was conducted
bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip
bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries
bull Age = 240bull Check consistency of entries
bull Not male and hysterectomy
bull Check assumptionsbull Independence How was the experiment
conductedbull Plot the original data pointsbull Normality
bull PDP polymerase significant after log transformation
bull Insect traps significant after log transformation
bull Equal variancesbull Fixing assumptions
bull Transformations sometimesbull Other methods
bull Weibull models
bull Check assumptionsbull Plot residualsbull Especially with more complicated data
where itrsquos hard to plot all factors at one time effectively
bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance
bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection
bull Check for drift over time
bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual
bull If it has large influence on the fitted model
0
100
200
300
400
500
600
0 2 4 6 8 10 12 14 16
Y
X
bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project
bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties
the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem
bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps
bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)
bull AICC Corrected for small sample sizes
Overfitting Polynomial
0
20
40
60
80
100
120
0 1 2 3 4 5 6
X
Y
bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15
Using Regression Class Formulas
Using x = 1 2 3 4 5
Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156
( )ySE ˆAt x = 6 Simulations (1000)
The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple
( )
( ) ( )
( ) ( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( )xPxPxPxPyy
aaaxxxxxP
axxxxxP
axxxP
xxxP
44332211
22224
4
23
3
22
2
1
ˆˆˆˆˆ560
01314
13320
7312
1
lowast+lowast+lowast+lowast+=
minusminuslowast+
minuslowastminusminusminus=
minuslowastminusminusminus=
minusminusminus=
minus=
ββββ
Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term
above and beyond the linear term
let b0=5let b1=20let std=1let nsim=1000
data polynomialcall streaminit(0)do isim=1 to ampnsim
do time= 1 to 5 by 1mu=ampb0+ampb1time
y=rand(normal mu ampstd)output
end time = 6 y = output
endrun
ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun
ods listingproc means data=outp
where time = 6var predrun
For Balanced Factorial Models andor Blocks
bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks
bull The difference between means with same number of values inthe means either way
bull The only down side is fewer df for estimating the variance
bull Pay attention to interactionsbull Main effects need to be interpreted very
carefully when there are interactionsbull In the KCT example K does not have a
significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over
both temperaturesbull Summarize K effects separately for low and
high temperaturesbull In the age at metamorphosis data on Assign 7
bull B and C do not have significant main effectsbull But based on significant interactions both B
and C do affect age at metamorphosis
Account for Unbalanced Data
bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted
comparisonsbull For The Rose-Hellekant PCR data we could still be interested
in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment
bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS
bull There are also sound arguments for using Type II SS which we didnrsquot cover
bull Adjust effects of keyboard on pain according to how much the keyboards were used
bull Adjust comparisons of male and female mice to account for different mixes of young and old mice
bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are
separate independent pieces of information independent replicates
bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)
bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different
sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make
the next paper helicopter rather than flying this helicopter again
bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the
wider population of possible patients based on data from just two patients
Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a
million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values
bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data
where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general
bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of
a keyboard comes out negative
Cultivate Consideration Concern and Compassion
ldquoA little love and affection in everything you
Will make the world a better place with or without yourdquo
Neil Young Greendale
Stop to Smell the Flowers and Dream
ldquoYoursquove got to have a dream
If you donrsquot have a dream
How you going to have a dream come truerdquo
South Pacific
- Slide Number 1
- Slide Number 2
- Slide Number 3
- Slide Number 4
- Slide Number 5
- Slide Number 6
- Slide Number 7
- Slide Number 8
- Slide Number 9
- Consulting Advice
- Slide Number 11
- Slide Number 12
- Slide Number 13
- Slide Number 14
- Computing
- Slide Number 16
- Slide Number 17
- Slide Number 18
- Slide Number 19
- Slide Number 20
- Slide Number 21
- Slide Number 22
- Slide Number 23
- Slide Number 24
- Slide Number 25
- Slide Number 26
- Slide Number 27
- Slide Number 28
- Slide Number 29
- Slide Number 30
- Slide Number 31
- Slide Number 32
- Slide Number 33
- Slide Number 34
- Slide Number 35
- Slide Number 36
- Slide Number 37
-
bull After Senkler graduated from UMD hendash hired a writing coach ndash joined Toastmasters to work on his speaking
skills
bull Writing and speaking matter
bull Readndash Browse the current books ndash Reading effective writing helps guide your own writingndash John Grisham or Steven king work fine
bull Speakndash Take a speech classndash Act in a play
Career Advice
ldquoBe something you love and understandrdquo
Ronnie Van Zant
Consulting Advice
John Wilder Tukey Donner Professor Emeritus of Science at Princeton University and one of the most important contributors to the field of statistics died 26 July 2000 in New Brunswick New Jersey following a heart attack
The introduction of new terminology to capture distinctive concepts would become a Tukey trademark For example he coined the contraction bitldquo for binary digit Tukey is credited with the first printed use of the word software to refer to computer programs he observed that the software might well prove to become more valuable than the hardware
The saying that an approximate solution of the exact problem is more useful than the exact solution of an approximate problem has often been attributed to Tukey
Understanding the Questions bull For example if your interest is in biological applications
bull Take real biology classes
bull Go to biology and medicine seminars
bull Read published reports
bull Work in a biology lab
bull Computational tasks
bull Web design
bull Volunteer
bullSimilarly if you have different areas of interest
Data Analysis
Plot the data
01
1
10
100
1000
01 1 10 100
Fetal Weight
Placental Weight
Computingbull Take CS or MIS classesbull Take other classes with computing
ndash GIS Geographic and Information Systems classndash Bioinformatics
bull Excel SAS and Rbull SAS Certification
ndash The Little SAS Book ndash SAS Certification Guide
bull Base Programmingbull Advanced Programming
However
bull If you do everything or even everything as well as you can you arenrsquot prioritizing
bull ldquoEverything in moderation including moderationrdquo Lost Horizon
bull Have a few things that you have done thoroughly and wellbull See recommendations darr
bull Get to know at least three people well enough so that they can give you good letters of recommendation
bull Recommendations are more than gradesbull Are you a good team memberbull Do you have leadership experiencebull Are you able to solve problems effectively
bullThis may not be in class
Design and Data Analysisbull Plan carefully before collecting data
bull Avoid bias and unexplained variabilitybull Randomizationbull Blindingbull Placebo sham surgerybull Blockingbull Record covariates and other factors that might
influence results
bull Understand the questionsbull Know how the experiment was conducted
bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip
bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries
bull Age = 240bull Check consistency of entries
bull Not male and hysterectomy
bull Check assumptionsbull Independence How was the experiment
conductedbull Plot the original data pointsbull Normality
bull PDP polymerase significant after log transformation
bull Insect traps significant after log transformation
bull Equal variancesbull Fixing assumptions
bull Transformations sometimesbull Other methods
bull Weibull models
bull Check assumptionsbull Plot residualsbull Especially with more complicated data
where itrsquos hard to plot all factors at one time effectively
bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance
bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection
bull Check for drift over time
bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual
bull If it has large influence on the fitted model
0
100
200
300
400
500
600
0 2 4 6 8 10 12 14 16
Y
X
bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project
bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties
the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem
bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps
bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)
bull AICC Corrected for small sample sizes
Overfitting Polynomial
0
20
40
60
80
100
120
0 1 2 3 4 5 6
X
Y
bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15
Using Regression Class Formulas
Using x = 1 2 3 4 5
Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156
( )ySE ˆAt x = 6 Simulations (1000)
The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple
( )
( ) ( )
( ) ( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( )xPxPxPxPyy
aaaxxxxxP
axxxxxP
axxxP
xxxP
44332211
22224
4
23
3
22
2
1
ˆˆˆˆˆ560
01314
13320
7312
1
lowast+lowast+lowast+lowast+=
minusminuslowast+
minuslowastminusminusminus=
minuslowastminusminusminus=
minusminusminus=
minus=
ββββ
Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term
above and beyond the linear term
let b0=5let b1=20let std=1let nsim=1000
data polynomialcall streaminit(0)do isim=1 to ampnsim
do time= 1 to 5 by 1mu=ampb0+ampb1time
y=rand(normal mu ampstd)output
end time = 6 y = output
endrun
ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun
ods listingproc means data=outp
where time = 6var predrun
For Balanced Factorial Models andor Blocks
bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks
bull The difference between means with same number of values inthe means either way
bull The only down side is fewer df for estimating the variance
bull Pay attention to interactionsbull Main effects need to be interpreted very
carefully when there are interactionsbull In the KCT example K does not have a
significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over
both temperaturesbull Summarize K effects separately for low and
high temperaturesbull In the age at metamorphosis data on Assign 7
bull B and C do not have significant main effectsbull But based on significant interactions both B
and C do affect age at metamorphosis
Account for Unbalanced Data
bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted
comparisonsbull For The Rose-Hellekant PCR data we could still be interested
in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment
bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS
bull There are also sound arguments for using Type II SS which we didnrsquot cover
bull Adjust effects of keyboard on pain according to how much the keyboards were used
bull Adjust comparisons of male and female mice to account for different mixes of young and old mice
bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are
separate independent pieces of information independent replicates
bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)
bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different
sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make
the next paper helicopter rather than flying this helicopter again
bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the
wider population of possible patients based on data from just two patients
Think about and check whether the results make sense
• Plot the data
  • In one case, a student was getting p-values of less than 1 in a million in their master’s project when the right p-values were more like 0.6
  • Plotting the data is the best check on the p-values
  • Don’t decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
• Check results with alternative calculations
  • Check that the df in an ANOVA table add up to the right value
  • Simulations
  • When using a new program, run it on a set of data where you know the right answer, to check that you are using the program correctly
  • Use unbalanced data to make the test more general
• Check that results make sense based on knowledge of the system
  • Check whether the estimate of the slope for pain versus amount of keyboard use comes out negative
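Several of these checks (plot the data, check the df, run the program on data with a known answer, and make the data unbalanced on purpose) fit in one small SAS sketch. The data set names known and est, the seed, and the true intercept 5, slope 2 and SD 0.5 are illustrative assumptions.

```sas
/* Known answer: intercept 5, slope 2, SD 0.5; deliberately unbalanced,  */
/* with 1, 2, 3, 4 and 5 observations at x = 1, 2, 3, 4 and 5 (n = 15)   */
data known;
  call streaminit(42);
  do x = 1 to 5;
    do rep = 1 to x;
      y = 5 + 2*x + rand('normal', 0, 0.5);
      output;
    end;
  end;
  drop rep;
run;

/* Plot the data before trusting any p-value */
proc sgplot data=known;
  scatter x=x y=y;
run;

/* ANOVA table df should be 1 (model) + 13 (error) = 14 (corrected total) */
/* OUTEST= stores the estimates; the column named x holds the slope       */
proc reg data=known outest=est;
  model y = x;
run;
quit;

/* Flag a slope estimate far from the known value of 2 */
data _null_;
  set est;
  if abs(x - 2) > 0.5 then put 'WARNING: slope estimate is far from the true value 2';
run;
```

With these settings the standard error of the slope is roughly 0.1, so an estimate more than 0.5 away from 2 would point to a problem in how the program is being used rather than to chance.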
Cultivate Consideration, Concern and Compassion
“A little love and affection in everything you do
Will make the world a better place with or without you”
Neil Young, Greendale
Stop to Smell the Flowers and Dream
“You’ve got to have a dream
If you don’t have a dream
How you going to have a dream come true?”
South Pacific
Data Analysis
Plot the data
01
1
10
100
1000
01 1 10 100
Fetal Weight
Placental Weight
Computingbull Take CS or MIS classesbull Take other classes with computing
ndash GIS Geographic and Information Systems classndash Bioinformatics
bull Excel SAS and Rbull SAS Certification
ndash The Little SAS Book ndash SAS Certification Guide
bull Base Programmingbull Advanced Programming
However
• If you try to do everything, or to do everything as well as you can, you aren't prioritizing.
• “Everything in moderation, including moderation.” (Lost Horizon)
• Have a few things that you have done thoroughly and well.
• See recommendations ↓
• Get to know at least three people well enough that they can write you good letters of recommendation.
• Recommendations are about more than grades:
  – Are you a good team member?
  – Do you have leadership experience?
  – Are you able to solve problems effectively?
  – This experience may come from outside class.
Design and Data Analysis
• Plan carefully before collecting data.
• Avoid bias and unexplained variability:
  – Randomization
  – Blinding
  – Placebo, sham surgery
  – Blocking
  – Record covariates and other factors that might influence results
• Understand the questions.
• Know how the experiment was conducted.
• Dependence between values:
  – Pairing
  – Blocking
  – Fish in tanks, students in schools, …
  – Repeated measures on the same animal, person, …
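Randomization within blocks can be sketched as follows. In a randomized complete block design, each block receives every treatment once, in an independently shuffled order. This is a minimal Python sketch. The function name, the site labels, and the trap types are hypothetical.

```python
import random


def randomize_within_blocks(blocks, treatments, seed=None):
    """Randomized complete block design: every block gets each treatment once,
    in an order shuffled independently for each block."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        plan[block] = order
    return plan


# Hypothetical sites (blocks) and insect-trap types (treatments).
plan = randomize_within_blocks(["site1", "site2", "site3"], ["A", "B", "C"], seed=7)
for site, order in plan.items():
    print(site, order)
```

Fixing the seed makes the plan reproducible, so it can be recorded along with the data.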
• Check and clean the data:
  – Plot individual data points.
  – Check for out-of-bounds entries (e.g., Age = 240).
  – Check the consistency of entries (e.g., a record should not be both male and hysterectomy).
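The two checks above (out-of-bounds and inconsistent entries) can be automated. This is a minimal Python sketch. The record fields and the valid age range of 0 to 120 are assumptions chosen for illustration.

```python
# Hypothetical records; the field names and the valid age range are assumptions.
records = [
    {"id": 1, "age": 34, "sex": "F", "hysterectomy": True},
    {"id": 2, "age": 240, "sex": "F", "hysterectomy": False},
    {"id": 3, "age": 51, "sex": "M", "hysterectomy": True},
    {"id": 4, "age": 47, "sex": "M", "hysterectomy": False},
]


def flag_problems(records, age_min=0, age_max=120):
    """Return (id, reason) pairs for out-of-bounds or inconsistent entries."""
    problems = []
    for r in records:
        if not age_min <= r["age"] <= age_max:
            problems.append((r["id"], "age out of bounds"))
        if r["sex"] == "M" and r["hysterectomy"]:
            problems.append((r["id"], "male with hysterectomy"))
    return problems


for record_id, reason in flag_problems(records):
    print(record_id, reason)
```

Flagged records should be checked against the original data sheets rather than silently deleted.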
• Check assumptions:
  – Independence: how was the experiment conducted?
  – Plot the original data points.
  – Normality
    • PDP polymerase: significant after log transformation
    • Insect traps: significant after log transformation
  – Equal variances
• Fixing assumptions:
  – Transformations, sometimes
  – Other methods, such as Weibull models
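The sketch below shows why a log transformation can help with unequal variances. When scatter is multiplicative, the spread grows with the mean, and taking logs turns the multiplicative scatter into a common additive one. The counts are hypothetical and chosen so that group B is exactly 10 times group A.

```python
import math
import statistics

# Hypothetical counts where the spread grows with the mean (multiplicative scatter).
group_a = [2, 4, 8]
group_b = [20, 40, 80]

raw_ratio = statistics.stdev(group_b) / statistics.stdev(group_a)
log_ratio = statistics.stdev([math.log(v) for v in group_b]) / statistics.stdev(
    [math.log(v) for v in group_a]
)
print(f"SD ratio, raw scale: {raw_ratio:.2f}")
print(f"SD ratio, log scale: {log_ratio:.2f}")
```

On the raw scale, the standard deviations differ by a factor of 10. On the log scale they are equal, because log(10·y) is just log(y) shifted by log(10).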
• Check assumptions by plotting residuals, especially with more complicated data where it's hard to plot all factors at one time effectively.
  – At least: residuals versus predicted values, a normal plot of residuals, and Cook's distance.
  – Additionally helpful: residuals versus factors in the model, versus other variables such as technician, and in time order of collection (to check for drift over time).
• Cook's distance (see answers to Lab 6):
  – An extreme outlier may not have a large residual if it has a large influence on the fitted model, because it pulls the fit toward itself.
[Figure: scatter plot of Y versus X]
• Cook's distance (see answers to Lab 6): Ross Garberich's master's project, “The Economic Utilization of Patients with Refractory Angina”
  – Here one patient had a very large influence on the results.
  – The patient had 30 lifetime angioplasties.
  – When the angioplasties were grouped into a “5 or more” category, the Cook's distance plot was fine.
• A Cook's distance > 0.5 is often considered big; a Cook's distance > 1 is often considered a definite problem.
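For a straight-line fit, Cook's distance is D_i = e_i² h_ii / (p · MSE · (1 − h_ii)²), where e_i is the residual, h_ii the leverage, and p = 2 parameters. The Python sketch below applies this formula to hypothetical data: five points on y = 2x plus one point at x = 15 that lies far from that pattern. The extreme point pulls the line toward itself. As a result, its residual is smaller than that of another point, yet its Cook's distance is far above 1.

```python
def cooks_distance(xs, ys):
    """Residuals and Cook's distances for a simple linear regression."""
    n = len(xs)
    p = 2  # parameters: intercept and slope
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    leverage = [1 / n + (x - xbar) ** 2 / sxx for x in xs]
    d = [e ** 2 * h / (p * mse * (1 - h) ** 2) for e, h in zip(resid, leverage)]
    return resid, d


# Hypothetical data: five points on y = 2x and one extreme point at x = 15.
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 6, 8, 10, 60]
resid, d = cooks_distance(xs, ys)
for x, e, di in zip(xs, resid, d):
    print(f"x = {x:2d}  residual = {e:6.2f}  Cook's D = {di:6.2f}")
```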
• Include factors in models that improve predictions:
  – Include temperatures when comparing brands of syrup.
  – Include the covariate Use (amount of use) when comparing keyboard pain.
  – Include blocks (sites) when comparing insect traps.
• Don't overfit:
  – Hypothesis tests
  – Information criteria: Akaike Information Criterion (AIC)
  – AICc: AIC corrected for small sample sizes
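The information criteria above can be computed from a model's error SS. This is a minimal Python sketch using one common convention for Gaussian least-squares fits: AIC = n·ln(SSE/n) + 2k, with additive constants dropped, and AICc = AIC + 2k(k + 1)/(n − k − 1). Here k counts every estimated parameter, including the error variance. Software packages may use other conventions. The SSE values are hypothetical.

```python
import math


def aic(sse, n, k):
    """AIC for a Gaussian least-squares fit, up to an additive constant:
    n * ln(SSE / n) + 2k, where k counts the estimated parameters."""
    return n * math.log(sse / n) + 2 * k


def aicc(sse, n, k):
    """AIC with the small-sample correction 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n - k - 1 > 0")
    return aic(sse, n, k) + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical error SS for n = 20 observations; k includes the error variance.
n = 20
fits = {"linear": (18.0, 3), "quadratic": (17.5, 4)}
for name, (sse, k) in fits.items():
    print(f"{name:9s} AIC = {aic(sse, n, k):6.2f}  AICc = {aicc(sse, n, k):6.2f}")
```

Smaller values are better. Here the quadratic term's small drop in SSE does not pay for its extra parameter.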
Overfitting Polynomial
[Figure: Y versus X, data with fitted polynomials]
• The fourth-order polynomial (in pink) fits the data exactly: Error SS = 0.
• But it likely would not work well for predicting at x = 0.5 or x = 1.5.
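The overfitting point can be checked numerically. With five points, the unique fourth-order polynomial through them, written here in Lagrange form, has error SS equal to 0, but it can wander away from the underlying trend. The y values are hypothetical noisy values near y = 5 + 20x. That line is an assumption chosen to match the plot's scale.

```python
def lagrange_eval(xs, ys, x0):
    """Value at x0 of the unique polynomial of degree len(xs) - 1 through the points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x0 - xj) / (xi - xj)
        total += weight * yi
    return total


def line_eval(xs, ys, x0):
    """Least-squares straight-line prediction at x0."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
        (x - xbar) ** 2 for x in xs
    )
    return ybar + b1 * (x0 - xbar)


xs = [1, 2, 3, 4, 5]
ys = [24.1, 46.0, 64.2, 85.9, 104.8]  # hypothetical noisy values near y = 5 + 20x

# The quartic passes through every point, so its error SS is 0 ...
sse_quartic = sum((y - lagrange_eval(xs, ys, x)) ** 2 for x, y in zip(xs, ys))
print("quartic error SS:", sse_quartic)
# ... but away from the data points it can drift far from the underlying line.
for x0 in (0.5, 1.5, 6):
    quartic, line = lagrange_eval(xs, ys, x0), line_eval(xs, ys, x0)
    print(f"x = {x0}: quartic {quartic:.1f}, line {line:.1f}, true mean {5 + 20 * x0}")
```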
Using Regression Class Formulas
Using x = 1, 2, 3, 4, 5:
Model       SE(ŷ) at x = 6   Simulations (1000)
Linear      1.0              1.4
Quadratic   2.1              2.1
Cubic       4.9              4.8
Quartic     15.8             15.6
The formulas are simplest for orthogonal polynomials:
• Inverting the diagonal matrix XᵀX is simple.
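$$\hat{y} = \bar{y} + \hat{\beta}_1 P_1(x) + \hat{\beta}_2 P_2(x) + \hat{\beta}_3 P_3(x) + \hat{\beta}_4 P_4(x)$$
$$P_1(x) = x - \bar{x}$$
$$P_2(x) = (x - \bar{x})^2 - \frac{a^2 - 1}{12}$$
$$P_3(x) = (x - \bar{x})^3 - (x - \bar{x})\,\frac{3a^2 - 7}{20}$$
$$P_4(x) = (x - \bar{x})^4 - (x - \bar{x})^2\,\frac{3a^2 - 13}{14} + \frac{3(a^2 - 1)(a^2 - 9)}{560}$$
where a is the number of equally spaced x levels (a = 5 here, with spacing 1).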
Using these orthogonal polynomials:
• Type I and Type III SS are the same.
• The SS for P2(x) is the SS explained by a quadratic term above and beyond the linear term.
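With orthogonal polynomials, XᵀX is diagonal. The variance of the fitted value therefore splits into one term per polynomial: Var(ŷ(x₀)) = σ²[1/n + Σₖ Pₖ(x₀)² / Σᵢ Pₖ(xᵢ)²]. The Python sketch below builds the orthogonal polynomials by Gram–Schmidt in exact fractions rather than from the closed-form coefficients above, and assumes σ = 1. For x = 1, …, 5 it reproduces P₂ = (x − 3)² − 2 and P₃ = (x − 3)³ − 3.4(x − 3).

```python
import math
from fractions import Fraction


def poly_eval(coeffs, u):
    """Evaluate sum(coeffs[i] * u**i)."""
    return sum(c * u ** i for i, c in enumerate(coeffs))


def orthogonal_polys(xs, degree):
    """Orthogonal polynomials P_0, ..., P_degree over the design points xs,
    built by Gram-Schmidt on 1, u, u^2, ... with u = x - xbar (exact fractions).
    Each polynomial is a list of coefficients in powers of u."""
    xbar = sum(Fraction(x) for x in xs) / len(xs)
    us = [Fraction(x) - xbar for x in xs]
    polys = []
    for k in range(degree + 1):
        p = [Fraction(0)] * k + [Fraction(1)]  # start from u**k
        for q in polys:
            pv = [poly_eval(p, u) for u in us]
            qv = [poly_eval(q, u) for u in us]
            coef = sum(a * b for a, b in zip(pv, qv)) / sum(b * b for b in qv)
            # Remove the component of p along q (q never has more terms than p).
            p = [pi - coef * (q[i] if i < len(q) else 0) for i, pi in enumerate(p)]
        polys.append(p)
    return polys, xbar, us


def se_prediction(xs, x0, degree, sigma=1.0):
    """SE of the fitted value at x0 for a polynomial of the given degree:
    sigma * sqrt(sum_k P_k(x0)^2 / sum_i P_k(x_i)^2); the k = 0 term equals 1/n."""
    polys, xbar, us = orthogonal_polys(xs, degree)
    u0 = Fraction(x0) - xbar
    var_factor = sum(
        poly_eval(p, u0) ** 2 / sum(poly_eval(p, u) ** 2 for u in us) for p in polys
    )
    return sigma * math.sqrt(var_factor)


xs = [1, 2, 3, 4, 5]
for degree, name in enumerate(["Linear", "Quadratic", "Cubic", "Quartic"], start=1):
    print(f"{name:9s} {se_prediction(xs, 6, degree):.2f}")
```

This prints 1.05, 2.14, 4.92, and 15.84, in line with the SE(ŷ) column of the table. Each added degree increases the prediction variance at x = 6.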
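%let b0=5; %let b1=20; %let std=1; %let nsim=1000;
data polynomial;  /* nsim data sets from y = b0 + b1*time + N(0, std) */
  call streaminit(0);
  do isim=1 to &nsim;
    do time=1 to 5 by 1;
      mu=&b0+&b1*time; y=rand('normal', mu, &std); output;
    end;
    time=6; y=.; output;  /* missing y: not used in the fit, but still gets a prediction */
  end;
run;
ods listing close;  /* suppress the printed output from 1000 fits */
proc glm data=polynomial; by isim;
  model y = time; output out=outp p=pred;
run; quit;
ods listing;
/* The Std Dev of pred at time = 6 across simulations estimates the SE of y-hat */
proc means data=outp; where time = 6; var pred;
run;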
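For readers without SAS, the same simulation can be sketched in Python. This is an illustrative translation, not the course code. The settings mirror the macro variables above, and the fixed seed is an assumption added for reproducibility. Each simulated data set gets a straight-line fit, and the standard deviation of the predictions at time 6 estimates SE(ŷ). The formula value is √(1/5 + 9/10) ≈ 1.05.

```python
import random
import statistics

B0, B1, STD, NSIM = 5, 20, 1.0, 1000  # same settings as the SAS macro variables


def simulate_predictions(nsim=NSIM, seed=2012):
    """Fit y = b0 + b1*time to nsim simulated data sets (time = 1..5)
    and return the predicted value at time = 6 from each fit."""
    rng = random.Random(seed)
    times = [1, 2, 3, 4, 5]
    tbar = statistics.mean(times)
    stt = sum((t - tbar) ** 2 for t in times)
    preds = []
    for _ in range(nsim):
        ys = [rng.gauss(B0 + B1 * t, STD) for t in times]
        ybar = statistics.mean(ys)
        slope = sum((t - tbar) * (y - ybar) for t, y in zip(times, ys)) / stt
        preds.append(ybar + slope * (6 - tbar))
    return preds


preds = simulate_predictions()
print(f"mean of predictions at time 6: {statistics.mean(preds):.2f}")
print(f"SD of predictions (simulated SE): {statistics.stdev(preds):.3f}")
```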
For Balanced Factorial Models and/or Blocks
• Comparing treatment effects is no more complex whether or not we include another factor or blocks:
  – Either way, it is a difference between means with the same number of values in each mean.
• The only downside is fewer df for estimating the variance.
• Pay attention to interactions:
  – Main effects need to be interpreted very carefully when there are interactions.
  – In the KCT example, K does not have a significant main effect, but K definitely affects yield, differently for low and high temperatures.
  – Don't summarize K effects with means over both temperatures; summarize K effects separately for low and high temperatures.
  – In the age-at-metamorphosis data in Assignment 7, B and C do not have significant main effects, but based on significant interactions, both B and C do affect age at metamorphosis.
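A small numeric illustration shows how a main effect can hide an interaction. The yields below are hypothetical and are not the KCT data. K raises yield at low temperature and lowers it by the same amount at high temperature. The K main effect, averaged over temperatures, is therefore zero, while the effects at each temperature are large.

```python
import statistics

# Hypothetical yields for two K levels at two temperatures.
yields = {
    ("low", "K0"): [10, 12],
    ("low", "K1"): [18, 20],
    ("high", "K0"): [20, 22],
    ("high", "K1"): [12, 14],
}
cell = {key: statistics.mean(vals) for key, vals in yields.items()}

# K effect summarized separately at each temperature (simple effects).
simple_effects = {t: cell[(t, "K1")] - cell[(t, "K0")] for t in ("low", "high")}

# K main effect averaged over temperatures: the opposite effects cancel.
main_effect = statistics.mean([cell[(t, "K1")] for t in ("low", "high")]) - statistics.mean(
    [cell[(t, "K0")] for t in ("low", "high")]
)
print("K effect by temperature:", simple_effects)
print("K main effect:", main_effect)
```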
Account for Unbalanced Data
• Use least squares means and Type III SS when appropriate.
• Sometimes we are still interested in Type I SS or unadjusted comparisons:
  – For the Rose-Hellekant PCR data, we could still be interested in tamoxifen vs. placebo not adjusted for outcome, since the outcome is also partially a result of the treatment.
• If we have correlated effects, possibly neither is significant with Type III SS, while one or both could be significant with Type I SS.
• There are also sound arguments for using Type II SS, which we didn't cover.
• Adjust effects of keyboard on pain according to how much the keyboards were used.
• Adjust comparisons of male and female mice to account for different mixes of young and old mice.
• Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks.
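The mouse example can be made concrete with hypothetical numbers. In the data below, the response depends only on age, but females are mostly young and males mostly old, so the raw means by sex differ. In a two-way model that includes the sex × age interaction, the least squares mean for each sex is the unweighted average of its cell means, which is the version sketched here. This is a hand calculation under that assumption, not the output of any particular LSMEANS routine.

```python
import statistics

# Hypothetical responses: the response depends on age only, not on sex,
# but females are mostly young and males mostly old (unbalanced counts).
data = {
    ("F", "young"): [10, 10, 10, 10],
    ("F", "old"): [20],
    ("M", "young"): [10],
    ("M", "old"): [20, 20, 20, 20],
}


def raw_mean(sex):
    """Unadjusted mean: pools every animal of that sex."""
    values = [v for (s, _), vals in data.items() if s == sex for v in vals]
    return statistics.mean(values)


def ls_mean(sex):
    """Unweighted average of the age-cell means (young and old count equally)."""
    return statistics.mean(statistics.mean(data[(sex, age)]) for age in ("young", "old"))


for sex in ("F", "M"):
    print(sex, "raw mean:", raw_mean(sex), "LS mean:", ls_mean(sex))
```

The raw means (12 vs. 18) suggest a sex difference that is really an age-mix difference. The LS means (15 vs. 15) remove it.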
Account for Random Effects
• Don't pretend that measurements on each fish in a tank are separate, independent pieces of information (independent replicates).
• Don't pretend that testing the same paper helicopter more than once represents a separate, independent piece of information.
  – This is the pseudo-replication mistake (Hurlbert 1984).
• If data are balanced, the analysis could be done with the means for each tank of fish.
  – But don't do this if data are unbalanced.
• Estimating the sizes of variances from different sources can also help decide how to use resources optimally in the next experiment.
  – At what point is it better to take the time to make the next paper helicopter rather than flying this helicopter again?
• See notes on random effects and nesting.
• Don't end up saying that a medication has an effect in the wider population of possible patients based on data from just two patients.
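The balanced tank-means analysis can be sketched as follows. Each tank is reduced to its mean, and the treatments are compared with a pooled two-sample t statistic on the tank means. The data are hypothetical: 2 treatments × 3 tanks × 4 fish. The error df is 4 (six tanks), not the 22 that treating the 24 fish as independent replicates would give.

```python
import math
import statistics

# Hypothetical balanced data: 2 treatments x 3 tanks x 4 fish per tank.
tanks = {
    "A": [[5, 6, 5, 6], [7, 8, 7, 8], [4, 5, 4, 5]],
    "B": [[8, 9, 8, 9], [6, 7, 6, 7], [9, 10, 9, 10]],
}

# The tank, not the fish, is the experimental unit: reduce each tank to its mean.
tank_means = {trt: [statistics.mean(fish) for fish in tk] for trt, tk in tanks.items()}

a, b = tank_means["A"], tank_means["B"]
df = len(a) + len(b) - 2  # 4 df from 6 tanks, not 22 df from 24 fish
pooled_var = ((len(a) - 1) * statistics.variance(a) + (len(b) - 1) * statistics.variance(b)) / df
t = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
print("tank means:", tank_means)
print("df:", df, "t:", round(t, 3))
```

The t statistic would then be compared with a t distribution on 4 df.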
Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a
million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values
bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data
where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general
bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of
a keyboard comes out negative
Cultivate Consideration Concern and Compassion
ldquoA little love and affection in everything you
Will make the world a better place with or without yourdquo
Neil Young Greendale
Stop to Smell the Flowers and Dream
ldquoYoursquove got to have a dream
If you donrsquot have a dream
How you going to have a dream come truerdquo
South Pacific
- Slide Number 1
- Slide Number 2
- Slide Number 3
- Slide Number 4
- Slide Number 5
- Slide Number 6
- Slide Number 7
- Slide Number 8
- Slide Number 9
- Consulting Advice
- Slide Number 11
- Slide Number 12
- Slide Number 13
- Slide Number 14
- Computing
- Slide Number 16
- Slide Number 17
- Slide Number 18
- Slide Number 19
- Slide Number 20
- Slide Number 21
- Slide Number 22
- Slide Number 23
- Slide Number 24
- Slide Number 25
- Slide Number 26
- Slide Number 27
- Slide Number 28
- Slide Number 29
- Slide Number 30
- Slide Number 31
- Slide Number 32
- Slide Number 33
- Slide Number 34
- Slide Number 35
- Slide Number 36
- Slide Number 37
-
01
1
10
100
1000
01 1 10 100
Fetal Weight
Placental Weight
Computingbull Take CS or MIS classesbull Take other classes with computing
ndash GIS Geographic and Information Systems classndash Bioinformatics
bull Excel SAS and Rbull SAS Certification
ndash The Little SAS Book ndash SAS Certification Guide
bull Base Programmingbull Advanced Programming
However
bull If you do everything or even everything as well as you can you arenrsquot prioritizing
bull ldquoEverything in moderation including moderationrdquo Lost Horizon
bull Have a few things that you have done thoroughly and wellbull See recommendations darr
bull Get to know at least three people well enough so that they can give you good letters of recommendation
bull Recommendations are more than gradesbull Are you a good team memberbull Do you have leadership experiencebull Are you able to solve problems effectively
bullThis may not be in class
Design and Data Analysisbull Plan carefully before collecting data
bull Avoid bias and unexplained variabilitybull Randomizationbull Blindingbull Placebo sham surgerybull Blockingbull Record covariates and other factors that might
influence results
bull Understand the questionsbull Know how the experiment was conducted
bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip
bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries
bull Age = 240bull Check consistency of entries
bull Not male and hysterectomy
bull Check assumptionsbull Independence How was the experiment
conductedbull Plot the original data pointsbull Normality
bull PDP polymerase significant after log transformation
bull Insect traps significant after log transformation
bull Equal variancesbull Fixing assumptions
bull Transformations sometimesbull Other methods
bull Weibull models
bull Check assumptionsbull Plot residualsbull Especially with more complicated data
where itrsquos hard to plot all factors at one time effectively
bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance
bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection
bull Check for drift over time
bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual
bull If it has large influence on the fitted model
0
100
200
300
400
500
600
0 2 4 6 8 10 12 14 16
Y
X
bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project
bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties
the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem
bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps
bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)
bull AICC Corrected for small sample sizes
Overfitting Polynomial
0
20
40
60
80
100
120
0 1 2 3 4 5 6
X
Y
bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15
Using Regression Class Formulas
Using x = 1 2 3 4 5
Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156
( )ySE ˆAt x = 6 Simulations (1000)
The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple
( )
( ) ( )
( ) ( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( )xPxPxPxPyy
aaaxxxxxP
axxxxxP
axxxP
xxxP
44332211
22224
4
23
3
22
2
1
ˆˆˆˆˆ560
01314
13320
7312
1
lowast+lowast+lowast+lowast+=
minusminuslowast+
minuslowastminusminusminus=
minuslowastminusminusminus=
minusminusminus=
minus=
ββββ
Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term
above and beyond the linear term
let b0=5let b1=20let std=1let nsim=1000
data polynomialcall streaminit(0)do isim=1 to ampnsim
do time= 1 to 5 by 1mu=ampb0+ampb1time
y=rand(normal mu ampstd)output
end time = 6 y = output
endrun
ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun
ods listingproc means data=outp
where time = 6var predrun
For Balanced Factorial Models andor Blocks
bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks
bull The difference between means with same number of values inthe means either way
bull The only down side is fewer df for estimating the variance
bull Pay attention to interactionsbull Main effects need to be interpreted very
carefully when there are interactionsbull In the KCT example K does not have a
significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over
both temperaturesbull Summarize K effects separately for low and
high temperaturesbull In the age at metamorphosis data on Assign 7
bull B and C do not have significant main effectsbull But based on significant interactions both B
and C do affect age at metamorphosis
Account for Unbalanced Data
bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted
comparisonsbull For The Rose-Hellekant PCR data we could still be interested
in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment
bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS
bull There are also sound arguments for using Type II SS which we didnrsquot cover
bull Adjust effects of keyboard on pain according to how much the keyboards were used
bull Adjust comparisons of male and female mice to account for different mixes of young and old mice
bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are
separate independent pieces of information independent replicates
bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)
bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different
sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make
the next paper helicopter rather than flying this helicopter again
bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the
wider population of possible patients based on data from just two patients
Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a
million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values
bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data
where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general
bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of
a keyboard comes out negative
Cultivate Consideration Concern and Compassion
ldquoA little love and affection in everything you
Will make the world a better place with or without yourdquo
Neil Young Greendale
Stop to Smell the Flowers and Dream
ldquoYoursquove got to have a dream
If you donrsquot have a dream
How you going to have a dream come truerdquo
South Pacific
- Slide Number 1
- Slide Number 2
- Slide Number 3
- Slide Number 4
- Slide Number 5
- Slide Number 6
- Slide Number 7
- Slide Number 8
- Slide Number 9
- Consulting Advice
- Slide Number 11
- Slide Number 12
- Slide Number 13
- Slide Number 14
- Computing
- Slide Number 16
- Slide Number 17
- Slide Number 18
- Slide Number 19
- Slide Number 20
- Slide Number 21
- Slide Number 22
- Slide Number 23
- Slide Number 24
- Slide Number 25
- Slide Number 26
- Slide Number 27
- Slide Number 28
- Slide Number 29
- Slide Number 30
- Slide Number 31
- Slide Number 32
- Slide Number 33
- Slide Number 34
- Slide Number 35
- Slide Number 36
- Slide Number 37
-
Computingbull Take CS or MIS classesbull Take other classes with computing
ndash GIS Geographic and Information Systems classndash Bioinformatics
bull Excel SAS and Rbull SAS Certification
ndash The Little SAS Book ndash SAS Certification Guide
bull Base Programmingbull Advanced Programming
However
bull If you do everything or even everything as well as you can you arenrsquot prioritizing
bull ldquoEverything in moderation including moderationrdquo Lost Horizon
bull Have a few things that you have done thoroughly and wellbull See recommendations darr
bull Get to know at least three people well enough so that they can give you good letters of recommendation
bull Recommendations are more than gradesbull Are you a good team memberbull Do you have leadership experiencebull Are you able to solve problems effectively
bullThis may not be in class
Design and Data Analysisbull Plan carefully before collecting data
bull Avoid bias and unexplained variabilitybull Randomizationbull Blindingbull Placebo sham surgerybull Blockingbull Record covariates and other factors that might
influence results
bull Understand the questionsbull Know how the experiment was conducted
bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip
bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries
bull Age = 240bull Check consistency of entries
bull Not male and hysterectomy
bull Check assumptionsbull Independence How was the experiment
conductedbull Plot the original data pointsbull Normality
bull PDP polymerase significant after log transformation
bull Insect traps significant after log transformation
bull Equal variancesbull Fixing assumptions
bull Transformations sometimesbull Other methods
bull Weibull models
bull Check assumptionsbull Plot residualsbull Especially with more complicated data
where itrsquos hard to plot all factors at one time effectively
bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance
bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection
bull Check for drift over time
bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual
bull If it has large influence on the fitted model
0
100
200
300
400
500
600
0 2 4 6 8 10 12 14 16
Y
X
bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project
bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties
the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem
bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps
bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)
bull AICC Corrected for small sample sizes
Overfitting Polynomial
0
20
40
60
80
100
120
0 1 2 3 4 5 6
X
Y
bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15
Using Regression Class Formulas
Using x = 1 2 3 4 5
Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156
( )ySE ˆAt x = 6 Simulations (1000)
The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple
( )
( ) ( )
( ) ( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( )xPxPxPxPyy
aaaxxxxxP
axxxxxP
axxxP
xxxP
44332211
22224
4
23
3
22
2
1
ˆˆˆˆˆ560
01314
13320
7312
1
lowast+lowast+lowast+lowast+=
minusminuslowast+
minuslowastminusminusminus=
minuslowastminusminusminus=
minusminusminus=
minus=
ββββ
Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term
above and beyond the linear term
let b0=5let b1=20let std=1let nsim=1000
data polynomialcall streaminit(0)do isim=1 to ampnsim
do time= 1 to 5 by 1mu=ampb0+ampb1time
y=rand(normal mu ampstd)output
end time = 6 y = output
endrun
ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun
ods listingproc means data=outp
where time = 6var predrun
For Balanced Factorial Models andor Blocks
bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks
bull The difference between means with same number of values inthe means either way
bull The only down side is fewer df for estimating the variance
bull Pay attention to interactionsbull Main effects need to be interpreted very
carefully when there are interactionsbull In the KCT example K does not have a
significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over
both temperaturesbull Summarize K effects separately for low and
high temperaturesbull In the age at metamorphosis data on Assign 7
bull B and C do not have significant main effectsbull But based on significant interactions both B
and C do affect age at metamorphosis
Account for Unbalanced Data
bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted
comparisonsbull For The Rose-Hellekant PCR data we could still be interested
in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment
bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS
bull There are also sound arguments for using Type II SS which we didnrsquot cover
bull Adjust effects of keyboard on pain according to how much the keyboards were used
bull Adjust comparisons of male and female mice to account for different mixes of young and old mice
bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are
separate independent pieces of information independent replicates
bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)
bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different
sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make
the next paper helicopter rather than flying this helicopter again
bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the
wider population of possible patients based on data from just two patients
Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a
million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values
bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data
where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general
bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of
a keyboard comes out negative
Cultivate Consideration Concern and Compassion
ldquoA little love and affection in everything you
Will make the world a better place with or without yourdquo
Neil Young Greendale
Stop to Smell the Flowers and Dream
ldquoYoursquove got to have a dream
If you donrsquot have a dream
How you going to have a dream come truerdquo
South Pacific
- Slide Number 1
- Slide Number 2
- Slide Number 3
- Slide Number 4
- Slide Number 5
- Slide Number 6
- Slide Number 7
- Slide Number 8
- Slide Number 9
- Consulting Advice
- Slide Number 11
- Slide Number 12
- Slide Number 13
- Slide Number 14
- Computing
- Slide Number 16
- Slide Number 17
- Slide Number 18
- Slide Number 19
- Slide Number 20
- Slide Number 21
- Slide Number 22
- Slide Number 23
- Slide Number 24
- Slide Number 25
- Slide Number 26
- Slide Number 27
- Slide Number 28
- Slide Number 29
- Slide Number 30
- Slide Number 31
- Slide Number 32
- Slide Number 33
- Slide Number 34
- Slide Number 35
- Slide Number 36
- Slide Number 37
-
However
bull If you do everything or even everything as well as you can you arenrsquot prioritizing
bull ldquoEverything in moderation including moderationrdquo Lost Horizon
bull Have a few things that you have done thoroughly and wellbull See recommendations darr
bull Get to know at least three people well enough so that they can give you good letters of recommendation
bull Recommendations are more than gradesbull Are you a good team memberbull Do you have leadership experiencebull Are you able to solve problems effectively
bullThis may not be in class
Design and Data Analysisbull Plan carefully before collecting data
bull Avoid bias and unexplained variabilitybull Randomizationbull Blindingbull Placebo sham surgerybull Blockingbull Record covariates and other factors that might
influence results
bull Understand the questionsbull Know how the experiment was conducted
bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip
bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries
bull Age = 240bull Check consistency of entries
bull Not male and hysterectomy
bull Check assumptionsbull Independence How was the experiment
conductedbull Plot the original data pointsbull Normality
bull PDP polymerase significant after log transformation
bull Insect traps significant after log transformation
bull Equal variancesbull Fixing assumptions
bull Transformations sometimesbull Other methods
bull Weibull models
bull Check assumptionsbull Plot residualsbull Especially with more complicated data
where itrsquos hard to plot all factors at one time effectively
bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance
bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection
bull Check for drift over time
bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual
bull If it has large influence on the fitted model
0
100
200
300
400
500
600
0 2 4 6 8 10 12 14 16
Y
X
bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project
bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties
the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem
bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps
bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)
bull AICC Corrected for small sample sizes
Overfitting Polynomial
0
20
40
60
80
100
120
0 1 2 3 4 5 6
X
Y
bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15
Using Regression Class Formulas
Using x = 1 2 3 4 5
Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156
( )ySE ˆAt x = 6 Simulations (1000)
The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple
( )
( ) ( )
( ) ( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( )xPxPxPxPyy
aaaxxxxxP
axxxxxP
axxxP
xxxP
44332211
22224
4
23
3
22
2
1
ˆˆˆˆˆ560
01314
13320
7312
1
lowast+lowast+lowast+lowast+=
minusminuslowast+
minuslowastminusminusminus=
minuslowastminusminusminus=
minusminusminus=
minus=
ββββ
Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term
above and beyond the linear term
let b0=5let b1=20let std=1let nsim=1000
data polynomialcall streaminit(0)do isim=1 to ampnsim
do time= 1 to 5 by 1mu=ampb0+ampb1time
y=rand(normal mu ampstd)output
end time = 6 y = output
endrun
ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun
ods listingproc means data=outp
where time = 6var predrun
For Balanced Factorial Models andor Blocks
bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks
bull The difference between means with same number of values inthe means either way
bull The only down side is fewer df for estimating the variance
• Pay attention to interactions
  • Main effects need to be interpreted very carefully when there are interactions
  • In the KCT example, K does not have a significant main effect, but K definitely affects yield
    • But differently for low and high temperatures
    • Don’t summarize K effects with means over both temperatures
    • Summarize K effects separately for low and high temperatures
  • In the age at metamorphosis data from Assign 7
    • B and C do not have significant main effects
    • But based on significant interactions, both B and C do affect age at metamorphosis
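A small numerical illustration shows why a main effect can hide a real effect under interaction. The cell means below are hypothetical, chosen so the K effect reverses between temperatures; they are not the KCT or Assign 7 data.

```python
# Hypothetical cell means of yield for two levels of K at two temperatures.
cell_means = {
    ("K low", "low temp"): 50.0, ("K high", "low temp"): 58.0,
    ("K low", "high temp"): 60.0, ("K high", "high temp"): 52.0,
}
temps = ["low temp", "high temp"]

# Simple effects: the effect of K at each temperature separately.
simple = {t: cell_means[("K high", t)] - cell_means[("K low", t)] for t in temps}

# Main effect: the K difference averaged over both temperatures.
main = sum(simple.values()) / len(temps)

print(simple)   # K raises yield at one temperature and lowers it at the other
print(main)     # averaging over temperatures cancels the two effects
```

A test of the K main effect would compare something like `main` with zero and find nothing. The simple effects, summarized separately for each temperature, are where the story is.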
Account for Unbalanced Data
• Use Least Squares Means and Type III SS when appropriate
• Sometimes we are still interested in Type I SS or unadjusted comparisons
  • For the Rose-Hellekant PCR data, we could still be interested in tamoxifen vs. placebo not adjusted for outcome, since the outcome is also partially a result of the treatment
  • If we have correlated effects, possibly neither is significant with Type III SS, while one or both could be significant with Type I SS
• There are also sound arguments for using Type II SS, which we didn’t cover
• Adjust effects of keyboard on pain according to how much the keyboards were used
• Adjust comparisons of male and female mice to account for different mixes of young and old mice
• Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
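The mice example can be sketched with hypothetical numbers: females that happen to be mostly young and males mostly old. The raw means mix the sex difference with the age mix. For a model with sex, age and their interaction, the least-squares mean for a sex is the unweighted average of its cell means, which compares the sexes at the same age mix. All values and helper names are illustrative assumptions.

```python
# Hypothetical cell means and cell sizes: response by sex and age group.
means = {("F", "young"): 10.0, ("F", "old"): 20.0,
         ("M", "young"): 12.0, ("M", "old"): 22.0}
counts = {("F", "young"): 8, ("F", "old"): 2,
          ("M", "young"): 2, ("M", "old"): 8}
ages = ["young", "old"]


def raw_mean(sex):
    """Ordinary mean of all animals of one sex: cells weighted by their sizes."""
    total = sum(means[(sex, a)] * counts[(sex, a)] for a in ages)
    return total / sum(counts[(sex, a)] for a in ages)


def ls_mean(sex):
    """Least-squares mean: each age group's cell mean gets equal weight."""
    return sum(means[(sex, a)] for a in ages) / len(ages)


print(raw_mean("M") - raw_mean("F"))   # 8: mostly the age difference in disguise
print(ls_mean("M") - ls_mean("F"))     # 2: the sex difference at a common age mix
```

Within each age group, males are 2 units higher. The unadjusted comparison quadruples that only because the males in this made-up sample are mostly old.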
Account for Random Effects
• Don’t pretend that measurements for each fish in a tank are separate, independent pieces of information (independent replicates)
• Don’t pretend that testing the same paper helicopter more than once represents a separate, independent piece of information
  • This is the pseudo-replication mistake (Hurlbert 1984)
• If data are balanced, the analysis could be done with the means for each tank of fish
  • But don’t do this if data are unbalanced
• Also, estimates of the sizes of variances from different sources can be used to decide how to use resources optimally in the next experiment
  • At what point is it better to take the time to make the next paper helicopter rather than flying this helicopter again?
• See notes on random effects and nesting
• Don’t end up saying that a medication has an effect in the wider population of possible patients based on data from just two patients
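The helicopter question can be made concrete with a standard result for nested (two-stage) designs. With h units (helicopters, tanks) and f repeat measurements on each, the variance of the overall mean is σ²_between/h + σ²_within/(h·f). Suppose the total cost is h·(c_unit + f·c_repeat). Then the number of repeats per unit that minimizes this variance for a fixed budget is f = √(c_unit·σ²_within / (c_repeat·σ²_between)). The function names, variances and costs below are hypothetical.

```python
import math


def var_of_mean(var_between, var_within, n_units, n_per_unit):
    """Variance of the overall mean with n_units independent units
    (helicopters, tanks) and n_per_unit repeat measurements on each."""
    return var_between / n_units + var_within / (n_units * n_per_unit)


def best_repeats_per_unit(var_between, var_within, cost_unit, cost_repeat):
    """Repeats per unit minimizing the variance of the mean for a fixed budget,
    assuming total cost = n_units * (cost_unit + n_per_unit * cost_repeat)."""
    return math.sqrt((cost_unit * var_within) / (cost_repeat * var_between))


# Hypothetical values: between-helicopter variance 0.04 s^2, flight-to-flight 0.16 s^2;
# making a helicopter costs 15 minutes, one more flight costs 1 minute.
f_opt = best_repeats_per_unit(0.04, 0.16, cost_unit=15, cost_repeat=1)
print(round(f_opt, 1))   # about 7.7 flights per helicopter, then build the next one

# Pseudo-replication: 2 helicopters flown 20 times each are not 40 independent replicates.
honest = var_of_mean(0.04, 0.16, n_units=2, n_per_unit=20)
naive = (0.04 + 0.16) / 40
print(honest, naive)     # treating flights as independent understates the variance
```

In this made-up case, the naive variance is nearly five times too small. That is how pseudo-replication produces p-values that are far too small.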
Think about and check whether the results make sense
• Plot the data
  • In one case a student was getting p-values of less than 1 in a million in their master’s project when the right p-values were more like 0.6
  • Plotting the data is the best protection for checking the p-values
  • Don’t decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
• Check results with alternative calculations
  • Check that the df in an ANOVA table add to the right value
  • Simulations
  • When using a new program, run it on a set of data where you know the right answer, to check that you are using the program correctly
    • Use unbalanced data to make the test more general
• Check that results make sense based on knowledge of the system
  • Check whether the estimate of slope for pain versus amount of use of a keyboard comes out negative
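Two of these checks can be combined in a quick self-test. The first is that the df and SS add up. The second is running the program on data where you know the answer. The sketch below is a hand-written one-way ANOVA (a hypothetical helper, not any package's routine). It is checked on small unbalanced data whose group means (2, 5 and 9) and error SS (18) are easy to verify by hand.

```python
def one_way_anova(groups):
    """Sums of squares and df for a one-way ANOVA; groups may differ in size."""
    values = [y for g in groups for y in g]
    n, k = len(values), len(groups)
    grand = sum(values) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ss_total = sum((y - grand) ** 2 for y in values)
    return {"treatment": (ss_treat, k - 1), "error": (ss_error, n - k),
            "total": (ss_total, n - 1)}


# Hypothetical unbalanced data: group sizes 2, 3 and 4.
groups = [[1.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0, 12.0]]
anova = one_way_anova(groups)
(ss_t, df_t), (ss_e, df_e), (ss_tot, df_tot) = anova["treatment"], anova["error"], anova["total"]

assert df_t + df_e == df_tot                  # 2 + 6 == 8
assert abs(ss_t + ss_e - ss_tot) < 1e-9       # the sums of squares must add up too
assert abs(ss_e - 18.0) < 1e-9                # the answer worked out by hand
print(anova)
```

Unequal group sizes matter here: a bug that weights groups incorrectly can pass unnoticed on balanced data but fails these checks.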
Cultivate Consideration, Concern and Compassion
“A little love and affection in everything you do
Will make the world a better place with or without you”
Neil Young, Greendale
Stop to Smell the Flowers and Dream
“You’ve got to have a dream
If you don’t have a dream
How you going to have a dream come true?”
South Pacific
Design and Data Analysis
• Plan carefully before collecting data
• Avoid bias and unexplained variability
  • Randomization
  • Blinding
  • Placebo, sham surgery
  • Blocking
  • Record covariates and other factors that might influence results
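Randomization and blocking can be combined by shuffling the treatments separately within each block. Here is a minimal sketch of a randomized complete block layout. The trap and site names, the seed and the helper name are hypothetical. Recording the seed makes the plan reproducible.

```python
import random


def randomize_within_blocks(treatments, blocks, seed=None):
    """Randomized complete block design: every treatment appears once in each
    block, in an independently shuffled order within each block."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        plan[block] = order
    return plan


# Hypothetical example: three insect-trap types placed at four sites (blocks).
traps = ["trap A", "trap B", "trap C"]
sites = ["site 1", "site 2", "site 3", "site 4"]
plan = randomize_within_blocks(traps, sites, seed=2012)
for site, order in plan.items():
    print(site, order)   # positions 1..3 within the site
```

Randomizing within blocks keeps every comparison between traps inside a site. Site-to-site differences can then be removed as block effects in the analysis.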
• Understand the questions
• Know how the experiment was conducted
  • Dependence between values
    • Pairing
    • Blocking
    • Fish in tanks, students in schools, …
    • Repeated measures for the same animal, person, …
• Check and clean the data
  • Plot individual data points
  • Check for out-of-bounds entries
    • Age = 240
  • Check consistency of entries
    • No one recorded as both male and having had a hysterectomy
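Both kinds of check can be written as simple rules applied to every record. The records, field names and the age bounds below are hypothetical; adapt them to the study at hand.

```python
# Hypothetical records and field names.
records = [
    {"id": 1, "age": 34, "sex": "F", "hysterectomy": True},
    {"id": 2, "age": 240, "sex": "F", "hysterectomy": False},
    {"id": 3, "age": 51, "sex": "M", "hysterectomy": True},
]


def problems(rec, age_range=(0, 110)):
    """List the out-of-bounds and inconsistent entries in one record."""
    found = []
    low, high = age_range
    if not low <= rec["age"] <= high:
        found.append(f"age {rec['age']} out of bounds")
    if rec["sex"] == "M" and rec["hysterectomy"]:
        found.append("male with hysterectomy")
    return found


for rec in records:
    for issue in problems(rec):
        print(rec["id"], issue)
```

Such rules only flag entries for a closer look. The fix (a typo, a swapped column, a truly unusual subject) still has to come from the original records.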
• Check assumptions
  • Independence: how was the experiment conducted?
  • Plot the original data points
  • Normality
    • PDP polymerase: significant after log transformation
    • Insect traps: significant after log transformation
  • Equal variances
• Fixing assumptions
  • Transformations, sometimes
  • Other methods
    • Weibull models
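When the spread of a response grows with its mean, a log transformation can make the variances roughly equal. This often happens with counts such as insect-trap catches. Here is a toy check with hypothetical data in which one group is exactly ten times the other.

```python
import math
import statistics

# Hypothetical multiplicative data: group B is group A scaled by 10.
group_a = [10.0, 20.0, 40.0]
group_b = [100.0, 200.0, 400.0]

raw_ratio = statistics.stdev(group_b) / statistics.stdev(group_a)
log_a = [math.log10(y) for y in group_a]
log_b = [math.log10(y) for y in group_b]
log_ratio = statistics.stdev(log_b) / statistics.stdev(log_a)

print(round(raw_ratio, 2), round(log_ratio, 2))   # unequal spread before, equal after
```

On the log scale, a multiplicative difference between groups becomes an additive shift, which leaves the spread unchanged. Real data will only approximate this, so check the residual plots again after transforming.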
• Check assumptions
  • Plot residuals
    • Especially with more complicated data, where it’s hard to plot all factors at one time effectively
  • At least
    • Residuals versus predicted
    • Normal plot of residuals
    • Cook’s Distance
  • Additionally, it is helpful to plot residuals
    • vs. factors in the model
    • vs. other variables, such as technician
    • in time order of collection
      • Check for drift over time
• Cook’s Distance (see answers to Lab 6)
  • An extreme outlier may not have a large residual
    • If it has a large influence on the fitted model
[Figure: Y versus X]
• Cook’s Distance (see answers to Lab 6)
  • Ross Garberich’s master’s project: The Economic Utilization of Patients with Refractory Angina
    • Here one patient had a very large influence on the results
    • The patient had 30 lifetime angioplasties
    • When the angioplasties were grouped into “5 or more angioplasties”, the Cook’s distance plot was just fine
  • Cook’s distance > 0.5 is often considered big
  • Cook’s distance > 1 is often considered a definite problem
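For a straight-line fit, Cook's distance for point i is Dᵢ = eᵢ²·hᵢᵢ / (p·s²·(1 − hᵢᵢ)²). Here eᵢ is the residual, hᵢᵢ = 1/n + (xᵢ − x̄)²/Sxx is the leverage, p = 2 is the number of coefficients and s² is the error mean square. This sketch uses hypothetical data: five points near a line plus one point far out in x. The far-out point has only a modest residual, because it pulls the line toward itself, yet its Cook's distance is far above 1.

```python
def cooks_distance(xs, ys):
    """Residuals and Cook's distances for a simple linear regression y = b0 + b1*x."""
    n, p = len(xs), 2                                  # p = number of coefficients
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (n - p)           # error mean square
    lev = [1 / n + (x - xbar) ** 2 / sxx for x in xs]  # leverages h_ii
    dist = [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return resid, dist


# Hypothetical data: five points near a line plus one far-out x value.
xs = [1, 2, 3, 4, 5, 15]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 10.0]
resid, dist = cooks_distance(xs, ys)
for x, e, d in zip(xs, resid, dist):
    print(x, round(e, 2), round(d, 2))
```

The largest residual belongs to x = 5, but the influential point is x = 15. This is why a residual plot alone can miss it.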
• Include factors in models that improve predictions
  • Include temperatures when comparing brands of syrups
  • Include the covariate Use (amount of keyboard use) when comparing keyboard pain
  • Include blocks = sites when comparing insect traps
bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)
bull AICC Corrected for small sample sizes
Overfitting Polynomial
0
20
40
60
80
100
120
0 1 2 3 4 5 6
X
Y
bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15
Using Regression Class Formulas
Using x = 1 2 3 4 5
Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156
( )ySE ˆAt x = 6 Simulations (1000)
The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple
( )
( ) ( )
( ) ( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( )xPxPxPxPyy
aaaxxxxxP
axxxxxP
axxxP
xxxP
44332211
22224
4
23
3
22
2
1
ˆˆˆˆˆ560
01314
13320
7312
1
lowast+lowast+lowast+lowast+=
minusminuslowast+
minuslowastminusminusminus=
minuslowastminusminusminus=
minusminusminus=
minus=
ββββ
Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term
above and beyond the linear term
let b0=5let b1=20let std=1let nsim=1000
data polynomialcall streaminit(0)do isim=1 to ampnsim
do time= 1 to 5 by 1mu=ampb0+ampb1time
y=rand(normal mu ampstd)output
end time = 6 y = output
endrun
ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun
ods listingproc means data=outp
where time = 6var predrun
For Balanced Factorial Models andor Blocks
bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks
bull The difference between means with same number of values inthe means either way
bull The only down side is fewer df for estimating the variance
bull Pay attention to interactionsbull Main effects need to be interpreted very
carefully when there are interactionsbull In the KCT example K does not have a
significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over
both temperaturesbull Summarize K effects separately for low and
high temperaturesbull In the age at metamorphosis data on Assign 7
bull B and C do not have significant main effectsbull But based on significant interactions both B
and C do affect age at metamorphosis
Account for Unbalanced Data
bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted
comparisonsbull For The Rose-Hellekant PCR data we could still be interested
in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment
bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS
bull There are also sound arguments for using Type II SS which we didnrsquot cover
bull Adjust effects of keyboard on pain according to how much the keyboards were used
bull Adjust comparisons of male and female mice to account for different mixes of young and old mice
bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are
separate independent pieces of information independent replicates
bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)
bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different
sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make
the next paper helicopter rather than flying this helicopter again
bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the
wider population of possible patients based on data from just two patients
Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a
million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values
bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data
where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general
bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of
a keyboard comes out negative
Cultivate Consideration Concern and Compassion
ldquoA little love and affection in everything you
Will make the world a better place with or without yourdquo
Neil Young Greendale
Stop to Smell the Flowers and Dream
ldquoYoursquove got to have a dream
If you donrsquot have a dream
How you going to have a dream come truerdquo
South Pacific
- Slide Number 1
- Slide Number 2
- Slide Number 3
- Slide Number 4
- Slide Number 5
- Slide Number 6
- Slide Number 7
- Slide Number 8
- Slide Number 9
- Consulting Advice
- Slide Number 11
- Slide Number 12
- Slide Number 13
- Slide Number 14
- Computing
- Slide Number 16
- Slide Number 17
- Slide Number 18
- Slide Number 19
- Slide Number 20
- Slide Number 21
- Slide Number 22
- Slide Number 23
- Slide Number 24
- Slide Number 25
- Slide Number 26
- Slide Number 27
- Slide Number 28
- Slide Number 29
- Slide Number 30
- Slide Number 31
- Slide Number 32
- Slide Number 33
- Slide Number 34
- Slide Number 35
- Slide Number 36
- Slide Number 37
-
bull Understand the questionsbull Know how the experiment was conducted
bull Dependence between valuesbull Pairingbull Blockingbull Fish in tanks students in schools hellipbull Repeated measures for the same animal personhellip
bull Check clean the databull Plot individual data pointsbull Check for out of bounds entries
bull Age = 240bull Check consistency of entries
bull Not male and hysterectomy
bull Check assumptionsbull Independence How was the experiment
conductedbull Plot the original data pointsbull Normality
bull PDP polymerase significant after log transformation
bull Insect traps significant after log transformation
bull Equal variancesbull Fixing assumptions
bull Transformations sometimesbull Other methods
bull Weibull models
bull Check assumptionsbull Plot residualsbull Especially with more complicated data
where itrsquos hard to plot all factors at one time effectively
bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance
bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection
bull Check for drift over time
bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual
bull If it has large influence on the fitted model
0
100
200
300
400
500
600
0 2 4 6 8 10 12 14 16
Y
X
bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project
bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties
the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem
bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps
bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)
bull AICC Corrected for small sample sizes
Overfitting Polynomial
0
20
40
60
80
100
120
0 1 2 3 4 5 6
X
Y
bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15
Using Regression Class Formulas
Using x = 1 2 3 4 5
Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156
( )ySE ˆAt x = 6 Simulations (1000)
The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple
( )
( ) ( )
( ) ( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( )xPxPxPxPyy
aaaxxxxxP
axxxxxP
axxxP
xxxP
44332211
22224
4
23
3
22
2
1
ˆˆˆˆˆ560
01314
13320
7312
1
lowast+lowast+lowast+lowast+=
minusminuslowast+
minuslowastminusminusminus=
minuslowastminusminusminus=
minusminusminus=
minus=
ββββ
Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term
above and beyond the linear term
let b0=5let b1=20let std=1let nsim=1000
data polynomialcall streaminit(0)do isim=1 to ampnsim
do time= 1 to 5 by 1mu=ampb0+ampb1time
y=rand(normal mu ampstd)output
end time = 6 y = output
endrun
ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun
ods listingproc means data=outp
where time = 6var predrun
For Balanced Factorial Models andor Blocks
bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks
bull The difference between means with same number of values inthe means either way
bull The only down side is fewer df for estimating the variance
bull Pay attention to interactionsbull Main effects need to be interpreted very
carefully when there are interactionsbull In the KCT example K does not have a
significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over
both temperaturesbull Summarize K effects separately for low and
high temperaturesbull In the age at metamorphosis data on Assign 7
bull B and C do not have significant main effectsbull But based on significant interactions both B
and C do affect age at metamorphosis
Account for Unbalanced Data
bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted
comparisonsbull For The Rose-Hellekant PCR data we could still be interested
in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment
bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS
bull There are also sound arguments for using Type II SS which we didnrsquot cover
bull Adjust effects of keyboard on pain according to how much the keyboards were used
bull Adjust comparisons of male and female mice to account for different mixes of young and old mice
bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are
separate independent pieces of information independent replicates
bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)
bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different
sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make
the next paper helicopter rather than flying this helicopter again
bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the
wider population of possible patients based on data from just two patients
Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a
million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values
bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data
where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general
bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of
a keyboard comes out negative
Cultivate Consideration Concern and Compassion
ldquoA little love and affection in everything you
Will make the world a better place with or without yourdquo
Neil Young Greendale
Stop to Smell the Flowers and Dream
ldquoYoursquove got to have a dream
If you donrsquot have a dream
How you going to have a dream come truerdquo
South Pacific
- Slide Number 1
- Slide Number 2
- Slide Number 3
- Slide Number 4
- Slide Number 5
- Slide Number 6
- Slide Number 7
- Slide Number 8
- Slide Number 9
- Consulting Advice
- Slide Number 11
- Slide Number 12
- Slide Number 13
- Slide Number 14
- Computing
- Slide Number 16
- Slide Number 17
- Slide Number 18
- Slide Number 19
- Slide Number 20
- Slide Number 21
- Slide Number 22
- Slide Number 23
- Slide Number 24
- Slide Number 25
- Slide Number 26
- Slide Number 27
- Slide Number 28
- Slide Number 29
- Slide Number 30
- Slide Number 31
- Slide Number 32
- Slide Number 33
- Slide Number 34
- Slide Number 35
- Slide Number 36
- Slide Number 37
-
bull Check assumptionsbull Independence How was the experiment
conductedbull Plot the original data pointsbull Normality
bull PDP polymerase significant after log transformation
bull Insect traps significant after log transformation
bull Equal variancesbull Fixing assumptions
bull Transformations sometimesbull Other methods
bull Weibull models
bull Check assumptionsbull Plot residualsbull Especially with more complicated data
where itrsquos hard to plot all factors at one time effectively
bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance
bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection
bull Check for drift over time
bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual
bull If it has large influence on the fitted model
0
100
200
300
400
500
600
0 2 4 6 8 10 12 14 16
Y
X
bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project
bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties
the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem
bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps
bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)
bull AICC Corrected for small sample sizes
Overfitting Polynomial
0
20
40
60
80
100
120
0 1 2 3 4 5 6
X
Y
bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15
Using Regression Class Formulas
Using x = 1 2 3 4 5
Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156
( )ySE ˆAt x = 6 Simulations (1000)
The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple
( )
( ) ( )
( ) ( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( )xPxPxPxPyy
aaaxxxxxP
axxxxxP
axxxP
xxxP
44332211
22224
4
23
3
22
2
1
ˆˆˆˆˆ560
01314
13320
7312
1
lowast+lowast+lowast+lowast+=
minusminuslowast+
minuslowastminusminusminus=
minuslowastminusminusminus=
minusminusminus=
minus=
ββββ
Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term
above and beyond the linear term
let b0=5let b1=20let std=1let nsim=1000
data polynomialcall streaminit(0)do isim=1 to ampnsim
do time= 1 to 5 by 1mu=ampb0+ampb1time
y=rand(normal mu ampstd)output
end time = 6 y = output
endrun
ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun
ods listingproc means data=outp
where time = 6var predrun
For Balanced Factorial Models andor Blocks
bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks
bull The difference between means with same number of values inthe means either way
bull The only down side is fewer df for estimating the variance
bull Pay attention to interactionsbull Main effects need to be interpreted very
carefully when there are interactionsbull In the KCT example K does not have a
significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over
both temperaturesbull Summarize K effects separately for low and
high temperaturesbull In the age at metamorphosis data on Assign 7
bull B and C do not have significant main effectsbull But based on significant interactions both B
and C do affect age at metamorphosis
Account for Unbalanced Data
bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted
comparisonsbull For The Rose-Hellekant PCR data we could still be interested
in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment
bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS
bull There are also sound arguments for using Type II SS which we didnrsquot cover
bull Adjust effects of keyboard on pain according to how much the keyboards were used
bull Adjust comparisons of male and female mice to account for different mixes of young and old mice
bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are
separate independent pieces of information independent replicates
bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)
bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different
sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make
the next paper helicopter rather than flying this helicopter again
bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the
wider population of possible patients based on data from just two patients
Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a
million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values
bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data
where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general
bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of
a keyboard comes out negative
Cultivate Consideration Concern and Compassion
ldquoA little love and affection in everything you
Will make the world a better place with or without yourdquo
Neil Young Greendale
Stop to Smell the Flowers and Dream
ldquoYoursquove got to have a dream
If you donrsquot have a dream
How you going to have a dream come truerdquo
South Pacific
- Slide Number 1
- Slide Number 2
- Slide Number 3
- Slide Number 4
- Slide Number 5
- Slide Number 6
- Slide Number 7
- Slide Number 8
- Slide Number 9
- Consulting Advice
- Slide Number 11
- Slide Number 12
- Slide Number 13
- Slide Number 14
- Computing
- Slide Number 16
- Slide Number 17
- Slide Number 18
- Slide Number 19
- Slide Number 20
- Slide Number 21
- Slide Number 22
- Slide Number 23
- Slide Number 24
- Slide Number 25
- Slide Number 26
- Slide Number 27
- Slide Number 28
- Slide Number 29
- Slide Number 30
- Slide Number 31
- Slide Number 32
- Slide Number 33
- Slide Number 34
- Slide Number 35
- Slide Number 36
- Slide Number 37
-
bull Check assumptionsbull Plot residualsbull Especially with more complicated data
where itrsquos hard to plot all factors at one time effectively
bull At leastbull Residuals versus predictedbull Normal plot of residualsbull Cookrsquos Distance
bull Additionally helpful to plot residualsbull vs factors in the modelbull vs other variables such as technicianbull In time order of collection
bull Check for drift over time
bull Cookrsquos Distance See answers to Lab 6bull An extreme outlier may not have a large residual
bull If it has large influence on the fitted model
0
100
200
300
400
500
600
0 2 4 6 8 10 12 14 16
Y
X
bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project
bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties
the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem
bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps
bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)
bull AICC Corrected for small sample sizes
Overfitting Polynomial
0
20
40
60
80
100
120
0 1 2 3 4 5 6
X
Y
bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15
Using Regression Class Formulas
Using x = 1 2 3 4 5
Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156
( )ySE ˆAt x = 6 Simulations (1000)
The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple
( )
( ) ( )
( ) ( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( )xPxPxPxPyy
aaaxxxxxP
axxxxxP
axxxP
xxxP
44332211
22224
4
23
3
22
2
1
ˆˆˆˆˆ560
01314
13320
7312
1
lowast+lowast+lowast+lowast+=
minusminuslowast+
minuslowastminusminusminus=
minuslowastminusminusminus=
minusminusminus=
minus=
ββββ
Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term
above and beyond the linear term
let b0=5let b1=20let std=1let nsim=1000
data polynomialcall streaminit(0)do isim=1 to ampnsim
do time= 1 to 5 by 1mu=ampb0+ampb1time
y=rand(normal mu ampstd)output
end time = 6 y = output
endrun
ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun
ods listingproc means data=outp
where time = 6var predrun
For Balanced Factorial Models andor Blocks
bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks
bull The difference between means with same number of values inthe means either way
bull The only down side is fewer df for estimating the variance
bull Pay attention to interactionsbull Main effects need to be interpreted very
carefully when there are interactionsbull In the KCT example K does not have a
significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over
both temperaturesbull Summarize K effects separately for low and
high temperaturesbull In the age at metamorphosis data on Assign 7
bull B and C do not have significant main effectsbull But based on significant interactions both B
and C do affect age at metamorphosis
Account for Unbalanced Data
• Use Least Squares Means and Type III SS when appropriate
• Sometimes we are still interested in Type I SS or unadjusted comparisons
• For the Rose-Hellekant PCR data, we could still be interested in tamoxifen vs. placebo not adjusted for outcome, since the outcome is also partially a result of the treatment
• If we have correlated effects, possibly neither is significant with Type III SS, while one or both could be significant with Type I SS
• There are also sound arguments for using Type II SS, which we didn't cover
• Adjust effects of keyboard on pain according to how much the keyboards were used
• Adjust comparisons of male and female mice to account for different mixes of young and old mice
• Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
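The keyboard example can be sketched with made-up numbers in which keyboard B also happened to get more use. Consider two groups and one covariate with a common slope. The Type I SS for keyboard (entered first) is the ordinary between-group SS. The Type III SS (adjusted for use) is the squared use-adjusted difference divided by its variance multiplier, 1/n0 + 1/n1 + (ū1 − ū0)²/Exx. Here Exx is the pooled within-group SS of use. The data, the 0/1 coding, and the function names are hypothetical. The closed-form one-way ANCOVA formulas stand in for a PROC GLM run.

```python
def mean(values):
    return sum(values) / len(values)


def keyboard_ss(group, use, pain):
    """Type I and Type III SS for a 0/1 keyboard factor with the covariate use
    (two groups, common slope). Returns (type1, type3, pooled slope, adjusted difference)."""
    idx = {g: [i for i, gi in enumerate(group) if gi == g] for g in (0, 1)}
    n = {g: len(idx[g]) for g in (0, 1)}
    ybar = {g: mean([pain[i] for i in idx[g]]) for g in (0, 1)}
    ubar = {g: mean([use[i] for i in idx[g]]) for g in (0, 1)}

    # Type I: keyboard entered first, so this is the unadjusted between-keyboard SS
    grand = mean(pain)
    type1 = sum(n[g] * (ybar[g] - grand) ** 2 for g in (0, 1))

    # Pooled within-keyboard regression of pain on use
    exx = sum((use[i] - ubar[group[i]]) ** 2 for i in range(len(pain)))
    exy = sum((use[i] - ubar[group[i]]) * (pain[i] - ybar[group[i]]) for i in range(len(pain)))
    slope = exy / exx

    # Type III: keyboard adjusted for use (squared adjusted difference over its variance multiplier)
    diff = (ybar[1] - ybar[0]) - slope * (ubar[1] - ubar[0])
    type3 = diff ** 2 / (1 / n[0] + 1 / n[1] + (ubar[1] - ubar[0]) ** 2 / exx)
    return type1, type3, slope, diff


keyboard = [0, 0, 0, 0, 1, 1, 1, 1]            # 0 = keyboard A, 1 = keyboard B (hypothetical)
use = [2, 3, 3, 4, 5, 6, 6, 7]                 # amount of use (hypothetical)
pain = [3.1, 3.9, 4.2, 5.0, 5.8, 7.1, 6.9, 8.2]
type1_kb, type3_kb, slope, diff = keyboard_ss(keyboard, use, pain)
print(f"Type I SS (keyboard): {type1_kb:.4f}   Type III SS (keyboard): {type3_kb:.4f}")
```

With these numbers, the unadjusted keyboard SS is 17.405, but after adjusting for use it shrinks to 0.0275. Nearly all of the apparent keyboard difference is explained by how much each keyboard was used.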
Account for Random Effects
• Don't pretend that measurements for each fish in a tank are separate, independent pieces of information (independent replicates)
• Don't pretend that testing the same paper helicopter more than once gives separate, independent pieces of information
• This is the pseudo-replication mistake (Hurlbert 1984)
• If data are balanced, the analysis could be done with the mean for each tank of fish
• But don't do this if data are unbalanced
• Also, estimates of the sizes of variances from different sources can be used to decide how to use resources optimally in the next experiment
• At what point is it better to take the time to make the next paper helicopter rather than flying this helicopter again?
• See notes on random effects and nesting
• Don't end up claiming that a medication has an effect in the wider population of possible patients based on data from just two patients
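For the fish-in-tanks case, here is a sketch of the balanced one-way random-effects calculation. The tank and fish variance components are estimated from the mean squares (method of moments). They are then used to compare ways of spending the same 12 measurements. The data and function names are hypothetical. The comparison also ignores the different costs of a new tank (or a new paper helicopter) versus one more fish (or one more flight), which a real allocation would weigh.

```python
def variance_components(tanks):
    """Balanced one-way random-effects ANOVA: return (MS_tank, MS_fish, var_tank, var_fish)."""
    t, n = len(tanks), len(tanks[0])
    assert all(len(g) == n for g in tanks), "this sketch assumes balanced data"
    means = [sum(g) / n for g in tanks]
    grand = sum(means) / t
    ms_tank = n * sum((m - grand) ** 2 for m in means) / (t - 1)
    ms_fish = sum((v - m) ** 2 for g, m in zip(tanks, means) for v in g) / (t * (n - 1))
    var_tank = max(0.0, (ms_tank - ms_fish) / n)  # method-of-moments estimate, truncated at 0
    return ms_tank, ms_fish, var_tank, ms_fish


def var_of_mean(var_tank, var_fish, n_tanks, fish_per_tank):
    """Variance of a treatment mean based on n_tanks tanks with fish_per_tank fish in each."""
    return var_tank / n_tanks + var_fish / (n_tanks * fish_per_tank)


# Hypothetical measurements: 3 tanks x 4 fish in one treatment
tanks = [[10, 12, 11, 13], [15, 14, 16, 15], [9, 8, 10, 9]]
ms_tank, ms_fish, var_tank, var_fish = variance_components(tanks)
for n_tanks, fish in [(3, 4), (6, 2), (12, 1)]:  # each design uses 12 fish in total
    v = var_of_mean(var_tank, var_fish, n_tanks, fish)
    print(f"{n_tanks:2d} tanks x {fish} fish: variance of treatment mean = {v:.3f}")
```

With these numbers, the tank variance (about 8.8) dwarfs the fish variance (1.0). Spreading the 12 fish over more tanks therefore shrinks the variance of the treatment mean from about 3.03 to 0.82.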
Think about and check whether the results make sense
• Plot the data
• In one case, a student was getting p-values of less than 1 in a million in their master's project when the right p-values were more like 0.06
• Plotting the data is the best protection; use the plot to check the p-values
• Don't decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
• Check results with alternative calculations
• Check that the df in an ANOVA table add up to the right value
• Simulations
• When using a new program, run it on a set of data where you know the right answer, to check that you are using the program correctly
• Use unbalanced data to make the test more general
• Check that results make sense based on knowledge of the system
• For example, check whether the estimated slope for pain versus amount of keyboard use comes out negative
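Here is a small sketch of the "check that the pieces add up" habit, using made-up unbalanced data with group sizes 2, 3, and 5. It computes the one-way ANOVA sums of squares and confirms that the treatment and error df and SS add to the totals, which are computed independently. The function name and data are hypothetical.

```python
def one_way_anova(groups):
    """Return (df, SS) for treatments, error, and total in a one-way layout."""
    values = [v for g in groups for v in g]
    n, t = len(values), len(groups)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_trt = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_err = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ss_tot = sum((v - grand) ** 2 for v in values)  # computed independently of the other two
    return {"trt": (t - 1, ss_trt), "error": (n - t, ss_err), "total": (n - 1, ss_tot)}


groups = [[4.0, 6.0], [7.0, 9.0, 8.0], [3.0, 5.0, 4.0, 6.0, 2.0]]  # hypothetical, unbalanced
table = one_way_anova(groups)
(df_t, ss_t), (df_e, ss_e), (df_tot, ss_tot) = table["trt"], table["error"], table["total"]
assert df_t + df_e == df_tot, "df do not add up"
assert abs(ss_t + ss_e - ss_tot) < 1e-9, "SS do not add up"
for source, (df, ss) in table.items():
    print(f"{source:6s} df={df:2d}  SS={ss:6.2f}")
```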
Cultivate Consideration, Concern, and Compassion
“A little love and affection in everything you do
Will make the world a better place with or without you”
Neil Young, Greendale
Stop to Smell the Flowers and Dream
“You’ve got to have a dream
If you don’t have a dream
How you going to have a dream come true”
South Pacific
• Cook's Distance: See answers to Lab 6
• An extreme outlier may not have a large residual
• This happens if it has a large influence on the fitted model, pulling the fit toward itself
[Figure: scatter plot of Y versus X]
• Ross Garberich's master's project: The Economic Utilization of Patients with Refractory Angina
• Here one patient had a very large influence on the results
• The patient had 30 lifetime angioplasties
• When the angioplasty counts were grouped, with 5 or more angioplasties as the top category, the Cook's distance plot was fine
• Cook's distance > 0.5 is often considered big
• Cook's distance > 1 is often considered a definite problem
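Here is a sketch of Cook's distance for simple linear regression: D_i = e_i² h_ii / (p · MSE · (1 − h_ii)²), with p = 2 parameters and leverage h_ii = 1/n + (x_i − x̄)²/Sxx. The data are made up to mimic the first slide, with one point far out in x that drags the line toward itself. The 0.5 and 1 flags follow the rules of thumb above.

```python
def cooks_distance(x, y):
    """Residuals, leverages, and Cook's distances for a simple linear regression of y on x."""
    n, p = len(x), 2                                   # p = parameters (intercept, slope)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(r * r for r in resid) / (n - p)
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]  # leverage h_ii
    d = [r * r / (p * mse) * h / (1 - h) ** 2 for r, h in zip(resid, lev)]
    return resid, lev, d


x = [1, 2, 3, 4, 5, 15]    # hypothetical: the last point has an extreme x value
y = [2, 4, 6, 8, 10, 15]   # first five lie on y = 2x; the last falls far below that line
resid, lev, d = cooks_distance(x, y)
for xi, r, h, di in zip(x, resid, lev, d):
    flag = "definite problem" if di > 1 else ("big" if di > 0.5 else "")
    print(f"x={xi:2d}  residual={r:6.2f}  leverage={h:.3f}  Cook's D={di:6.2f}  {flag}")
```

The point at x = 15 has a residual of about −0.96, which is smaller in size than the residual of 2.5 at x = 5. Yet its Cook's distance is about 29, while every other point stays below 0.5.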
• Include factors in models that improve predictions
• Include temperatures when comparing brands of syrups
• Include the covariate Use (amount of use) when comparing keyboard pain
• Include blocks (sites) when comparing insect traps
• Don't overfit
• Hypothesis tests
• Information criteria: Akaike Information Criterion (AIC)
• AICc: AIC corrected for small sample sizes
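One common form of the criteria for least-squares fits with normal errors is AIC = n ln(SSE/n) + 2k and AICc = AIC + 2k(k + 1)/(n − k − 1), where smaller is better. Conventions differ: some software counts σ² in k or keeps additive constants. This sketch is therefore only meaningful for comparing models fit to the same data. The SSE values below are hypothetical, and here k counts the regression coefficients only.

```python
import math


def aic(sse, n, k):
    """AIC for a least-squares fit with normal errors, up to an additive constant.
    k = number of estimated regression coefficients (sigma^2 not counted in this sketch)."""
    return n * math.log(sse / n) + 2 * k


def aicc(sse, n, k):
    """Small-sample corrected AIC; undefined when n - k - 1 <= 0."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic(sse, n, k) + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical comparison: n = 20 observations, linear (k = 2) vs quadratic (k = 3) fits
n = 20
linear = aicc(50.0, n, 2)
quadratic = aicc(48.0, n, 3)
print(f"AICc linear = {linear:.2f}, quadratic = {quadratic:.2f}")  # smaller is better
```

In this example, the small drop in SSE does not pay for the extra parameter, so the linear model is preferred. A quartic through five points (n = 5, k = 5) has no AICc at all.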
Overfitting Polynomial
[Figure: Y versus X]
• The fourth-order polynomial (in pink) fits the data exactly: Error SS = 0
• But it likely would not work well for predicting at x = 0.5 or x = 1.5
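To see why an exact fit is a warning sign, this sketch puts the unique fourth-degree polynomial through five made-up points using Lagrange interpolation. Any exact quartic fit to five points gives this same polynomial. The sketch then compares the quartic's predictions at x = 0.5 and x = 1.5 with those from a straight-line fit. The data and function names are hypothetical.

```python
def lagrange_eval(xs, ys, x):
    """Value at x of the unique polynomial of degree len(xs) - 1 through the points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def line_fit(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b1 = sum((a - xbar) * (b - ybar) for a, b in zip(xs, ys)) / sum((a - xbar) ** 2 for a in xs)
    return ybar - b1 * xbar, b1


xs = [1, 2, 3, 4, 5]
ys = [20.0, 52.0, 48.0, 80.0, 95.0]  # hypothetical data
error_ss = sum((y - lagrange_eval(xs, ys, x)) ** 2 for x, y in zip(xs, ys))
b0, b1 = line_fit(xs, ys)
for x0 in (0.5, 1.5):
    print(f"x = {x0}: quartic {lagrange_eval(xs, ys, x0):7.2f}   straight line {b0 + b1 * x0:6.2f}")
print("quartic error SS:", error_ss)
```

With these numbers, the quartic predicts about −66 at x = 0.5, where the straight line gives 14.5.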
Using Regression Class Formulas
Using x = 1, 2, 3, 4, 5

Model       SE(ŷ) at x = 6   Simulations (1000)
Linear      1.0              1.4
Quadratic   2.1              2.1
Cubic       4.9              4.8
Quartic     15.8             15.6
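The Simulations column can be sketched in Python by extending the SAS program to degrees 1 through 4. This sketch makes the following assumptions:

- y = 5 + 20x + N(0, 1) at x = 1, …, 5 (the b0, b1, and std values in the SAS code)
- 1000 simulated data sets
- fits done with orthogonal polynomials, so each extra term just adds β̂_k P_k(6) to the prediction

Python's random streams differ from SAS's, so the simulated values will not match the slide digit for digit. The names `polys` and `simulate_se` are hypothetical.

```python
import random
import statistics


def polys(x):
    """Orthogonal polynomials P1..P4 over the five levels x = 1, ..., 5 (xbar = 3)."""
    u = x - 3
    return [u, u**2 - 2, u**3 - 3.4 * u, u**4 - (31 / 7) * u**2 + 72 / 35]


XS = [1, 2, 3, 4, 5]
P = [polys(x) for x in XS]
SS = [sum(row[k] ** 2 for row in P) for k in range(4)]  # diagonal of X^T X
P_NEW = polys(6)                                        # predict at x = 6


def simulate_se(nsim=1000, b0=5.0, b1=20.0, sd=1.0, seed=0):
    """SD over nsim simulated data sets of the prediction at x = 6, for degrees 1..4."""
    rng = random.Random(seed)
    preds = [[] for _ in range(4)]
    for _ in range(nsim):
        y = [b0 + b1 * x + rng.gauss(0.0, sd) for x in XS]
        pred = sum(y) / len(y)                          # start from ybar
        for k in range(4):
            beta_k = sum(row[k] * yi for row, yi in zip(P, y)) / SS[k]
            pred += beta_k * P_NEW[k]  # orthogonality: adding P_k leaves earlier betas unchanged
            preds[k].append(pred)
    return [statistics.stdev(p) for p in preds]


for name, se in zip(["Linear", "Quadratic", "Cubic", "Quartic"], simulate_se()):
    print(f"{name:9s} {se:6.2f}")
```

The simulated SDs should land within a few percent of the formula values √1.1 ≈ 1.05, √4.6 ≈ 2.14, √24.2 ≈ 4.92, and √251 ≈ 15.84.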
• The formulas are simplest for orthogonal polynomials
• Inverting a diagonal matrix XᵀX is simple

$$\hat{y} = \bar{y} + \hat{\beta}_1 P_1(x) + \hat{\beta}_2 P_2(x) + \hat{\beta}_3 P_3(x) + \hat{\beta}_4 P_4(x)$$

$$P_1(x) = x - \bar{x}$$

$$P_2(x) = (x - \bar{x})^2 - \frac{a^2 - 1}{12}$$

$$P_3(x) = (x - \bar{x})^3 - (x - \bar{x})\,\frac{3a^2 - 7}{20}$$

$$P_4(x) = (x - \bar{x})^4 - (x - \bar{x})^2\,\frac{3a^2 - 13}{14} + \frac{3(a^2 - 1)(a^2 - 9)}{560}$$

where a is the number of equally spaced x levels, one unit apart (here a = 5 and x̄ = 3).
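Because XᵀX is diagonal, ȳ and the β̂_k are uncorrelated, with Var(ȳ) = σ²/n and Var(β̂_k) = σ²/Σ_i P_k(x_i)². It follows that Var(ŷ(x₀)) = σ²[1/n + Σ_k P_k(x₀)²/Σ_i P_k(x_i)²]. This exact-arithmetic sketch makes the following assumptions:

- n = a = 5 (one observation at each x = 1, …, 5)
- σ = 1 (the std=1 in the simulation)
- the classic closed forms of P_1 through P_4 for a equally spaced levels

It checks the orthogonality and then builds up SE(ŷ) at x₀ = 6 one degree at a time. The function name is hypothetical.

```python
from fractions import Fraction
import math


def orthogonal_polys(x, xbar, a):
    """P1..P4 (classic closed forms) for a equally spaced levels with unit spacing."""
    u = x - xbar
    a2 = a * a
    return [
        u,
        u**2 - Fraction(a2 - 1, 12),
        u**3 - u * Fraction(3 * a2 - 7, 20),
        u**4 - u**2 * Fraction(3 * a2 - 13, 14) + Fraction(3 * (a2 - 1) * (a2 - 9), 560),
    ]


xs = [1, 2, 3, 4, 5]
a = len(xs)                   # one observation per level, so n = a = 5
xbar = Fraction(sum(xs), a)
P = [orthogonal_polys(Fraction(x), xbar, a) for x in xs]

# Orthogonality: each column sums to 0 (orthogonal to the intercept), and columns are mutually orthogonal
for j in range(4):
    assert sum(row[j] for row in P) == 0
    for k in range(j + 1, 4):
        assert sum(row[j] * row[k] for row in P) == 0

ss = [sum(row[k] ** 2 for row in P) for k in range(4)]  # diagonal entries of X^T X for P1..P4
p_new = orthogonal_polys(Fraction(6), xbar, a)

sigma = 1                     # error SD, as in the simulation (std=1)
var = Fraction(1, a)          # Var(ybar) / sigma^2
se = []
for k in range(4):
    var += p_new[k] ** 2 / ss[k]  # add Var(beta_k * P_k(6)) / sigma^2
    se.append(sigma * math.sqrt(float(var)))
print([round(s, 2) for s in se])  # [1.05, 2.14, 4.92, 15.84]
```

Rounded to one decimal, these are 1.0, 2.1, 4.9, and 15.8, the values in the SE(ŷ) column of the table. The quartic's variance is dominated by P_4(6)²/Σ P_4(x_i)² = 226.8, which is why extrapolating a high-order polynomial is so unstable.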
Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term
above and beyond the linear term
let b0=5let b1=20let std=1let nsim=1000
data polynomialcall streaminit(0)do isim=1 to ampnsim
do time= 1 to 5 by 1mu=ampb0+ampb1time
y=rand(normal mu ampstd)output
end time = 6 y = output
endrun
ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun
ods listingproc means data=outp
where time = 6var predrun
For Balanced Factorial Models andor Blocks
bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks
bull The difference between means with same number of values inthe means either way
bull The only down side is fewer df for estimating the variance
bull Pay attention to interactionsbull Main effects need to be interpreted very
carefully when there are interactionsbull In the KCT example K does not have a
significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over
both temperaturesbull Summarize K effects separately for low and
high temperaturesbull In the age at metamorphosis data on Assign 7
bull B and C do not have significant main effectsbull But based on significant interactions both B
and C do affect age at metamorphosis
Account for Unbalanced Data
bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted
comparisonsbull For The Rose-Hellekant PCR data we could still be interested
in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment
bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS
bull There are also sound arguments for using Type II SS which we didnrsquot cover
bull Adjust effects of keyboard on pain according to how much the keyboards were used
bull Adjust comparisons of male and female mice to account for different mixes of young and old mice
bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are
separate independent pieces of information independent replicates
bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)
bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different
sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make
the next paper helicopter rather than flying this helicopter again
bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the
wider population of possible patients based on data from just two patients
Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a
million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values
bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data
where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general
bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of
a keyboard comes out negative
Cultivate Consideration Concern and Compassion
ldquoA little love and affection in everything you
Will make the world a better place with or without yourdquo
Neil Young Greendale
Stop to Smell the Flowers and Dream
ldquoYoursquove got to have a dream
If you donrsquot have a dream
How you going to have a dream come truerdquo
South Pacific
- Slide Number 1
- Slide Number 2
- Slide Number 3
- Slide Number 4
- Slide Number 5
- Slide Number 6
- Slide Number 7
- Slide Number 8
- Slide Number 9
- Consulting Advice
- Slide Number 11
- Slide Number 12
- Slide Number 13
- Slide Number 14
- Computing
- Slide Number 16
- Slide Number 17
- Slide Number 18
- Slide Number 19
- Slide Number 20
- Slide Number 21
- Slide Number 22
- Slide Number 23
- Slide Number 24
- Slide Number 25
- Slide Number 26
- Slide Number 27
- Slide Number 28
- Slide Number 29
- Slide Number 30
- Slide Number 31
- Slide Number 32
- Slide Number 33
- Slide Number 34
- Slide Number 35
- Slide Number 36
- Slide Number 37
-
bull Cookrsquos Distance See answers to Lab 6bull bull Ross Garberichs masters project
bull The Economic Utilization of Patients with Refractory Angina bull Here one patient had a very large influence on the results bull The patient had 30 lifetime angioplasties bull When the angioplasties were grouped into 5 or more angioplasties
the Cooks distance plot is just finebull Cookrsquos distance gt 05 is often considered bigbull Cookrsquos distance gt 1 is often considered a definite problem
bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps
bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)
bull AICC Corrected for small sample sizes
Overfitting Polynomial
0
20
40
60
80
100
120
0 1 2 3 4 5 6
X
Y
bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15
Using Regression Class Formulas
Using x = 1 2 3 4 5
Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156
( )ySE ˆAt x = 6 Simulations (1000)
The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple
( )
( ) ( )
( ) ( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( )xPxPxPxPyy
aaaxxxxxP
axxxxxP
axxxP
xxxP
44332211
22224
4
23
3
22
2
1
ˆˆˆˆˆ560
01314
13320
7312
1
lowast+lowast+lowast+lowast+=
minusminuslowast+
minuslowastminusminusminus=
minuslowastminusminusminus=
minusminusminus=
minus=
ββββ
Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term
above and beyond the linear term
let b0=5let b1=20let std=1let nsim=1000
data polynomialcall streaminit(0)do isim=1 to ampnsim
do time= 1 to 5 by 1mu=ampb0+ampb1time
y=rand(normal mu ampstd)output
end time = 6 y = output
endrun
ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun
ods listingproc means data=outp
where time = 6var predrun
For Balanced Factorial Models andor Blocks
bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks
bull The difference between means with same number of values inthe means either way
bull The only down side is fewer df for estimating the variance
bull Pay attention to interactionsbull Main effects need to be interpreted very
carefully when there are interactionsbull In the KCT example K does not have a
significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over
both temperaturesbull Summarize K effects separately for low and
high temperaturesbull In the age at metamorphosis data on Assign 7
bull B and C do not have significant main effectsbull But based on significant interactions both B
and C do affect age at metamorphosis
Account for Unbalanced Data
bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted
comparisonsbull For The Rose-Hellekant PCR data we could still be interested
in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment
bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS
bull There are also sound arguments for using Type II SS which we didnrsquot cover
bull Adjust effects of keyboard on pain according to how much the keyboards were used
bull Adjust comparisons of male and female mice to account for different mixes of young and old mice
bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are
separate independent pieces of information independent replicates
bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)
bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different
sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make
the next paper helicopter rather than flying this helicopter again
bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the
wider population of possible patients based on data from just two patients
Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a
million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values
bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data
where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general
bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of
a keyboard comes out negative
Cultivate Consideration Concern and Compassion
ldquoA little love and affection in everything you
Will make the world a better place with or without yourdquo
Neil Young Greendale
Stop to Smell the Flowers and Dream
ldquoYoursquove got to have a dream
If you donrsquot have a dream
How you going to have a dream come truerdquo
South Pacific
- Slide Number 1
- Slide Number 2
- Slide Number 3
- Slide Number 4
- Slide Number 5
- Slide Number 6
- Slide Number 7
- Slide Number 8
- Slide Number 9
- Consulting Advice
- Slide Number 11
- Slide Number 12
- Slide Number 13
- Slide Number 14
- Computing
- Slide Number 16
- Slide Number 17
- Slide Number 18
- Slide Number 19
- Slide Number 20
- Slide Number 21
- Slide Number 22
- Slide Number 23
- Slide Number 24
- Slide Number 25
- Slide Number 26
- Slide Number 27
- Slide Number 28
- Slide Number 29
- Slide Number 30
- Slide Number 31
- Slide Number 32
- Slide Number 33
- Slide Number 34
- Slide Number 35
- Slide Number 36
- Slide Number 37
-
bull Include factors in models that improve predictionsbull Include temperatures when comparing brands of syrupsbull Include covariate Use when comparing keyboard painbull Include blocks=sites when comparing insect traps
bull Donrsquot overfitbull Hypothesis testsbull Information Criteria Akaike Information Criterion (AIC)
bull AICC Corrected for small sample sizes
Overfitting Polynomial
0
20
40
60
80
100
120
0 1 2 3 4 5 6
X
Y
bullThe fourth order polynomial in pinkbullFits the data exactly Error SS=0 bullBut likely would not work well for predicting for x=05 or x=15
Using Regression Class Formulas
Using x = 1 2 3 4 5
Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156
( )ySE ˆAt x = 6 Simulations (1000)
The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple
( )
( ) ( )
( ) ( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( )xPxPxPxPyy
aaaxxxxxP
axxxxxP
axxxP
xxxP
44332211
22224
4
23
3
22
2
1
ˆˆˆˆˆ560
01314
13320
7312
1
lowast+lowast+lowast+lowast+=
minusminuslowast+
minuslowastminusminusminus=
minuslowastminusminusminus=
minusminusminus=
minus=
ββββ
Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term
above and beyond the linear term
let b0=5let b1=20let std=1let nsim=1000
data polynomialcall streaminit(0)do isim=1 to ampnsim
do time= 1 to 5 by 1mu=ampb0+ampb1time
y=rand(normal mu ampstd)output
end time = 6 y = output
endrun
ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun
ods listingproc means data=outp
where time = 6var predrun
For Balanced Factorial Models andor Blocks
bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks
bull The difference between means with same number of values inthe means either way
bull The only down side is fewer df for estimating the variance
bull Pay attention to interactionsbull Main effects need to be interpreted very
carefully when there are interactionsbull In the KCT example K does not have a
significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over
both temperaturesbull Summarize K effects separately for low and
high temperaturesbull In the age at metamorphosis data on Assign 7
bull B and C do not have significant main effectsbull But based on significant interactions both B
and C do affect age at metamorphosis
Account for Unbalanced Data
bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted
comparisonsbull For The Rose-Hellekant PCR data we could still be interested
in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment
bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS
bull There are also sound arguments for using Type II SS which we didnrsquot cover
bull Adjust effects of keyboard on pain according to how much the keyboards were used
bull Adjust comparisons of male and female mice to account for different mixes of young and old mice
bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are
separate independent pieces of information independent replicates
bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)
bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different
sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make
the next paper helicopter rather than flying this helicopter again
bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the
wider population of possible patients based on data from just two patients
Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a
million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values
bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data
where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general
bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of
a keyboard comes out negative
Cultivate Consideration Concern and Compassion
ldquoA little love and affection in everything you
Will make the world a better place with or without yourdquo
Neil Young Greendale
Stop to Smell the Flowers and Dream
ldquoYoursquove got to have a dream
If you donrsquot have a dream
How you going to have a dream come truerdquo
South Pacific
- Slide Number 1
- Slide Number 2
- Slide Number 3
- Slide Number 4
- Slide Number 5
- Slide Number 6
- Slide Number 7
- Slide Number 8
- Slide Number 9
- Consulting Advice
- Slide Number 11
- Slide Number 12
- Slide Number 13
- Slide Number 14
- Computing
- Slide Number 16
- Slide Number 17
- Slide Number 18
- Slide Number 19
- Slide Number 20
- Slide Number 21
- Slide Number 22
- Slide Number 23
- Slide Number 24
- Slide Number 25
- Slide Number 26
- Slide Number 27
- Slide Number 28
- Slide Number 29
- Slide Number 30
- Slide Number 31
- Slide Number 32
- Slide Number 33
- Slide Number 34
- Slide Number 35
- Slide Number 36
- Slide Number 37
-
Using Regression Class Formulas
Using x = 1 2 3 4 5
Linear 10 14 Quadratic 21 21Cubic 49 48Quartic 158 156
( )ySE ˆAt x = 6 Simulations (1000)
The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple
( )
( ) ( )
( ) ( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( )xPxPxPxPyy
aaaxxxxxP
axxxxxP
axxxP
xxxP
44332211
22224
4
23
3
22
2
1
ˆˆˆˆˆ560
01314
13320
7312
1
lowast+lowast+lowast+lowast+=
minusminuslowast+
minuslowastminusminusminus=
minuslowastminusminusminus=
minusminusminus=
minus=
ββββ
Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term
above and beyond the linear term
let b0=5let b1=20let std=1let nsim=1000
data polynomialcall streaminit(0)do isim=1 to ampnsim
do time= 1 to 5 by 1mu=ampb0+ampb1time
y=rand(normal mu ampstd)output
end time = 6 y = output
endrun
ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun
ods listingproc means data=outp
where time = 6var predrun
For Balanced Factorial Models andor Blocks
bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks
bull The difference between means with same number of values inthe means either way
bull The only down side is fewer df for estimating the variance
bull Pay attention to interactionsbull Main effects need to be interpreted very
carefully when there are interactionsbull In the KCT example K does not have a
significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over
both temperaturesbull Summarize K effects separately for low and
high temperaturesbull In the age at metamorphosis data on Assign 7
bull B and C do not have significant main effectsbull But based on significant interactions both B
and C do affect age at metamorphosis
Account for Unbalanced Data
bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted
comparisonsbull For The Rose-Hellekant PCR data we could still be interested
in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment
bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS
bull There are also sound arguments for using Type II SS which we didnrsquot cover
bull Adjust effects of keyboard on pain according to how much the keyboards were used
bull Adjust comparisons of male and female mice to account for different mixes of young and old mice
bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are
separate independent pieces of information independent replicates
bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)
bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different
sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make
the next paper helicopter rather than flying this helicopter again
bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the
wider population of possible patients based on data from just two patients
Think about and check whether the results make sensebull Plot the databull In one case a student was getting p-values of less than 1 in a
million in their masterrsquos project when the right p-values were more like 06bull Plotting the data is the best protection to check the p-values
bull Donrsquot decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
bull Check results with alternative calculationsbull Check that df in an ANOVA table add to the right valuebull Simulationsbull When using a new program run the program on a set of data
where you know the right answer to check that you are using the program correctlybull Use unbalanced data to make the test more general
bull Check that results make sense based on knowledge of the systembull Check if the estimate of slope for pain versus amount of use of
a keyboard comes out negative
Cultivate Consideration Concern and Compassion
ldquoA little love and affection in everything you
Will make the world a better place with or without yourdquo
Neil Young Greendale
Stop to Smell the Flowers and Dream
ldquoYoursquove got to have a dream
If you donrsquot have a dream
How you going to have a dream come truerdquo
South Pacific
- Slide Number 1
- Slide Number 2
- Slide Number 3
- Slide Number 4
- Slide Number 5
- Slide Number 6
- Slide Number 7
- Slide Number 8
- Slide Number 9
- Consulting Advice
- Slide Number 11
- Slide Number 12
- Slide Number 13
- Slide Number 14
- Computing
- Slide Number 16
- Slide Number 17
- Slide Number 18
- Slide Number 19
- Slide Number 20
- Slide Number 21
- Slide Number 22
- Slide Number 23
- Slide Number 24
- Slide Number 25
- Slide Number 26
- Slide Number 27
- Slide Number 28
- Slide Number 29
- Slide Number 30
- Slide Number 31
- Slide Number 32
- Slide Number 33
- Slide Number 34
- Slide Number 35
- Slide Number 36
- Slide Number 37
-
The formulas are simplest for orthogonal polynomialsbull Inverting a diagonal matrix XTX is simple
( )
( ) ( )
( ) ( ) ( )
( ) ( ) ( ) ( )( )
( ) ( ) ( ) ( )xPxPxPxPyy
aaaxxxxxP
axxxxxP
axxxP
xxxP
44332211
22224
4
23
3
22
2
1
ˆˆˆˆˆ560
01314
13320
7312
1
lowast+lowast+lowast+lowast+=
minusminuslowast+
minuslowastminusminusminus=
minuslowastminusminusminus=
minusminusminus=
minus=
ββββ
Using these orthogonal polynomialsbull Type I and Type III SS are the samebull SS for P2(x) is SS explained by a quadratic term
above and beyond the linear term
let b0=5let b1=20let std=1let nsim=1000
data polynomialcall streaminit(0)do isim=1 to ampnsim
do time= 1 to 5 by 1mu=ampb0+ampb1time
y=rand(normal mu ampstd)output
end time = 6 y = output
endrun
ods listing closeproc glm data=polynomialmodel y = timeoutput out=outp p=predby isimrun
ods listingproc means data=outp
where time = 6var predrun
For Balanced Factorial Models andor Blocks
bull The complexity of the comparison of treatment effects is the samewhether we include another factor or blocks
bull The difference between means with same number of values inthe means either way
bull The only down side is fewer df for estimating the variance
bull Pay attention to interactionsbull Main effects need to be interpreted very
carefully when there are interactionsbull In the KCT example K does not have a
significant but K definitely affects yieldbull But differently for low and high temperaturesbull Donrsquot summarize K effects with means over
both temperaturesbull Summarize K effects separately for low and
high temperaturesbull In the age at metamorphosis data on Assign 7
bull B and C do not have significant main effectsbull But based on significant interactions both B
and C do affect age at metamorphosis
Account for Unbalanced Data
bull Use Least Squares Means and Type III SS when appropriatebull Sometimes we are still interested in Type I SS or unadjusted
comparisonsbull For The Rose-Hellekant PCR data we could still be interested
in tamoxifen vs placebo not adjusted for outcome since the outcome is also partially a result of the treatment
bull If we have correlated effects possibly neither is significant with Type III SS while one or both could be significant for Type I SS
bull There are also sound arguments for using Type II SS which we didnrsquot cover
bull Adjust effects of keyboard on pain according to how much the keyboards were used
bull Adjust comparisons of male and female mice to account for different mixes of young and old mice
bull Adjust comparisons of tomato varieties to account for one variety not being used in all of the blocks
Account for Random Effectsbull Donrsquot pretend that measurements for each fish in a tank are
separate independent pieces of information independent replicates
bull Donrsquot pretend testing the same paper helicopter more than one represents a separate independent piece of informationbull This is pseudo-replication mistake (Hurlbert 1984)
bull If data are balanced the analysis could be done with means in each tank of fishbull But donrsquot do this if data are unbalancedbull Also estimating sizes of variances from different
sources can be used to decide how to use resources optimally in the next experimentbull At what point is it better to take the time to make
the next paper helicopter rather than flying this helicopter again
bull See notes on random effects and nestingbull Donrsquot end up saying that a medication has an effect in the
wider population of possible patients based on data from just two patients
Think about and check whether the results make sense
• Plot the data
  – In one case, a student was getting p-values of less than 1 in a million in their master’s project when the right p-values were more like 06
  – Plotting the data is the best protection when checking p-values
• Don’t decide that Hansen and Eggo syrups are not different when the plot clearly shows that they are different
• Check results with alternative calculations
  – Check that the df in an ANOVA table add up to the right total
  – Use simulations
  – When using a new program, run it on a set of data where you know the right answer, to check that you are using the program correctly
  – Use unbalanced data to make the test more general
• Check that results make sense based on knowledge of the system
  – Check whether the estimated slope of pain versus amount of keyboard use comes out negative
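A minimal SAS sketch of checking a program on data where the right answer is known, assuming simulated unbalanced data (the data set `check` and the variables `group`, `rep` and `y` are hypothetical). With 5, 8 and 11 observations, the ANOVA table should show 2 model df, 21 error df and 23 total df, and group 3 should come out roughly 2 units above the other groups.

```sas
/* Hypothetical check data: unbalanced groups of 5, 8 and 11 values, */
/* with the true mean of group 3 set 2 units above groups 1 and 2.  */
data check;
  call streaminit(99);
  do group = 1 to 3;
    do rep = 1 to 2 + 3*group;
      y = 10 + 2*(group = 3) + rand('normal', 0, 1);
      output;
    end;
  end;
run;

proc sgplot data=check;   /* look at the data before trusting any p-value */
  scatter x=group y=y;
run;

proc glm data=check;
  class group;
  model y = group;        /* expect 2 model df + 21 error df = 23 total df     */
  lsmeans group / pdiff;  /* group 3 vs. groups 1 and 2 should differ by ~2 */
run;
quit;
```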
Cultivate Consideration, Concern and Compassion
“A little love and affection in everything you do
Will make the world a better place with or without you”
Neil Young, Greendale
Stop to Smell the Flowers and Dream
“You’ve got to have a dream
If you don’t have a dream
How you going to have a dream come true?”
South Pacific
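%let b0=5; %let b1=20; %let std=1; %let nsim=1000;
data polynomial;   /* nsim simulated data sets from y = b0 + b1*time + normal error */
  call streaminit(0);
  do isim=1 to &nsim;
    do time=1 to 5 by 1;
      mu=&b0+&b1*time;
      y=rand('normal', mu, &std); output;
    end;
    time=6; y=.; output;   /* y missing: row gets a prediction but is not used in the fit */
  end;
run;
ods listing close;   /* suppress listing output from the nsim fits */
proc glm data=polynomial; model y = time; output out=outp p=pred; by isim; run; quit;
ods listing;
proc means data=outp;   /* distribution of the predicted value at time 6 */
  where time = 6; var pred;
run;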
For Balanced Factorial Models and/or Blocks
• The complexity of comparing treatment effects is the same whether or not we include another factor or blocks
  – Either way, it is a difference between means that contain the same number of values
  – The only downside is fewer df for estimating the variance
• Pay attention to interactions
  – Main effects need to be interpreted very carefully when there are interactions
  – In the KCT example, K does not have a significant main effect, but K definitely affects yield
    – But differently for low and high temperatures
    – Don’t summarize K effects with means over both temperatures
    – Summarize K effects separately for low and high temperatures
  – In the age at metamorphosis data on Assign 7
    – B and C do not have significant main effects
    – But based on significant interactions, both B and C do affect age at metamorphosis
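A minimal SAS sketch of summarizing K separately for each temperature, assuming simulated balanced data (the data set `kt` and the variables `block`, `k`, `temp` and `yield` are hypothetical stand-ins, not the actual KCT data). The K effect is built to be +3 at low temperature and -3 at high temperature, so the K main effect averaged over temperatures should be near zero, while the SLICE= tests should show K mattering at each temperature.

```sas
/* Hypothetical balanced data: K at 2 levels, temperature at 2 levels, */
/* 3 blocks. K raises yield at low temperature and lowers it at high  */
/* temperature, so averaged over temperature the K effect is zero.    */
data kt;
  call streaminit(7);
  do block = 1 to 3;
    do k = 0 to 1;
      do temp = 1 to 2;                 /* 1 = low, 2 = high */
        if k = 1 and temp = 1 then effect = 3;
        else if k = 1 and temp = 2 then effect = -3;
        else effect = 0;
        yield = 50 + effect + rand('normal', 0, 1);
        output;
      end;
    end;
  end;
run;

proc glm data=kt;
  class block k temp;
  model yield = block k temp k*temp;
  lsmeans k*temp / slice=temp;          /* test K separately at each temperature */
run;
quit;
```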