Achievement Standard AS 91581. PPDAC cycle One of the principles: Grade distinctions should not be...
-
Upload
lucas-kelley -
Category
Documents
-
view
214 -
download
1
Transcript of Achievement Standard AS 91581. PPDAC cycle One of the principles: Grade distinctions should not be...
Ach
ieve
ment
Sta
ndard
AS 91581
PPDAC cycle
One of the principles:
Grade distinctions should not be based on the candidate being required to acquire and retain more subject-specific knowledge.
AS A M E
3.9 Investigate bivariate measurement data
Investigate bivariate measurement data, with justification
Investigate bivariate measurement data, with statistical insight
Ass
ess
ment
is
base
d o
n t
he P
PD
AC
Pose a question Plan Data Analyse Conclude
Back to question
Statistical enquiry cycle (PPDAC)
PPDAC is vital
Using the statistical enquiry cycle to …
investigate bivariate measurement data involves:
• posing an appropriate relationship question using a given multivariate data set
• selecting and using appropriate displays
• identifying features in data
• finding an appropriate model
• describing the nature and strength of the relationship and relating this to the context
• using the model to make a prediction
• communicating findings in a conclusion
Using the statistical enquiry cycle to …
investigate bivariate measurement data involves:
• posing an appropriate relationship question using a given multivariate data set
• selecting and using appropriate displays
• identifying features in data
• finding an appropriate model
• describing the nature and strength of the relationship and relating this to the context
• using the model to make a prediction
• communicating findings in a conclusion
Posing relationship questions
Possibly the most important component of the investigation
• Time spent on this component can determine the overall quality of the investigation
• This component provides an opportunity to show justification (M) and statistical insight (E)
Posing relationship questions
What makes a good relationship question?
• It is written as a question.
• It is written as a relationship question.
• It can be answered with the data available.
• The variables of interest are specified.
• It is a question whose answer is useful or interesting.
• The question is related to the purpose of the task.
• Think about the population of interest. Can the results be extended to a wider population?
Data
WH
OA
! We need to understand
the variables before we
can make any progress.
Consi
der
the v
ari
able
s (u
sing c
onte
xt))
Thin
k about
whic
h v
ari
able
s
could
be r
ela
ted
JU
STIF
Y YO
UR
REA
SO
NIN
G
Developing question posing skills
• Pose several relationship questions (written with reasons/justifications)
• Possibly critique the questions
• The precise meaning of some variables may need to be researched
Different relationship questions
Is there a relationship between variable 1 and variable 2 for Hector’s dolphins?
What is the nature of the relationship between variable 1 and variable 2 for Hector’s dolphins?
Can variable 1 be used to predict variable 2 for Hector’s dolphins?
Rese
arch
about th
e
situatio
n h
elp
s deve
lop
ideasWe might like to ask
ourselves why we
would want to know
about any relationships
that might exist”
HINT: Groupings
A morphological study of skull and
mandible features was undertaken to
examine variation between the most
genetically distinct population, occurring
on the west coast of the North Island,
and the populations around the South
Island. Univariate and principal
component analyses demonstrate that
the North Island population can be
differentiated from the southern
populations on the basis of several
skeletal characters.
Developing question posing skills
It is expected that you do some research:
• Improve knowledge of variables and context
• May find some related studies that creates potential for integration of statistical and contextual knowledge
Developing question posing skills
• Draw some scatter plots to start to investigate your questions
• Reduce, add to and/or prioritise their list of questions
• Possibly critique the questions again
Consi
der
the v
ari
able
s (u
sing c
onte
xt))
Appropriate displays
Which variable goes on the x-axis and which goes on the y-axis?
• It depends on the question and on the variables of interest
• Is there a relationship between zygomatic width and rostrum length for Hector’s dolphins?
Variables on axes
Is there a relationship between zygomatic width and rostrum length for Hector’s dolphins?”
This is a straight forward relationship question. Is there a relationship between zygomatic
width and rostrum length for Hector’s dolphins?”
This is a ‘correlation’ type question.
Is there a relationship between zygomatic width and rostrum length for Hector’s dolphins?”
Is t
here
a r
ela
tionsh
ip b
etw
een
zygom
ati
c w
idth
and r
ost
rum
length
for
Hect
or’
s dolp
hin
s?”
There is no difference in the
roles of each variable so it
does not matter which variable goes on the x-axis
and which goes on the
y-axis.
For this question it does not
matter which variable goes
on each axis.
Which variable goes on the x-axis and which goes on the y-axis?
It depends on the question and on the variables of interest• Is there a relationship between zygomatic
width and rostrum length for Hector’s dolphins?
• Is there a relationship between rostrum width at midlength and rostrum width at base for Hector’s dolphins?
Variables on axes
Is there a relationship between rostrum width at midlength and rostrum width at the base for Hector’s dolphins?
“I think that the width at the base could help determine (or influence) the width at midlength.”
Is there a relationship between rostrum width at midlength and rostrum width at the base for Hector’s dolphins?
Rostrum length at base (RWB) the explanatory variable and rostrum width at midlength the response variable..
Is there a relationship between rostrum width at midlength and rostrum width at the base for Hector’s dolphins?
Which variable goes on the x-axis and which goes on the y-axis?
• It depends on the question and on the variables of interest• Is there a relationship between zygomatic
width and rostrum length for Hector’s dolphins?
• Is there a relationship between rostrum width at midlength and rostrum width at the base for Hector’s dolphins?
• For Hector’s dolphins, can rostrum length be used to predict mandible length?
Variables on axes
For Hector’s dolphins, can rostrum length be used to predict mandible length?
This is a regression question
For Hector’s dolphins, can rostrum length be used to predict mandible length?
It is clear that we are interested in how mandible length responds to rostrum length, so rostrum length is the explanatory variable and mandible length is the response variable.
Experim
ents:
If the data comes from
an
experiment then som
e
variables would be classed as
input variables and others as
output variables. It would m
ake
sense to see how an input
variable affects an output
variable. The input variable
would be the explanatory
variable and the output
variable would be the response
variable.
Using the statistical enquiry cycle to …
investigate bivariate measurement data involves:
• posing an appropriate relationship question using a given multivariate data set
• selecting and using appropriate displays
• identifying features in data
• finding an appropriate model
• describing the nature and strength of the relationship and relating this to the context
• using the model to make a prediction
• communicating findings in a conclusion
Same approach whether linear or non-linear
investigate bivariate measurement data involves:
• posing an appropriate relationship question using a given multivariate data set
• selecting and using appropriate displays
• identifying features in data
• finding an appropriate model
• describing the nature and strength of the relationship and relating this to the context
• using the model to make a prediction
• communicating findings in a conclusion
Features, model, nature and strength
Generate the scatter plot using iNZight (or excel) RWM vs RWB
Features, model, nature and strength
Generate the scatter plot• Let the data speak• Use your eyes (visual aspects)
Write about what you see, not what you don’t see
DON’T fit a model yet
Features, model, nature and strength
Template for features (but allow flexibility)• Trend• Association (nature)• Strength (degree of scatter)• Groupings/clusters• Unusual observations• Other (e.g., variation in scatter)• If there are groupings evident in the plot then it
may be better to explore the groups separately at this stage.
Trend
From the scatter plot it appears that there is a linear trend between rostrum width at base and rostrum width at midlength.
Use descriptions of variables rather than variable names- keeps it contextual.
This is a reasonable expectation because two different measures on the same body part of an animal could be in proportion to each other.
Association The scatter plot also
shows that as the rostrum width at base increases the rostrum width at midlength tends to increase.
This is to be expected because dolphins with small rostrums would tend to have small values for rostrums widths at base and midlength and dolphins with large rostrums would tend to have large values for rostrums widths at base and midlength.
Association The scatter plot also
shows that as the rostrum width at base increases the rostrum width at midlength tends to increase.
A contextual description is preferable to one using technical terms.
However it is appropriate to use terms such as positive, negative or no association, but they are better used after the contextual description.
Higher level considerations This is to be expected because dolphins with
small rostrums would tend to have small values for rostrums widths at base and midlength and dolphins with large rostrums would tend to have large values for rostrums widths at base and midlength.
You should reflect on the nature of the relationship with respect to the context.
At this stage you could acknowledge (if the data does not come from a randomised experiment) that they have found only a statistical relationship and that this does not necessarily imply a causal relationship between the variables.
Higher level considerations Alternatively, if the data comes from a suitable
experiment you could make a causation claim in your conclusion.
You may acknowledge that other variables (which they must name) would impact on the (response) variable, and suggest how they might impact on the variable. For example, gender, age, etc. and perhaps show these and compare.
Find a model Because the trend is
linear I will fit a linear model to the data.
You must state why you
have selected this
particular model. The line is a good model for the data
because for all values of rostrum width at base, the number of points above the line are about the same as the number below it.
Find a model Because the trend is
linear I will fit a linear model to the data.
You must state why you
have selected this
particular model.
Don’t show the equation yet. There are still features in the data to comment on.
Find a model Because the trend is
linear I will fit a linear model to the data.
You must state why you
have selected this
particular model.
A discussion of fit throughout the range of x-values is required.
If the number of observations is small that casts some doubt on the reliability of the model.
Find a model The line is a good model for the data
because for all values of rostrum width at base, the number of points above the line are about the same as the number below it.
A discussion of fit throughout the range of x-values is required.
If the number of observations is small that casts some doubt on the reliability of the model.
Strength The points on the graph
are reasonably close to the fitted line so the relationship between rostrum width at midlength and rostrum width at base is reasonably strong. This is supported by the correlation coefficient r =
This must refer to visual aspects of the display.
Appropriate descriptors are: strong, moderate or weak.
Strength The points on the graph
are reasonably close to the fitted line so the relationship between rostrum width at midlength and rostrum width at base is reasonably strong. This is supported by the correlation coefficient r =
You must refer to the degree of scatter about the trend or, equivalently, the closeness of the points to the trend.
Gro
upin
gs
If gr
oupi
ngs
are
appa
rent
then
can
a r
easo
n be
foun
d
that
exp
lain
s th
e gr
oupi
ngs?
But
the
data
set
has
an
obvi
ous
grou
ping
va
riabl
e (I
slan
d).
INZight
Groupings
Groupings Now it seems that NI
Hector’s dolphins seem to have larger rostrum widths at base than SI ones.
This is now helpful for doing predictions.
There are some common values of RWB for both species of dolphin.
This allows more comments to be made about the model fitted to all data points.
Groupings Now it seems that NI
Hector’s dolphins seem to have larger rostrum widths at base than SI ones.
Many NI points are above the line.
Why is that?
It could be the effect on the fitted line of the unusual point (RWB = 86, RWM = 50).
Groupings
This now provides an opportunity to comment on these models.
South Island dolphins: Model seems appropriate.
North Island: Only 13 data points
– model may not be useful for reliable conclusions.
Points with lower RWB values tend to be below the line, points in the middle tend to be above.
Could try a quadratic model but small number of data points is an issue
Unusual points One dolphin, one of
those with a rostrum width at base of 86mm, had a smaller rostrum width at midlength compared to dolphins with the same, or similar,rostrum widths at base.
Unusual points One dolphin, one of
those with a rostrum width at base of 86mm, had a smaller rostrum width at midlength compared to dolphins with the same, or similar,rostrum widths at base.
Refer to actual data points.
Unusual points One dolphin, one of
those with a rostrum width at base of 86mm, had a smaller rostrum width at midlength compared to dolphins with the same, or similar,rostrum widths at base.
Comment on the effect any unusual values would have on the model.
Anything else Variation in scatter?
Anything else Variation in scatter?
Constant versus non-constant scatter relates to assumptions of the linear regression i.e. that the residuals are normally distributed.
Anything else Variation in scatter?
Variation in scatter is relevant when discussing precision of predictions.
Anything else Variation in scatter?
Refer to visual aspects of the display and keep it contextual.
For example: As the values of x increase the amount (or degree) of variation in y tends to increase (for fanning out)
Prediction
Try to use relevant values of x. In this case any RWB value from 81mm to 86mm is sensible.
Prediction
Linear TrendRWM = 0.77 * RWB + -8.72
Summary for Island = 1 Linear TrendRWM = 0.48 * RWB + 19.19
Summary for Island = 2 Linear TrendRWM = 0.46 * RWB + 14.37
Use all three models with RWB = 85mm (all, NI only,
SI only)
Using RWB = 85mm All points: RWM = 0.77 x 85 – 8.72 = 56.73 NI dolphins: RWM = 0.48 x 85 + 19.19 = 59.99 SI dolphins: RWM = 0.46 x 85 + 14.37 = 53.47
Linear TrendRWM = 0.77 * RWB + -8.72
Summary for Island = 1 Linear TrendRWM = 0.48 * RWB + 19.19
Summary for Island = 2 Linear TrendRWM = 0.46 * RWB + 14.37
Interpret in context, with units.
Linear TrendRWM = 0.77 * RWB + -8.72
Summary for Island = 1 Linear TrendRWM = 0.48 * RWB + 19.19
Summary for Island = 2 Linear TrendRWM = 0.46 * RWB + 14.37
Use sensible rounding.
Using RWB = 85mm All points: RWM = 0.77 x 85 – 8.72 = 56.73 NI dolphins: RWM = 0.48 x 85 + 19.19 = 59.99 SI dolphins: RWM = 0.46 x 85 + 14.37 = 53.47
Linear TrendRWM = 0.77 * RWB + -8.72
Summary for Island = 1 Linear TrendRWM = 0.48 * RWB + 19.19
Summary for Island = 2 Linear TrendRWM = 0.46 * RWB + 14.37
Dangers – predicting outside the range of observed x-values.
Linear TrendRWM = 0.77 * RWB + -8.72
Summary for Island = 1 Linear TrendRWM = 0.48 * RWB + 19.19
Summary for Island = 2 Linear TrendRWM = 0.46 * RWB + 14.37
How good is a prediction?
Linear TrendRWM = 0.77 * RWB + -8.72
Summary for Island = 1 Linear TrendRWM = 0.48 * RWB + 19.19
Summary for Island = 2 Linear TrendRWM = 0.46 * RWB + 14.37
A prediction is an estimate so I need to consider bias and precision.
How good is a prediction?
Linear TrendRWM = 0.77 * RWB + -8.72
Summary for Island = 1 Linear TrendRWM = 0.48 * RWB + 19.19
Summary for Island = 2 Linear TrendRWM = 0.46 * RWB + 14.37
Bias:
If a model is good, then a prediction is likely to be accurate
How good is a prediction?
Linear TrendRWM = 0.77 * RWB + -8.72
Summary for Island = 1 Linear TrendRWM = 0.48 * RWB + 19.19
Summary for Island = 2 Linear TrendRWM = 0.46 * RWB + 14.37
The number of data points used to form the model is relevant.
How good is a prediction?
Linear TrendRWM = 0.77 * RWB + -8.72
Summary for Island = 1 Linear TrendRWM = 0.48 * RWB + 19.19
Summary for Island = 2 Linear TrendRWM = 0.46 * RWB + 14.37
Precision – Relates to the degree of scatter
Don’t relate a prediction to observed y-values.
Statistical enquiry cycle (PPDAC)
Communicating findings in a conclusion Each component of the cycle must be
communicated The question(s) must be answered
Summary Basic principles
• Each component
• Context
• Visual aspects Higher level considerations
• Justify
• Extend
• Reflect
Other issues (if time)
• The use or articles or reports to assist contextual understanding
• How to develop understanding of outliers on a model
• The place of residuals and residual plots
• Is there a place for transforming variables?