Bst322week1
Uploaded by howie-real
![Page 1: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/1.jpg)
Week 1
![Page 2: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/2.jpg)
Part 1
Introduction to the Course
The Nature of Data
![Page 3: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/3.jpg)
Why Statistics?
• Evidence-based practice!
• Research provides evidence for changes in nursing/medical practice
– Away from “that’s the way it has always been done”
![Page 4: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/4.jpg)
Integral to Research
• Question (hypothesis)
• Design
• Data collection
• Analysis
• Answer to question
– And often more questions asked!
![Page 5: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/5.jpg)
Data
• Data:
– Factual information, especially information organized for analysis or used to reason or make decisions; a fact or proposition used to draw a conclusion or make a decision
• The American Heritage® Dictionary of the English Language, Fourth Edition. Copyright © 2000 by Houghton Mifflin Company.
– Datum: singular of data
  • An item of factual information derived from measurement or research
![Page 6: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/6.jpg)
Two Types of Data
• Qualitative
– Non-numeric or narrative information
  • Example: transcripts of interviews
  • May be “scored” to be made quantitative
• Quantitative
– Numeric or quantifiable information
  • Example: weights of kindergartners
![Page 7: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/7.jpg)
Variable
• A quantity capable of assuming a set of values
• A characteristic or attribute of a person, object, etc. that varies within a population under study
• Examples:
– Body temperature, blood pressure (BP), date of birth (DOB), arterial blood gas (ABG), weight
![Page 8: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/8.jpg)
Independent and Dependent
• Independent
– The variable assumed to influence the outcome
  • It is independent of the outcome
– In research, the manipulated variable
• Dependent
– The outcome variable of interest
– In research, its value is assumed (by hypothesis) to depend on the independent variable
![Page 9: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/9.jpg)
Independent and Dependent
• Examples:
– What is the effect of smoking on the incidence of lung cancer?
– Does a high-fiber diet reduce the risk of colon cancer?
– Does AZT help prevent maternal transmission of HIV?
![Page 10: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/10.jpg)
Discrete vs Continuous
• Discrete variable: has a finite number of values between two points
• Continuous variable: has, in theory, an infinite number of values between two points
![Page 11: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/11.jpg)
Discrete vs Continuous
• Examples:
– Number of children
– Body temperature
– Hospital readmissions
– Chemotherapy sessions
– Body weight
– DOB
![Page 12: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/12.jpg)
Measurement
• The assignment of numbers to objects according to specified rules to characterize quantities of some attribute
![Page 13: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/13.jpg)
Measurement Rules
• Common/familiar/accepted
– Temperature, weight, height
• Researcher designed
– Particularly for new materials/ideas
• Coding
– The process of transforming raw data into standardized form for processing and analysis
![Page 14: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/14.jpg)
Advantages of Measurement
• Objectivity
– An objective measure can be independently verified by other researchers
• Precision
– Quantitative measures allow for reasonable precision
• Communication
– Facilitates communication of data and research
![Page 15: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/15.jpg)
Levels of Measurement/Types of Variables
• Nominal
• Ordinal
• Interval
• Ratio
![Page 16: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/16.jpg)
Nominal Measurement/Variable
• Nominal = Named
• Lowest level
• Assignment of characteristics to categories
– Simply sorting into boxes, with no meaning attached to where the boxes fall in a line
• Examples
– Gender, marital status
![Page 17: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/17.jpg)
Ordinal Measurement/Variable
• Ordinal=Order
• Next in the hierarchy of measurement
• Involves rank order of variable along some dimension
• Examples
– School grades
– Clinical nursing levels
![Page 18: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/18.jpg)
Interval Measurement/Variable
• Interval=equal distances
• Attribute is rank-ordered on a scale that has equal distances between points on that scale
• Examples
– Temperature in °C or °F (zero is an arbitrary point, not an absence of heat)
![Page 19: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/19.jpg)
Ratio Measurement/Variables
• Equal distances between score units, plus a true, meaningful zero point
– A true ratio can be calculated
• The highest level of measurement
• Examples
– Weight
– Pulse
![Page 20: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/20.jpg)
Why care about type of measurement/variable?
• Statistical tests have been developed to provide meaningful analysis for specific types of measurement and variables
• The tests you choose to run should be based, in part, on the types of variables you are working with
![Page 21: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/21.jpg)
Which measurement?
• A single variable may be measurable using different types of measurement
• Rule of thumb: use the highest level of measurement possible
– Higher levels provide more information
– Higher levels can be analyzed with more powerful statistical tools
![Page 22: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/22.jpg)
Data Analysis
• Data start out “raw”
– Unanalyzed
• Processing
– Coding, if appropriate
– Data entry
  • Into a database or matrix
– Cleaning
  • Finding and correcting (if possible) errors in entry and coding
– Analysis
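The coding and cleaning steps above can be sketched in a few lines of Python. This assumes a hypothetical codebook for marital status; the labels and numeric codes are illustrative and do not come from the course. Each response is standardized (case, stray spaces) and mapped to a code. Anything not in the codebook is flagged for review rather than silently dropped.

```python
# Hypothetical codebook for a nominal variable (marital status);
# labels and codes are illustrative only.
CODEBOOK = {
    "single": 1,
    "married": 2,
    "divorced": 3,
    "widowed": 4,
}

def code_responses(raw):
    """Map raw text responses to numeric codes.

    Returns the coded list (None where no code applies) and the
    indices of entries that need checking during data cleaning.
    """
    coded, problems = [], []
    for i, value in enumerate(raw):
        key = value.strip().lower()      # standardize case and stray spaces
        code = CODEBOOK.get(key)
        if code is None:
            problems.append(i)           # likely entry or coding error
        coded.append(code)
    return coded, problems

raw = ["Married", "single ", "widowed", "maried", "Divorced"]
coded, problems = code_responses(raw)
print(coded)     # [2, 1, 4, None, 3]
print(problems)  # [3]
```

Keeping the codebook in one dictionary makes the coding rules explicit and easy to check against a written data dictionary.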
![Page 23: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/23.jpg)
Sample vs Population
• Sample
– A subset of a population
– Ideally selected to be representative of the population
• Population
– The entire set of individuals (objects, units, etc.) having common characteristics
![Page 24: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/24.jpg)
Two Types of Statistics
• Descriptive
– Used to describe and summarize a data set
– Allows us to describe, compare, and determine a relationship
– Usually straightforward: %, averages, etc.
• Inferential
– Permit us to infer whether a relationship observed in a sample is likely to occur in the population of concern
– Are relationships “real”?
![Page 25: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/25.jpg)
Uses of Inferential Statistics
• Draw conclusions about a single variable in a population
• Evaluate relationships between variables in populations
• Are the relationships “real”?
![Page 26: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/26.jpg)
Inferential Stats: Relationships
• Existence
– Is there a relationship between X and Y?
• Magnitude
– How strong is the relationship between X and Y?
• Nature
– What type of relationship is there between X and Y?
![Page 27: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/27.jpg)
Number of variables…
• “Univariate”
– One variable being described
• “Bivariate”
– Two variables being compared
  • NOTE: in epidemiology, this is also known as “univariate”
• Multivariate
– More than two variables being compared
• Different statistical tests for each
![Page 28: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/28.jpg)
Purposes of Data Analysis
• In research, all of these usually get done to some extent:
– Clean data
– Sample description
– Assessment of bias
– Evaluation of tools used to collect data
– Evaluation of need for data transformations
– Address the research question
![Page 29: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/29.jpg)
Describing the Data Set
• Organize the data
• Examine the patterns of distribution
• Describe patterns of distribution
• Assess the variability of the data
![Page 30: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/30.jpg)
Simplest Distribution: The Frequency Distribution
• Lists categories of scores or values as well as counts of the number of each score or value
– List and tally
– By computer
  • Enter data
  • Run “frequency”
![Page 31: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/31.jpg)
Two Kinds of Frequency
• Absolute
– Number of times a score occurs
– Symbol: f
• Relative
– Proportion of times a score occurs
– Most commonly expressed as a percent
• % = (f/N) × 100
– f = frequency, N = sum of all frequencies
![Page 32: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/32.jpg)
Frequency Example: Blood Pressure (mm Hg) Readings in an Anti-Hypertensive Trial – Raw Data
166 160 166 162 168 148 164 174 164 188
176 170 166 172 168 172 150 190 164 150
164 146 178 154 166 148 156 164 180 166
172 170 180 156 162 176 184 166 174 158
186 158 166 170 168 178 178 154 166 152
168 160 168 166 152 160 170 146 186 176
n=60
![Page 33: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/33.jpg)
Frequency Distribution
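| x | f | x | f | x | f |
|---|---|---|---|---|---|
| 146 | 2 | 162 | 2 | 178 | 3 |
| 148 | 2 | 164 | 5 | 180 | 2 |
| 150 | 2 | 166 | 9 | 182 | 0 |
| 152 | 2 | 168 | 5 | 184 | 1 |
| 154 | 2 | 170 | 4 | 186 | 2 |
| 156 | 2 | 172 | 3 | 188 | 1 |
| 158 | 2 | 174 | 2 | 190 | 1 |
| 160 | 3 | 176 | 3 | 192 | 0 |

n=60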
![Page 34: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/34.jpg)
Relative Frequency (rf) Distribution
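| x | rf | x | rf | x | rf |
|---|---|---|---|---|---|
| 146 | 0.03 | 162 | 0.03 | 178 | 0.05 |
| 148 | 0.03 | 164 | 0.08 | 180 | 0.03 |
| 150 | 0.03 | 166 | 0.15 | 182 | 0.00 |
| 152 | 0.03 | 168 | 0.08 | 184 | 0.02 |
| 154 | 0.03 | 170 | 0.07 | 186 | 0.03 |
| 156 | 0.03 | 172 | 0.05 | 188 | 0.02 |
| 158 | 0.03 | 174 | 0.03 | 190 | 0.02 |
| 160 | 0.05 | 176 | 0.05 | 192 | 0.00 |

n=60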
![Page 35: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/35.jpg)
Cumulative Relative Frequency (Cf) Distribution
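| x | Cf | x | Cf | x | Cf |
|---|---|---|---|---|---|
| 146 | 0.03 | 162 | 0.32 | 178 | 0.88 |
| 148 | 0.07 | 164 | 0.40 | 180 | 0.92 |
| 150 | 0.10 | 166 | 0.55 | 182 | 0.92 |
| 152 | 0.13 | 168 | 0.63 | 184 | 0.93 |
| 154 | 0.17 | 170 | 0.70 | 186 | 0.97 |
| 156 | 0.20 | 172 | 0.75 | 188 | 0.98 |
| 158 | 0.23 | 174 | 0.78 | 190 | 1.00 |
| 160 | 0.28 | 176 | 0.83 | 192 | 1.00 |

n=60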
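The absolute, relative, and cumulative relative frequencies can all be computed directly from the 60 raw readings. This sketch uses Python's `collections.Counter`; the function name `frequency_table` is hypothetical. Multiplying rf by 100 gives the percent, following % = (f/N) × 100.

```python
from collections import Counter

# Systolic BP readings (mm Hg) from the raw-data slide, n = 60
BP = [
    166, 160, 166, 162, 168, 148, 164, 174, 164, 188,
    176, 170, 166, 172, 168, 172, 150, 190, 164, 150,
    164, 146, 178, 154, 166, 148, 156, 164, 180, 166,
    172, 170, 180, 156, 162, 176, 184, 166, 174, 158,
    186, 158, 166, 170, 168, 178, 178, 154, 166, 152,
    168, 160, 168, 166, 152, 160, 170, 146, 186, 176,
]

def frequency_table(values):
    """Return (x, f, rf, Cf) rows for each distinct value, in ascending order."""
    n = len(values)
    counts = Counter(values)              # absolute frequency f for each value
    rows, running = [], 0
    for x in sorted(counts):
        f = counts[x]
        running += f                      # number of readings <= x
        rows.append((x, f, f / n, running / n))
    return rows

table = frequency_table(BP)
for x, f, rf, cf in table:
    print(f"{x:>4} {f:>3} {rf:5.2f} {cf:5.2f}")
```

Values that never occur (182 and 192 mm Hg) produce no row here. Add them explicitly if the printed table should show zero counts.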
![Page 36: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/36.jpg)
Grouped Frequency Distribution
• Values are grouped into intervals
– Class intervals are all the same size
– Class intervals are mutually exclusive
– Useful when data are widely dispersed
  • Or when there are restrictions on “small cell size”, for example in HIV/AIDS reporting
– Grouping loses information
  • As it does any time one moves from the individual level to the group level
![Page 37: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/37.jpg)
Grouped Frequency Distribution
| Interval | f |
|---|---|
| <150 mm Hg | 4 |
| 150-158 mm Hg | 10 |
| 160-168 mm Hg | 24 |
| 170-178 mm Hg | 15 |
| 180-188 mm Hg | 6 |
| ≥190 mm Hg | 1 |

n=60
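Below is a sketch of building the grouped distribution from the raw readings. It assumes 10 mm Hg classes with open-ended classes below 150 and at 190 or above; the helper `interval_label` is hypothetical. The code labels classes 150-159, 160-169, and so on. The slide writes 150-158 because every reading in this data set is an even number.

```python
from collections import Counter

# Systolic BP readings (mm Hg) from the raw-data slide, n = 60
BP = [
    166, 160, 166, 162, 168, 148, 164, 174, 164, 188,
    176, 170, 166, 172, 168, 172, 150, 190, 164, 150,
    164, 146, 178, 154, 166, 148, 156, 164, 180, 166,
    172, 170, 180, 156, 162, 176, 184, 166, 174, 158,
    186, 158, 166, 170, 168, 178, 178, 154, 166, 152,
    168, 160, 168, 166, 152, 160, 170, 146, 186, 176,
]

def interval_label(x):
    """Assign a reading to a 10 mm Hg class; open-ended below 150 and at 190+."""
    if x < 150:
        return "<150"
    if x >= 190:
        return ">=190"
    low = (x // 10) * 10                  # e.g. 164 -> 160
    return f"{low}-{low + 9}"             # 160-169, so classes never overlap

ORDER = ["<150", "150-159", "160-169", "170-179", "180-189", ">=190"]
grouped = Counter(interval_label(x) for x in BP)
for label in ORDER:
    print(f"{label:>8} mm Hg {grouped[label]:>3}")
```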
![Page 38: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/38.jpg)
Displaying Data
• Tables
• Bar graphs
• Pie charts
• Histograms
• Frequency Polygons
– aka Line charts/graphs
![Page 39: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/39.jpg)
Bar Graph
• Used primarily for nominal and ordinal data
• Values across the X axis
• Frequencies along the Y axis
![Page 40: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/40.jpg)
Bar Graph of Hypertension Data (generated in Excel)
[Figure: X axis: mm Hg, 146 to 190; Y axis: number of subjects, 0 to 10.]
![Page 41: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/41.jpg)
Histogram
• Like a bar graph
• Used for continuous (interval or ratio) data
– Rarely seen even for interval or ratio data
– Not offered as a standard chart type in older versions of Excel
• Bars touch
• May use grouped data
![Page 42: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/42.jpg)
Bar Chart and Histogram
[Figure: Two charts of BP1 (generated in SPSS). Histogram: BP1 from 150 to 190 on the X axis, frequency 0 to 14 on the Y axis; Mean = 166.43, Std. Dev. = 10.692, N = 60. Bar chart: each observed BP1 value from 146 to 190 on the X axis, count 0 to 10 on the Y axis.]
![Page 43: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/43.jpg)
Frequency Polygon (aka Line Graph)
• Used for interval and ratio data
• X and Y axes the same as for bar charts
• Marker placed at intersection of the value and frequency for a series of values
• Markers then connected with a line
![Page 44: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/44.jpg)
Frequency Polygon of Hypertension Data
[Figure: X axis: mm Hg, 146 to 190; Y axis: number of subjects, 0 to 10.]
![Page 45: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/45.jpg)
Effective Graphical Display
• Should accurately represent data
• Should be easily understood
– Not too busy or complicated
• Should stand alone
– Ideal, but rare
![Page 46: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/46.jpg)
Distribution Shapes – 5 Basics
• Modality
• Symmetry and Skewness
• Kurtosis
• Central Tendency
• Variability
![Page 47: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/47.jpg)
Modality – Basic Shape
• Peaks or high points in the data
• May have one or multiple peaks
– Unimodal = 1 peak
– Bimodal = 2 peaks
– Multimodal = multiple peaks
![Page 48: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/48.jpg)
Symmetry
• Symmetrical
– If you draw a line through the center, it produces mirror images
– In real life: approximately the same distribution on either side of the center line
• Asymmetrical
– Distribution is lopsided or skewed
![Page 49: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/49.jpg)
Asymmetric Distribution: Skewness
• Affected by outliers
• Positive
– The “tail” points to the right (positive direction)
• Negative
– The “tail” points to the left (negative direction)
![Page 50: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/50.jpg)
Kurtosis
• Assumes a symmetric distribution
• Refers to how pointy the peak of the distribution is
– How concentrated values are in the middle of the distribution
• Platykurtic
– Low, flattened peak
• Leptokurtic
– High, narrow peak
![Page 51: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/51.jpg)
The Normal Distribution
• Unimodal
• Symmetrical
• Peak is neither high nor flat
• “Bell-shaped curve”
• The ideal distribution
– And therefore “normal”
![Page 52: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/52.jpg)
Looking at Frequency Distributions
• Learn about the data set
• Clean the data
• Identify missing values
• Test assumptions
– About the distribution
• Answer research questions
– About the distribution
![Page 53: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/53.jpg)
Quartiles
• Calculated by dividing data into quarters
– The median is the 2nd quartile
• Quartile 1 is the point at which 25% of values are below and 75% of values are above
• Quartile 3 is the point at which 75% of values are below and 25% of values are above
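Python's `statistics.quantiles` can compute the three quartiles of the BP readings. Different packages use different interpolation rules for quartiles, so Q1 and Q3 may differ slightly from one program to another. The two `method` options below illustrate this; the median (Q2) is the same under both.

```python
import statistics

# Systolic BP readings (mm Hg) from the raw-data slide, n = 60
BP = [
    166, 160, 166, 162, 168, 148, 164, 174, 164, 188,
    176, 170, 166, 172, 168, 172, 150, 190, 164, 150,
    164, 146, 178, 154, 166, 148, 156, 164, 180, 166,
    172, 170, 180, 156, 162, 176, 184, 166, 174, 158,
    186, 158, 166, 170, 168, 178, 178, 154, 166, 152,
    168, 160, 168, 166, 152, 160, 170, 146, 186, 176,
]

exclusive = statistics.quantiles(BP, n=4)                      # Python's default method
inclusive = statistics.quantiles(BP, n=4, method="inclusive")
print("exclusive Q1, Q2, Q3:", exclusive)   # [160.0, 166.0, 173.5]
print("inclusive Q1, Q2, Q3:", inclusive)   # [160.0, 166.0, 172.5]
print("median:", statistics.median(BP))     # 166.0
```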
![Page 54: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/54.jpg)
Part 2
Describing and Displaying Data
Measures of Central Tendency
Univariate Statistics
![Page 55: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/55.jpg)
Measures of Central Tendency
• Tells you about the area of the distribution where the bulk of values fall
• Measures include:
– Mean
– Median
– Mode
![Page 56: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/56.jpg)
Mode
• The value that occurs most often
• Limitations
– Data may be multimodal
– Mode can vary from one sample to another in the same population
  • Considered unstable
![Page 57: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/57.jpg)
Median (Mdn)
• Point that divides the distribution in half
• Corresponds to the 50th percentile
• 50% will be below the median and 50% above it
• If the number of scores is odd
– Median is the number exactly in the middle
• If the number of scores is even
– Median is the average of the 2 middle numbers
![Page 58: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/58.jpg)
Median (Mdn)
• Measures the location of the middle of the distribution
• Not sensitive to the exact values of extreme scores
– Largely unaffected by outliers
![Page 59: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/59.jpg)
Mean
• Most common measure of central tendency
• Most stable; provides the most accurate estimate
– Assuming a normal distribution
• Calculated by adding all values and dividing by the number of cases
– aka average
– Best understood by the general public
![Page 60: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/60.jpg)
Mean
• Affected by each value in the distribution
• Intended for interval or ratio data
– In some designs can be used for ordinal data
• The sum of the deviation scores from the mean always equals 0
• Abbreviated x̄ for samples
– μ for populations
![Page 61: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/61.jpg)
Mean, Median, and Mode
• Mean is preferred in a normal distribution
– Extreme scores or outliers can result in a mean that doesn’t reflect central tendency
  • Skewed data
• With skewed data or extreme outliers, use the median
– Example: median home price
![Page 62: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/62.jpg)
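x̄, Mdn, Mode – Hypertension Study
Sum of values = Σx = 9,986; Number of cases = n = 60

| x | f | x | f | x | f |
|---|---|---|---|---|---|
| 146 | 2 | 162 | 2 | 178 | 3 |
| 148 | 2 | 164 | 5 | 180 | 2 |
| 150 | 2 | 166 | 9 | 182 | 0 |
| 152 | 2 | 168 | 5 | 184 | 1 |
| 154 | 2 | 170 | 4 | 186 | 2 |
| 156 | 2 | 172 | 3 | 188 | 1 |
| 158 | 2 | 174 | 2 | 190 | 1 |
| 160 | 3 | 176 | 3 | 192 | 0 |

Mode = 166
Mdn = 166
Mean = 9,986/60 = 166.43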
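Here is a quick check of the mean, median, and mode using Python's `statistics` module, computed directly from the 60 raw readings. The last lines also confirm the earlier property that deviations from the mean sum to 0, up to floating-point rounding.

```python
import statistics

# Systolic BP readings (mm Hg) from the raw-data slide, n = 60
BP = [
    166, 160, 166, 162, 168, 148, 164, 174, 164, 188,
    176, 170, 166, 172, 168, 172, 150, 190, 164, 150,
    164, 146, 178, 154, 166, 148, 156, 164, 180, 166,
    172, 170, 180, 156, 162, 176, 184, 166, 174, 158,
    186, 158, 166, 170, 168, 178, 178, 154, 166, 152,
    168, 160, 168, 166, 152, 160, 170, 146, 186, 176,
]

total = sum(BP)
mean = statistics.mean(BP)
median = statistics.median(BP)     # average of the 30th and 31st sorted values
mode = statistics.mode(BP)         # most frequent value
print(total, round(mean, 2), median, mode)   # 9986 166.43 166.0 166

deviations = [x - mean for x in BP]
print(abs(sum(deviations)) < 1e-9)           # True: deviations sum to ~0
```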
![Page 63: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/63.jpg)
Quickly Assessing Distribution
• If the mean, median, and mode are similar
– Approximately normally distributed
• If the median > mean
– Negatively skewed
• If the median < mean
– Positively skewed
• The mean is pulled in the direction of the skew
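The rule of thumb above can be written as a small function. This is a rough heuristic, not a formal skewness test. The function `skew_direction` and its `tolerance` cutoff for "similar" are hypothetical choices, and the home prices are made-up illustration values.

```python
import statistics

def skew_direction(values, tolerance=0.0):
    """Label skew by comparing mean and median (rough heuristic only).

    tolerance is a hypothetical cutoff for treating mean and median as "similar".
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    if abs(mean - median) <= tolerance:
        return "approximately symmetric"
    # The mean is pulled toward the long tail
    return "positively skewed" if mean > median else "negatively skewed"

prices = [180, 195, 200, 210, 220, 240, 950]  # hypothetical home prices ($1,000s)
print(skew_direction(prices))                  # positively skewed
print(skew_direction([10, 60, 62, 64, 65]))    # negatively skewed
print(skew_direction([1, 2, 3, 4, 5]))         # approximately symmetric
```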
![Page 64: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/64.jpg)
Effect of Skew
[Figure: Two skewed distributions with the mode, median, and mean marked on each. The mean is pulled in the direction of the skew.]
![Page 65: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/65.jpg)
Variability
• Refers to how spread out the scores are in a distribution
• Two distributions with the same mean can differ greatly in variability
– Homogeneous: values are similar
– Heterogeneous: values vary more
• Measures:
– Range/Semiquartile range
– Variance
– Standard deviation
![Page 66: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/66.jpg)
Variability – Spread or Dispersion
[Figure: Bar graph of the hypertension data illustrating spread. X axis: mm Hg, 146 to 190; Y axis: number of subjects, 0 to 10.]
![Page 67: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/67.jpg)
Range
• Simplest of the measures of variability
• Difference between the lowest and highest values in a distribution
– 190 - 146 = 44
• Sometimes reported as a minimum and maximum value
– Range 146-190
![Page 68: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/68.jpg)
Range
• Limitations
– Based on only 2 values, highest and lowest
  • Can be unstable when multiple samples are taken from the same population
  • Doesn’t tell you anything about what is happening in the middle
– As the sample size increases, the range is likely to increase
  • Greater chance of an outlier
![Page 69: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/69.jpg)
Standard Deviation - s, SD, Std Dev
• A measure of how far values vary from the mean of a given sample
– Roughly, the typical (average) distance of scores from the mean
– How much the scores deviate from the mean
• Most widely used measure of variability
• Takes into consideration every score in the distribution
![Page 70: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/70.jpg)
Standard Deviation
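$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$

| X | X - mean | (X - mean)² |
|---|---|---|
| 7 | 7 - 4 = 3 | 9 |
| 5 | 5 - 4 = 1 | 1 |
| 4 | 4 - 4 = 0 | 0 |
| 3 | 3 - 4 = -1 | 1 |
| 1 | 1 - 4 = -3 | 9 |
| ΣX = 20 | | Σ = 20 |

N = 5, Mean = 20/5 = 4

$$s = \sqrt{\frac{20}{5 - 1}} = \sqrt{5} = 2.24$$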
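The worked example translates directly into code. This sketch follows the formula step by step; the function name `sample_sd` is hypothetical. It then cross-checks the result against Python's built-in `statistics.stdev`, which also divides by N - 1.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation: sqrt(sum((X - mean)^2) / (N - 1))."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)   # sum of squared deviations
    return math.sqrt(ss / (n - 1))

scores = [7, 5, 4, 3, 1]                 # the five scores in the example
s = sample_sd(scores)
print(round(s, 2))                                  # 2.24
print(math.isclose(s, statistics.stdev(scores)))    # True
```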
![Page 71: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/71.jpg)
Standard Deviation
• Taking the square root returns the value of the standard deviation to the original scale
• The lower the standard deviation, the better the mean summarizes the data
– There is less variability among the scores
![Page 72: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/72.jpg)
Variance
• Simply s²
• The standard deviation calculation before the square root is taken:

$$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$
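Applied to the 60 BP readings, Python's `statistics.variance` and `statistics.stdev` (both using the N - 1 divisor) give the spread measures from this part of the lecture, together with the range. The SD should agree with the Std. Dev. = 10.692 reported in the SPSS histogram output.

```python
import statistics

# Systolic BP readings (mm Hg) from the raw-data slide, n = 60
BP = [
    166, 160, 166, 162, 168, 148, 164, 174, 164, 188,
    176, 170, 166, 172, 168, 172, 150, 190, 164, 150,
    164, 146, 178, 154, 166, 148, 156, 164, 180, 166,
    172, 170, 180, 156, 162, 176, 184, 166, 174, 158,
    186, 158, 166, 170, 168, 178, 178, 154, 166, 152,
    168, 160, 168, 166, 152, 160, 170, 146, 186, 176,
]

data_range = max(BP) - min(BP)
variance = statistics.variance(BP)   # s^2, divisor n - 1
sd = statistics.stdev(BP)            # s, the square root of the variance
print("range:", data_range)             # 44
print("variance:", round(variance, 2))  # 114.32
print("SD:", round(sd, 3))              # 10.692
```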
![Page 73: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/73.jpg)
Standard Deviation - uses
• Useful when looking at a single score in relation to a distribution
• Normal distribution
– Nearly all values lie within about 3 SD above and below the mean
– A fixed percent of scores lies within each SD:
  • 68% within 1 SD above and below the mean
  • 95% within 2 SD above and below the mean
  • 99.7% within 3 SD above and below the mean
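The 68/95/99.7 percentages can be checked with Python's `statistics.NormalDist` (Python 3.8+). The exact values are 68.3%, 95.4%, and 99.7%. The rounded 95% corresponds more precisely to ±1.96 SD.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)
within = {}
for k in (1, 2, 3):
    within[k] = std_normal.cdf(k) - std_normal.cdf(-k)   # area between -k and +k SD
    print(f"within {k} SD: {within[k]:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%

# The same proportions hold on any normal scale, e.g. mean 100 and SD 15
example = NormalDist(mu=100, sigma=15)
print(f"{example.cdf(115) - example.cdf(85):.1%}")   # 68.3%
```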
![Page 74: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/74.jpg)
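Normal Distribution with SD = 15 and mean = 100
[Figure: Normal curve with the X axis marked at 70, 85, 100, 115, and 130. 34% of scores lie between the mean and 1 SD on each side, 13.5% between 1 and 2 SD, and 2.5% beyond 2 SD; 68% lie within ±1 SD (85 to 115) and 95% within ±2 SD (70 to 130).]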
![Page 75: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/75.jpg)
Standard Scores
• Scores that represent relative distance from the mean; measures of position

$$Z = \frac{X - \bar{X}}{SD}$$

• Raw score minus mean, divided by SD: gives the score in SD units
• Z score is the number of SDs a given value of X is away from the mean. A Z score of 1 is 1 SD above the mean.
• Z Distribution has mean = 0 and SD = 1
![Page 76: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/76.jpg)
Standard Scores – Z scores
• Allow for the standardization (in SD units) of values in a distribution relative to the mean
• Standard score: $Z = (x - \bar{x})/SD$
• Number of SDs a given value of x is from the mean
– A Z score of 1 is 1 SD above the mean
• Z distribution has mean=0 and SD=1
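Here is a minimal z-score sketch; the helper `z_score` is hypothetical. Standardizing every value in a data set gives a set with mean 0 and SD 1. The five scores from the standard deviation example are used here.

```python
import statistics

def z_score(x, mean, sd):
    """Number of SDs that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(115, 100, 15))   # 1.0
print(z_score(70, 100, 15))    # -2.0

# Standardize the five scores from the standard deviation example
scores = [7, 5, 4, 3, 1]
m, s = statistics.mean(scores), statistics.stdev(scores)
zs = [z_score(x, m, s) for x in scores]
print([round(z, 2) for z in zs])   # [1.34, 0.45, 0.0, -0.45, -1.34]
print(round(statistics.mean(zs), 10), round(statistics.stdev(zs), 10))  # 0.0 1.0
```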
![Page 77: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/77.jpg)
Z Distribution
[Figure: Scores of 55, 60, 65 on one scale and 85, 100, 115 on another both map to Z = -1, 0, +1, using Z Score = (Score - Mean)/SD.]
![Page 78: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/78.jpg)
Normal Z Distribution (Mean = 0, SD = 1)
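[Figure: Standard normal curve with the X axis marked at -2, -1, 0, +1, +2. 34% of values lie between 0 and 1 on each side, 13.5% between 1 and 2, and 2.5% beyond 2; 68% lie within ±1 and 95% within ±2.]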
![Page 79: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/79.jpg)
Normal Distribution/Z Scores
• The entire area under the curve is 100%
– The probability of being somewhere under the curve is 100%
• Most values will lie in the middle
• Out at the ends we become less sure
– Is a value out at 1% really representative?
![Page 80: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/80.jpg)
Using Normal Distributions/Z Scores
• Transformation
– The z score can be transformed to reset the mean and SD
  • Transformed Z = 10(Z) + 50
– Now mean = 50 and SD = 10
• P-value
– Probability of a value falling at or beyond a particular point on the curve
– We will come back to this
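The transformation 10(Z) + 50 is a linear rescaling. This sketch generalizes it with hypothetical `new_mean` and `new_sd` parameters, whose defaults reproduce the formula above.

```python
import statistics

def transformed_score(z, new_mean=50, new_sd=10):
    """Rescale a z score; the defaults give 10(Z) + 50."""
    return new_sd * z + new_mean

print(transformed_score(0))      # 50
print(transformed_score(1))      # 60
print(transformed_score(-1.5))   # 35.0

zs = [-1.0, 0.0, 1.0]            # a tiny set with mean 0 and SD 1
ts = [transformed_score(z) for z in zs]
print(statistics.mean(ts), statistics.stdev(ts))   # 50.0 10.0
```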
![Page 81: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/81.jpg)
Using Normal Distributions/Z Scores
• A Z score can tell you the probability of a value falling into a given area of the curve
– Get the z score
– Match it to a %
  • Z = ±2 (more precisely ±1.96) bounds the middle 95% of the curve
– Gives the probability of a value falling within (or beyond) that distance from the mean
– Z-score tables
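In place of a printed z-score table, `statistics.NormalDist` can look up areas under the standard normal curve. `cdf(z)` gives the area to the left of z, and `inv_cdf(p)` goes the other way. The helper `area_between` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()            # mean 0, SD 1 (the Z distribution)

def area_between(z):
    """Proportion of the curve between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

print(f"{area_between(2):.4f}")             # 0.9545
print(f"{area_between(1.96):.4f}")          # 0.9500
print(f"{std_normal.inv_cdf(0.975):.2f}")   # 1.96
print(f"{1 - std_normal.cdf(2):.4f}")       # 0.0228 (area beyond +2 SD)
```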
![Page 82: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/82.jpg)
Parameters vs Statistics
• Parameters
– Computed for populations
– Greek symbols used
  • μ = mean, σ = std dev
• Statistics
– Computed for samples
– English letters used
  • x̄ = mean, s = std dev
![Page 83: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/83.jpg)
Computers and Measures of Central Tendency
• Statistical software is in widespread use, but…
• The operator (you) must be aware of levels of measurement, etc.
– The computer doesn’t know
– You have to choose the right method for the type of data
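Because the software doesn't know a variable's level of measurement, one option is to record the level yourself and check it before summarizing. This sketch encodes the guidance above as a rule-of-thumb lookup: mode for nominal, median for ordinal, mean for interval or ratio. `Level`, `APPROPRIATE`, and `check_measure` are hypothetical names. The lookup deliberately ignores the designs where a mean of ordinal data is acceptable.

```python
from enum import Enum

class Level(Enum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Rule-of-thumb lookup summarizing the slides, not an exhaustive rule set
APPROPRIATE = {
    Level.NOMINAL: {"mode"},
    Level.ORDINAL: {"mode", "median"},
    Level.INTERVAL: {"mode", "median", "mean"},
    Level.RATIO: {"mode", "median", "mean"},
}

def check_measure(measure, level):
    """Return True if the measure suits the level; raise ValueError otherwise."""
    if measure not in APPROPRIATE[level]:
        raise ValueError(f"{measure} is not meaningful for {level.name.lower()} data")
    return True

print(check_measure("mean", Level.RATIO))   # True
try:
    check_measure("mean", Level.NOMINAL)    # e.g. a coded marital status
except ValueError as err:
    print(err)                              # mean is not meaningful for nominal data
```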
![Page 84: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/84.jpg)
“Bivariate” Statistics
![Page 85: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/85.jpg)
“Bivariate” Statistics
• Used to describe the relationship between 2 variables (bi-variate)
– 2 nominal variables
– 1 nominal, 1 ratio/interval
– 2 ratio/interval
![Page 86: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/86.jpg)
Crosstabulation
• Results in a contingency table
– 2-dimensional frequency distribution
• The simplest: 2 × 2
– 2 nominal or ordinal variables
  • One variable heads the columns
  • One variable heads the rows
![Page 87: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/87.jpg)
Crosstabulation - Example
| Income | High School | College | Total |
|---|---|---|---|
| <$30,000 | 64 | 19 | 83 |
| ≥$30,000 | 36 | 81 | 117 |
| Total | 100 | 100 | 200 |
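Contingency tables are often easier to read as percentages. This sketch computes the marginal totals and the column percentages (the income distribution within each education group) from the cell counts above. The dictionary layout is just one convenient choice.

```python
# Cell counts from the crosstabulation: (income, education) -> count
counts = {
    ("<$30,000", "High School"): 64,
    ("<$30,000", "College"): 19,
    (">=$30,000", "High School"): 36,
    (">=$30,000", "College"): 81,
}
rows = ["<$30,000", ">=$30,000"]
cols = ["High School", "College"]

row_totals = {r: sum(counts[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(counts[(r, c)] for r in rows) for c in cols}
grand_total = sum(counts.values())
print(row_totals, col_totals, grand_total)

# Column percentages: how income is distributed within each education group
col_pct = {(r, c): 100 * counts[(r, c)] / col_totals[c] for r in rows for c in cols}
for r in rows:
    print(r, [f"{col_pct[(r, c)]:.0f}%" for c in cols])
```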
![Page 88: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/88.jpg)
Comparison of Group Means
• Nominal & Interval or Ratio Variable
• IV: nominal or ordinal
– Sex, ethnicity, age group, etc.
• DV: interval or ratio
– Heart rate, BP, weight, etc.
• Means and SD calculated for each category of the IV
• NOTE: NO INFERENCE is made about significance of difference between categories
![Page 89: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/89.jpg)
Comparison of Group Means
| Education | n | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| High School | 100 | 24,657 | 2,598 | 10,103 | 75,362 |
| College | 100 | 36,431 | 7,912 | 15,256 | 126,754 |
| Total | 200 | 30,544 | 8,327 | 10,103 | 126,754 |
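The Total row of a group-means table can be checked from the group rows alone. The overall mean is the n-weighted average of the group means. The overall sum of squares is the within-group part plus the between-group part. The helper `combine_groups` is hypothetical, and the sketch assumes each reported SD is a sample SD with divisor n - 1. Min and Max for the total are simply the smallest and largest group values.

```python
import math

def combine_groups(groups):
    """Combine (n, mean, sd) summaries into an overall n, mean, and sample SD.

    Assumes each sd is a sample SD (divisor n - 1).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Total sum of squares = within-group part + between-group part
    ss = sum((n - 1) * sd ** 2 + n * (m - grand_mean) ** 2 for n, m, sd in groups)
    return total_n, grand_mean, math.sqrt(ss / (total_n - 1))

groups = [(100, 24_657, 2_598), (100, 36_431, 7_912)]   # High School, College
n, mean, sd = combine_groups(groups)
print(n, round(mean), round(sd))   # 200 30544 8327
```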
![Page 90: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/90.jpg)
Correlation
• A linear relationship between 2 variables
– Interval or ratio variables
• Can be plotted and displayed graphically
– Scatter plot
• Can be calculated statistically
– Correlation coefficient, r
![Page 91: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/91.jpg)
Scatterplot
• Values for one variable on X axis
• Values for the other variable on Y axis
• Data plotted for each subject/case
• Examine the plot for a pattern
– Data arrayed closely together indicate a strong correlation
![Page 92: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/92.jpg)
Positive Correlation
• As one variable increases in value, so does the other
• On the plot:
– Diagonal line upward and to the right
• Example:
– Age and BP
![Page 93: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/93.jpg)
Negative Correlation
• As one variable increases, the other decreases
• On the plot:
– Diagonal line downward and to the right
• Example:
– Age and bone density
![Page 94: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/94.jpg)
Scattered Scatterplots
• Indicate little or no relationship between variables
• Can be dispersed or concentrated
![Page 95: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/95.jpg)
Non-linear Scatterplots
• There is a relationship but…
• Some relationships are not linear….
• May be curved
– S
– U
– Up then flat
– Others
![Page 96: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/96.jpg)
Outliers on Scatterplots
• Scatterplots can also help identify where outliers are
![Page 97: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/97.jpg)
Correlation Coefficient - r
• Statistical measure used to
– Determine if a relationship exists between two variables
– Test a hypothesis about that relationship
• Allows us to make a mathematical statement about the relationship
– Do the variables vary together?
• AKA Pearson correlation coefficient
![Page 98: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/98.jpg)
Correlation Coefficient - Assumptions
• Sample must be representative
• The distributions must be approximately normal
• Each value of X must have a corresponding value of Y
– If many cases have an X value but no Y value, the analysis will be strongly biased
![Page 99: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/99.jpg)
Correlation
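$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}}$$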
![Page 100: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/100.jpg)
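Correlation Example

| x | y | xy | x² | y² |
|---|---|---|---|---|
| 8 | -2 | -16 | 64 | 4 |
| 4 | 2 | 8 | 16 | 4 |
| 5 | 1 | 5 | 25 | 1 |
| -1 | 6 | -6 | 1 | 36 |
| 1 | 4 | 4 | 1 | 16 |
| 2 | 3 | 6 | 4 | 9 |
| 6 | -1 | -6 | 36 | 1 |
| Σx = 25 | Σy = 13 | Σxy = -5 | Σx² = 147 | Σy² = 71 |

n = 7 pairs

$$r = \frac{7(-5) - (25)(13)}{\sqrt{\left[7(147) - (25)^2\right]\left[7(71) - (13)^2\right]}} = \frac{-360}{\sqrt{404 \times 328}} = -0.989$$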
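The computational formula can be coded directly and checked on the seven pairs above. As a cross-check, the covariance form cov(X, Y)/√(var(X)·var(Y)) is computed from deviation scores. The n - 1 divisors cancel, so they are omitted. The function names are hypothetical.

```python
import math

x = [8, 4, 5, -1, 1, 2, 6]
y = [-2, 2, 1, 6, 4, 3, -1]

def pearson_r(x, y):
    """Pearson r using the computational (sums) formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

def pearson_r_deviations(x, y):
    """Same r from deviation scores: cov / sqrt(var_x * var_y); n - 1 cancels."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

r = pearson_r(x, y)
print(round(r, 3))                                    # -0.989
print(math.isclose(r, pearson_r_deviations(x, y)))    # True
print(round(r ** 2, 3))                               # 0.978 (coefficient of determination)
```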
![Page 101: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/101.jpg)
Correlation Coefficient
• Range
– -1 to 1
• Positive correlation
– 0 to 1
• Negative correlation
– -1 to 0
• The closer to -1 or 1, the stronger the correlation
– -0.9: strong negative
– -0.2: weak negative
– 0: none
– 0.2: weak positive
– 0.9: strong positive
![Page 102: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/102.jpg)
Correlation Coefficient – Significance
• Depends on the number of pairs
• Varies for each r
– An r of 0.3 may be significant for n = 1500 but not for n = 40
• Also depends on variance (SD)
– The greater the variance, the less significance
• Generally:
– 0.60 or -0.60 is strong for medical variables
– Manufacturing requires 0.90 or greater
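One standard way to see why significance depends on the number of pairs is the t statistic for testing r against zero, t = r√(n - 2)/√(1 - r²), with n - 2 degrees of freedom. This method is not covered in these slides. For a two-sided test at the 5% level, the critical t is about 2.02 for 38 degrees of freedom and close to 1.96 for large samples. So r = 0.3 falls just short at n = 40 but is far past the cutoff at n = 1500. The helper `t_for_r` is hypothetical.

```python
import math

def t_for_r(r, n):
    """t statistic for testing r against 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

for n in (40, 1500):
    print(n, round(t_for_r(0.3, n), 2))
# 40 1.94
# 1500 12.17
```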
![Page 103: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/103.jpg)
The Scatterplot and r
• ALWAYS look at the scatterplot along with r
• Each of these plots has r=0.70
![Page 104: Bst322week1](https://reader035.fdocuments.us/reader035/viewer/2022062514/558c4509d8b42adc348b45fe/html5/thumbnails/104.jpg)
Correlation
• The square of the correlation coefficient, r² (often written R²), indicates the proportion of variability in one variable that can be explained by the other
– Example: age and BP
  • R² = 0.49 (r = 0.70)
  • 49% of the variation in BP is explained by age
– aka coefficient of determination
• Correlation does NOT imply causation