Introduction to statistics RSS6 2014
![Page 1: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/1.jpg)
Introduction to Statistics
Amr Albanna, MD, MSc
![Page 2: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/2.jpg)
Content
• Scales of Measurement
– Categorical Variables
– Numerical Variables
• Displays of Categorical Data
– Frequencies
– Bar Graph
– Pie Chart
• Numerical Measures of Central Tendency
– Mean
– Median
– Mode
• Numerical Measures of Spread
• Association
• Correlation
• Regression
![Page 3: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/3.jpg)
Scales of Measurement
• Categorical Variables:
– Nominal: Categorical variable with no order (e.g. blood type: A, B, AB or O).
– Ordinal: Categorical, but with an order (e.g. pain: "none", "mild", "moderate" or "severe").
• Numerical Variables:
– Interval: Quantitative data where differences are meaningful but ratios are not (e.g. calendar years: 2010 is one year after 2009, but the ratio 2010/2009 has no useful meaning).
– Ratio: Quantitative data where ratios are meaningful (e.g. weights, 200 lbs is twice as heavy as 100 lbs).
![Page 4: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/4.jpg)
Categorical Variables
• Displays of Categorical Data
– Frequencies
– Bar Graph
– Pie Chart
![Page 5: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/5.jpg)
Categorical Variables

| Variable (Sex) | Frequency | Proportion |
|---|---|---|
| Male | 609 | 0.61 |
| Female | 391 | 0.39 |
| Total | 1000 | 1.00 |
[Figures: bar graph and pie chart of the frequency of each sex (Male, Female); bar graph frequency axis from 0 to 700]
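A minimal Python sketch of how the frequency table can be built from the counts. The dictionary name `counts` and the print layout are illustrative choices, not part of the slides.

```python
# Frequencies by sex, taken from the slide's table.
counts = {"Male": 609, "Female": 391}

# A proportion is the category frequency divided by the total frequency.
total = sum(counts.values())
proportions = {category: count / total for category, count in counts.items()}

# Print one row per category, then a total row.
for category, count in counts.items():
    print(f"{category:<8}{count:>6}{proportions[category]:>8.2f}")
print(f"{'Total':<8}{total:>6}{sum(proportions.values()):>8.2f}")
```

The proportions always sum to 1 (or 100 when expressed as percentages), which is a quick check on any frequency table.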
![Page 6: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/6.jpg)
Bar Graph
![Page 7: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/7.jpg)
Numerical Variables
Central Tendency
Numerical Spread
![Page 8: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/8.jpg)
Measures of Central Tendency
• The 3 M's
– Mean
– Median
– Mode
![Page 9: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/9.jpg)
Measures of Central Tendency
Sample Mean
The sample mean, $\bar{x}$, is the sum of all values in the sample divided by the total number of observations, $n$, in the sample.

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
![Page 10: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/10.jpg)
Example: Sample Mean
• Mean systolic blood pressure
Scenario 1:

| Subject | BP (mmHg) |
|---|---|
| 1 | 120 (x1) |
| 2 | 135 (x2) |
| 3 | 115 (x3) |
| 4 | 110 (x4) |
| 5 | 105 (x5) |
| 6 | 140 (x6) |

Mean = (120 + 135 + 115 + 110 + 105 + 140)/6 = 725/6 ≈ 120.8
![Page 11: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/11.jpg)
Sample Mean
• The mean is affected by extreme observations and is not a resistant measure.
Scenario 2:

| Subject | BP (mmHg) |
|---|---|
| 1 | 120 (x1) |
| 2 | 135 (x2) |
| 3 | 115 (x3) |
| 4 | 110 (x4) |
| 5 | 105 (x5) |
| 6 | 140 (x6) |
| 7 | 280 (x7) |

Mean = (120 + 135 + 115 + 110 + 105 + 140 + 280)/7 = 1005/7 ≈ 143.6
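The two scenarios can be reproduced with a short Python sketch. The variable names are illustrative; `statistics.mean` is from the Python standard library.

```python
from statistics import mean

# Systolic blood pressure readings from the two scenarios.
scenario_1 = [120, 135, 115, 110, 105, 140]
scenario_2 = scenario_1 + [280]  # same subjects plus one extreme reading

mean_1 = mean(scenario_1)
mean_2 = mean(scenario_2)

# A single extreme value shifts the mean by more than 20 units.
print(round(mean_1, 1), round(mean_2, 1))
```

Adding one observation of 280 pulls the mean above six of the seven readings, which is what "not resistant" means in practice.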
![Page 12: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/12.jpg)
Median
• The sample median, M, is the number such that “half" the values in the sample are smaller and the other “half" are larger.
• Use the following steps to find M.
– Sort the data (arrange in increasing order).
– Is the size of the data set n even or odd?
– If odd: M = value in the exact middle.
– If even: M = the average of the two middle numbers.
![Page 13: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/13.jpg)
Example: Sample Median
• Median systolic BP:
Scenario 1: 120 : 135 : 115 : 110 : 105 : 140
Sorted: 105 : 110 : 115 : 120 : 135 : 140
Median = (115 + 120)/2 = 117.5
Scenario 2: 120 : 135 : 115 : 110 : 105 : 140 : 280
Sorted: 105 : 110 : 115 : 120 : 135 : 140 : 280
Median = 120
• The median is barely affected by the extreme observation (117.5 versus 120) and is a resistant measure.
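The steps for finding M can be sketched in Python as follows. The function name `sample_median` is hypothetical; the standard library's `statistics.median` is used only as a cross-check. Note that the data must be sorted before the middle values are picked.

```python
from statistics import median

def sample_median(values):
    """Median by the slide's steps: sort, then take the middle value(s)."""
    ordered = sorted(values)          # step 1: arrange in increasing order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd n: the value in the exact middle
        return ordered[middle]
    # even n: the average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

scenario_1 = [120, 135, 115, 110, 105, 140]
scenario_2 = scenario_1 + [280]

median_1 = sample_median(scenario_1)
median_2 = sample_median(scenario_2)

# Cross-check against the standard library.
print(median_1, median(scenario_1))
print(median_2, median(scenario_2))
```

The extreme reading of 280 moves the median by only 2.5 units, while it moves the mean by more than 20.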
![Page 14: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/14.jpg)
Mode
• The sample mode is the value that occurs most frequently in the sample (a data set can have more than one mode).
• This is the only measure of center which can also be used for nominal categorical data (for ordinal data the median can be used as well).
• The population mode is the highest point on the population distribution.
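A small Python sketch of the sample mode. The blood-type and blood-pressure lists below are made-up illustrative data, not from the slides; `statistics.multimode` (Python 3.8+) returns every value tied for the highest count.

```python
from statistics import multimode

# Hypothetical nominal data: the mode works even though the values have no order.
blood_types = ["A", "O", "B", "O", "AB", "A", "O"]
blood_type_modes = multimode(blood_types)

# Hypothetical numerical data with two modes (a bimodal sample).
readings = [110, 120, 120, 135, 135, 140]
reading_modes = multimode(readings)

print(blood_type_modes)
print(reading_modes)
```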
![Page 15: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/15.jpg)
Symmetric Data Distribution
[Figure: histogram of a symmetric distribution; x-axis: Value (10 to 50), y-axis: Frequency (0 to 6)]
![Page 16: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/16.jpg)
Rightward Skewness of Data
[Figure: histogram of a right-skewed distribution; x-axis: Value (10 to 50), y-axis: Frequency (0 to 6); marked from left to right: Mode, Median, Mean]
![Page 17: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/17.jpg)
Leftward Skewness of Data
[Figure: histogram of a left-skewed distribution; x-axis: Value (10 to 50), y-axis: Frequency (0 to 6); marked from left to right: Mean, Median, Mode]
![Page 18: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/18.jpg)
Numerical Measures of Spread
• Range
• Sample Variance
• Interquartile Range (IQR)
![Page 19: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/19.jpg)
Numerical Measures of Spread
Range: The range of the data set is the difference between the highest value and the lowest value.
– Range = highest value - lowest value
– Easy to compute BUT ignores a great deal of information.
– Obviously the range is affected by extreme observations and is not a resistant measure.
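As a rough illustration, the range can be computed for the blood pressure data used in the earlier mean example. The function name `value_range` is hypothetical.

```python
def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

# Systolic blood pressure readings, without and with one extreme reading.
scenario_1 = [120, 135, 115, 110, 105, 140]
scenario_2 = scenario_1 + [280]

range_1 = value_range(scenario_1)
range_2 = value_range(scenario_2)

print(range_1, range_2)
```

Only the two most extreme values enter the calculation, so a single unusual reading changes the range from 35 to 175.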
![Page 20: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/20.jpg)
Numerical Measures of Spread
• Variance: equal to the sum of squared deviations from the sample mean divided by n - 1, where n is the number of observations in the sample.
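In symbols this is s² = Σ(xᵢ − x̄)² / (n − 1). A minimal Python sketch, using the six blood pressure readings from the mean example, computes it step by step and compares the result with the standard library's `statistics.variance`; the variable names are illustrative.

```python
from statistics import mean, variance

bp = [120, 135, 115, 110, 105, 140]

# Step 1: the sample mean.
x_bar = mean(bp)

# Step 2: squared deviation of each observation from the sample mean.
squared_deviations = [(x - x_bar) ** 2 for x in bp]

# Step 3: divide the sum of squared deviations by n - 1.
s_squared = sum(squared_deviations) / (len(bp) - 1)

# statistics.variance uses the same n - 1 denominator.
print(round(s_squared, 2), round(variance(bp), 2))
```

The square root of this value, the sample standard deviation, is in the same units as the data.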
![Page 21: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/21.jpg)
Numerical Measures of Spread
• Percentile: The pth percentile of a distribution is the value such that p percent of the observations fall at or below it.
![Page 22: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/22.jpg)
Numerical Measures of Spread
• The most commonly used percentiles are the quartiles.
1st quartile Q1 = 25th percentile.
2nd quartile Q2 = 50th percentile.
3rd quartile Q3 = 75th percentile.
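The quartiles can be sketched in Python with `statistics.quantiles` (Python 3.8+). This is only one possible convention: software packages use different interpolation rules, so Q1 and Q3 may differ slightly from values computed by hand or by other tools. The data are the six blood pressure readings from the mean example.

```python
from statistics import median, quantiles

bp = [120, 135, 115, 110, 105, 140]

# n=4 splits the data into four equal parts and returns the three cut points.
q1, q2, q3 = quantiles(bp, n=4)

# The 2nd quartile (50th percentile) is the median.
print(q1, q2, q3, median(bp))
```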
![Page 23: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/23.jpg)
Numerical Measures of Spread
Interquartile Range (IQR)
A simple measure of spread, giving the range covered by the middle half of the data, is the interquartile range (IQR), defined below.
IQR = Q3 - Q1
The IQR is a resistant measure of spread.
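A short Python sketch comparing the IQR with the range on the blood pressure data, with and without the extreme reading of 280. The helper name `iqr` is hypothetical, and the quartiles follow the default rule of `statistics.quantiles`, which is an assumption; other quartile conventions give slightly different numbers.

```python
from statistics import quantiles

def iqr(values):
    """IQR = Q3 - Q1, using the default quartile rule of statistics.quantiles."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

scenario_1 = [120, 135, 115, 110, 105, 140]
scenario_2 = scenario_1 + [280]

iqr_1, iqr_2 = iqr(scenario_1), iqr(scenario_2)
range_1 = max(scenario_1) - min(scenario_1)
range_2 = max(scenario_2) - min(scenario_2)

# The IQR changes little when the extreme value is added; the range changes a lot.
print(iqr_1, iqr_2)
print(range_1, range_2)
```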
![Page 24: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/24.jpg)
Numerical Measures of Spread
Outliers: extreme observations that fall well outside the overall pattern of the distribution.
• An outlier may be the result of:
– A recording error,
– An observation from a different population,
– An unusual extreme observation (biological diversity).
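The slides do not state a numerical rule for flagging outliers. One common convention, assumed here and not taken from the source, is the 1.5 × IQR rule: observations below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR are flagged for review. A minimal Python sketch on the seven blood pressure readings, with the hypothetical helper name `flag_outliers`:

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed 1.5 x IQR convention)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [x for x in values if x < lower or x > upper]

bp = [120, 135, 115, 110, 105, 140, 280]
outliers = flag_outliers(bp)

print(outliers)
```

A flagged value is a candidate for checking, not for automatic deletion: it may be a recording error, but it may also be a genuine extreme observation.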
![Page 25: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/25.jpg)
Numerical Measures of Spread
![Page 26: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/26.jpg)
Association Between Variables
• Explanatory (exposure) variable “X”
• Response (outcome) variable “Y”
![Page 27: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/27.jpg)
Association Between Variables
![Page 28: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/28.jpg)
Association Between Variables
![Page 29: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/29.jpg)
Association Between Variables
![Page 30: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/30.jpg)
Measurement of Correlation
![Page 31: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/31.jpg)
Correlation is NOT Association
![Page 32: Introduction to statistics RSS6 2014](https://reader034.fdocuments.us/reader034/viewer/2022052508/5597de441a28ab58388b467a/html5/thumbnails/32.jpg)
Regression