Computers (and software) are - Lunds universitet · Measures of dispersion • Symmetric...
Biostatistics for biomedical profession, BIMM34
Karin Källen & Linda Hartman
November-December 2015
2015-11-02
Who needs a course in biostatistics?
• Anyone who uses quantitative methods to interpret biological processes.
But really... is it necessary with today's advanced computers and statistical packages?
Now more than ever!
Computers (and software) are dumb like stones!
They just do what we tell them to.
The objectives of the current course in biostatistics
• An imaginary 2 x 2 table (hypothesis and design vs. statistical methods):
– Correct hypothesis, correct basic design + adequate methods: utilization of brain capacity when designing, knowledge of basic statistical methods, and correct interpretation of the results.
– Correct hypothesis, correct basic design + sub-optimal methods: in a good design, the use of non-optimal statistical methods could bias the results, but the effects are seldom strong.
– Poorly specified hypothesis, poor design + adequate methods: over-belief in sophisticated statistical methods is depressingly common, and could be difficult to detect.
– Poorly specified hypothesis, poor design + sub-optimal methods: the draw-backs of studies with these characteristics will be detected.
Statistics
[Figure: population, sample, probability, descriptive statistics, inferential statistics]
Statistics
• Descriptive statistics
Methods to summarize (the variables in) a sample
• Summary measures
• Graphical methods
• Inferential statistics
Methods to learn about the population that the sample is drawn from
• Effect measures (with confidence intervals)
• Tests (t-test, chi²-test, Mann-Whitney, …)
• Regression modeling
Today: Basic
• numerical summaries of data
• graphical summaries of data
Types of data
• Categorical
– Binary/dichotomous: 2 categories
– Nominal: ≥ 2 categories
– Ordinal: order matters
• Quantitative
– Discrete: only whole numbers as values
– Continuous: data that can take any value
Types of data - exercise
• Categorize the following measurements as binary, nominal, ordinal, discrete or continuous:
1. Blood serum bilirubin (μg/ml)
2. Hair colour (blonde, brunette, redhead and grey)
3. Vital status (dead/alive)
4. BMI (kg/m²)
5. Number of bacteria in a sample
6. Smoking status (Non-smoker/0-10 cigarettes per day/>10 cigarettes per day)
Types of data
[Figure: types of data (categorical: binary, nominal, ordinal; quantitative: discrete, continuous)]
Discrete variables with only a few possible values are often analysed with the same methods as ordinal variables.
Discrete variables with many possible values are often analysed with the same methods as continuous variables.
Summary measures & graphical presentation
Chapters 2 & 3 in Norman & Streiner
Chapters 3 & 4 in Kirkwood and Sterne
Graphical presentation 1: HISTOGRAM
Split the data into intervals and count the number (proportion) of observations in each interval:
• The width of the bar shows the interval
• The height of the bar shows the number (proportion) of observations in that interval
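A minimal Python sketch of the counting step behind a histogram, assuming equal-width intervals that include their lower limit and exclude their upper limit. The helper name `histogram_counts`, the starting point (135 mmHg) and the interval width (5 mmHg) are illustrative choices, not taken from the slides; the data are the 39 blood pressure values from the box-plot exercise.

```python
# Minimal sketch: the counts behind a histogram with equal-width intervals.
# Assumption: each interval is [lower, lower + width), i.e. it includes its
# lower limit and excludes its upper limit.

# Blood pressure (mmHg) in 39 women, as listed in the box-plot exercise.
BP = [138, 140, 141, 142, 142, 142, 142, 142, 143, 143, 144, 144, 144, 144,
      145, 147, 147, 147, 147, 149, 149, 150, 150, 151, 152, 154, 154, 157,
      157, 157, 158, 159, 161, 162, 162, 166, 167, 167, 170]


def histogram_counts(values, start, width, n_bins):
    """Return a list of ((lower, upper), count, proportion), one per interval.

    Values outside [start, start + n_bins * width) are not counted.
    """
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)  # which interval v falls into
        if 0 <= idx < n_bins:
            counts[idx] += 1
    total = len(values)
    return [((start + i * width, start + (i + 1) * width), c, c / total)
            for i, c in enumerate(counts)]


for (lo, hi), count, prop in histogram_counts(BP, start=135, width=5, n_bins=8):
    # bar width = interval, bar height = count (or proportion)
    print(f"[{lo}, {hi}): {count:2d} ({prop:.1%})")
```

Changing `width` changes the picture: wider intervals smooth the histogram, narrower ones make it more jagged.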
Summary measures:
• Central tendency measures
Describe a “center” around which the measurements in the data are distributed
• Dispersion (or variability) measures
Describe the “data spread”, i.e. how far away the measurements are from the center.
Central tendency measures
• Median
‐ The middle observation when the data are sorted
• Mean
‐ The sum of the observations divided by the number of observations
• Mode
‐ The most frequently occurring value
$$\bar{X} = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{1}{N}\sum_{i=1}^{N} X_i$$
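The three definitions above can be written out directly. A minimal Python sketch, using the eight heights (cm) from the dispersion example later in these slides; `sample_mean` and `sample_median` are hypothetical helper names, and the standard-library `statistics` functions serve as a cross-check.

```python
from statistics import mean, median, multimode

# Eight heights (cm) from the dispersion example in these slides.
heights = [150, 152, 161, 177, 155, 160, 162, 158]


def sample_mean(xs):
    # X-bar = (X1 + X2 + ... + XN) / N
    return sum(xs) / len(xs)


def sample_median(xs):
    # Middle observation of the sorted data; with an even number of
    # observations, take the average of the two middle ones.
    s = sorted(xs)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


print(sample_mean(heights), mean(heights))      # 159.375 159.375
print(sample_median(heights), median(heights))  # 159.0 159.0
# Every height occurs once, so no single value is "most frequent":
# multimode returns all eight values.
print(multimode(heights))
```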
Central tendency - exercise
Exercise 3.2 a-c
• Calculate the mean, median and mode of a dataset with the following values: 4, 8, 6, 3, 4
Central tendency cont.
Maternal vitamin D: mean = 2.3, median = 2.2
Child vitamin D: mean = 1.4, median = 1.2
Central tendency
Mean or median? The choice depends on the distribution of the data:
• Symmetric data
• Asymmetric data
• Ordinal data
[Figure: symmetric distribution; asymmetric distribution (positive skew)]
Central tendency
Symmetric continuous data
Maternal height: mean = 166 cm, median = 166.5 cm
Symmetric data:
• Mean ≈ median
• Use the mean
Central tendency
Asymmetric continuous data
Vitamin D in child: mean = 1.4, median = 1.2
Asymmetric data:
• Mean ≠ median
• Use the median
Central tendency
Asymmetric continuous data: CD16 in % of granulocytes
Mean = 7.3, median = 4.8
Central tendency
Ordinal data
(Kasner 2006)
(Hacke et al. 2008)
Use the median!
Exercise:
• What is the median in the Alteplase group?
• What is the median in the Placebo group?
Central tendency
Nominal data
Measures of central tendency such as the mean and median are not meaningful.
Use the number of observations and proportions instead.
Bar chart
Central tendency measures: summary
Type of data       Central tendency measure
Symmetric data     Mean
Asymmetric data    Median
Ordinal            Median
Nominal            -
Measures of dispersion
• Symmetric distribution – measure based on the mean
• Asymmetric distribution or ordinal data – measure NOT based on the mean
A measure of dispersion describes how closely the data cluster around the measure of central tendency.
Spread/distribution
[Figure: small spread vs. big spread]
Describing the spread of the data
• If we look at the average deviation from the mean:
$$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})$$
• The average deviation from the mean equals 0.
x_i (cm)    x_i - x̄ (cm)
150          -9.375
152          -7.375
161           1.625
177          17.625
155          -4.375
160           0.625
162           2.625
158          -1.375
Sum           0
x̄ = 159.375 cm
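A quick numerical check that the deviations from the mean cancel out, using the same eight heights (cm). In general, floating-point rounding can leave a tiny non-zero remainder; with these values every step happens to be exact.

```python
heights = [150, 152, 161, 177, 155, 160, 162, 158]  # cm

x_bar = sum(heights) / len(heights)        # 159.375
deviations = [x - x_bar for x in heights]  # x_i - x-bar
print(deviations)
print(sum(deviations))  # 0.0: positive and negative deviations cancel
```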
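Describing the spread of the data
If we square every deviation, the positive and negative deviations no longer cancel to 0. Dividing the sum of squares by n gives the mean squared deviation:
$$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
To get a better (unbiased) estimate of the population value, we use n - 1 in the denominator:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
This is called the VARIANCE!
The variance is expressed in cm², which is impractical since the mean length is expressed in cm.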
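x_i (cm)    x_i - x̄ (cm)    (x_i - x̄)² (cm²)
150          -9.375           87.89
152          -7.375           54.39
161           1.625            2.64
177          17.625          310.64
155          -4.375           19.14
160           0.625            0.39
162           2.625            6.89
158          -1.375            1.89
Sum           0              483.875
483.875 / n = 483.875 / 8 = 60.48 cm²
483.875 / (n - 1) = 483.875 / 7 = 69.125 cm² (the variance s²)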
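The sums of squares can be reproduced in Python as a sketch. Note that 60.48 cm² is the sum of squares divided by n = 8; dividing by n - 1 = 7, as the variance definition prescribes, gives 69.125 cm². The standard-library functions `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n - 1) are used as a cross-check.

```python
import statistics

heights = [150, 152, 161, 177, 155, 160, 162, 158]  # cm
n = len(heights)
x_bar = sum(heights) / n

# Sum of squared deviations from the mean
ss = sum((x - x_bar) ** 2 for x in heights)  # 483.875 cm^2

var_divide_by_n = ss / n          # 60.484375 cm^2
var_divide_by_n1 = ss / (n - 1)   # 69.125 cm^2, the sample variance s^2

# The standard library offers both conventions:
# pvariance divides by n, variance divides by n - 1.
print(var_divide_by_n, statistics.pvariance(heights))
print(var_divide_by_n1, statistics.variance(heights))
```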
Describing the spread of the data
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
By taking the square root of the variance, you get the standard deviation (SD), which has the same units as the measurements.
Ex: Variance = s² = 483.875 / (8 - 1) = 69.1 cm²
s = sqrt(69.125) = 8.3 cm
(Dividing by n = 8 instead gives 60.5 cm² and sqrt(60.48) = 7.8 cm.)
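Taking square roots gives the standard deviation. A minimal sketch with the same eight heights: dividing by n - 1 gives s ≈ 8.3 cm, while dividing by n gives ≈ 7.8 cm, so it is worth checking which denominator a calculation uses. `statistics.stdev` and `statistics.pstdev` are the standard-library counterparts.

```python
import math
import statistics

heights = [150, 152, 161, 177, 155, 160, 162, 158]  # cm
n = len(heights)
x_bar = sum(heights) / n
ss = sum((x - x_bar) ** 2 for x in heights)  # sum of squared deviations

sd_n1 = math.sqrt(ss / (n - 1))  # divide by n - 1: the SD, s
sd_n = math.sqrt(ss / n)         # divide by n

print(round(sd_n1, 1), round(statistics.stdev(heights), 1))
print(round(sd_n, 1), round(statistics.pstdev(heights), 1))
```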
Percentile
Describes the percentage of observations that lie below a given value, e.g.:
• 10% of the observations lie below the 10th percentile
• 20% lie below the 20th percentile, etc.
Quartile
• Divide data into four equal groups;
• Lower quartile – 25th percentile
• Median – 50th percentile
• Upper quartile – 75th percentile
Q1, Q2 (the median) and Q3 are found at positions (n+1)/4, 2(n+1)/4 and 3(n+1)/4 of the ordered observations
• Interquartile range (IQR) = The difference between the upper and the lower
quartiles
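A sketch of the (n+1) position rule for quartiles. When a position such as 2.25 falls between two ordered observations, the code interpolates linearly between them; that interpolation is an assumed convention (the slide only gives the positions), and other software may use slightly different rules. The helper name `quartile` is hypothetical. Python's `statistics.quantiles` with its default `method="exclusive"` follows the same (n+1) approach.

```python
import statistics


def quartile(sorted_xs, k):
    """k-th quartile (k = 1, 2, 3) at position k*(n+1)/4 of the ordered data.

    Positions are 1-based. If the position falls between two observations,
    interpolate linearly between them (an assumed convention).
    """
    n = len(sorted_xs)
    pos = k * (n + 1) / 4
    if pos <= 1:
        return sorted_xs[0]
    if pos >= n:
        return sorted_xs[-1]
    below = int(pos)  # 1-based position at or just below pos
    frac = pos - below
    lo, hi = sorted_xs[below - 1], sorted_xs[below]
    return lo + frac * (hi - lo)


heights = sorted([150, 152, 161, 177, 155, 160, 162, 158])  # cm
q1, q2, q3 = (quartile(heights, k) for k in (1, 2, 3))
print(q1, q2, q3, "IQR =", q3 - q1)  # 152.75 159.0 161.75 IQR = 9.0

# Cross-check: statistics.quantiles uses the (n + 1) rule by default
print(statistics.quantiles(heights, n=4))
```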
Measures of dispersion
• Standard deviation – roughly the typical deviation from the mean value (the square root of the variance)
• Percentiles & quartiles
– Split the data into fixed proportions
• Range – The difference between the maximum and the minimum
Measures of dispersion - exercise
Exercise 3.2 d-e
• Calculate the standard deviation and range for a dataset with the following values: 4, 8, 6, 3, 4
Robustness: highly skewed data (Fig 3-10)
Measure    Without largest observation    With largest observation
Mean       3.9                            6.1
Median     4                              4
Range      5                              42
SD         1.4                            9.3
QL; QU     3; 5                           3; 5
Robust to extreme observations: median, quartiles
Sensitive to extreme observations: mean, range, SD
Use the median & quartiles for skewed data + a graphical presentation!
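The Fig 3-10 data are not reproduced in these slides, so the sketch below illustrates the same idea with the eight heights (cm) from the dispersion example, whose largest observation (177 cm) lies well above the rest. This is an illustration under that substitution, not a reproduction of the table; the helper name `summaries` is hypothetical.

```python
import statistics


def summaries(xs):
    """Central tendency and dispersion measures for one sample."""
    q1, _, q3 = statistics.quantiles(xs, n=4)  # (n + 1) rule for quartiles
    return {
        "mean": round(statistics.mean(xs), 1),
        "median": statistics.median(xs),
        "range": max(xs) - min(xs),
        "SD": round(statistics.stdev(xs), 1),  # divides by n - 1
        "QL; QU": (q1, q3),
    }


heights = [150, 152, 161, 177, 155, 160, 162, 158]  # cm
# Drop the largest observation (177 cm occurs only once here).
without_largest = [x for x in heights if x != max(heights)]

with_all = summaries(heights)
without = summaries(without_largest)
for measure in with_all:
    print(measure, "without:", without[measure], "with:", with_all[measure])
```

Removing one point also changes n, so the median and quartiles shift slightly too (by at most 1 cm here), but the mean, SD and range move much more.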
Summary: summary measures
Type of data       Central tendency measure    Dispersion measure
Symmetric data     Mean                        Standard deviation
Asymmetric data    Median                      Percentiles (e.g. QL and QU)
Ordinal            Median                      Percentiles
Nominal            -                           -
Graphical presentation 2: BOX-PLOT
[Figure: box plots of CB_153 (ng/g lipid weight) by fish consumption group (low, medium, high); N = 38, 33 and 39]
Box-plot elements:
• Outlier (O): observation more than 1.5 IQR outside the box
• Extreme value (*): observation more than 3 IQR outside the box
• Lowest "normal" value
• Lower quartile QL
• Median
• Upper quartile QU
• Highest "normal" value (inner fence)
• IQR = QU - QL = box length
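The box-plot rules above can be sketched in Python. Assumptions: quartiles follow the (n+1) rule via `statistics.quantiles`, outliers lie more than 1.5 IQR outside the box and extreme values more than 3 IQR outside, and the whiskers end at the lowest and highest observations inside the inner fences. The function name `box_plot_summary` is hypothetical; statistical packages differ in how they compute quartiles, so box edges can differ slightly between programs.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, fences, whisker ends and flagged points for one box plot.

    Assumptions: quartiles by the (n + 1) rule, outliers more than
    1.5 IQR outside the box, extreme values more than 3 IQR outside.
    """
    q1, med, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1  # box length
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # inner fences
    outer_lo, outer_hi = q1 - 3 * iqr, q3 + 3 * iqr      # outer fences

    normal = [v for v in values if inner_lo <= v <= inner_hi]
    extremes = [v for v in values if v < outer_lo or v > outer_hi]
    outliers = [v for v in values
                if (v < inner_lo or v > inner_hi) and outer_lo <= v <= outer_hi]
    return {
        "QL": q1, "median": med, "QU": q3, "IQR": iqr,
        # whiskers end at the lowest/highest "normal" value
        "whiskers": (min(normal), max(normal)),
        "outliers": outliers,  # drawn as O
        "extremes": extremes,  # drawn as *
    }


heights = [150, 152, 161, 177, 155, 160, 162, 158]  # cm
print(box_plot_summary(heights))
```

For the eight heights, QL = 152.75 cm and QU = 161.75 cm, so the inner fences are at 139.25 and 175.25 cm; 177 cm is flagged as an outlier (O) but not as an extreme value, since the upper outer fence is at 188.75 cm.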
Box-plot - exercise
• How can you use the boxplot to judge whether a distribution is symmetric or asymmetric?
Use the examples in your discussion.
Box-plot: Exercise 2
Blood pressure was measured in 39 women:
BP = 138 140 141 142 142 142 142 142 143 143 144 144 144 144 145 147 147 147 147 149 149 150 150 151 152 154 154 157 157 157 158 159 161 162 162 166 167 167 170 mmHg
(Results are sorted)
• Create a boxplot of Blood-pressure
Box-plot vs histogram, exercise cont.
Blood pressure was measured in 39 women:
BP = 138 140 141 142 142 142 142 142 143 143 144 144 144 144 145 147 147 147 147 149 149 150 150 151 152 154 154 157 157 157 158 159 161 162 162 166 167 167 170 mmHg
[Figure: histogram of blood pressure (bp_before); x-axis 130 to 170 mmHg, y-axis frequency 0 to 15]
Median: 149, Q1: 143, Q3: 157 (mmHg)
Min: 138, max: 170 (mmHg)
IQR = Q3 - Q1 = 14 mmHg
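The numbers on this slide can be reproduced with Python's `statistics.quantiles`, whose default (n+1) method puts Q1, the median and Q3 at positions 10, 20 and 30 of the 39 ordered values. The fence calculation follows the 1.5 IQR rule from the box-plot slide.

```python
import statistics

# Blood pressure (mmHg) in 39 women, already sorted, as given in Exercise 2.
bp = [138, 140, 141, 142, 142, 142, 142, 142, 143, 143, 144, 144, 144, 144,
      145, 147, 147, 147, 147, 149, 149, 150, 150, 151, 152, 154, 154, 157,
      157, 157, 158, 159, 161, 162, 162, 166, 167, 167, 170]

q1, med, q3 = statistics.quantiles(bp, n=4)  # positions 10, 20 and 30
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # inner fences

print("min:", min(bp), "Q1:", q1, "median:", med, "Q3:", q3, "max:", max(bp))
print("IQR:", iqr)
print("inner fences:", lower_fence, upper_fence)
print("outliers:", [v for v in bp if v < lower_fence or v > upper_fence])
```

No observation lies outside the inner fences (122 and 178 mmHg), so the whiskers run from the minimum (138 mmHg) to the maximum (170 mmHg).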
Box-plot cont
What's wrong with Figure 3-7?
Summary:
- Types of variables (binary/nominal/ordinal/discrete/continuous)
- Descriptive statistics
  - Central tendency measures (mean, median)
  - Dispersion measures (standard deviation, percentiles)
- Graphical presentation
  - Bar plot
  - Histogram
  - Box plot
Wednesday lecture:
Subject                                    Norman & Streiner (chapter)    Kirkwood and Sterne (chapter)
Normal distribution                        4                              5
Population, samples, generalisability      6                              7
Reference interval, confidence interval    6                              4.5, 6, 7