Data Analysis, Presentation, and Statistics


Page 1: Data Analysis, Presentation, and Statistics

Data Analysis, Presentation, and Statistics

Fr Clinic I

Page 2: Data Analysis, Presentation, and Statistics

Overview

• Tables and Graphs
• Populations and Samples
• Mean, Median, and Standard Deviation
• Standard Error & 95% Confidence Interval (CI)
• Error Bars
• Comparing Means of Two Data Sets
• Linear Regression (LR)

Page 3: Data Analysis, Presentation, and Statistics

Warning

• Statistics is a huge field; I’ve simplified considerably here. For example:
– Mean, Median, and Standard Deviation
  • There are alternative formulas
– Standard Error and the 95% Confidence Interval
  • There are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means, rather than single mean…)
– Error Bars
  • Don’t go beyond the interpretations I give here!
– Linear Regression
  • We only look at simple LR and only calculate the intercept, slope, and R². There is much more to LR!

Page 4: Data Analysis, Presentation, and Statistics

Should I Use a Table or Graph?

• Tables
– Presenting large amounts of different data
– Comparing multiple characteristics
• Graphs
– Visual presentation quickly gives information
– Comparing one or two characteristics
– Showing trends

Page 5: Data Analysis, Presentation, and Statistics

Tables

Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters

Water        Turbidity (NTU)   True Color (Pt-Co)   Apparent Color (Pt-Co)
(1)          (2)               (3)                  (4)
Pond Water   10                13                   30
Sweetwater   4                 5                    12
Hiker        3                 8                    11
MiniWorks    2                 3                    5
Standard     5a                15                   15

a Level at which humans can visually detect turbidity

Consistent Format, Title, Units, Big Fonts
Differentiate Headings, Number Columns

Page 6: Data Analysis, Presentation, and Statistics

Figures

[Figure 1: Turbidity of Pond Water, Treated and Untreated. Bar chart; x-axis: Filter (Pond Water, Sweetwater, Miniworks, Hiker, Pioneer, Voyager); y-axis: Turbidity (NTU), 0 to 25.]

Consistent Format, Title, Units
Good Axis Titles, Big Fonts

Page 7: Data Analysis, Presentation, and Statistics

Graphing Suggestions

• 1, 2, 5 rule
– Set gradations so the smallest division of the axis is 1, 2, or 5 times an integer power of 10 (e.g., 0.2, 1, 5, 20, 500).
• Huh?
• Set up your scale so that the smallest division is a simple round increment.
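One way to apply the 1, 2, 5 rule programmatically is sketched below. This is a minimal illustration, not part of the original slides: the function name `nice_step` and the `max_divisions` parameter are hypothetical, and the sketch assumes you want the smallest 1-2-5 step that keeps the number of divisions at or below a chosen limit.

```python
import math

def nice_step(axis_range, max_divisions=10):
    """Return the smallest step of the form {1, 2, 5} x 10^k such that
    axis_range / step does not exceed max_divisions (hypothetical helper)."""
    if axis_range <= 0 or max_divisions <= 0:
        raise ValueError("axis_range and max_divisions must be positive")
    raw = axis_range / max_divisions          # smallest step that would fit
    exponent = math.floor(math.log10(raw))    # power of 10 at or below raw
    # Try 1, 2, 5 at this power of 10, then at the next power up.
    for k in (exponent, exponent + 1):
        for multiplier in (1, 2, 5):
            step = multiplier * 10.0 ** k
            if step >= raw:
                return step

# Axis ranges similar to the plots in these slides.
for span in (25, 1.25, 20000):
    print(span, "->", nice_step(span))
```

A turbidity axis spanning 25 NTU gets divisions of 5, and a load axis spanning 20000 pounds gets divisions of 2000.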

Page 8: Data Analysis, Presentation, and Statistics

Graphing Suggestions

• Labels
– All axes should be labeled
– Include units on the label
• Points, lines, curves
– Play around with options
– Color can be your friend
– Color can be your enemy

Page 9: Data Analysis, Presentation, and Statistics

[Figure: chart titled "Trans #1" with unlabeled axes; x-axis ticks from -0.2 to 1; y-axis ticks from -5000 to 20000; legend entry "Trans #1".]

Page 10: Data Analysis, Presentation, and Statistics

[Figure: "Deflection of Beam 1 vs. applied load"; x-axis: Deflection (inches), 0 to 1.25; y-axis: Load (pounds), 0 to 20000.]

Page 11: Data Analysis, Presentation, and Statistics

[Figure: "Deflection of Beam 1 vs. applied load"; x-axis: Deflection (inches), 0 to 1.25; y-axis: Load (pounds), 0 to 20000.]

Page 12: Data Analysis, Presentation, and Statistics

[Figure: "Comparison of Beam Deflections"; x-axis: Deflection (inches), 0 to 1.25; y-axis: Load (pounds), 0 to 25000; series: Beam 1, Beam 2, Beam 3.]

Page 13: Data Analysis, Presentation, and Statistics

[Figure: "Comparison of Beam Deflections"; x-axis: Deflection (inches), 0 to 1.25; y-axis: Load (pounds), 0 to 25000; series: Beam 1, Beam 2, Beam 3.]

Page 14: Data Analysis, Presentation, and Statistics

Populations and Samples

• Population
– All of the possible outcomes of an experiment or observation
  • US population
  • Particular type of steel beam
• Sample
– A finite number of outcomes measured or observations made
  • 1000 US citizens
  • 5 beams
• We use samples to estimate population properties
– Mean, Variability (e.g., standard deviation), Distribution
  • Heights of 1000 US citizens used to estimate the mean height of the US population

Page 15: Data Analysis, Presentation, and Statistics

Mean and Median

• Turbidity of Treated Water (NTU): 1, 3, 3, 6, 8, 10

Mean = Sum of values divided by number of samples
= (1+3+3+6+8+10)/6 = 5.2 NTU

Median = The middle number
Rank:   1 2 3 4 5 6
Number: 1 3 3 6 8 10

For an even number of sample points, average the middle two
= (3+6)/2 = 4.5 NTU

Excel: Mean – AVERAGE; Median – MEDIAN
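As a cross-check outside Excel, the same mean and median can be reproduced with Python's standard `statistics` module. This is only a sketch using the six turbidity values from this slide; the variable names are illustrative.

```python
from statistics import mean, median

# Turbidity of treated water (NTU), the six values from the slide.
turbidity_ntu = [1, 3, 3, 6, 8, 10]

sample_mean = mean(turbidity_ntu)      # sum of values / number of samples
sample_median = median(turbidity_ntu)  # averages the middle two for even n

print(f"mean   = {sample_mean:.1f} NTU")
print(f"median = {sample_median:.1f} NTU")
```

The mean (about 5.2 NTU) sits above the median (4.5 NTU) because the larger readings pull the average upward.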

Page 16: Data Analysis, Presentation, and Statistics

Variance

• Measure of variability
– Sum of the squares of the deviations about the mean, divided by the degrees of freedom

$$ s^2 = \frac{\sum_i (x_i - \bar{x})^2}{n - 1} $$

n = number of data points

Excel: variance – VAR
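To make the definition concrete, the sketch below computes the sample variance by hand (sum of squared deviations divided by n - 1) and compares it with `statistics.variance`, which uses the same n - 1 divisor. The data are the six turbidity values from the Mean and Median slide; the function name is illustrative.

```python
from statistics import mean, variance

def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    x_bar = mean(values)
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (len(values) - 1)

turbidity_ntu = [1, 3, 3, 6, 8, 10]
by_hand = sample_variance(turbidity_ntu)
from_library = variance(turbidity_ntu)  # also divides by n - 1

print(f"by hand: {by_hand:.2f} NTU^2, library: {from_library:.2f} NTU^2")
```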

Page 17: Data Analysis, Presentation, and Statistics

Standard Deviation, s

• Square root of the variance

$$ s = \sqrt{s^2} $$

• For phenomena following a Normal Distribution (bell curve), 95% of population values lie within 1.96 standard deviations of the mean
• Area under curve is probability of getting value within specified range

[Figure: Normal Distribution; x-axis: Standard Deviations from Mean, -4 to 4; the area between -1.96 and 1.96 is marked 95%.]

Excel: standard deviation – STDEV
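The 1.96 figure can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library to compute the area under the standard normal curve between -1.96 and 1.96, and also confirms that the standard deviation is the square root of the variance for the turbidity sample used earlier. Variable names are illustrative.

```python
from statistics import NormalDist, stdev, variance

# Standard deviation is the square root of the (n - 1) sample variance.
turbidity_ntu = [1, 3, 3, 6, 8, 10]
s = stdev(turbidity_ntu)

# Area under the standard normal curve within +/- 1.96 standard deviations.
standard_normal = NormalDist()  # mean 0, standard deviation 1
coverage = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# Going the other way: the point that leaves 2.5% in the upper tail.
z_975 = standard_normal.inv_cdf(0.975)

print(f"s = {s:.2f} NTU")
print(f"area within +/- 1.96 sd = {coverage:.4f}")
print(f"z for 95% two-sided = {z_975:.2f}")
```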

Page 18: Data Analysis, Presentation, and Statistics

Standard Error of Mean

• Standard deviation of the mean
– Of a sample of size n
– Taken from a population with standard deviation s

$$ s_{\bar{X}} = \frac{s}{\sqrt{n}} $$

– Estimate of mean depends on sample selected
– As n increases, variance of mean estimate goes down, i.e., estimate of population mean improves
– As n increases, mean estimate distribution approaches normal, regardless of population distribution
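A minimal sketch of the standard error calculation, s divided by the square root of n, is shown below. The three readings are the Filter 2 turbidity values from Example 1 in these slides; the function name is illustrative.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

readings_ntu = [3.2, 4.4, 5.0]  # Filter 2 readings from Example 1
se = standard_error(readings_ntu)

print(f"standard error = {se:.2f} NTU")
```

Because n appears under a square root, quadrupling the number of readings only halves the standard error.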

Page 19: Data Analysis, Presentation, and Statistics

95% Confidence Interval (CI) for Mean

• Interval within which we are 95% confident the true mean lies

$$ \bar{X} \pm t_{95\%,\,n-1}\, s_{\bar{X}} $$

• t95%,n-1 is the t-statistic for a 95% CI if sample size = n
– If n ≥ 30, let t95%,n-1 = 1.96 (Normal Distribution)
– Otherwise, use Excel formula: TINV(0.05,n-1)
• n = number of data points
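The interval can be sketched in Python as follows. The standard library has no t-distribution, so this example assumes the t value is looked up separately: 2.571 is the two-sided 95% value for n - 1 = 5 degrees of freedom from a standard t table (what TINV(0.05, 5) would return in Excel). If SciPy is available, `scipy.stats.t.ppf(0.975, n - 1)` should give the same number, but that dependency is not used here. The function and variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def ci95_half_width(sample, t_value):
    """Half-width of the CI: t value times the standard error of the mean."""
    return t_value * stdev(sample) / sqrt(len(sample))

turbidity_ntu = [1, 3, 3, 6, 8, 10]  # n = 6, so n - 1 = 5 degrees of freedom
T_95_DF5 = 2.571                     # assumed: looked up in a t table

center = mean(turbidity_ntu)
half_width = ci95_half_width(turbidity_ntu, T_95_DF5)

print(f"95% CI: {center:.1f} +/- {half_width:.1f} NTU")
print(f"        ({center - half_width:.1f} to {center + half_width:.1f} NTU)")
```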

Page 20: Data Analysis, Presentation, and Statistics

Error Bars

• Show data variability on plot of mean values
• Types of error bars include:
– ± Standard Deviation, ± Standard Error, ± 95% CI
– Maximum and minimum value

[Figure: bar chart with error bars; x-axis: Filter Type (Filter 1, Filter 2, Filter 3); y-axis: Turbidity (NTU), 0 to 10.]

Page 21: Data Analysis, Presentation, and Statistics

Using Error Bars to Compare Data

• Standard Deviation
– Demonstrates data variability, but no comparison possible
• Standard Error
– If bars overlap, any difference in means is not statistically significant
– If bars do not overlap, indicates nothing!
• 95% Confidence Interval
– If bars overlap, indicates nothing!
– If bars do not overlap, difference is statistically significant
• We’ll use 95% CI

Page 22: Data Analysis, Presentation, and Statistics

Example 1

Turbidity Data (measurements and derived statistics in NTU)

           1     2     3     mean   St Dev   n   St Error   t95%,2   +/- 95% CI
Filter 1   2.1   2.1   2.2   2.1    0.06     3   0.03       4.30     0.14
Filter 2   3.2   4.4   5     4.2    0.92     3   0.53       4.30     2.28
Filter 3   4.3   4.2   4.5   4.3    0.15     3   0.09       4.30     0.38

[Figure: bar chart of mean turbidity; x-axis: Portable Water Filter (Filter 1, Filter 2, Filter 3); y-axis: Turbidity (NTU), 0.0 to 7.0; bar labels 2.1, 4.2, 4.3.]

Create Bar Chart of Name vs Mean. Right click on data. Select “Format Data Series”.
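The table values and the error-bar comparison can be reproduced with a short script. This is a sketch only: it takes the three readings per filter and the t value of 4.30 directly from the Example 1 table, and applies the 95% CI overlap rule stated on the error-bar slide. The function names are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

T_95_DF2 = 4.30  # t value for n = 3 (2 degrees of freedom), from the table

filters_ntu = {
    "Filter 1": [2.1, 2.1, 2.2],
    "Filter 2": [3.2, 4.4, 5.0],
    "Filter 3": [4.3, 4.2, 4.5],
}

def ci_bounds(sample, t_value):
    """Lower and upper ends of the 95% CI error bar for the mean."""
    half_width = t_value * stdev(sample) / sqrt(len(sample))
    center = mean(sample)
    return center - half_width, center + half_width

def bars_overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

bounds = {name: ci_bounds(vals, T_95_DF2) for name, vals in filters_ntu.items()}

for first, second in combinations(bounds, 2):
    if bars_overlap(bounds[first], bounds[second]):
        verdict = "bars overlap: indicates nothing"
    else:
        verdict = "bars do not overlap: difference is statistically significant"
    print(f"{first} vs {second}: {verdict}")
```

Under these rules only Filter 1 and Filter 3 differ significantly. Filter 2's wide interval overlaps both of the others, so nothing can be concluded about it.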

Page 23: Data Analysis, Presentation, and Statistics

Example 2

Turbidity Measurements (measurements and derived statistics in NTU)

Time (min)   1     2     3     mean   St Dev   n   St Error   t95,2   +/- 95% CI
1            4.3   4.5   4.6   4.5    0.15     3   0.09       4.30    0.38
2            4.4   4.4   4.5   4.4    0.06     3   0.03       4.30    0.14
3            4.3   4.2   4.2   4.2    0.06     3   0.03       4.30    0.14

[Figure: plot of mean turbidity against time; x-axis: Time (min), 0 to 4; y-axis: Turbidity (NTU), 0.0 to 6.0.]

Page 24: Data Analysis, Presentation, and Statistics

Linear Regression

• Fit the best straight line to a data set

[Figure: scatter plot with fitted trendline; x-axis: Height (m), 0 to 12; y-axis: Grade Point Average, 0 to 25.]

y = 1.897x + 0.8667
R² = 0.9762

Right-click on data point and use “trendline” option. Use “options” tab to get equation and R².
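For readers who want to see what a trendline calculation involves, here is a minimal sketch of simple least-squares regression in Python. The data points are made up for illustration (they are not the values behind the plot on this slide), and `least_squares_fit` is a hypothetical helper name. It uses the standard closed-form formulas: slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x).

```python
from statistics import mean

def least_squares_fit(x, y):
    """Return (slope, intercept) of the best-fit straight line y = slope*x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Illustrative data only (not from the slides).
x_values = [1, 2, 3, 4, 5]
y_values = [2.9, 4.7, 6.4, 8.6, 10.3]

slope, intercept = least_squares_fit(x_values, y_values)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```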

Page 25: Data Analysis, Presentation, and Statistics

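R² – Coefficient of Multiple Determination

$$ R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2} = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} $$

ŷi = Predicted y values, from regression equation
yi = Observed y values
ȳ = Mean of the observed y values

R² = fraction of variance explained by regression (variance = standard deviation squared)
= 1 if data lie along a straight line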
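The definition of R² can be sketched in code as one minus the ratio of the residual sum of squares to the total sum of squares. The example below is illustrative only: the data are made up, the helper names are hypothetical, and the fit uses the standard closed-form least-squares formulas for a straight line with an intercept.

```python
from statistics import mean

def least_squares_fit(x, y):
    """Slope and intercept of the least-squares straight line."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return slope, y_bar - slope * x_bar

def r_squared(x, y):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    slope, intercept = least_squares_fit(x, y)
    predicted = [slope * xi + intercept for xi in x]
    y_bar = mean(y)
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_residual / ss_total

# Illustrative data only (not from the slides).
x_values = [1, 2, 3, 4, 5]
y_values = [2.9, 4.7, 6.4, 8.6, 10.3]

print(f"R^2 (noisy data)    = {r_squared(x_values, y_values):.4f}")
print(f"R^2 (perfect line)  = {r_squared([1, 2, 3], [3, 5, 7]):.4f}")
```

Points that fall exactly on a straight line give R² = 1, and scatter about the line pulls the value below 1.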