A Crash Course on Basic Statistics (2013) - Stony Brook Astronomy
Statistics crash course
7/30/2019 Statistics crash course
Probability and statistics crash course
http://www.comp.leeds.ac.uk/hannah/mathsclub
Probability 1 (for dummies:-)
Stats 1 (averages and deviations)
Probability 2 (Trials and distributions)
Stats 2 (significance)
Stats 3 (errors)
Preliminaries
So what is statistics?
Applied branch of mathematics
Concerning data and its representation
Descriptive Statistics (today) are concerned with representing and summarising data
Analytical Statistics (in a few weeks) are concerned with drawing conclusions from data
"... probability theory enables us to find the consequences of a given ideal world, while statistical theory enables us to measure the extent to which our world is ideal." (Skiena, 2001)
Descriptive statistics: Why?
Summarising data.
32 7 16 33
33 10 13 35
22 11 15 34
21 13 17 32
23 16 15 24
Max, Min, Mean(s), Median, Mode, Variance, Standard Deviation, Interquartile range, ...
All ways of presenting numerical data in such a way that we learn something of its spread and tendency and deviation.
What is an average?
"Average" originally meant "financial loss incurred through damage to goods in transit", from the Italian avaria, a word from 12c. Mediterranean maritime trade. Sometimes traced to Arabic arwariya, "damaged merchandise", but this is less certain.
Later, the meaning of the word shifts to "equal sharing of such loss by the interested parties".
Measures of central tendency
Arithmetic Mean (often what we think of when we say the word "Average"). Add 'em all up and divide by the number there are.
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
An aside about samples and populations
Often we can't measure an entire population, and instead have to measure a subset (a sample). The mean on the previous slide, $\bar{x}$, is, strictly speaking, a sample mean. The population mean is usually referred to as $\mu$, and the size of the whole population as N.
$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$
The other two
Median = put them all in order, and choose the middle one. If there is an even number of values, then there are two middle ones, so use the number halfway between these.
Mode = choose the most frequent one.
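As a rough sketch of the two rules above, Python's standard `statistics` module can do the sorting and counting. The five-value and three-value-repeat samples reuse columns from the data table earlier in the talk; the four-value `even_sample` is a made-up illustration, not data from the slides.

```python
from statistics import median, multimode

odd_sample = [32, 33, 22, 21, 23]   # five values: a single middle value
even_sample = [7, 10, 11, 13]       # hypothetical: four values, two middle ones

median_odd = median(odd_sample)     # middle of the sorted values 21 22 23 32 33
median_even = median(even_sample)   # halfway between the two middle values, 10 and 11

# multimode returns every value that ties for "most frequent".
modes = multimode([16, 13, 15, 17, 15])
no_clear_mode = multimode(odd_sample)  # every value appears exactly once

print(median_odd, median_even, modes, no_clear_mode)
```

When every value appears once, `multimode` hands back the whole sample, which is a fair signal that the mode is not telling you anything.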
Symmetricity/Skewness
I am just going to mention this in passing today, but...
[Figure 1: A skewed dataset. Histogram titled "A fictitious but nastily skewed dataset"; x-axis: Number, y-axis: Count.]
This dataset has a mean of 21.8, a median of 12 and a mode of 12.
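The same effect shows up on a much smaller scale. The sketch below uses an invented right-skewed sample (not the dataset in the figure) to show a couple of large values dragging the mean upwards while the median and mode stay put.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed sample: mostly small values plus two large ones.
skewed = [10, 11, 12, 12, 12, 13, 14, 60, 70]

mean_value = mean(skewed)      # pulled upwards by 60 and 70
median_value = median(skewed)  # the middle value, unaffected by the tail
mode_value = mode(skewed)      # the most frequent value

print(round(mean_value, 1), median_value, mode_value)
```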
An aside about types of data
There are various types of data we can consider within statistics. Not all measures of central tendency apply to all of these.
Data type            Description                                  Average
Nominal              Categories or names                          Mode
Ordinal              Orderings (e.g., First, Second, Third ...)   Median
Interval and Ratio   Proper numbers                               Mean (symmetrical), Median (skewed)
And now over to my sequinned assistant...
To conclude the average bit
Arithmetic Mean; Median; Geometric median; Mode; Geometric Mean; Harmonic Mean; Quadratic Mean (or RMS); Generalised Mean (like quadratic mean but with different powers); Weighted Mean (some matter more than others); Truncated Mean (leave out the tricky outliers); Interquartile Mean (uses the interquartile range, of which more later); Midrange ((max + min)/2); Winsorized mean (like truncated but not quite); Annualization (to do with finance stuff).
All of these have their own Wikipedia page, so you know where to start!
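A few of these are easy to try out. The sketch below computes several of them on a small made-up sample; the variable names and the choice to trim exactly one value from each end for the truncated mean are assumptions for illustration only.

```python
from math import sqrt
from statistics import fmean, geometric_mean, harmonic_mean

values = [1.0, 2.0, 4.0, 8.0]  # hypothetical positive sample

arithmetic = fmean(values)
geometric = geometric_mean(values)                 # n-th root of the product
harmonic = harmonic_mean(values)                   # n divided by the sum of reciprocals
quadratic = sqrt(fmean([v * v for v in values]))   # root mean square (RMS)
midrange = (max(values) + min(values)) / 2

# Truncated mean: drop one value from each end (an arbitrary choice here).
trimmed = fmean(sorted(values)[1:-1])

print(harmonic, geometric, arithmetic, quadratic, midrange, trimmed)
```

For positive data that are not all equal, the harmonic, geometric, arithmetic and quadratic means always come out in that increasing order, which makes a handy sanity check.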
Boring practical bit
32 7 16 33
33 10 13 35
22 11 15 34
21 13 17 32
23 16 15 24
Boring practical bit: answers
32 7 16 33
33 10 13 35
22 11 15 34
21 13 17 32
23 16 15 24
Mean 26.2 11.4 15.2 31.6
Median 23 11 15 33
Mode ? ? 15 ?
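If you would rather check the answers than take them on trust, here is a minimal sketch that treats each column of the table as one sample. The name `rows` and the convention of reporting `None` when no single value is most frequent are assumptions of this example, standing in for the "?" entries above.

```python
from statistics import mean, median, multimode

rows = [
    [32, 7, 16, 33],
    [33, 10, 13, 35],
    [22, 11, 15, 34],
    [21, 13, 17, 32],
    [23, 16, 15, 24],
]
columns = list(zip(*rows))  # transpose: one tuple per column

means = [round(mean(col), 1) for col in columns]
medians = [median(col) for col in columns]

# Report a mode only when exactly one value is the most frequent.
modes = []
for col in columns:
    candidates = multimode(col)
    modes.append(candidates[0] if len(candidates) == 1 else None)

print(means, medians, modes)
```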
Deviation
As well as knowing some kind of average of a particular sample, you might want to know something of its spread.
[Figure 2: Three datasets with the same mean but different spreads. Plot titled "More fictitious data"; x-axis: Number, y-axis: Count.]
The really simple one
The range is the simplest way of describing the spread of data: find the max, find the min, subtract the min from the max, there you go.
Deviation
The deviation of a sample is measured with reference to some measure of central tendency: you want to know how much the sample deviates from something. With average deviation, variance, and standard deviation, this is the mean $\mu$ or the sample mean $\bar{x}$.
Measures of deviation
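$$\text{Average deviation} = \frac{\sum |x - \mu|}{N}$$
$$\text{Variance} = \sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
$$\text{Standard deviation} = \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$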
For reasons you will now be familiar with, when considering samples, $\sigma$ becomes $s$, and $\mu$ becomes $\bar{x}$. To account for bias, the sample variance (and hence the sample standard deviation) divides the sum of squared deviations by $n - 1$ rather than $n$.
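The difference between dividing by N and by n - 1 is easiest to see side by side. This sketch uses Python's `statistics` module, where `pstdev` is the population version and `stdev` the sample version; the eight-value dataset is invented for the example.

```python
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values with mean 5

sigma = pstdev(data)  # population standard deviation: divides by N
s = stdev(data)       # sample standard deviation: divides by n - 1

print(sigma, s)
```

The sample figure is always the larger of the two, and the gap shrinks as the sample grows.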
Worked example
This example [a] involves the rainfall in Liberia [b], in inches per month.

J  F  M  A  M   J   J   A   S   O   N  D
1  2  4  6  18  37  31  16  28  24  9  4

The mean of this data is
$$\frac{1 + 2 + 4 + 6 + 18 + 37 + 31 + 16 + 28 + 24 + 9 + 4}{12} = 15$$
The range of this data is 36 (max - min, or 37 - 1).

[a] Taken from Sternstein's Statistics.
[b] No, I've never been there either.
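The arithmetic for the mean and range can be sketched in a few lines of Python. The list name `rainfall` is just a label chosen for this example; the values are the twelve monthly figures from the table.

```python
# Monthly rainfall, January to December, as given in the table.
rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]

mean_rainfall = sum(rainfall) / len(rainfall)
data_range = max(rainfall) - min(rainfall)

print(mean_rainfall, data_range)
```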
Average deviation
The average deviation
$$= \frac{|1 - 15| + |2 - 15| + |4 - 15| + |6 - 15| + |18 - 15| + \dots}{12}$$
$$= \frac{14 + 13 + 11 + 9 + 3 + 22 + 16 + 1 + 13 + 9 + 6 + 11}{12} \approx 10.7 \text{ inches}$$
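A minimal sketch of the same calculation, assuming the twelve monthly rainfall values from the worked example and treating them as the whole population (so we divide by N):

```python
rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]

mu = sum(rainfall) / len(rainfall)

# Absolute distance of each month from the mean.
abs_deviations = [abs(x - mu) for x in rainfall]
average_deviation = sum(abs_deviations) / len(rainfall)

print(round(average_deviation, 1))
```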
Variance and standard deviation
The variance
$$= \frac{14^2 + 13^2 + 11^2 + 9^2 + 3^2 + 22^2 + 16^2 + 1^2 + 13^2 + 9^2 + 6^2 + 11^2}{12} \approx 143.7 \text{ inches squared}$$
And the standard deviation is the square root of the variance, so...
$$\sigma = \sqrt{143.7} \approx 12.0$$
and the units of the standard deviation are... the same as the units of measurement.
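Again as a sketch, using the same twelve rainfall values and the population formulas (dividing by N, as the slide does):

```python
from math import sqrt

rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]

mu = sum(rainfall) / len(rainfall)

# Population variance: mean of the squared deviations (inches squared).
variance = sum((x - mu) ** 2 for x in rainfall) / len(rainfall)

# Standard deviation: back in the original units (inches).
std_dev = sqrt(variance)

print(round(variance, 1), round(std_dev, 1))
```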
Interquartile range
One final measure of deviation is the interquartile range. This is related to the median, and the first thing you do is place your data in order.
Discard the lowest and the highest 1/4 of your data, and use the range of what remains. This is much more robust to outliers.
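Taken literally, that recipe can be sketched as below. The function name `interquartile_range` and the use of integer division to decide how many points make up "a quarter" are assumptions of this example; library routines such as `statistics.quantiles` interpolate between data points instead, so they can give a slightly different figure.

```python
def interquartile_range(values):
    """Range of the data after dropping the lowest and highest quarter."""
    ordered = sorted(values)
    k = len(ordered) // 4                   # points to discard at each end
    middle = ordered[k:len(ordered) - k]    # the central half of the data
    return max(middle) - min(middle)


rainfall = [1, 2, 4, 6, 18, 37, 31, 16, 28, 24, 9, 4]
with_outlier = rainfall[:5] + [370] + rainfall[6:]  # hypothetical typo: 37 entered as 370

print(interquartile_range(rainfall), interquartile_range(with_outlier))
print(max(rainfall) - min(rainfall), max(with_outlier) - min(with_outlier))
```

The plain range jumps when the outlier is introduced, while the interquartile range does not move, which is the robustness the slide is talking about.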
And to finish
If your data is normally distributed (of which more next week), knowing the standard deviation tells you all sorts of useful stuff.
Figure 3: Another graph stolen from wikipedia
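One example of that useful stuff is the fraction of a normal distribution lying within one, two or three standard deviations of the mean. The sketch below works this out with `statistics.NormalDist` for a standard normal (mean 0, standard deviation 1); the dictionary name `within` is just a label for this example.

```python
from statistics import NormalDist

standard = NormalDist(mu=0.0, sigma=1.0)

# Fraction of the distribution within k standard deviations of the mean.
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

for k, fraction in within.items():
    print(k, round(fraction, 4))
```

These come out at roughly 68%, 95% and 99.7%, and the same fractions hold for any normal distribution, whatever its mean and standard deviation.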