Summary Statistics
Jake Blanchard
Spring 2008
Uncertainty Analysis for Engineers

Summarizing and Interpreting Data
It is useful to have some metrics for summarizing statistical data (both input and output).
Three key characteristics are:
◦ Central tendency (mean, median, mode)
◦ Dispersion (variance)
◦ Shape (skewness, kurtosis)

Central Tendency
Mean (discrete and continuous forms):
$E(x) = \sum_{i=1}^{n} x_i \, p(x_i)$
$E(x) = \int_{-\infty}^{\infty} x \, f(x) \, dx$
Median: the point $z$ such that exactly half of the probability is associated with lower values and half with greater values:
$\int_{-\infty}^{z} f(x) \, dx = 0.5$
Mode: the most likely value (the maximum of the pdf).
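
For One Die
Mean:
$E(x) = \sum_{i=1}^{6} x_i \, p(x_i) = 1\cdot\tfrac{1}{6} + 2\cdot\tfrac{1}{6} + 3\cdot\tfrac{1}{6} + 4\cdot\tfrac{1}{6} + 5\cdot\tfrac{1}{6} + 6\cdot\tfrac{1}{6} = 3.5$
Median:
$x_{median} = 3.5$ (any value between 3 and 4 splits the probability in half; 3.5 is the conventional choice)
Mode: not unique, because all six faces are equally likely and the probability mass function has no single maximum.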
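As a quick numerical check of the die example, the following is a minimal MATLAB sketch. The variable names (`x`, `p`, `mu`, `med`) are illustrative and not from the slides, and the die is assumed to be fair.

```matlab
% Fair six-sided die (assumption): faces and their probabilities
x = 1:6;
p = ones(1, 6) / 6;

% Discrete expected value: sum of x_i * p(x_i)
mu = sum(x .* p);

% Median of the six equally weighted faces (midpoint of 3 and 4)
med = median(x);

fprintf('mean = %.2f, median = %.2f\n', mu, med);
```

Both values come out as 3.5, in agreement with the hand calculation.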
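
Radioactive Decay
For our example, the exponential pdf $f(t) = \lambda e^{-\lambda t}$ for $t \ge 0$ (with decay constant $\lambda$), the mean, median, and mode are given by
Mean:
$E(t) = \int_{0}^{\infty} t \, f(t) \, dt = \int_{0}^{\infty} \lambda t \, e^{-\lambda t} \, dt = \frac{1}{\lambda}$
Median:
$\int_{0}^{z} \lambda e^{-\lambda t} \, dt = 0.5 \;\Rightarrow\; z = \frac{\ln(2)}{\lambda}$
The mode is $t = 0$.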
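The closed-form mean and median for the decay example can be checked numerically. This is a sketch only: the decay constant `lambda = 0.5` is a made-up value, and the helper names `f` and `g` are illustrative.

```matlab
lambda = 0.5;                              % hypothetical decay constant
f = @(t) lambda .* exp(-lambda .* t);      % assumed exponential pdf

% Mean: integral of t*f(t) from 0 to infinity
mu = integral(@(t) t .* f(t), 0, Inf);

% Median: find z such that the cdf at z equals 0.5
g = @(z) integral(f, 0, z) - 0.5;
z = fzero(g, [0 20]);                      % bracket chosen so g changes sign

fprintf('mean   = %.4f (1/lambda      = %.4f)\n', mu, 1/lambda);
fprintf('median = %.4f (ln(2)/lambda  = %.4f)\n', z, log(2)/lambda);
```

The numerical results should match $1/\lambda$ and $\ln(2)/\lambda$ to within the integration tolerance.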

Other Characteristics
We can calculate the expected value of any function $h(x)$ of our random variable as
$E[h(x)] = \int_{-\infty}^{\infty} h(x) \, f(x) \, dx$ (continuous)
$E[h(x)] = \sum_{i} h(x_i) \, p(x_i)$ (discrete)

Some Results
$E(c) = c$
$E(cx) = c \, E(x)$
$E\left(\sum_{j=1}^{n} x_j\right) = \sum_{j=1}^{n} E(x_j)$
$E\left(\sum_{j=1}^{n} b_j x_j\right) = \sum_{j=1}^{n} b_j \, E(x_j)$

Moments of Distributions
We can define many of these parameters in terms of moments of the distribution:
$\mu_1 = E(x) = \int_{-\infty}^{\infty} x \, f(x) \, dx$
$\mu_k = E[(x-\mu_1)^k] = \int_{-\infty}^{\infty} (x-\mu_1)^k f(x) \, dx = \sum_{i} (x_i-\mu_1)^k \, p(x_i)$
The mean is the first moment. The variance is the second moment about the mean. The third and fourth moments about the mean are related to skewness and kurtosis.

Spread (Variance)
Variance is a measure of spread or dispersion:
$\sigma^2 = \mu_2 = E[(x-\mu_1)^2] = \int_{-\infty}^{\infty} (x-\mu_1)^2 f(x) \, dx$
For discrete data sets, the biased variance is
$s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i-\bar{x})^2$
and the unbiased variance is
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i-\bar{x})^2$
The standard deviation is the square root of the variance.
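The two variance estimates differ only in the normalizing factor. The sketch below uses a small made-up sample (not data from the slides) and compares the formulas with MATLAB's built-in `var`, where `var(x,1)` normalizes by n and `var(x)` by n-1.

```matlab
x = [2 4 4 4 5 5 7 9];          % made-up sample, for illustration only
n = numel(x);
xbar = mean(x);

% Biased estimate: divide the sum of squared deviations by n
s2_biased = sum((x - xbar).^2) / n;

% Unbiased estimate: divide by n - 1 instead
s2_unbiased = sum((x - xbar).^2) / (n - 1);

fprintf('biased   = %.4f (var(x,1) = %.4f)\n', s2_biased, var(x, 1));
fprintf('unbiased = %.4f (var(x)   = %.4f)\n', s2_unbiased, var(x));
```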

Skewness
Skewness is a measure of asymmetry:
$\mu_3 = E[(x-\mu_1)^3] = \int_{-\infty}^{\infty} (x-\mu_1)^3 f(x) \, dx$
For discrete data sets, the biased skewness is related to
$m_3 = \frac{1}{n} \sum_{i=1}^{n} (x_i-\bar{x})^3$
The skewness is often defined as
$\gamma_1 = \frac{\mu_3}{\sigma^3}$
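A sample skewness can be sketched with base MATLAB functions by substituting the biased sample moments for $\mu_3$ and $\sigma$. The data are made up and the variable names (`m2`, `m3`, `g1`) are illustrative.

```matlab
x = [2 4 4 4 5 5 7 9];          % made-up sample, for illustration only
xbar = mean(x);

% Biased second and third sample moments about the mean
m2 = mean((x - xbar).^2);
m3 = mean((x - xbar).^3);

% Skewness estimate: third moment divided by the cube of the standard deviation
g1 = m3 / m2^(3/2);

fprintf('m3 = %.4f, skewness = %.5f\n', m3, g1);
```

A positive value indicates a longer tail to the right of the mean, as in this sample.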

Kurtosis
Kurtosis is a measure of peakedness:
$\mu_4 = E[(x-\mu_1)^4] = \int_{-\infty}^{\infty} (x-\mu_1)^4 f(x) \, dx$
For discrete data sets, the biased kurtosis is related to
$m_4 = \frac{1}{n} \sum_{i=1}^{n} (x_i-\bar{x})^4$
The kurtosis is often defined as
$\gamma_2 = \frac{\mu_4}{\sigma^4} - 3$
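The same substitution of biased sample moments gives a sample kurtosis. With the "minus 3" in the definition, a normal distribution has a value of zero. The data and variable names in this sketch are made up for illustration.

```matlab
x = [2 4 4 4 5 5 7 9];          % made-up sample, for illustration only
xbar = mean(x);

% Biased second and fourth sample moments about the mean
m2 = mean((x - xbar).^2);
m4 = mean((x - xbar).^4);

% Kurtosis estimate: fourth moment over squared variance, minus 3
g2 = m4 / m2^2 - 3;

fprintf('m4 = %.4f, kurtosis = %.5f\n', m4, g2);
```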

Kurtosis
Figure: pdf of the Pearson type VII distribution with kurtosis of infinity (red), 2 (blue), and 0 (black).

Using Matlab
The sample data are the lengths of time a person was able to hold their breath (40 attempts).
Try a scatter plot:
load RobPracticeHolds;
y = ones(size(breathholds));
h1 = figure('Position',[100 100 400 100],'Color','w');
scatter(breathholds,y);

Adding Information
disp(['The mean is ',num2str(mean(breathholds)),' seconds (green line).']);
disp(['The median is ',num2str(median(breathholds)),' seconds (red line).']);
hold all;
line([mean(breathholds) mean(breathholds)],[0.5 1.5],'color','g');
line([median(breathholds) median(breathholds)],[0.5 1.5],'color','r');

Box Plot
title('Scatter with Min, 25%iqr, Median, Mean, 75%iqr, & Max lines');
xlabel('');
h3 = figure('Position',[100 100 400 100],'Color','w');
boxplot(breathholds,'orientation','horizontal','widths',.5);
set(gca,'XLim',[40 140]);
title('A Boxplot of the same data');
xlabel('');
set(gca,'Yticklabel',[]);
ylabel('');

Box Plot
Figure annotations: Min, Median, Max, Outlier. The box represents the inter-quartile range (half of the data).

Empirical cdf
h3 = figure('Position',[100 100 600 400],'Color','w');
cdfplot(breathholds);

Multivariate Data Sets
When there are multiple input variables, we need some additional ways to characterize the data:
$E[h(x,y)] = \int\!\!\int h(x,y) \, f(x,y) \, dx \, dy$ (continuous)
$E[h(x,y)] = \sum_{i} \sum_{j} h(x_i,y_j) \, p(x_i,y_j)$ (discrete)
$Cov(x,y) = E(xy) - E(x) \, E(y)$
If x and y are independent, then Cov(x,y) = 0.
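The covariance identity can be checked on paired sample data by replacing each expectation with a sample average. This is a sketch with made-up data; `cov(x,y,1)` is MATLAB's covariance matrix normalized by n, whose off-diagonal entry should agree with the identity.

```matlab
x = [1 2 3 4 5];                % made-up paired sample, for illustration only
y = [2 1 4 3 5];

% Cov(x,y) = E(xy) - E(x)E(y), with expectations taken as sample means
c = mean(x .* y) - mean(x) * mean(y);

% Built-in covariance matrix, normalized by n to match the line above
C = cov(x, y, 1);

fprintf('identity: %.4f, cov(x,y,1): %.4f\n', c, C(1, 2));
```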

Correlation Coefficients
Two random variables may be related. Define the correlation coefficient of input (x) and output (y) as
$\rho_{x,y} = \frac{Cov(x,y)}{\sigma_x \sigma_y} = \frac{\sum_{k=1}^{m} (x_k-\bar{x})(y_k-\bar{y})}{\sqrt{\sum_{k=1}^{m} (x_k-\bar{x})^2 \, \sum_{k=1}^{m} (y_k-\bar{y})^2}}$
$\rho = 1$ implies linear dependence, positive slope
$\rho = 0$ implies no linear dependence
$\rho = -1$ implies linear dependence, negative slope
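The sample form of the correlation coefficient can be evaluated directly and compared with MATLAB's `corrcoef`. The data here are made up and the variable names (`dx`, `dy`, `rho`) are illustrative.

```matlab
x = [1 2 3 4 5];                % made-up paired sample, for illustration only
y = [2 1 4 3 5];

% Deviations from the sample means
dx = x - mean(x);
dy = y - mean(y);

% Sum of cross products over the root of the product of sums of squares
rho = sum(dx .* dy) / sqrt(sum(dx.^2) * sum(dy.^2));

% corrcoef returns a 2x2 matrix; the off-diagonal entry is the coefficient
R = corrcoef(x, y);

fprintf('formula: %.4f, corrcoef: %.4f\n', rho, R(1, 2));
```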

Example
Figure: four scatter plots, labeled with correlation coefficients $\rho = 0.98$, $\rho = -0.38$, $\rho = 1$, and $\rho = -0.98$.
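
Example
x=rand(25,1)-0.5;
% y equals x exactly: perfect positive correlation
y=x;
corrcoef(x,y)
subplot(2,2,1), plot(x,y,'o')
% y2 is x plus a small random perturbation: strong positive correlation
y2=x+0.2*rand(25,1);
corrcoef(x,y2)
subplot(2,2,2), plot(x,y2,'o')
% y3 is -x plus a small random perturbation: strong negative correlation
y3=-x+0.2*rand(25,1);
corrcoef(x,y3)
subplot(2,2,3), plot(x,y3,'o')
% y4 is generated independently of x: weak correlation
y4=rand(25,1)-0.5;
corrcoef(x,y4)
subplot(2,2,4), plot(x,y4,'o')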