Stats Midterm
Uploaded by: nodirprimkulov
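Summary Part 1
Population = the full set of research results (size N)
Sample = a part of the population (size n)
Types of data:
1. Categorical = responses (e.g. eye colour, level of satisfaction ...)
a) Nominal = categories without an order (e.g. yes/no answers)
b) Ordinal = ranked values (e.g. 1. poor, 2. average, 3. good)
2. Numerical
a) Discrete = counting (whole numbers, e.g. age in completed years)
b) Continuous = measurement (e.g. weight)
Cumulative Frequency Distribution = running total of the frequencies (or percentages) of the population or sample up to a certain point, e.g.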
Class                 Frequency   Percentage   Cumulative Fr.   Cumulative P.
10 but less than 20   3           15%          3                15%
20 but less than 30   6           30%          9                45%
30 but less than 40   5           25%          14               70%
(Cumulative Fr. = adding the frequency values; Cumulative P. = adding the percentage values)
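The running totals in the table can be reproduced with a short sketch. The total of 20 observations is an assumption inferred from the table (3 observations are 15%); only the three classes shown are used.

```python
from itertools import accumulate

# Frequencies of the three classes shown in the table.
frequencies = [3, 6, 5]
# Assumption: 20 observations in total, since 3 observations are 15%.
total = 20

# Cumulative frequency = running sum of the frequency values.
cumulative_freq = list(accumulate(frequencies))
# Percentage of each class, then the running sum of the percentages.
percentages = [100 * f / total for f in frequencies]
cumulative_pct = list(accumulate(percentages))

print(cumulative_freq)
print(cumulative_pct)
```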
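$X_i$ = i-th value of the variable X
Arithmetic Mean (Population): $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$
$x_1, x_2, x_3, \dots$ = population values
Arithmetic Mean (Sample): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
$x_1, x_2, x_3, \dots$ = sample values
Median = the value which stands in the middle of the ordered data (e.g. for 1, 2, 2, 3, 3, 4, 5 the median is 3)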
The position of the median in the ordered data is also calculated by: $\frac{n+1}{2}$
Note: with an even amount of numbers, the average of the two in the middle is the median
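As a quick check of the mean and median rules, here is a small sketch using Python's standard `statistics` module on the example values 1, 2, 2, 3, 3, 4, 5. The even-length list is a made-up illustration.

```python
import statistics

values = [1, 2, 2, 3, 3, 4, 5]          # example values from the notes

mean = statistics.mean(values)           # (x1 + ... + xn) / n
median = statistics.median(values)       # middle value of the ordered data
position = (len(values) + 1) // 2        # (n + 1) / 2, counted from 1

# Hypothetical even-length data: the median is the average of the two middle values.
even_median = statistics.median([1, 2, 3, 4])

print(mean, median, position, even_median)
```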
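Variance (Sample): $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Variance (Population) = same formula, other symbols ($s^2 \to \sigma^2$, $\bar{x} \to \mu$, $n-1 \to N$): $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
Standard Deviation (Sample): $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
Standard Deviation (Population) = same formula, other symbols (see above). Basically the standard deviation is for both: $\sqrt{\text{variance}}$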
Coefficient of Variation: $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$
Measures relative variation
Expresses the standard deviation as a percentage of the mean
Always in percentage
Shows variation relative to the mean
Can be used to compare two or more sets of data measured in different units
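A minimal sketch of the variance, standard deviation and coefficient of variation formulas, using the standard `statistics` module. The data values are hypothetical and chosen only so the numbers come out cleanly.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]                  # hypothetical values

mean = statistics.mean(data)
sample_variance = statistics.variance(data)       # divides by n - 1
population_variance = statistics.pvariance(data)  # divides by N
sample_std = statistics.stdev(data)               # square root of the sample variance

# Coefficient of variation: standard deviation as a percentage of the mean.
cv_percent = sample_std / mean * 100

print(sample_variance, population_variance, sample_std, cv_percent)
```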
Covariance (Sample): $Cov(x,y) = s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$
Covariance (Population) = same formula, other symbols (see above)
Measures the strength of the linear relationship between two variables
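The sample covariance formula can be written out directly. The paired values below are hypothetical; the function name `sample_covariance` is introduced only for this illustration.

```python
def sample_covariance(xs, ys):
    """Sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
print(cov_xy)  # positive: x and y tend to move in the same direction
```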
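Results of Covariance:
COV(x,y) > 0 = x and y tend to move in the same direction
COV(x,y) = 0 = there is no linear relationship between x and y
COV(x,y) < 0 = x and y tend to move in opposite directions
Variance of a discrete random variable X: $\sigma_x^2 = \sum_x (x - \mu_x)^2 P(x) = (x_1 - \mu_x)^2 P(x_1) + (x_2 - \mu_x)^2 P(x_2) + \dots$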
Functions of Random Variables
P(x) is the probability function of X
g(x) is a function of X
Expected Value: $E(g(X)) = \sum_x g(x) P(x)$
If g(x) = x we get the normal mean $\mu_x$
If g(x) = $(x - \mu_x)^2$ we get the formula for the variance
Special case: if X always takes the same value (a constant), then we can say that the mean is that value and the variance = 0
If X is multiplied by a constant b, we just multiply b with the expected value of X:
$E(bX) = b\mu_x$
If X is multiplied by a constant b, we square b and multiply it with the variance of X to get the variance of bX:
$Var(bX) = b^2\sigma_x^2$
Example: consider Z = a + bX, where X has a mean of $\mu_x$ and a variance of $\sigma_x^2$
=> $\mu_z = E(a + bX) = a + b\mu_x$
=> $\sigma_z^2 = Var(a + bX) = b^2\sigma_x^2$
=> standard deviation of Z: $\sigma_z = |b|\sigma_x$
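The rules for E(g(X)), E(a + bX) and Var(a + bX) can be checked on a small discrete distribution. This is only a sketch: the probability function and the constants a and b are hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical probability function P(x) of a discrete random variable X.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expected_value(g, pmf):
    """E(g(X)) = sum over x of g(x) * P(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mu_x = expected_value(lambda x: x, pmf)                  # g(x) = x gives the mean
var_x = expected_value(lambda x: (x - mu_x) ** 2, pmf)   # g(x) = (x - mu)^2 gives the variance

a, b = 3, 2                                              # hypothetical constants, Z = a + bX
mu_z = expected_value(lambda x: a + b * x, pmf)
var_z = expected_value(lambda x: (a + b * x - mu_z) ** 2, pmf)

print(mu_x, var_x, mu_z, var_z)
```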
SPECIAL CASE (a bit complicated, but step by step easy): $Z = \frac{X - \mu_x}{\sigma_x}$
Expected value of Z:
$\mu_z = E\left(\frac{X - \mu_x}{\sigma_x}\right) = \frac{E(X) - \mu_x}{\sigma_x} = \frac{\mu_x - \mu_x}{\sigma_x} = \frac{0}{\sigma_x} = 0$
We are simply using the rule that constants can be taken out of the expected value. Instead of a + bX we have the opposite form (X - a)/b, in which $a = \mu_x$ and $b = \sigma_x$ => we can use the same rule.
Variance of Z:
$\sigma_z^2 = Var\left(\frac{X - \mu_x}{\sigma_x}\right) = Var\left(\frac{X}{\sigma_x} - \frac{\mu_x}{\sigma_x}\right) = Var\left(\frac{1}{\sigma_x}X\right) = \frac{1}{\sigma_x^2}Var(X) = \frac{\sigma_x^2}{\sigma_x^2} = 1$
Looks worse than it is. Again we are using the rule. First we separate the X term from the constant term $\mu_x/\sigma_x$; then we can just let the constant fall, because for the variance we don't take into account a constant which we add or subtract. Then we take the factor $1/\sigma_x$ out of the variance by squaring it and solve the variance for X. Because Var(X) and the squared factor are both $\sigma_x^2$ and we divide one by the other, we get the value 1.
Bernoulli Distribution:
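just two possibilities: success / failure
P = probability of success
1 - P = probability of failure
Random variable X defined as 1 if success and 0 if failure
P(X = 1) = P and P(X = 0) = 1 - P
For the number of successes in n independent trials:
Mean: $\mu = nP$
Variance: $\sigma^2 = nP(1 - P)$
The number of sequences with x successes in n trials: $C^n_x = \frac{n!}{x!(n-x)!}$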
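The counting formula and the mean and variance for n trials can be verified numerically. In this sketch n and P are hypothetical, and each sequence with x successes is taken to have probability P^x (1 - P)^(n - x), which follows from the trials being independent.

```python
from fractions import Fraction
from math import comb

n, P = 4, Fraction(1, 4)        # hypothetical number of trials and success probability

def probability_of_x_successes(x, n, P):
    """Number of sequences with x successes times the probability of one such sequence."""
    return comb(n, x) * P ** x * (1 - P) ** (n - x)

pmf = {x: probability_of_x_successes(x, n, P) for x in range(n + 1)}

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(comb(4, 2), mean, variance)
```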
Binomial Probability Distribution (repeated Bernoulli trials):
Has to have a fixed number of trials n
The probabilities of success and failure add up to 1 and don't change during the experiment
The trials are independent from each other
$P(x) = \frac{n!}{x!(n-x)!} P^x (1-P)^{n-x}$ => probability of x successes in n trials with the probability of success P on each trial
Joint Probability Function
X takes the specific value x and Y takes the value y, as a function of x and y: $P(x,y) = P(X = x \cap Y = y)$
Marginal probabilities are: $P(x) = \sum_y P(x,y)$ and $P(y) = \sum_x P(x,y)$
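Conditional Probability Function
Y takes the value y when x is specified for X: $P(y|x) = \frac{P(x,y)}{P(x)}$
X takes the value x when y is specified for Y: $P(x|y) = \frac{P(x,y)}{P(y)}$
X and Y are independent when P(x)P(y) = P(x,y) for all pairs of values
Covariance: the strength of the linear relationship
$Cov(X,Y) = E((X - \mu_x)(Y - \mu_y)) = \sum_x \sum_y (x - \mu_x)(y - \mu_y) P(x,y)$
Correlation: $\rho = Corr(X,Y) = \frac{Cov(X,Y)}{\sigma_x \sigma_y}$
$\rho$ = 0: no linear relationship
$\rho$ > 0: positive relationship => when X is high, Y is high as well
$\rho$ < 0: negative relationship => when X is high, Y is low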
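To make the joint, marginal and conditional formulas concrete, here is a sketch on a tiny joint probability table. The table values are hypothetical, chosen so that X and Y are positively related.

```python
from fractions import Fraction
from math import sqrt

# Hypothetical joint probability function P(x, y).
joint = {(0, 0): Fraction(3, 8), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}

# Marginal probabilities: sum the joint probabilities over the other variable.
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
    p_y[y] = p_y.get(y, 0) + p

mu_x = sum(x * p for x, p in p_x.items())
mu_y = sum(y * p for y, p in p_y.items())
var_x = sum((x - mu_x) ** 2 * p for x, p in p_x.items())
var_y = sum((y - mu_y) ** 2 * p for y, p in p_y.items())

# Covariance and correlation from the double sum over x and y.
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
corr = float(cov) / sqrt(float(var_x) * float(var_y))

# Conditional probability P(Y = 1 | X = 1) and the independence check.
p_y1_given_x1 = joint[(1, 1)] / p_x[1]
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for (x, y) in joint)

print(cov, corr, p_y1_given_x1, independent)
```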
Cumulative Distribution Function
expresses the probability that X doesn't exceed x: $F(x) = P(X \le x)$
example: for a and b, two values of X with a < b: $P(a < X \le b) = F(b) - F(a)$
Normal Distribution Function
looks like a bell
symmetrical
Mean, median and mode are equal
Location is determined by $\mu$ => changing it shifts the distribution to the left or right
Spread is determined by $\sigma$ => changing it spreads or narrows the distribution
The random variable has an infinite range
any normal distribution can be turned into the standardized normal distribution (Z): $Z = \frac{X - \mu}{\sigma}$
=> the standardized normal distribution has a mean of 0 and a variance of 1
Use Table 1 in the book to get from a Z value the F(Z) value
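Instead of the table lookup, the F(Z) value can be computed with `statistics.NormalDist` from the Python standard library. The values of the mean, standard deviation and x below are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0      # hypothetical mean and standard deviation of X
x = 130.0                    # hypothetical value of X

z = (x - mu) / sigma                  # standardize: Z = (X - mu) / sigma
F_z = NormalDist().cdf(z)             # F(Z) for the standard normal (mean 0, variance 1)
F_x = NormalDist(mu, sigma).cdf(x)    # same probability without standardizing

print(z, round(F_z, 4))
```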
Joint Cumulative Distribution Function
Suppose X and Y are continuous random variables
The function is written F(x,y)
It gives the probability that X is less than x and simultaneously Y is less than y: $F(x,y) = P(X < x \cap Y < y)$
the variance of their difference is: $Var(X - Y) = \sigma_x^2 + \sigma_y^2 - 2Cov(X,Y)$
linear combination of X and Y (where a and b are constants): W = aX + bY
the mean of W is: $\mu_w = E(W) = E(aX + bY) = a\mu_x + b\mu_y$
the variance of W is: $\sigma_w^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,Corr(X,Y)\,\sigma_x\sigma_y$
Note: if X and Y are normally distributed, W is as well
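A small numeric sketch of the linear-combination formulas. All inputs are hypothetical; choosing a = 1 and b = -1 makes W the difference X - Y, so the result can be compared with the difference formula.

```python
# Hypothetical means, standard deviations and correlation of X and Y.
mu_x, mu_y = 10.0, 20.0
sigma_x, sigma_y = 2.0, 3.0
corr_xy = 0.5

a, b = 1.0, -1.0     # W = aX + bY; with these constants W = X - Y

mu_w = a * mu_x + b * mu_y
var_w = (a ** 2 * sigma_x ** 2 + b ** 2 * sigma_y ** 2
         + 2 * a * b * corr_xy * sigma_x * sigma_y)

# Cross-check with Var(X - Y) = var_x + var_y - 2 Cov(X, Y).
cov_xy = corr_xy * sigma_x * sigma_y
var_difference = sigma_x ** 2 + sigma_y ** 2 - 2 * cov_xy

print(mu_w, var_w, var_difference)
```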
Part 3
Descriptive Statistics: collecting, presenting and describing data
Inferential Statistics: drawing conclusions or decisions concerning a population based on sample data
Sampling Distributions
distribution of all possible values of a statistic (e.g. the sample mean) for samples of a given size from a population
The steps to develop a sampling distribution:
1. List the given values (example: N = 4, X = age of the 4 individuals, values of X = 18, 20, 22, 24)
2. Calculate mean and standard deviation (population)
a. $\mu = \frac{18 + 20 + 22 + 24}{4} = 21$
b. $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = 2.236$
3. All possible sample combinations (n = 2) in a table:
      18      20      22      24
18    18,18   18,20   18,22   18,24
20    20,18   20,20   20,22   20,24
22    22,18   22,20   22,22   22,24
24    24,18   24,20   24,22   24,24
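4. Then draw a table of the sample means:
      18   20   22   24
18    18   19   20   21
20    19   20   21   22
22    20   21   22   23
24    21   22   23   24
5. => 16 sample means
6. Summary of the sampling distribution (here N = 16 sample means):
a. $\mu_{\bar{X}} = \frac{\sum \bar{X}_i}{N} = \frac{18+19+19+20+20+20+21+21+21+21+22+22+22+23+23+24}{16} = 21$
b. $\sigma_{\bar{X}} = \sqrt{\frac{\sum (\bar{X}_i - \mu_{\bar{X}})^2}{N}} = \sqrt{\frac{(18-21)^2 + (19-21)^2 + (19-21)^2 + \dots + (24-21)^2}{16}} = 1.58$
7. Comparing the population and the sampling distribution:
a. Population:
i. N = 4
ii. $\mu = 21$
iii. $\sigma = 2.236$
b. Sample:
i. n = 2
ii. $\mu_{\bar{X}} = 21$
iii. $\sigma_{\bar{X}} = 1.58$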
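The steps above can be sketched in a few lines: enumerate all 16 samples of size 2 (each value may be drawn twice, as in the tables), take their means, and summarize. Only the standard library is used.

```python
from itertools import product
from statistics import mean, pstdev

population = [18, 20, 22, 24]                     # ages from the example

# Step 3: all possible samples of size 2 (the 4 x 4 table).
samples = list(product(population, repeat=2))
# Step 4: the mean of every sample.
sample_means = [mean(s) for s in samples]

# Steps 2 and 6: population parameters vs. sampling distribution summary.
mu, sigma = mean(population), pstdev(population)
mu_xbar, sigma_xbar = mean(sample_means), pstdev(sample_means)

print(len(samples), mu, round(sigma, 3), mu_xbar, round(sigma_xbar, 2))
```

The standard deviation of the sample means equals the population standard deviation divided by the square root of the sample size, which is the standard error of the mean.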
Expected Value of the Sample Mean Distribution
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Standard Error of the Mean
Describes the variability in the mean: $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
Decreases when the sample size increases
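If the population is normal
the sampling distribution of $\bar{X}$ is also normally distributed, and
$\mu_{\bar{X}} = \mu$
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
Z Value for Sample Mean Distributions
$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
$\bar{X}$ = sample mean, $\mu$ = population mean, $\sigma$ = population standard deviation, n = sample size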
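If the population is not normal
the sampling distribution of $\bar{X}$ is approximately normal if n > 25
Example: $\sigma = 3$, $\mu = 8$, n = 36. Probability that $\bar{X}$ is between 7.8 and 8.2 = ?
n > 25 => approximately normal => $\mu_{\bar{X}} = \mu = 8$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{36}} = 0.5$
$P(7.8 < \bar{X} < 8.2) = P\left(\frac{7.8 - 8}{0.5} < Z < \frac{8.2 - 8}{0.5}\right) = P(-0.4 < Z < 0.4) = 0.3108$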
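The probability in this example can be computed without the Z table. The sketch assumes the example's values are a mean of 8, a standard deviation of 3 and n = 36, which is the reading that gives the standard error of 0.5 shown in the notes.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.0, 3.0, 36          # assumed reading of the example values

standard_error = sigma / sqrt(n)     # sigma / sqrt(n)
xbar_distribution = NormalDist(mu, standard_error)

# P(7.8 < X_bar < 8.2) under the (approximately) normal sampling distribution.
probability = xbar_distribution.cdf(8.2) - xbar_distribution.cdf(7.8)

print(standard_error, round(probability, 4))
```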
Chi-Square Distribution
depends on the degrees of freedom = n - 1 = d.f.
use Table 7
$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$
Example to find a probability:
A freezer has to hold its temperature with little variation
standard deviation of no more than 4 => $\sigma = 4$
A sample of 14 freezers is tested => n = 14 => d.f. = 13
What is the probability that the sample variance exceeds 27.52? => $P(s^2 > 27.52) = ?$
$P(s^2 > 27.52) = P\left(\frac{(n-1)s^2}{\sigma^2} > \frac{(14-1) \cdot 27.52}{16}\right) = P(\chi^2_{13} > 22.36) = 0.05$
Table 7: for d.f. 13, the value 22.36 belongs to $\alpha = 0.05$ => P = 0.05
Finding the Chi-Square Value
n - 1 = 14 - 1 = 13 = d.f., $\alpha = 0.05$ => $\chi^2_{13, 0.05} = 22.36$
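The Python standard library has no chi-square table, so the following sketch checks the freezer example by simulation instead of a lookup. It is a Monte Carlo approximation under the assumption of a normal population with standard deviation 4. The seed and number of trials are arbitrary choices.

```python
import random

random.seed(0)                       # arbitrary seed so the run is repeatable

SIGMA, N, THRESHOLD = 4.0, 14, 27.52
TRIALS = 20000                       # arbitrary number of simulated samples

def sample_variance(values):
    """Sample variance with divisor n - 1."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

# Chi-square cutoff from the formula (n - 1) * s^2 / sigma^2.
chi2_cutoff = (N - 1) * THRESHOLD / SIGMA ** 2

# Fraction of simulated samples whose variance exceeds the threshold.
exceed = sum(
    sample_variance([random.gauss(0.0, SIGMA) for _ in range(N)]) > THRESHOLD
    for _ in range(TRIALS)
)
estimated_probability = exceed / TRIALS

print(round(chi2_cutoff, 2), estimated_probability)
```

The estimate should land close to the table value of 0.05, with some simulation noise.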
Point and Interval Estimates
A point estimate is a single number
An interval estimate is the range from a lower but still reliable point to an upper but still reliable point, also known as a confidence interval
If P(a<
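t Distribution
Consider a sample of n observations
with a mean of $\bar{x}$ and a standard deviation of s
from a normally distributed population with a mean of $\mu$
n - 1 degrees of freedom
Then the variable: $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
We use the t distribution when the population standard deviation $\sigma$ is unknown and use s instead (s = sample standard deviation) => not that accurate because we use just a sample
Assumptions:
Population standard deviation is unknown
Population is normally distributed
If the population is not normal, use a bigger sample
Use the t distribution
$(1 - \alpha)$ confidence interval estimate: $\bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$
t depends on the degrees of freedom
use Table 8 for solving
Example
Sample: n = 25, $\bar{x} = 50$, s = 8; form a 95% confidence interval for $\mu$
d.f. = 25 - 1 = 24
$(1 - \alpha) = 0.95$ => $\alpha = 0.05$ => $\alpha/2 = 0.025$
$t_{n-1,\alpha/2} = t_{24,0.025} = 2.0639$
$50 - (2.0639)\frac{8}{\sqrt{25}} < \mu < 50 + (2.0639)\frac{8}{\sqrt{25}}$ => $46.698 < \mu < 53.302$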
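The interval in this example can be worked out as follows. The sketch takes the critical value 2.0639 from the example itself, since the standard library has no t table, and reads the example's sample mean as 50.

```python
from math import sqrt

n, x_bar, s = 25, 50.0, 8.0          # sample size, sample mean, sample standard deviation
t_critical = 2.0639                  # table value for 24 d.f. and alpha/2 = 0.025, as quoted

margin = t_critical * s / sqrt(n)    # t * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(round(lower, 3), round(upper, 3))
```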
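To explain, I use the following example:
Random sample of 100 people, 25 are left-handed; 95% confidence interval for the true proportion of left-handers
$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < P < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
$\hat{p} = \frac{25}{100}$
$z_{\alpha/2}$: 1 - 0.95 = 0.05 => 0.05/2 = 0.025 => 0.95 + 0.025 = 0.975
Z table: look in the F(Z) column for 0.975 => $z_{\alpha/2} = 1.96$
n = 100
$\frac{25}{100} - 1.96\sqrt{\frac{0.25(1-0.25)}{100}} < P < \frac{25}{100} + 1.96\sqrt{\frac{0.25(1-0.25)}{100}}$ => 0.1651 < P < 0.3349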
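The left-hander example can be reproduced with the standard library, where `NormalDist().inv_cdf` plays the role of the reverse Z-table lookup. This is a sketch of the calculation, not a general-purpose routine.

```python
from math import sqrt
from statistics import NormalDist

n, successes = 100, 25               # sample size and number of left-handers
confidence = 0.95

p_hat = successes / n
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)      # F(Z) = 0.975, about 1.96

margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(round(z, 2), round(lower, 4), round(upper, 4))
```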
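$(\bar{x} - \bar{y}) - z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}} < \mu_x - \mu_y < (\bar{x} - \bar{y}) + z_{\alpha/2}\sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$
$\sigma_x^2$ and $\sigma_y^2$ are unknown
Assumptions:
Samples are independent and random
Populations are normally distributed
Population variances are unknown and assumed unequal
Use a t distribution with v degrees of freedom:
$v = \frac{\left(\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}\right)^2}{\frac{\left(s_x^2/n_x\right)^2}{n_x - 1} + \frac{\left(s_y^2/n_y\right)^2}{n_y - 1}}$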
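The degrees-of-freedom formula is easy to get wrong by hand, so here is a sketch that evaluates it. The sample variances and sizes are hypothetical, and the function name `degrees_of_freedom` is introduced only for this illustration.

```python
def degrees_of_freedom(var_x, n_x, var_y, n_y):
    """v for two samples with unknown, unequal population variances."""
    term_x = var_x / n_x             # s_x^2 / n_x
    term_y = var_y / n_y             # s_y^2 / n_y
    numerator = (term_x + term_y) ** 2
    denominator = term_x ** 2 / (n_x - 1) + term_y ** 2 / (n_y - 1)
    return numerator / denominator

# Hypothetical samples: s_x^2 = 4 with n_x = 10, s_y^2 = 9 with n_y = 15.
v = degrees_of_freedom(4.0, 10, 9.0, 15)
print(round(v, 2))
```

The result always lies between the smaller of n_x - 1 and n_y - 1 and the pooled value n_x + n_y - 2. In practice it is rounded down before using the t table.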
TheconfidenceIntervalisdescribedasfollows: n1,a2 =
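First determine everything:
n - 1 = 17 - 1 = 16
$\alpha/2 = (1 - 0.95)/2 = 0.025$
$1 - \alpha/2 = 0.975$
Then find the chi-square values:
$\chi^2_{n-1,\alpha/2} = \chi^2_{17-1,0.025} = 28.85$
$\chi^2_{n-1,1-\alpha/2} = \chi^2_{17-1,0.975} = 6.91$
Then find $s^2 = 74^2$
Now fill it into the formula:
Null Hypothesis, e.g. $H_0: \mu = 3$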
Refers to the status quo ("not guilty")
always contains =, $\le$ or $\ge$
may or may not be rejected
Alternative Hypothesis
assumes the opposite of $H_0$ (in our example: $H_1: \mu \ne 3$)
always contains $\ne$, < or >
may or may not be supported
Example: the population mean age is 50 => $H_0: \mu = 50$
Now we select a sample and calculate the mean. Let's suppose it was $\bar{X} = 20$ => unlikely that the null hypothesis is true
Level of significance
Defines the rejection region of the sampling distribution
written as $\alpha$
typical values are 0.01, 0.05, 0.1
is selected by the researcher
provides the critical values
Types of tests (3 is an example for any number)
Two-tail test: $H_0: \mu = 3$, $H_1: \mu \ne 3$
Upper-tail test: $H_0: \mu \le 3$, $H_1: \mu > 3$
Lower-tail test: $H_0: \mu \ge 3$, $H_1: \mu < 3$
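Consider the test:
$H_0: \mu = \mu_0$, $H_1: \mu > \mu_0$
The decision rule is: reject $H_0$ if $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} > z_\alpha$
Alternate rule: reject $H_0$ if $\bar{X} > \mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}}$
P-Value
Probability of obtaining a test statistic more extreme than the observed sample value, given that $H_0$ is true
also called the observed level of significance
shows the smallest value of $\alpha$ for which $H_0$ can be rejected
Convert the sample result (e.g. $\bar{x}$) to a test statistic (e.g. z statistic)
Example of an upper-tail test:
obtain the p-value: p-value = $P\left(Z > \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}\right)$, given that $H_0$ is true, i.e. $P\left(Z > \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \,\middle|\, \mu = \mu_0\right)$
Decision rule: compare the p-value to $\alpha$
If p-value < $\alpha$, reject $H_0$
If p-value $\ge \alpha$, don't reject $H_0$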
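Both decision rules for the upper-tail test can be compared in a short sketch. All numbers (hypothesized mean, population standard deviation, sample size, sample mean, significance level) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n = 50.0, 8.0, 64       # hypothetical H0 mean, population sd, sample size
x_bar = 52.0                         # hypothetical sample mean
alpha = 0.05

standard_normal = NormalDist()
z = (x_bar - mu_0) / (sigma / sqrt(n))           # test statistic
z_alpha = standard_normal.inv_cdf(1 - alpha)     # critical value for the upper tail

p_value = 1 - standard_normal.cdf(z)             # P(Z > z), given H0 is true

reject_by_critical_value = z > z_alpha
reject_by_p_value = p_value < alpha

print(z, round(z_alpha, 3), round(p_value, 4), reject_by_p_value)
```

The two rules always agree, because z > z_alpha exactly when the p-value is below the significance level.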
One-Tail Test
the alternative hypothesis focuses on one direction
if $H_1$ is "> something", it's an upper-tail test
if $H_1$ is "< something", it's a lower-tail test
Lower- and upper-tail tests have just one critical value, since the rejection area is in only one tail
Two-Tail Test
two critical values defining the two areas of rejection
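t Test of Hypothesis for the Mean ($\sigma$ unknown)
convert the sample result ($\bar{x}$) to a t test statistic
Consider the test: $H_0: \mu = \mu_0$, $H_1: \mu > \mu_0$
The decision rule is: reject $H_0$ if $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} > t_{n-1,\alpha}$
For a two-tailed test:
$H_0: \mu = \mu_0$, $H_1: \mu \ne \mu_0$
The decision rule is: reject $H_0$ if $t < -t_{n-1,\alpha/2}$ or $t > t_{n-1,\alpha/2}$
Test of the Population Proportion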
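involves categorical values
two outcomes
success (a certain characteristic is present)
failure (a certain characteristic is not present)
The proportion of the population in the success category is written as P
Sample size is large
The sample proportion in the success category is written $\hat{p}$
$\hat{p} = \frac{x}{n} = \frac{\text{number of successes in sample}}{\text{sample size}}$
if nP(1 - P) > 9, $\hat{p}$ can be seen as approximately normally distributed
Therefore mean: $\mu_{\hat{p}} = P$
and standard deviation: $\sigma_{\hat{p}} = \sqrt{\frac{P(1-P)}{n}}$
Hypothesis Test for a Proportion (nP(1 - P) > 9)
Z value, because normally distributed: $Z = \frac{\hat{p} - P_0}{\sqrt{\frac{P_0(1-P_0)}{n}}}$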
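A minimal sketch of the proportion test, including the nP(1 - P) > 9 check. The sample counts and hypothesized proportion are hypothetical, and the upper-tail p-value is just one possible choice of alternative.

```python
from math import sqrt
from statistics import NormalDist

n, successes = 100, 60               # hypothetical sample size and number of successes
P_0 = 0.5                            # hypothetical proportion under H0

# The normal approximation is only used when n * P0 * (1 - P0) > 9.
approximation_ok = n * P_0 * (1 - P_0) > 9

p_hat = successes / n
z = (p_hat - P_0) / sqrt(P_0 * (1 - P_0) / n)

# Upper-tail p-value for H1: P > P0.
p_value = 1 - NormalDist().cdf(z)

print(approximation_ok, round(z, 4), round(p_value, 4))
```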