Answers for Q1 through Q3 (stjensen/stat111/midterm.solutions.pdf)

Answers for Q1 through Q3

Q1

a. Answer is greater than (i).

b. Prob(46 < X < 66): by the 68-95-99.7 rule, the answer is 68%. Both of the extremes are 1 standard deviation away from the mean. If standardization is done, the answer is 68.26%.
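As a quick check of the standardization route, the following sketch computes P(−1 < Z < 1) with Python's standard library. It assumes only that both endpoints sit exactly one standard deviation from the mean, as stated in the answer; the variable names are illustrative. A four-decimal Z-table gives 0.8413 − 0.1587 = 0.6826, so the software value can differ in the last digit.

```python
from statistics import NormalDist

# Both endpoints (46 and 66) are one standard deviation from the mean,
# so the probability equals P(-1 < Z < 1) for a standard normal Z.
std_normal = NormalDist(mu=0.0, sigma=1.0)
within_one_sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

print(f"P(-1 < Z < 1) = {within_one_sd:.4f}")
```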

c. No, because the data is skewed to the left. Partial credit was given for several answers. If they mentioned the CLT and how it is affected by increasing sample size, 1 point of partial credit was given. The student had to say no and skewed to the left in order to receive full credit.

Q2

a. !!"# = !"#$ !",!"# ! !""(!",!"")!"#$!!"" = $!",!"#.!"

b. Mean = 23,702 + 20,150 = $43,852

SD = √((13,590)² + (12,500)²) = $18,464.51

c. Prob(X < 0) = Prob(Z < (0 − mean)/SD) = Prob(Z < −0.30) = 0.3821 = 38.21%

Partial credit was given at each part. If the student wrote down the equation correctly and did the problem but with different numbers (especially in part c), they were only docked one point.

Q3

a. r² = 0.4958, r = −0.7041 (because it is negatively correlated). One point was deducted if the student missed the negative sign. One point was given if the student jotted down the r² value.

b. slope = Δ Crime Total / Δ Income, where the slope is −0.05665.

−0.05665 = Δ Crime Total / 500

Δ Crime Total = −0.05665 × 500 ≈ −28.3, an expected decrease in crimes.

c. Y = a + bX, Crime Total = 4.304 × 10³ + (−5.665 × 10⁻²) X. With X = income of 41000 dollars, Crime Total = 1981.35.
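The arithmetic in parts b and c can be sketched in a few lines of Python. This is only an illustration: the exponents in the extracted text are damaged, so the intercept 4.304 × 10³ and slope −5.665 × 10⁻² used here are an assumed reading (the slope agrees with the −0.05665 quoted in part b), and the function name is hypothetical.

```python
# Assumed reading of the regression line: Crime Total = a + b * income.
INTERCEPT = 4.304e3   # assumption: exponent read as 10^3
SLOPE = -5.665e-2     # matches the slope -0.05665 quoted in part b


def predict_crime_total(income_dollars: float) -> float:
    """Predicted Crime Total for a given income in dollars."""
    return INTERCEPT + SLOPE * income_dollars


# Part b: change in Crime Total for a 500 dollar increase in income.
change_per_500 = SLOPE * 500

# Part c: prediction at an income of 41000 dollars.
prediction = predict_crime_total(41000)

print(f"change for +500 dollars: {change_per_500:.3f}")
print(f"prediction at 41000 dollars: {prediction:.2f}")
```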

STAT111 - Homework 3 - Solutions

30 points in total

Problem 4

(a) Income better predicts Crime Total because it has a larger R² statistic (or say, a stronger correlation) than Vacancy.

(b) Let Y = a + bX be the regression line; then

b = r × SD(Y)/SD(X) = 0.542 × 1094/0.099 = 5989.374

a = mean(Y) − b * mean(X) = 2848 − 5989.374 * 0.2 = 1650.125

(c) Crime Total = 1650.125 + 5989.374 * 0.15 = 2548.531

(d) The change will be

(0.15 − 0.2) * 5989.374 = −299.4687
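The four steps of Problem 4 (slope from r and the two standard deviations, intercept from the means, prediction, and change) can be reproduced with a short sketch. The numbers are the ones quoted in the solution; the variable names are illustrative and not taken from the assignment.

```python
# Summary statistics quoted in the solution.
r = 0.542        # correlation between X and Y
sd_y = 1094.0    # SD of Y (Crime Total)
sd_x = 0.099     # SD of X
mean_y = 2848.0  # mean of Y
mean_x = 0.2     # mean of X

# (b) Least-squares line Y = a + bX.
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

# (c) Prediction at X = 0.15.
prediction = intercept + slope * 0.15

# (d) Change in the prediction when X moves from 0.2 to 0.15.
change = (0.15 - 0.2) * slope

print(f"b = {slope:.3f}, a = {intercept:.3f}")
print(f"prediction = {prediction:.3f}, change = {change:.4f}")
```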

Problem 5

(a) Let A = “at least one Trump voter is selected” and B_i = “the ith voter is not a Trump voter”. Then

P(B_i) = 1 − 0.4 = 0.6

and

P(A) = 1 − P(A^c) = 1 − P(B_1 ∩ B_2 ∩ B_3) = 1 − P(B_1)P(B_2)P(B_3) = 1 − 0.6³ = 0.784

(b) X ~ Bin(3, 0.1), that is, X has a binomial distribution with parameters n = 3, p = 0.1.

P(X = 0) = 0.9³ = 0.729

P(X = 1) = 3 * 0.9² * 0.1 = 0.243

P(X = 2) = 3 * 0.9 * 0.1² = 0.027

P(X = 3) = 0.1³ = 0.001

(c) mean(X) = n × p = 0.3

(d) SD(X) = √(np(1 − p)) = √(3 * 0.1 * 0.9) = 0.52
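The probabilities in Problem 5 can be checked with a small sketch that uses only the standard library. The helper `binomial_pmf` is an illustrative name, not something from the assignment; it implements the usual binomial formula C(n, k) p^k (1 − p)^(n − k).

```python
from math import comb, sqrt


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X with a binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# (a) At least one Trump voter among 3 independent voters, each with P = 0.4.
p_at_least_one = 1 - 0.6**3

# (b) Full distribution of X with n = 3, p = 0.1.
n, p = 3, 0.1
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# (c) and (d) Mean and standard deviation.
mean_x = n * p
sd_x = sqrt(n * p * (1 - p))

print(f"P(A) = {p_at_least_one:.3f}")
print("pmf:", [round(value, 3) for value in pmf])
print(f"mean = {mean_x:.1f}, SD = {sd_x:.2f}")
```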

Stat 111 (Partial) Midterm Exam Solutions

February 29th, 2016

1 Question 6 (12 Points)

(a) Let X be the weight in ounces of the burrito that Shane receives. Then, X has a normal distribution with mean 14 and standard deviation 2.5. We want to find the probability that X < 10. We find the Z-score to be

Z = (10 − 14)/2.5 = −1.6.

By looking in the attached Z-table, we find that

P{X < 10} = P{Z < −1.6} = 0.0548.

(b) Let A1, A2, A3, and A4 be the events that the first, second, third, and fourth person get a burrito that weighs less than 10 ounces, respectively. Then, we’re interested in knowing

P(A1 ∪ A2 ∪ A3 ∪ A4).

Our first step is to use the complement rule to get

P(A1 ∪ A2 ∪ A3 ∪ A4) = 1 − P((A1 ∪ A2 ∪ A3 ∪ A4)^c) = 1 − P(A1^c ∩ A2^c ∩ A3^c ∩ A4^c).

Note that the probability in the last expression is just the probability that none of the people get small burritos. Now, we can use our independence assumption to get

P(A1^c ∩ A2^c ∩ A3^c ∩ A4^c) = P(A1^c)P(A2^c)P(A3^c)P(A4^c).

Finally, using the fact that the burrito weights are identically distributed and the complement rule one more time gives

P(A1^c)P(A2^c)P(A3^c)P(A4^c) = P(A1^c)⁴ = (1 − P(A1))⁴.

Putting this all together means we have

P(A1 ∪ A2 ∪ A3 ∪ A4) = 1 − (1 − P(A1))⁴.

This is good, because the probability we computed in part (a) is P(A1). This means we have

P(A1 ∪ A2 ∪ A3 ∪ A4) = 0.201831.

Note that this is slightly less than four times the probability that one of them gets a small burrito because more than one of them can get small burritos at a time.
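Parts (a) and (b) can be sketched numerically as follows. The sketch uses the standard library's `NormalDist` in place of the Z-table, so the last digits may differ slightly from the table-based values; the variable names are illustrative.

```python
from statistics import NormalDist

# Burrito weight in ounces: normal with mean 14 and standard deviation 2.5.
burrito = NormalDist(mu=14, sigma=2.5)

# (a) Probability that a single burrito weighs less than 10 ounces.
z_score = (10 - 14) / 2.5
p_small = burrito.cdf(10)

# (b) Probability that at least one of four independent burritos is small,
# via the complement rule: 1 - P(none are small).
p_at_least_one = 1 - (1 - p_small) ** 4

print(f"Z = {z_score:.1f}, P(X < 10) = {p_small:.4f}")
print(f"P(at least one small burrito) = {p_at_least_one:.4f}")
```

The union probability stays below `4 * p_small` because the four events can overlap.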

(c) We need to find the distribution of the average weight of the burritos. Let X1, X2, X3, and X4 be the burrito weights in ounces. Then, we are interested in

X̄ = (1/4)(X1 + X2 + X3 + X4).

This is a normal random variable, but we need to find its mean and standard deviation. Using the linearity of expectation, we see that

E[X̄] = (1/4)(E[X1] + E[X2] + E[X3] + E[X4]) = E[X1] = 14.

Now, we only need to check the standard deviation. Using the standard deviation formula, we have

sd(X̄) = (1/4)√(sd(X1)² + sd(X2)² + sd(X3)² + sd(X4)²),

= √(sd(X1)²/4),

= sd(X1)/2,

= 1.25.

At this point we do the same Z-table technique of part (a), but this time with a different standard deviation. We have

Z = (10 − 14)/1.25 = −3.2.

Using the Z-table, we see

P{X̄ < 10} = P{Z < −3.2} = 0.0007.
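The calculation for the average of four burritos can be sketched as below. It assumes, as the solution does, that the four weights are independent and identically distributed, so the standard deviation of the mean is the single-burrito standard deviation divided by √4. Names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 14.0, 2.5, 4

# Standard deviation of the sample mean of n independent weights.
sd_of_mean = sigma / sqrt(n)

# Distribution of the average weight and the probability it falls below 10.
average_weight = NormalDist(mu=mu, sigma=sd_of_mean)
z_score = (10 - mu) / sd_of_mean
p_average_small = average_weight.cdf(10)

print(f"sd of mean = {sd_of_mean}, Z = {z_score}")
print(f"P(average < 10) = {p_average_small:.4f}")
```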

(d) This is a reverse-standardization problem. We need to find the X such that

X = μ + σZ

where μ = 14, σ = 2.5, and Z is such that the Z-table value of Z is 0.9. In symbols, this means

Φ(Z) = 0.9.

As a result, we have

X = 14 + 2.5 × Φ⁻¹(0.9) = 14 + 2.5 × (1.29) = 17.225,

which is a lot of burrito but still not necessarily impressive.
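The reverse standardization can also be done in software. This sketch uses `NormalDist.inv_cdf` for Φ⁻¹(0.9) and, for comparison, the table value 1.29 used in the solution; the software quantile is a little smaller than 1.29, so the two cutoffs differ slightly. Variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 14.0, 2.5

# Software quantile: the z with Phi(z) = 0.9.
z_90 = NormalDist().inv_cdf(0.9)
cutoff_software = mu + sigma * z_90

# Table-based value used in the solution.
cutoff_table = mu + sigma * 1.29

print(f"z = {z_90:.4f}, cutoff (software) = {cutoff_software:.3f}")
print(f"cutoff (table value 1.29) = {cutoff_table:.3f}")
```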

2 Question 7 (12 Points)

(a) We will again need to use our Z-table skills. Let X be the weight of a linebacker, and let Z be the standardized weight. We see that

P{225 ≤ X ≤ 265} = P{(225 − 245)/33 ≤ Z ≤ (265 − 245)/33},

= P{Z ≤ 20/33} − P{Z ≤ −20/33},

= 2P{Z ≤ 20/33} − 1,

= 0.4514.

Thus there’s a good chance the weight is within those limits.

(b) We need to do another Z-score standardization, but first we need to find the distribution of the mean weight. Note that this is normal with the same mean, 245. Additionally, the standard deviation of the mean is the standard deviation of the population divided by √25 = 5. Thus, the standard deviation of the mean is 33/5 = 6.6. By a similar calculation as last time, we have

P{225 ≤ X̄ ≤ 265} = P{(225 − 245)/6.6 ≤ Z ≤ (265 − 245)/6.6},

= P{Z ≤ 3.030303} − P{Z ≤ −3.030303},

= 2P{Z ≤ 3.030303} − 1,

= 0.9976,

which is a lot higher.

(c) If Y is the number of linebackers in 25 that weigh between 225 and 265 pounds, then Y is a binomial random variable with parameters n = 25 and p = 0.4514 from part (a). The formula for the mean of a binomial is

E[Y] = np = 11.285.

The standard deviation is

sd(Y) = √(np(1 − p)) = 2.488162.
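Parts (a) through (c) can be sketched with the standard library. The interval probabilities here come from the normal CDF directly, not a Z-table, so part (a) comes out a little above the table-based 0.4514, which corresponds to reading the table at z = 0.60 since 2 × 0.7257 − 1 = 0.4514. The binomial summary in part (c) uses p = 0.4514 as in the solution. Names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 245.0, 33.0, 25

# (a) One linebacker: P(225 <= X <= 265).
single = NormalDist(mu=mu, sigma=sigma)
p_single = single.cdf(265) - single.cdf(225)

# (b) Mean of 25 linebackers: same mean, standard deviation sigma / sqrt(n).
sd_of_mean = sigma / sqrt(n)
mean_weight = NormalDist(mu=mu, sigma=sd_of_mean)
p_mean = mean_weight.cdf(265) - mean_weight.cdf(225)

# (c) Binomial count Y with n = 25 and the table-based p from part (a).
p_table = 0.4514
expected_count = n * p_table
sd_count = sqrt(n * p_table * (1 - p_table))

print(f"(a) {p_single:.4f}  (b) sd = {sd_of_mean}, prob = {p_mean:.4f}")
print(f"(c) E[Y] = {expected_count:.3f}, sd(Y) = {sd_count:.6f}")
```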

(d) We want to know the probability that Y is at least 18. For this, we will use a normal approximation

P{Y ≥ 18} = P{(Y − 11.285)/2.488162 ≥ (18 − 11.285)/2.488162},

= P{(Y − 11.285)/2.488162 ≥ 2.698779},

= 1 − Φ(2.698779),

= 0.0036.

If we had alternatively done P{Y > 17}, we would have gotten 0.01081292 instead. The answer without the normal approximation is about 0.0061 for P{Y ≥ 18} (the value 0.001719722 is the exact P{Y > 18}), which would have required the computation of a lot of binomial terms.
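To compare the normal approximation with the exact binomial tail, the following sketch sums the binomial terms directly. The helper `binomial_tail` is an illustrative name. It evaluates the exact tail at both 18 and 19, so the exact figure 0.001719722 quoted in the solution can be matched against each cutoff.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 25, 0.4514
mean_y = n * p
sd_y = sqrt(n * p * (1 - p))


def binomial_tail(k_min: int, n: int, p: float) -> float:
    """Exact P(Y >= k_min) for Y binomial with parameters n and p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Normal approximations, standardizing 18 and 17 with no continuity correction.
approx = NormalDist(mu=mean_y, sigma=sd_y)
approx_from_18 = 1 - approx.cdf(18)
approx_from_17 = 1 - approx.cdf(17)

# Exact binomial tails.
exact_ge_18 = binomial_tail(18, n, p)
exact_ge_19 = binomial_tail(19, n, p)

print(f"normal approx at 18: {approx_from_18:.4f}, at 17: {approx_from_17:.6f}")
print(f"exact P(Y >= 18): {exact_ge_18:.6f}, exact P(Y >= 19): {exact_ge_19:.6f}")
```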

Q8

(a) The mean of X is np = (175)(0.85) = 148.75 and the standard deviation is √(np(1 − p)) = √(175 × 0.85 × 0.15) = 4.724.

(b) The mean of p̂ is p = 0.85 and the standard deviation is √(p(1 − p)/n) = √(0.85 × 0.15/175) = 0.0270.

(c) We want P(p̂ > 0.87) (fine to use ≥ too). We have np > 10 and n(1 − p) > 10, so we’re good to use the normal approximation (a hassle to compute exactly in the exam...). Convert to z scores in the usual way to get P((p̂ − 0.85)/0.0270 > 0.741) = P(Z > 0.741). The closest value in our table is 0.74, which gives a probability of 0.2296 (R gives 0.2294, very close).
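The three parts of Q8 can be sketched as follows, again with the standard library's normal CDF in place of the table. The check `n * p > 10 and n * (1 - p) > 10` mirrors the rule of thumb quoted in the solution; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 175, 0.85

# (a) Mean and standard deviation of the count X.
mean_x = n * p
sd_x = sqrt(n * p * (1 - p))

# (b) Standard deviation of the sample proportion p-hat (its mean is p).
sd_p_hat = sqrt(p * (1 - p) / n)

# (c) Normal approximation to P(p-hat > 0.87), after checking the rule of thumb.
normal_ok = n * p > 10 and n * (1 - p) > 10
z_score = (0.87 - p) / sd_p_hat
tail_probability = 1 - NormalDist().cdf(z_score)

print(f"(a) mean = {mean_x}, sd = {sd_x:.3f}")
print(f"(b) sd of p-hat = {sd_p_hat:.4f}")
print(f"(c) normal ok: {normal_ok}, z = {z_score:.3f}, P = {tail_probability:.4f}")
```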

Q9

(a) Definitely not representative - these students were sampled at the gym!

So it almost certainly won’t generalise to the general student population

of UPenn, and the real figure is almost certainly lower, due to the fact

that it’s far easier to find these over-exercisers at the gym.

(b) Lots of answers are reasonable here, but it must address the central

issue from part (a): your survey must avoid a sampling method that

oversamples either over-exercisers or the other two groups (optimal,

under). Sampling from e.g. a core class could work. Your answer

doesn’t have to be absolutely perfect, but you’ll lose symbolic marks

(so, none) if your answer is ridiculously infeasible (e.g. too expensive

etc).