EXCEL STATISTICAL ANALYSIS
info.marshall.usc.edu/.../Excel/Statistics_Workshop_V3.pdf


EXCEL STATISTICAL Functions

Average vs. Weighted Average

Average Price

Transaction Date   Item     Price Per Unit   Units Sold
5/7/2007           Coffee   $20              500
6/7/2007           Coffee   $25              750
7/6/2007           Coffee   $35              200
8/9/2007           Coffee   $30              300

Average Price = Sum of Prices / Count of Prices

Total: 20 + 25 + 35 + 30 = 110

Average Price: 110 / 4 = $27.50

Weighted Average Price

Date       Item     Price Per Unit   Units Sold   Price x Units
5/7/2007   Coffee   $20              500          10,000
6/7/2007   Coffee   $25              750          18,750
7/6/2007   Coffee   $35              200           7,000
8/9/2007   Coffee   $30              300           9,000
Total                                1,750        44,750

Weighted Average Price = ∑(Price per Unit * Units Sold) / ∑ Units Sold

Units Sold: 500 + 750 + 200 + 300 = 1,750

Weighted Average Price: 44,750 / 1,750 = $25.57
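The two calculations can be compared side by side outside Excel. The following is a minimal Python sketch using the four coffee transactions from the table; the variable names are illustrative only and are not part of the workshop files.

```python
# Rows from the coffee table: (price per unit, units sold).
sales = [(20, 500), (25, 750), (35, 200), (30, 300)]

# Simple average: sum of prices / count of prices.
simple_average = sum(price for price, _ in sales) / len(sales)

# Weighted average: sum(price * units) / sum(units).
total_revenue = sum(price * units for price, units in sales)
total_units = sum(units for _, units in sales)
weighted_average = total_revenue / total_units

print(simple_average)              # 27.5
print(total_revenue, total_units)  # 44750 1750
print(round(weighted_average, 2))  # 25.57
```

The weighted figure is lower because the largest volume (750 units) sold at one of the lower prices.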


Weighted Average Exercise

“Weighted_Avg” sheet

Weighted Average Solutions

“Weighted_Avg” sheet

Wt. Avg. with SumProduct() Exercise

“Weighted_Avg_SumProduct” sheet

=SumProduct(Array1,Array2)

Multiplies corresponding numbers in two or more ranges together, then sums the products.

Date       Item     Price Per Unit (Array 1)   Units Sold (Array 2)   Product
5/7/2007   Coffee   $20                        500                    → 20*500 = 10,000
6/7/2007   Coffee   $25                        750                    → 25*750 = 18,750
7/6/2007   Coffee   $35                        200                    → 35*200 = 7,000
8/9/2007   Coffee   $30                        300                    → 30*300 = 9,000
Total                                                                   44,750

Wt. Avg. with SumProduct() Solution

“Weighted_Avg_SumProduct” sheet

Summations

$\sum_{i=1}^{10} i^2$

Summations

$\sum_{i=1}^{10} i^2$ = Sum(1^2, 2^2, 3^2, 4^2, 5^2, 6^2, 7^2, 8^2, 9^2, 10^2)

Starting Number: 1 (written below the ∑)
Ending Number: 10 (written above the ∑)
Mathematical Operation: i^2 (written to the right of the ∑)

Using Sum() as an Array

$\sum_{i=1}^{10} i^2$ = Sum(1^2,2^2,3^2,4^2,5^2,6^2,7^2,8^2,9^2,10^2)

With the numbers 1 through 10 in cells A1:A10, the same sum can be written as:

Sum(A1:A10^2)

However, if you just press Enter, Excel cannot evaluate the formula. We need to tell it to process each cell in the range one at a time.

Using Sum() as an Array

Press Control + Shift + Enter to make Excel process the cells in a range one at a time.

Sum(A1:A10^2)
= Sum(A1^2,A2^2,A3^2,A4^2,A5^2,A6^2,A7^2,A8^2,A9^2,A10^2)
= Sum(1^2,2^2,3^2,4^2,5^2,6^2,7^2,8^2,9^2,10^2)
= Sum(1,4,9,16,25,36,49,64,81,100)
= 385
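The cell-by-cell evaluation that the array formula performs can be mirrored in a few lines of Python. This is only a sketch: `values` is an assumed stand-in for cells A1:A10 holding the numbers 1 through 10.

```python
# Stand-in for cells A1:A10, assumed to hold 1, 2, ..., 10.
values = range(1, 11)

# Square each value one at a time, as the array formula does per cell.
squares = [v ** 2 for v in values]
total = sum(squares)

print(squares)  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
print(total)    # 385
```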


Summation Exercises

“Summations” sheet


Summation Solutions

“Summations” sheet

Be sure to press Control + Shift + Enter after typing or editing the formulas.

Factorials in Excel

$\frac{(n+k-1)!}{k!\,(n-1)!}$

Factorials (!)

4! = 4 * 3 * 2 * 1 = 24
6! = 6 * 5 * 4 * 3 * 2 * 1 = 720

4! = Fact(4) = 24
6! = Fact(6) = 720

Factorial Exercise using Combinations

You have an unlimited supply of Coconuts and Pineapples. How many different ways can you fill a bowl which will contain exactly 3 items?


Combinations

Order of Selection is Not Important


Combinations

Items can be Repeated (with Replacement)

Combinations

(With Replacement, Order Not Important)

$\frac{(n+k-1)!}{k!\,(n-1)!}$

n = The number of different items available (2).
k = The number of items selected (3).
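As a check on the coconut and pineapple exercise, the count can be computed two ways: with the factorial formula for combinations with replacement, and by listing every possible bowl. The sketch below uses only the Python standard library; the function name `combos_with_replacement` is made up for this illustration.

```python
from itertools import combinations_with_replacement
from math import factorial


def combos_with_replacement(n, k):
    """(n + k - 1)! / (k! * (n - 1)!) for n item types and k items selected."""
    return factorial(n + k - 1) // (factorial(k) * factorial(n - 1))


# n = 2 kinds of fruit, k = 3 items in the bowl.
count = combos_with_replacement(2, 3)

# Listing the bowls directly gives the same count.
bowls = list(combinations_with_replacement(["Coconut", "Pineapple"], 3))

print(count)  # 4
for bowl in bowls:
    print(bowl)
```

The four bowls are three coconuts, two coconuts and a pineapple, one coconut and two pineapples, and three pineapples.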

Factorial Exercise with Combinations

“Factorials” sheet

Factorial Exercise with Combinations: Solution

“Factorials” sheet

Frequency

[Chart: Number Occurring (vertical axis, 1 to 7) for each Grade (horizontal axis: F, D, C, B, A)]

We wish to determine the number of students getting an A, B, C, D and F based on the conditions below.

>89.99 and <=100    --> A
>79.99 and <=89.99  --> B
>69.99 and <=79.99  --> C
>59.99 and <=69.99  --> D
<=59.99             --> F

=Frequency(Data Array, Bins Array)

Data Array: This is the range of values you are analyzing.

Bins Array: This is your grouping. The values listed are the maximums for each group and are inclusive.

In this example the groupings are:

>89.99 and <=100
>79.99 and <=89.99
>69.99 and <=79.99
>59.99 and <=69.99
<=59.99

=Frequency(Data Array, Bins Array)

You must highlight your output area before typing the formula.

When done typing, you must press Control + Shift + Enter.
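The grouping rule (each bin value is an inclusive maximum) can be sketched in Python. This is an illustration rather than Excel's implementation: the `frequency` helper and the ten scores are hypothetical, and the bins are assumed to be sorted in ascending order.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count values per bin; each bin value is an inclusive upper limit."""
    counts = [0] * (len(bins) + 1)  # extra slot for values above the last bin
    for value in data:
        # Index of the first bin whose maximum is >= value.
        counts[bisect_left(bins, value)] += 1
    return counts


# Hypothetical scores; bins are the grade maximums from the slide.
scores = [55, 59.99, 62, 71, 75, 79.99, 80, 88, 93, 100]
bins = [59.99, 69.99, 79.99, 89.99, 100]

counts = frequency(scores, bins)
for grade, count in zip(["F", "D", "C", "B", "A"], counts):
    print(grade, count)
```

With these scores the counts are F 2, D 1, C 3, B 2, A 2. A score of exactly 79.99 lands in the C group because the maximum is inclusive.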

=Frequency(Data Array, Bins Array)

You must:
• Highlight First
• Press Control + Shift + Enter

“Frequency” sheet


Rank and Percentile

(Comparisons within the Group Only)


Rank & Percentile: Rank

Two Factors to Consider with Rank:

• Order (Ascending or Descending)

• How Ties are Handled

Rank: Handling Ties

Rank.Eq(): Ties Given the Same First Rank

Rank.Avg(): Ties Given the Same Average Rank

=Rank.Eq(Score,Range,Order)

Score: Value being ranked

Range: Set of values being ranked

Order:
0 = Descending. The highest score is ranked 1, so low scores have low rankings.
1 = Ascending. The lowest score is ranked 1, so low scores have higher rankings.

“Ranking” sheet


=Rank.Avg(Score,Range,Order)

Score: Value being ranked

Range: Set of values being ranked

Order:
0 = Descending. The highest score is ranked 1, so low scores have low rankings.
1 = Ascending. The lowest score is ranked 1, so low scores have higher rankings.

“Ranking” sheet
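The difference between the two tie rules shows up in a short Python sketch. The helpers `rank_eq` and `rank_avg` are hypothetical names written for this illustration (they are not Excel's code), and the four scores are made up.

```python
def rank_eq(score, scores, order=0):
    """Tied scores all receive the first (best) rank of the tied group."""
    if order == 0:  # descending: the highest score is ranked 1
        ahead = sum(1 for s in scores if s > score)
    else:           # ascending: the lowest score is ranked 1
        ahead = sum(1 for s in scores if s < score)
    return ahead + 1


def rank_avg(score, scores, order=0):
    """Tied scores all receive the average of the ranks they occupy."""
    ties = scores.count(score)
    return rank_eq(score, scores, order) + (ties - 1) / 2


scores = [90, 85, 85, 70]  # hypothetical test scores

print(rank_eq(85, scores))     # 2
print(rank_avg(85, scores))    # 2.5
print(rank_eq(70, scores, 1))  # 1
```

The two scores of 85 occupy ranks 2 and 3, so the first rule gives both a 2 and the second gives both 2.5.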


Percent Rank

Expressing a Score as a Percent of the Scores It Is Greater Than

PercentRank.Inc()
This returns the percent of students who ranked lower than the score in question. The "Inc" indicates that the ranking includes 0 and 100 percent in its calculation. The outcome is that the highest score gets 100%, indicating that 100% of the other scores are less than the score in question.

PercentRank.Exc()
This also returns the percent of students who ranked lower than the score in question. The "Exc" indicates that the ranking excludes 0 and 100 percent. The outcome is that the highest score might get 95%, but the score is never expressed as being better than 100% of the other scores. This is a more commonly used method of expressing percentiles.
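A small Python sketch can make the "Inc" versus "Exc" difference concrete. It assumes the score being ranked appears in the list; Excel additionally interpolates for scores that are not in the list and limits the number of significant digits, and this sketch does neither. The function names and the five scores are hypothetical.

```python
def percentrank_inc(scores, x):
    """Share of the other scores below x; runs from 0 to 1 inclusive."""
    below = sum(1 for s in scores if s < x)
    return below / (len(scores) - 1)


def percentrank_exc(scores, x):
    """Rank of x divided by (n + 1); never reaches 0 or 1."""
    below = sum(1 for s in scores if s < x)
    return (below + 1) / (len(scores) + 1)


scores = [50, 60, 70, 80, 90]  # hypothetical test scores

print(percentrank_inc(scores, 90))            # 1.0
print(round(percentrank_exc(scores, 90), 3))  # 0.833
print(percentrank_inc(scores, 50))            # 0.0
print(round(percentrank_exc(scores, 50), 3))  # 0.167
```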

Percent Rank

PercentRank.Exc(Array,k)

Array: Set of Scores
k: Specific Score

“PercentRank” sheet


Percent Rank

PercentRank.Inc(Array,k)

Array: Set of Scores
k: Specific Score

“PercentRank” sheet


Percentile

Returns the Score for a Given Percentile (i.e. Breakpoint)

Percentile.Inc(Range of Scores, Percentile)
This returns the score associated with a given breakpoint, where the percentile is entered as a value from 0 to 1. For example, you might want to know what test score begins the 90th percentile. The "Inc" indicates that 0 and 100 percent are included when calculating the score associated with the percent given.

Percentile.Exc(Range of Scores, Percentile)
Just like "Percentile.Inc" except that it excludes 0 and 100 percent.
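The sketch below shows one common way to compute inclusive and exclusive percentiles by linear interpolation between sorted scores. It is an approximation of the idea under stated assumptions, not a copy of Excel's internals: the function names, the position formulas k*(n-1) and k*(n+1)-1, and the five scores are all assumptions made for this illustration.

```python
from math import floor


def _interpolate(ordered, position):
    """Linear interpolation at a fractional 0-based position."""
    lower = floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def percentile_inc(scores, k):
    """Inclusive method: k may run from 0 to 1."""
    ordered = sorted(scores)
    return _interpolate(ordered, k * (len(ordered) - 1))


def percentile_exc(scores, k):
    """Exclusive method: k must lie between 1/(n+1) and n/(n+1)."""
    ordered = sorted(scores)
    n = len(ordered)
    if not 1 / (n + 1) <= k <= n / (n + 1):
        raise ValueError("k is outside the range the exclusive method covers")
    return _interpolate(ordered, k * (n + 1) - 1)


scores = [50, 60, 70, 80, 90]  # hypothetical test scores

print(percentile_inc(scores, 0.5))   # 70.0
print(percentile_exc(scores, 0.25))  # 55.0
print(percentile_inc(scores, 0.9))
```

With only five scores the exclusive method cannot answer for the 90th percentile (0.9 is above 5/6), which is the practical meaning of excluding the ends.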

Rank, Rank Percent & Percentile Exercises

“Percent_and_Rank” sheet



Normal Distribution


Is the Data Normally Distributed?

“Normal_Distribution” sheet

Creating a Histogram

Is the Data Normally Distributed?

“Normal_Distribution” sheet

Skew(Range)
Degree of symmetry. If the skew is greater than +1 or lower than -1, the skew is substantial and the distribution is far from symmetrical. A positive skew (+) means the longer tail is on the right; a negative skew (-) means the longer tail is on the left.

Is the Data Normally Distributed?

“Normal_Distribution” sheet

Kurt(Range)
Kurtosis is the degree of flatness. In Excel, a value greater than zero indicates a sharply peaked curve, while a value less than zero indicates a flat curve. A value of 0 corresponds to a normal curve.
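For readers who want to see what sits behind these two measures, the sketch below implements the sample skewness and sample excess kurtosis formulas that Excel documents for Skew() and Kurt(). The function names and data sets are made up for illustration; skew() needs at least 3 values and kurt() at least 4.

```python
from statistics import mean, stdev


def skew(data):
    """Sample skewness: n/((n-1)(n-2)) * sum(z^3), z from the sample std dev."""
    n = len(data)
    m, s = mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)


def kurt(data):
    """Sample excess kurtosis; 0 corresponds to a normal curve."""
    n = len(data)
    m, s = mean(data), stdev(data)
    fourth = sum(((x - m) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction


symmetric = [1, 2, 3, 4, 5]       # hypothetical, evenly spread
right_tailed = [1, 1, 2, 2, 10]   # hypothetical, one large value

print(round(skew(symmetric), 6))   # symmetric data: skew of 0
print(round(kurt(symmetric), 6))   # flat, evenly spread data: negative value
print(skew(right_tailed) > 1)      # True: substantial positive skew
```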

Is the Data Normally Distributed?

“Normal_Distribution” sheet

Average(Range)
Answers the question: Where are most of the data points?

Average is less useful when data is not normally distributed.


Skew(), Kurt(), & Average()

“Normal_Distribution” sheet


Standard Deviation

How spread out from the average is the data?

If your data is normally distributed then you can use Standard Deviation to see how spread out the data is.

Standard Deviation

How spread out from the average is the data?

For data that is normally distributed:
• 68% of your data will be within 1 standard deviation of the average.
• 95% of your data will be within 2 standard deviations of the average.
• 99.7% of your data will be within 3 standard deviations of the average.

Standard Deviation

How spread out from the average is the data?

Stdev.P(Range): Standard Deviation of a Population

Stdev.S(Range): Standard Deviation of a Sample

“Normal_Distribution” sheet
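The population and sample versions differ only in the divisor (n versus n - 1). A quick Python comparison, using `statistics.pstdev` and `statistics.stdev` as rough counterparts of Stdev.P() and Stdev.S(), with a made-up data set:

```python
from statistics import pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean of 5

population_sd = pstdev(scores)  # divides the squared deviations by n
sample_sd = stdev(scores)       # divides the squared deviations by n - 1

print(population_sd)        # 2.0
print(round(sample_sd, 3))  # 2.138
```

The sample figure is always a little larger, and the gap shrinks as the number of data points grows.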

Standard Deviation

How spread out from the average is the data?

“Normal_Distribution” sheet

We already know that the average is 50.7. Using the normal distribution and a standard deviation of 21, we now know that:

• 68% of the students' scores are within 21 points of the average (i.e. 29.7 to 71.7).
• 95% of the students' scores are within 42 points (2 x 21) of the average (i.e. 8.7 to 92.7).
• 99.7% of the students' scores are within 63 points (3 x 21) of the average (i.e. -12.3 to 113.7).

Note: because the tails are so thin at 3 standard deviations, and scores cannot fall below 0 or above 100, you should think of this last range as 0 to 100.
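The 68 / 95 / 99.7 figures and the score ranges can be reproduced with the standard library's `statistics.NormalDist`. This sketch reuses the average of 50.7 and the standard deviation of 21 from the slide; everything else is illustrative.

```python
from statistics import NormalDist

average, sd = 50.7, 21      # values quoted on the slide
standard = NormalDist()     # standard normal: mean 0, standard deviation 1

shares = {}
for k in (1, 2, 3):
    # Share of a normal distribution within k standard deviations of the mean.
    shares[k] = standard.cdf(k) - standard.cdf(-k)
    low, high = average - k * sd, average + k * sd
    print(f"within {k} sd: {shares[k]:.1%} of scores, {low:.1f} to {high:.1f}")
```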

Standard Normal Distribution

The Standard Normal Distribution is the Normal Distribution adjusted so the mean falls on zero and the standard deviation is one.

The histogram on this slide shows test scores (0-150). The data is normally distributed.

Standard Normal Distribution and Z Scores

The data points are converted to Z-Scores, which represent the number of standard deviations from the mean, using this formula:

$Z\ \text{Score} = \frac{\text{Data Point} - \text{Average}}{\text{Standard Deviation for the Sample}}$

or

Standardize(Data point, Mean, Standard Deviation)

For example:
• A Z-Score of 0 indicates that the score is the same as the mean.
• A positive Z-Score of 1.62 means that the data point is 1.62 standard deviations above the mean.
• A negative Z-Score means it is below the mean.
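The Z-Score formula is a one-line calculation. The sketch below writes it as a small Python function named after Excel's Standardize(); the mean of 65 and standard deviation of 15 are borrowed from the national test score exercise in this workshop, and the individual scores are made up.

```python
def standardize(data_point, mean, standard_deviation):
    """Number of standard deviations the data point lies from the mean."""
    return (data_point - mean) / standard_deviation


print(standardize(65, 65, 15))  # 0.0  (same as the mean)
print(standardize(80, 65, 15))  # 1.0  (one standard deviation above)
print(standardize(50, 65, 15))  # -1.0 (one standard deviation below)
```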

Standard Normal Distribution and Z Scores

Z-Scores allow you to:

• Compare scores from different normal distributions.

• Combine scores from data with different means and standard deviations.

Standard Normal Distribution

Z Scores as Probabilities

Z-Scores are often converted to probabilities (percentages) to make them easier to understand. The probability is expressed on a scale from 0% to 100%.

For example, a test score of 85 might turn out to have a Z-Score of 1.62, which in turn might have a corresponding probability of 94%. This indicates that getting a score of 85 means you did better than 94% of the population. It can also be taken to mean that if you randomly sample a score, there is a 94% probability that it will be less than 85.

Score of 85 → Z Score of 1.62 → 94% Probability

Functions for Returning Z Scores and Probabilities in Excel

Returning a Probability
• Data Point to Probability: Norm.Dist(Data point, Mean, Standard deviation, Cumulative?)
• Z-Score to a Probability: Norm.S.Dist(Z-Score, Cumulative?)

Returning a Z Score
• Data Point to a Z Score: Standardize(DataPoint, Mean, Standard Deviation)

Returning Data Point for a Given Probability
• Probability to Data Point: Norm.Inv(Probability Percent, Mean, Standard deviation)

About Cumulative

Returning a Probability
• Data Point to Probability: Norm.Dist(Data point, Mean, Standard deviation, Cumulative?)
• Z-Score to a Probability: Norm.S.Dist(Z-Score, Cumulative?)

Cumulative is set to either True or False:

True: Excel returns the percentage of students whose scores were less than or equal to yours.

False: Excel returns the height of the normal curve at your score (the probability density). For whole-number scores this approximates the percentage of students who have the same score as yours.
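The True and False settings correspond to two different curves: the cumulative distribution and the bell-shaped density. In Python's `statistics.NormalDist` these are the `cdf` and `pdf` methods. The sketch below assumes a mean of 65 and a standard deviation of 15 (the national figures used in the exercise in this workshop) and evaluates both at the mean.

```python
from math import pi, sqrt
from statistics import NormalDist

national = NormalDist(mu=65, sigma=15)  # assumed mean and standard deviation
score = 65

below = national.cdf(score)    # comparable to Cumulative = True
density = national.pdf(score)  # comparable to Cumulative = False

print(below)              # 0.5: half of all scores fall at or below the mean
print(round(density, 4))  # 0.0266: height of the bell curve at its peak
```

The density at the mean equals 1 / (15 * sqrt(2 * pi)), which is why it is a small number rather than a percentage of students.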

Standard Normal Distribution Exercise

Compare Local Test Scores with National

Use Norm.Dist() to see where our students' scores are when compared with national test scores.

National Average Score: 65
National Standard Deviation: 15

“Probability_Ex1_National” sheet

Standard Normal Distribution Exercise

Compare Local Test Scores with National

• What is the National cutoff Score for the 90th Percentile? Norm.Inv()
• How many of our students are in the National 90th Percentile? CountIf()
• Percent of our Students in the National 90th Percentile? # in 90th / Total Count

“Probability_Ex1_National” sheet
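The three steps of this exercise can be sketched in Python with `statistics.NormalDist`. The national mean (65) and standard deviation (15) come from the exercise; the list of local scores is invented for this illustration and is not the data in the workshop sheet.

```python
from statistics import NormalDist

national = NormalDist(mu=65, sigma=15)

# Step 1: national cutoff score for the 90th percentile (the Norm.Inv() step).
cutoff = national.inv_cdf(0.90)

# Step 2: count local students at or above the cutoff (the CountIf() step).
local_scores = [58, 72, 91, 84, 66, 88, 95, 79]  # hypothetical local scores
in_90th = sum(1 for s in local_scores if s >= cutoff)

# Step 3: percent of local students in the national 90th percentile.
share = in_90th / len(local_scores)

print(round(cutoff, 2))  # about 84.22
print(in_90th)           # 3
print(f"{share:.1%}")    # 37.5%
```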

Standard Normal Distribution Exercise

Compare Local Test Scores with National: Solution

“Probability_Ex1_National” sheet

Solution: What is the 90th Percentile Cutoff Score?

Solution: Number of Local Students Who Made the National 90th Percentile?

Solution: Percent of Local Students Who Made the National 90th Percentile?

Standard Normal Distribution Exercise

“Probability_Ex2” sheet

Using Norm.Dist() to Calculate a Z-Score Probability

A volleyball coach is considering hiring a new player who is 78" tall but is wondering if his height is really that impressive. The coach knows that in the USA, the average height of a man is 69.1" with a standard deviation of 2.9". What percent of the national population is the player taller than?
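One way to check an answer to this exercise outside Excel is the cumulative distribution in Python's `statistics.NormalDist`, which plays the role of Norm.Dist() with Cumulative set to True. The heights are the ones given in the exercise; the variable names are illustrative.

```python
from statistics import NormalDist

heights = NormalDist(mu=69.1, sigma=2.9)  # US male height, in inches

player = 78
z_score = (player - heights.mean) / heights.stdev
taller_than = heights.cdf(player)  # share of the population shorter than the player

print(round(z_score, 2))      # about 3.07 standard deviations above average
print(f"{taller_than:.2%}")   # roughly 99.9% of the population
```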


Standard Normal Distribution Exercise

“Probability_Ex3” sheet

Using Norm.Inv() to Calculate a Data Point from a Probability

Our volleyball coach also wants to make sure his male players are all taller than 95% of the population. What is the minimum height his players must meet to get on the team? The coach knows that in the USA, the average height of a man is 69.1" with a standard deviation of 2.9".
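Going from a probability back to a data point is the inverse of the cumulative distribution. In Python this is `NormalDist.inv_cdf`, used here as a stand-in for Norm.Inv() with the figures from the exercise.

```python
from statistics import NormalDist

heights = NormalDist(mu=69.1, sigma=2.9)  # US male height, in inches

# Height that 95% of the population falls below.
minimum_height = heights.inv_cdf(0.95)

print(round(minimum_height, 2))  # about 73.87 inches
```

Feeding the result back into `heights.cdf()` returns 0.95, which is a quick way to confirm the two functions are inverses.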


Standard Normal Distribution Exercise

“Probability_Ex4” sheet

Using Norm.Dist() for Probability Between Two Data Points

A new student is being sent to our height-minded volleyball coach to be on his team. Again, Google tells him that in the USA, the average height of a man is 69.1" with a standard deviation of 2.9".

• What is the probability that the student will be exactly 70 inches tall?

• What is the probability that the student will be 70 inches tall or less?

• What is the probability that the student will be between 70 and 74 inches tall?

• What is the probability that the student will be greater than 70 inches tall?

Standard Normal Distribution Exercise

Norm.Dist(DataPoint, Mean, Standard Deviation, Cumulative = True)

“Probability_Ex4” sheet

• What is the probability that the student will be 70 inches tall or less?

• What is the probability that the student will be between 70 and 74 inches tall?

• What is the probability that the student will be greater than 70 inches tall?
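All of these questions reduce to the cumulative probability at one or two heights. A hedged Python sketch using `statistics.NormalDist` with the figures from the exercise follows; the "exactly 70 inches" case is shown as a density value, which mirrors Cumulative set to False.

```python
from statistics import NormalDist

heights = NormalDist(mu=69.1, sigma=2.9)  # US male height, in inches

exactly_70 = heights.pdf(70)                          # curve height at 70 (a density)
at_most_70 = heights.cdf(70)                          # 70 inches or less
between_70_and_74 = heights.cdf(74) - heights.cdf(70) # subtract the two cumulative values
more_than_70 = 1 - heights.cdf(70)                    # complement of "70 or less"

print(f"70 or less:       {at_most_70:.1%}")
print(f"between 70 to 74: {between_70_and_74:.1%}")
print(f"more than 70:     {more_than_70:.1%}")
```

The "between" answer subtracts one cumulative probability from another, and the "greater than" answer subtracts the cumulative probability from 1.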


T-Test

Used to determine how similar two sets of data are. More specifically, it tests the hypothesis that the means of the two sets of data do not differ (i.e. the null hypothesis). In other words, are they different simply due to chance, or will they continue to be different over time?

Requirements:
• Standard Deviation of the Population is unknown.
• Sample Size is less than 30 (n<30)
• Data is Normally Distributed

T-Tests

(Comparing Two Means)

Interpreting the Results
• T-Test results are typically expressed in probabilities (p).
• Because a T-Test evaluates the hypothesis that there is no real difference, the smaller the p-value, the stronger the evidence that there is a true difference between the means. A 95% level of confidence is typically used, so:

A p-value less than .05 is taken to indicate that the data sets do differ, and not just due to randomness.

T-Tests

(Comparing Two Means)

Factors to Consider: Number of Tails

One Tailed: testing is in just one direction, e.g. testing that the mean of sample A is less than the mean of sample B.

Two Tailed: testing is in both directions, e.g. testing that the mean of sample A is less than or greater than the mean of sample B.

T-Tests

(Comparing Two Means)

Factors to Consider: Paired vs. Unpaired

Paired: Each value in a sample is paired with a value in another sample. Each sample has the same number of data points. Often used in before and after comparisons.

• Testing a person’s reflexes before and after drinking alcohol.
• Weighing each individual before and after taking a new weight loss drug.

T-Tests

(Comparing Two Means)

Factors to Consider: Paired vs. Unpaired

Unpaired: Data is gathered by randomly sampling each set of data separately under different conditions. The size of each group does not have to be equal.

• Comparing the performance of two different stocks.
• Comparing the battery life of batteries from two different suppliers.

T-Tests

(Comparing Two Means)

Factors to Consider: Paired vs. Unpaired

Unpaired: Data is gathered by randomly sampling each set of data separately under different conditions. The size of each group does not have to be equal.

An unpaired test can assume either equal or unequal population variances. Because you generally don’t know the population variances, the unequal-variance test is used much more often.
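
To make the unequal-variance case concrete, here is a minimal Python sketch of the statistic usually called Welch's t-test, which is the standard formulation of an unpaired test that does not assume equal variances. The function name and the battery data are hypothetical, and the sketch computes only t and the approximate degrees of freedom, not the p-value.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return (t, df) for an unpaired t-test that does not assume equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical battery lives (hours) from two suppliers; the group sizes differ.
supplier_a = [10.0, 12.0, 11.0, 13.0, 14.0]
supplier_b = [9.0, 8.0, 10.0, 9.0]

t, df = welch_t(supplier_a, supplier_b)
print(f"t = {t:.4f}, df = {df:.2f}")
```

Note that the degrees of freedom are generally not a whole number in this version of the test.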

T-Tests

(Comparing Two Means)

Factors to Consider: One-Sample vs. Two-Sample

One-Sample: You wish to compare your data to a known value or population mean.

• Measuring a school’s test scores against the national average.

• Measuring birth weight in a certain city against a mean taken 50 years ago in the same city.
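
The one-sample comparison can be sketched the same way. This is an illustrative Python sketch with hypothetical scores and a hypothetical known mean of 75; it returns the t statistic and degrees of freedom only.

```python
import math
import statistics

def one_sample_t(sample, known_mean):
    """Return (t, df) comparing the sample mean with a known value."""
    n = len(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - known_mean) / std_error
    return t, n - 1

# Hypothetical test scores for one school, compared with an assumed average of 75.
scores = [72, 75, 78, 80, 85]

t, df = one_sample_t(scores, 75)
print(f"t = {t:.4f}, df = {df}")
```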

T-Tests

(Comparing Two Means)

Factors to Consider: One-Sample vs. Two-Sample

Two-Sample: The means are sampled independently from two different population groups.

• Comparing the mean of people on a new weight loss drug against the mean of a control group taking a placebo.

• Comparing the battery life of batteries from two different suppliers.

T-Tests with the Data Analysis ToolPak

T-Test: Example 1

A researcher gave one group a drug to lower toxins in their cells and another group a placebo. She wishes to test the hypothesis that the amount of toxins in the group treated with the drug will be less than in the group given the placebo.

“Ttest_Ex1” Sheet

Number of Tails? One - Because we are testing that the difference will be in one direction (less than), this is a one-tailed test.

One or Two Sample? Two Sample - Because the data comes from two independent samples (drug group and placebo group).

Paired or Unpaired? Unpaired - Two separate groups are measured. Also, we don't know the variance of the two populations, so we will assume they are unequal.

T-Test: Example 1

A researcher gave one group a drug to lower toxins in their cells and another group a placebo. She wishes to test the hypothesis that the amount of toxins in the group treated with the drug will be less than in the group given the placebo.

Because P is less than .05 (.029), we reject the null hypothesis. The difference is unlikely to be due to randomness, and the data support the conclusion that the drug reduces the amount of toxins in the cells.

T-Test: Example 2

A quality control manager wishes to see if there is any difference between the number of defects per day produced by machines maintained by company employees versus those maintained by an outside contractor. Over a 10-day period she has recorded the number of defects per day produced by all in-house maintained machines and all outside-contractor maintained machines.

“Ttest_Ex2” Sheet

Number of Tails? Two - Because we are testing for any difference (less than or greater than).

One or Two Sample? Two - Because the data comes from two independent samples, it is a two-sample test.

Paired or Unpaired? Unpaired - Two separate groups are measured. Also, we don't know the variance of the two populations, so we will assume they are unequal.

T-Test: Example 2

A quality control manager wishes to see if there is any difference between the number of defects per day produced by machines maintained by company employees versus those maintained by an outside contractor. Over a 10-day period she has recorded the number of defects per day produced by all in-house maintained machines and all outside-contractor maintained machines.

“Ttest_Ex2” Sheet

Because P is greater than .05 (.118), we fail to reject the null hypothesis. The data do not show a real difference between the two means.

T-Test: Example 3

A company wishes to see if a new drug it has developed for diabetes has any effect on people's weight. They weighed several individuals before giving them the drug and then again after giving them the drug.

“Ttest_Ex3” Sheet

Number of Tails? Two - Because we are testing for any difference (less than or greater than).

One or Two Sample? Two - Because the data comes from two samples, it is a two-sample test.

Paired or Unpaired? Paired - We are weighing the same individuals both before and after they take the drug.

T-Test: Example 3

A company wishes to see if a new drug it has developed for diabetes has any effect on people's weight. They weighed several individuals before giving them the drug and then again after giving them the drug.

“Ttest_Ex3” Sheet

Because P is less than .05 (.0000103), we reject the null hypothesis. The drug does affect people’s weight.

T-Test: Example 4

"TV's R Us" is opening a store in Sunnydale, CA and wishes to see if the town’s average television viewing time differs from the national average of 290 minutes per day. The data from the 15 residents they surveyed is below.

“Ttest_Ex4” Sheet

Number of Tails? Two - Because we are testing for any difference (less than or greater than).

One or Two Sample? One (sort of) - We are comparing our data with a national average, so this is a one-sample test; however, the Data Analysis ToolPak doesn’t have a one-sample test, so we must make a column of 290s and specify a two-sample test.

Paired or Unpaired? Unpaired - Two separate groups are measured. Also, we don't know the variance of the two populations, so we will assume they are unequal.

T-Test: Example 4

"TV's R Us" is opening a store in Sunnydale, CA and wishes to see if the town’s average television viewing time differs from the national average of 290 minutes per day. The data from the 15 residents they surveyed is below.

“Ttest_Ex4” Sheet

Because P is less than .05, we reject the null hypothesis. Sunnydale’s viewing does differ from the national average.

T.Test() Function

T.Test(Array1, Array2, Tails, Type)

Tails has two choices:

1 for a one-tailed test.

2 for a two-tailed test.

Type has three choices:

1 for paired.

2 for unpaired with equal variance.

3 for unpaired with unequal variance.

Z-Test (When N>30)

Like a T-Test, this is used to determine how similar two sets of data are. When the number of data points is greater than 30, use a Z-Test rather than a T-Test.

Requirements:

• Variances of the samples should be close (F-Test).

• Data does not have to be normally distributed.

• Sample sizes don’t have to be the same but should be close.

• Data points should be independent of each other.

Note that if you know the standard deviation of the population, you should always use a Z-Test regardless of the sample size.

Z-Test (n>30)

Steps in Excel

1. Compute the variance of both data sets: Var.S(Range)

2. Perform a Z-Test: Data Analysis ToolPak
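
The two steps above can be sketched outside Excel as follows. This is a minimal Python sketch under stated assumptions: the function name and the burn-time data are hypothetical, and the sample variances are used in place of the population variances, which is what entering the Var.S() results into the ToolPak amounts to.

```python
import math
from statistics import NormalDist, mean, variance

def two_sample_z(old, new):
    """Return (z, one_tail_p, two_tail_p) for the difference mean(new) - mean(old)."""
    # Step 1: sample variances (the Var.S step) feed the standard error.
    std_error = math.sqrt(variance(old) / len(old) + variance(new) / len(new))
    # Step 2: the z statistic and its tail probabilities under the normal curve.
    z = (mean(new) - mean(old)) / std_error
    one_tail_p = 1.0 - NormalDist().cdf(abs(z))
    return z, one_tail_p, 2.0 * one_tail_p

# Hypothetical burn times (minutes): 41 old items and 35 new items.
old_items = [60 + (i % 7) for i in range(41)]
new_items = [62 + (i % 5) for i in range(35)]

z, p_one, p_two = two_sample_z(old_items, new_items)
print(f"z = {z:.3f}, one-tail p = {p_one:.5f}, two-tail p = {p_two:.5f}")
```

The one-tail value is the one to read when the hypothesis names a direction, as in "the new items last longer".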

Z-Test (n>30)

Exercise 1

A company manufacturing “Easy-Start” fire logs wants to know if a new manufacturing treatment increases the logs' burning time. They have tested 41 of their old logs and 35 of their new logs and have recorded the burning times in the table shown to the right. They now wish to see if the difference they are seeing in the two means is due to randomness or if the new logs actually do burn longer.

“Ztest_Ex1” Sheet

Z-Test (n>30)

Exercise 1

Step 1: Find the Variance for both the Old and New logs.

=VAR.S(A5:A45)

and

=VAR.S(B5:B39)

“Ztest_Ex1” Sheet

Z-Test (n>30)

Exercise 1

Step 2: Perform a Z-Test to see if the samples differ

Because our p-value for a one-tail test (0.00913) is less than 0.05, we reject the null hypothesis that the means are the same. In other words, the difference in our means is unlikely to be due to randomness, and the new logs do burn longer than the old logs.

Linear Regression

Cause and Effect of Two Variables

Linear Regression - Slope

Rise Over Run

The slope of a line tells you how much a dependent variable (Sales) changes for each one-unit change in an independent variable (Advertising Costs). It is the RISE / RUN.

Slope = (Y2-Y1)/(X2-X1) = (230-200)/(20-10) = 30/10 = 3

A slope of 3 indicates that for every dollar of advertising we spend, our sales increase by 3 dollars.

Slope(Dependent Range Y, Independent Range X)

Linear Regression – Slope Exercise 1

For each dollar of advertising we spend, how much do our sales go up?

Use Excel’s Slope() function.

It uses the Least Squares Method (Line of Best Fit) to find the slope using multiple data points.

“Slope_and_Intercept” Sheet

Linear Regression – Slope Exercise 1

For each dollar of advertising we spend, how much do our sales go up?

“Slope_and_Intercept” Sheet

Solution

For every dollar of advertising we spend, our sales go up $2.86.

Linear Regression – Y Intercept

The Y-Intercept is where your line intersects the Y axis and tells you what the value of your Y coordinate would be if its corresponding X coordinate were zero. In this example, it tells us what sales would be if we spent nothing on advertising.

Intercept: y = mx + b

Where:

y = the y coordinate of any data point on the line.

x = the corresponding x coordinate for the y coordinate given above.

m = the slope of the line.

b = the y-intercept.

Intercept(Dependent Range Y, Independent Range X)
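
For readers who want to see what a least-squares slope and intercept involve, here is a minimal Python sketch. The function names mirror the Excel functions only for readability, the argument order (Y range first, then X range) follows Excel's, and the advertising and sales figures are hypothetical points chosen to lie exactly on a line with slope 3.

```python
def slope(ys, xs):
    """Least-squares slope of ys (dependent) against xs (independent)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two ranges of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def intercept(ys, xs):
    """Least-squares y-intercept: the fitted line passes through the two means."""
    return sum(ys) / len(ys) - slope(ys, xs) * sum(xs) / len(xs)

# Hypothetical data: every point sits on y = 3x + 170.
ad_costs = [10, 20, 30, 40]
sales = [200, 230, 260, 290]

m = slope(sales, ad_costs)
b = intercept(sales, ad_costs)
print(f"y = {m:.2f}x + {b:.2f}")
```

With real data the points will not sit exactly on the line, and the fitted slope is the value that minimizes the squared vertical distances to it.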

Linear Regression – Intercept Exercise 1

If we didn’t spend anything on advertising, what would our sales be?

Use Excel’s Intercept() function to find the Y-Intercept.

Intercept(Dependent Range Y, Independent Range X)

“Slope_and_Intercept” Sheet

Linear Regression – Intercept Exercise 1 Solution

If we didn’t spend anything on advertising, what would our sales be?

“Slope_and_Intercept” Sheet

Solution

If we spent nothing on advertising, our sales would be $179,192.

Plotting The Line

Meaning of R Squared

R² is the square of the correlation and represents the proportion of the variation in the dependent variable (sales) that is "explained" by the independent variable (advertising). In other words, it checks how closely sales track advertising. It is always between 0 and 1, with 1 being a 100% correlation and 0 being no correlation.

In this example R² is .8548 (or 85.5%), which is a pretty strong correlation, indicating that advertising does indeed track sales closely. It also means that the remaining 14.5% of the variation in sales is not explained by advertising.
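
One way to see where an R² value comes from is to compare the scatter left over after fitting the line with the total scatter in Y. The sketch below is illustrative only: the function name and the five data points are hypothetical and are not the workshop data.

```python
def r_squared(ys, xs):
    """Share of the variation in ys that a least-squares line on xs accounts for."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    # Variation left over after fitting the line (residuals).
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total variation of ys around their own mean.
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_residual / ss_total

# Hypothetical advertising (x) and sales (y) figures.
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r2 = r_squared(sales, advertising)
print(f"R squared = {r2:.2f}")
```

Here 60% of the variation in the hypothetical sales figures is accounted for by the line and 40% is not.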

Forecasting Future Values

(Linear Regression)

[Chart: Hardware Sales and New Home Startups by year, 2005-2010]

Regression Formula - Predicts a single data point.

Forecast() - Predicts a single data point.

Trend() - Predicts multiple data points.

Exercise 1: Predicting a Y Value Using Slope, Y-Intercept, and the Corresponding X Value

Y Value = Slope * X Value + Y Intercept (Y = mX + b)

“Regression_Ex1” Sheet

Forecast sales for the second half of the year based on sales for the first half using Y=mX+b

Exercise 1: Predicting a Y Value Using Slope, Y-Intercept, and the Corresponding X Value

Y Value = Slope * X Value + Y Intercept (Y = mX + b)

“Regression_Ex1” Sheet

1. Calculate Slope in F2: =SLOPE(C2:C7,A2:A7)

2. Calculate Y-Intercept in F3: =INTERCEPT(C2:C7,A2:A7)

3. Predict July Sales in C8: = Slope * X Value + Y Intercept, i.e. =$F$2*A8+$F$3

4. Copy down.

Solution

Exercise 2: Forecasting a Y Value using Forecast()

“Forecast_Ex2” Sheet

Forecast(x, Known y's, known x's)

X is the known matching independent variable for the dependent variable Y you are trying to predict.

Known Y's are the known dependent variables. (Hardware sales in this example.)

Known X's are the known independent variables. (New Home Start-ups in this example.)

We believe that the number of new home start-ups in our state has a strong impact on our sales and is a good indicator of what our hardware sales will be. Predict what our Hardware Sales will be in 2014.

Exercise 2: Forecasting a Y Value using Forecast()

“Forecast_Ex2” Sheet

Forecast(x, Known y's, known x's)

X is the known matching independent variable for the dependent variable Y you are trying to predict.

Known Y's are the known dependent variables. (Hardware sales in this example.)

Known X's are the known independent variables. (New Home Start-ups in this example.)

Solution

We believe that the number of new home start-ups in our state has a strong impact on our sales and is a good indicator of what our hardware sales will be. Predict what our Hardware Sales will be in 2014.

Exercise 3: Forecasting a Y Value using Forecast()

“Forecast_Ex3” Sheet

Use Forecast() to predict future sales based upon Advertising Cost and Sales

Exercise 3: Forecasting a Y Value using Forecast()

“Forecast_Ex3” Sheet

Use Forecast() to predict future sales based upon Advertising Cost and Sales

Solution

Exercise 4: Forecasting a Y Value using Forecast()

“Forecast_Ex4” Sheet

How much should you spend on advertising to achieve your desired sales?

Switch what you use for your dependent and independent variable ranges.
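
Swapping the ranges fits a new line with advertising as the dependent variable, which is not quite the same as solving the original sales line backwards. The Python sketch below illustrates the difference under stated assumptions: the helper names and the figures are hypothetical, and `forecast` only mimics the argument order of Forecast().

```python
def fit_line(ys, xs):
    """Least-squares (slope, intercept) for ys as a function of xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def forecast(x, known_ys, known_xs):
    """Single predicted value, with arguments in the order (x, known y's, known x's)."""
    slope, intercept = fit_line(known_ys, known_xs)
    return slope * x + intercept

# Hypothetical figures (in $1,000s).
ad_costs = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
target_sales = 5

# Swapped ranges: advertising is treated as the dependent variable.
ads_swapped = forecast(target_sales, ad_costs, sales)

# Alternative: fit sales on advertising, then solve y = mx + b for x.
m, b = fit_line(sales, ad_costs)
ads_inverted = (target_sales - b) / m

print(f"swapped ranges: {ads_swapped:.2f}, inverted line: {ads_inverted:.2f}")
```

The two answers agree only when the points fall exactly on a line; the weaker the correlation, the further apart they drift.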

Exercise 4: Forecasting a Y Value using Forecast()

“Forecast_Ex4” Sheet

How much should you spend on advertising to achieve your desired sales?

Solution

Trend() Exercise 1: Using Trend() to Predict Multiple Values

“Trend_Ex1” Sheet

=Trend(Known Y's, Known X's, New X's, Constant T/F)

Known Y's: These are the historic dependent variables (e.g. Sales for the 1st half of the year).

Known X's: These are the historic independent variables (e.g. Ad Costs for the 1st half of the year).

New X's: These are the independent variables we plan to use (e.g. Ad Costs for the 2nd half of the year).

Constant (Optional): If True or omitted, b is calculated normally. If False, b is set to 0 and the m-values are adjusted so that y = mx.

Trend() is an array function, so:

a. Highlight where you want your results (C9:C14).

b. Type your formula.

c. Press Control + Shift + Enter.
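
The effect of the Constant argument can be sketched in Python for the single-variable case. This is an illustration, not Excel's implementation: the `trend` function below only imitates the argument order of Trend(), and the monthly figures are hypothetical.

```python
def fit_line(known_ys, known_xs, const=True):
    """Return (m, b). With const=False the line is forced through the origin."""
    n = len(known_xs)
    if const:
        mean_x = sum(known_xs) / n
        mean_y = sum(known_ys) / n
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(known_xs, known_ys))
        sxx = sum((x - mean_x) ** 2 for x in known_xs)
        m = sxy / sxx
        return m, mean_y - m * mean_x
    # b is fixed at 0, so the least-squares slope becomes sum(xy) / sum(x^2).
    m = sum(x * y for x, y in zip(known_xs, known_ys)) / sum(x * x for x in known_xs)
    return m, 0.0

def trend(known_ys, known_xs, new_xs, const=True):
    """Predict one y for every new x, like an array formula filling a range."""
    m, b = fit_line(known_ys, known_xs, const)
    return [m * x + b for x in new_xs]

# Hypothetical data: months 1-6 are known, months 7 and 8 are to be predicted.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 130, 140, 150]

with_b = trend(sales, months, [7, 8])
without_b = trend(sales, months, [7, 8], const=False)
print(with_b, [round(v, 2) for v in without_b])
```

Forcing b to 0 changes the predictions noticeably here, because the hypothetical sales would not be zero at month 0.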

Trend() Exercise 1: Using Trend() to Predict Multiple Values

“Trend_Ex1” Sheet

=Trend(Known Y's, Known X's, New X's, Constant T/F)

Trend() is an array function, so:

a. Highlight where you want your results (C9:C14).

b. Type your formula.

c. Press Control + Shift + Enter.

Solution

Trend() Exercise 2: Using Trend() For Multiple Regression

“Trend_Ex2” Sheet

=Trend(Known Y's, Known X's, New X's, Constant T/F)

If the array of known Y values (E3:E13) is in a single column, then Trend() interprets each column of known X values (A3:D13) as a separate variable.

Exercise

In this example we have an office building we wish to sell, but we are unsure of what our asking price should be. To help us determine our asking price, we have gathered data on office buildings similar to ours and in our area that have sold recently (A3:E13).

Trend() Exercise 2: Using Trend() For Multiple Regression

“Trend_Ex2” Sheet

=Trend(Known Y's, Known X's, New X's, Constant T/F)

=Trend(E3:E13,A3:D13,A17:D17)
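
For the curious, a multiple regression of this kind can be sketched in plain Python by solving the least-squares normal equations. This is a minimal sketch under stated assumptions, not a description of how Excel computes Trend(): the helper names are hypothetical, only two X columns are used instead of four, and the data is invented so that every value follows y = 10 + 2a + 3b exactly.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X columns are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_multiple(known_ys, known_xs):
    """Return [b, m1, m2, ...] from the normal equations (X'X) coef = X'y."""
    rows = [[1.0] + [float(v) for v in row] for row in known_xs]  # leading 1 = intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, known_ys)) for i in range(k)]
    return solve(xtx, xty)

def trend_multi(known_ys, known_xs, new_xs):
    """Predict one y per row of new_xs; each column of known_xs is one variable."""
    coef = fit_multiple(known_ys, known_xs)
    return [coef[0] + sum(c * x for c, x in zip(coef[1:], row)) for row in new_xs]

# Hypothetical data: two X columns (a, b) and y = 10 + 2a + 3b.
known_xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
known_ys = [18, 17, 28, 27, 35]

prediction = trend_multi(known_ys, known_xs, [[6, 2]])[0]
print(f"predicted y = {prediction:.2f}")
```

Each row of `known_xs` plays the role of one sold building, and the single row passed as `new_xs` plays the role of the building being priced.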

Chi-Square Test

Used to test how likely it is that observed data is due to chance. For example, we randomly asked 44 men and 44 women whether they preferred red or white wine. As the table to the right shows, it appears that men prefer red and women prefer white, but is this true or just random chance?

About Chi-Square

• Chi-Square Tests the Null Hypothesis

Chi-Square tests the null hypothesis (that there is no relationship between two measured phenomena) against our data. Therefore, you will need two data sets to compare: expected and actual. A value of less than .05 indicates that the null hypothesis is weak and our observed results are unlikely to be due to randomness; in other words, our data does appear to reflect reality.

• Difference Between Chi-Square and a T or Z Test

Chi-Square tests the null hypothesis about a relationship between (usually) two variables (e.g. do men and women have a preference for red wine, white wine, or beer?). The T-Test tests the null hypothesis about the difference between two means (e.g. whether boys and girls in kindergarten have the same average height).

Using Chi-Square

• Use Chi-Square with Categorized Data Only (i.e. data that has been counted and divided into categories).

• Do not use Chi-Square with uncategorized continuous data such as height, weight, or scores. (However, you can use Chi-Square if you have categorized your ranges. For example: count of ages 0-25, 26-50, 51-75, etc.)

Using Chi-Square

• Do Not Use with Derived Data, Percentages, or Ratios (e.g. estimations, assumptions, averages, or some other statistical manipulation).

• Do Not Use Chi-Square if the Count in Any Category is Less Than 5.

• Variables Must be Mutually Exclusive (e.g. a person cannot like both red and white wine. It is one or the other).

• Data in the Set Should Never be Excluded (e.g. if you asked people if they liked red wine, white wine, or Champagne, you must use all three).

Exercise 1: Chi-Square

We asked 44 men and 44 women if they preferred red or white wine. Our results are shown in the table to the right.

“Chi_Square_2x2” Sheet

Exercise 1: Chi-Square

Testing our data involves three steps:

Step A: Categorize the Results

Step B: Build a Model of Expected Data Representing the Null Hypothesis

Step C: Perform Chi-Square Test

“Chi_Square_2x2” Sheet

Exercise: Chi-Square

Part A: Categorize the Results Using CountifS()

“Chi_Square_2x2” Sheet

1. =COUNTIFS($A:$A,E$3,$B:$B,$D4)

2. Copy and Total

=CountifS(Criteria Range1, Criteria1, Criteria Range2, Criteria2)
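
The same two-criteria counting can be sketched in Python with a Counter. This is an illustration with hypothetical responses, not the data from the “Chi_Square_2x2” sheet, and the `crosstab` helper is an invented name.

```python
from collections import Counter

def crosstab(col_values, row_values, col_labels, row_labels):
    """Count how many records fall into each (row label, column label) cell."""
    pair_counts = Counter(zip(row_values, col_values))
    # One inner list per row label, one entry per column label.
    return [[pair_counts[(r, c)] for c in col_labels] for r in row_labels]

# Hypothetical survey responses, one entry per person.
genders = ["Male", "Male", "Female", "Female", "Male", "Female"]
wines = ["Red", "White", "White", "White", "Red", "Red"]

table = crosstab(genders, wines, ["Male", "Female"], ["Red", "White"])
for label, row in zip(["Red", "White"], table):
    print(label, row, "total:", sum(row))
```

Each cell corresponds to one CountifS() formula: a record is counted only when both of its criteria match.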

Exercise: Chi-Square

Part B: Build a Model of Expected Data Representing the Null Hypothesis

“Chi_Square_2x2” Sheet

Use the same column and row totals as above. We had 44 males and 44 females, and 38 people liked red and 50 people liked white wine.

The null hypothesis states that there is no preference, so we need the male-to-female figures to reflect that. In E11 type the formula below, then copy.

=E$13*$G11/$G$13

Exercise: Chi-Square

Part C: Determine Chi-Square using ChiSq.Test()

“Chi_Square_2x2” Sheet

=ChiSq.Test(E4:F5,E11:F12)

=ChiSq.Test(Actual Range, Expected Range)

.000107 is less than .05, indicating that the null hypothesis is weak and our results are unlikely to be due to random chance.

In other words, for the people who answered our survey, the assertion that more women preferred white wine while more men preferred red wine is valid and not due to chance.
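
Parts B and C can be sketched together in Python for a 2x2 table. The observed counts below are hypothetical: they were chosen only so that the totals match the 44/44 and 38/50 figures in the exercise and are not taken from the worksheet. The p-value shortcut used here (the complementary error function) is valid only for a 2x2 table, which has one degree of freedom.

```python
import math

def chi_square_2x2(observed):
    """Return (expected, chi2, p) for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count under the null hypothesis: row total * column total / grand total.
    expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    # With one degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return expected, chi2, p

# Hypothetical counts. Rows: Red, White. Columns: Male, Female.
observed = [[28, 10],
            [16, 34]]

expected, chi2, p = chi_square_2x2(observed)
print("expected:", expected)
print(f"chi-square = {chi2:.4f}, p = {p:.6f}")
```

With these counts every expected cell is 19 (red) or 25 (white), and the p-value comes out at roughly 0.0001, well under .05.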

Correlation

Is there really a correlation between our X and Y variables?

Correlation

Is there really a correlation between our hardware sales and the number of new housing starts?

=Correl(Array1,Array2)

[Scale from -1 to 1: -1 = perfect inverse correlation, 0 = no correlation, 1 = perfect correlation]

Forecast_Ex2 sheet
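
A correlation coefficient of this kind can be sketched in Python as follows. The `correl` function below is a hypothetical stand-in that follows the usual Pearson formula, and the three tiny data sets are invented to land on the ends and the middle of the -1 to 1 scale.

```python
import math

def correl(array1, array2):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(array1)
    mean1 = sum(array1) / n
    mean2 = sum(array2) / n
    s12 = sum((a - mean1) * (b - mean2) for a, b in zip(array1, array2))
    s11 = sum((a - mean1) ** 2 for a in array1)
    s22 = sum((b - mean2) ** 2 for b in array2)
    return s12 / math.sqrt(s11 * s22)

# Hypothetical data sets.
perfect = correl([1, 2, 3], [2, 4, 6])   # y rises in step with x
inverse = correl([1, 2, 3], [6, 4, 2])   # y falls in step with x
none = correl([1, 2, 3], [5, 1, 5])      # no linear relationship

print(perfect, inverse, none)
```

Values close to 1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none.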

Correl(Array1,Array2)

Forecast_Ex2 sheet

Correlation Exercise 1: Determine if there really is a correlation between our hardware sales and new home start-ups.

Correl(Array1,Array2)

Forecast_Ex2 sheet

Solution to Correlation Exercise 1

There is a nearly perfect correlation. New home start-ups are an excellent predictor of the sales we can expect in our hardware store.

Correl(Array1,Array2)

Is there a correlation between Interest rates and average home price in the last few years?

Is there a correlation between the age of an automobile and its average resale value?