Excel Statistics 2010 Manual

Manual of Excel Statistics Usage. Detailed Explanation and suitable examples


  • 5/23/2018 Excel Statistics 2010 Manual


    UCL

    INFORMATION SERVICES DIVISION

    INFORMATION SYSTEMS

    Document No. IS-113

    Excel Statistical Functions and Formulae

    Document No. IS-113 September 2008

    Contents

    CREATING AND EDITING VALUES
        CALCULATING A NEW VALUE
        RECODING A VARIABLE
        MISSING VALUES
        TASK: RECODING AND COMPUTING
    CONDITIONAL FORMATTING OF DATA
        CONDITIONAL FORMATTING TO SHOW OUTLIERS
    DESCRIPTIVE MEASURES
    MEASURES OF CENTRAL TENDENCY
        CALCULATING THE MEAN, MEDIAN OR MODE USING EXCEL FUNCTIONS
        USING FORMULAE IN CELLS TO CALCULATE DESCRIPTIVE STATISTICAL MEASURES
            Mode
            Median
            Mean
        CALCULATING THE MEAN BY HAND
            Summing the data values
            Computing N
            The mean
        TASK: SIMPLE CALCULATIONS AND DESCRIPTIVES
    MEASURES OF DISPERSION
            Range
            Variance
            Calculating the sample variance by hand
            Standard Deviation
            Quartiles and the Interquartile Range
        TASK: DISPERSION
    INDICATORS OF SHAPE
        SKEWNESS
        KURTOSIS
    FREQUENCY
        TASK: FREQUENCIES
    MEASURES OF ASSOCIATION - CONTINUOUS VARIABLES
        CORRELATION COEFFICIENT
            Using an Excel function
        SIMPLE LINEAR REGRESSION
            Using Excel functions
        MORE REGRESSION: VISUALISATION
            Linear regression equations by hand
            Implicitly applying regression to the sample data
        TASK: REGRESSION
    TRENDS
    CHI-SQUARED NON-PARAMETRIC TESTING
        TASK SIX: INDEPENDENCE OF NOMINAL VARIABLES
    THE ANALYSIS TOOLPAK
    ANOVA

    Introduction

    This workbook has been prepared to help you to:

    manage and code data for analysis in Excel, including recoding, computing new values and dealing with missing values;

    develop an understanding of Excel Statistical Functions;

    learn to write complex statistical formulae in Excel worksheets.

    The course is aimed at those who have a good understanding of the basic use of Excel and sound statistical understanding.

    It is assumed that you have attended the Introduction to Excel Formulae & Functions course or have a good working knowledge of all the topics covered on that course. In particular, you should be able to do the following:

    Edit and copy formulae;

    Use built-in functions such as Sum, Count, Average, SumIf, CountIf and AutoSum;

    Use absolute and relative cell referencing;

    Name cells and ranges.

    You should also have some familiarity with basic statistical measures and tests. If you are uncertain about the statistical knowledge assumed by the course you may wish to use the list of key terminology and symbols to revise.

    Excel has a number of useful statistical functions built in, but there are also some caveats about its statistical computations. For this reason, and to allow more flexibility, in this course we demonstrate some handcrafted techniques as well. First we look at some techniques to help you manage data, then descriptive statistics, and measures of association (covering correlation and regression). We move on to some special Excel functions using the goal seeking and solver techniques and then we introduce the Analysis ToolPak, which we demonstrate by way of a single factor Anova.

    This guide can be used as a reference or tutorial document. To assist your learning, a series of practical tasks is available in a separate document. You can download the training files used in this workbook from the IS training web site at: www.ucl.ac.uk/isd/common/resources/

    We also offer a range of IT training for both staff and students including scheduled courses, one-to-one support and a wide range of self-study materials online. Please visit www.ucl.ac.uk/isd/common/resources/ for more details.

    Creating and Editing Values

    Although Excel doesn't provide the sophisticated data coding techniques of a specialist statistical application, there are useful methods for accomplishing some common data management tasks.

    Calculating a new value

    Open the file results.xls. You will see the following data in sheet 1:

    In a spreadsheet we use the term range to mean a rectangle of data. A range might look like this, for example

    which is the range A1:D9; or like this

    which is the range A1:A18. We specify a range by giving its first cell, a colon, and the last cell. You can name a range; the simplest way is to highlight the range and type the name in the Name Box, like this

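    In a formula we can now refer to the range B2:B13 as maths. You should name the English and History columns in the same way.

    We label column G Mean Result and then enter the following formula in cell G2

    =(maths+english+history)/3

    and then copy the formula using the fill handle down to row 31. Because each name stands for a single value here, Excel takes the cell of the named range that lies in the same row as the formula, so this calculates the average exam score for each pupil. (Writing =sum(maths,english,history)/3 would instead add up every cell in all three ranges and give the same figure in every row.)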

    Recoding a variable

    Often analysis requires that we recode a variable. Sometimes this is straightforwardly because we wish, for example, to change the designation of gender as M or F to 1 or 2. On other occasions we wish to collapse a continuous value variable into a categorical variable. In the latter case we should usually recode into a new variable, i.e. non-destructively.

    As a first example we will use the IF function to compute a new variable Gender in the results.xls spreadsheet that assigns each pupil the value M if the variable Sex has the value 1 and the value F if Sex has the value 2.

    The general format of an IF statement is

    IF(logical_test,value_if_true,value_if_false)

    In our example the formula could be this:

    =IF(B2=1,"M","F")

    But notice that this would code any empty cells as F, which is probably not what we want. Be aware that we could have a nested IF statement and that, if we do, our catch-all default condition comes as the last argument of the innermost IF. Suppose that we wish to recode the Maths score into three grades. Our formula might look like

    =IF(maths
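The logic of a nested IF can be sketched in Python. This is only an illustration: the grade boundaries (50 and 70), the grade labels and the treatment of empty cells are hypothetical choices, not values taken from the course material.

```python
def recode_maths(score, low_cut=50, high_cut=70):
    """Recode a score into three grades, mirroring a nested IF.

    The cut-offs are hypothetical and only illustrate the structure.
    """
    if score is None:      # empty cell: keep it missing rather than defaulting
        return None
    if score < low_cut:    # outer IF
        return "C"
    if score < high_cut:   # inner (nested) IF
        return "B"
    return "A"             # catch-all default, like the last IF argument


scores = [35, 50, 69, 70, None]   # made-up scores, including one empty cell
grades = [recode_maths(s) for s in scores]
print(grades)
```

Testing for the empty case first avoids the problem noted above, where a missing value silently falls through to the default category.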

    Conditional Formatting of Data

    Conditional formatting to show outliers

    It is often useful to identify atypical data values, for example outliers that are very much larger or very much smaller than the mean. Several characterisations of an outlier have been proposed; in what follows I take an outlier to be a value that lies more than one and a half times the interquartile range above or below the mean. Consider

    Here two cells are coloured by conditional formatting because they are outliers by my definition. The formulae in the cells are

    Then in the conditional formatting dialog enter the following

    The result is to highlight the outliers. You may also find it useful to highlight missing values.
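The outlier rule used in this section (more than one and a half times the interquartile range away from the mean) can be sketched in Python as a cross-check. The data values are made up, and the sketch assumes that the "inclusive" quartile method of Python's statistics module corresponds to the interpolation used by Excel's quartile function.

```python
from statistics import mean, quantiles

data = [52, 55, 57, 58, 60, 61, 63, 64, 66, 95]   # made-up scores

# Quartiles by the inclusive method (assumed to match Excel's QUARTILE)
q1, _median, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

centre = mean(data)
lower = centre - 1.5 * iqr   # anything below this is flagged
upper = centre + 1.5 * iqr   # anything above this is flagged

outliers = [x for x in data if x < lower or x > upper]
print(outliers)
```

The two bounds play the role of the cells that the conditional formatting rule compares each value against.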

    Descriptive measures

    Below is a list of common Excel functions used for descriptive statistical measures.

    Function                           What it does
    SUM(range)                         Adds a range of cells
    SUMIF(range,criteria,sum_range)    Adds cells from sum_range if the condition specified in criteria on range is met
    AVERAGE(range)                     Calculates the mean (arithmetic average) of a range of cells
    MEDIAN(range)                      Calculates the median value for a data set; half the values in the data set are greater than the median and half are less than the median
    MAX(range)                         Returns the maximum value of a data set
    MIN(range)                         Returns the minimum value of a data set
    SMALL(range,k), LARGE(range,k)     Returns the kth smallest or kth largest value in a specified data range
    COUNT(range)                       Counts the number of cells containing numbers in a range
    COUNTA(range)                      Counts the number of non-blank cells within a range
    COUNTBLANK(range)                  Counts the number of blank cells within a range
    COUNTIF(range,value)               Counts the number of cells in range that are the same as value
    VAR(range) and VARP(range)         Calculates the variance of a sample (VAR) or an entire population (VARP); equivalent to the square of the standard deviation
    STDEV(range) and STDEVP(range)     Calculates the standard deviation of a sample (STDEV) or an entire population (STDEVP); the standard deviation is a measure of how much values vary from the mean

    Each of these can be accessed from the menu sequence Insert | Function, by using the function wizard, or by writing a formula in a cell. Some of these are discussed in more detail below.

    Measures of central tendency

    The most common measures of central tendency are the mean, median and mode.

    Calculating the Mean, Median or Mode using Excel functions

    First, open a spreadsheet containing the numeric data.

    Click on a blank cell where you will paste a function to calculate the mean, median or mode.

    Using the series fill function, enter the series of integer values 1 to 10 in cells A6 to A15.

    Next click on the function wizard button.

    From the drop down list Or select a category, select Statistical.

    Click on Average to highlight it, then on OK.

    Using the mouse, highlight the cells containing the data range just entered, or select the data by first clicking the collapse icons.

    These are the collapse icons and are used in selecting ranges in many Excel dialogues.

    Excel previews the result of applying the function here.

    Notice that as you fill in the ranges Excel previews the value that will result from applying the function.

    Click OK.

    The value of the mean will now appear in the blank cell you selected in step 2.

    To calculate the median or mode, follow the same procedure but highlight MEDIAN or MODE in step 4. Alternatively you can enter the formulae directly into spreadsheet cells as shown below. All the statistical functions are accessed in the same way and have a similar interface.

    Using formulae in cells to calculate descriptive statistical measures

    Mode

    The syntax for this computation is

    =mode(range)

    Median

    The syntax for this computation is

    =median(range)

    Mean

    There is a built in Excel function that returns the mean as its value

    =average(range)

    It is often useful to put the result of this function into a suitably named cell in a spreadsheet.

    Calculating the mean by hand

    We will break down the formula

    $$\bar{x} = \frac{\sum x}{N}$$

    into two parts: the summation of the values of x and the calculation of N.

    Summing the data values

    In a blank cell enter the formula =sum(range).

    Computing N

    Before we calculate the mean, we need to find out the value of N, the number of subjects or observations. The way to do this in Excel is to use the Count() function over the range of values. In a blank cell enter the formula =count(range).

    The mean

    The mean can now be calculated by dividing the sum of the x data range by N. We enter =sum(range)/count(range)
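The same two-step calculation can be sketched in Python, which is a convenient way to check a worksheet result. The scores are made up for illustration. Note one assumption: `len` counts every item in the list, whereas Excel's count() counts only the numeric cells in a range.

```python
from statistics import mean

scores = [4, 8, 15, 16, 23, 42]   # made-up data values

total = sum(scores)               # corresponds to =sum(range)
n = len(scores)                   # corresponds to =count(range) for all-numeric data
mean_by_hand = total / n          # corresponds to =sum(range)/count(range)

# The built-in mean plays the role of =average(range)
print(mean_by_hand, mean(scores))
```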

    Task: Simple Calculations and Descriptives

    Using results.xls

    Find the mean exam score for each subject (ie English, History, Maths);

    Find the median exam score in each subject

    Find the modal exam score in each subject.

    Find the mean and the mode again, but this time without using the built in Excel mode and average functions.

    You will need to use the functions

    o Frequency

    o Max

    o Count

    o Sum

    Using medicaltrialX.xls

    What is the average score for hbefore for men?

    Use sumif and countif for this task. Sumif will sum just the scores where the gender variable indicates male and countif will count just those.
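As a sketch of the idea behind combining sumif and countif, the following Python computes a conditional mean. The rows are invented for illustration and are not taken from medicaltrialX.xls; the coding of gender as "M" and "F" is also an assumption.

```python
# Each row is (gender, hbefore); all values are hypothetical
rows = [("M", 12.1), ("F", 13.4), ("M", 14.3), ("F", 12.8), ("M", 13.2)]

# Like SUMIF: add the scores only where the gender is "M"
male_total = sum(score for gender, score in rows if gender == "M")

# Like COUNTIF: count only the rows where the gender is "M"
male_count = sum(1 for gender, _score in rows if gender == "M")

male_mean = male_total / male_count
print(male_mean)
```

Dividing the conditional sum by the conditional count gives the mean for that group only, which is what the task asks for.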

    Measures of Dispersion

    Range

    The range of a sample is the largest score minus the smallest score. This can be calculated using the Excel formula

    =(max(range))-(min(range))

    Variance

    The variance is calculated as follows.

    $$S^2 = \frac{\sum (x - \bar{x})^2}{N}$$

    gives the population variance and

    $$S^2 = \frac{\sum (x - \bar{x})^2}{N - 1}$$

    gives the sample variance.

    This formula depends upon first calculating $\bar{x}$ and N, which we have already seen above. Indeed you will see that this is just a variation on the formula for calculating the mean: it calculates the mean of the squared deviations.

    The Excel function to calculate the variance for a population is

    varp(range)

    and for a sample

    var(range)

    You can access both from the function wizard or use them by typing formulae in cells.

    Calculating the sample variance by hand

    As with the mean we break down the formula into its constituent parts. We will calculate $\bar{x}$ as

    =sum(range)/count(range)

    which is the mean of x (see above). Put this value into a blank cell. Next, for each value of x compute

    $$x - \bar{x}$$

    That is =B1-average($B$1:$B$31), for example, in our data; the absolute reference keeps the averaged range fixed when the formula is copied. Copy this formula into a new data range (let's imagine it is F1:F31 for this example). We then calculate for each of these its square, which will give us

    $$(x - \bar{x})^2$$

    That is =F1^2 copied for each data item.

    We sum this range to get

    $$\sum (x - \bar{x})^2$$

    (that is, the sum of squares). It is straightforward to divide this by N-1, with N calculated as above.
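The step-by-step variance calculation can be sketched in Python, with each intermediate list standing in for a worksheet column. The data values are made up; the library functions `variance` and `pvariance` are used only as a check, in the way var() and varp() would be in Excel.

```python
from statistics import pvariance, variance

x = [2, 4, 4, 4, 5, 5, 7, 9]              # made-up data values

n = len(x)                                 # N
x_bar = sum(x) / n                         # the mean of x
deviations = [v - x_bar for v in x]        # x minus the mean, one per data value
squares = [d ** 2 for d in deviations]     # squared deviations
sum_of_squares = sum(squares)

sample_variance = sum_of_squares / (n - 1)     # divide by N-1 for a sample
population_variance = sum_of_squares / n       # divide by N for a population

print(sample_variance, variance(x))
print(population_variance, pvariance(x))
```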

    Standard Deviation

    The Standard Deviation is the square root of the variance. You could calculate it with the formula

    =sqrt(var(range))

    or by using the appropriate function, either stdev(range) or stdevp(range). Alternatively you could calculate the variance by hand as above and take the square root.

    Quartiles and the Interquartile Range

    The quartiles can be found using

    =quartile(range,q)

    where q is just the rank of the quartile you require (first, second, third).

    The interquartile range is given by subtracting the first quartile from the third:

    =quartile(range,3)-quartile(range,1)
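For comparison, here is a minimal Python sketch of the standard deviation and the interquartile range. The data are made up, and the sketch assumes that the "inclusive" method of `statistics.quantiles` interpolates in the same way as Excel's quartile function.

```python
from math import sqrt
from statistics import pstdev, quantiles, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]      # made-up data values

sample_sd = stdev(data)              # like stdev(range)
population_sd = pstdev(data)         # like stdevp(range)
sd_from_variance = sqrt(variance(data))   # like =sqrt(var(range))

# The three quartiles; "inclusive" is assumed to match Excel's QUARTILE
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                        # like =quartile(range,3)-quartile(range,1)

print(sample_sd, population_sd, iqr)
```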

    Task: dispersion

    Using medicaltrialX.xls

    What is the range of hbefore?

    What is the range of dh?

    What is the variance of hafter?

    o Calculate this using the Excel function; decide whether you should use varp or var

    o Calculate this by hand using one of the formulae

    Variance (population) = $\frac{\sum (x - \bar{x})^2}{N}$

    Variance (sample) = $\frac{\sum (x - \bar{x})^2}{N - 1}$

    What are the standard deviations of hbefore, hafter and dh?

    Indicators of Shape

    Skewness

    To compute the degree of skewness Excel uses the formula

    $$\frac{n}{(n-1)(n-2)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^3$$

    where s is the sample standard deviation. Rather than calculate this by hand we simply note that Excel has a straightforward function that you can use.

    =skew(range)

    The result is a signed numeric value. A negative result is indicative of negative skew, a positive result of positive skew. The normal distribution, with a skew of 0, is the reference value.

    Figure 1 from http://en.wikipedia.org/wiki/Skewness

    Kurtosis

    Excel calculates kurtosis according to the formula:

    $$\left\{ \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum \left( \frac{x_i - \bar{x}}{s} \right)^4 \right\} - \frac{3(n-1)^2}{(n-2)(n-3)}$$

    The function to compute kurtosis is

    =kurt(range)

    A negative value indicates a platykurtic shape while a positive value is indicative of leptokurtic distributions. The normal distribution has a kurtosis of 0 and can be used as a reference value. Some indicative shapes are given in Figure 2.

    Figure 2 http://en.wikipedia.org/wiki/Kurtosis
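The skewness and kurtosis calculations can be sketched in Python using the formulae documented for Excel's skew() and kurt() functions, with s taken as the sample standard deviation. The function names `excel_skew` and `excel_kurt` are hypothetical helpers for this sketch, and the data are made up.

```python
from statistics import mean, stdev


def excel_skew(xs):
    """Sample skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)^3)."""
    n, m, s = len(xs), mean(xs), stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in xs)


def excel_kurt(xs):
    """Sample excess kurtosis, so a normal distribution scores about 0."""
    n, m, s = len(xs), mean(xs), stdev(xs)
    fourth = sum(((x - m) / s) ** 4 for x in xs)
    scaled = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
    return scaled - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


symmetric = [1, 2, 3, 4, 5]      # made-up, symmetric about 3
right_tail = [1, 2, 3, 4, 10]    # made-up, one large value

print(excel_skew(symmetric), excel_skew(right_tail))
print(excel_kurt(symmetric))
```

The symmetric data give a skew of essentially zero, while the long right tail gives a positive value, in line with the interpretation above.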

    Frequency

    Another useful Excel function is frequency. Given a set of data and a set of intervals, frequency counts how many of the values in the data occur within each interval. The data is called a data array and the interval set is called a bins array.

    The format for the frequency function is:

    frequency(data,bins)

    FREQUENCY is an array function. This means that the function returns a set of values rather than just one value. To enter an array function, the range that the array is to occupy must first be selected and the function must be entered by pressing Shift+Ctrl+Enter instead of just Enter or using the mouse.

    The following worksheet contains the examination results for 14 students. The numbers in the column headed Score Below are the bins array.

    Before keying in the function, you must select the range of the array for the result. In this case it will be F8:F17.

    With this range selected, the following function is keyed into the Formula bar:

    =frequency(C4:C17,E8:E17)

    or entered in the dialog

    When you are ready be sure to end by pressing Shift+Ctrl+Enter.

    The array is now filled with data. This data shows that no student scored below 30, 1 student scored between 30 and 39, 3 between 40 and 49, 1 between 50 and 59, 3 between 60 and 69, 1 between 70 and 79, 3 between 80 and 89, and 2 scored between 90 and 100.

    If any of the results are changed, the data in the No. In Range column will be updated automatically.
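The counting rule behind frequency can be sketched in Python. Each bin value is treated as an upper limit: a value is counted in the first bin whose limit it does not exceed, and one extra count collects anything above the last bin. The scores and bin limits below are made up, chosen only so that the counts match the pattern described above; they are not the worksheet's actual values.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count values per bin; bins must be sorted ascending upper limits.

    Returns len(bins) + 1 counts; the last one is for values above every bin.
    """
    counts = [0] * (len(bins) + 1)
    for value in data:
        # Index of the first bin limit that is >= value
        counts[bisect_left(bins, value)] += 1
    return counts


# Hypothetical scores for 14 students and hypothetical bin limits
scores = [35, 41, 45, 48, 55, 60, 64, 69, 72, 81, 85, 88, 93, 100]
bins = [29, 39, 49, 59, 69, 79, 89, 100]

print(frequency(scores, bins))
```

Returning one more count than there are bins is why the selected result range can usefully be one cell longer than the bins array.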

    Task: frequencies

    Using results.xls

    What is the most frequent average exam score?

    Recode average exam scores into

    Stream A: scores of 60 and above

    Stream B: scores of 50 and above

    Stream C: scores below 50

    Determine how many pupils are in each stream.

    Recode maths scores into

    Maths Stream A: scores of 60 and above

    Maths Stream B: scores of 50 and above

    Maths Stream C: scores below 50

    Determine how many pupils are in each stream.

    Measures of Association - continuous variables

    Correlation Coefficient

    The Correlation Coefficient (for a sample) can be calculated according to the following formula:

    $$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[ n \sum x^2 - \left( \sum x \right)^2 \right] \left[ n \sum y^2 - \left( \sum y \right)^2 \right]}}$$

    We would build a complicated formula like this in steps, incrementally, having broken it down to its component parts, each of which could be written simply using standard Excel features.

    Begin with the top part of the fraction. First, we notice that there are two data series, x and y; these are normally represented by columns of data in our worksheet and it is best to name the ranges for use in calculation. I assume you name them X and Y.

    We must compute n using count(range). To compute r we assume that count(X) = count(Y), so counting either is sufficient.

    Now we require a new column containing the product of X and Y, that is =(X*Y). From this column we could calculate the sum or, depending on the complexity of a formula and your confidence, you could compute the complete top half of the formula as

    =(n*sumproduct(X,Y))-(sum(X)*sum(Y))

    where sumproduct(X,Y) adds up the products of corresponding X and Y values (sum(X,Y) would merely add the two columns together). Alternatively you can continue to calculate piecemeal, saving all the intermediate values you require. If we have time, we will construct this formula in the training session.

    Using an Excel function

    The Excel function is

    =correl(x range,y range)

    If you have built your own calculation of r, you can compare your result with that in the spreadsheet pearson.xls.
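As a cross-check on a hand-built calculation of r, the piecewise construction can be sketched in Python. It follows the usual computational formula for the sample correlation coefficient, with the numerator n*sum(xy) - sum(x)*sum(y). The function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient built up from the component sums."""
    n = len(xs)                                   # assumes len(xs) == len(ys)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of the product column
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    top = n * sum_xy - sum_x * sum_y              # top half of the fraction
    bottom = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return top / bottom


xs = [1, 2, 3, 4, 5]        # made-up data
ys = [2, 4, 6, 8, 10]       # exactly twice xs, so perfectly correlated

print(pearson_r(xs, ys))
```

Keeping each sum in its own variable mirrors the advice above to save intermediate values in separate cells.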

    Simple Linear Regression

    If the correlation coefficient indicates a sufficiently strong relationship (direct or inverse) between variables, you may wish to explore that relationship using regression techniques.

    Using Excel functions

    The syntax to calculate each of the terms in the regression is as follows:

    Slope, m: =slope(y range, x range)
    y-intercept, b: =intercept(y range, x range)
    Correlation Coefficient, r: =correl(x range, y range)
    R-squared, r^2: =rsq(y range, x range)

    As an example, let's examine the equation of motion, $v^2 = v_i^2 + 2ax$, for a car coming to a stop. If we measure the car's position and velocity we can determine its acceleration and its initial velocity with the use of the slope() and intercept() functions. The equation of motion has the form of $y = mx + b$, so if the square of the car's velocity is plotted along the y-axis and its position along the x-axis, then the slope is $2a$, and the y-intercept is simply $v_i^2$. [1]

    Note that the correl() function was used to ensure that the data did display a linear trend; otherwise, the slope and y-intercept values are meaningless! It is always a good idea to plot the data as well as use these statistics functions because sometimes trends are not obvious. Additionally, a plot of the data allows us to visualize the data, and gross blunders and errant data points are easily detected. The graph below tells us immediately that our data appears reasonable.

    More Regression: visualisation

    Assuming two data series, x and y, shown below, if we believe that there is a linear relationship between the variables x and y, we can plot the data and draw a "best-fit" straight line through it. This relationship is governed by the linear equation y=mx+b. We can then find the slope, m, and y-intercept, b, for the data, which are shown in the figure below.

    [1] Note that in order to find the acceleration, we must divide the slope by 2 and to find the initial velocity, we must take the square root of the y-intercept.

    Enter the above data into an Excel spreadsheet, plot the data, create a trend line and display its slope, y-intercept and R-squared value. Recall that the R-squared value is the square of the correlation coefficient.

    Enter your data as we did in columns B and C. The reason for this is strictly cosmetic as you will soon see.

    Linear regression equations by hand

    Given a set of data $(x_i, y_i)$ with n data points, the slope and y-intercept can be determined as follows, and r as discussed above.

    $$m = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left( \sum x \right)^2}$$

    $$b = \frac{\sum y - m \sum x}{n}$$

    Implicitly applying regression to the sample data

    It may appear that the above equations are quite complicated; however, upon inspection, we see that their components are nothing more than simple algebraic manipulations of the raw data. We can expand our spreadsheet to include these components.

    1. First, we add three columns that will be used to determine the quantities xy, x^2 and y^2 for each data point.

    2. Now use Excel to count the number of data points, n. To do this, use the count() function as before.

    3. Finally, use the above components and the linear regression equations given above to calculate the slope (m), y-intercept (b) and correlation coefficient (r) of the data.

    The spreadsheet will look like that below. Note that our equations for the slope, y-intercept and correlation coefficient are highlighted in yellow.

    These formulae give us the same results as Excel's built in functions with a high degree of reliability.
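The by-hand regression can be sketched in Python using the standard least-squares expressions for the slope and intercept, built from the same component sums. To tie it to the car example, the sketch uses invented measurements for a car with an initial velocity of 20 and an acceleration of -2 (so the square of the velocity is 400 - 4x); these numbers and the function name `linear_fit` are assumptions for illustration only.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Least-squares slope m and intercept b from the component sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


position = [0, 10, 20, 30, 40]                  # hypothetical x values
velocity_sq = [400, 360, 320, 280, 240]         # hypothetical v^2 values

slope, intercept = linear_fit(position, velocity_sq)
acceleration = slope / 2                 # the slope is 2a
initial_velocity = sqrt(intercept)       # the intercept is the initial velocity squared

print(slope, intercept, acceleration, initial_velocity)
```

With real measurements the points will not lie exactly on a line, so the recovered values are estimates and should be read alongside the correlation coefficient.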

    Task: regression

    Imagine that you are a University admissions tutor for the Department of History. A level results for this year's History exams have been lost! Investigate the possibility of reliably estimating a candidate's History result from the results you have in results.xls. Could you reliably predict likely History results in any way?

    Create a scatterplot with trend line and error bars. The error bars should show plus or minus one standard deviation.

    Trends

    The trend function is particularly useful. Using trend, it is possible to analyse a pattern of numbers and predict the next number from corresponding data. The function uses the known information and finds a trend to predict the new information.

    The format of the trend function is: =trend(known ys, x range, new xs)

    This worksheet contains data relating to the number of people visiting given destinations. The Advanced Booking, Hours of Sunshine, and Mean Temperature were recorded for each of the destinations (these are the x range). The number of Visitors for each destination is recorded (the known ys). The Advanced Booking, Hours of Sunshine, and Mean Temperature were recorded for Mexico (the new xs). We want to predict the number of people who will visit Mexico using all the available data.

    Cell C10 will hold the following formula: =trend(C4:C9,D4:F9,D10:F10)

    This function looks at the range D4:F9 and its relationship with the number of visitors (C4:C9). It then applies that relationship to the new information for Mexico (D10:F10) to predict the number of visitors to Mexico: 83,426.

    If you change any of the data in the table, the figure for the number of visitors to Mexico will change accordingly.
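    To see what trend does behind the scenes, the sketch below fits a least-squares straight-line model with three predictors and applies it to a new row. It is an illustration in Python, not Excel's own implementation. The helper names are hypothetical and the figures are invented, constructed so that the relationship is exactly linear; they are not the worksheet's data.

```python
def solve(a, b):
    """Solve the square system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def trend(known_ys, known_xs, new_xs):
    """Least-squares prediction for one new row, in the spirit of =trend(...)."""
    rows = [[1.0] + list(r) for r in known_xs]       # leading 1.0 gives the intercept
    k = len(rows[0])
    # Normal equations: (X'X) coeffs = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, known_ys)) for i in range(k)]
    coeffs = solve(xtx, xty)
    return sum(c * v for c, v in zip(coeffs, [1.0] + list(new_xs)))


# Hypothetical data: three predictor columns per destination (the x range).
known_xs = [[10, 5, 20], [20, 7, 25], [15, 9, 30],
            [30, 6, 22], [25, 10, 28], [12, 8, 18]]
# Invented so that visitors = 5 + 2*x1 + 3*x2 + 4*x3 exactly (the known ys).
known_ys = [120, 166, 182, 171, 197, 125]
prediction = trend(known_ys, known_xs, [18, 9, 26])  # the new xs
print(round(prediction, 6))
```

    Because the invented data follow an exact linear rule, the prediction reproduces that rule for the new row. With real data the fit is only approximate, so the predicted figure should be read as an estimate.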

    Chi-Squared non-parametric testing

    We will use the chi-squared test on the results.xls data to determine whether class is associated with gender.

    Independence of nominal variables

    Make a tabulation of class against gender. You must have the data for the observed values in a single row (or column) array and the expected values in a single row (or column) array. So, assuming you have columns M and F and class 1, 2, and 4, you should end up with a table of observed counts with one row per class and one column per gender.

    How would you determine whether there is an association between these two variables?

    1. Compute the expected cell counts if the two variables are independent.

    2. Find the chi-squared statistic. You are looking for a result like this:

       1. Compute (observed count − expected count)²/(expected count) for each cell.

       2. Sum the results. This is Pearson's chi-square statistic.

    3. Compute the p-value using Excel's chidist function.

    Here is an example of the formulae required

    Check your result against Excel's built-in chitest() function.
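    The same calculation can be sketched in Python as a cross-check. The counts below are hypothetical (three classes by two genders) and the function names are illustrative. The p-value helper uses the closed form exp(−x/2), which is valid only for 2 degrees of freedom, as in a 3 × 2 table; for other table sizes use chidist in Excel.

```python
import math


def chi_squared_independence(observed):
    """Expected counts, Pearson's chi-square statistic and degrees of freedom."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df


def p_value_df2(statistic):
    """Right-tail chi-squared probability; closed form valid for 2 degrees of freedom only."""
    return math.exp(-statistic / 2)


# Hypothetical observed counts: rows are classes, columns are M and F.
observed = [[10, 20],
            [20, 20],
            [30, 20]]
expected, chi2, df = chi_squared_independence(observed)
if df != 2:
    raise ValueError("p_value_df2 only applies when there are 2 degrees of freedom")
p = p_value_df2(chi2)
print(expected, chi2, df, p)
```

    A small p-value (conventionally below 0.05) suggests that class and gender are associated; a larger one means the data are consistent with independence.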

    The Analysis ToolPak

    Microsoft Excel provides a set of data analysis tools, called the Analysis ToolPak, that you can use to save time when you perform complex statistical analyses.

    You input the data and parameters for each analysis and Excel computes the appropriate statistical measures or test results and displays the results in an output table. Some tools generate charts in addition to output tables.

    Before using an analysis tool, you must arrange the data you want to analyze in columns or rows on your worksheet. This is your input range.

    If the Data Analysis command is not on the Excel Tools menu, you need to install the Analysis ToolPak:

    1. On the Tools menu, click Add-Ins.

    2. Select the Analysis ToolPak check box.

    3. Install.

    To use the Analysis ToolPak:

    1. On the Tools menu, click Data Analysis.

    2. In the Analysis Tools box, click the tool you want to use.

    3. Enter the input range and the output range, and then select the options you want:

    The Analysis ToolPak also contains the following tools:

    Anova

    Correlation analysis tool

    Covariance analysis tool

    Descriptive Statistics analysis tool

    Exponential Smoothing analysis tool

    Fourier Analysis tool

    F-Test: Two-Sample for Variances analysis tool

    Histogram analysis tool

    Moving Average analysis tool

    t-Test analysis tool

    Random Number Generation analysis tool

    Rank and Percentile analysis tool

    Regression analysis tool

    Sampling analysis tool

    z-Test: Two Sample for Means analysis tool

    In this section we will perform a single-factor analysis of variance to demonstrate the use of the Analysis ToolPak.

    Anova

    An ANOVA is a test for determining whether or not the level of the dependent variable is affected by the level of the independent variable. You can use ANOVA to group cases by a categorical factor and then observe the effect of the factor on the dependent variable.

    Once you are sure you have the Analysis ToolPak installed, open the file results.xls. We would like to know if there is any significant difference between the mean scores in the three subjects, English, History and Maths. We can't use Student's t-test because that test compares only two groups of scores.

    The F ratio is the measure produced by an ANOVA. It is the ratio of the variance between groups to the variance within groups. If F is significant, then there is a model with a main effect, i.e. the group means are not all equal.

    An ANOVA can be used to evaluate differences between data sets. It can be used with any number of data sets, recorded from any process. The data sets need not be equal in size. Data sets suitable for an ANOVA can be as small as three or four observations.

    Here is how you use an Excel ANOVA to determine whether class membership affects a pupil's performance in a maths test. First the scores have to be tabulated by class in columns.

    Go to the Data tab (the Tools menu in versions of Excel before 2007) and select Data Analysis as shown. If Data Analysis does not appear as the last choice on the Data tab of your ribbon, you must add it through the Options section of the Backstage view.

    Step 3. Select the first choice, ANOVA: Single Factor, and click OK.

    Step 4. Click and drag your mouse from the first class name (i.e. class 1) to the last score in the rectangle of data. This automatically completes the Input Range for you. Mine is $G$1:$H$11. Click the box labeled "Labels in First Row." Click New Worksheet Ply. Click OK.

    Step 5. Interpret the results by evaluating the F ratio. If the F ratio is larger than the F critical value, F crit, there is a statistically significant difference. If it is smaller than the F crit value, the score differences are best explained by chance.

    The F ratio 0.42 is smaller than the F crit value 3.35. In this case there is no statistically significant difference between the mathematics test scores of the three classes. Excel calculates the p-value for you. Excel also calculates the average and the variance of each group automatically.
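    To make the F ratio less of a black box, the sketch below computes it by hand in Python as the between-groups variance divided by the within-groups variance, together with the per-group average and variance that Excel reports. The scores are hypothetical and the function name is illustrative. The F crit value is not computed here, so take it from Excel's output table.

```python
from statistics import mean, variance


def one_way_anova(groups):
    """Return the F ratio for a single-factor ANOVA over a list of score lists."""
    k = len(groups)                                  # number of groups (classes)
    n_total = sum(len(g) for g in groups)            # total number of scores
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-groups sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of the scores around their own group mean.
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

    ms_between = ss_between / (k - 1)                # variance between groups
    ms_within = ss_within / (n_total - k)            # variance within groups
    return ms_between / ms_within


# Hypothetical maths scores for three classes.
classes = [[51, 52, 53], [52, 53, 54], [53, 54, 55]]
for scores in classes:
    print(mean(scores), variance(scores))            # per-group average and variance
f_ratio = one_way_anova(classes)
print(f_ratio)
```

    Compare the printed F ratio with the F crit value in Excel's ANOVA table, exactly as in Step 5.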