Experimental Statistics - week 4

Post on 02-Jan-2016



1

Experimental Statistics - week 4

Chapter 8: 1-factor ANOVA models

Using SAS

2

EXAM SCHEDULE: 

Exam I – Take-home exam (handed out Thursday, March 3, due 8:00 AM Tuesday, March 8) 

Exam II – Take-home exam (handed out Thursday, April 14, due 8:00 AM Tuesday, April 19) 

Final Exam – optional (scheduled for 8:00 AM – 11:00 AM Friday, May 6)  

GRADE COMPUTATION: 

Exam Grades (75%)

Daily Assignments (25%)

3

ANOVA Table Output - hostility data - calculations done in class

Source             SS        df    MS        F       p-value
Between samples    767.17     2    383.58    16.7    <.001
Within samples     205.74     9     22.86
Totals             972.91
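As a check on the table, each mean square is its sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The arithmetic below uses only the values shown in the table. The total degrees of freedom (11) is not printed there and is inferred here as 2 + 9.

```latex
\begin{align*}
MS_{\text{Between}} &= \frac{767.17}{2} = 383.585 \\
MS_{\text{Within}}  &= \frac{205.74}{9} = 22.86 \\
F &= \frac{383.585}{22.86} \approx 16.78 \\
SS_{\text{Total}} &= 767.17 + 205.74 = 972.91, \qquad df_{\text{Total}} = 2 + 9 = 11
\end{align*}
```

The table reports this F value as 16.7.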

4

      

SPSS ANOVA Table for Hostility Data

5

ANOVA Models

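Consider the random sample $y_1, y_2, \ldots, y_n$

Example: $y_1 = 5.5$, $y_2 = 3.8$, $y_3 = 6.0$, etc.

Population has mean $\mu$.

If $y_1, y_2, \ldots, y_n$ is a sample from a population that is normal with mean $\mu$ and variance $\sigma^2$, then we can write

$y_i = \mu + \epsilon_i, \quad i = 1, \ldots, n$

Note: $\epsilon_i = y_i - \mu$ is normal with mean 0 and variance $\sigma^2$.

6

For 1-factor ANOVA

We can write

$y_{11} = \mu_1 + \epsilon_{11}$

$y_{12} = \mu_1 + \epsilon_{12}$

etc.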

For 1-factor ANOVA

7

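Alternative form of the 1-Factor ANOVA Model

(pages 394-395)

General Form of Model: $y_{ij} = \mu_i + \epsilon_{ij}$

$\epsilon_{ij}$'s are NID$(0, \sigma^2)$

- random errors follow a Normal (N) distribution, are independently distributed (ID), and have zero mean and constant variance $\sigma^2$

-- i.e. variability does not change from group to group

Alternative form: $y_{ij} = \mu + \alpha_i + \epsilon_{ij}$

where $\mu = \frac{1}{t} \sum_{i=1}^{t} \mu_i$ and $\alpha_i = \mu_i - \mu$

Note: $\sum_{i=1}^{t} \alpha_i = 0$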

8

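Testing the hypotheses:

$H_0: \mu_1 = \mu_2 = \cdots = \mu_t$

$H_a:$ at least 2 means are unequal

is equivalent to testing the hypotheses:

$H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_t = 0$

$H_a:$ at least one $\alpha_i \neq 0$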

9

Analysis of Variance Table

Recall:

In our model:

$s_W^2$ estimates $\sigma^2$

$s_B^2$ estimates $\sigma^2 + \text{constant} \times \sum_i \alpha_i^2$, where the constant is positive

- if no factor effects, we expect $F \approx 1$;

- if factor effects, we expect $F > 1$

We reject $H_0$ at significance level $\alpha$ if

$F = \frac{s_B^2}{s_W^2} > F_\alpha(t - 1, n_T - t)$
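To see the rejection rule (reject $H_0$ when $F = s_B^2 / s_W^2$ exceeds $F_\alpha(t - 1, n_T - t)$) in action, it can be applied to the hostility data from the earlier ANOVA table. The significance level 0.05 is an assumption made here for illustration, the values $t = 3$ and $n_T = 12$ are inferred from the table's degrees of freedom (2 and 9), and the critical value is taken from a standard F table.

```latex
\begin{align*}
t = 3, \; n_T = 12 \quad &\Rightarrow \quad (t - 1, \; n_T - t) = (2, \; 9) \\
F = \frac{s_B^2}{s_W^2} &= \frac{383.58}{22.86} \approx 16.78 \\
F_{0.05}(2, 9) &\approx 4.26 \\
16.78 > 4.26 \quad &\Rightarrow \quad \text{reject } H_0
\end{align*}
```

This agrees with the p-value of less than .001 reported in the table.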

Introduction to SAS Programming Language

11

Recall CAR DATA

For this analysis, 5 gasoline types (A - E) were to be tested. Twenty cars were selected for testing and were assigned randomly to the groups (i.e. the gasoline types). Thus, in the analysis, each gasoline type was tested on 4 cars. A performance-based octane reading was obtained for each car, and the question is whether the gasolines differ with respect to this octane reading.

Gas    Octane readings
A      91.7   91.2   90.9   90.6
B      91.7   91.9   90.9   90.9
C      92.4   91.2   91.6   91.0
D      91.8   92.2   92.0   91.4
E      93.1   92.9   92.4   92.4

12

The CAR data set as SAS needs to see it:

A 91.7
A 91.2
A 90.9
A 90.6
B 91.7
B 91.9
B 90.9
B 90.9
C 92.4
C 91.2
C 91.6
C 91.0
D 91.8
D 92.2
D 92.0
D 91.4
E 93.1
E 92.9
E 92.4
E 92.4

13

Case 1: Data within SAS FILE

DATA one;
  INPUT gas $ octane;
DATALINES;
A 91.7
A 91.2
 . . .
E 92.4
E 92.4
;
PROC GLM;   * or PROC ANOVA;
  CLASS gas;
  MODEL octane = gas;
  TITLE 'Gasoline Example - Completely Randomized Design';
  MEANS gas / DUNCAN;
RUN;
PROC MEANS mean var;
RUN;
PROC MEANS mean var;
  CLASS gas;
RUN;

SAS file for CAR data

14

Brief Discussion of Components of the SAS File:

DATA Step

DATA STATEMENT - the first DATA statement names the data set whose variables are defined in the INPUT statement -- in the above, we create data set 'one'

INPUT STATEMENT - 2 forms

1. Freefield - can be used when data values are separated by 1 or more blanks

   INPUT NAME $ AGE SEX $ SCORE;   ($ indicates character variable)

2. Formatted - data occur in fixed columns

   INPUT NAME $ 1-20 AGE 22-24 SEX $ 26 SCORE 28-30;

DATALINES STATEMENT - indicates that the records that follow contain the actual data; the semicolon after the last data record marks the end of the data
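The two INPUT forms can be compared side by side in a minimal sketch. The variable names come from the slide; the data set names (free, fixed) and all data values are hypothetical. A name with an embedded blank such as "Alice Smith" cannot be read with freefield input, because the blank would be treated as a separator. Fixed columns avoid that problem.

```sas
* Freefield input: values separated by one or more blanks;
DATA free;
  INPUT name $ age sex $ score;
DATALINES;
Alice 34 F 88
Bob 41 M 79
;

* Formatted input: each variable is read from fixed columns;
* name in columns 1-20 and age in 22-24 and sex in 26 and score in 28-30;
DATA fixed;
  INPUT name $ 1-20 age 22-24 sex $ 26 score 28-30;
DATALINES;
Alice Smith           34 F  88
Bob Jones             41 M  79
;

PROC PRINT DATA=free;
RUN;

PROC PRINT DATA=fixed;
RUN;
```

The data records must start in column 1 of the program file, since the column positions in the second INPUT statement are counted from the left edge of each record.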

15

SPECIFYING THE ANALYSIS -- PROC STATEMENTS

GENERAL FORM

PROC xxxxx; implies procedure is to be run on most recently created data set

PROC xxxxx DATA = data set name;

Note: I did not have to specify DATA=one in the above example

Example PROCs:

PROC REG - regression analysis
PROC ANOVA - analysis of variance
PROC GLM - general linear model
PROC MEANS - basic statistics, t-test for H0: $\mu = 0$
PROC PLOT - plotting
PROC TTEST - t-tests
PROC UNIVARIATE - descriptive stats, box-plots, etc.
PROC BOXPLOT - boxplots
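The difference between the two general forms of the PROC statement shows up as soon as a program creates more than one data set. In the sketch below, the first four CAR readings go into data set one, while data set two and its variable x are hypothetical and exist only to become the most recently created data set.

```sas
* Data set one holds four readings from the CAR data;
DATA one;
  INPUT gas $ octane;
DATALINES;
A 91.7
A 91.2
B 91.7
B 91.9
;

* Data set two is hypothetical and is created last;
DATA two;
  INPUT x;
DATALINES;
1
2
3
;

* No DATA= option: the procedure runs on the most recent data set, here two;
PROC MEANS MEAN VAR;
RUN;

* DATA= option: the procedure runs on data set one regardless of creation order;
PROC MEANS DATA=one MEAN VAR;
RUN;
```

Naming the data set explicitly costs a few keystrokes and protects against analyzing the wrong data after a program is reordered.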

16

PROC GLM

• Proc GLM data = fn ;

– Class … ; List all the factors.

– Model … / options; e.g., model octane = gas;

– Means … / options;

– Run;

17

SAS Syntax

• Every command MUST end with a semicolon

– Commands can continue over two or more lines

• Variable names are 1-8 characters (letters and numerals, beginning with a letter or underscore), but no blanks or special characters

– Note: values for character variables can exceed 8 characters

• Comments – Begin with *, end with ;
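These syntax rules can be seen together in a short sketch. The data set name demo is hypothetical, and the two data records are taken from the CAR data. The block-style comment delimited by /* and */ is a second comment form that SAS also accepts; it is not mentioned on the slide.

```sas
* This is a comment statement: it begins with an asterisk and ends with a semicolon;
/* This is a block comment, which needs no closing semicolon */

DATA demo;
  INPUT gas $
        octane;   * one INPUT statement continued over two lines;
DATALINES;
A 91.7
B 91.9
;

PROC PRINT
  DATA=demo;
RUN;
```

SAS reads up to the semicolon, not to the end of the line, so the line breaks inside the INPUT and PROC PRINT statements have no effect.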

18

Titles and Labels

• TITLE '…' ;

– Up to 10 title lines: TITLE 'include your title here';

– Can be placed in Data Steps or Procs

• LABEL name = '…' ;

– Can be in a DATA STEP or PROC PRINT

– Include ALL labels, then a single ;

Note: For class assignments, place descriptive titles and labels on the output. Print the data to the output file.
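A minimal sketch of titled and labeled output, following the note about class assignments, might look like the following. The label text and the second title line are illustrative choices, and only four of the CAR readings are included to keep the example short.

```sas
DATA one;
  INPUT gas $ octane;
  * All labels are listed first and a single semicolon ends the statement;
  LABEL gas    = 'Gasoline type'
        octane = 'Octane reading';
DATALINES;
A 91.7
A 91.2
B 91.7
B 91.9
;

TITLE 'Gasoline Example - Completely Randomized Design';
TITLE2 'Listing of the raw data';

* The LABEL option asks PROC PRINT to show labels as column headings;
PROC PRINT DATA=one LABEL;
RUN;
```

TITLE2 supplies the second title line; further lines would use TITLE3 and so on.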

19

Case 2:  Data in an External File

FILENAME f1 'complete directory/file specification';

FILENAME f1 'a:car.data';
DATA one;
  INFILE f1;
  INPUT gas $ octane;
PROC GLM;   * or PROC ANOVA;
  CLASS gas;
  MODEL octane = gas;
  TITLE 'Gasoline Example - Completely Randomized Design';
RUN;
PROC MEANS mean var;
RUN;
PROC MEANS mean var;
  CLASS gas;
RUN;
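The FILENAME / INFILE pattern can be tried without a file on disk. The sketch below is not the slide's program: it uses the TEMP device to create a temporary file, writes four CAR readings into it, and then reads them back through the same fileref. In real use, the first step would be replaced by a FILENAME statement pointing at an existing file.

```sas
* Assign the fileref f1 to a temporary file so the example needs no existing file;
FILENAME f1 TEMP;

* Write four records from the CAR data to the temporary file;
DATA _NULL_;
  FILE f1;
  PUT 'A 91.7';
  PUT 'A 91.2';
  PUT 'B 91.7';
  PUT 'B 91.9';
RUN;

* Read the external file back in through the fileref;
DATA one;
  INFILE f1;
  INPUT gas $ octane;
RUN;

PROC PRINT DATA=one;
RUN;
```

No DATALINES statement appears in the second DATA step, because the INFILE statement tells SAS where the records come from.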

20

The SAS Output for CAR data:

Gasoline Example - Completely Randomized Design

General Linear Models Procedure

Dependent Variable: OCTANE

                         Sum of         Mean
Source             DF    Squares        Square        F Value   Pr > F
Model               4    6.10800000     1.52700000    6.80      0.0025
Error              15    3.37000000     0.22466667
Corrected Total    19    9.47800000

R-Square    C.V.        Root MSE     OCTANE Mean
0.644440    0.516836    0.4739902    91.710000

Source    DF    Type I SS      Mean Square    F Value   Pr > F
GAS        4    6.10800000     1.52700000     6.80      0.0025

Source    DF    Type III SS    Mean Square    F Value   Pr > F
GAS        4    6.10800000     1.52700000     6.80      0.0025

21

Text Format for ANOVA Table Output - car data

Source             SS       df    MS       F       p-value
Between samples    6.108     4    1.527    6.80    0.0025
Within samples     3.370    15    0.225
Totals             9.478    19
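The F value and p-value in this table can be recomputed from the sums of squares in a short DATA step. This is a sketch and not part of the original program: it uses the SAS functions FINV (F quantile) and PROBF (F cumulative probability), and the 0.05 significance level behind the critical value is an assumption made for illustration.

```sas
DATA _NULL_;
  msb  = 6.108 / 4;             * between-samples mean square;
  msw  = 3.370 / 15;            * within-samples mean square;
  f    = msb / msw;             * F statistic;
  crit = FINV(0.95, 4, 15);     * critical value for an assumed alpha of 0.05;
  p    = 1 - PROBF(f, 4, 15);   * upper-tail p-value;
  PUT msb= msw= f= crit= p=;    * results are written to the SAS log;
RUN;
```

The null hypothesis of equal gasoline means is rejected when f exceeds crit, or equivalently when p is below the chosen significance level.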

22

PC SAS on Campus

Library

BIC

Student Center

http://support.sas.com/rnd/le/index.html

SAS Learning Edition $125

23

"Lab" Assignment

Using CAR Data, run the following in this order with one set of code:

1. Calculate the average, standard deviation, minimum, and maximum for the 20 octane readings. CS pp. 25 - 32

2. Graph a histogram of OCTANE. CS p. 37

3. Calculate descriptive statistics in (1) above for OCTANE for each of the 5 gasolines. CS pp. 32-34

4. Run a t-test to test $H_0: \mu_A = \mu_B$ using GAS types A and B. CS pp. 138-141

5. Plot side-by-side box plots for OCTANE for the 5 levels of the variable GAS

6. Compute a 1-factor ANOVA for the CAR data using only the first 3 GAS types. CS pp.150-155
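One possible outline for the six lab steps in a single program is sketched below. It is not taken from the CS text. The data set name car and the title are hypothetical, the procedure choices are one reasonable option among several, and the HISTOGRAM and BOXPLOT steps assume a SAS installation with graphics output available.

```sas
DATA car;
  INPUT gas $ octane;
DATALINES;
A 91.7
A 91.2
A 90.9
A 90.6
B 91.7
B 91.9
B 90.9
B 90.9
C 92.4
C 91.2
C 91.6
C 91.0
D 91.8
D 92.2
D 92.0
D 91.4
E 93.1
E 92.9
E 92.4
E 92.4
;

TITLE 'CAR data - lab assignment sketch';

* Step 1: overall mean and standard deviation and minimum and maximum;
PROC MEANS DATA=car MEAN STD MIN MAX;
  VAR octane;
RUN;

* Step 2: histogram of OCTANE;
PROC UNIVARIATE DATA=car;
  VAR octane;
  HISTOGRAM octane;
RUN;

* Step 3: the same statistics for each gasoline;
PROC MEANS DATA=car MEAN STD MIN MAX;
  CLASS gas;
  VAR octane;
RUN;

* Step 4: two-sample t-test using gasolines A and B only;
PROC TTEST DATA=car;
  WHERE gas IN ('A', 'B');
  CLASS gas;
  VAR octane;
RUN;

* Step 5: side-by-side box plots with the data sorted by group first;
PROC SORT DATA=car;
  BY gas;
RUN;

PROC BOXPLOT DATA=car;
  PLOT octane*gas;
RUN;

* Step 6: one-factor ANOVA using the first three gasolines only;
PROC GLM DATA=car;
  WHERE gas IN ('A', 'B', 'C');
  CLASS gas;
  MODEL octane = gas;
RUN;
QUIT;
```

The WHERE statement restricts a single procedure to the listed gasolines without creating a new data set, which keeps all six steps running from the one DATA step at the top.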