Statistics manual

Statistics Handouts

MANUAL IN STATISTICS
… statistics made simple …
19th edition

Ms. Yumi Vivien V. De Luna, MSME
Subject Teacher

TABLE OF CONTENTS

Exercise No.   Title                                           Page
1              Variables and the Summation Notation            6
2              Frequency Distribution Table                    17
3              Numerical Descriptive Measures                  31
4              Weighted Means                                  38
5              FPC, Combination and Permutation                54
6              Probability                                     60
7              Normal Distribution                             68
8              Test of Hypothesis I                            77
9              Test of Hypothesis II                           80

Lesson No.     Title                                           Page
1              Methods of Data Collection and Presentation     7
2              Frequency Distribution Table                    14
3              Numerical Descriptive Measures                  19
4              Weighted Means                                  32
5              Sampling                                        40
6              FPC, Combinations and Permutations              51
7              Probability                                     55
8              Normal Distribution                             66
9              Estimation                                      69
10             Test of Hypothesis                              72
11             Two-way ANOVA                                   84
12             Pearson Moment Correlation                      88

Sources/References:

The concepts, sample problems and information given in this manual were taken from the following:

1. Fundamental Statistics for College Students by Pagoso, et al.
2. Graduate Research Manual – Guide to Thesis and Dissertations (Aquinas Graduate School)
3. How to Design and Evaluate Research in Education by Fraenkel and Wallen
4. Introduction to Statistics by Walpole
5. Introduction to Statistical Methods by Parel, Alonzo, et al.
6. Laboratory Manual in Statistics I, UPLB
7. Manual on Training on Microcomputer-Based for the Social Sciences (Richie Fernando Hall AdeNU, 2005)
8. Statistics for the Health Sciences by Kuzma
9. Applied Basic Statistics by Flordeliza Reyes
10. Fundamental Concepts and Methods in Statistics by George Garcia
11. Simplified Statistics for Beginners by Dr. Cesar Bermundo
12. http://statistics.about.com/od/Descriptive-Statistics/a/What-Is-Kurtosis.htm

I. Statistics and Its Scope

STATISTICS encompasses all the methods and procedures used in the collection, presentation, analysis and interpretation of data.

DESCRIPTIVE STATISTICS comprises those methods concerned with collecting and describing a set of data so as to yield meaningful information.

STATISTICAL INFERENCE comprises those methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data.

Population vs Sample
A population is the set of all entities or elements under study. A sample is a subset of the population.

Parameters vs Statistics
Parameters are the descriptive measures or characteristics of a population, while statistics are the corresponding characteristics of a sample.

Census vs Survey
A census is the process of gathering information from every element of the population, while a survey is the process of gathering information from every element of the sample.

II. Variables and Their Levels of Measurement

A variable is an observable characteristic of a person or object which is capable of taking several values or of being expressed in several different categories. It can be either quantitative (discrete or continuous) or qualitative.

MEASUREMENT SCALES

a. Nominal – values are simply labels, names or categories. Numbers are assigned for identification purposes only; no meaning can be attached to the magnitude or size of such numbers. Examples are gender, civil status, telephone numbers, etc.
b. Ordinal – whereas nominal scales only classify, ordinal scales both classify and order the classes. Examples are job position, military ranks, etc.
c. Interval – quantitative but has no true zero point. Examples are IQ, room temperature, etc.
d. Ratio – quantitative and has a true zero point. Examples are number of children, physics test scores, etc.

SUMMATION NOTATION

For a given universe, suppose we observe a variable, say X. We may denote the first value as X_1, the second as X_2, and so on. In general, X_i is the observation on variable X made on the i-th individual.

Given a set of N observations or data values represented by X_1, X_2, …, X_N, we express their sum as

$$\sum_{i=1}^{N} X_i = X_1 + X_2 + \cdots + X_N$$

where Σ is the summation symbol;
i is the index of the summation;
X_i is the summand;
1 is the lower limit; and
N is the upper limit.

Theorem 1. If c is a constant, then

Theorem 2. If c is constant, then

Theorem 3. If a and b are constants, then

∑( ) ∑
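The theorems above describe how constants interact with the summation sign. As a rough illustration, the sketch below checks three standard summation properties numerically: the sum of a constant, a constant factor moved outside the sum, and the sum of a linear expression. Which property matches which numbered theorem is an assumption here, and the data values and constants are hypothetical.

```python
# Hypothetical observations X_1..X_N, chosen only for illustration.
x = [2, 4, 6, 8]
n = len(x)
a, b, c = 3, 5, 7  # arbitrary constants

# Summing a constant N times gives N * c.
sum_of_constant = sum(c for _ in x)

# A constant factor can be moved outside the summation sign.
sum_scaled = sum(c * xi for xi in x)

# The sum of a linear expression a*X_i + b splits into a*sum(X_i) + N*b.
sum_linear = sum(a * xi + b for xi in x)

print(sum_of_constant, n * c)
print(sum_scaled, c * sum(x))
print(sum_linear, a * sum(x) + n * b)
```

Each printed pair shows the left-hand and right-hand side of one property, so the two numbers on a line should agree.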

Exercise #1 – Variables and the Summation Notation

At the end of this exercise, the student must be able to:
1. identify different types of variables
2. classify data according to level of measurement
3. employ summation notation

I. Identify the level of measurement.

A. From all patients admitted in a hospital, the following information is collected:
1. name of patient
2. age
3. sex
4. body temperature
5. blood pressure
6. amt. of deposit
7. first time to see a doctor regarding ailment? (yes/no)
8. heartbeat per minute
9. weight
10. height
11. no. of glasses of fluid intake per day
12. no. of meals taken in a day

B. The following information is of interest for selected students of AdeNU who are cigarette smokers:
1. age when first smoked
2. average no. of sticks consumed per day
3. main source of allowance
4. amt. of weekly allowance
5. Is your father a smoker? (yes/no)
6. occupation of father
7. brand of cigarette
8. position in the family

II. Instruction will be given by your teacher.

Data Set 1. Data on Head Circumference (in cm) and Foot Length (in cm) of 8 Newborn Babies

Baby no.                   1     2     3     4     5     6     7     8
Head circumference (x)     32    33    37    38    35    32    38    34
Foot length (y)            5.6   6.2   6.8   6.6   6.4   5.4   6.0   6.1

Data Set 2. Data on Height (in cm) and Weight (in lbs) of 8 Stat Students

Student no.                1     2     3     4     5     6     7     8
Height (x)                 168   141   165   180   165   156   150   147
Weight (y)                 110   90    120   125   142   97    105   110

Lesson #1 – Methods of Data Collection and Presentation

METHODS OF DATA COLLECTION

Various methods for data gathering are available. A researcher should be able to use the most appropriate one.

1. Survey Method – questions are asked to obtain information, either through a self-administered questionnaire or an interview (personal, telephone or internet).

Personal Interview
Advantages:
- flexibility in obtaining answers
- more in-depth answers
- can observe the respondent’s behavior
Disadvantages:
- expensive
- field interviews are hard to control
- errors in interviewing
- time consuming

Mailed Questionnaires
Advantages:
- wider geographic distribution of respondents possible
- respondents can answer at their convenience
- no personal interviewer’s bias
- centralized control of people doing the survey
- relatively inexpensive
- respondent may be more candid if he/she can answer anonymously
Disadvantages:
- response rate may be low
- hard to obtain in-depth information
- usable mailing list may be unavailable
- respondent may not be the addressee
- cannot observe respondent’s behavior

Phone Interview
Advantages:
- relatively inexpensive
- fast
- centralized control of people doing the survey
- respondents may be more candid
Disadvantages:
- unlisted telephone numbers
- outdated telephone directory
- interview time needs to be relatively short
- selected sample may not have telephones

2. Observation Method – makes possible the recording of behavior, but only at the time of occurrence (e.g., observing reactions to a particular stimulus, traffic count).

Advantages over the Survey Method:
- does not rely on the respondent’s willingness to provide information
- certain types of data can be collected only by observation (e.g., behavior patterns of which the subject is not aware or which the subject is ashamed to admit)
- the potential bias caused by the interviewing process is reduced or eliminated

Disadvantages compared with the Survey Method:
- things such as awareness, beliefs, feelings and preferences cannot be observed
- the observed behavior patterns can be rare or too unpredictable, thus increasing the data collection costs and time requirements

3. Experimental Method – a method designed for collecting data under controlled conditions. An experiment is an operation where there is actual human interference with the conditions that can affect the variable under study. This is an excellent method of collecting data for causation studies. If properly designed and executed, experiments will reveal, with a good deal of accuracy, the effect of a change in one variable on another variable.

4. Use of Existing Studies – e.g., census, health statistics, and weather bureau reports

Two types:
- documentary sources – published or written reports, periodicals, unpublished documents, etc.
- field sources – researchers who have done studies on the area of interest are asked personally or directly for the information needed

5. Registration Method – e.g., car registration, student registration, and hospital admission

METHODS OF DATA PRESENTATION

1. Textual Form – data are incorporated into a paragraph.

Advantages:
- appropriate when there are only a few numbers to be presented
- gives emphasis to significant figures and comparisons

Disadvantages:
- it is not desirable to include a big mass of quantitative data in a “text” or paragraph, as the presentation becomes incomprehensible
- paragraphs can be tiresome to read, especially if the same words are repeated many times

2. Tabular Presentation – systematic organization of data in rows and columns

Advantages:
- more concise than textual presentation
- easier to understand
- facilitates comparisons and analysis of relationships among different categories
- presents data in greater detail than a graph

PARTS OF A STATISTICAL TABLE:

a. Heading – consists of a table number, title and head note. The title explains what is presented, where the data refer to and when the data apply.
b. Box Head – contains the column heads, which describe the data in each column, together with the needed classifying and qualifying spanner heads.
c. Stub – the classifications or categories found at the left. It describes the data found in the rows of the table.
d. Field – main part of the table
e. Source Note – an exact citation of the source of the data presented in the table (should always be included when the figures are not original)

Illustration:

Table 4.4 Philippines Crime Volume and Rate by Type in 1991

                                1991
Type                    Volume      Crime Rate
Total                   11,326      195
Index Crimes            77,261      124
  Murder                8,707       8,707
  Homicide              8,068       8,069
  Physical Injury       21,862      21,862
  Robbery               13,817      13,817
  Theft                 22,780      88,780
  Rape                  2,026       2,026
Non Index Crimes        44,065      71

Source: Philippines National Police

In the illustration, the parts of the table are labeled HEADING, BOXHEAD, STUB, FIELD and SOURCE NOTE.

Guidelines:
- The title should be concise and written in telegraphic style, not as a complete sentence.
- Column labels should be precise.
- Categories should not overlap.
- The unit of measure must be clearly stated.
- Show any relevant totals, subtotals, percentages, etc.
- Indicate if the data were taken from another publication by including a source note.
- Tables should be self-explanatory, although they may be accompanied by a paragraph that provides an interpretation or directs attention to important figures.

3. Graphical Presentation – a graph or chart is a device for showing numerical values or relationships in pictorial form

Advantages:
- the main features and implications of a body of data can be grasped at a glance
- can attract attention and hold the reader’s interest
- simplifies concepts that would otherwise have been expressed in many words
- can readily clarify data and frequently brings out hidden facts and relationships

Common Types of Graph

a. Line Chart – graphical presentation of data especially useful for showing trends over a period of time.
b. Pie Chart – a circular graph that is useful in showing how a total quantity is distributed among a group of categories. The “pieces of the pie” represent the proportions of the total that fall into each category.
c. Bar Chart – consists of a series of rectangular bars. If the bars are arranged horizontally, the length of each bar represents the quantity or frequency for its category; if the bars are arranged vertically, the height of each bar represents the quantity.
d. Pictorial Unit Chart – a pictorial chart in which each symbol represents a definite and uniform value

THE STEM-AND-LEAF DISPLAY

The stem-and-leaf display is an alternative method for describing a set of data. It presents a histogram-like picture of the data, while allowing the experimenter to retain the actual observed values of each data point. Hence, the stem-and-leaf display is partly tabular and partly graphical in nature.

In creating a stem-and-leaf display, we divide each observation into two parts, the stem and the leaf. For example, we could divide the observation 244 as follows:

Stem | Leaf
   2 | 44

Alternatively, we could choose the point of division between the units and tens, whereby

Stem | Leaf
  24 | 4

The choice of the stem and leaf coding depends on the nature of the data set.

Steps in Constructing the Stem-and-Leaf Display

1. List the stem values, in order, in a vertical column.
2. Draw a vertical line to the right of the stem values.
3. For each observation, record the leaf portion of that observation in the row corresponding to the appropriate stem.
4. Reorder the leaves from lowest to highest within each stem row. Maintain uniform spacing for the leaves so that the stem with the greatest number of observations has the longest line.
5. If the number of leaves appearing in each row is too large, divide the stem into two groups, the first corresponding to leaves beginning with digits 0 through 4 and the second corresponding to leaves beginning with digits 5 through 9. This subdivision can be increased to five groups if necessary.
6. Provide a key to your stem-and-leaf coding so that the reader can recreate the actual measurements from your display.

Example: Typing speeds (net words per minute) for 20 secretarial applicants

68 72 91 47
52 75 63 55
65 35 84 45
58 61 69 22
46 55 66 71

Stem | Leaf (unit = 1)
   2 | 2
   3 | 5
   4 | 5 6 7
   5 | 2 5 5 8
   6 | 1 3 5 6 8 9
   7 | 1 2 5
   8 | 4
   9 | 1

Note: The stem-and-leaf display should include a reminder indicating the units of the data values.

Example:
Unit = 0.1    1 | 2 represents 1.2
Unit = 1      1 | 2 represents 12
Unit = 10     1 | 2 represents 120
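The construction steps can be sketched in a few lines of Python. This is only an illustration using the typing-speed data from the example; the function name `stem_and_leaf` is a hypothetical helper, and it assumes two-digit whole numbers with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: sorted leaves}, using tens as stems and units as leaves."""
    rows = defaultdict(list)
    for v in sorted(values):           # sorting first keeps leaves in order
        stem, leaf = divmod(v, 10)     # e.g. 68 -> stem 6, leaf 8
        rows[stem].append(leaf)
    return dict(rows)

speeds = [68, 72, 91, 47, 52, 75, 63, 55, 65, 35,
          84, 45, 58, 61, 69, 22, 46, 55, 66, 71]

display = stem_and_leaf(speeds)
print("Stem | Leaf (unit = 1)")
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem:>4} | {leaves}")
```

Looping over every stem between the smallest and largest keeps empty stems visible, which preserves the histogram-like shape of the display.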

Lesson #2 – Frequency Distribution Table

Data Set. Given below is the distribution of statistics test scores of 50 students (the perfect score is 70 and the passing score is 60% of it).

 5   8  10  18  19  20  20  20  20  21
21  21  23  23  23  24  25  25  25  26
27  28  29  29  30  30  30  32  35  35
35  35  36  36  37  38  39  40  40  40
45  47  48  49  50  55  58  59  60  70

Steps in the construction of a frequency distribution:

1. Determine the range R of the distribution.
   R = highest observed value – lowest observed value = 70 – 5 = 65

2. Determine the number of classes, K, desired. By the square root rule,
   K = √N, where N = total number of observations
   K = √50 ≈ 7.07, so K = 7
   The number of classes is to be rounded off to the nearest WHOLE NUMBER.

3. Calculate the class size, c.
   First find: c’ = R/K = 65/7 ≈ 9.29
   The class size is to have the SAME PRECISION AS THE RAW DATA and should take the value nearest to c’. Hence, c = 9.

4. Enumerate the classes or categories based on the quantities calculated in steps 1-3, bearing in mind that:
   a) the lowest class must include the lowest observed value and the highest class, the highest observed value (the lowest value of the data is the lower class limit of the first class);
   b) each observation will go into one and only one class (none of the values can fall into possible gaps between successive classes, and the classes do not overlap).

   Successive lower class limits may be obtained by adding c to the preceding lower class limit, and likewise for the upper limits.

I. Tally the observations to determine the class frequency, or the number of observations falling into each class.

Classes     Frequency
 5 - 13     3
14 - 22     9
23 - 31     15
32 - 40     13
41 - 49     4
50 - 58     3
59 - 67     2
68 - 76     1

II. Add other informative columns.

1. True Class Boundaries (TCB) – remove the discontinuity between classes and consider the true range of values.

   (Lower TCB) LTCB = LL – 0.5(unit)
   (Upper TCB) UTCB = UL + 0.5(unit)

   A unit depends on the precision of the data.

   Example. 1st class: LTCB = 5 – 0.5(1) = 4.5; UTCB = 13 + 0.5(1) = 13.5

   Note:
   If data                    Unit of precision
   is a whole number          1
   has 1 decimal place        0.1
   has 2 decimal places       0.01

2. Class Mark (CM) – the center of a class. It is the midpoint of the class interval, about which the observations in a class tend to cluster.

   $$CM = \frac{LL + UL}{2}$$

3. Relative Frequency (RF) – proportion of observations falling in one class (in %)

   $$RF = \frac{\text{class frequency}}{N} \times 100\%$$

FREQUENCY DISTRIBUTION TABLE

Classes      True Class Boundaries (TCB)   CM    Freq   RF (%)   CF        RCF
LL - UL      LTCB - UTCB                                         <    >    <    >
 5 - 13       4.5 - 13.5                    9     3      6
14 - 22      13.5 - 22.5                   18     9     18
23 - 31      22.5 - 31.5                   27    15     30
32 - 40      31.5 - 40.5                   36    13     26
41 - 49      40.5 - 49.5                   45     4      8
50 - 58      49.5 - 58.5                   54     3      6
59 - 67      58.5 - 67.5                   63     2      4
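As a rough sketch, the construction steps of this lesson can be traced in Python with the 50 test scores. The variable names are illustrative, and the loop assumes classes are enumerated from the lowest score until the highest score is covered, which yields an eighth class (68 - 76) that holds the score 70.

```python
import math

scores = [5, 8, 10, 18, 19, 20, 20, 20, 20, 21,
          21, 21, 23, 23, 23, 24, 25, 25, 25, 26,
          27, 28, 29, 29, 30, 30, 30, 32, 35, 35,
          35, 35, 36, 36, 37, 38, 39, 40, 40, 40,
          45, 47, 48, 49, 50, 55, 58, 59, 60, 70]

n = len(scores)
value_range = max(scores) - min(scores)   # R = 70 - 5 = 65
k = round(math.sqrt(n))                   # square root rule, rounded to a whole number
c = round(value_range / k)                # class size nearest to c' = R/K
unit = 1                                  # the data are whole numbers

rows = []
lower = min(scores)                       # lowest value is the first lower limit
while lower <= max(scores):               # continue until the highest value is covered
    upper = lower + c - unit
    freq = sum(lower <= s <= upper for s in scores)
    rows.append({
        "class": (lower, upper),
        "tcb": (lower - 0.5 * unit, upper + 0.5 * unit),
        "cm": (lower + upper) / 2,
        "freq": freq,
        "rf": 100 * freq / n,
    })
    lower += c

for r in rows:
    print(r["class"], r["tcb"], r["cm"], r["freq"], r["rf"])
```

The cumulative columns are left out on purpose; they can be obtained by keeping a running total of `freq` from the top (<CF) or from the bottom (>CF).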

Exercise #2 – Frequency Distribution Table

Objectives: At the end of the exercise, the student is expected to:
1. describe the different methods of data presentation;
2. organize data by constructing a frequency distribution table

A. On organizing data: Construct an FDT for the given data. Show computations for R, K and c.

Table 1. Stat Midterms Scores of Section N3 Students, 1st sem 2014

Student #    1    2    3    4    5    6    7    8    9   10
Score       12   17   15   38   38   38   43   47   47   47

Student #   11   12   13   14   15   16   17   18   19   20
Score       53   57   57   58   58   60   62   67   70   70

Student #   21   22   23   24   25   26   27   28   29   30
Score       70   72   73   77   77   77   77   77   78   80

Student #   31   32   33   34   35   36   37   38   39   40
Score       82   87   87   92   92   93   93   93   95  100

Table 2. Average Life Expectancy of 30 Selected Countries, 2011 (source: www.statista.com)

Country            Life Expectancy
Hongkong           84
Japan              83
Italy              83
Iceland            83
Switzerland        83
France             82
Spain              82
Singapore          82
Australia          82
Israel             82
Sweden             82
United Kingdom     82
Norway             81
Luxembourg         81
South Korea        81
Canada             81
New Zealand        81
Netherlands        81
Austria            80
Ireland            80
Belgium            80
United States      79
Poland             77
Mexico             74
China              73
Indonesia          69
Philippines        69
India              66
Cambodia           65
South Africa       53

Lesson #3 – Numerical Descriptive Measures

NUMERICAL DESCRIPTIVE MEASURES

I. Measure of Location – a value within the range of the data which describes its location or position relative to the entire set of data. The more common measures are the measures of central tendency, percentile, decile and quartile.

A. Measure of Central Tendency – describes the “center” of the data. It is a single value about which the observations tend to cluster. The common measures are the mean, median and mode.

1. Mean (μ) – sum of the observations divided by the total number of observations
   Characteristics:
   - interval statistic
   - calculated average
   - value is determined by every case in the distribution
   - affected by extreme values
   When to use:
   - variables are in at least interval scale
   - value of each score is desired
   - values are considerably concentrated or close to each other

2. Median (Md) – middle value of an array
   Characteristics:
   - ordinal statistic
   - rank or position average
   - not affected by extreme values
   When to use:
   - ordinal interpretation is needed
   - middle score is desired
   - presence of extreme values

3. Mode (Mo) – observation which occurs most frequently in the data set
   Characteristics:
   - nominal statistic
   - inspection average
   - not unique; a data set may have more than one mode
   - most “popular” score
   - unaffected by extreme values
   - represents the majority
   When to use:
   - nominal interpretation is needed
   - quick approximation of central tendency is desired

B. Percentile (Pi) – divides the data set into 100 equal parts, each part having one percent of all the data values. For example, if Patrick received a rating at the 90th percentile in the National Secondary Achievement Test, this means that 90% of the students who took the test had scores lower than Patrick’s.

C. Decile (Di) – divides a data set into ten equal parts, each part having ten percent of all data values. The first decile is the 10th percentile, the second decile is the 20th percentile, and so on, up to the tenth decile, which is the 100th percentile.

D. Quartile (Qi) – divides a data set into four equal parts, each part having twenty-five percent of all data values. The first quartile is the 25th percentile, the second is the 50th percentile, the third is the 75th percentile, and the fourth quartile is the 100th percentile.

II. Measure of Dispersion – describes the extent to which the data are dispersed. The more commonly used measures are:

A. Range (R)
- not a stable measure of variation because it can fluctuate greatly with a change in just a single score, either the highest or the lowest
- easiest to compute but the LEAST SATISFACTORY because its value depends only upon the two extremes

B. Variance (s² or σ²)
- considers the position of each observation relative to the mean of the set

C. Standard Deviation (s or σ)
- best measure of variation
- important as a measure of heterogeneity or unevenness within a set of observations
- used when comparing two or more sets of data having the same units of measurement

D. Coefficient of Variation (CV)
- used to compare the variability of two or more sets of data even when the observations are expressed in different units of measurement

III. Measure of Skewness (SK) – describes the extent of departure of the distribution of the data from symmetry.

SK = 0, Symmetric Distribution
- the median is the score point which bisects the total area; half of the area falls to the left and half to the right
- the mode is the score point with the highest frequency, the point on the x-axis that corresponds to the tallest point of the curve
- the mean is the score point on the x-axis that corresponds to the point of balance

SK > 0, Positively Skewed
- the bump on the left indicates that the mode corresponds to a low value
- the tail extending to the right means that the mean, which is sensitive to each score value, will be pulled in the direction of the extreme scores and will have a high value
- the median, which is unaffected by extreme values, will have a value between the mode and the mean

SK < 0, Negatively Skewed
- the mean will have a lower numerical value than the median because the extremely low scores will pull the mean to the left
- the bump usually occurs at the right, indicating that the mode has a high numerical value
- the median will still be in the middle

IV. Measure of Kurtosis – measures the degree of peakedness of a distribution of data, denoted by k. If the distribution of the data is bell-shaped, k = 3. If the shape of the distribution is relatively peaked, k > 3. If the shape is relatively flat, k < 3.

k = 3
A distribution that is peaked in the same way as any normal distribution, not just the standard normal distribution, is said to be mesokurtic. The peak of a mesokurtic distribution is neither high nor low; rather, it is considered to be a baseline for the two other classifications.

k > 3
A leptokurtic distribution is one that has kurtosis greater than a mesokurtic distribution. Leptokurtic distributions are identified by peaks that are thin and tall. The tails of these distributions, to both the right and the left, are thick and heavy. Leptokurtic distributions are named after the prefix "lepto", meaning "skinny."

k < 3
The third classification for kurtosis is platykurtic. Platykurtic distributions are those that have a peak lower than a mesokurtic distribution. Platykurtic distributions are characterized by a certain flatness to the peak, and have slender tails. The name of these distributions comes from the prefix "platy", meaning "broad."

FORMULAS FOR UNGROUPED DATA

Data Set 1: 115 115 120 120 120 125 125 130 300
Data Set 2: 115 115 120 120 120 125 125 125 130 130

Numerical Measures (computation columns: Data 1, Data 2)

1. Mean

   $$\mu = \frac{\sum X_i}{N}$$

2. Median

3. Mode is determined by mere inspection.

4. Variance

   $$\sigma^2 = \frac{\sum X_i^2}{N} - \mu^2$$

   where μ is the mean of the ungrouped data

5. Standard Deviation (σ) = positive square root of the variance

6. Coefficient of Variation

   $$CV = \frac{\sigma}{\mu} \times 100\%$$

7. Measure of Skewness

   $$SK = \frac{3(\text{Mean} - \text{Median})}{\sigma}$$
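To see how the ungrouped formulas behave on the two data sets, here is a minimal Python sketch. It assumes the population forms of the formulas (division by N) and takes the skewness denominator to be the standard deviation; the helper name `describe` is hypothetical.

```python
import statistics as st

def describe(data):
    """Descriptive measures for ungrouped data, using division by N."""
    n = len(data)
    mean = st.fmean(data)
    median = st.median(data)
    modes = st.multimode(data)                         # may hold more than one mode
    variance = sum(x * x for x in data) / n - mean ** 2
    sd = variance ** 0.5
    return {
        "mean": mean,
        "median": median,
        "modes": modes,
        "variance": variance,
        "sd": sd,
        "cv": sd / mean * 100,                         # in percent
        "sk": 3 * (mean - median) / sd,
    }

data1 = [115, 115, 120, 120, 120, 125, 125, 130, 300]
data2 = [115, 115, 120, 120, 120, 125, 125, 125, 130, 130]

summary1 = describe(data1)
summary2 = describe(data2)
for name, summary in (("Data 1", summary1), ("Data 2", summary2)):
    print(name, {key: value for key, value in summary.items()})
```

The single extreme value 300 in Data Set 1 pulls the mean well above the median, so its skewness comes out positive, while Data Set 2 has equal mean and median and a skewness of zero.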

Numerical Measures (computation columns: Data 1, Data 2)

8. Pi =

9. Di =

10. Qi =

Note: MEDIAN

If n is odd, the median position equals (n + 1)/2, and the value of the [(n + 1)/2]th observation in the array is taken as the median, i.e.,

$$Md = X_{\left(\frac{n+1}{2}\right)}$$

If n is even, the mean of the two middle values in the array is the median, i.e.,

$$Md = \frac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2}$$

where n is the number of observations

FORMULAS FOR GROUPED DATA

Data Set

TCB (LTCB – UTCB)     CM (Xi)    Freq (fi)    <CF
2.65 – 3.75           3.2        5            5
3.75 – 4.85           4.3        4            9
4.85 – 5.95           5.4        8            17
5.95 – 7.05           6.5        3            20
7.05 – 8.15           7.6        12           32
8.15 – 9.25           8.7        8            40
Total                            40

Σ fi Xi = 256.7
Σ fi Xi² = 1783.83

Numerical Measures

1. Mean (μ)

   $$\mu = \frac{\sum f_i X_i}{N}$$

   where f_i = frequency of the ith class
   X_i = class mark of the ith class
   N = total number of observations
   K = number of classes

2. Median (Md)

   $$Md = LTCB_{Md} + c\left(\frac{\frac{N}{2} - {<}CF_b}{F_{Md}}\right)$$

   where LTCB_Md = LTCB of the median class
   c = class size
   <CF_b = <CF of the class preceding the median class
   F_Md = frequency of the median class
   N = total number of observations

   NOTE: the median class is the class which contains the (N/2)th value of the array

3. Mode (Mo)

   $$Mo = LTCB_{Mo} + c\left(\frac{F_{Mo} - F_b}{2F_{Mo} - F_a - F_b}\right)$$

   where LTCB_Mo = LTCB of the modal class
   c = class size
   F_Mo = frequency of the modal class
   F_b = frequency of the class preceding the modal class
   F_a = frequency of the class following the modal class

   NOTE: the modal class is the class with the highest frequency

4. Variance (σ²)

   $$\sigma^2 = \frac{\sum f_i X_i^2}{N} - \mu^2$$

   where f_i = frequency of the ith class
   X_i = class mark of the ith class
   N = total number of observations
   μ = mean of the grouped data

5. Standard Deviation (σ) = positive square root of the variance

6. Coefficient of Variation

   $$CV = \frac{\sigma}{\mu} \times 100\%$$

7. Measure of Skewness

   $$SK = \frac{3(\text{mean} - \text{median})}{\sigma}$$

8. Percentiles

   $$P_i = LTCB_{P_i} + c\left(\frac{\frac{iN}{100} - {<}CF_b}{F_{P_i}}\right)$$

   where LTCB_Pi = LTCB of the Pi class
   c = class size
   <CF_b = <CF of the class preceding the Pi class
   F_Pi = frequency of the Pi class
   N = total number of observations

9. Deciles

   $$D_i = LTCB_{D_i} + c\left(\frac{\frac{iN}{10} - {<}CF_b}{F_{D_i}}\right)$$

   where LTCB_Di = LTCB of the Di class
   c = class size
   <CF_b = <CF of the class preceding the Di class
   F_Di = frequency of the Di class
   N = total number of observations

10. Quartiles

   $$Q_i = LTCB_{Q_i} + c\left(\frac{\frac{iN}{4} - {<}CF_b}{F_{Q_i}}\right)$$

   where LTCB_Qi = LTCB of the Qi class
   c = class size
   <CF_b = <CF of the class preceding the Qi class
   F_Qi = frequency of the Qi class
   N = total number of observations
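The grouped-data formulas can be tried out on the six-class data set given with them. The following Python sketch is illustrative only: the helper `fractile` is a hypothetical name, and it assumes the fractile class is the first class whose cumulative frequency reaches iN/parts, with the standard deviation as the skewness denominator.

```python
# (LTCB, UTCB, frequency) for each class of the grouped data set
classes = [(2.65, 3.75, 5), (3.75, 4.85, 4), (4.85, 5.95, 8),
           (5.95, 7.05, 3), (7.05, 8.15, 12), (8.15, 9.25, 8)]

n = sum(f for _, _, f in classes)
marks = [(lo + hi) / 2 for lo, hi, _ in classes]          # class marks X_i
freqs = [f for _, _, f in classes]

mean = sum(f * x for f, x in zip(freqs, marks)) / n
variance = sum(f * x * x for f, x in zip(freqs, marks)) / n - mean ** 2
sd = variance ** 0.5

def fractile(i, parts):
    """i-th fractile for `parts` equal parts (2: median, 4, 10 or 100)."""
    target = i * n / parts
    cf_before = 0                                          # <CF of the preceding class
    for lo, hi, f in classes:
        if cf_before + f >= target:                        # first class reaching the target
            return lo + (hi - lo) * (target - cf_before) / f
        cf_before += f

median = fractile(1, 2)
q3 = fractile(3, 4)
p90 = fractile(90, 100)

# Mode: interpolate inside the class with the highest frequency.
m = freqs.index(max(freqs))
f_before = freqs[m - 1] if m > 0 else 0
f_after = freqs[m + 1] if m < len(freqs) - 1 else 0
lo, hi, f_mo = classes[m]
mode = lo + (hi - lo) * (f_mo - f_before) / (2 * f_mo - f_before - f_after)

skewness = 3 * (mean - median) / sd
print(mean, median, mode, variance, q3, p90, skewness)
```

Because the median, quartile, decile and percentile formulas differ only in the divisor (2, 4, 10 or 100), a single function covers all of them.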

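FORMULAS:

1. Mean (μ)

   $$\mu = \frac{\sum f_i X_i}{N}$$

2. Median (Md)

   $$Md = LTCB_{Md} + c\left(\frac{\frac{N}{2} - {<}CF_b}{F_{Md}}\right)$$

3. Mode (Mo)

   $$Mo = LTCB_{Mo} + c\left(\frac{F_{Mo} - F_b}{2F_{Mo} - F_a - F_b}\right)$$

4. Percentile

   $$P_i = LTCB_{P_i} + c\left(\frac{\frac{iN}{100} - {<}CF_b}{F_{P_i}}\right)$$

5. Decile

   $$D_i = LTCB_{D_i} + c\left(\frac{\frac{iN}{10} - {<}CF_b}{F_{D_i}}\right)$$

6. Quartile

   $$Q_i = LTCB_{Q_i} + c\left(\frac{\frac{iN}{4} - {<}CF_b}{F_{Q_i}}\right)$$

7. Variance (σ²)

   $$\sigma^2 = \frac{\sum f_i X_i^2}{N} - \mu^2$$

8. Standard Deviation (σ)

   $$\sigma = \sqrt{\text{variance}}$$

9. Coefficient of Variation

   $$CV = \frac{\sigma}{\mu} \times 100\%$$

10. Measure of Skewness

   $$SK = \frac{3(\text{mean} - \text{median})}{\sigma}$$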

THE BOXPLOT

Definition. The boxplot is a graph that is very useful for displaying the following features of the data:
- location
- spread
- symmetry
- extremes
- outliers

Steps in Constructing a Boxplot

1. Construct a rectangle with one end at the first quartile and the other end at the third quartile.
2. Put a vertical line across the interior of the rectangle at the median.
3. Compute the interquartile range (IQR), the lower fence (FL) and the upper fence (FU), given by:
   IQR = Q3 – Q1
   FL = Q1 – 1.5 IQR
   FU = Q3 + 1.5 IQR
4. Locate the smallest value contained in the interval [FL, Q1]. Draw a line from this value to Q1.
5. Locate the largest value contained in the interval [Q3, FU]. Draw a line from this value to Q3.
6. Values falling outside the fences are considered outliers and are usually denoted by “x”.

Remarks:

1. The height of the rectangle is arbitrary and has no specific meaning. If several boxplots appear together, however, the height is sometimes made proportional to the different sample sizes.
2. If an outlying observation is less than Q1 – 3 IQR or greater than Q3 + 3 IQR, it is identified with a circle at its actual location. Such an observation is called a far outlier.

Examples:

1. Data Set A:
    1  15  21  22  24
   10  18  22  23  25
   14  20  22  24  28

2. Data Set B:
    3  10  11  12  19
    8  10  12  16  19
    9  10  12  16  30
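The fence computations in the boxplot steps can be sketched for both example data sets as follows. Quartile conventions differ between textbooks and software; this sketch assumes the default "exclusive" method of Python's `statistics.quantiles`, which for 15 observations picks the 4th, 8th and 12th ordered values. The helper name `boxplot_summary` is hypothetical.

```python
import statistics

def boxplot_summary(data):
    """Quartiles, fences, whisker ends and outliers for a boxplot."""
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4)   # default method: exclusive
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),           # ends of the two lines
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
        "far_outliers": [v for v in values
                         if v < q1 - 3 * iqr or v > q3 + 3 * iqr],
    }

data_a = [1, 15, 21, 22, 24, 10, 18, 22, 23, 25, 14, 20, 22, 24, 28]
data_b = [3, 10, 11, 12, 19, 8, 10, 12, 16, 19, 9, 10, 12, 16, 30]

summary_a = boxplot_summary(data_a)
summary_b = boxplot_summary(data_b)
print(summary_a)
print(summary_b)
```

Under this quartile convention, Data Set A has one outlier on the low side and Data Set B has one on the high side, and neither is extreme enough to count as a far outlier.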

More Problems:

1. Suppose a teacher assigns the following weights to the various course requirements:

   Assignment    15%
   Project       25%
   Midterms      20%
   Finals        40%

   The maximum score a student may obtain for each component is 100. Sheila obtains marks of 83 for assignment, 72 for project, 41 for midterms and 49 for the finals. Find her mean mark for the course.

2. Two of the quality criteria in processing butter cookies are the weight and the color development in the final stages of oven browning. Individual pieces of cookies are scanned by a spectrophotometer calibrated to reflect yellow-brown light. The readout is expressed in per cent of a standard yellow-brown reference plate, and a value of 41 is considered optimal (golden-yellow). The cookies were also weighed in grams at this stage. The means and standard deviations of 30 sample cookies are presented below.

                 Mean     sd
   Color         41.1     10
   Weight        17.7     3.2

   Which of the two quality criteria is more varied?

3. The following are the weight losses (in pounds) of 25 individuals who enrolled in a five-week weight-control program:

   2 3 3 4 4 4 5 5 6 7 7 8 8 8 9 9 9 9 10 10 10 11 11 11 12

   Compute the 3rd quartile, 7th decile, and 89th percentile.
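One possible way to set up the first two problems is sketched below in Python. It assumes the mean mark is the weighted mean of the four components and that variability across different units is compared through the coefficient of variation; the variable names are illustrative.

```python
# Problem 1: weighted mean of the course components.
weights = {"Assignment": 0.15, "Project": 0.25, "Midterms": 0.20, "Finals": 0.40}
marks = {"Assignment": 83, "Project": 72, "Midterms": 41, "Finals": 49}

weighted_mean = (sum(weights[part] * marks[part] for part in weights)
                 / sum(weights.values()))

# Problem 2: coefficient of variation, CV = (sd / mean) * 100%.
criteria = {"Color": (41.1, 10), "Weight": (17.7, 3.2)}   # (mean, sd)
cv = {name: sd / mean * 100 for name, (mean, sd) in criteria.items()}
more_varied = max(cv, key=cv.get)

print(round(weighted_mean, 2))
print({name: round(value, 2) for name, value in cv.items()}, more_varied)
```

The coefficient of variation is used instead of the standard deviation because color and weight are measured in different units.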

Exercise #3 – Numerical Descriptive Measures

Objectives: At the end of the exercise, the student is expected to identify and compute appropriate numerical descriptive measures for ungrouped and grouped data, specifically:
- measure of central tendency;
- measure of dispersion; and
- measure of skewness

A. Using your raw data set and the FDT you constructed in Exercise #2, compute the appropriate descriptive measures (ungrouped and grouped). Show the solution for grouped data only.

B. Construct these tables in your worksheets and summarize the values obtained.

I. Measure of Central Tendency

   Mean                     Median                   Mode
   ungrouped   grouped      ungrouped   grouped      ungrouped   grouped

II. Measure of Dispersion

   Range                    Variance                 Standard Deviation       Coeff. of Variation
   ungrouped   grouped      ungrouped   grouped      ungrouped   grouped      ungrouped   grouped

III. Measure of Skewness

   ungrouped   grouped

IV. Fractiles

   P90                      D6                       Q3
   ungrouped   grouped      ungrouped   grouped      ungrouped   grouped


Lesson # 4 – Weighted Means

In survey research, the weighted mean is the statistical measure commonly computed when data are gathered from a questionnaire that uses the Likert scale.

A Likert scale is a psychometric scale commonly used in questionnaires and is the most widely used scale in survey research. When responding to a Likert questionnaire item, respondents specify their level of agreement with a statement. [1] A Likert item is simply a statement the respondent is asked to evaluate according to any kind of subjective or objective criteria. Generally, the level of agreement or disagreement is measured. Often five ordered response levels are used, although many psychometricians advocate using seven or nine levels. A recent empirical study [2] found that a 5- or 7-point scale may produce slightly higher mean scores relative to the highest possible attainable score, compared to those produced from a 10-point scale, and this difference was statistically significant.

Strategies: 5 - Very Effective, 4 - Effective, 3 - Moderately Effective/Undecided, …

Practices: 5 - Highly Observed/Always/Fully Aware, 4 - Observed/Sometimes/Aware, …

Traits/Attitudes: 5 - Very Evident, 4 - Somewhat Evident, 3 - Undecided, 2 - Somewhat Inevident, 1 - Not Evident

[1] http://en.wikipedia.org/wiki/Likert_scale

[2] Dawes, John (2008). "Do Data Characteristics Change According to the Number of Scale Points Used? An Experiment Using 5-Point, 7-Point and 10-Point Scales". International Journal of Market Research 50 (1): 61–77.


Table 1. Illustration of a Likert Scale Questionnaire

Research Title: Solid Waste Management of Ateneo de Naga University

Below is a list of Solid Waste Management practices. Please check the box with the number corresponding to your chosen answer as to how these practices are observed.

Scale: 5 - Very High   4 - High   3 - Moderate   2 - Low   1 - Very Low

A. GENERATION OF WASTE (answer columns: 5  4  3  2  1)

Ateneo de Naga University

1. Provides information through campaigns or seminars about solid waste generation

2. Introduces strategies on how to apply the 4R's (Reuse, Recycle, Reduce and Respond) of Solid Waste Management

3. Provides campaign to patronize the use of reusable and recycled materials

4. Rejects products which are harmful to the environment such as foam, styrofoam, CFC aerosols, oil-based paints, pesticides, insecticides, plastics, wood preservatives, glues and adhesives

5. Encourages the use of unused side of old papers or recycles its own paper (as shown by the exam papers used, handouts, memo, letters, etc.)

6. Encourages or requires the use of refillable inks for pens, ballpens, printers, etc.

7. Allows the use of old notebooks from previous years instead of requiring new ones

8. Encourages to reuse envelopes, boxes, packaging materials and folders

9. Repairs or disposes defective computers in laboratories or offices


Table 2. Tallied Data

A. GENERATION OF WASTE

Ateneo de Naga University

1. Provides information through campaigns or seminars about solid waste generation
2. Introduces strategies on how to apply the 4R's (Reuse, Recycle, Reduce and Respond) of Solid Waste Management
3. Provides campaign to patronize the use of reusable and recycled materials
4. Rejects products which are harmful to the environment such as foam, styrofoam, CFC aerosols, oil-based paints, pesticides, insecticides, plastics, wood preservatives, glues and adhesives
5. Encourages the use of unused side of old papers or recycles its own paper (as shown by the exam papers used, handouts, memo, letters, etc.)
6. Encourages or requires the use of refillable inks for pens, ballpens, printers, etc.
7. Allows the use of old notebooks from previous years instead of requiring new ones
8. Encourages to reuse envelopes, boxes, packaging materials and folders
9. Repairs or disposes defective computers in laboratories or offices

Item     5     4     3     2     1    Weighted Mean
1        0     6    12    38    64
2        2     8    10    29    71
3        6     8    22    38    46
4        0     5     7    34    74
5        7     6    12    33    62
6        1     1     4    41    73
7        2     3     4    42    69
8        6    11    18    27    53
9        0     2     3    43    72

Cumulative Weighted Mean

Source: Valenzuela 2007, p.66
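The weighted mean of each item in Table 2 can be read as the sum of each rating multiplied by its tally, divided by the number of responses to that item. The short Python sketch below works under that reading and further assumes that the cumulative weighted mean is the simple average of the item means; the function and variable names are illustrative and do not come from the manual.

```python
# Tallies from Table 2: number of responses per rating (5, 4, 3, 2, 1), items 1 to 9.
RATINGS = (5, 4, 3, 2, 1)
TALLIES = [
    (0, 6, 12, 38, 64),
    (2, 8, 10, 29, 71),
    (6, 8, 22, 38, 46),
    (0, 5, 7, 34, 74),
    (7, 6, 12, 33, 62),
    (1, 1, 4, 41, 73),
    (2, 3, 4, 42, 69),
    (6, 11, 18, 27, 53),
    (0, 2, 3, 43, 72),
]


def weighted_mean(tally, ratings=RATINGS):
    """Sum of rating x frequency divided by the total number of responses."""
    total_responses = sum(tally)
    return sum(r * f for r, f in zip(ratings, tally)) / total_responses


item_means = [weighted_mean(t) for t in TALLIES]
# Assumption: the cumulative weighted mean is the average of the item means.
cumulative_mean = sum(item_means) / len(item_means)

for number, mean in enumerate(item_means, start=1):
    print(f"Item {number}: {mean:.2f}")
print(f"Cumulative weighted mean: {cumulative_mean:.2f}")
```

For item 1 this gives (5(0) + 4(6) + 3(12) + 2(38) + 1(64)) / 120 = 200 / 120, or about 1.67, which agrees with the value reported for that item in Table 5.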


Table 3
Adjectival Interpretation of the Likert Scale (cumulative mean)

Rating Scale   Range          Interpretation
5              4.20 – 5.00    Very High – Almost all indicators are practiced
4              3.40 – 4.19    High – 75% of the indicators were practiced
3              2.60 – 3.39    Moderate – 50% of the indicators were practiced
2              1.80 – 2.59    Low – 25% of the indicators were practiced
1              1.00 – 1.79    Very Low – almost none of the indicators were practiced

Table 4
Adjectival Interpretation of the Likert Scale (per item)

Rating Scale   Range          Interpretation
5              4.20 – 5.00    Very High – Almost all respondents practice the said indicator
4              3.40 – 4.19    High – 75% of the respondents
3              2.60 – 3.39    Moderate – 50% of the respondents
2              1.80 – 2.59    Low – 25% of the respondents
1              1.00 – 1.79    Very Low – almost none of the respondents…
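The ranges in Tables 3 and 4 amount to a lookup from a weighted mean to an adjectival rating. A minimal Python sketch of that lookup follows; the name `interpret` and the choice to reject values outside 1 to 5 are assumptions made for illustration.

```python
# Lower bound of each range in Tables 3 and 4, highest rating first.
SCALE = [
    (4.20, "Very High"),
    (3.40, "High"),
    (2.60, "Moderate"),
    (1.80, "Low"),
    (1.00, "Very Low"),
]


def interpret(mean):
    """Return the adjectival rating whose range contains the weighted mean."""
    if not 1.0 <= mean <= 5.0:
        raise ValueError("a weighted mean on a 5-point scale lies between 1 and 5")
    for lower_bound, label in SCALE:
        if mean >= lower_bound:
            return label


print(interpret(2.08))  # Low
print(interpret(1.70))  # Very Low
```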


Table 5.
Extent of Solid Waste Management in AdeNU (faculty and students), 2007

A. GENERATION OF WASTE

Ateneo de Naga University

1. Provides information through campaigns or seminars about solid waste generation
2. Introduces strategies on how to apply the 4R's (Reuse, Recycle, Reduce and Respond) of Solid Waste Management
3. Provides campaign to patronize the use of reusable and recycled materials
4. Rejects products which are harmful to the environment such as foam, styrofoam, CFC aerosols, oil-based paints, pesticides, insecticides, plastics, wood preservatives, glues and adhesives
5. Encourages the use of unused side of old papers or recycles its own paper (as shown by the exam papers used, handouts, memo, letters, etc.)
6. Encourages or requires the use of refillable inks for pens, ballpens, printers, etc.
7. Allows the use of old notebooks from previous years instead of requiring new ones
8. Encourages to reuse envelopes, boxes, packaging materials and folders
9. Repairs or disposes defective computers in laboratories or offices

Item    Weighted Mean    Interpretation
1       1.67             Very Low
2       1.68             Very Low
3       2.08             Low
4       1.52             Very Low
5       1.86             Low
6       1.47             Very Low
7       1.56             Very Low
8       2.04             Low
9       1.46             Very Low

Cumulative Weighted Mean    1.7    Very Low


Generation of Waste

The extent of performance of SWM practices by students and faculty in the area of waste generation is given in Table 5. The results show that the respondents’ means on the nine (9) indicators ranged from 1.46 to 2.08, or from “very low” to “low” ratings. The respondents gave an overall mean of “very low” to the following indicators: “provides information through campaigns or seminars about SWM (1.67)”, “introduces strategies on how to apply the 4R's of Solid Waste Management (1.68)”, “rejects products which are harmful to the environment such as foam, styrofoam, CFC aerosols, oil-based paints, pesticides, insecticides, plastics, wood preservatives, glues and adhesives (1.52)”, “encourages the use of refillable ink (1.47)”, “allows the use of old notebooks (1.56)” and “repairs or disposes defective computers (1.46)”. The “very low” rating implies that almost none of the respondents observe the mentioned practices.

The indicators “provides campaign to patronize the use of reusable and recycled materials (2.08)”, “encourages the use of unused side of old papers or recycles its own paper (1.86)” and “encourages to reuse envelopes, boxes, packaging materials and folders (2.04)” had an overall mean of “low”. Only 25% of the respondents observe the mentioned indicators.

The students and faculty gave an overall weighted mean of “very low”. In totality, the cumulative mean score was 1.7. The result implies that almost none of the indicators were being observed under the generation component of SWM.

Survey results reveal that there was a need for an intensive information campaign about SWM and that the University had yet to implement strategies on how to apply the 4R’s. Such an outcome presents an opportunity to promote waste-saving measures among the student and teaching population of AdeNU, in line with the future promotion of the 4R’s.


Exercise # 4 – Weighted Means

A. For the raw data given, obtain the weighted mean for each item and the cumulative/total weighted mean.

B. Interpret the cumulative/total weighted mean.

C. What are the highest and lowest weighted means obtained? Interpret the values.

D. Conclusion. Discuss the result of the test based on the objective of the study.

Rating Scale   Range of the Likert Scale   Interpretation
5              4.20 – 5.00                 Extremely Characteristic of Me – almost all indicators are evident.
4              3.40 – 4.19                 Somewhat Characteristic of Me – 75% of the indicators are evident.
3              2.60 – 3.39                 Neither Un/Characteristic of Me – 50% of the indicators are evident.
2              1.80 – 2.59                 Somewhat Uncharacteristic of Me – 25% of the indicators are evident.
1              1.00 – 1.79                 Extremely Uncharacteristic of Me – almost none of the indicators are evident.


Problem Set

Thesis title: Portable Games and Devices towards Aggressive Behavior of the First Year BS Digital Animation Students of Ateneo de Naga University

Objective: To determine the level of influence of playing Portable Games and Devices on the behavior, specifically the aggressiveness, of the respondents

Table 1
Results from the Standard Questionnaire by Buss and Perry

Indicators                                                                5    4    3    2    1   Weighted Mean
1. Some of my friends think I am a hothead.                              18   12   15   12   13
2. If I have to resort to violence to protect my rights, I will.         17   21   10   15    7
3. When people are especially nice to me, I wonder what they want.       14   17   15   17    7
4. I tell my friends openly when I disagree with them.                   17   28   10   10    5
5. I have become so mad that I have broken things.                       10   17   14   15   14
6. I can’t help getting into arguments when people disagree with me.     16   18   14   13    9
7. I wonder why sometimes I feel so bitter about things.                  9   23   15   17    6
8. Once in a while, I can’t control the urge to strike another person.   12   16   10   16   16
9. I am an even-tempered person.                                         18   21   15   13    3
10. I am suspicious of overly friendly strangers.                        11   19   17   13   10

Cumulative Weighted Mean


Lesson # 5 – Sampling

SAMPLE SIZE DETERMINATION

Slovin’s Formula:

$$n = \frac{N}{1 + Ne^{2}}$$

where n = sample size
N = population size
e = margin of error (usually at 5%)

A researcher would want to make a socio-economic survey of a school with a population of 5000 students. If he allows a margin of error of 5%, how many students must he take into sample?

$$n = \frac{5000}{1 + 5000(0.05)^{2}} = \frac{5000}{1 + 5000(0.0025)} = \frac{5000}{1 + 12.5} = \frac{5000}{13.5} = 370.37 \approx 370$$
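The computation above can be checked with a few lines of Python. This is only a sketch: the function name `slovin` and its input checks are illustrative, and the final rounding to the nearest whole number follows the worked example rather than any stated rule.

```python
def slovin(population_size, margin_of_error=0.05):
    """Slovin's formula: n = N / (1 + N * e**2)."""
    if population_size <= 0 or not 0 < margin_of_error < 1:
        raise ValueError("expected N > 0 and 0 < e < 1")
    return population_size / (1 + population_size * margin_of_error ** 2)


n = slovin(5000, 0.05)
print(round(n, 2))  # 370.37
print(round(n))     # 370, rounded to the nearest whole number as in the example
```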

Important: Samples should be as large as a researcher can obtain with a reasonable expenditure of time and energy. A recommended minimum number of subjects is 100 for a descriptive study, 50 for a correlational study, and 30 in each group for experimental and causal-comparative studies.


SAMPLING METHODS

Random Sampling Methods

Every element in the population has an equal chance of being chosen.

Example: The dean of a school of education in a large midwestern university wishes to find out how her faculty feel about the sabbatical leave requirements at the university. She places all 150 names of the faculty in a hat, mixes them thoroughly, and then draws out the names of 25 individuals to interview.

Nonrandom Sampling Methods

Not all elements are given an equal chance of being included in the sample.

Some elements may be deliberately ignored (that is, given no chance at all) in the choice of elements for the sample.

Example: The manager of the campus bookstore at a local university wants to find out how students feel about the services the bookstore provides. Every day for two weeks during her lunch hour, she asks every person who enters the bookstore to fill out a short questionnaire she has prepared and drop it in a box near the entrance before leaving. At the end of the two-week period, she has a total of 235 completed questionnaires.


I. RANDOM SAMPLING METHODS

A. Simple Random Sampling (SRS) – a method of selecting n units out of N units in the population in such a way that every distinct sample of size n has an equal chance of being drawn.

Required: complete list of the elements of the population

Features: each and every member of the population has an equal chance of being chosen

When to use: population size is not very large; population is homogeneous

Procedures:
i. Lottery method/Chip-in-the-box/Fish-in-the-bowl
ii. Table of random numbers
iii. Calculator/computer-generated random numbers

Illustration: Table of Random Numbers

011723 223456 222167 912334 379156 233989

086401 016265 411148 059397 022334 080675

666278 106590 879809 051965 004571 036900 063045 786326 098000

560132 345678 356789 727009 344870 889567

000037 121191 258700 667899 234345 076567
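Procedure iii (computer-generated random numbers) can be sketched in Python using the dean's example of drawing 25 names from 150 faculty members. The faculty labels and the seed value are placeholders invented for this illustration.

```python
import random

# Hypothetical sampling frame: the complete list of the 150 faculty members.
faculty = [f"Faculty {i:03d}" for i in range(1, 151)]

rng = random.Random(42)             # fixed seed only so that the draw can be repeated
sample = rng.sample(faculty, k=25)  # 25 names drawn without replacement

print(len(sample))
print(sorted(sample)[:5])
```

Because `sample` draws without replacement, no name can appear twice, which matches drawing slips from a hat.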


B. Stratified Sampling – the population of N units is first divided into subpopulations called strata. Then a simple random sample is drawn from each stratum, the selection being made independently in different strata.

Required: complete list of the elements of the population

Features: representatives from each stratum or subgroup of the population are randomly chosen as elements of the sample

When to use: population size is large; population is heterogeneous but elements can be grouped into homogeneous strata; when we want representatives from each stratum or subgroup

Procedure: Given a population of N = 365, the researcher grouped the respondents according to gender, where there are 219 females and 146 males. Using stratified sampling, how many respondents will be obtained from each stratum?

N = 365; use Slovin’s formula to get the sample size n:

$$n = \frac{365}{1 + 365(0.05)^{2}} = \frac{365}{1 + 365(0.0025)} = \frac{365}{1 + 0.9125} = \frac{365}{1.9125} = 190.849 \approx 191$$


The researcher identifies 2 subgroups or strata in the population of 365:

219 females (219/365 = 60%)
146 males (146/365 = 40%)

Using Slovin’s formula we compute the required sample size n, then we multiply it by the percentage of each stratum:

Females: 191 x 0.60 = 114.6, or 115 females
Males: 191 x 0.40 = 76.4, or 76 males
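The allocation step above, where the sample is split among strata in proportion to their share of the population, can be sketched as follows. The function name is illustrative, and rounding each stratum separately is an assumption; in other data sets the rounded counts may not add up exactly to the sample size.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Split the sample among strata in proportion to each stratum's population share."""
    population = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / population)
        for name, size in strata_sizes.items()
    }


allocation = proportional_allocation({"females": 219, "males": 146}, sample_size=191)
print(allocation)  # {'females': 115, 'males': 76}
```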


C. Cluster Sampling – a method of sampling where a sample of distinct groups, or clusters, of elements is selected and then a census of every element in the selected clusters is taken.

Features: the population is grouped into clusters or small units composed of population elements; each cluster contains as varied a mixture as possible and, at the same time, one cluster is as nearly alike as the others.

Sometimes referred to as an area sample because it is frequently applied on a geographical basis; blocks in a community or city are occupied by heterogeneous groups.

When to use: large population; a list of all members of the population is not available (only a list of clusters is required).

Procedure: 50 barangays in Naga City

Randomly choose 3 barangays


D. Multi-stage Sampling – the population is divided into a hierarchy of sampling units corresponding to the different sampling stages. In the first stage of sampling, the population is divided into primary stage units (PSU), then a sample of PSUs is drawn. In the second stage of sampling, each selected PSU is subdivided into second-stage units (SSU), then a sample of SSUs is drawn. The process of subsampling can be carried to a third stage, fourth stage and so on, by sampling the subunits instead of enumerating them completely at each stage.

Features: this technique uses several stages or phases in getting the sample from the general population

When to use: conducting nationwide surveys or any survey involving a large universe


Illustration of Multistage Sampling:

Philippines (17 regions)

Choose randomly 5 regions

R1 R2 R3 R4 R5

Choose randomly 2 provinces for each region

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10

Choose randomly 1 city for each province

C1 C2 C3 C4 C5 C6 C7 C8 C9 C10

Choose randomly 2 barangays for each city

Then choose randomly 5 households for each barangay
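The stages in the illustration can be sketched in Python as repeated random draws, one per level of the hierarchy. Only the 17 regions and the number of units drawn at each stage come from the illustration; the numbers of provinces, cities, barangays and households available under each unit are made-up placeholders, and units are identified by index rather than by name.

```python
import random


def multistage_sample(rng, stages):
    """Draw a multi-stage sample; `stages` lists (units available, units drawn) per stage."""
    selected = [()]  # start from a single root: the whole population
    for available, drawn in stages:
        next_selected = []
        for parent in selected:
            # Randomly choose `drawn` of the `available` sub-units under this parent.
            for unit in sorted(rng.sample(range(1, available + 1), drawn)):
                next_selected.append(parent + (unit,))
        selected = next_selected
    return selected


rng = random.Random(2024)  # fixed seed only so that the draw can be repeated
stages = [
    (17, 5),  # regions: 17 available, 5 drawn
    (6, 2),   # provinces per region: 6 is a hypothetical count, 2 drawn
    (4, 1),   # cities per province: 4 is a hypothetical count, 1 drawn
    (20, 2),  # barangays per city: 20 is a hypothetical count, 2 drawn
    (50, 5),  # households per barangay: 50 is a hypothetical count, 5 drawn
]
households = multistage_sample(rng, stages)
print(len(households))  # 5 * 2 * 1 * 2 * 5 = 100 sampled households
```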


[Figure: Simple random sampling and stratified sampling, illustrated on populations of lettered elements. In the stratified panel the population is split into strata A–E (25%), F–O (50%) and P–T (25%), and the sample B, C (25%); F, H, M, O (50%); Q, S (25%) keeps the same proportions.]


[Figure: Cluster sampling and two-stage sampling, illustrated on populations grouped into clusters such as AB, CDE, FG, HKL and MNO.]


II. Non-random sampling

A. Convenience – the researcher chooses the sample at his or her convenience.

Example. To find out how students feel about food service in the student union at an East Coast university, the manager stands outside the main door of the cafeteria one Monday morning and interviews the first 50 students who walk out of the cafeteria.

B. Purposive – researchers use their judgement to select a sample that they believe will provide the data they need.

Example. A graduate student wants to know how retired people aged 65 and over feel about their “golden years”. He has been told by one of his professors, an expert on aging and the aged population, that the local Association of Retired Workers is a representative cross section of retired people aged 65 and over. He decides to interview a sample of 50 people who are members of the association to get their views.

C. Quota – the researcher sets a sample size, then chooses the respondents without setting criteria. The researcher proceeds to fill the prescribed quota and is left to his own convenience or preference.

D. Snowball

REASONS FOR USING NON-RANDOM SAMPLING

a. Some might use this technique because they just want to get a “feel” of the market before launching or producing a certain product.

b. Lack of logistics or inadequate knowledge in the use of random methods

c. The validity of the sample is based on the soundness of the judgement of whoever makes the choice.

Example. One would naturally use judgement instead of randomness in the choice of people who will work for a company.


Lesson # 6 – FPC, Permutations and Combinations

Definition. FUNDAMENTAL PRINCIPLE OF COUNTING. If one event can occur in m different ways, and if, after it has happened in one of these ways, a second event can occur in n different ways, then both events can occur, in the order stated, in m x n different ways.

Examples

1. If there are seven doors providing access to a building, in how many ways can a person enter the building by one door and leave by a different door?

2. How many even three-digit numbers can be formed from the digits 1, 2, 3, 4 and 5 if each digit can be used only once?

3. How many different arrangements, each consisting of five different letters, can be formed from the letters of the word “PERSONAL” if each arrangement is to begin and end with a vowel?

4. How many different arrangements of five distinct books can be made on a shelf with space for five books?

5. Suppose that there are 3 math books and 3 physics books. How many different arrangements of the six books can be made on a shelf if books on the same subject are to be kept together?

6. In how many ways can a 10-question true-false exam be answered?
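Counts obtained with the fundamental principle of counting can be checked by listing the outcomes, which is practical for small cases. The sketch below does this in Python for Examples 1, 2 and 6; the variable names are illustrative.

```python
from itertools import permutations, product

# Example 1: enter by one of 7 doors and leave by a different door.
doors = range(1, 8)
enter_leave = [(i, o) for i, o in product(doors, repeat=2) if i != o]
print(len(enter_leave))  # 7 * 6 = 42

# Example 2: even three-digit numbers from the digits 1 to 5, each digit used once.
even_numbers = [p for p in permutations([1, 2, 3, 4, 5], 3) if p[-1] % 2 == 0]
print(len(even_numbers))  # 2 * 4 * 3 = 24

# Example 6: ways to answer a 10-question true-false exam.
answer_sheets = list(product("TF", repeat=10))
print(len(answer_sheets))  # 2 ** 10 = 1024
```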


Definition. PERMUTATION (nPr). Let S be a set containing n elements and suppose r is a positive integer such that r ≤ n. Then a permutation of r elements of S is an arrangement, in a definite order and without repetition, of r elements of S.

Theorem 1. The number of permutations of n elements taken r at a time is given by either of the following formulas:

a. nPr = n(n-1)(n-2) … (n-r+1)
b. nPr = n! / (n-r)!

Special case: nPn = n!

Examples:

1. A bus has six vacant seats. If three additional passengers enter the bus, in how many different ways can they be seated?

2. In how many ways can 3 boys and 3 girls be seated in a row containing six seats if
a. a person may sit in any seat?
b. boys and girls must sit in alternate seats?

Theorem 2. If we are given n elements, of which exactly m1 are alike of one kind, exactly m2 are alike of a second kind, …, and exactly mk are alike of a kth kind, and if n = m1 + m2 + … + mk, then the number of distinguishable permutations that can be made of the n elements taking them all at one time is

n! / (m1! m2! … mk!)

Examples:

1. Determine the number of different nine-digit numerals that can be formed from the digits 6, 6, 6, 5, 5, 5, 4, 4 and 3.

2. How many permutations can be formed from the word HONOLULU?
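Both theorems can be evaluated directly in Python. `math.perm` computes nPr (Python 3.8 or later); the helper `distinguishable_permutations` is an illustrative name for the count of arrangements with alike elements, n! divided by the factorial of the size of each group of alike elements.

```python
from collections import Counter
from math import factorial, perm

# Theorem 1: 3 passengers choosing among 6 vacant seats, 6P3 = 6!/(6-3)!.
print(perm(6, 3))  # 120


def distinguishable_permutations(items):
    """n! divided by the factorial of the size of each group of alike elements."""
    result = factorial(len(items))
    for count in Counter(items).values():
        result //= factorial(count)
    return result


print(distinguishable_permutations("666555443"))  # 9!/(3! 3! 2! 1!) = 5040
print(distinguishable_permutations("HONOLULU"))   # 8!/(2! 2! 2!) = 5040
```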


Definition. COMBINATION (nCr). Let S be a set containing n elements, and suppose r is a positive integer such that r ≤ n. Then a combination of r elements of S is a subset of S containing r distinct elements.

Theorem 3. The number of combinations of n elements taken r at a time is given by

nCr = nPr / r! = n! / [(n-r)! r!]

Theorem 4. nCr = nC(n-r)

Examples:

1. A football conference consists of 8 teams. If each team plays every other team, how many conference games are played?

2. A student has 10 posters to pin up on the walls of her room, but there is space for only 7. In how many ways can she choose the posters to be pinned up?

3. How many committees of five can be formed from 7 sophomores and 5 freshmen if each committee is to
a. consist of 3 sophomores and 2 freshmen?
b. consist of at least 3 sophomores?
c. consist of at most 3 sophomores?
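The combination examples can be checked with `math.comb` (Python 3.8 or later). The sketch assumes that in Example 1 each pair of teams meets exactly once; the helper `committees` is an illustrative name.

```python
from math import comb

print(comb(8, 2))                  # Example 1: one game per pair of teams, 28
print(comb(10, 7))                 # Example 2: choose 7 of 10 posters, 120
print(comb(10, 7) == comb(10, 3))  # Theorem 4: nCr = nC(n-r), True


def committees(sophomores_chosen):
    """Committees of five with the given number of sophomores (7 sophomores, 5 freshmen)."""
    return comb(7, sophomores_chosen) * comb(5, 5 - sophomores_chosen)


print(committees(3))                            # exactly 3 sophomores: 350
print(sum(committees(k) for k in range(3, 6)))  # at least 3 sophomores: 546
print(sum(committees(k) for k in range(0, 4)))  # at most 3 sophomores: 596
```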


Exercise #5 – FPC, Combinations and Permutations

Objectives: At the end of the exercise, the student is expected to be able to:

1. Count the number of ways an event may possibly occur by:
a. listing all possible outcomes in the sample space corresponding to the event; and
b. using the method of counting.

2. Solve problems requiring the application of the concepts of permutation and combination.

I. Show complete solution for each.

1. How many different outcomes are possible in a roll of 3 dice? In tossing 2 coins? In rolling 2 dice and tossing 3 coins simultaneously?

2. How many distinct permutations can be made from the word FOOL? List them down.

3. A package of 10 Game Boy sets contains 3 defective sets. If 5 sets are to be picked out randomly and sent to a customer for inspection, in how many ways can the customer find at least two defective sets?

4. How many different seven-digit telephone numbers can be formed if the first digit cannot be zero?

5. A college freshman must take a science course, a humanities course, and a math course. If she may select any of 6 science courses, any of 4 humanities courses, and any of 4 math courses, in how many ways can she set her program?

6. A shelf contains 3 books in red binding, 4 books in blue and 2 in green. In how many different orders can they be arranged if all the books of the same color must be kept together?

7. How many different numbers greater than 200 can be formed from the digits 1, 2, 3, 4 and 5 (a) if repetitions are not allowed? (b) if repetitions are allowed?

8. How many committees of 5 can be selected from 12 Republicans and 8 Democrats (a) if it must contain 2 Republicans and 3 Democrats? (b) if it must contain at least 3 Republicans?

9. There are 8 baseball teams in a league. How many games will be played if each team plays each of the other teams 40 times?

10. In how many ways can one make a selection of 5 black balls, 3 red balls, and 2 white balls from a box containing 8 black balls, 7 red balls and 5 white balls?

11. The tennis squad of one college consists of 8 players and that of another consists of 10 players. In how many ways can a doubles match between the 2 institutions be arranged?

12. In how many ways can one make a selection of 4 novels, 3 biographies and 6 detective stories from a shelf containing 10 novels, 8 biographies and 10 detective stories?


Lesson #7 – Probability

PROBABILITY

SAMPLE SPACE is the set of all possible outcomes of a given experiment.

A subset of the sample space of an experiment is called an EVENT associated with the experiment.

Definition. PROBABILITY OF AN EVENT. If S is the sample space of an experiment and E is an event associated with the experiment, the probability of E, denoted by P(E), is defined by

P(E) = n(E) / n(S)

where n(E) and n(S) are the numbers of elements in E and S, respectively (assuming all outcomes in S are equally likely).

Furthermore, if P(E) = 0 then the event will never happen, or it is an “impossible” event. If P(E) = 1, the event is certain to happen, or it is a “sure” event.

Examples:

1. Determine the probability of each of the following events:
a. Obtaining a 4 on a throw of a single die
b. Obtaining a head on a toss of a coin

2. If 2 dice are thrown, what is the probability of obtaining a sum of 8? A sum of 3?

       1       2       3       4       5       6
1    (1,1)   (1,2)   (1,3)   (1,4)   (1,5)   (1,6)
2    (2,1)   (2,2)   (2,3)   (2,4)   (2,5)   (2,6)
3    (3,1)   (3,2)   (3,3)   (3,4)   (3,5)   (3,6)
4    (4,1)   (4,2)   (4,3)   (4,4)   (4,5)   (4,6)
5    (5,1)   (5,2)   (5,3)   (5,4)   (5,5)   (5,6)
6    (6,1)   (6,2)   (6,3)   (6,4)   (6,5)   (6,6)

3. Determine the probability of each of the following events:
a. Drawing a heart from a deck of 52 playing cards
b. Drawing 4 spades in succession from a deck of 52 playing cards if, after each card is drawn, it is not replaced in the deck

4. If French, Spanish, Russian and English books are placed at random on a shelf with space for 4 books, what is the probability that the Russian and English books will be next to each other?
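The definition P(E) = n(E) / n(S) can be applied by listing the sample space and counting the outcomes that belong to the event. A small Python sketch for the two-dice example follows, assuming fair dice so that all 36 outcomes are equally likely; the function name `probability` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space S for a throw of two dice: the 36 ordered pairs in the table.
sample_space = list(product(range(1, 7), repeat=2))


def probability(event):
    """P(E) = n(E) / n(S), where `event` is a predicate selecting the outcomes in E."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))


print(probability(lambda dice: sum(dice) == 8))  # 5/36
print(probability(lambda dice: sum(dice) == 3))  # 1/18
```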


CONJUNCTION AND DISJUNCTION PROBABILITIES

Definition. CONJUNCTION PROBABILITY. This type of probability is associated with events happening together, one event and another event occurring at the same time. Events, however, may be independent or dependent.

Case 1. P(A and B) = P(A) x P(B)

When the occurrence of one event does not influence the probability of the occurrence of the other event, these events are said to be independent.

Example. At birth, the probability that a US female will survive to age 65 is approximately 7/10. The probability that a male will survive to age 65 is approximately 3/5.

What is the probability that both the male and the female will be alive at age 65?

What is the probability that only the male will be alive at age 65?

What is the probability that at least one of the two will be alive at age 65?

Case 2. P(A and B) = P(A) x P(B|A)

When the occurrence of one event is conditioned by the other event, these events are said to be dependent (conditional).

Example. Suppose a box contains 30 fuses, 5 of which are defective. What is the probability of drawing at random two defective fuses in succession if the first fuse drawn is not returned before making the second draw?
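The two cases can be worked with exact fractions in Python. The sketch treats the survival of the female and of the male as independent events, as Case 1 assumes; the variable names are illustrative.

```python
from fractions import Fraction

# Case 1 (independent events): survival of a female and of a male to age 65.
p_female = Fraction(7, 10)
p_male = Fraction(3, 5)
print(p_female * p_male)                  # both alive: 21/50
print(p_male * (1 - p_female))            # only the male alive: 9/50
print(1 - (1 - p_female) * (1 - p_male))  # at least one alive: 22/25

# Case 2 (dependent events): two defective fuses in succession, no replacement.
p_first_defective = Fraction(5, 30)
p_second_given_first = Fraction(4, 29)    # 4 defective fuses left among 29
print(p_first_defective * p_second_given_first)  # 2/87
```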


Definition. DISJUNCTION PROBABILITY. This type of probability is associated with several events that happen either separately or simultaneously. Disjunction probability is concerned with an “either or” relationship.

Case 3. P(A or B) = P(A) + P(B)

When the events do not have common sample points, they are said to be mutually exclusive.

Example. What is the probability that in a single toss of two dice, the sum will be 5 or 8?

Case 4. P(A or B) = P(A) + P(B) – P(A and B)

There are also cases of joint events which are not mutually exclusive because there are some elements common to both events.

Example. What is the probability of getting a sum of 5 or a sum greater than 4 in a throw of two dice?

Example. Take a math class with 52 students, 27 of whom are males and the rest are females. A total of 21 of the males and 15 of the females got a grade above 90. What is the probability that if a student is chosen at random, this student either has a grade above 90 or is a male?
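Cases 3 and 4 can be verified by counting outcomes. The Python sketch below assumes fair dice and uses illustrative helper names (`p`, `sum_is_5`, and so on) that are not part of the manual.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))  # two dice, 36 outcomes


def p(event):
    """Probability of an event given as a predicate on the pair (die1, die2)."""
    return Fraction(sum(1 for o in sample_space if event(o)), len(sample_space))


def sum_is_5(outcome):
    return sum(outcome) == 5


def sum_is_8(outcome):
    return sum(outcome) == 8


def sum_over_4(outcome):
    return sum(outcome) > 4


# Case 3: mutually exclusive events, P(A or B) = P(A) + P(B).
print(p(sum_is_5) + p(sum_is_8))  # 1/4

# Case 4: not mutually exclusive, P(A or B) = P(A) + P(B) - P(A and B).
p_both = p(lambda o: sum_is_5(o) and sum_over_4(o))
print(p(sum_is_5) + p(sum_over_4) - p_both)  # 5/6

# Math class: 52 students, 27 males; 21 males and 15 females scored above 90.
p_above_90_or_male = Fraction(21 + 15, 52) + Fraction(27, 52) - Fraction(21, 52)
print(p_above_90_or_male)  # 21/26
```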


PROBABILITIES INVOLVING QUALITATIVE DATA IN CONTINGENCY TABLE

When the data are presented in the form of frequencies and are classified according to qualitative rather than quantitative categories, they are called qualitative data in contingency tables.

Illustration:

                 Vegetarian Status
Gender     Vegetarian   Non-Vegetarian   Total
Male           20             23           43
Female         22             25           47
Total          42             48           90

1. To find the probability of a single event from qualitative data, simply divide the subtotal of the desired event by the grand total.

P(A) = subtotal / grand total

Example. The probability that a person is a vegetarian

2. To find the conjunction probability of two events from qualitative data, divide the observed frequency where the two events intersect by the grand total.

P(A and B) = observed freq. of the two events' intersection / grand total

Example. The probability that a person is female and a vegetarian


3. To find the conditional probability of two dependent events from qualitative data, divide the observed frequency where the two events intersect by the subtotal of the event which is used as a condition.

P(A | B) = observed freq. of the two events' intersection / subtotal of the conditional event

Example. The probability of getting a male at random provided that he is a non-vegetarian

4. To find the disjunction probability of two events:

P(A or B) = (subtotal of 1st event / grand total) + (subtotal of 2nd event / grand total) – (observed freq. of intersection / grand total)

Example. The probability of getting a female or a person who is a non-vegetarian
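The four rules for contingency tables can be applied to the vegetarian-status illustration (20 and 23 for males, 22 and 25 for females, grand total 90) with a short Python sketch. The dictionary layout and the helper `subtotal` are illustrative choices, not part of the manual.

```python
from fractions import Fraction

# Observed frequencies from the vegetarian-status illustration.
table = {
    ("male", "vegetarian"): 20,
    ("male", "non-vegetarian"): 23,
    ("female", "vegetarian"): 22,
    ("female", "non-vegetarian"): 25,
}
grand_total = sum(table.values())  # 90


def subtotal(category):
    """Row or column subtotal for one gender or one vegetarian status."""
    return sum(freq for key, freq in table.items() if category in key)


# Rule 1: single event. Rule 2: conjunction. Rule 3: conditional. Rule 4: disjunction.
p_vegetarian = Fraction(subtotal("vegetarian"), grand_total)
p_female_and_veg = Fraction(table[("female", "vegetarian")], grand_total)
p_male_given_nonveg = Fraction(table[("male", "non-vegetarian")], subtotal("non-vegetarian"))
p_female_or_nonveg = (
    Fraction(subtotal("female"), grand_total)
    + Fraction(subtotal("non-vegetarian"), grand_total)
    - Fraction(table[("female", "non-vegetarian")], grand_total)
)
print(p_vegetarian, p_female_and_veg, p_male_given_nonveg, p_female_or_nonveg)
```

The membership test `category in key` compares the category with each whole element of the key tuple, so "vegetarian" does not match "non-vegetarian".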


Exercise # 6 – Probability

Objectives: At the end of the exercise, the student is expected to be able to apply the different operations on probability.

II. Show complete solution for each.

1. On a throw of two dice, what is the probability of obtaining a sum that is at most 10?

2. If a single card is drawn from a deck of 52 playing cards, what is the probability of each of the following events: (a) obtaining a red card; (b) obtaining a diamond; and (c) obtaining an ace or a heart?

3. A committee of 5 is to be selected from 10 seniors and 8 juniors. What is the probability that the committee is to consist of at most 3 seniors?

4. A number of two different digits is to be formed from the digits 1, 2, 3, 4 and 5. Determine the probability of each of the following events:
a. the number is odd
b. the number is greater than 25

5. A student guesses his answers on a 3-question test. What is the probability that he will get
a. two correct and one wrong answer
b. at least two correct
c. all wrong
d. at most two correct
e. two correct and the last answer wrong


6. Classification of Patients in a Hospital

          Pregnant   Elderly   Children   Total
Male          0         27        35        62
Female       28         49        11        88
Total        28         76        46       150

What is the probability that a patient chosen at random from among the 150 will be:
a. pregnant
b. female or elderly
c. female and elderly
d. male or a child
e. male provided that he is elderly
f. a child given male


PROBABILITY DISTRIBUTIONS

Concept of a Random Variable

Definition. A function whose value is a real number determined by each element in the sample space is called a random variable.

Remark. We shall use an uppercase letter, say X, to denote a random variable and its corresponding lowercase letter, x in this case, for one of its values.

Example (Experiment #1): An experiment consists of tossing a coin 3 times and observing the result. The possible outcomes and the values of the random variables X and Y, where X is the number of heads and Y is the number of heads minus the number of tails, are

Sample Point    X     Y
HHH             3     3
HHT             2     1
HTH             2     1
HTT             1    -1
THH             2     1
THT             1    -1
TTH             1    -1
TTT             0    -3

DISCRETE AND CONTINUOUS PROBABILITY DISTRIBUTIONS

Definition. If a sample space contains a finite number of possibilities or an unending sequence with as many elements as there are whole numbers, it is called a discrete sample space.

Definition. A random variable defined over a discrete sample space is called a discrete random variable.

Definition. If a sample space contains an infinite number of possibilities equal to the number of points on a line segment, it is called a continuous sample space.

Definition. A random variable defined over a continuous sample space is called a continuous random variable.


Discrete Probability Distributions

Definition. A table or formula listing all possible values that a discrete random variable can take on, along with the associated probabilities, is called a discrete probability distribution.

Remark. The probabilities associated with all possible values of a discrete random

variable must sum to 1.

Examples. For Experiment #1, the discrete probability distributions of the random variables X and Y are

x            0     1     2     3
P(X = x)    1/8   3/8   3/8   1/8

y           -3    -1     1     3
P(Y = y)    1/8   3/8   3/8   1/8
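The two distributions of Experiment #1 can be checked by listing all eight equally likely outcomes and counting. The following is a minimal sketch; the helper name `distribution` is an assumption made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of tossing a coin 3 times: HHH, HHT, ...
outcomes = ["".join(toss) for toss in product("HT", repeat=3)]


def distribution(value_of):
    """Map each value of a random variable to its probability."""
    counts = Counter(value_of(outcome) for outcome in outcomes)
    return {value: Fraction(count, len(outcomes))
            for value, count in sorted(counts.items())}


# X = number of heads; Y = number of heads minus number of tails
dist_x = distribution(lambda o: o.count("H"))
dist_y = distribution(lambda o: o.count("H") - o.count("T"))

print(dist_x)
print(dist_y)
```

Both dictionaries have probabilities that sum to 1, as the Remark on discrete probability distributions requires.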

Continuous Probability Distribution

Definition. The function with values f(x) is called a probability density function for the continuous random variable X, if

*the total area under its curve and above the horizontal axis is equal to 1; and

*the area under the curve between any two ordinates x=a and x=b gives the probability that X lies between a and b.

Remarks: 1. A continuous random variable has a probability of zero of assuming exactly any

of its values, that is, if X is a continuous random variable, then P(X=x) = 0 for all real numbers x.

2. The probability random variable X that can assume values between 0 and 2 has a density function given by

f(x) = {


Expected Values

Definition. Let X be a discrete random variable with probability distribution

x            x1       x2      …    xn
P(X = x)    f(x1)    f(x2)    …    f(xn)

The mean or expected value of X is

$E(X) = \sum_{i=1}^{n} x_i \, f(x_i)$

Examples:

1. Find the mean of the random variables X and Y of Experiment No. 1

x 0 1 2 3

P(X = x) 1/8 3/8 3/8 1/8

E(X) = (0)(1/8) + (1)(3/8) + (2)(3/8) + (3)(1/8) = 12/8 or 1.5

Y -3 -1 1 3

P(Y = y) 1/8 3/8 3/8 1/8

E(Y) = (-3)(1/8) + (-1)(3/8) + (1)(3/8) + (3)(1/8) = 0


Definition. Let X be a random variable with mean μ. Then the variance of X is

$\operatorname{Var}(X) = E[(X - \mu)^2]$

Definition. Let X be a discrete random variable with probability distribution

x            x1       x2      …    xn
P(X = x)    f(x1)    f(x2)    …    f(xn)

The variance of X is

$\operatorname{Var}(X) = E[(X - \mu)^2] = \sum_{i=1}^{n} (x_i - \mu)^2 \, f(x_i)$

Example:

In experiment No. 1, find the variance of X.

Using the definition of Var(X),

E(X) = 1.5

$\operatorname{Var}(X) = E[(X - \mu)^2] = \sum (x_i - \mu)^2 \, f(x_i)$

= (0 – 1.5)² (1/8) + (1 – 1.5)² (3/8) + (2 – 1.5)² (3/8) + (3 – 1.5)² (1/8)

= 0.75
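As a quick check of the two results for X in Experiment No. 1, the definitions of E(X) and Var(X) can be applied directly. This is a sketch only; the function names are assumptions.

```python
from fractions import Fraction

# Probability distribution of X (number of heads in 3 tosses)
dist_x = {0: Fraction(1, 8), 1: Fraction(3, 8),
          2: Fraction(3, 8), 3: Fraction(1, 8)}


def expected_value(dist):
    """E(X) = sum of x * f(x) over all values x."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = sum of (x - mu)^2 * f(x) over all values x."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


mean_x = expected_value(dist_x)
var_x = variance(dist_x)
print(float(mean_x), float(var_x))  # 1.5 0.75
```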

Example. A used car dealer finds that in any day, the probability of selling no car is 0.4, one car is 0.2, two cars is 0.15, 3 cars is 0.10, 4 cars is 0.08, five cars is 0.06 and six cars is 0.01. Let g(X) = 500 + 1500X represent the salesman’s daily earnings,

where X is the number of cars sold. Find the salesman’s expected daily earnings.
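One way to approach this example is to weight each possible earning g(x) by the probability of selling x cars. The sketch below assumes the probabilities exactly as stated in the problem; the variable names are illustrative.

```python
# Probability of selling x cars in a day, as given in the problem
pmf = {0: 0.40, 1: 0.20, 2: 0.15, 3: 0.10, 4: 0.08, 5: 0.06, 6: 0.01}


def g(x):
    """Daily earnings of the salesman when x cars are sold."""
    return 500 + 1500 * x


# The probabilities of a discrete distribution must sum to 1
assert abs(sum(pmf.values()) - 1.0) < 1e-9

expected_cars = sum(x * p for x, p in pmf.items())
expected_earnings = sum(g(x) * p for x, p in pmf.items())

print(round(expected_cars, 2), round(expected_earnings, 2))  # 1.48 2720.0
```

Because g is linear, the same value follows from 500 + 1500 E(X) = 500 + 1500(1.48) = 2720.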


Lesson # 8 – Normal Distribution

PROPERTIES OF A NORMAL CURVE

The normal distribution is represented by a normal curve. A normal curve is a bell-shaped figure that has the following six properties:

1. It is symmetrical about the mean $\bar{X}$.

2. The mean is equal to the median, which is also equal to the mode.

3. The tails or ends are asymptotic relative to the horizontal line.

4. The total area under the normal curve is equal to 1 or 100%.

5. The normal curve area may be subdivided into at least three standard scores each to the left and to the right of the vertical axis.

6. Along the horizontal line, the distance from one integral standard score to the next integral standard score is measured by the standard deviation.

AREA UNDER THE NORMAL CURVE

In making use of the properties of the normal curve to solve certain types of statistical problems, one must first learn how to find areas under the normal curve.

The first step in finding areas under the normal curve is to convert the normal curve of any given variable into a standardized normal curve by using the formula:

$Z = \frac{X - \bar{X}}{S}$

where
Z = standard score
$\bar{X}$ = mean
S = standard deviation
X = given value of a particular variable
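The conversion to a standard score, followed by an area lookup, can be sketched as follows. The mean, standard deviation and X value are hypothetical numbers chosen for illustration, and the area is computed with the error function from Python's `math` module instead of a printed table, so it may differ from table values in the last digit.

```python
import math


def z_score(x, mean, sd):
    """Standard score Z = (X - mean) / S."""
    return (x - mean) / sd


def area_left_of(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical values (not from the manual)
mean, sd, x = 100.0, 15.0, 115.0

z = z_score(x, mean, sd)
p_greater = 1.0 - area_left_of(z)  # area to the right of z

print(z, round(p_greater, 4))  # 1.0 0.1587
```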

WORDED PROBLEMS:

1. Given a normal distribution with mean 350 and standard deviation s = 40, find the probability that X assumes a value greater than 362.


2. An electrical firm manufactures light bulbs that have a length of life that is normally distributed with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a bulb burns between 778 and 834 hours.

3. On an examination the average grade was 74 and the standard deviation was 7. If 12% of the class are given A’s, and the grades are curved to follow a normal distribution, what is the lowest possible A and the highest possible B? Find D6.

4. The quality grade-point averages of 300 college freshmen follow approximately a normal distribution with a mean of 2.1 and a standard deviation of 0.8. How many of these freshmen would you expect to have a score
a. between 2.4 and 3.5?
b. greater than 3.8?
c. less than 1.7?


Exercise # 7 – Normal Distribution

Objectives: At the end of the exercise the student should be able to:
1. Find probabilities using the standard normal probability curve;
2. Apply the concepts of finding areas under the normal probability curve in solving problems.

I. Find the probability.

a. P(z < -1.25)    f. P(z > 0.85)    k. P(1.33 < z < 1.56)
b. P(z < 1.65)     g. P(z > 0.69)    l. P(-1.48 < z < 2.04)
c. P(z < 0.92)     h. P(z > 3.01)    m. P(-0.58 < z < 1.05)
d. P(z < -2.02)    i. P(z > 2.84)    n. P(-0.92 < z < 0.07)
e. P(z < -1.24)    j. P(z > 0.53)    o. P(-1.45 < z < 1.87)

II. Find the unknown constant a given the area under the normal curve.
a. P(z < a) = 0.25
b. P(z > a) = 0.99

III. Solve the following problems.

a. Given a normally distributed variable X with mean 18 and standard deviation 2.5, find
i. P(X < 15)
ii. P(17 < X < 21)
iii. the value of k such that P(X < k) = 0.2578;
iv. the value of k such that P(X > k) = 0.1539

b. If a set of grades on a statistics exam are approximately normally distributed with a mean of 74 and a standard deviation of 7.9, find
i. the lowest passing grade if the lowest 10% of the students are given F’s;
ii. the highest B if the top 5% of the students are given A’s;

c. A soft drink machine is regulated so that it discharges an average of 200 milliliters per cup. If the amount of drink is normally distributed with a standard deviation σ = 15 milliliters,
i. What is the probability that a cup contains between 180 and 230 milliliters?
ii. How many cups will likely overflow if 220-milliliter cups are used for the next 1000 drinks?
iii. Below what value do we get the smallest 35% of the drinks?


Lesson # 9 – Estimation

ESTIMATION

- refers to any process by which sample information is used to predict or estimate the numerical value of some population measure.

- The formula, function or procedure used in estimating a population parameter is called an estimator. The value obtained with the use of the estimator is the estimate.

- Two types of estimators: point estimator and interval estimator. A point estimator yields a single numerical value as the estimate. An interval estimator gives a range or band of values within which the value of the parameter is estimated to lie.

INTERVAL ESTIMATION OF THE POPULATION MEAN

An interval estimate of μ (or any parameter) incorporates a measure of the confidence in the reliability of the range or interval of values within which the parameter is estimated to lie. Thus, an interval estimate is also called a confidence estimate, and its limits, confidence limits.

$P(\bar{X} - k < \mu < \bar{X} + k) = 1 - \alpha$

where
α = level of significance
1 – α = level of confidence
$k = Z_{\alpha/2} \, (\text{s.e.})$
$\text{s.e.} = \frac{\sigma}{\sqrt{n}}$


Example.

1. The mean IQ of a random sample of 400 high school students is 110. The standard deviation of the population of IQ scores is 16. If the population is normally distributed, find:

a. a .95 confidence interval estimate of μ ($Z_{\alpha/2} = 1.96$)

b. a .90 confidence interval estimate of μ ($Z_{\alpha/2} = 1.64$)

2. Find the .90 confidence interval estimate of the mean weight of all the pupils in a certain school if a random sample of 25 pupils has a mean weight of 70 lbs with a standard deviation of 15 lbs. Assume the population weights to be normally distributed. ($t_{\alpha/2} = 1.711$)
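A minimal sketch of the interval computation for Example 1(a), using the figures stated there (sample mean 110, population standard deviation 16, n = 400) and the critical value 1.96. The function name `z_interval` is an assumption made for illustration.

```python
import math


def z_interval(sample_mean, sigma, n, z_critical):
    """Confidence limits: sample mean -/+ k, where k = z * (sigma / sqrt(n))."""
    standard_error = sigma / math.sqrt(n)
    k = z_critical * standard_error
    return sample_mean - k, sample_mean + k


# Example 1(a): 0.95 confidence interval for the mean IQ
lower, upper = z_interval(sample_mean=110, sigma=16, n=400, z_critical=1.96)
print(round(lower, 3), round(upper, 3))  # 108.432 111.568
```

The same function can be reused for part (b) by passing the 0.90 critical value; for small samples with an unknown population standard deviation, the critical value would come from the t table instead.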


3. The contents of 7 similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2 and 9.6 liters. Find a 95% confidence interval for the mean content of all such containers, assuming an approximate normal distribution for container contents. ( )

4. The mean and standard deviation for the quality grade-point averages of a random sample of 36 college seniors are calculated to be 2.6 and 0.3, respectively. Find the 99% confidence interval for the mean of the entire senior class. Interpret the obtained confidence interval. ( )

5. The manager of a home delivery service for pizza pies wants an estimate of the average time it takes to deliver an order within the town proper of the City of Naga. A sample of 25 deliveries had a mean time of 15 minutes and a standard deviation of 4 minutes. Construct a 95% confidence interval for the average time for all deliveries. Interpret the interval obtained. ( Z = 1.96 )

6. A random sample of 12 students in a certain dormitory showed an average weekly expenditure of P400 for snack foods, with a standard deviation of P50.25. Construct a 90% confidence interval for the average amount spent each week on snack foods by female students living in this dormitory, assuming the expenditure to be approximately normally distributed. Interpret your confidence interval. ( t = 1.796 )


Lesson # 10- Test of Hypothesis

COMMON TERMS IN INFERENTIAL STATISTICS

A HYPOTHESIS is a statement which aims to explain facts about the real world. A test of hypothesis is a two-way decision problem. It is a procedure to substantiate or invalidate a claim, which is stated as the null hypothesis.

Definition. A NULL HYPOTHESIS (Ho) is the hypothesis that we hope to accept or reject; it must always express the idea of non-significance of difference.

Definition. An ALTERNATIVE HYPOTHESIS (Ha) is the hypothesis that is accepted when Ho is rejected.

TYPE I and TYPE II ERROR

Decision      Ho is TRUE          Ha is TRUE
Reject Ho     Type I error        Correct decision
Accept Ho     Correct decision    Type II error

Type I error (α error) – when we reject the null hypothesis when in fact the null hypothesis is true.

Type II error (β error) – when we accept the null hypothesis when in fact the null hypothesis is false.

ONE-TAILED AND TWO-TAILED TEST

Definition. When the rejection region is located at only one extreme of the range of values for the test statistic, the test is ONE-TAILED. If Ha is a statement of non-equality represented by the sign ≠, then the hypothesis is non-directional; thus we have a TWO-TAILED test.


Steps in Test of Hypothesis:

i. State the hypotheses, Ho and Ha.
ii. Determine the appropriate test statistic to use.
iii. Choose the level of significance and formulate the decision rule.
iv. Compute the value of the statistic from the sample data.
v. Make a decision (reject or accept) in accordance with the decision rule formulated.
vi. Draw a conclusion in relation to the objective of the original problem.

I. Mean of a Single Population

Case 1. Z Test

a. Hypotheses: Ho: μ = μ0 against
A. Ha: μ ≠ μ0 or
B. Ha: μ < μ0 or
C. Ha: μ > μ0

b. Test Statistic: Z Test

c. Computation:

$Z_c = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$

d. Decision Rule: At a level of significance α,
A. For Ha: μ ≠ μ0, reject Ho if $|Z_c| > Z_{\alpha/2}$, otherwise accept Ho.
B. For Ha: μ < μ0, reject Ho if $Z_c < -Z_{\alpha}$, otherwise accept Ho.
C. For Ha: μ > μ0, reject Ho if $Z_c > Z_{\alpha}$, otherwise accept Ho.


Example 1. The weight of crabs is normally distributed with mean 28.5 ounces and standard deviation of 3 ounces. A new breeder claims that he can breed crabs yielding a mean weight of more than 28.5 ounces. A random sample of 16 crabs from the new breeder had a mean weight of 29.2 ounces. At α = 5%, do the data support the breeder's claim?

i. Ho: μ = 28.5
Ha: μ > 28.5

ii. Test Statistic: Z Test

iii. Decision Rule: Reject Ho if $Z_c > Z_{\alpha}$, otherwise accept Ho.

iv. Computation:

$Z_c = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} = \frac{29.2 - 28.5}{3 / \sqrt{16}} = 0.933$

$Z_{\alpha} = Z_{0.05} = 1.645$

v. Decision: Since $Z_c < Z_{\alpha}$ (0.933 < 1.645), accept Ho.

vi. Conclusion: At 5% level of significance, there is not enough evidence to support the new breeder's claim OR the mean weight of the sample is not significantly different from the mean of 28.5 ounces.
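The computation and decision steps of Example 1 can be sketched as below. The critical value 1.645 is taken from the example itself rather than computed; the function name is an assumption.

```python
import math


def z_computed(sample_mean, mu0, sigma, n):
    """Zc = (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


# Example 1: crab weights, right-tailed test at alpha = 5%
zc = z_computed(sample_mean=29.2, mu0=28.5, sigma=3, n=16)
z_alpha = 1.645  # critical value quoted in the example

reject_ho = zc > z_alpha  # decision rule for Ha: mu > mu0
print(round(zc, 3), "reject Ho" if reject_ho else "accept Ho")  # 0.933 accept Ho
```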

Example 2. For the past five years, the mean height of AdeNU students has been 60 inches. A simple random sample of 100 is taken from the present students. It was found that the mean height is 65 inches with a standard deviation of 4 inches. Is there reason to believe that the mean height of present AdeNU students is different from that of the past five years at 5% level of significance?


Case 2. T Test

a. Hypotheses: Ho: μ = μ0 against
A. Ha: μ ≠ μ0 or
B. Ha: μ < μ0 or
C. Ha: μ > μ0

b. Test Statistic: T Test

c. Computation:

$T_c = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$

d. Decision Rule: At a level of significance α,
A. For Ha: μ ≠ μ0, reject Ho if $|T_c| > T_{[\alpha/2,\, n-1]}$, otherwise accept Ho.
B. For Ha: μ < μ0, reject Ho if $T_c < -T_{[\alpha,\, n-1]}$, otherwise accept Ho.
C. For Ha: μ > μ0, reject Ho if $T_c > T_{[\alpha,\, n-1]}$, otherwise accept Ho.

Example 3. A soft drink vending machine is set to dispense 6 ounces per cup. The machine is tested eight times, yielding a mean cup fill of 5.8 ounces with a standard deviation of 0.16 oz. Is there evidence at 5% level of significance that the machine is underfilling cups? Assume normality.

i. Ho: μ = 6
Ha: μ < 6

ii. Test Statistic: T Test

iii. Decision Rule: Reject Ho if $T_c < -T_{[\alpha,\, n-1]}$, otherwise accept Ho.

iv. Computation:

$T_c = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} = \frac{5.8 - 6}{0.16 / \sqrt{8}} = -3.536$

$-T_{[\alpha,\, n-1]} = -T_{[0.05,\, 7]} = -1.895$


v. Decision: Since -3.536 < -1.895, reject Ho.

vi. Conclusion: At 5 % level of significance, there is evidence to say that the machine is under filling the cups.
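The computed value in Example 3 can be reproduced with a short sketch. The critical value 1.895 is the table value quoted in the example (α = 0.05, 7 degrees of freedom), not something the code derives; the function name is an assumption.

```python
import math


def t_computed(sample_mean, mu0, s, n):
    """Tc = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (s / math.sqrt(n))


# Example 3: vending machine, left-tailed test at alpha = 5%
tc = t_computed(sample_mean=5.8, mu0=6, s=0.16, n=8)
t_critical = 1.895  # T[0.05, 7] as quoted in the example

reject_ho = tc < -t_critical  # decision rule for Ha: mu < mu0
print(round(tc, 3), "reject Ho" if reject_ho else "accept Ho")  # -3.536 reject Ho
```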

Example 4. The monthly output of a plywood manufacturer was measured in nine randomly selected months. The results obtained (in tons) are 100, 120, 100, 102, 130, 140, 150, 140 and 145. Test the hypothesis that the mean monthly output is 140 tons against the alternative that it is not 140 tons at 10% level of significance. Assume that the monthly output is a normal random variable.


Exercise # 8 – Test of Hypothesis (Z and T Test)

A. Carry out a complete test of hypothesis for the following problems.

1. A certain brand of powdered milk is advertised as having a net weight of 250 grams. If the net weights of a random sample of 10 cans are 253, 248, 252, 245, 247, 249, 251, 250, 247 and 248 grams, can it be concluded that the average net weight of the cans is less than the advertised amount? Use α = 0.01 and assume that the net weight of this brand of powdered milk is normally distributed.

2. In a time and motion study, it was found that the average time required by workers to complete a certain manual operation was 26.6 minutes. A group of 20 workers was randomly chosen to receive special training for two weeks. After the training it was found that their average time was 24 minutes with a standard deviation of 3 minutes. Can it be concluded that the special training speeds up the operation? Use α = 0.05.

3. The manager of an appliance store, after noting that the average daily sales was only 12 units, decided to adopt a new marketing strategy. Daily sales under this strategy were recorded for 90 days, after which period the average was found to be 15 units with a standard deviation of 4 units. Does this indicate that the new marketing strategy increased the daily sales? Employ α = 0.01.

4. The daily wages in a particular industry are normally distributed with a mean of P66.00. In a random sample of 144 workers of a very large company in this industry, the average daily wage was found to be P62.00 with a standard deviation of P12.50. Can this company be accused of paying inferior wages at the 0.01 level of significance?


II. Two Population Means – T Test

A. Dependent or Paired / Independent

i. Ho: The population mean of A is equal to the population mean of B.
Ha: The population means are not equal.

ii. Decision rule: Reject Ho if p-value < level of significance or t-computed > critical t-value, otherwise accept Ho.

III. ANOVA

Sample Problems:
a. A researcher wishes to know if there are differences in the average preparation time of four methods of preparing a solvent.
b. An agriculturist may compare the average yields of three corn varieties used by Los Banos.
c. A consumer wishes to know if the different brands of gasoline in the market are equally good with respect to average mileage.
d. A medical researcher is interested in comparing the effectiveness of 3 different treatments to lower the cholesterol of patients with high values.
e. An ecologist wants to compare the amount of a certain pollutant in five rivers.

i. Ho: There is no difference between groups.
Ha: There is a difference between groups.

ii. Decision rule: Reject Ho if p-value < level of significance or F-value > critical value, otherwise accept Ho.

IV. Chi-Square Test of Independence

This test is usually applied on enumeration data or data in contingency tables. It tests the association or independence of one variable from another variable.

i. Ho: The two variables are independent.
Ha: The two variables are dependent.

ii. Decision rule: Reject Ho if p-value < level of significance or χ² value > critical value, otherwise accept Ho.


SAMPLE PROBLEMS

Two Population Means - T test

A. Dependent or Paired

1. In a study of the effectiveness of physical exercise in weight reduction, a simple random sample of 8 persons engaged in a prescribed program of physical exercise for one month showed the following results:

Weight Before    209   178   169   212   180   192   158   180
Weight After     196   171   170   207   177   190   159   180

At 1% level of significance, do the data provide evidence that the prescribed program of exercise is effective?

a. Ho: The weights before and after are equal; therefore the program is not effective.

Ha: The weights before and after are not equal; therefore the program is effective.

b. Decision rule: Reject Ho if T-computed > critical value, otherwise accept Ho at 1% level of significance.

c. Test Statistic: T-test on Two Populations

d. Computation: T-computed = 2.07   Critical value = 3.499

e. Decision: Accept Ho.

f. Conclusion: At 1% level of significance, there is not sufficient evidence to say that the program is effective.
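The T-computed value of the paired problem can be reproduced from the before and after weights. This is a sketch that assumes the usual paired t statistic (mean difference divided by its standard error); the manual reports only the final value, so the intermediate steps shown here are an assumption about how it was obtained. The critical value 3.499 is taken from the problem.

```python
import math

before = [209, 178, 169, 212, 180, 192, 158, 180]
after = [196, 171, 170, 207, 177, 190, 159, 180]

# Work with the differences (before - after) for each person
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

mean_d = sum(diffs) / n
# Sample standard deviation of the differences (n - 1 in the denominator)
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))

t_value = mean_d / (sd_d / math.sqrt(n))
critical_value = 3.499  # quoted in the problem

reject_ho = t_value > critical_value
print(round(t_value, 2), "reject Ho" if reject_ho else "accept Ho")  # 2.07 accept Ho
```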


B. Independent

2. Some statistics students complain that pocket calculators give other students an advantage during statistics examinations. To check this contention, a simple random sample of 45 students were randomly assigned to two groups, 23 to use calculators and 22 to perform calculations by hand. The students then took a statistics examination that required a modest amount of arithmetic. The results are shown below:

With Calculator

85 86 89 84 82 83 90 91 86 90 87 87 92 85 86 89 88 88 89 90 85 89 90

Without Calculator

86 88 90 92 86 85 88 89 85 91 86 85 92 84 83 88 90 91 86 90 86 87

Do the data provide sufficient evidence to indicate that the students taking this particular examination obtain higher scores when using a calculator? Test at α = 10%.

a. Ho: The mean scores are equal.
Ha: The mean scores are not equal.

b. Decision rule: Reject Ho if T-computed > critical value, otherwise accept Ho.

c. Test Statistic: T-test on Two Populations

d. Computation: T-computed = 0.25   Critical value = 1.303

e. Decision: Accept Ho.

f. Conclusion: At 10% level of significance there is not enough evidence to say that the use of calculators will assure students of higher scores.


ANOVA

3. A study was conducted to compare three teaching methods. Three groups of 6 students were chosen and each group was subjected to one of the three teaching methods. The grades of the students taken at the end of the semester are given as:

             Group I     Group II    Group III
             Method A    Method B    Method C
Student 1       84          70          90
Student 2       90          75          95
Student 3       92          90         100
Student 4       96          80          98
Student 5       84          75          88
Student 6       88          75          90

a. Ho: The three teaching methods are equal.
Ha: The three teaching methods are not equal.

b. Decision rule: Reject Ho if F-computed > critical value, otherwise accept Ho.

c. Test Statistic: F-test ANOVA

d. Computation: F-computed = 13.121   Critical value = 3.68

e. Decision: Reject Ho.

f. Conclusion: There is evidence to say that the three methods are not equal. We can also conclude that Method C is more effective since its students got higher grades compared to the other two methods.
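The F-computed value for the teaching-methods problem can be reproduced with a one-way ANOVA computed from sums of squares. The sketch below assumes the standard between-groups and within-groups decomposition; the critical value 3.68 is the one quoted in the problem.

```python
groups = {
    "Method A": [84, 90, 92, 96, 84, 88],
    "Method B": [70, 75, 90, 80, 75, 75],
    "Method C": [90, 95, 100, 98, 88, 90],
}


def mean(values):
    return sum(values) / len(values)


all_scores = [score for scores in groups.values() for score in scores]
grand_mean = mean(all_scores)

# Between-groups sum of squares: group size times squared distance
# of each group mean from the grand mean
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Within-groups sum of squares: squared distance of each score
# from its own group mean
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

f_value = (ss_between / df_between) / (ss_within / df_within)
critical_value = 3.68  # quoted in the problem

print(round(f_value, 3), "reject Ho" if f_value > critical_value else "accept Ho")
```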


Chi-Square Test of Independence

4. It is believed that people with high blood pressure need to watch their weight. A random sample of 300 subjects was classified according to their weight and blood pressure. At the 5% level of significance, is there sufficient evidence to conclude that a person’s weight is related to his blood pressure?

                     Blood Pressure
Weight          High    Normal    Low
Overweight       40       34       18
Normal           36       77       27
Underweight      16       33       19

a. Ho: Weight is independent of blood pressure, or weight is unaffected by blood pressure, or the two variables weight and blood pressure are independent.

Ha: Weight is dependent on blood pressure, or weight is affected by blood pressure, or the two variables weight and blood pressure are dependent.

b. Decision rule: Reject Ho if χ²-computed > critical value, otherwise accept Ho.

c. Test Statistic: Chi-square Test

d. Computation: χ²-computed = 12.75   Critical value = 9.49

e. Decision: Reject Ho.

f. Conclusion: At 5% level of significance, there is evidence to say that weight is affected by blood pressure. For overweight persons, most of them (approximately 40% of the actual population) will have higher blood pressure. Persons of normal weight are most likely to have normal blood pressure. Those who are underweight are also most likely to have normal blood pressure.
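The χ²-computed value can be reproduced from the observed table by comparing each observed count with its expected count (row total × column total / grand total). This is a sketch of the standard computation; the critical value 9.49 is the one quoted in the problem.

```python
# Observed counts: rows are weight classes, columns are
# blood pressure (High, Normal, Low)
observed = {
    "Overweight":  [40, 34, 18],
    "Normal":      [36, 77, 27],
    "Underweight": [16, 33, 19],
}

row_totals = {name: sum(row) for name, row in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

chi_square = 0.0
for name, row in observed.items():
    for j, count in enumerate(row):
        # Expected count if weight and blood pressure were independent
        expected = row_totals[name] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

critical_value = 9.49  # quoted in the problem (4 degrees of freedom)
decision = "reject Ho" if chi_square > critical_value else "accept Ho"
print(round(chi_square, 2), decision)  # 12.75 reject Ho
```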


Exercise # 9 – Test of Hypothesis (T-test, ANOVA and Chi-Square Test)

Objectives: At the end of the exercise, the student is expected to be able to apply the appropriate statistical procedure in performing test of hypothesis of various problems.

Carry out a complete test of hypothesis for the following problems.

1. As part of a study to determine the effects of a certain oral contraceptive on weight gain, 12 healthy females were weighed at the beginning of a course of oral contraceptive usage. They were reweighed after three months. Do the results suggest evidence of weight gain? Use α = 0.05.

Subject            1    2    3    4    5    6    7    8    9   10   11   12
Initial Weight   120  141  130  162  150  148  135  140  129  120  140  130
3-Month Weight   123  143  140  162  145  150  140  143  130  118  141  132

Source: Basic Statistics for Health Sciences by Kuzma

a. Ho:

Ha:

b. Test Statistic:

c. Decision Rule:

d. Computation: Computed value = 1.75   Critical value = 2.201

e. Decision:

f. Conclusion:

i. Conclusion:


2. An investment analyst claims to have mastered the art of forecasting the price changes of gold. The following table gives the actual gold price changes and the changes forecasted by the investment analyst (in %) on a simple random sample of 8 months. Use α = 5%.

Month                     1      2     3     4     5     6     7     8
Actual Price Changes     7.3   -2.1   8.5  -1.5   9.2   6.7  -4.8  -0.8
Forecasted Changes      14.9  -19.7   7.0  -5.3   1.0  -0.8  -8.3   6.7

a. Ho:

Ha:

b. Test Statistic:

c. Decision Rule:

d. Computation: Computed value = 1.15   Critical value = 2.365

e. Decision:

f. Conclusion:


3. Four groups of 4 patients each were subjected to four different types of treatment for the same ailment. The following data are on the number of days that elapsed before they were completely cured. What conclusions may be drawn about the four types of treatment?

             Treatment A   Treatment B   Treatment C   Treatment D
Patient 1        10            11             3             6
Patient 2         9            11             4            10
Patient 3         6            18             5             8
Patient 4         7             6             7            11

a. Ho:

Ha:

b. Test Statistic:

c. Decision Rule:

d. Computation: Computed value = 3.474   Critical value = 3.49

e. Decision:

f. Conclusion:


4. Test if there is a significant association between academic performance and IQ.

Table. Academic Performance and IQ of 100 Students

                              IQ
Academic Performance    High   Average   Low   Total
Passed                   31      45        4     80
Failed                    1       4       15     20
Total                    32      49       19    100

a. Ho:

Ha:

b. Test Statistic:

c. Decision Rule:

d. Computation: Computed value = 51.25   Critical value = 5.99

e. Decision:

f. Conclusion:


Lesson # 11 - TWO-FACTOR ANOVA

Example 1. A research study was conducted to examine the impact of eating a high protein breakfast on adolescents' performance during a physical education physical fitness test. Half of the subjects received a high protein breakfast and half were given a low protein breakfast. All of the adolescents, both male and female, were given a fitness test with high scores representing better performance. Test scores are recorded below.

          Males                          Females
High Protein   Low Protein    High Protein   Low Protein
     10             5               5              3
      7             4               4              4
      9             7               6              5
      6             4               3              1
      8             5               2              2

Statistical test results:

Treatment                                               F-value   F-critical (5%)   F-critical (1%)
between (protein level)                                  *8.89         4.49              8.53
within (gender)                                         *20.00         4.49              8.53
among (interaction between protein level and gender)      2.22         4.49              8.53

Ho: There is no difference in performance between the two protein levels.
There is no difference in performance between the two genders.
There is no interaction between protein level and gender.

Interpretation:

At 5% level of significance it can be concluded that there is a significant difference in performance for both protein level and gender. There was no significant interaction effect. Based on these data, it appears that a higher protein diet results in better fitness test scores. Additionally, young men seem to have significantly higher fitness test scores than young women.
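The three F-values of Example 1 can be reproduced with the sums-of-squares decomposition for a balanced two-factor design (equal cell sizes, here 5 scores per cell). The sketch below assumes that design; the dictionary layout and helper names are illustrative only.

```python
# Scores per cell, keyed by (gender, protein level)
cells = {
    ("male", "high"):   [10, 7, 9, 6, 8],
    ("male", "low"):    [5, 4, 7, 4, 5],
    ("female", "high"): [5, 4, 6, 3, 2],
    ("female", "low"):  [3, 4, 5, 1, 2],
}
genders = ["male", "female"]
proteins = ["high", "low"]


def mean(values):
    return sum(values) / len(values)


def level_scores(position, level):
    """All scores whose cell key has `level` at the given key position."""
    return [x for key, scores in cells.items() if key[position] == level
            for x in scores]


all_scores = [x for scores in cells.values() for x in scores]
grand_mean = mean(all_scores)

# Main-effect sums of squares (one term per factor level)
ss_gender = sum(len(level_scores(0, g)) * (mean(level_scores(0, g)) - grand_mean) ** 2
                for g in genders)
ss_protein = sum(len(level_scores(1, p)) * (mean(level_scores(1, p)) - grand_mean) ** 2
                 for p in proteins)
# Interaction = variation among cell means not explained by the main effects
ss_cells = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in cells.values())
ss_interaction = ss_cells - ss_gender - ss_protein
# Error (within-cell) variation
ss_error = sum((x - mean(s)) ** 2 for s in cells.values() for x in s)

ms_error = ss_error / (len(all_scores) - len(cells))
f_protein = (ss_protein / (len(proteins) - 1)) / ms_error
f_gender = (ss_gender / (len(genders) - 1)) / ms_error
f_interaction = (ss_interaction / ((len(genders) - 1) * (len(proteins) - 1))) / ms_error

print(round(f_protein, 2), round(f_gender, 2), round(f_interaction, 2))  # 8.89 20.0 2.22
```

Each F-value is then compared with the tabulated critical value (4.49 at 5%, 8.53 at 1%) to reach the decisions stated in the interpretation.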


Seatwork:

1. Different typing skills are required for secretaries depending on whether one is working in a law office, an accounting firm, or for a mathematical research group at a major university. In order to evaluate candidates for these positions, an employment agency administers three distinct standardized typing samples. A time penalty has been incorporated into the scoring of each sample based on the number of typing errors. The mean and standard deviation for each test, together with the score achieved by a recent applicant, are given in the table below. For what type of position does this applicant seem to be best suited?

Sample        Applicant’s Score    Mean       Standard Deviation
Law               141 sec          180 sec        30 sec
Accounting          7 min           10 min         2 min
Scientific         33 min           26 min         5 min


2. Researchers have sought to examine the effect of various types of music on agitation levels in patients who are in the early and middle stages of Alzheimer’s disease. Patients were selected to participate in the study based on their stage of Alzheimer’s disease. Three forms of music were tested: easy listening, Mozart, and piano interludes. While listening to music, agitation levels were recorded for the patients, with a high score indicating a higher level of agitation. Scores are recorded below.

        Early Stage Alzheimer                       Middle Stage Alzheimer
Piano Interlude   Mozart   Easy Listening   Piano Interlude   Mozart   Easy Listening
      21             9          29                22            14          15
      24            12          26                20            18          18
      22            10          30                25            11          20
      18             5          24                18             9          13
      20             9          26                20            13          19


3. A study examining differences in life satisfaction between young adult, middle adult and older adult men and women was conducted. Each individual who participated in the study completed a life satisfaction questionnaire. A high score on the test indicates a higher level of life satisfaction. Test scores are recorded below.

                     Males                                       Females
         Young Adult   Middle Adult   Older Adult   Young Adult   Middle Adult   Older Adult
              4             7             10             7             8             10
              2             5              7             4            10              9
              3             7              9             3             7             12
              4             5              8             6             7             11
              2             6             11             5             8             13
Mean =        3             6              9             5             8             11


Lesson # 12 – Pearson Moment Correlation

Pearson Moment is one of the measures of correlation which quantifies the strength as well as the direction of the relationship between two variables. The correlation coefficient (r) has the following interpretation:

Scale (+/-)       Decision
1.00              Perfect Relationship
0.80 – 0.99       Very Strong Relationship
0.60 – 0.79       Strong Relationship
0.40 – 0.59       Moderate Relationship
0.20 – 0.39       Weak Relationship
0.01 – 0.19       Very Weak Relationship
0.00              No Relationship

Table. Results of the AdNU Entrance Examination for 20 Examinees

No.    SAI    RPM    Math    English
 1      52     25     47       21
 2      84     40     48       11
 3     113     90     58       29
 4      92     90     47       14
 5      98     80     54       17
 6      91     80     56       19
 7      52     15     52       18
 8     116     40     68       38
 9     101     60     69       22
10      83     15     48       16
11      65     10     52       16
12      96     95     54       19
13      94     80     54       15
14      89     65     56       20
15      91     45     54       21
16      92     80     56       17
17     101     95     58       33
18      97     95     56       17
19      89     80     56       17
20      96     95     58       27
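A minimal sketch of how r could be computed and read against the interpretation scale, using the Math and English columns of the table. The formula used is the standard Pearson coefficient (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations); the function names and the rounding of |r| to two decimals before applying the scale are assumptions made for illustration.

```python
import math

math_scores = [47, 48, 58, 47, 54, 56, 52, 68, 69, 48,
               52, 54, 54, 56, 54, 64, 58, 56, 56, 58]
english_scores = [21, 11, 29, 14, 17, 19, 18, 38, 22, 16,
                  16, 19, 15, 20, 21, 17, 33, 17, 11, 27]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def interpret(r):
    """Map r to the decision in the interpretation scale (sign ignored)."""
    size = round(abs(r), 2)
    if size == 1.00:
        return "Perfect Relationship"
    if size >= 0.80:
        return "Very Strong Relationship"
    if size >= 0.60:
        return "Strong Relationship"
    if size >= 0.40:
        return "Moderate Relationship"
    if size >= 0.20:
        return "Weak Relationship"
    if size >= 0.01:
        return "Very Weak Relationship"
    return "No Relationship"


r = pearson_r(math_scores, english_scores)
print(round(r, 2), interpret(r))
```

The sign of r gives the direction of the relationship, while `interpret` reports only its strength.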