School of Engineering · 2018-04-11 · 1 . School of Engineering . Course : Diploma in Electronic...


School of Engineering

Course :
Diploma in Electronic Systems
Diploma in Telematics & Media Technology
Diploma in Aerospace Systems & Management
Diploma in Electrical Engineering with Eco-Design
Diploma in Mechatronics Engineering
Diploma in Digital & Precision Engineering
Diploma in Aeronautical & Aerospace Technology
Diploma in Biomedical Engineering
Diploma in Nanotechnology & Materials Science
Diploma in Engineering with Business

Module : Engineering Mathematics 2B – EG2008/2681/2916/2961 / Mathematics 2B – EGB/D/F/H/J/M207

LECTURE NOTES Revised since Apr 18


Content Page

• Study Plan, Assessment Plan, Formula List
• Topic 1: Descriptive Statistics
• Topic 2: Linear Regression and Correlation
• Topic 3: Principles of Counting
• Topic 4: Probability
• Topic 5: Discrete Probability Distribution
• Topic 6: Binomial and Poisson Distributions
• Topic 7: Normal Distribution
• Topic 8: Distribution of Sample Means
• Topic 9: Estimation of Parameters and the t-Distribution
• Topic 10: Hypothesis Testing with One Sample
• Topic 11: Hypothesis Testing with Two Samples

Suggested Study Plan

Wk 1
Topic: Descriptive Statistics
Study Materials: Topic 1: Descriptive Statistics
◊ Interpret frequency distributions
◊ Describe measures of central tendency
◊ Describe measures of variability
Practice: Tutorial 1

Wk 2
Topic: Scatter diagram, correlation & simple linear regression
Study Materials: Topic 2: Linear Regression & Correlation
◊ Explain relationship between two variables by observing scatter diagram & using the correlation coefficient
◊ Perform regression analysis by using the least square regression line
◊ Evaluate the reliability of an estimation by using the correlation coefficient & the range of dataset
** Task for assignment 1 released to students **
Practice: Tutorial 2

Wk 3
Topic: Principles of Counting
Study Materials: Topic 3: Principles of Counting
◊ Fundamental Counting Principle
◊ Permutation and Combination
** Quiz 1: Chapters 1 and 2 **
Practice: Tutorial 3

Wk 4
Topic: Probability
Study Materials: Topic 4: Probability
◊ Probability experiments, types of probability
◊ Properties of Probability & its Applications
◊ Conditional Probability
◊ Types of events (independent, mutually exclusive)
◊ Multiplication & Addition Rules
Practice: Tutorial 4

Wk 5
Topic: Discrete Random Variables
Study Materials: Topic 5: Discrete Probability Distribution
◊ Define random variables
◊ Distinguish discrete and continuous random variables
◊ Define a discrete probability distribution
◊ Compute mean, variance & standard deviation of discrete random variable
Practice: Tutorial 5

Wk 6
Topic: Binomial Distribution
Study Materials: Topic 6A: Binomial Distribution
◊ Binomial Experiments
◊ Binomial probability function & applications
◊ Calculate mean & variance
** Quiz 2: Chapters 3, 4 and 5 **
Practice: Tutorial 6

Wk 7
Topic: Poisson Distribution
Study Materials: Topic 6B: Poisson Distribution
◊ Define Poisson random variable & conditions
◊ Poisson probability function & applications
◊ Calculate mean and variance
Practice: Tutorial 6

Wk 8
Topic: Normal Distribution
Study Materials: Topic 7: Normal Distribution
◊ Properties of Normal & Standard Normal Distribution
◊ Compute probabilities using tables
** Due for assignment submission **
Practice: Tutorial 7

Wk 9-10
Term Break

Wk 11
Topic: Sampling Distribution
Study Materials: Topic 8: Distribution of Sample Means
◊ Describe sampling distribution of sample means
◊ Apply Central Limit Theorem
Practice: Tutorial 8

Wk 12
Topic: Estimation of the population mean
Study Materials: Topic 9A: Estimation of Parameters
◊ Calculate point estimators of population parameters
◊ Construct & interpret confidence intervals for the population mean for known population standard deviation
◊ Calculate the sample size necessary for estimating population mean with specified margin of error
** E-Quiz: Chapters 7 and 8 **
Practice: Tutorial 9

Wk 13
Topic: The t-distribution
Study Materials: Topic 9B: Estimation of Parameters
◊ Describe the properties of the t-distribution
◊ Construct confidence interval to estimate population mean when sample size is small with unknown population standard deviation
Practice: Tutorial 9

Wk 14
Topic: Introduction to hypothesis testing
Study Materials: Topic 10: Introduction to hypothesis testing
◊ Formulate a hypothesis test
◊ Evaluate its reliability by explaining type I and II errors
** Written assignment: Chapter 10 **
Practice: Tutorial 10

Wk 15
Topic: Testing a population mean
Study Materials: Topic 10: Testing a population mean
◊ Evaluate the hypothesis of a population mean by using the z-test or t-test
Practice: Tutorial 10

Wk 16
Topic: Testing the difference between two population means
Study Materials: Topic 11: Testing the difference between two population means
◊ Evaluate the hypothesis for the difference between two population means by using the z-test or t-test
Practice: Tutorial 11

Wk 17
Topic: Revision
Study Materials: Final Exam Revision

Assessment Plan AY18 S1

Assessment Methods (Components)    Percentage
Quizzes                            20%
Assignments                        30%
Semester Exam                      50%
Total                              100%


Formula Tables

Probability

$P(E) = \dfrac{n(E)}{n(S)}$, where E is an event and S is the sample space

$P(E') = 1 - P(E)$, where $E'$ is the complement of E

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ or $P(A \cap B) = P(A \mid B)\,P(B)$

A, B mutually exclusive $\Leftrightarrow P(A \cap B) = 0$

A, B independent $\Leftrightarrow P(A \cap B) = P(A)\,P(B)$

Counting

Permutation: ${}^{n}P_{r} = \dfrac{n!}{(n-r)!}$

Repeated objects, $k_i$: ${}^{n}P_{k} = \dfrac{n!}{k_1!\,k_2!\cdots k_m!}$

Combination: ${}^{n}C_{r} = \dfrac{n!}{r!\,(n-r)!}$
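As a quick check of the counting formulas above, the sketch below evaluates them with Python's standard library. The values n = 5, r = 2 and the word "LEVEL" are illustrative choices, not examples from these notes, and `arrangements_with_repeats` is a hypothetical helper name.

```python
import math
from collections import Counter

n, r = 5, 2  # illustrative values, not from the notes

# Permutation: nPr = n! / (n - r)!
n_p_r = math.perm(n, r)

# Combination: nCr = n! / (r! (n - r)!)
n_c_r = math.comb(n, r)


def arrangements_with_repeats(items):
    """Distinct orderings when some items are identical: n! / (k1! k2! ... km!)."""
    total = math.factorial(len(items))
    for k in Counter(items).values():
        total //= math.factorial(k)
    return total


print(n_p_r, n_c_r, arrangements_with_repeats("LEVEL"))  # 20 10 30
```

"LEVEL" has five letters with L and E each repeated twice, so the count is 5!/(2! 2! 1!) = 30.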

Linear Regression Line

$\hat{y} = mx + C$, where m is the slope (gradient) and C is the y-intercept

Binomial Distribution

$X \sim B(n, p)$

$P(X = x) = {}^{n}C_{x}\, p^{x} (1-p)^{n-x}$, $x = 0, 1, 2, \ldots, n$

$E(X) = np$

$\mathrm{Var}(X) = npq$, where $q = 1 - p$
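A minimal sketch of the binomial formulas, assuming hypothetical parameters n = 10 and p = 0.5 (they are not taken from these notes). `binomial_pmf` is an illustrative helper name.

```python
import math


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.5  # hypothetical parameters
prob_5 = binomial_pmf(5, n, p)
mean = n * p                 # E(X) = np
variance = n * p * (1 - p)   # Var(X) = npq

print(round(prob_5, 4), mean, variance)  # 0.2461 5.0 2.5
```

Summing `binomial_pmf(x, n, p)` over x = 0, 1, ..., n should give 1, which is a useful sanity check on any hand calculation.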

Poisson Distribution

$X \sim P_o(\mu)$

$P(X = x) = \dfrac{e^{-\mu}\mu^{x}}{x!}$, $x = 0, 1, 2, 3, \ldots$

$E(X) = \mu$

$\mathrm{Var}(X) = \mu$
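The Poisson probability function can be evaluated in the same way. This is a sketch with a hypothetical mean µ = 2 (not a value from these notes); `poisson_pmf` is an illustrative helper name.

```python
import math


def poisson_pmf(x, mu):
    """P(X = x) for X ~ Po(mu): e^(-mu) * mu^x / x!."""
    return math.exp(-mu) * mu**x / math.factorial(x)


mu = 2.0  # hypothetical mean
p_zero = poisson_pmf(0, mu)                               # e^(-2)
p_at_most_2 = sum(poisson_pmf(x, mu) for x in range(3))   # P(X <= 2) = 5e^(-2)

print(round(p_zero, 4), round(p_at_most_2, 4))  # 0.1353 0.6767
```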

Mean and Variance of a Random Variable

The Mean of a discrete random variable X is

$\mu = \sum k \cdot P(X = k)$

The Variance of a discrete random variable is

$\sigma^2 = \sum k^2 \cdot P(X = k) - \mu^2$

Measures of Central Tendency

Sample mean: $\bar{x} = \dfrac{\sum x}{n}$

Sample variance: $s^2 = \dfrac{1}{n-1}\left[\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}\right]$

Confidence Interval for Population Mean

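Population Variance, $\sigma^2$: known
Sample size, $n$: any
Confidence Interval: $\left(\bar{x} - z_c\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_c\dfrac{\sigma}{\sqrt{n}}\right)$

Population Variance, $\sigma^2$: unknown
Sample size, $n$: $n \ge 30$
Confidence Interval: $\left(\bar{x} - z_c\dfrac{s}{\sqrt{n}},\ \bar{x} + z_c\dfrac{s}{\sqrt{n}}\right)$

Population Variance, $\sigma^2$: unknown
Sample size, $n$: $n < 30$
Confidence Interval: $\left(\bar{x} - t_c\dfrac{s}{\sqrt{n}},\ \bar{x} + t_c\dfrac{s}{\sqrt{n}}\right)$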

Testing a Mean

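Testing a single sample value
Population Variance, $\sigma^2$: known
Test Statistic: $z = \dfrac{x - \mu}{\sigma}$

Testing a mean
Population Variance, $\sigma^2$: known
Sample size, $n$: any
Test Statistic: $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Testing a mean
Population Variance, $\sigma^2$: unknown
Sample size, $n$: $n \ge 30$
Test Statistic: $z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$

Testing a mean
Population Variance, $\sigma^2$: unknown
Sample size, $n$: $n < 30$
Test Statistic: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ with $df = n - 1$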

Testing the difference of two means (Independent Samples)

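Population variance: Known (unknown: $\sigma \approx s$)
Sample size: $n_1, n_2 \ge 30$
Test statistic: $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

Population variance: Unknown & $\sigma_1 \ne \sigma_2$
Sample size: $n_1, n_2 < 30$
Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$, with the smaller of $df = n_1 - 1$ or $df = n_2 - 1$

Population variance: Unknown & $\sigma_1 = \sigma_2$
Sample size: $n_1, n_2 < 30$
Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $df = n_1 + n_2 - 2$ and $\hat{\sigma} = \sqrt{\dfrac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$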

Testing the difference of two means (Dependent Samples)

$t = \dfrac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$, where $s_d = \sqrt{\dfrac{\sum d^2 - \dfrac{\left(\sum d\right)^2}{n}}{n - 1}}$

Table 1: Standard Normal Distribution

z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.4  0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
-3.3  0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
-3.2  0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
-3.1  0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
-3.0  0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
-2.9  0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
-2.8  0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
-2.7  0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
-2.6  0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
-2.5  0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
-2.4  0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
-2.3  0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
-2.2  0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
-2.1  0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
-2.0  0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
-1.8  0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
-1.7  0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
-1.6  0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
-1.5  0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
-1.4  0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
-1.3  0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
-1.2  0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
-1.1  0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
-1.0  0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
-0.9  0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
-0.8  0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
-0.7  0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
-0.6  0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
-0.5  0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
-0.4  0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
-0.3  0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
-0.2  0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
-0.1  0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
-0.0  0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641

z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0   0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1   0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2   0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3   0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4   0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5   0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6   0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7   0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8   0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9   0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0   0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1   0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2   0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3   0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4   0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5   0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6   0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7   0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8   0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9   0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0   0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1   0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2   0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3   0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4   0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5   0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6   0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7   0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8   0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9   0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0   0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1   0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2   0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3   0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4   0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
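The entries of Table 1 are cumulative probabilities, that is, the area under the standard normal curve to the left of z. As a rough cross-check (not part of the course method, which uses the printed table), they can be reproduced from the error function via the standard identity Φ(z) = ½[1 + erf(z/√2)]. The function name `phi` is an illustrative choice.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Print a few values to compare with the table rows for z = -1.96, -1.00, 0.00, 1.00, 1.96.
for z in (-1.96, -1.0, 0.0, 1.0, 1.96):
    print(f"{z:5.2f}  {phi(z):.4f}")
```

To look up z = 1.96 in the table, read the row for 1.9 and the column for 0.06.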


Table 2: t – Distribution

        Level of confidence, c   0.50    0.80    0.90    0.95    0.98    0.99
        One tail, α              0.25    0.10    0.05    0.025   0.01    0.005
d.f.    Two tails, α             0.50    0.20    0.10    0.05    0.02    0.01
1                                1.000   3.078   6.314   12.706  31.821  63.657
2                                0.816   1.886   2.920   4.303   6.965   9.925
3                                0.765   1.638   2.353   3.182   4.541   5.841
4                                0.741   1.533   2.132   2.776   3.747   4.604
5                                0.727   1.476   2.015   2.571   3.365   4.032
6                                0.718   1.440   1.943   2.447   3.143   3.707
7                                0.711   1.415   1.895   2.365   2.998   3.499
8                                0.706   1.397   1.860   2.306   2.896   3.355
9                                0.703   1.383   1.833   2.262   2.821   3.250
10                               0.700   1.372   1.812   2.228   2.764   3.169
11                               0.697   1.363   1.796   2.201   2.718   3.106
12                               0.695   1.356   1.782   2.179   2.681   3.055
13                               0.694   1.350   1.771   2.160   2.650   3.012
14                               0.692   1.345   1.761   2.145   2.624   2.977
15                               0.691   1.341   1.753   2.131   2.602   2.947
16                               0.690   1.337   1.746   2.120   2.583   2.921
17                               0.689   1.333   1.740   2.110   2.567   2.898
18                               0.688   1.330   1.734   2.101   2.552   2.878
19                               0.688   1.328   1.729   2.093   2.539   2.861
20                               0.687   1.325   1.725   2.086   2.528   2.845
21                               0.686   1.323   1.721   2.080   2.518   2.831
22                               0.686   1.321   1.717   2.074   2.508   2.819
23                               0.685   1.319   1.714   2.069   2.500   2.807
24                               0.685   1.318   1.711   2.064   2.492   2.797
25                               0.684   1.316   1.708   2.060   2.485   2.787
26                               0.684   1.315   1.706   2.056   2.479   2.779
27                               0.684   1.314   1.703   2.052   2.473   2.771
28                               0.683   1.313   1.701   2.048   2.467   2.763
29                               0.683   1.311   1.699   2.045   2.462   2.756
∞                                0.674   1.282   1.645   1.960   2.326   2.576


Course :
Diploma in Electronics, Computer & Communications Engineering
Diploma in Electronic Systems
Diploma in Telematics & Media Technology
Diploma in Aerospace Systems & Management
Diploma in Electrical Engineering with Eco-Design
Diploma in Mechatronics Engineering
Diploma in Digital & Precision Engineering
Diploma in Aeronautical & Aerospace Technology
Diploma in Biomedical Engineering
Diploma in Nanotechnology & Materials Science
Diploma in Engineering with Business

Module : Engineering Mathematics 2B – EG2008/2681/2916/2961 / Mathematics 2B – EGB/D/F/H/J/M207

Topic 1 : Descriptive Statistics

Objectives :

At the end of this lesson, the student should be able to:
1 Identify the type of variables
2 Interpret frequency distributions
3 Organise the data and represent through various graphical methods
4 Describe measures of central tendency – mean, median and mode
5 Describe measures of variability – range, variance and standard deviation

Topic 1: Descriptive Statistics

1.1.1 Variables

• A variable is a characteristic of an object that can be assigned a value or a category. Variables can be classified into two broad types: categorical and quantitative.

• Categorical variables are those that cannot be measured numerically. Quantitative variables are those that can be measured numerically.

• Categorical variables can be further divided into two sub-categories: ordinal and nominal. Ordinal variables have categories with a natural ranking, while nominal variables do not.

• Quantitative variables can be further divided into two sub-categories: discrete and continuous. Discrete variables have a countable number of values (usually integers), while continuous variables have an uncountable number of values.

The following chart illustrates the relationship between the various types of variables.

Variables
   Categorical
      Ordinal
         • Size of shirt: S, M, L
         • Survey results: Poor, Good, Excellent
      Nominal
         • Hair colour: Brown, Black, …
         • Type of drinks: Cola, coffee, …
   Quantitative
      Discrete
         • Number of girls: 1, 2, 3, …
      Continuous
         • Body weight (kg): 71.5, 77.2, …

1.1.2 Population and sample

• A population is the set of all possible observations that can be made. A sample is a subset of the population.

• Taking samples is usually necessary because it is very tedious to obtain every single data point from the population.

• We usually represent a variable with a capital letter such as X, Y, M, … and an observed value of the variable with a small letter. For example, x = 750.

Example 1.1.2-1

The Manpower Ministry wishes to conduct a survey on the monthly salary of Singapore citizens. A representative from the ministry conducted the survey with 100 people out of 3.34 million Singapore citizens. Let X represent the salary of a Singapore citizen.

_____________ is the variable of interest

_____________ is the observational unit

_____________ is the sample size

x = 5000 is an ________________

1.2 Organising data

• The raw data given will need to be organised neatly before we can draw some basic conclusions from the findings. For example, the data below do not give a clear indication of the distribution of the various blood groups.

A  A  AB O  B  B  O  A  O  A
O  O  O  A  O  A  O  O  B  A
B  O  A  O  AB O  A  O  O  O
O  A  O  O  A  O  O  O  B  B
AB O  B  O  B  O  A  A  A  AB

1.2.1 Bar chart, histogram


• Data can be of two types: ungrouped or grouped. Ungrouped data are given as individual data points, while grouped data are given in intervals.

• A frequency distribution is commonly used to organise data into well specified categories. Grouping of data is usually done when a variable has many different values. We can use a bar chart or histogram to have a visual view of the data.

Example 1.2.1-1 (Ungrouped data)

Using the data set on blood groups of 50 people above, use the frequency table and complete the bar chart below:

Blood Type    Frequency
A             14
AB            4
B             8
O             24
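The frequency table above can be reproduced by tallying the raw blood-group data. The sketch below uses Python's `collections.Counter` and prints a simple text bar chart; the variable names are illustrative.

```python
from collections import Counter

# The 50 blood groups listed in Section 1.2.
blood_groups = """
A A AB O B B O A O A
O O O A O A O O B A
B O A O AB O A O O O
O A O O A O O O B B
AB O B O B O A A A AB
""".split()

frequency = Counter(blood_groups)

# One row per category: label, count, and a bar of '#' marks of that length.
for group in ("A", "AB", "B", "O"):
    print(f"{group:<3}{frequency[group]:>3}  {'#' * frequency[group]}")
```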

• To create a histogram for grouped data, we need to calculate the common interval length of each class (also known as the “bin width”). To ensure that there are no gaps between adjacent classes, each class interval is extended by half a unit of measurement on the left and on the right; the extended end points are the class boundaries.

[Bar chart to complete for Example 1.2.1-1: horizontal axis Blood Type (A, AB, B, O); vertical axis Frequency (0 to 30)]

Example 1.2.1-2 (Grouped data)

A dentist measured the width (in mm) of the last lower molar of 60 adult females. The results were as follows:

7.6  10.6 8.2  10.3 9.6  7.8  10.1 8.7  9.1  7.7
8.2  9.9  10.9 9.5  10.4 8.8  9.4  9.1  9.7  9.2
8.7  9.4  9.1  7.9  9.5  9.3  8.5  10.8 8.3  8.6
10.1 9.8  8.3  10.5 8.7  9.8  7.6  9.7  10.7 10.4
9.2  9.7  8.6  8.7  8.1  9.2  9.6  10.2 8.9  9.3
8.0  9.3  8.4  9.9  8.7  11.0 8.9  10.0 8.6  8.4

The frequency distribution table is shown below:

Class Interval    Class Boundaries    Class Midpoint    Frequency
7.6 – 8.0         7.55 – 8.05         7.8               6
8.1 – 8.5         8.05 – 8.55         8.3               8
8.6 – 9.0         8.55 – 9.05         8.8               11
9.1 – 9.5         9.05 – 9.55         9.3               13
9.6 – 10.0        9.55 – 10.05        9.8               10
10.1 – 10.5       10.05 – 10.55       10.3              7
10.6 – 11.0       10.55 – 11.05       10.8              5
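The grouping above can be checked with a short script. This is a sketch rather than the method used in the course: it works in tenths of a millimetre so that class membership is decided with exact integer arithmetic, and the names `first_lower_limit` and `class_width` are illustrative.

```python
widths = [
    7.6, 10.6, 8.2, 10.3, 9.6, 7.8, 10.1, 8.7, 9.1, 7.7,
    8.2, 9.9, 10.9, 9.5, 10.4, 8.8, 9.4, 9.1, 9.7, 9.2,
    8.7, 9.4, 9.1, 7.9, 9.5, 9.3, 8.5, 10.8, 8.3, 8.6,
    10.1, 9.8, 8.3, 10.5, 8.7, 9.8, 7.6, 9.7, 10.7, 10.4,
    9.2, 9.7, 8.6, 8.7, 8.1, 9.2, 9.6, 10.2, 8.9, 9.3,
    8.0, 9.3, 8.4, 9.9, 8.7, 11.0, 8.9, 10.0, 8.6, 8.4,
]

first_lower_limit = 76  # 7.6 mm, expressed in tenths of a millimetre
class_width = 5         # 0.5 mm, expressed in tenths of a millimetre
number_of_classes = 7

counts = [0] * number_of_classes
for w in widths:
    tenths = round(w * 10)  # e.g. 8.7 mm -> 87
    counts[(tenths - first_lower_limit) // class_width] += 1

for i, count in enumerate(counts):
    lower = (first_lower_limit + i * class_width) / 10
    upper = lower + 0.4
    # Class boundaries extend each limit by half a unit of measurement (0.05 mm).
    print(f"{lower:4.1f} - {upper:4.1f}   {lower - 0.05:5.2f} - {upper + 0.05:5.2f}   {count}")
```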

Complete the histogram below:

[Histogram to complete: horizontal axis Width of lower molar (in mm), marked at the class boundaries 7.55, 8.05, 8.55, 9.05, 9.55, 10.05, 10.55, 11.05; vertical axis Frequency (0 to 15)]

1.2.2 Stem and Leaf Plot

• A stem-and-leaf plot is a method for showing the frequency with which certain classes of values occur. One common approach is to let the last digit of each value be the “leaf” and the remaining digits form the “stem”.

• Unlike the histogram or bar chart, the stem-and-leaf plot still lets us observe the individual data values.

Example 1.2.2-1

The following are the scores of 20 students on a Statistics test:

83 84 77 64 71 87 72 92 57 92

75 52 80 65 79 71 87 93 96 95

Construct a stem-and-leaf plot.

Solution:

First we organise the scores in ascending order:

52, 57, 64, 65, 71, 71, 72, 75, 77, 79, 80, 83, 84, 87, 87, 92, 92, 93, 95, 96

The first digit will form the “stems”, while the second digit will form the “leaves”.
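The stem/leaf split described above can be sketched in a few lines of Python. This is only an illustration of the idea for two-digit scores; `divmod(score, 10)` returns the tens digit (stem) and the units digit (leaf).

```python
from collections import defaultdict

scores = [83, 84, 77, 64, 71, 87, 72, 92, 57, 92,
          75, 52, 80, 65, 79, 71, 87, 93, 96, 95]

stems = defaultdict(list)
for score in sorted(scores):          # ascending order keeps the leaves sorted
    stem, leaf = divmod(score, 10)    # tens digit is the stem, units digit the leaf
    stems[stem].append(leaf)

for stem in sorted(stems):
    print(stem, "|", " ".join(str(leaf) for leaf in stems[stem]))
```

The number of leaves on each stem is the frequency of that class, so the plot doubles as a sideways histogram.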

5 |
6 |
7 |
8 |
9 |

1.3 Measures of Central Tendency

1.3.1 Mean

• Given a set of numerical data $x_1, x_2, \ldots, x_n$,

Mean $= \dfrac{\sum_{i=1}^{n} x_i}{n}$.

• Notation: Population mean: $\mu$ (data set contains the entire population information).

Sample mean: $\bar{x}$ (data set contains only a sample’s information).

Example 1.3.1-1 (Refer to Appendix 1.2)

Given a sample data set: 2, 5, 7, 10, 11, 13, calculate the sample mean.

Solution:

1.3.2 Median

• Median refers to a value such that 50% of the data is below this value and 50% of the data is above this value. It is the middle value of the ordered data set.

• Suppose our data set has n values.

Step 1: Arrange the values in ascending order.

Step 2: If n is odd, then the median is the $\left(\dfrac{n+1}{2}\right)$th number.

If n is even, then the median is the average of the $\left(\dfrac{n}{2}\right)$th and $\left(\dfrac{n}{2}+1\right)$th numbers.

Example 1.3.2-1

For each of the following data sets, find its median.

(a) 2, 5, 6, 4, 7, 4, 7, 2, 8, 9, 4, 11, 9, 1, 3

(b) 3, 1, 4, 7, 9, 5, 6, 8, 3, 1, 2, 9, 12, 4, 4, 15

Solution:

a) Arrange the numbers in ascending order:

1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8, 9, 9, 11

Since there are 15 (odd) data values, median = ____ th value =

b) Arrange the numbers in ascending order:

1, 1, 2, 3, 3, 4, 4, 4, 5, 6, 7, 8, 9, 9, 12, 15

Since there are 16 (even) data values, median is the average of ____th and _____ th values =
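The two-step median procedure can be written out directly, as in the sketch below. The function name `median_by_position` is illustrative; positions are counted from 1 as in the notes, so 1 is subtracted when indexing the Python list.

```python
def median_by_position(values):
    ordered = sorted(values)              # Step 1: ascending order
    n = len(ordered)
    if n % 2 == 1:
        # n odd: the (n + 1)/2-th value
        return ordered[(n + 1) // 2 - 1]
    # n even: average of the (n/2)-th and (n/2 + 1)-th values
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


data_a = [2, 5, 6, 4, 7, 4, 7, 2, 8, 9, 4, 11, 9, 1, 3]       # 15 values
data_b = [3, 1, 4, 7, 9, 5, 6, 8, 3, 1, 2, 9, 12, 4, 4, 15]   # 16 values

print(median_by_position(data_a), median_by_position(data_b))  # 5 4.5
```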

1.3.3 Mode

• Mode refers to the data value that appears most frequently (highest frequency). A set of data can have more than one mode.

Example 1.3.3-1

For each of the following data sets, find the mode.

(a) 2, 5, 6, 4, 7, 4, 7, 2, 8, 9, 4, 11, 9, 1, 3

(b) 2, 3, 5, 9, 6, 4, 7, 4, 7, 2, 8, 9, 2, 9, 1, 3

Solution:

(a) Mode(s) is / are ____________

(b) Mode(s) is / are ____________
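Because a data set can have more than one mode, a check in Python is best done with `statistics.multimode` (available from Python 3.8), which returns every value that attains the highest frequency. A minimal sketch:

```python
from statistics import multimode

data_a = [2, 5, 6, 4, 7, 4, 7, 2, 8, 9, 4, 11, 9, 1, 3]
data_b = [2, 3, 5, 9, 6, 4, 7, 4, 7, 2, 8, 9, 2, 9, 1, 3]

# multimode lists the modes in the order they first appear in the data.
print(multimode(data_a), multimode(data_b))  # [4] [2, 9]
```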

1.4 Measures of Dispersion

1.4.1 Range, Interquartile Range


• Range of a data set = Largest value – Smallest value.

• Quartiles: Q1, Q2, Q3. Q1 refers to the value such that 25 % of the data values are below it. Q2 is the median of the data and Q3 is the value such that 75 % of the data values are below it.

• Interquartile range = Q3 – Q1

• Procedure:

Step 1: Find Q2 (the median).

Step 2: Find Q1 (the median of the data values below Q2).

Step 3: Find Q3 (the median of the data values above Q2).

Step 4: Interquartile range = Q3 – Q1.

Example 1.4.1-1

The annual profits (rounded to millions of dollars) of 12 randomly selected companies in 2013 are as follows:

8 12 7 17 14 45 10 13 17 13 9 11

Find the values of the range, the three quartiles and the interquartile range.

Solution:

Arrange the numbers in ascending order:

7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45

Range =

Q2 (median) = $\dfrac{12 + 13}{2} = 12.5$

Q1 = median of {7, 8, 9, 10, 11, 12} =

Q3 = median of {13, 13, 14, 17, 17, 45} =

Hence IQR = Q3 – Q1 =
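The four-step quartile procedure can be sketched as below. It follows the median-of-halves convention used in these notes, where the median itself is left out of both halves when n is odd; other conventions exist, so software may report slightly different quartiles. The function name `quartiles` is illustrative.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves procedure."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # data values below Q2
    upper = ordered[half + n % 2:]    # data values above Q2 (skips the median when n is odd)
    return median(lower), median(ordered), median(upper)


profits = [8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11]
q1, q2, q3 = quartiles(profits)
data_range = max(profits) - min(profits)
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)  # 38 9.5 12.5 15.5 6.0
```

Note how the single large value 45 inflates the range to 38 while the interquartile range stays at 6.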

1.4.2 Variance and Standard deviation

• Variance is a measurement of the spread of data, in particular how each data value deviates away from the mean.

[Figure: two panels titled “Higher variance” and “Lower variance”]

• Notation:

Population variance: $\sigma^2 = \dfrac{\sum x^2}{N} - \mu^2$, where N is the total population size.

Sample variance: $s^2 = \dfrac{1}{n-1}\left[\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}\right]$, where n is the sample size.

• Standard deviation $= \sqrt{\text{variance}}$.

** In this course we will focus on using a scientific calculator to obtain the values of sample mean and variance from a sample data set. **

Example 1.4.2-1 (Refer to Appendix 1.2)

Consider the following sample of marks for a mathematics class test (out of 10):

1, 5, 4, 2, 6, 2, 1, 1, 5, 3

Calculate the sample standard deviation.

Solution:
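The course obtains this value from a scientific calculator (Appendix 1.2). As an independent check, the sketch below applies the sample variance formula from Section 1.4.2 step by step and compares the result with `statistics.stdev`, which also uses the n − 1 divisor.

```python
import math
import statistics

marks = [1, 5, 4, 2, 6, 2, 1, 1, 5, 3]

n = len(marks)
sum_x = sum(marks)                    # sum of x_i
sum_x2 = sum(x * x for x in marks)    # sum of x_i^2

# s^2 = [sum(x_i^2) - (sum(x_i))^2 / n] / (n - 1)
sample_variance = (sum_x2 - sum_x**2 / n) / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(round(sample_variance, 4), round(sample_sd, 4))  # 3.5556 1.8856
print(round(statistics.stdev(marks), 4))               # 1.8856
```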

Appendix 1.1 Using Excel for Descriptive Statistics

A1.1.1 Create Bar Chart and Histogram

From Excel, go to Options > Add-ins > Analysis ToolPak.

• Refer to Example 1.2.1-1 (Bar Chart)

Step 1: Enter two columns of information, one column for the categories and the other for the frequencies.

Step 2: To create a heading for the bar chart, key in the heading in the cell above the column of frequency values.

Step 3: Highlight both columns, go to the “Insert” tab, then select “Column” followed by “Clustered Column”.

• Refer to Example 1.2.1-2 (Histogram)

Step 1: Copy and paste the data values into an Excel spreadsheet. Also enter the upper limits of the class boundaries in a separate column.

Step 2: Go to the “Data” tab, click on the “Data Analysis” option and choose “Histogram”.

Step 3: Under “Input Range”, highlight all the cells containing the data values.

Under “Bin Range”, highlight the cells containing the upper limits of the class boundaries (including the header).

Also select the option “Chart Output”.

Step 4: Once the diagram is generated, highlight one of the bars and right-click to select “Format Data Series”. Set the gap width to 0.

Step 5: You can relabel the class intervals, the header, and so on.

A1.1.2 Calculate mean, median, standard deviation and quartiles

• Refer to Example 1.4.1-1. To calculate the mean of the data:

Step 1: Key in the data values in a column, choose the “Formulas” tab, then “Statistical”, and select the option “Average”.

Step 2: Highlight the values to be computed.

• To obtain the median, repeat the same steps as above but choose the function “Median” instead of “Average”.

• To obtain the standard deviation, repeat the same steps as above but choose the function “STDEV.P” if the data is the population data or “STDEV.S” if the data comes from a sample.

• To find the quartiles:

Step 1: Arrange the values in ascending order by highlighting the column of data values, right-clicking and selecting “Sort” followed by “Smallest to Largest”.

Step 2: Use the “Median” function to find Q2. Now apply the “Median” function again on the set of data values lower than Q2 to obtain Q1. Lastly, apply the “Median” function again on the set of data values higher than Q2 to obtain Q3.

Appendix 1.2 Using Calculator to obtain mean and standard deviation

Step 1: To enter data into a list:

• Press “Mode”, then select 2: STAT, then 1: 1-VAR.

• Key in a number and press “=” to input the value.

• If a frequency list is required: press “Shift”, “Setup”, scroll down, then select 4: Frequency.

Step 2: To calculate sample standard deviation

• Exit the data input screen by pressing “AC”.
• Press “Shift”, then select 1: STAT, 4: VAR, 4: $s_x$.

Step 3: To calculate sample mean

• Exit the data input screen by pressing “AC”.
• Press “Shift”, then select 1: STAT, 4: VAR, 2: $\bar{x}$.

Alternative key sequence, for a calculator with a “DATA” key:

Step 1: To key in data in the calculator

• Press “Mode”, then select 1: STAT, then 0: SD.

• Key in a number and press “DATA” to store the value in the list.

Step 2: To calculate sample standard deviation and sample mean

• Press “ALPHA” + “5” for the sample standard deviation $s_x$.

• Press “ALPHA” + “4” for the sample mean $\bar{x}$.

Tutorial 1: Descriptive Statistics

1 Classify each of the variables below by placing a “✓” in the correct categories.

                                          Categorical  Nominal  Ordinal  Quantitative  Discrete  Continuous
E.g. Flavour of milk                      ✓            ✓
(i) Age of a driver (in whole numbers)
(ii) Gender of a driver
(iii) Colour of bag
(iv) Volume of drink
(v) Size of a shirt (S, M, L, XL)
(vi) Number of students
(vii) Examination grades


2 Consider the following 2 sets of sample data

A: 3, 4, 5, 5, 5, 6, 8, 10, 11, 11, 11, 12, 12, 14, 18

B: 3, 4, 5, 5, 5, 5, 6, 8, 10, 11, 11, 11, 12, 12, 14, 18

For each sample, find the following by using the calculator:

(i) mean, (ii) median, (iii) mode, (iv) interquartile range, (v) range, (vi) standard deviation

3 The daily number of Internet system crashes is observed over 30 days at a university computer centre. The daily Internet system crashes are shown below:

1 3 1 1 0 1 0 1 1 0
2 2 0 0 0 1 2 1 2 0
0 1 6 4 3 3 1 2 4 0

Complete the frequency table and compute its mean, median and mode.

Value, x     0   1   2   3   4   5   6
Frequency

Answers

1 See the classification table at the end of these answers.

2 (i) $\bar{x}_A = 9$, $\bar{x}_B = 8.75$

(ii) median A = 10, median B = 9

(iii) mode A = 5, 11; mode B = 5

(iv) IQR A = 7, IQR B = 6.5

(v) range A = range B = 15

(vi) $s_A = 4.28$, $s_B = 4.25$

3

Value, x     0   1   2   3   4   5   6
Frequency    9   10  5   3   2   0   1

(i) mean = 1.43 (ii) median = 1 (iii) mode = 1
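The answers to Question 3 can be checked by expanding the frequency table back into the 30 individual observations. This is a sketch using Python's `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median, multimode

# Completed frequency table for Question 3: value -> frequency.
frequency_table = {0: 9, 1: 10, 2: 5, 3: 3, 4: 2, 5: 0, 6: 1}

# Repeat each value as many times as its frequency.
observations = [x for x, f in frequency_table.items() for _ in range(f)]

# Equivalent weighted form of the mean: sum(x * f) / sum(f).
weighted_mean = sum(x * f for x, f in frequency_table.items()) / sum(frequency_table.values())

print(len(observations), round(weighted_mean, 2), median(observations), multimode(observations))
```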

                                          Categorical  Nominal  Ordinal  Quantitative  Discrete  Continuous
E.g. Flavour of milk                      ✓            ✓
(i) Age of a driver (in whole numbers)                                   ✓             ✓
(ii) Gender of a driver                   ✓            ✓
(iii) Colour of bag                       ✓            ✓
(iv) Volume of drink                                                     ✓                       ✓
(v) Size of a shirt (S, M, L, XL)         ✓                     ✓
(vi) Number of students                                                  ✓             ✓
(vii) Examination grades                  ✓                     ✓


Topic 2 : Linear Regression and Correlation

Objectives :

At the end of this lesson, the student should be able to:
1 explain the line of best fit and linear correlation between two variables
2 find the correlation coefficient and the equation of the regression line
3 interpret the output from simple linear regression analysis and find the predicted values

Topic 2: Linear Regression and Correlation

2.1 Introduction

• In the previous chapter we dealt with data of one variable. In this chapter, we will study data with two variables and the relationship between them.

• Examples of data with two variables:

(a) Class test results vs final exam results,
(b) Blood pressure vs age of a person,
(c) Price of a car vs price of a 3 room HDB flat.

• An overview of this chapter is shown below:

[Overview chart: Scatter diagram; Correlation Coefficient; Linear Regression; Estimation / Prediction]

2.2 Scatter Diagram

• Given pairs of observed data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we can plot them on the x–y axes to obtain a scatter diagram.

• Scatter diagrams are useful because they show visually whether variables X and Y share any special relationship. Some examples of scatter diagrams are shown below:

[Diagrams 2.2.1, 2.2.2, 2.2.3 and 2.2.4: four example scatter diagrams of y against x]

• Variables X and Y have a positive relationship if Y tends to increase as X increases (i.e. there is an upward trend). Variables X and Y have a negative relationship if Y tends to decrease as X increases (i.e. there is a downward trend).

• Variables X and Y have a linear relationship if the observed values of X and Y can be described using a straight line equation $y = mx + c$.

• Diagrams 2.2.1 and 2.2.3 illustrate a positive and a negative linear relationship between X and Y respectively. Diagram 2.2.2 shows that X and Y have a curvilinear relationship, while Diagram 2.2.4 shows that X and Y have no obvious relationship.

Example 2.2-1

For each of the scatter plot shown below, describe whether

(i) the relationship between X and Y is positive or negative,

(ii) the relationship between X and Y is linear, curvilinear or not obvious.

(a)

(b)

Solution:

a) The relationship between X and Y is _____________ and ________________.

b) The relationship between X and Y is _____________ and ________________.

• In some situations we may have outliers (observation points that are distant from the rest) that distort the shape of the scatter plot. We may need to apply a transformation to the data values so that the relationship between the variables is more visible.

[Figure: Graph of Blood Pressure Vs Age (x-axis: Age, x; y-axis: Pressure, y)]

[Figure: Graph of Grade Vs Absence (x-axis: Absence, x; y-axis: Grade, y)]

Example 2.2-2

We want to investigate the relationship between the body and brain weights of different animals. The scatter diagram of the brain weights (in grams) and the body weights (in kilograms) of 28 animals is shown below:

[Figure: Graph of Brain Weight Vs Body Weight (x-axis: Body Weight in kilograms; y-axis: Brain Weight in grams), with a point labelled "outlier"]

We observe at least one point that is distant from the rest. It distorts the shape of the scatter diagram, so we may not see any clear relationship between the variables.

We can apply a transformation by taking the logarithm of both the observed body weights and the observed brain weights, and plot the scatter diagram shown below:

[Figure: Graph of Log Brain Weight Vs Log Body Weight (x-axis: Log of Body Weight; y-axis: Log of Brain Weight)]

Now we observe that the new variables exhibit a clearer linear relationship.

2.3 Correlation Coefficient

• Sometimes it is difficult to judge by eye from the scatter diagram whether there is indeed a linear relationship between the variables (refer to Example 2.2-1).

• Therefore we need a precise mathematical calculation to help us determine the degree of linearity in the relationship between two variables. This is done through the Pearson product moment correlation coefficient, r.

• The table below shows how the correlation coefficient r, where $-1 \le r \le 1$, indicates the linear relationship between two variables.

Strength of linear relationship    Positive               Negative
Perfect                            $r = 1$                $r = -1$
Very strong                        $0.8 \le r < 1$        $-1 < r \le -0.8$
Strong                             $0.4 \le r < 0.8$      $-0.8 < r \le -0.4$
Weak                               $0.2 \le r < 0.4$      $-0.4 < r \le -0.2$
Little / no relationship           $0 \le r < 0.2$        $-0.2 < r \le 0$

• The examples below show the shape of various scatter diagrams and their r values.

[Scatter diagrams: Very strong, positive correlation; Strong positive correlation; Weak positive correlation; Very strong, negative correlation; Strong negative correlation; Weak negative correlation]

Source of data: http://www.seeingstatistics.com/seeing1999/resources/opening.html
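The strength bands in the correlation table of this section can be written as a small lookup. The following is a minimal Python sketch and not part of the original notes; the function name `describe_correlation` and the sample values of r are illustrative assumptions.

```python
def describe_correlation(r: float) -> str:
    """Describe the linear relationship for a correlation coefficient r,
    using the bands from the table in Section 2.3."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size == 1.0:
        strength = "perfect"
    elif size >= 0.8:
        strength = "very strong"
    elif size >= 0.4:
        strength = "strong"
    elif size >= 0.2:
        strength = "weak"
    else:
        # Below 0.2 in absolute value the table gives no direction.
        return "little / no relationship"
    direction = "positive" if r > 0 else "negative"
    return f"{strength}, {direction}"


# Hypothetical values of r, chosen only to exercise each band.
for r in (0.85, -0.3, 0.1, -1.0):
    print(r, "->", describe_correlation(r))
```

The sign of r gives the direction and the absolute value gives the strength, which is why the function works with `abs(r)`.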

• We can obtain the correlation coefficient of a data set through EXCEL. The magnitude of the correlation coefficient is given by “Multiple R”, and its sign is the sign of the “X Variable 1” coefficient. The two examples below illustrate how to interpret the summary output from EXCEL.

Example 2.3-1

For each summary output below, state the value of the correlation coefficient and describe the relationship between X and Y.

(a)

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.896673
R Square             0.804022
Adjusted R Square    0.755028
Standard Error       5.641091
Observations         6

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%
Intercept     81.04809      13.88088        5.838829  0.004289  42.50858   119.5876   42.5
X Variable 1  0.964381      0.238061        4.050984  0.015463  0.303418   1.625344   0.30

(b)

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.944215
R Square             0.891542
Adjusted R Square    0.869851
Standard Error       6.054643
Observations         7

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%
Intercept     102.4925      5.138068        19.94768  5.85E-06  89.28471   115.7004   89.284
X Variable 1  -3.62189      0.564949        -6.411    0.00137   -5.07414   -2.16964   -5.074

Solution:

a) The correlation coefficient, r = ______________. X and Y have a _____________, ____________ linear relationship.

b) The correlation coefficient, r = ______________. X and Y have a _____________, ____________ linear relationship.

2.4 Simple Linear Regression

• If the scatter diagram and the correlation coefficient indicate that two variables share a linear relationship, we model them using a straight line equation and see how one variable (the dependent variable) changes its value according to another variable (the independent variable).

• Some examples of dependent and independent variables:

Dependent variable Y       Independent variable X
Blood pressure             Age
Sales of cold drinks       Climate temperature
Price of house bought      Monthly salary
Final exam score           Number of lessons missed

• Usually we denote the independent variable as X and the dependent variable as Y.

• A linear regression line Y on X is of the form $y = mx + c$, where the values of m and c can be obtained from the EXCEL summary output (refer to Example 2.3-1):
o m = coefficient of X variable,
o c = coefficient of Intercept.

• The equation of the linear regression line (best fit line) is obtained using the principle of least squared error (refer to Appendix 2.2).

Example 2.4-1

The table below shows the high school GPA and the college GPA at the end of the 1st year for 10 different students:

Student   High School GPA, x   College GPA, y
1         2.7                  2.2
2         3.1                  2.8
3         2.1                  2.4
4         3.2                  3.8
5         2.4                  1.9
6         3.4                  3.5
7         2.6                  3.1
8         2.0                  1.4
9         3.1                  3.4
10        2.5                  2.5

(a) Using the summary output,

(i) state the correlation coefficient and the relationship between the two variables.

(ii) write the equation of the regression line Y on X.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.843923
R Square             0.712206
Adjusted R Square    0.676232
Standard Error       0.433342
Observations         10

ANOVA
             df   SS         MS         F          Significance F
Regression   1    3.717716   3.717716   19.79767   0.002141
Residual     8    1.502284   0.187786
Total        9    5.22

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept     -0.95037      0.831773        -1.14258  0.286254  -2.86844   0.967706
X Variable 1  1.346999      0.302733        4.449458  0.002141  0.648895   2.045103

[Scatter diagram: College GPA, y against High School GPA, x]

(b) Using the equation of the line in part (aii), find the college GPA if the High School GPA is 3.6.

(c) Using the line of best fit in part (aii), find the High School GPA if the College GPA is 2.3.

Solution:

(ai) The correlation coefficient is _______________. The two variables have a ___________ , ______________ and _____________ relationship.

(aii) The linear regression line Y on X is _______________________.

(b) When $x = 3.6$,

(c) When $y = 2.3$,

Example 2.4-2

The Financial World magazine uses its own complex formula to estimate how much the following brand names would be worth in cash. The table gives the brand name, its value in billions of dollars, Y, and the company’s revenue in billions of dollars, X:

(a) Using the summary output, write the equation of the line of best fit.

(b) Using the equation in (a), find the value of the brand name if the company’s revenue is $5 billion, $10 billion and $25 billion.

Brand Name        Revenue   Value
Marlboro          15.4      31.2
Coca-Cola         0.4       4.4
Budweiser         6.2       10.1
Pepsi-Cola        5.5       9.6
Nescafe           4.3       8.5
Kellogg           4.7       8.4
Winston           3.6       6.1
Pampers           4         6.1
Camel             2.3       4.4
Campbell          2.4       3.9
Nestle            6         3.7
Hennessy          0.9       3
Heineken          3.5       2.7
Johnnie Walker    1.5       2.6
Louis Vuitton     0.9       2.6
Hershey           2.6       2.3
Guinness          1.8       2.3
Barbie            0.8       2.2
Kraft             2.8       2.2
Smirnoff          1         2.2
Del Monte         2.3       1.6
Wrigley's         1         1.5
Schweppes         1.3       1.4
Tampax            0.6       1.4
Heinz             0.8       1.3
Quaker            1.1       1.2
Colgate           1.1       1.2
Gordon's          0.6       1.1
Hermes            0.5       1
Kleenex           0.7       0.8
Carlsberg         0.8       0.7
Haagen-Dazs       0.5       0.6
Fisher-Price      0.6       0.6
Nivea             0.9       0.6
Sara Lee          0.8       0.5
Oil of Olay       0.6       0.5
Planters          0.7       0.5
Green Giant       1         0.4
Jell-o            0.3       0.4
Band-Aid          0.2       0.2
Ivory             0.4       0.2
Birds Eye         0.3       0.2

Source of data: Financial World, August 12, 1992

ANOVA
             df   SS         MS         F          Significance F
Regression   1    968.6591   968.6591   337.9662   4.11E-21
Residual     40   114.6457   2.866142
Total        41   1083.305

              Coefficients  Standard Error  t Stat    P-value   Lower 95%
Intercept     -0.55226      0.333114        -1.65787  0.105167  -1.22551
X Variable 1  1.819783      0.098988        18.38386  4.11E-21  1.61972

Solution:

(a) The regression line is: y = _______ + ________ x

(b) Substituting the values 5, 10 and 25 in for x and computing the values of y yield

the following predicted values of brand names when the revenues are $5, $10

and $25 billion:

Revenue Predicted value

$5 billion −0.55226 +1.819783 (____) = $___________ billion

$10 billion −0.55226 +1.819783 (____) = $___________ billion

$25 billion −0.55226 +1.819783 (____) = $___________ billion

2.5 Reliability of estimation / prediction

(a) Regression and correlation analysis only attempts to find a relationship between two variables. Even if there is a very strong linear relationship, we cannot conclude any causation between the variables (e.g. if there is a strong positive linear relationship between blood pressure and age, we cannot conclude that age causes high blood pressure).

(b) Given a value of the independent variable X, we say the estimation of Y is reliable if
• the x value is within the data range of X provided, and
• the correlation coefficient r is close to 1 or −1.

Appendix 2.1 Using EXCEL for Regression and Correlation

A2.1.1 Plot scatter diagram and regression line

• Refer to Example 2.4-1.

Step 1: Highlight the data values for x and y (x in the left column, y in the right column), go to the “Insert” tab and select “Scatter”. You may use the chart tools to add titles or to adjust other settings.

Step 2: Select the scatter diagram. Under “Chart Tools”, choose the “Layout” tab and select “Trendline”, then “Linear Trendline”. Under “Trendline Options” you may check the box “Display Equation on chart” to obtain the equation of the linear regression line.

A2.1.2 Generate Summary Output

• Using Example 2.4-1:

Step 1: Ensure you have the Analysis ToolPak installed in Excel (Chapter 1). Under the “Data” tab, select “Data Analysis”, then “Regression”.

Step 2: Highlight the data cells for the X and Y variables respectively. Click “OK” and the summary output will be generated.

Appendix 2.2 Principle of Least Squared Error

• Given a scatter diagram, there can be several possible straight lines to model the linear pattern. To avoid any ambiguity, statisticians adopt the “best fit” line, which meets the criterion of having the least squared error value.

• The “dots” on the scatter diagram represent the actual observed values of the variables, while the values on the best fit line are just estimations.

• The errors $e_1, \ldots, e_n$ represent the differences between the actual and estimated values. The line that has the least value of $\sum_{r=1}^{n} e_r^2$ is the best fit line.
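The least squares criterion can be checked numerically against the Excel summary output of Example 2.4-1. The sketch below is an illustration and not part of the original notes: it uses the standard closed-form least squares formulas (an assumption about how the fit is computed, since the notes rely on Excel), and the names `least_squares`, `s_xx`, `s_xy` and `s_yy` are chosen here for convenience.

```python
import math

# Data from Example 2.4-1: high school GPA (x) and college GPA (y).
x = [2.7, 3.1, 2.1, 3.2, 2.4, 3.4, 2.6, 2.0, 3.1, 2.5]
y = [2.2, 2.8, 2.4, 3.8, 1.9, 3.5, 3.1, 1.4, 3.4, 2.5]


def least_squares(xs, ys):
    """Return slope m, intercept c and correlation coefficient r
    of the least squares line y = m*x + c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    s_yy = sum((yi - y_bar) ** 2 for yi in ys)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    m = s_xy / s_xx
    c = y_bar - m * x_bar
    r = s_xy / math.sqrt(s_xx * s_yy)
    return m, c, r


m, c, r = least_squares(x, y)

# Sum of squared errors e_1^2 + ... + e_n^2 for the fitted line.
sse = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))

print(f"m = {m:.6f}, c = {c:.5f}, r = {r:.6f}, SSE = {sse:.6f}")
```

The slope, intercept and r should agree with “X Variable 1”, “Intercept” and “Multiple R” in the summary output, and the sum of squared errors with the “Residual” SS in the ANOVA table. Any other choice of m and c gives a larger sum of squared errors.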

[Figure: Graph of Blood Pressure Vs Age (x-axis: Age, x; y-axis: Blood Pressure, y)]

Tutorial 2: Linear Regression and Correlation

A Self Practice Questions

1 Each table gives the summary output of the linear regression analysis of y on x. Write down the correlation coefficient and comment on the relationship between x and y.

(a)

(b)

2 Write down the equation of the regression line in the form $y = mx + c$ for Question 1 (a). Explain what the values of m and c represent.

3 Using your answer to Question 2, estimate the value of y when $x = 20$.

B Discussion Questions

1 A survey is conducted on the relationship between the maximum height in feet of the roller coasters and their top speeds in miles per hour. The scatter diagram and the Excel summary output of the data are given below:

[Scatter diagram: Top Speed Vs Height of Roller Coaster (x-axis: Height of Roller Coaster; y-axis: Top Speed)]

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.89321
R Square             0.79782
Adjusted R Square    0.77760
Standard Error       6.29004
Observations         12

              Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept     39.06121      8.16739         4.7826   0.0007   20.8631    57.2593
X Variable 1  0.170724      0.02718         6.2818   9.1E-05  0.11017    0.23128

(i) State the correlation coefficient and comment on the relationship between the height of roller coaster (x) and the top speed (y).

(ii) Find the equation of the line of best fit.

(iii) What is the predicted top speed for a new roller coaster of height 325 feet?

(iv) What must be the height of a new roller coaster if it is designed to go at a top speed of 90 miles per hour?

2 A study is carried out to investigate the relationship between the mid-parent’s height and the daughter’s height. Mid-parent’s height is the average of father’s and mother’s heights. The heights of eleven female students and their mid-parent’s heights in inches were collected. The scatter diagram and the Excel summary output of the data are given below:

[Scatter diagram: Daughter's Height Vs Mid-parent's Heights (x-axis: Mid-parent's Heights (inches), tick labels 65, 67, 68, 69, 70, 71; y-axis: Daughter's Heights (inches))]

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.8504
R Square             0.7232
Adjusted R Square    0.6924
Standard Error       1.4506
Observations         11

              Coefficients  Standard Error  t Stat
Intercept     1.6497        13.363          0.1235
X Variable 1  0.9555        0.1971          4.8487

(a) State the correlation coefficient and comment on the relationship between mid-parent’s height (x) and daughter’s height (y).

(b) Find the equation of the line of best fit, y = a + bx.

(c) Predict the daughter’s height if the mid-parent’s height is 69 inches.

(d) Briefly state the physical significance of the coefficients a and b.

(e) Comment on the reliability of the estimate of daughter’s height when the mid-parent’s height is 73 inches.

3 In an experiment involving two chemicals x and y, a researcher recorded observations of values of y for controlled values of x. The summary output and scatter diagram are shown below:

[Scatter diagram of y against x, with marked values 4.4 and 12.3 (x) and 2.7 and 6.3 (y)]

(a) Explain whether a linear model is appropriate.

(b) The researcher realised that some of the observations came from contaminated materials. He then considered only the seven pairs of observations for which the values of x exceeded 6 and discarded the other observations.

(i) On the scatter diagram, circle the data points that are to be removed.

(ii) The researcher proposed two models for the remaining seven pairs of data:

Model A: $y = ax^2 + b$, correlation coefficient $r = -0.912961$

Model B: $y = a \ln x + b$, correlation coefficient $r = -0.970794$

where a and b are constants.

State which model is a better choice, giving a reason for your choice.

(iii) Hence, using the better model with $a = -2.69$, $b = 11.0$, estimate the value of x when y = 6.1. For this model, comment on the validity of this estimated value.

Answers

A1 (a) $r = -0.972$ (b) $r = 0.948$

A2 $y = -0.0636x + 22.7$

A3 estimated value of $y = 21.4$

B1 (i) $r = 0.893$ (ii) $y = 0.171x + 39.1$ (iii) 94.5 mph (iv) 298 feet

B2 (a) $r = 0.850$ (b) $y = 0.956x + 1.65$ (c) 67.6 inches

B3 (bi) (4.4, 3.2), (5.1, 2.7) (bii) Model B (biii) estimated $x = 6.18$

Topic 3 : Principles of Counting

Objectives :

At the end of this lesson, the student should be able to:
1 understand the counting principles
2 explain the difference between permutations and combinations
3 use permutations and combinations in counting problems

Topic 3: Principles of Counting

3.1.1 Multiplication Principle

• Suppose that the hardware you wish to create consists of three parts: (a) a battery, (b) an LED screen and (c) a memory card. There are two types of batteries, three types of LED screens and two types of memory cards available. How many products of different specifications can you create?

In total there are _____________ specifications of the product we can create.

I.e. ______ x _______ x _________ = 12 ways.

[Tree diagram: Battery A and Battery B each branch to LED 1, LED 2 and LED 3; each LED branches to Card 1 and Card 2, giving Spec 1 to Spec 12]

• Suppose a counting task can be broken down into n stages, with $m_1$ ways for stage 1, $m_2$ ways for stage 2, …, $m_n$ ways for stage n. Then, by the multiplication principle, there are a total of $m_1 \times m_2 \times \cdots \times m_n$ ways.
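The multiplication principle can be checked by listing every specification of the battery, LED screen and memory card product described at the start of this section. This is a minimal Python sketch, not part of the original notes; the variable names are illustrative.

```python
from itertools import product
from math import prod

batteries = ["Battery A", "Battery B"]
screens = ["LED 1", "LED 2", "LED 3"]
cards = ["Card 1", "Card 2"]

# Enumerate every (battery, screen, card) combination explicitly.
specs = list(product(batteries, screens, cards))

# Multiplication principle: multiply the number of ways at each stage.
count_by_rule = prod([len(batteries), len(screens), len(cards)])

print(len(specs), count_by_rule)
for number, spec in enumerate(specs, start=1):
    print(f"Spec {number}: {spec}")
```

Both counts come to 12, matching the total stated for this product.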

Example 3.1.1-1

A female student is preparing for her graduation dance party and her wardrobe contains 4 blouses, 7 skirts, 6 pairs of shoes, 3 sets of jewellery and 5 handbags. Assuming that all the items can be matched in terms of colours and styles, in how many possible ways could she dress herself up?

Solution:

Example 3.1.1-2

A typical PIN (personal identification number) consists of any four letters followed by two numeric digits. How many different PINs are possible if

(a) repetition of letters and digits is allowed;

(b) such repetition is not allowed.

Solution:

Example 3.1.1-3

The diagram below shows the keypad for an automatic teller machine. The same sequence of keys represents a variety of different PINs. For instance, 2133, AZDE and BQ3F are all keyed in exactly the same way.

1 QZ    2 ABC   3 DEF
4 GHI   5 JKL   6 MNO
7 PRS   8 TUV   9 WXY
        0

(a) How many different PINs are represented by the same sequence of keys as 2133?

(b) How many different PINs are represented by the same sequence of keys as 6809?

Solution:

3.1.2 Addition Principle

• Suppose you are considering buying a laptop from one of three brands: Acer, Lenovo and Dell. Acer, Lenovo and Dell have three, five and two different models available respectively. How many choices do you have for your laptop?

Case 1: Buy from Acer: _________ choices

Case 2: Buy from Lenovo: __________ choices

Case 3: Buy from Dell: _________ choices

In total there are __________________________ choices.

• Suppose a counting task can be broken down into n non-overlapping cases, with $m_1$ ways for case 1, $m_2$ ways for case 2, …, $m_n$ ways for case n. Then, by the addition principle, there are a total of $m_1 + m_2 + \cdots + m_n$ ways.

3.2 Permutation

• Permutation involves arrangement of objects whereby switching their positions gives a different outcome.

• Examples include:
(a) Arranging letters / digits to form words or codes,
(b) Arranging people in a formation to take photographs.

• A summary of the various scenarios on permutation is shown below:

Example 3.2-1

A class of 10 students consists of 6 men and 4 women.

(a) How many ways can all of them arrange themselves in a row?

(b) How many ways can we arrange 6 of them in a row?

Solution:

Given n objects, arrange r of them, $0 \le r \le n$:

• n objects all distinct: the number of arrangements is $^{n}P_{r} = \dfrac{n!}{(n-r)!}$. In particular, $^{n}P_{n} = n!$. (Examples 3.2-1, 3.2-2)

• r = n, n objects not all distinct ($k_1$ objects identical of type 1, $k_2$ objects identical of type 2, …, $k_m$ objects identical of type m): the number of arrangements is $\dfrac{n!}{k_1!\,k_2!\cdots k_m!}$. (Example 3.2-3)
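The permutation formulas in the summary can be evaluated directly. The sketch below is an illustration in Python and not part of the original notes; `math.perm` requires Python 3.8 or later, and the helper name `arrangements_of_word` is an assumption. The inputs are taken from Tutorial 3, Questions A1 and A3, so the results can be compared with the listed answers.

```python
from collections import Counter
from math import factorial, perm


def arrangements_of_word(word: str) -> int:
    """Number of distinguishable arrangements of all letters of `word`:
    n! / (k1! * k2! * ... * km!), where k_i counts each repeated letter."""
    total = factorial(len(word))
    for k in Counter(word).values():
        total //= factorial(k)
    return total


# n distinct objects, arrange all of them: nPn = n!
print(perm(8, 8))                            # COMPUTER, all 8 letters
# n distinct objects, arrange r of them: nPr = n! / (n - r)!
print(perm(8, 4))                            # COMPUTER, 4-letter words
# n objects, not all distinct
print(arrangements_of_word("PAPAYA"))
print(arrangements_of_word("PERMUTATIONS"))
```

For a word with no repeated letters, `arrangements_of_word` reduces to n!, so the two cases in the summary are consistent with each other.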


Example 3.2-2

A biologist has decided to use colours to label the collection of cell specimens in the laboratory. If he has 5 colours (red, blue, green, yellow and pink) to choose from, how many 3-colour codes can he make if no colour is repeated?

Solution:

Example 3.2-3

(a) In how many ways can we arrange all the letters in the word “RANDOM”?

(b) In how many ways can we arrange all the letters in the word “ENGINEERING”?

Solution:

3.3 Combination

• Combination involves selection of objects whereby the positions of the objects do not matter.

• Examples include:
(a) Forming a team of 5 people out of 10 people,
(b) Choosing 5 balls from a box full of balls in various colours.

• From n distinct objects, select r of them, $0 \le r \le n$. Number of selections:

$^{n}C_{r} = \dfrac{n!}{(n-r)!\,r!}$

• Alternate notation: $C^{n}_{r}$ or $\binom{n}{r}$.
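The combination formula can be evaluated with `math.comb` (Python 3.8 or later). The sketch below is an illustration and not part of the original notes; it uses the committee question from Tutorial 3, Question A5 (8 members chosen from 13 girls and 7 boys), so the results can be compared with the listed answers. The helper name `n_choose_r` is an assumption.

```python
from math import comb, factorial


def n_choose_r(n: int, r: int) -> int:
    """nCr computed from the formula n! / ((n - r)! * r!)."""
    return factorial(n) // (factorial(n - r) * factorial(r))


no_restriction = comb(20, 8)              # any 8 of the 20 people
all_girls = comb(13, 8)                   # 8 of the 13 girls
exactly_one_boy = comb(7, 1) * comb(13, 7)
# "Fewer than 2 boys" splits into two non-overlapping cases: 0 boys or 1 boy.
fewer_than_two_boys = all_girls + exactly_one_boy

print(no_restriction, all_girls, exactly_one_boy, fewer_than_two_boys)
print(n_choose_r(20, 8) == comb(20, 8))
```

The third count uses the multiplication principle (choose the boy, then choose the girls) and the fourth uses the addition principle over non-overlapping cases.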

Example 3.3-1

3 different species of orchid are to be selected from 20 unique species for cross-breeding. How many possible selections can be made?

Solution:

Example 3.3-2

The manager of a marketing department wants to form a four-person committee from the 15 employees in the department. In how many ways can the manager form this committee?

Solution:

3.4 More counting examples

Example 3.4-1 (Cases)

A four-member research team is to be chosen from 6 men and 5 women.

(a) How many teams can be formed if there are no restrictions?

(b) How many teams can be formed if there must be more men than women?

Solution:

Example 3.4-2

Eleven cards each bear a letter, and together they can be made to spell the word

“EXAMINATION”. Three cards are selected from the eleven cards and the order of

selection is not important. Find how many selections can be made

(i) if the three cards all bear different letters,

(ii) if two of the three cards bear the same letter.

Solution:


Example 3.4-3 (Complement & Slot – in Method)

In how many ways can the letters of the word “EXCELLENCE” be arranged if

(i) the four E’s are not all together,

(ii) the four E’s are all separated.

Solution:


Tutorial 3: Counting

A Self Practice Questions


Permutation

1 Using the letters from the word COMPUTER, find

(i) the number of words that can be formed using all the letters,

(ii) the number of 4 – letter words that can be formed.

2 Find the number of 3-digit PIN codes that can be formed using the digits 1, 2, 3, 4, 5, 6 if

(i) no repetitions are allowed,

(ii) repetitions are allowed.

3 In how many distinguishable ways can the letters in the following words be arranged?

(a) PAPAYA (b) PERMUTATIONS

Combination

4 Space shuttle astronauts each consume an average of 3000 calories per day. One meal normally consists of a main dish, a vegetable dish, and two different desserts. The astronauts can choose from 10 main dishes, 8 vegetable dishes, and 13 desserts. How many different meals are possible?

5 In a class of 20 people there are 13 girls and 7 boys. Find the number of ways to form a committee of 8 members if

(i) there are no restrictions,

(ii) the committee is made up of all girls,

(iii) there is exactly 1 boy in the committee,

(iv) there are fewer than 2 boys in the committee.


6 In a box there are 3 green, 5 red, 7 yellow and 6 blue balls. The balls are identical except for the colours. Find the number of ways to select 2 balls of different colours.


B Discussion Questions

1 (a) In how many different ways can three of the letters of the word BYTES be chosen and written in a row?

(b) How many different ways can this be done if the first letter must be “B”?

2 Janet has 10 different books that she is going to put on her bookshelf. Of these, 4 are Chemistry books, 3 are Biology books, 2 are Statistics books, and 1 Physics book. Janet wants to arrange her books so that all the books dealing with the same subject are together on the shelf. How many different arrangements are possible?

3 In how many ways can three distinct letters and two distinct digits be arranged if

(i) there is no restriction, (ii) the letters must come first, (iii) the digits must always be together.

4 Find the number of distinguishable ways the word STATISTICS can be arranged

(i) without conditions, (ii) if the letter “T”s must be together,

(iii) if no two “T”s are together.

5 A sample of 5 mice is to be chosen from 7 male and 6 female mice. In how many ways can the sample be selected if it must have at least 2 male and 1 female mice?

6 A shipment of 10 microwave ovens contains two defective units. In how many ways can a restaurant buy three of these units and receive

(a) no defective units? (b) one defective unit? (c) at least two non-defective units?


7 Four sales representatives for a company are to be chosen to participate in a training program. The company has eight sales representatives, two in each of four regions. In how many ways can the four representatives be chosen if

(a) there are no restrictions? (b) the selection must include a sales representative from each region? (c) the selection must be from only two of the four regions?

8 There are 10 students who are going to spend the evenings in 2 groups; one group goes to the Library and the other plays football. In how many ways can the group for football be selected if there must be at least 4 people in each group?


Answers

A1 i 40320 ii 1680

A2 i 120 ii 216

A3 a 60 b 239 500 800

A4 6240

A5 i 125970 ii 1287 iii 12012 iv 13299

A6 161

B1 a 60 b 12

B2 6912

B3 i 120 ii 12 iii 48

B4 i 50400 ii 3360 iii 23520

B5 1155

B6 a 56 b 56 c 112

B7 a 70 b 16 c 6

B8 672


Topic 4 : Probability

Objectives :

At the end of this lesson, the student should be able to:
1 describe probability experiments
2 calculate classical and conditional probabilities
3 distinguish between independent and dependent events
4 apply the multiplication rule
5 identify mutually exclusive events
6 apply the addition rule

Topic 4: Probability


4.1 Sample space and events

• Suppose we throw a die once and look at the number that appears on the top face. The set of all possible outcomes is {1, 2, 3, 4, 5, 6}. Suppose we are interested in getting an even number; then the set of outcomes of this event is {2, 4, 6}.

• The sample space is the set of all possible outcomes in an experiment. An event is a subset of the sample space.

Example 4.1-1

An unbiased coin is tossed three times. List the sample space and the event in which there is exactly one head.

Solution:

4.2 Probability of single events

• Probability is a measure of how likely an event will happen in an experiment.

• In an experiment, if each outcome in the sample space is equally likely to happen:

$P(\text{Event}) = \dfrac{\text{no. of outcomes in event}}{\text{no. of outcomes in sample space}}$
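The counting formula for equally likely outcomes can be applied to the die example of Section 4.1. This is a minimal Python sketch, not part of the original notes; the function name `probability` is an assumption, and `Fraction` is used so that the result stays exact.

```python
from fractions import Fraction

# Sample space for one throw of a die, and the event "even number".
sample_space = {1, 2, 3, 4, 5, 6}
even = {outcome for outcome in sample_space if outcome % 2 == 0}


def probability(event: set, space: set) -> Fraction:
    """P(Event) = outcomes in event / outcomes in sample space,
    valid only when every outcome in `space` is equally likely."""
    return Fraction(len(event & space), len(space))


print(probability(even, sample_space))  # 3 outcomes out of 6
```

The intersection `event & space` guards against outcomes that are not in the sample space being counted by mistake.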


Example 4.2-1 (List of outcomes)

Following Example 4.1-1, calculate the probability that there is exactly one head.

Solution:

Example 4.2-2 (Frequency Table)

The table below shows the gender and blood pressure categories of 300 participants.

Blood Pressure      Female   Male   Row Total
Normal              39       25     64
Pre-hypertension    61       50     111
High Stage 1        42       47     89
High Stage 2        20       16     36
Column Total        162      138    300

A participant is randomly chosen. Calculate the probability that

(a) the participant is male,

(b) the participant has high stage 2 blood pressure.

Solution:


Example 4.2-3 (Counting)

A four-member research team is to be chosen from 6 men and 5 women. What is the

probability that the team formed has more men than women?

(Refer to Example 3.4-1)

Solution:

Number of teams without restriction = 330

Number of teams with more men than women = 115

P (team has more men than women) =

4.3 Probability involving multiple events

4.3.1 Complement Event

• Given an event E, its complement event, E', is the set of outcomes in the sample space that are not in E.

• $P(E) = 1 - P(E')$

Example 4.3.1-1

Referring to Example 4.2-3, find the probability that the team formed has at most two

men.

Solution:

Let E = { team of 4 has at most two men }. Observe that E' = { team of 4 has more men than women }.

$\therefore P(E) =$

[Venn diagram: event E and its complement E']

4.3.2 Intersection and union of two events

• Let A and B be two events. Their intersection, known as “A and B” (notation: $A \cap B$), is the set of outcomes common to both A and B.

• The union of A and B, known as “A or B” (notation: $A \cup B$), is the set of outcomes that are in A or in B (or in both).

• Addition formula: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

Example 4.3.2-1

Refer to the table in Example 4.2-2, find the probability that

(i) a participant is a female and has high stage 2 blood pressure,

(ii) a participant is a male or has pre-hypertension.

Solution:

[Venn diagrams: $A \cap B$ and $A \cup B$]

4.3.3 Mutually exclusive events

• Two events A and B are mutually exclusive if they share no common outcome, i.e. $P(A \cap B) = 0$.

Example 4.3.3-1

Suppose we draw a card from a standard deck of poker cards. Find the probability that the card is a “4” or an ace.

Solution:

4.3.4 Conditional events

• The conditional event A given B (notation: $A \mid B$) refers to the event that A occurs, based on the knowledge that B has occurred.

• For example, we draw two cards from a deck of 52 poker cards without replacement. Let B be the event that the first card is the ace of hearts, and A be the event that the second card drawn is an ace. Then $A \mid B$ = {ace of spades, ace of diamonds, ace of clubs}.

• $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$.

[Venn diagram: events B and A]

Example 4.3.4-1

Two ordinary dice are thrown. Let A be the event that the numbers shown on both dice are equal, and B be the event that the sum of the two numbers is 8. Calculate $P(A \mid B)$ and $P(B \mid A)$.

Solution:

A = { (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6) }

B = { (2, 6), (6, 2), (3, 5), (5, 3), (4, 4) }

$A \cap B$ = { (4, 4) }

$P(A \mid B) =$

4.3.5 Independent Events

• Two events A and B are independent if the occurrence of one event does not affect the probability of the other event occurring.

• Mathematically, A and B are independent if and only if either of the following conditions holds:

(a) $P(A \cap B) = P(A)\,P(B)$

(b) $P(A \mid B) = P(A)$

Example 4.3.5-1

The probability of a successful appendicitis operation is 98%. Find the probability that

(i) out of three operations, all are successful,

(ii) out of two operations, at least one is unsuccessful.

Solution: Assume that the outcomes of the operations are independent of each other.

4.4 Tree diagram and multiplication rule

• Recall from Section 4.3.4: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$. We can rearrange the terms to obtain

$P(A \cap B) = P(A \mid B)\,P(B)$

This is also known as the multiplication rule.

• A tree diagram is useful for calculating probabilities in experiments that happen in stages, with events occurring one after another. For example, we may want to calculate probabilities involving whether a person smokes, followed by whether he or she has lung cancer.

• An example of a tree diagram looks like this:

We can use the tree diagram to compute various probabilities such as

$P(A \cap B) = P(A)\,P(B \mid A)$

$P(B) = P(A \cap B) + P(A' \cap B) = P(A)\,P(B \mid A) + P(A')\,P(B \mid A')$

(i.e. add up all the “branches” leading to event B).
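The two tree diagram calculations above, multiplying along a branch and adding the branches that lead to B, can be sketched in Python. This is an illustration and not part of the original notes: the three input probabilities are hypothetical values chosen only for the example, and the variable names are assumptions.

```python
from fractions import Fraction

# Hypothetical branch probabilities (not taken from any example in the notes).
p_a = Fraction(3, 10)              # P(A)
p_b_given_a = Fraction(1, 2)       # P(B | A)
p_b_given_not_a = Fraction(1, 5)   # P(B | A')

# Multiplication rule: multiply along each branch that ends in B.
p_a_and_b = p_a * p_b_given_a                  # P(A and B)
p_not_a_and_b = (1 - p_a) * p_b_given_not_a    # P(A' and B)

# Add up all the branches leading to event B.
p_b = p_a_and_b + p_not_a_and_b

# Conditional probability from Section 4.3.4, applied in reverse: P(A | B).
p_a_given_b = p_a_and_b / p_b

print(p_a_and_b, p_not_a_and_b, p_b, p_a_given_b)
```

The last step shows how the same tree answers a "given that B has occurred" question: the branch through A is divided by the total of all branches leading to B.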

[Tree diagram: the first stage branches to A with probability $P(A)$ and to A' with probability $P(A')$; each then branches to B and B' with probabilities $P(B \mid A)$, $P(B' \mid A)$, $P(B \mid A')$ and $P(B' \mid A')$]

Example 4.4-1

15% of Singaporean adults smoke cigarettes. It is found that 62% of the smokers and 12% of the non-smokers develop a lung problem by age 60.

(a) Find the probability that a randomly selected 60-year-old adult has a lung problem.

(b) Given that a randomly selected 60-year-old adult has a lung problem, what is the probability that he smokes?

Solution:

[Tree diagram: Smokes (0.15) branches to Lung problem (0.62) and No lung problem (0.38); Doesn’t smoke (0.85) branches to Lung problem (0.12) and No lung problem (0.88)]

(a) $P(\text{lung problem}) =$

(b) $P(\text{smokes} \mid \text{lung problem}) =$

Example 4.4-2

Machines A, B and C make components. Machine A makes 20% of the components, machine B makes 30% of the components and machine C makes the rest. The probability that a component is faulty is 0.07 when it is made by machine A, 0.06 when made by machine B and 0.05 when made by machine C. A component is picked at random. Calculate the probability that the component is

(a) made by machine A and is faulty.

(b) made by machine B given that it is faulty.

Solution:

4.5 More probability examples (Self Practice)

Example 4.5-1

The blood samples given by donors over one week were catalogued according to blood type and Rhesus factor (positive or negative). The 2 by 4 table of Rhesus factor against blood type is given below:

                          Blood Type
Rhesus Factor    O      A      B      AB     Total
Positive         156    139    37     12     344
Negative         28     25     8      4      65
Total            184    164    45     16     409

Find the probability that a randomly selected donor has

(i) type A blood,

(ii) positive Rhesus factor,

(iii) type A blood and positive Rhesus factor,

(iv) type O blood or negative Rhesus factor,

(v) type B blood given that the donor has positive Rhesus factor,

(vi) positive Rhesus factor given that the donor has type AB blood.

Solution:

Ans: (i) 164/409, (ii) 344/409, (iii) 139/409, (iv) 221/409, (v) 37/344, (vi) 3/4
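The answers above can be reproduced directly from the table counts. The following Python sketch is one way to do so with exact fractions; the dictionary layout and helper names (`n_type`, `n_rhesus`) are assumptions made for illustration.

```python
from fractions import Fraction

# Counts from the table in Example 4.5-1.
counts = {
    "Positive": {"O": 156, "A": 139, "B": 37, "AB": 12},
    "Negative": {"O": 28, "A": 25, "B": 8, "AB": 4},
}
total = sum(sum(row.values()) for row in counts.values())


def n_type(blood_type):
    """Number of donors with the given blood type (column total)."""
    return sum(row[blood_type] for row in counts.values())


def n_rhesus(rhesus):
    """Number of donors with the given Rhesus factor (row total)."""
    return sum(counts[rhesus].values())


p_a = Fraction(n_type("A"), total)                           # (i)
p_pos = Fraction(n_rhesus("Positive"), total)                # (ii)
p_a_and_pos = Fraction(counts["Positive"]["A"], total)       # (iii)
# Addition rule: P(O or Neg) = P(O) + P(Neg) - P(O and Neg)
p_o_or_neg = Fraction(
    n_type("O") + n_rhesus("Negative") - counts["Negative"]["O"], total
)                                                            # (iv)
# Conditional probability: restrict the denominator to the given group.
p_b_given_pos = Fraction(counts["Positive"]["B"], n_rhesus("Positive"))  # (v)
p_pos_given_ab = Fraction(counts["Positive"]["AB"], n_type("AB"))        # (vi)
```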

75

Example 4.5-2

A, B and C are three random events. A and B are mutually exclusive; A and C are independent. $P(A) = \frac{1}{5}$, $P(B) = \frac{1}{10}$, $P(A \text{ or } C) = \frac{7}{15}$ and $P(B \text{ or } C) = \frac{23}{60}$ are given.

(a) Find $P(A \text{ or } B)$, $P(C)$, and $P(B \text{ and } C)$.

(b) State whether B and C are independent.

Solution:

(a) A and B are mutually exclusive $\Rightarrow P(A \cap B) =$

$P(A \text{ or } B) =$

A and C are independent $\Rightarrow P(A \cap C) = P(A) \times P(C)$

(b) $P(B \text{ and } C) =$

$P(B) \times P(C) =$

Ans: (a) $P(A \text{ or } B) = \frac{3}{10}$, $P(C) = \frac{1}{3}$, $P(B \text{ and } C) = \frac{1}{20}$, (b) not independent
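The stated answers can be verified with exact arithmetic. The sketch below is one possible route, using the addition rule together with the two given properties; the variable names are illustrative.

```python
from fractions import Fraction

p_a, p_b = Fraction(1, 5), Fraction(1, 10)
p_a_or_c, p_b_or_c = Fraction(7, 15), Fraction(23, 60)

# Mutually exclusive: P(A and B) = 0, so P(A or B) = P(A) + P(B).
p_a_or_b = p_a + p_b

# Independent: P(A or C) = P(A) + P(C) - P(A)P(C); solve for P(C).
p_c = (p_a_or_c - p_a) / (1 - p_a)

# Addition rule rearranged: P(B and C) = P(B) + P(C) - P(B or C).
p_b_and_c = p_b + p_c - p_b_or_c

# B and C are independent only if P(B and C) equals P(B)P(C).
independent = p_b_and_c == p_b * p_c
print(p_a_or_b, p_c, p_b_and_c, independent)
```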

Tutorial 4: Probability

76


1 A quiz has 3 true/false questions. Suppose you are randomly selecting the answers and have an equal chance of being correct for each question. Let CCW indicate that you were correct on the first two questions and wrong on the third.
(a) List the sample space.
(b) List the possible outcomes with at least two questions answered correctly.

2 A pair of unbiased dice is tossed. Find the probability of getting
(i) a total of 7;
(ii) at most a total of 6.

3 Given that $P(A) = \frac{3}{7}$, $P(B) = \frac{1}{3}$ and $P(A \cap B) = \frac{1}{9}$, find

(i) $P(A')$ (ii) $P(A \cup B)$ (iii) $P(A' \cap B')$ (iv) $P(A \cap B')$ (v) $P(A \mid B)$

4 A group of files in a medical clinic classifies the patients by gender and by type of diabetes (I or II). The table gives the number in each classification.

              Type of Diabetes
Gender        I      II
Male          25     20
Female        35     20

If one file is selected at random, find the probability that the individual is
(a) female.
(b) Type II.
(c) Type II, given that the patient is a male.
(d) Are the events “Type II” and “a male” independent?
(e) Are the events “Type I” and “a female” mutually exclusive?

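5 A study showed that one out of every ten women will get breast cancer. Among those who do, one out of four will die of it.

(i) Complete the tree diagram below.

[Tree diagram to complete, with branches labelled “Has breast cancer” (1/10), “Does not have breast cancer”, “Dies from cancer” and “Does not die from cancer”.]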

77

(ii) Calculate the probability that a randomly chosen woman gets breast cancer and does not die of it.


1 In a group of 10 persons, 4 have a type A personality and 6 have a type B personality. If two persons are selected at random from this group, what is the probability that the two will have different personality types?

2 If 3 books are picked at random from a shelf containing 6 novels, 5 cook books and 1 computer book, what is the probability that
(a) the computer book is selected?
(b) 2 novels and 1 cook book are selected?

3 In a road show, the compere holds a bag containing 4 movie tickets and 6 concert tickets. 4 tickets are to be drawn at random and given away to 4 lucky winners on stage. Find the probability that
(a) all 4 drawn are concert tickets.
(b) the 4 tickets are not of the same type.
(c) at least 2 movie tickets are drawn.

4 Independent events A and B are such that $P(A) = P(B) = p$ and $P(A \cup B) = \frac{5}{9}$.

Find $p$ and $P(A \cap B)$.

5 Events A and B are such that $P(A) = \frac{1}{3}$, $P(B \mid A) = \frac{1}{4}$ and $P(A' \cap B') = \frac{1}{6}$. Find

(i) $P(A \cup B)$,


78

(ii) $P(B)$.

6 The probability that a family owns a car is 0.48, that it owns a 5-room flat is 0.35, and that it owns both a car and a 5-room flat is 0.21. What is the probability that a randomly selected family owns a car or a 5-room flat?

7 1000 people were randomly selected and asked whether they are right-handed or left-handed. The following table shows the result of the survey:

                 Men    Women
Left-handed      63     50
Right-handed     462    425

(a) A person is selected at random from the sample. Find the probability that the person is
(i) left-handed or a woman;
(ii) right-handed or a man;
(iii) not right-handed given the person is a man;
(iv) a right-handed woman.
(b) Are the events “being right-handed” and “being a woman” mutually exclusive? Explain.

8 Two thousand randomly selected adults were asked if they think they are better off financially than their parents. The following table gives the two-way classification of the responses based on the education levels of the adults and whether they are financially better off, the same, or worse off than their parents.

              Primary    Secondary    Tertiary
Better off    140        450          420
Same          60         250          110
Worse off     200        300          70

Suppose one adult is selected at random from these 2000 adults. Find the probability that the adult is
(i) better off and has secondary education,
(ii) not the same financially,
(iii) worse off or has primary education,
(iv) not better off given secondary education.

79

9 The table below shows the results of a survey of the 120 cars in a carpark, in which the colour of each car and the gender of the driver were recorded.

          Male    Female
Green     18      12
Blue      48      22
Red       6       14

One of the cars is selected at random.

M is the event that the car selected has a male owner.

G is the event that the car selected is green.

B is the event that the car selected is blue.

R is the event that the car selected is red.

Find the following probabilities:

(i) $P(M \cup B)$,
(ii) $P(M \mid R')$.
(iii) Determine whether the events M and G are independent, justifying your answer.

10 A shipment of two boxes, each containing 6 calculators is received by a store. Box 1 contains one defective calculator and box 2 contains two defective calculators. After the boxes are unpacked, a calculator is selected and found to be defective. Find the probability that it came from box 2.

11 A certain virus infects 0.5% of the population. A test will be positive 80% of the time if the person has the virus and 5% of the time if the person does not have the virus. Suppose A is the event “the person is infected” and B is the event “the person tests positive”.

(a) Draw a tree diagram to show the outcomes of the tests.

(b) Find the probability that

(i) the person is infected and is tested positive,

80

(ii) the person is tested positive.

12 Two children, Tan and Mui, are each to be given a pen from a box containing 3 red pens and 5 blue pens. One pen is chosen at random and given to Tan. A green pen is then put in the box. A second pen is chosen at random from the box and given to Mui.

(i) Draw a tree diagram to represent the possible outcomes.

(ii) Find the conditional probability that Mui’s pen is blue, given that Tan’s pen is red.

(iii) Find the probability that Mui’s pen is red.

(iv) Find the conditional probability that Tan’s pen is red, given that Mui’s pen is blue.

Answers

A1 i { CCC, CCW, CWC, WCC, CWW, WCW, WWC, WWW }
   ii { CCC, CCW, CWC, WCC }

A2 i 1/6   ii 5/12

81

A3 i 4/7   ii 41/63   iii 22/63   iv 20/63   v 1/3

A4 a 11/20   b 2/5   c 4/9   d No   e No

A5 ii 3/40

B1 8/15

B2 a 1/4   b 15/44

B3 a 1/14   b 97/105   c 23/42

B4 1/3; 1/9

B5 a 5/6   b 7/12

B6 0.62

B7 ai 269/500   aii 19/20   aiii 3/25   aiv 17/40   b No

B8 i 9/40   ii 79/100   iii 77/200   iv 11/20

B9 i 47/60   ii 33/50   iii Independent

B10 2/3

B11 bi 0.004   bii 0.05375

B12 ii 5/8   iii 21/64   iv 3/7

82

Topic 5 : Discrete Probability Distribution

Objectives :

At the end of this lesson, the student should be able to:
1 define random variables
2 distinguish between discrete and continuous random variables
3 define a discrete probability distribution
4 compute the mean, variance and standard deviation of a discrete random variable

83

Topic 5: Discrete Probability Distribution

5.1 Random variable

• A variable is an alphabetical representation of a quantity that can take on various numerical values.

• A random variable, usually denoted by X, Y, …, is a variable that takes on different values due to random phenomenon (or by chance).

• A random variable can be discrete or continuous (refer to Chapter 1).

Example 5.1-1

(a) A coin is tossed three times.

If X is a random variable representing the number of heads, then

$X = 0, 1, 2, 3$

There can be no head, 1 head, 2 heads or 3 heads in the three tosses.

(b) Suppose Y is a random variable representing the time a salesperson spends on making calls per day.

The time spent on making calls can be any value (e.g. 2.4 minutes, 49.5 minutes, etc.), so Y is said to be a continuous random variable.

The values of a continuous random variable can be represented as an interval on a number line.

[Number line marked from 0 to 24 in steps of 3]

84

5.2 Discrete probability distribution

• We may not know the exact value of a random variable at any specific moment.

However we may calculate the likelihood (probability) that a random variable may

take a specific value.

• A probability distribution is a table or an equation that links each value of a

random variable with its probability of occurrence. The probability distribution of a

discrete random variable may be represented using a table.

Example 5.2-1

A fair coin is tossed twice. If X is a random variable representing the number of heads, then construct the probability distribution for X.

Solution:

$P(X = 0) = P(TT) = \frac{1}{2} \times \frac{1}{2} =$

$P(X = 1) = P(TH) + P(HT) = 2 \times \frac{1}{2} \times \frac{1}{2} =$

$P(X = 2) = P(HH) = \frac{1}{2} \times \frac{1}{2} =$

X = k        0      1      2
P(X = k)

A probability distribution must satisfy the following conditions:

(a) $0 \le P(X = k) \le 1$ for all values of $k$,

(b) $\sum_{\text{all } k} P(X = k) = 1$ (sum of all probabilities is 1).
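The distribution in Example 5.2-1 can also be built by listing every equally likely outcome and counting heads. The following is a small Python sketch of that enumeration, followed by a check of conditions (a) and (b); the variable names are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the equally likely outcomes of two tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Count how many outcomes give each number of heads.
heads = Counter(outcome.count("H") for outcome in outcomes)

# Probability of each value k = (outcomes with k heads) / (all outcomes).
distribution = {k: Fraction(n, len(outcomes)) for k, n in sorted(heads.items())}

# Condition (a): every probability lies between 0 and 1.
condition_a = all(0 <= p <= 1 for p in distribution.values())
# Condition (b): the probabilities sum to 1.
condition_b = sum(distribution.values()) == 1
print(distribution, condition_a, condition_b)
```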

85

Example 5.2-2

Explain whether each of the following is a discrete probability distribution function.

(a)
X = k        5       6      7      8
P(X = k)     1/16    5/8    1/4    1/16

(b)
X = k        1       2       3       4
P(X = k)     0.09    0.36    0.49    0.05

Solution:

(a) It is a discrete probability distribution function since

(i) ______ $\le P(X = k) \le$ ______

(ii) $\sum_{k=5}^{8} P(X = k) = P(5) + P(6) + P(7) + P(8) =$ _________

(b) It is not a discrete probability distribution function since

$\sum_{k=1}^{4} P(X = k) =$ ______ + ______ + ______ + ______ $\ne 1$

5.3 Mean and Variance of a discrete probability distribution

• In Chapter 1 (Section 1.3.1 and 1.4.2), we learnt to calculate the mean and

variance for a set of data values.

• In this chapter, we will learn to calculate the theoretical population mean $\mu$ and population variance $\sigma^2$ from a discrete probability distribution.

• The expectation of a random variable (or expected value) is the same as the

population mean.


86

$\mu = \sum k\,P(X = k)$

$\sigma^2 = \sum k^2\,P(X = k) - \mu^2$ or $\sigma^2 = \sum (k - \mu)^2\,P(X = k)$
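As an illustration of the two formulas, the sketch below computes the mean and variance of a distribution stored as a Python dictionary. It is applied to the two-coin distribution of Example 5.2-1 (number of heads in two tosses of a fair coin); the function name `mean_variance` is a hypothetical helper, not part of the notes.

```python
from math import sqrt


def mean_variance(distribution):
    """distribution maps each value k to P(X = k)."""
    # mu = sum of k * P(X = k)
    mu = sum(k * p for k, p in distribution.items())
    # sigma^2 = sum of k^2 * P(X = k), minus mu^2
    var = sum(k**2 * p for k, p in distribution.items()) - mu**2
    return mu, var


# Number of heads in two tosses of a fair coin.
mu, var = mean_variance({0: 0.25, 1: 0.5, 2: 0.25})
sd = sqrt(var)
print(mu, var, round(sd, 3))
```

The same function can be applied to any of the tables in this section by typing in the values of k and their probabilities.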

Example 5.3-1

Find the mean, variance and standard deviation of the random variable in the following probability distribution:

X = k        1       2       3       4       5
P(X = k)     0.16    0.22    0.28    0.20    0.14

Solution:

Mean, $\mu = \sum k\,P(X = k) = 1(0.16) + 2(0.22) + 3(0.28) + 4(0.20) + 5(0.14) =$

$\sum k^2\,P(X = k) = 1^2(0.16) + 2^2(0.22) + 3^2(0.28) + 4^2(0.20) + 5^2(0.14) =$

Variance, $\sigma^2 = \sum k^2\,P(X = k) - \mu^2 =$

Standard deviation, $\sigma =$

87

Example 5.3-2

The random variable X represents the number of defective tires. The probability distribution of X is given below:

k            0     1       2       3       4
P(X = k)     m     0.16    0.06    0.04    0.20

(a) Find the value of $m$.
(b) Compute
(i) the expectation of X,
(ii) the standard deviation of the distribution.

Solution:

(a) For a probability distribution, $m + 0.16 + 0.06 + 0.04 + 0.2 = 1$

$m =$

(b)(i) $E(X) = \sum k\,P(X = k) =$

(b)(ii) $\sigma^2 = \sum k^2\,P(X = k) - \mu^2 =$


Example 5.3-3

The following table shows the distribution of household sizes in a small town.

k            1        2        3        4        5        6
P(X = k)     0.266    0.330    0.166    0.140    0.064    0.034

(i) Show that the distribution is a probability distribution.

(ii) What is the expected size of a household in the town?

Solution:

(i) Since $0.266 + 0.330 + 0.166 + 0.140 + 0.064 + 0.034 =$

(ii) $E(X) = \sum k\,P(X = k) =$

88

Tutorial 5: Discrete Probability Distribution

1 Randomly selected households from a particular estate were asked about the number of children they have, and the following frequency distribution shows the result of the survey:

Number of children    0      1      2     3
Households            300    280    95    20

(a) Construct a probability distribution table.
(b) Let X denote the number of children in a household from the particular estate. Find the following probabilities:

(i) $P(X = 1)$ (ii) $P(X \ge 2)$

(iii) $P(X < 1)$ (iv) $P(1 \le X \le 3)$

2 An electrical appliance company offers its customers a number of different instalment plans. Let the random variable X represent the number of instalments for a randomly selected customer. The probability distribution for X is given below:

x            6       12      24    36
P(X = x)     0.20    0.30    k     0.15

(a) Find the value of the constant k.

(b) Find the mean of the distribution of X.

3 Let the random variable X be the number of errors that a randomly selected page of a book contains. The following table lists the probability distribution of X.

x            0       1       2    3       4
P(X = x)     0.73    0.16    k    0.04    0.01

Find the value of k; hence, find the mean and standard deviation of X.

89

4 A charity organisation is selling $4 raffle tickets as part of a fund-raising programme. The first prize is a computer valued at $3150, and the second prize is a vacuum cleaner valued at $450. The remaining 15 prizes are $25 gift vouchers. The number of tickets sold is 5000.

(a) Find the expected net gain to the player for one play of the game.

(b) Interpret your answer to part (a).

Answers

1 (a)

No. of children, k    0         1         2         3
P(X = k)              60/139    56/139    19/139    4/139

(b) (i) 0.403 (ii) 0.165 (iii) 0.432 (iv) 0.568

2 (a) 0.35 (b) 18.6 months

3 k = 0.06; mean = 0.44; standard deviation = 0.852

4 −$3.21

90

Topic 6 : Binomial and Poisson Distributions

Objectives :

At the end of this lesson, the student should be able to:
1 list the conditions of a binomial experiment
2 explain the binomial probability function $P(x) = {}^{n}C_{x}\,p^{x} q^{n-x}$
3 calculate the mean, variance and standard deviation of a binomial distribution
4 use the binomial probability distribution in problem solving
5 define Poisson random variable
6 list the conditions of the Poisson distribution
7 explain the Poisson probability function $P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$
8 calculate the mean and variance of a Poisson distribution
9 use the Poisson probability distribution in problem solving

91

Topic 6: Binomial and Poisson Distributions

6.1 Binomial Distribution

• Suppose we throw a die ten times and we want to create a probability distribution for the number of times “6” appears. It is reasonable to make the following assumptions:

(a) Each trial has only two outcomes: success or failure (a “6” or not a “6”).

(b) The outcome of each trial is independent of other trials. (The next throw is not affected by previous throws.)

(c) The probability of success, p, for each trial is the same (e.g. $p = P(\text{get a “6”}) = \frac{1}{6}$).

• In general, if we conduct an experiment for n trials and the experiment satisfies conditions (a) to (c), then we can model the number of successes using a Binomial Distribution.

• Notation: Let X denote the number of successes in a Binomial experiment with n trials and p = probability of success for each trial. Then

$X \sim B(n, p)$

$P(X = k) = {}^{n}C_{k}\,p^{k}(1 - p)^{n-k}$, $k = 0, 1, 2, \ldots, n$

** Note that X is a discrete random variable.**
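The binomial formula translates directly into code. The following minimal Python sketch (the function name `binomial_pmf` is an illustrative assumption) evaluates it for the die-throwing setting described above, where $X \sim B(10, \frac{1}{6})$.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): nCk * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Number of sixes in ten throws of a fair die: X ~ B(10, 1/6).
probabilities = [binomial_pmf(k, 10, 1 / 6) for k in range(11)]

# The probabilities over k = 0, 1, ..., 10 should add up to 1.
print(round(sum(probabilities), 10))
```

`math.comb(n, k)` returns the number of combinations ${}^{n}C_{k}$ and is available from Python 3.8.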

92

Example 6.1-1

A certain surgical procedure has an 85% chance of success. A doctor performs the procedure on eight patients. The random variable X represents the number of successful surgeries. State the distribution of X.

Solution:
Let X be the number of successful surgeries out of 8 patients. n = 8, p = 0.85

$X \sim$

Example 6.1-2

A jar contains 5 red marbles, 9 blue marbles, and 6 green marbles. You randomly select 3 marbles from the jar, without replacement. The random variable X represents the number of red marbles. Explain whether X is a Binomial random variable.

Solution:
P(first marble is red) =

Since the probability of obtaining a red marble for each draw is ___________________, X is NOT a Binomial random variable.

Example 6.1-3

Microfracture knee surgery has a 75% chance of success on patients with degenerative knees. The surgery is performed on three patients. Find the probability of the surgery being successful on exactly two patients.

Solution:
Let X be the number of successful surgeries out of 3 patients.

$X \sim B(\ ,\ )$

$P(X = 2) =$

93

Example 6.1-4

Childhood asthma is a public health problem in country A. It is known that one out of 10 children in country A has asthmatic problems. In a randomly chosen group of 14 children from the population, what is the probability that

(a) exactly 3 have asthmatic problems?

(b) 1 or fewer has asthmatic problems?

(c) more than 1 have asthmatic problems?

Solution:
Let X be the number of children with asthmatic problems out of 14 children.

$X \sim B(\ ,\ )$

(a) $P(X = 3) =$

(b) $P(X \le 1) = P(X = 0) + P(X = 1) =$

(c) $P(X > 1) = 1 - P(X \le 1) =$

6.2 Mean and Variance of a Binomial Distribution

• For a Binomial random variable, $X \sim B(n, p)$:

Expectation or population mean, $\mu = np$

Population variance, $\sigma^2 = np(1 - p)$

94

Example 6.2-1

5% of workers at construction sites are known to suffer from hearing impairment due to the unhealthy noise level. If we randomly select 28 workers from construction sites, find

(a) the probability that exactly 4 of them suffer from hearing impairment,

(b) the mean and standard deviation of the number of workers suffering from hearing impairment.

Solution:
Let X be the number of workers with hearing impairment out of 28 workers.

$X \sim B(\ ,\ )$

(a) $P(X = 4) =$

(b) $E(X) =$

$Var(X) = \sigma^2 =$

Example 6.2-2

The random variable X, which follows a Binomial distribution, is such that the mean is 2 and the variance is $\frac{24}{13}$. Find the values of n and p.

Solution:
$X \sim B(\ ,\ )$

$E(X) = np = 2$ --- (1)

$Var(X) = np(1 - p) = \frac{24}{13}$ --- (2)
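One way to solve equations (1) and (2) is to divide (2) by (1), which leaves $1 - p$, and then substitute back into (1). The sketch below carries out that arithmetic with exact fractions; it is a check on the algebra under this approach, not a replacement for the written working.

```python
from fractions import Fraction

mean = Fraction(2)           # equation (1): np = 2
variance = Fraction(24, 13)  # equation (2): np(1 - p) = 24/13

# (2) divided by (1): np(1 - p) / (np) = 1 - p
p = 1 - variance / mean
# Substitute p back into (1): n = mean / p
n = mean / p
print(n, p)
```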

95

6.3 Poisson Distribution

• Suppose in a country it is known that cyclones arrive at a rate of 1.5 times every 2 years. We want to create a probability distribution for the number of times a cyclone arrives in a specific time period. It is reasonable to make the following assumptions:

(a) The mean rate of events, $\mu$, occurring in a unit interval / region is the same for every other unit interval / region.
(E.g. The mean rate of cyclones arriving is the same across any interval of 2 years.)

(b) Events occurring in an interval / region are independent of events occurring in other non-overlapping intervals / regions.
(E.g. The number of cyclones in years 2013 to 2014 is independent of the number of cyclones in years 2011 to 2012.)

(c) No two events can occur at the same time.
(E.g. We assume that no two cyclones can happen together.)

• In general, if we are counting the number of events occurring in an interval / region and conditions (a) to (c) are satisfied, we can model the number of events occurring using a Poisson Distribution.

• Notation: Let X denote the number of events occurring in an interval / region with a mean rate $\mu$. Then

$X \sim Po(\mu)$

$P(X = k) = \frac{e^{-\mu} \mu^{k}}{k!}$, $k = 0, 1, 2, \ldots$

** Note that X is a discrete random variable.**
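The Poisson formula can be evaluated in a few lines. The following Python sketch (the function name `poisson_pmf` is an illustrative assumption) applies it to the cyclone setting above, where the mean rate is 1.5 per 2-year interval.

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Po(mu): e^(-mu) * mu^k / k!."""
    return exp(-mu) * mu**k / factorial(k)


# Number of cyclones in a 2-year interval: X ~ Po(1.5).
p_none = poisson_pmf(0, 1.5)
p_two = poisson_pmf(2, 1.5)
print(round(p_none, 4), round(p_two, 4))
```

Unlike the binomial case, k has no upper limit, so probabilities such as $P(X \ge 1)$ are usually found from the complement, $1 - P(X = 0)$.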

96

Example 6.3-1

The mean number of accidents per month at a certain intersection is 3. What is the probability that in any given month,

(a) 4 accidents will occur at this intersection?

(b) more than 1 accident will occur at this intersection?

Solution:
Let X be the number of accidents per month.

$X \sim Po(3)$

(a) $P(X = 4) = \frac{e^{-3}\,3^{4}}{4!} =$

(b) $P(X > 1) = 1 - P(X = 0) - P(X = 1) =$

Example 6.3-2

2000 brown trout are introduced into a small lake. The lake has a volume of 20000 cubic meters. Find the probability that

(a) 3 brown trout are found in any given cubic meter of the lake.

(b) fewer than 2 brown trout are found in any 10 cubic meters of the lake.

Solution:
Let X be the number of trout per cubic meter of lake.

$X \sim Po(\ )$

(a) $P(X = 3) =$

(b) Let Y be the number of trout per 10 cubic meters of lake.

$Y \sim Po(\ )$

$P(Y < 2) = P(Y = 0) + P(Y = 1) =$
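Example 6.3-2 relies on rescaling the mean rate to match the size of the region being counted. The sketch below illustrates that step in Python under the assumption that the trout are spread evenly through the lake, so the mean count is proportional to volume; the names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu**k / factorial(k)


rate_per_m3 = 2000 / 20000   # mean number of trout per cubic metre
mu_x = rate_per_m3 * 1       # X: count in 1 cubic metre
mu_y = rate_per_m3 * 10      # Y: count in 10 cubic metres

p_a = poisson_pmf(3, mu_x)                        # part (a): P(X = 3)
p_b = poisson_pmf(0, mu_y) + poisson_pmf(1, mu_y)  # part (b): P(Y < 2)
print(mu_x, mu_y, round(p_a, 6), round(p_b, 4))
```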

97

6.4 Mean and Variance of Poisson Distribution

• For a Poisson random variable, $X \sim Po(\mu)$:

Expectation or population mean $= \mu$

Population variance, $\sigma^2 = \mu$

Example 6.4-1

A school “Lost and Found” department receives an average of 3.7 reports per week of lost student ID cards.

(a) Find the probability that at most 2 such reports will be received during a given week by this department.

(b) Find the probability that there will be 1 to 3 (inclusive) such reports received during a given week by this department.

(c) Find the variance and standard deviation of the probability distribution.

Solution:
Let X be the number of reports per week.

$X \sim Po(\ )$

(a) $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) =$

(b) $P(1 \le X \le 3) = P(X = 1) + P(X = 2) + P(X = 3) =$

(c) $E(X) = Var(X) =$

98

Appendix: Binomial and Poisson Distributions using Excel

• Under the “Formulas” tab, then “More Functions” and “Statistical”, there are 2 options to calculate probabilities for Binomial and Poisson distributions.

(a) BINOM.DIST: For Binomial distribution.

(b) POISSON.DIST: For Poisson distribution.

99

• To compute $P(X = 4)$ and $P(X \le 4)$, given that $X \sim B(6, 0.3)$, select BINOM.DIST.

For $P(X = 4)$: Key “FALSE” under the “CUMULATIVE” option.

For $P(X \le 4)$: Key “TRUE” under the “CUMULATIVE” option.

$P(X = 4) = 0.0595$, $P(X \le 4) = 0.989$

• To compute $P(X = 4)$ and $P(X \le 4)$, given that $X \sim Po(1.8)$, select POISSON.DIST.

Under the option “MEAN”, enter 1.8. The rest are the same as for the Binomial distribution shown above.
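For readers without Excel, the effect of the “CUMULATIVE” option can be imitated in Python. The sketch below is not Excel's implementation; `binom_dist` and `poisson_dist` are hypothetical helpers that mirror the idea of the two functions: with `cumulative=False` they return $P(X = k)$, and with `cumulative=True` they return $P(X \le k)$ by summing the individual probabilities.

```python
from math import comb, exp, factorial


def binom_dist(k, n, p, cumulative):
    """P(X = k) or P(X <= k) for X ~ B(n, p)."""
    def pmf(j):
        return comb(n, j) * p**j * (1 - p) ** (n - j)
    return sum(pmf(j) for j in range(k + 1)) if cumulative else pmf(k)


def poisson_dist(k, mean, cumulative):
    """P(X = k) or P(X <= k) for X ~ Po(mean)."""
    def pmf(j):
        return exp(-mean) * mean**j / factorial(j)
    return sum(pmf(j) for j in range(k + 1)) if cumulative else pmf(k)


# X ~ B(6, 0.3), as in the Excel example above.
print(round(binom_dist(4, 6, 0.3, False), 4), round(binom_dist(4, 6, 0.3, True), 3))
# X ~ Po(1.8)
print(round(poisson_dist(4, 1.8, False), 4), round(poisson_dist(4, 1.8, True), 4))
```

In a worksheet cell the corresponding entries would be of the form `=BINOM.DIST(4, 6, 0.3, FALSE)` and `=POISSON.DIST(4, 1.8, TRUE)`.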

100

Tutorial 6: Binomial and Poisson Distributions


A.1 Binomial Distribution

1 Given that $X \sim B(10, 0.3)$, calculate the following:

(i) mean, $\mu$ (ii) variance, $\sigma^2$

(iii) $P(X = 4)$ (iv) $P(X \le 2)$

(v) $P(X < 2)$ (vi) $P(X \ge 2)$

(vii) $P(X > 8)$

2 If $X \sim B\left(n, \frac{4}{5}\right)$ and $P(X = 0) = \frac{1}{15625}$,

(a) How many outcomes are there in each trial?

(b) How many trials are there?

(c) How many possible values can X take?

(d) Find the mean and standard deviation of this distribution.

A.2 Poisson Distribution

3 Given that $X \sim Po(3)$, calculate the following:

(i) mean, $\mu$ (ii) variance, $\sigma^2$

(iii) $P(X = 4)$ (iv) $P(X \le 2)$

(v) $P(X \ge 3)$

4 The number of calls arriving, X, is Poisson distributed with a rate of 2 per hour. Write the distribution of the number of calls arriving in

(i) 3 hours, (ii) 45 minutes.

101


1 10% of drivers do not wear seat-belts. Find the probability that, in the next 10 cars to pass, less than 2 drivers will not be wearing seat-belts.

2 A telephone enquiry service is so busy that only 80% of calls to it are

successfully connected. It may be assumed that all calls are independent. Twelve calls are made at random to the service. Find the probability that at least 10 are successfully connected.

3 The probability of a patient recovering from a heart operation is 0.85. In a particular hospital, 10 patients went through such an operation in a particular month. What is the probability that

(i) exactly 4 survive the operation? (ii) the actual number of survivors is more than the expected value? (iii) exactly 2 do not survive the operation?

4 Coach A has four wheels and is equipped with two spare tires, and coach B has six wheels and is equipped with three spare tires. These coaches travel from town A to town B independently. The probability that a tire needs to be replaced during the journey is 0.1.

(i) State an assumption required for the Binomial distribution to be a suitable model.

(ii) Determine whether coach A or coach B has the higher probability for a successful journey.

5 On average, a household receives 1.8 junk mails per day. Using the Poisson

formula, find the probability that a randomly selected household receives

(a) exactly 3 junk mails on a certain day,

(b) at most 2 junk mails on a certain day.

6 A budget airline receives an average of 9.7 complaints per day from its passengers. Using the Poisson formula, find the probability that on a certain day this airline will receive

(a) exactly 5 complaints.

(b) at least 3 complaints.

102

7 A customer service department receives an average of 1.6 telephone calls in any 10-minute interval. Find the probability that the department receives

(a) no calls in any 10-minute interval.

(b) at most 1 call in any 5-minute interval.

(c) more than 2 calls in any 15-minute interval.

Answers

A1 i 3, ii 2.1, iii 0.200, iv 0.383, v 0.149, vi 0.851, vii 0.000144

A2 a 2, b 6, c 7, d 4.8, 0.980

A3 i 3, ii 3, iii 0.168, iv 0.423, v 0.577

A4 i $X \sim Po(6)$, ii $X \sim Po\left(\frac{6}{4}\right)$

B1 0.736

B2 0.558

B3 i 0.00125, ii 0.544, iii 0.276

B4 i independent, ii Coach B

B5 a 0.161, b 0.731

B6 a 0.0439, b 0.996

B7 a 0.202, b 0.809, c 0.430

103

Topic 7 : Normal Distribution

Objectives :

At the end of this lesson, the student should be able to:
1 describe the characteristics of a normal distribution including its shape and the relationship among its mean, median and mode
2 define normal random variable and standard normal random variable
3 compute normal probabilities using standard normal tables
4 use the normal probability distribution to approximate the binomial probabilities (including correction for continuity)

104

Topic 7: Normal Distribution

7.1 Introduction

• Many continuous random variables can be modelled using the normal distribution. Examples include:

(a) Students’ examination scores,

(b) Height and weight of people.

• The normal distribution can be described using two features: mean $\mu$ and variance $\sigma^2$. Notation: $X \sim N(\mu, \sigma^2)$.

• The normal distribution can be represented using a bell-shaped curve with the properties:

o The curve is symmetrical about the mean.

o The mean, median and mode are the same.

o Approximately 95% of the distribution lies within 2 standard deviations of the mean. This is sometimes known as the ‘2σ rule’.

[Figure: two bell-shaped curves, $X \sim N(1, 1)$ centred at $\mu = 1$ and $X \sim N(2, 6)$ centred at $\mu = 2$; the centre of each curve is labelled “Mean, median, mode”.]

105

7.2 Probability for normal distribution

• For a normal distribution, probability is interpreted as the area under the curve. For example, for $X \sim N(1, 2)$:

**Note that $P(X = k) = 0$. Hence, for a normal distribution, $P(X \le k) = P(X < k)$.**

7.3 Standard normal random variable

• A normal random variable can have many different values of mean and variance. When the mean is 0 and the variance is 1, we call it a standard normal random variable, denoted as Z.

$Z \sim N(0, 1)$

• To convert from $X \sim N(\mu, \sigma^2)$ to $Z \sim N(0, 1)$, we apply the formula:

$Z = \frac{X - \mu}{\sigma}$

This procedure is also known as standardization.
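Standardization can be sketched in code as follows. This Python example is an illustration only: it uses the error function `math.erf` to evaluate $P(Z \le z)$ instead of a printed table, and the function names are assumptions. It also checks the ‘2σ rule’ from Section 7.1.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """P(Z <= z) for Z ~ N(0, 1), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), by standardizing x first."""
    z = (x - mu) / sigma
    return standard_normal_cdf(z)


# Proportion of a normal distribution within 2 standard deviations of the mean.
within_two_sd = standard_normal_cdf(2) - standard_normal_cdf(-2)
print(round(within_two_sd, 4))
```

Note that `normal_cdf` takes the standard deviation $\sigma$, not the variance $\sigma^2$, so for $X \sim N(2, 5)$ the third argument would be `sqrt(5)`.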

[Figure for Section 7.2: four curves of $X \sim N(1, 2)$ centred at $\mu = 1$, illustrating $P(X < 1.5)$, $P(X > 0.5)$, $P(0.2 < X < 1.8)$ and $P(X = 0.7)$.]

106

Example 7.3-1

Given that $X \sim N(2, 5)$, rewrite the following probabilities in the form $P(Z \le k)$.

(a) $P(X \le 3)$, (b) $P(X \ge 1.5)$, (c) $P(1.5 < X < 3)$

Solution:

(a) $P(X \le 3) = P\left(\frac{X - 2}{\sqrt{5}} \le \frac{3 - 2}{\sqrt{5}}\right) = P(Z \le 0.45)$

(b) $P(X \ge 1.5) =$

(c) $P(1.5 < X < 3) = P(X < 3) - P(X \le 1.5) =$

7.4 Standard Normal Table

• To calculate probabilities involving the normal distribution, we obtain the probability value from the standard normal table (on pages 112 and 113).

Step 1: Apply standardization from $X \sim N(\mu, \sigma^2)$ to $Z \sim N(0, 1)$.

Step 2: Ensure the probability is expressed in the form $P(Z \le k)$.

Step 3: Obtain the required probability value from the standard normal table.

• The following example illustrates how the standard normal table is to be read:

(a) Suppose we want to find $P(Z \le 0.52)$:

[Table excerpt: the row gives the 1st decimal place of z, the column gives the 2nd decimal place, and the entry where they meet is the probability value.]

$P(Z \le 0.52) = 0.6985$

107

(b) Suppose we want to find the value of $k$ such that $P(Z \le k) = 0.0020$:

$k = -2.88$

Example 7.4-1

Let $Z \sim N(0, 1)$. Use the standard normal table on pages 112 and 113 to evaluate the following probabilities:

(a) $P(Z < -0.99)$, (b) $P(Z > 1.06)$, (c) $P(-1.5 < Z < -1.25)$

Solution:

(a)

(b)

(c)

Example 7.4-2

Given the normally distributed variable X with mean 20 and standard deviation 4, find

(a) $P(X > 28)$

(b) $P(17.5 < X < 22.5)$

(c) the value of $k$ such that $P(X > k) = 0.1539$

108

Solution:

(a) Step 1: Switch the inequality sign to “<” or “≤”

$P(X > 28) = 1 - P(X \le 28)$

Step 2: Convert to the standard normal random variable, Z

$P(X > 28) = 1 - P(X \le 28) = 1 - P\left(\frac{X - 20}{4} \le \frac{28 - 20}{4}\right) = 1 - P(Z \le 2)$

Step 3: Obtain the probability value from the standard normal table

$P(X > 28) = 1 - P(Z \le 2) = 1 - 0.9772 = 0.0228$

(b) $P(17.5 < X < 22.5) = P(X < 22.5) - P(X \le 17.5) =$

(c) $P(X > k) = 0.1539 \Rightarrow 1 - P(X \le k) = 0.1539 \Rightarrow P(X \le k) = 0.8461$

$\Rightarrow P\left(Z \le \frac{k - 20}{4}\right) = 0.8461$

From the standard normal table,

$\frac{k - 20}{4} = 1.02 \Rightarrow k = 24.08$
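Parts (a) and (c) can be cross-checked without the table. The following sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); it is an alternative to the table lookup, so small differences in the last decimal place are expected because the table rounds z to two decimal places.

```python
from statistics import NormalDist

# X ~ N(20, 4^2): NormalDist takes the standard deviation, not the variance.
x = NormalDist(mu=20, sigma=4)

# (a) P(X > 28) = 1 - P(X <= 28)
p_a = 1 - x.cdf(28)

# (c) P(X > k) = 0.1539  =>  P(X <= k) = 0.8461, so k is the inverse CDF at 0.8461.
k = x.inv_cdf(1 - 0.1539)
print(round(p_a, 4), round(k, 2))
```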

109

Example 7.4-3

The serum cholesterol levels of a certain population of 40-year-old male adults follow approximately a normal distribution with mean 185 mg/dl and standard deviation 36 mg/dl. If a 40-year-old male adult is chosen at random from this population, what is the probability that he has a serum cholesterol level

(a) greater than 195 mg/dl?

(b) less than 178 mg/dl?

(c) between 178 and 195 mg/dl?

Solution:
Let X be the cholesterol level of a 40-year-old male.

$X \sim N(185, 36^2)$

(a) $P(X > 195) = 1 - P(X \le 195) = 1 - P\left(Z \le \frac{195 - 185}{36}\right) =$

(b) $P(X < 178) = P\left(Z \le \frac{178 - 185}{36}\right) =$

(c) $P(178 \le X \le 195) = P(X \le 195) - P(X < 178) =$

110

Example 7.4-4

The weights of a certain batch of obese male recruits are approximately normally distributed with mean 88 kg and standard deviation 9 kg. The lightest 15% of the recruits receive a classification of A whilst the heaviest 12.5% receive a classification of F. Find

(i) the minimum weight required to obtain a classification of F,

(ii) the weight of the heaviest recruit in classification A.

Solution:
Let X be the weight of an obese male recruit.

$X \sim N(88, 9^2)$

(i) Let m be the minimum weight to be in classification F.

$P(X \ge m) = 0.125 \Rightarrow P(X < m) = 0.875$

$\Rightarrow$

(ii) Let k be the largest weight to be in classification A.

$P(X \le k) = 0.15 \Rightarrow$

111

Standard Normal Table

z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.4  0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
-3.3  0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
-3.2  0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
-3.1  0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
-3.0  0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
-2.9  0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
-2.8  0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
-2.7  0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
-2.6  0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
-2.5  0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
-2.4  0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
-2.3  0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
-2.2  0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
-2.1  0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
-2.0  0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
-1.8  0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
-1.7  0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
-1.6  0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
-1.5  0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
-1.4  0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
-1.3  0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
-1.2  0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
-1.1  0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
-1.0  0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
-0.9  0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
-0.8  0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
-0.7  0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
-0.6  0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
-0.5  0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
-0.4  0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
-0.3  0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
-0.2  0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
-0.1  0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.0   0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641

[Figure: standard normal curve with z and 0 marked on the horizontal axis.]

112

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998

Appendix A Normal approximation to Binomial

• Suppose $X \sim B(100, 0.45)$ and we wish to calculate $P(X \le 60)$. If we use the Binomial distribution, we have to calculate $P(X = k)$ for $k = 0, 1, 2, \dots, 60$ before adding up the probabilities. This is quite tedious, so we may want to use an approximation method instead.

• Given $X \sim B(n, p)$, if $n > 30$, $np > 5$ and $n(1-p) > 5$, then $X \sim N(np,\ np(1-p))$ approximately.

• Since a Binomial random variable is discrete but a normal random variable is continuous, we need to adjust the calculation of the probabilities. This adjustment is known as continuity correction.

Step 1: Rewrite the probabilities into the form $P(X \le k)$, where $X \sim B(n, p)$.

Step 2: Using the approximate normal distribution $X \sim N(np,\ np(1-p))$, add 0.5 to k and calculate $P(X \le k + 0.5)$.
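The two steps above can be sketched in Python as a rough check of how close the approximation is. This is only an illustration using the standard library; the function names `binomial_cdf` and `normal_approx_cdf` are made up for this sketch and do not come from the notes.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ B(n, p): add up P(X = j) for j = 0, ..., k."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Approximate P(X <= k) by P(Y <= k + 0.5), where Y ~ N(np, np(1 - p))."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))  # NormalDist expects the standard deviation
    return NormalDist(mean, sd).cdf(k + 0.5)


# The opening example: X ~ B(100, 0.45) and P(X <= 60).
exact = binomial_cdf(60, 100, 0.45)
approx = normal_approx_cdf(60, 100, 0.45)
print(f"exact  P(X <= 60) = {exact:.4f}")
print(f"approx P(X <= 60) = {approx:.4f}")
```

Here n = 100 > 30, np = 45 > 5 and n(1 − p) = 55 > 5, so the conditions for the approximation hold and the two printed values should agree closely.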

Example A-1
A Binomial random variable is given by $X \sim B(50, 0.45)$.

(a) State reasons why the normal distribution can be used as an approximation.

(b) Find

(i) $P(X \le 14)$

(ii) $P(X > 26)$

(iii) $P(15 \le X \le 26)$

Solution:

(a) Since $n = 50$ is large, $np = 50 \times 0.45 = 22.5 > 5$ and $n(1-p) = 50 \times 0.55 = 27.5 > 5$, X can be approximated using a normal distribution.

(bi) Mean of X = np = 22.5, variance of X = np(1 – p) = 12.375

$X \sim N(22.5, 12.375)$ approximately

$P(X \le 14) \approx P(X < 14.5) = P\left(Z < \frac{14.5 - 22.5}{\sqrt{12.375}}\right) =$   (by continuity correction)

(bii) $P(X > 26) = 1 - P(X \le 26) \approx 1 - P(X < 26.5) =$   (by continuity correction)

(biii) $P(15 \le X \le 26) = P(X \le 26) - P(X \le 14) =$

Ans: (bi) 0.0116, (bii) 0.1271, (biii) 0.8613
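As a cross-check of Example A-1, the three probabilities can be computed with Python's standard library. This is a sketch only; the variable names are chosen for illustration. The results differ slightly in the third or fourth decimal place from answers read off the normal table, because the table requires z to be rounded to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

n, p = 50, 0.45
# Approximating distribution N(np, np(1 - p)) = N(22.5, 12.375).
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

p_le_14 = approx.cdf(14.5)        # (bi)   P(X <= 14) is approximated by P(Y < 14.5)
p_le_26 = approx.cdf(26.5)        #        P(X <= 26) is approximated by P(Y < 26.5)
p_gt_26 = 1 - p_le_26             # (bii)  P(X > 26) = 1 - P(X <= 26)
p_15_to_26 = p_le_26 - p_le_14    # (biii) P(15 <= X <= 26)

print(f"(bi)   {p_le_14:.4f}")
print(f"(bii)  {p_gt_26:.4f}")
print(f"(biii) {p_15_to_26:.4f}")
```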


Example A-2
Sing-Chip produces computer chips. On average, 2% of all computer chips produced are defective. In a sample of 500 chips, the quality-control inspector accepts the batch if less than 1% of the chips tested are defective.

(i) Explain why the number of defective computer chips, X, can be approximated by a normal distribution. Hence determine the mean and standard deviation of X.

(ii) Using the normal approximation of X, find the probability that a batch is accepted.

Solution:
(i) Let X be the number of defective chips. $X \sim B(500, 0.02)$

Since $n = 500$ is large, $np = 500 \times 0.02 = 10 > 5$ and $n(1-p) = 500 \times 0.98 = 490 > 5$, X can be approximated using a normal distribution.

Mean $= np = 500 \times 0.02 = 10$, variance $= np(1-p) = 9.8 \Rightarrow$ standard deviation $= \sqrt{9.8}$

(ii) $X \sim N(10, 9.8)$ approx.

The batch is accepted if there are fewer than $1\% \times 500 = 5$ defects.

$P(X < 5) = P(X \le 4) \approx P(X < 4.5) =$   (by continuity correction)

Ans: (i) $\mu = 10$, $\sigma = 3.13$, (ii) 0.0392


Appendix B Normal Distribution using Excel

• Under the tab “Formulas”, then “More Functions”, then “Statistical”, there are 2 options related to the normal distribution.

When $X \sim N(\mu, \sigma^2)$:

(a) NORM.DIST: Calculates the probability $P(X \le k)$, with k known.

(b) NORM.INV: Given the value of the probability $P(X \le k)$, finds k.

Note that both functions ask for the standard deviation $\sigma$, not the variance $\sigma^2$.

• To compute $P(X \le 4)$, given that $X \sim N(3, 5)$, select NORM.DIST.

$P(X \le 4) = 0.673$

• To find the value of k such that $X \sim N(3, 5)$ and $P(X \le k) = 0.388$, select NORM.INV.

$k = 2.36$
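For readers without Excel, the same two calculations can be sketched with Python's `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods play the roles of NORM.DIST (cumulative) and NORM.INV. This is an illustrative alternative and not part of the Excel procedure above; the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

# X ~ N(3, 5): mean 3 and VARIANCE 5, so the standard deviation is sqrt(5).
x_dist = NormalDist(mu=3, sigma=sqrt(5))

p = x_dist.cdf(4)            # P(X <= 4), the cumulative probability
k = x_dist.inv_cdf(0.388)    # the value k with P(X <= k) = 0.388

print(f"P(X <= 4) = {p:.3f}")
print(f"k = {k:.2f}")
```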


Tutorial 7: Normal Distribution


1 Let Z be a standard normal random variable. Use the normal table provided to find:

(i) $P(Z < 2.11)$ (ii) $P(Z < -0.35)$

(iii) $P(Z > 1.02)$ (iv) $P(Z > -0.99)$

(v) $P(-0.35 < Z < 2.11)$ (vi) $P(Z > 1.02 \text{ or } Z < -0.35)$

2 Given that $X \sim N(3, 4)$, use the normal table provided to find:

(i) $P(X < 1)$ (ii) $P(X \le 4)$

(iii) $P(X > 0.5)$ (iv) $P(X \ge 3.5)$

(v) $P(1 \le X < 4)$ (vi) $P(X < 1 \text{ or } X \ge 3.5)$

3 Let Z be a standard normal random variable. Find m such that

(i) $P(Z < m) = 0.9082$ (ii) $P(Z > m) = 0.0096$

4 Given that $X \sim N(3, 4)$, find m such that:

(i) $P(X \le m) = 0.6217$ (ii) $P(X > m) = 0.7734$


1 The brain weights of a certain population of 18-year-olds follow a normal distribution with mean 1380 g and standard deviation 80 g. Suppose an 18-year-old is chosen at random. Find the probability that the person’s brain weight is

(i) less than 1300 g, (ii) more than 1400 g, (iii) between 1320 and 1420 g.

2 The random variable X has the distribution $N(1, 20)$. Find a such that $P(X < a) = 2P(X > a)$.

3 The masses of articles are normally distributed such that 4.36% are under 30 kg and 6.3% are over 60 kg. Calculate the mean and standard deviation of the distribution.

4 A recent survey on a group of adults shows that the average daily calorie intake of an adult is normally distributed with mean 1380 calories and standard deviation 320 calories.

(i) Find the probability that an adult chosen at random from this group consumes less than 1000 calories per day.

(ii) What should be the recommended daily calorie intake if 90% of the group has an average daily calorie intake below this recommended daily intake?

(iii) If 12 000 adults participated in the survey, find the expected number of people, to the nearest integer, who consume more than 1200 calories per day.

5 The mass, in kilograms, of an apple sold in a supermarket has a normal distribution with mean 0.15 and standard deviation 0.03. Suppose the apples are sold at $9 per kilogram. Find

(i) the probability that a single apple costs between $1.30 and $1.50;

(ii) the minimum price set for an apple such that the probability of an apple being sold for less than this minimum price is at least 0.9.

Answers

A1 i 0.9826 ii 0.3632 iii 0.1539 iv 0.8389

v 0.6194 vi 0.5171

A2 i 0.1587 ii 0.6915 iii 0.8944 iv 0.4013

v 0.5328 vi 0.560

A3 i 1.33 ii 2.34

A4 i 3.62 ii 1.50

B1 i 0.1587 ii 0.4013 iii 0.4649

B2 a = 2.92

B3 µ = 45.8, σ = 9.26

B4 i 0.1170 ii 1789.6 iii 8548

B5 i 0.2876 ii $1.70

Topic 8 : Distribution of Sample Means

Objectives :

At the end of this lesson, the student should be able to:
1 identify the distribution of sample means
2 apply the Central Limit Theorem to find the probability of a sample mean for sufficiently large samples

Topic 8: Distribution of Sample Means

8.1 Introduction of $\bar{X}$

• In Chapter 1 we introduced the concept of the sample mean, $\bar{x}$, for a single sample data set. In this case $\bar{x}$ is a single value. In this chapter we will look at the sample means of multiple samples and the distribution of the sample means.

• Introduction example:

Let X denote the weight of a single peanut from a packet of peanuts. If we weigh each peanut in that packet, we can calculate the sample mean of the weight of the peanuts, $\bar{x}$, in that single packet.

1 packet: $\bar{x}$

Suppose we have n packets of peanuts and we calculate the sample mean of each packet of peanuts. The sample mean for one packet of peanuts will most likely be different from the sample means of the other packets.

n packets: $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n$

Hence in general, the sample mean, $\bar{X}$, is a random variable (as we cannot determine the actual value of $\bar{x}$ for a randomly chosen sample).

8.2 Distribution of $\bar{X}$

• Since $\bar{X}$ is a random variable, we can calculate probabilities involving $\bar{X}$ once we know its distribution, which is shown below.

• Let $\mu$ and $\sigma^2$ be the mean and variance of a random variable X (single quantity). If we have a sample of n objects,

(a) If $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.

(b) If the distribution of X is not normal (or unknown), and $n \ge 30$, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately.

This is known as the Central Limit Theorem.

• From the above, we see that the mean of $\bar{X}$ is $\mu$ and the variance of $\bar{X}$ is $\frac{\sigma^2}{n}$.

• The standard deviation of $\bar{X}$, $\frac{\sigma}{\sqrt{n}}$, is also known as the standard error of the mean.
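The Central Limit Theorem can be seen in a small simulation. The sketch below is an illustration only: the exponential population (mean 1, variance 1), the sample size 36 and the number of repetitions are assumptions picked for the demonstration, not values from the notes. It draws many samples from a clearly non-normal population and checks that the sample means have mean close to $\mu$ and variance close to $\sigma^2/n$.

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so that the run is repeatable

n = 36              # size of each sample (n >= 30)
num_samples = 5000  # number of samples, i.e. number of sample means

# Exponential population with rate 1: mean mu = 1 and variance sigma^2 = 1.
sample_means = [
    fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)
var_of_means = pvariance(sample_means)

print(f"mean of sample means     = {mean_of_means:.4f}  (mu = 1)")
print(f"variance of sample means = {var_of_means:.4f}  (sigma^2 / n = {1 / n:.4f})")
```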

Example 8.2-1 The mass of garlic bulbs produced by a particular farm is approximately normally

distributed, with a mean of 60 g and a standard deviation of 5 g. State the distribution

of the sample mean of a random sample of 16 garlic bulbs.

Solution:

Let X be the mass of a garlic bulb. $X \sim N(60, 5^2)$

The sample mean of 16 garlic bulbs, $\bar{X} \sim N(\ \_\_\_\_\ ,\ \_\_\_\_\ )$

Example 8.2-2

The waistline of forty-year-old male Singaporeans is known to have a mean of 33

inches and a variance of 9 square inches. A random sample of 36 forty-year-old male

Singaporeans was selected. Find the probability that the sample mean

(a) is greater than 31.5 inches,

(b) lies between 32 and 34 inches,

(c) differs from the population mean by more than one inch.

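Solution: Let X be the waistline of a forty-year-old male.

$n = 36$

Sample mean of 36 males: $\bar{X} \sim N\left(33, \frac{9}{36}\right)$ approx. $\Rightarrow \bar{X} \sim N\left(33, \frac{1}{4}\right)$ approx. (by CLT)

a) $P(\bar{X} > 31.5) = 1 - P(\bar{X} \le 31.5) = 1 - P\left(Z \le \frac{31.5 - 33}{\sqrt{1/4}}\right) =$

b) $P(32 \le \bar{X} \le 34) = P(\bar{X} \le 34) - P(\bar{X} < 32) = P\left(Z \le \frac{34 - 33}{\sqrt{1/4}}\right) - P\left(Z < \frac{32 - 33}{\sqrt{1/4}}\right) =$

c) $P(\bar{X} < 32 \text{ or } \bar{X} > 34) = 1 - P(32 \le \bar{X} \le 34) =$

Example 8.2-3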

The body length (excluding the tail) of a particular species of mice is approximately

normally distributed, with a mean of 12 cm and a standard deviation of 2.4 cm.

(a) If a random sample of 16 mice is selected, what is the probability that it will have

an average body length of between 11 and 13 cm?

(b) If a random sample of 25 mice is selected, what is the probability that it will have

an average body length of between 11 and 13 cm?

(c) Comment on the answers obtained in part (a) and (b).

Solution:

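Let X be the body length of a mouse. $X \sim N(12, 2.4^2)$

a) $n = 16$

Sample mean of 16 mice: $\bar{X} \sim N\left(12, \frac{2.4^2}{16}\right) \Rightarrow \bar{X} \sim N(12, 0.36)$

$P(11 \le \bar{X} \le 13) = P(\bar{X} \le 13) - P(\bar{X} < 11) = P\left(Z \le \frac{13 - 12}{\sqrt{0.36}}\right) - P\left(Z < \frac{11 - 12}{\sqrt{0.36}}\right) =$

b) $n = 25$

Sample mean of 25 mice: $\bar{X} \sim N\left(12, \frac{2.4^2}{25}\right) \Rightarrow \bar{X} \sim N(12, 0.2304)$

$P(11 \le \bar{X} \le 13) = P(\bar{X} \le 13) - P(\bar{X} < 11) = P\left(Z \le \frac{13 - 12}{\sqrt{0.2304}}\right) - P\left(Z < \frac{11 - 12}{\sqrt{0.2304}}\right) =$

c) The required probability becomes ___________ when sample size ___________.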
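After attempting Example 8.2-3 by hand, the two probabilities can be checked with a short Python sketch. The helper name `prob_mean_between` is made up for this illustration. Values obtained from the normal table may differ slightly in the last decimal place because z is rounded there.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 12.0, 2.4  # body length of a single mouse: X ~ N(12, 2.4^2)


def prob_mean_between(lo: float, hi: float, n: int) -> float:
    """P(lo <= Xbar <= hi), where Xbar ~ N(mu, sigma^2 / n)."""
    xbar = NormalDist(mu, sigma / sqrt(n))  # standard error = sigma / sqrt(n)
    return xbar.cdf(hi) - xbar.cdf(lo)


p16 = prob_mean_between(11, 13, 16)
p25 = prob_mean_between(11, 13, 25)
print(f"n = 16: {p16:.4f}")
print(f"n = 25: {p25:.4f}")
```

Comparing the two printed values shows how the spread of $\bar{X}$ around $\mu$ changes with the sample size, which is the point of part (c).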


Tutorial 8: Distribution of Sampling Means


1 Let $X_1, X_2, \dots, X_n$ be independent random variables. Write down the mean and variance of $\bar{X}$ for each of the following:

(i) n = 15, mean of X = 4, variance of X = 7.

(ii) n = 30, mean of X = 5, standard deviation of X = 3.

2 Let $X_1, X_2, \dots, X_n$ be independent random variables with mean 3 and variance 5. Write down the distribution of $\bar{X}$ (with explanation if necessary) when:

(i) the $X_i$’s are normal, n = 10, (ii) the $X_i$’s are normal, n = 60,

(iii) the distribution of the $X_i$’s is unknown, n = 35.

3 Calculate $P(\bar{X} < 4)$ for Question 2(iii).


1 In a certain population of swordtail fish, the lengths of the individual fish follow approximately a normal distribution with mean 52.0 mm and standard deviation 6.0 mm. Find the probability that a random sample of 25 swordtail fish will have an average length of

(i) less than 48.6 mm

(ii) between 52.4 and 54.4 mm.

2 According to an article, root-canal therapy costs from $200 to $700. Suppose the mean cost for root-canal therapy is $450 and the standard deviation is $125. If a sample of 100 dentists was selected across the country, find the probability that the mean cost per root canal for the sample would fall between $425 and $475.

3 The average number of days spent in a particular hospital for a coronary bypass in 2013 was 9 days and the standard deviation was 4 days. What is the probability that a random sample of 30 patients will have an average stay longer than 9.5 days? State any assumptions required on the distribution of the days spent.

4 The intelligence quotient (IQ) score of a certain population of children is approximately normally distributed with a mean of 102 and a standard deviation of 10. Let Y be the random variable ‘the IQ score of children’.

(i) If a random sample of n children is selected, find the value of n given that $P(\bar{Y} > 103) = 0.3446$.

(ii) Using the value of n found in part (i), find the value of k if $P(k < \bar{Y} < 105) = 0.6730$.

5 The heartbeat rate of a certain population of babies follows a normal distribution with mean 70 beats/min and standard deviation of 10 beats/min.

(i) Find the probability that a baby randomly selected from this population has a heartbeat rate of less than 66 beats/min.

(ii) If a sample of 8 babies is randomly selected, find the probability that exactly 3 of them will have a heartbeat rate of less than 66 beats/min.

(iii) If a random sample of 36 babies is selected, what is the probability that it will have a mean heartbeat rate of more than 68 beats/min?

6 The masses of Giant apples follow a normal distribution with mean 700 g and standard deviation of 100 g.

(i) Find the probability that the total mass of 10 Giant apples will be more than 7.2 kg.

(ii) A random sample of n apples is chosen. Find the least value of n such that there is a probability of not more than 0.25 that the sample mean differs from the population mean mass by more than 20 g.

Answers

A1 i mean = 4, variance = $\frac{7}{15}$ ii mean = 5, variance = $\frac{3}{10}$

A2 i $\bar{X} \sim N\left(3, \frac{1}{2}\right)$ ii $\bar{X} \sim N\left(3, \frac{1}{12}\right)$ iii $\bar{X} \sim N\left(3, \frac{1}{7}\right)$ approx.

A3 0.9960

B1 i 0.0023 ii 0.3479

B2 0.9544

B3 0.2483

B4 i n = 16 ii k = 100

B5 i 0.3446 ii 0.2771 iii 0.8849

B6 i 0.2643 ii least n = 34

Topic 9 : Estimation of Parameters

Objectives :

At the end of this lesson, the student should be able to:
1 calculate the point estimators of population parameters
2 construct and interpret confidence intervals for the population mean using the appropriate distributions (standard normal or t-distribution)
3 explain how the confidence interval is related to sample size and confidence level

Topic 9: Estimation of Parameters

9.1 Estimation of the Population Mean

• It is common that we do not know the population mean for a random variable

we are interested in. Using the example in Chapter 8, it would be impossible for us

to determine the population mean of all the peanuts in this world.

• Hence a common approach is to take a sample and use the information from it to

estimate the population mean.

Example 9.1-1 The following set of data is the heights (in cm) of 16 children:

101 118 125 116 113 102 117 126

106 114 109 121 107 119 116 115

Find a point estimate of the mean height, µ of all the children (the population).

Solution:

Definition: A point estimate is a single value estimate for a population parameter. The most unbiased point estimate of the population mean $\mu$ is the sample mean $\bar{x}$.
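The point estimate in Example 9.1-1 can be checked with a few lines of Python. This is a minimal sketch; the variable names are chosen for illustration.

```python
from statistics import fmean

# Heights (in cm) of the 16 children in Example 9.1-1.
heights_cm = [
    101, 118, 125, 116, 113, 102, 117, 126,
    106, 114, 109, 121, 107, 119, 116, 115,
]

# The sample mean is used as the point estimate of the population mean.
point_estimate = fmean(heights_cm)
print(point_estimate)
```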

9.2 Confidence Interval for the Mean (Large Samples)

• In example 9.1-1, the probability that the population mean height of the children is

exactly 114.0625 cm is virtually nil. So instead of using a point estimate to estimate

µ to be exactly 114.0625 cm, we can estimate that µ lies in an interval.

Although the point estimate in example 9.1-1 is not equal to the actual population

mean, it is probably close to it. To form an interval estimate, use the point estimate

as the centre of the interval, then add and subtract a margin of error.

If the margin of error is 3.95, then the interval estimate in example 9.1-1 will be computed as $114.0625 \pm 3.95$, or $110.1 < \mu < 118.0$.

• Before finding a margin of error for an interval estimate, we must first determine

how confident we need to be that our interval estimate contains the population

mean µ.

Definition: An interval estimate is an interval, or a range of values, used to estimate a population parameter.

Definition: A confidence level 100c% refers to the percentage of the intervals from all possible samples that we can expect to contain the true population mean.

For example, the diagram shows 10 intervals obtained from 10 samples. If each is a 90% confidence interval, then it is expected that 9 out of 10 intervals contain the population mean.

• When the sample size is large, i.e. $n \ge 30$, by the Central Limit Theorem, the sampling distribution of sample means is approximately a normal distribution. The level of confidence 100c% is the area under the standard normal curve between the critical values, $-z_c$ and $z_c$.

Example 9.2-1

Find the critical values cz necessary to form a confidence interval at the following given level of confidence: (i) 80%, (ii) 85%, (iii) 97%.

Solution:

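[Figure: standard normal curve with central area 100c% between $-z_c$ and $z_c$, and area $\frac{(100 - 100c)}{2}\%$ in each tail.]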
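The critical values in Example 9.2-1 can also be found numerically. Since the central area is c, the area to the left of $z_c$ is $c + \frac{1-c}{2} = \frac{1+c}{2}$. The sketch below uses this with the inverse of the standard normal cumulative distribution; the function name `z_critical` is an assumption made for this illustration.

```python
from statistics import NormalDist


def z_critical(c: float) -> float:
    """Critical value z_c such that P(-z_c < Z < z_c) = c."""
    return NormalDist().inv_cdf((1 + c) / 2)


z_80 = z_critical(0.80)
z_85 = z_critical(0.85)
z_97 = z_critical(0.97)
for level, z in ((80, z_80), (85, z_85), (97, z_97)):
    print(f"{level}% confidence: z_c = {z:.3f}")
```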


• Given a level of confidence 100c%, the margin of error, E, is the greatest possible distance between the point estimate and the value of the parameter it is estimating. E is also known as the maximum error of estimate or error tolerance.

• The margin of error, E, can be calculated as follows:

$E = z_c \frac{\sigma}{\sqrt{n}}$ or $E = z_c \frac{s}{\sqrt{n}}$

Use the first formula if the population standard deviation, $\sigma$, is known. If $\sigma$ is unknown and $n \ge 30$, the sample standard deviation, s, is used in place of $\sigma$.

Example 9.2-2

Find the margin of error for the given values of c, s and n.

(i) $c = 0.90$, $s = 2.5$, $n = 36$;

(ii) $c = 0.95$, $s = 3.0$, $n = 60$;

(iii) $c = 0.975$, $s = 4.6$, $n = 100$

Solution:

Note: In general, the margin of error decreases as the sample size increases.
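A hand calculation for Example 9.2-2 can be checked with the sketch below, which applies $E = z_c \frac{s}{\sqrt{n}}$ to the three cases. The function name `margin_of_error` is an assumption made for this illustration, and the critical value is computed directly instead of being read from the table, so the last decimal place may differ slightly from a table-based answer.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(c: float, s: float, n: int) -> float:
    """E = z_c * s / sqrt(n), with z_c taken from the standard normal distribution."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    return z_c * s / sqrt(n)


e_i = margin_of_error(0.90, 2.5, 36)
e_ii = margin_of_error(0.95, 3.0, 60)
e_iii = margin_of_error(0.975, 4.6, 100)
print(f"(i)   E = {e_i:.3f}")
print(f"(ii)  E = {e_ii:.3f}")
print(f"(iii) E = {e_iii:.3f}")
```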


• Using a point estimate and a margin of error, an interval estimate for a population parameter such as $\mu$ can be constructed. This interval estimate is called a confidence interval.

• Hence, a 100c% confidence interval for the population mean $\mu$ is given as:

$\bar{x} - E < \mu < \bar{x} + E$ or $(\bar{x} - E,\ \bar{x} + E)$

where the probability that the confidence interval contains $\mu$ is 100c%.

• Steps for constructing a confidence interval for a population mean ($n \ge 30$, or $\sigma$ is known with a normally distributed population) are:

1. Find the sample statistics n and $\bar{x} = \frac{1}{n}\sum x$.

2. Specify $\sigma$ if known. Otherwise, if $n \ge 30$, find the sample standard deviation $s = \sqrt{\frac{1}{n-1}\sum (x - \bar{x})^2}$ and use it as an estimate for $\sigma$.

3. Find the critical value $z_c$ that corresponds to the given level of confidence.

4. Find the margin of error, $E = z_c \frac{\sigma}{\sqrt{n}}$.

5. Form the confidence interval, $(\bar{x} - E,\ \bar{x} + E)$.

Example 9.2-3

After a few rainy days, numerous tadpoles appeared on a wet field. 12 tadpoles were randomly picked and their lengths measured. It is found that the sample mean is 11.1 mm. If this sample came from a normally distributed population with variance 4, calculate a 95% confidence interval for the mean length of all the tadpoles in the field.

Solution:

[Figure: standard normal curve with central area 95% and 2.5% in each tail beyond $\pm z_{0.95}$.]

$P(Z \le z_{0.95}) = 0.975 \Rightarrow z_{0.95} =$

95% confidence interval $= \left(\bar{x} - z_{0.95}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{0.95}\frac{\sigma}{\sqrt{n}}\right) =$

Example 9.2-4
Fifty 2-year-old cows were injected with an antibiotic A, at a dosage of 12 mg/kg body weight. It is found that the sample mean of the blood serum concentrations (µg/ml) of the antibiotic 2 hrs after injection is 25.5 and the sample standard deviation is 3.03. Construct a 90% confidence interval for the population mean.

Solution:

[Figure: standard normal curve with central area 90% and 5% in each tail beyond $\pm z_{0.90}$.]

$P(Z \le z_{0.90}) = 0.95 \Rightarrow z_{0.90} =$

90% confidence interval $= \left(\bar{x} - z_{0.90}\frac{s}{\sqrt{n}},\ \bar{x} + z_{0.90}\frac{s}{\sqrt{n}}\right) =$
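The intervals in Examples 9.2-3 and 9.2-4 can be checked with one small helper that follows the z-interval formula. This is a sketch only; the function name `z_interval` and its parameter names are assumptions made for this illustration. The critical value is computed directly, so the endpoints may differ slightly from answers that use a rounded table value such as 1.645.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar: float, sd: float, n: int, c: float) -> tuple[float, float]:
    """100c% confidence interval (x_bar - E, x_bar + E) with E = z_c * sd / sqrt(n)."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    e = z_c * sd / sqrt(n)
    return x_bar - e, x_bar + e


# Example 9.2-3: normal population, variance 4 (so sigma = 2), n = 12, 95%.
tadpoles = z_interval(11.1, sqrt(4), 12, 0.95)

# Example 9.2-4: n = 50 >= 30, so s = 3.03 is used in place of sigma, 90%.
cows = z_interval(25.5, 3.03, 50, 0.90)

print(f"tadpoles: ({tadpoles[0]:.2f}, {tadpoles[1]:.2f})")
print(f"cows:     ({cows[0]:.2f}, {cows[1]:.2f})")
```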


Example 9.2-5
A random sample of 150 readings was taken from a population with mean $\mu$ and variance $\sigma^2$. Given that $\sum x = 1623$ and $\sum x^2 = 17814.36$,

(a) calculate $\bar{x}$ and s.

(b) construct a 95% confidence interval for the population mean.

Solution:

a) Recall $\bar{x} = \frac{\sum x}{n} =$ , $s = \sqrt{\frac{1}{n-1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right)} =$

b) $P(Z \le z_{0.95}) = 0.975 \Rightarrow z_{0.95} =$

95% confidence interval $= \left(\bar{x} - z_{0.95}\frac{s}{\sqrt{n}},\ \bar{x} + z_{0.95}\frac{s}{\sqrt{n}}\right) =$
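When only the totals $\sum x$ and $\sum x^2$ are given, as in Example 9.2-5, the sample statistics can be computed from them directly. The sketch below is an illustration with made-up variable names; it uses the same formulas as the solution outline above.

```python
from math import sqrt
from statistics import NormalDist

n = 150
sum_x = 1623.0
sum_x2 = 17814.36

x_bar = sum_x / n
# s^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1)
s = sqrt((sum_x2 - sum_x**2 / n) / (n - 1))

z_c = NormalDist().inv_cdf(0.975)   # 95% confidence: area 0.975 to the left of z_c
e = z_c * s / sqrt(n)               # n >= 30, so s is used in place of sigma
lower, upper = x_bar - e, x_bar + e

print(f"x_bar = {x_bar:.2f}, s = {s:.4f}")
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")
```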

9.3 Confidence Interval for the Mean (Small Samples)

• In many real-life situations, the population standard deviation is unknown. Moreover, due to constraints such as cost and time, it is often not practical to collect samples of size 30 or more. If the random variable is normally or approximately normally distributed, we can use a t-distribution.

• When X is a normal random variable, with the population standard deviation, $\sigma$, unknown, the random variable

$T = \frac{\bar{X} - \mu}{s / \sqrt{n}}$

follows a t-distribution with degrees of freedom, d.f. = n – 1.

• The value of $t_c$ can be obtained from the t-distribution table.

• For example, if $n = 7$ and we want to construct a 95% confidence interval, we can obtain the value of $t_{0.95}$ as follows:

d.f. $= 7 - 1 = 6$

$\therefore t_{0.95} = 2.447$

Example 9.3-1 Twelve packets of a particular brand of sweets are selected at random and their

weights noted. The weights obtained (in grams) are

407.3, 409.6, 391.0, 402.9, 406.8, 390.0, 407.6, 402.1, 390.8, 390.6, 396.8, 400.2.

Assuming that the sample is taken from an approximately normal population with

mean mass $\mu$, calculate

(a) the 95% confidence interval for $\mu$,

(b) the 99% confidence interval for $\mu$.

Solution:

Using calculator, $\bar{x} =$ , $s =$ , d.f. $= n - 1 =$

a) Since the sample size $n < 30$ and the population variance is unknown, $t_{0.95} =$

95% confidence interval is $\left(\bar{x} - t_{0.95}\frac{s}{\sqrt{n}},\ \bar{x} + t_{0.95}\frac{s}{\sqrt{n}}\right) =$

b) Since the sample size $n < 30$ and the population variance is unknown, $t_{0.99} =$

99% confidence interval is $\left(\bar{x} - t_{0.99}\frac{s}{\sqrt{n}},\ \bar{x} + t_{0.99}\frac{s}{\sqrt{n}}\right) =$
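The calculator work in Example 9.3-1 can be reproduced with the sketch below. Python's standard library has no t-distribution quantile function, so the two critical values are typed in by hand from the row d.f. = 11 of Table 2; the dictionary `T_TABLE_DF11` and the helper `t_interval` are names invented for this illustration.

```python
from math import sqrt
from statistics import fmean, stdev

weights = [407.3, 409.6, 391.0, 402.9, 406.8, 390.0,
           407.6, 402.1, 390.8, 390.6, 396.8, 400.2]

n = len(weights)
x_bar = fmean(weights)
s = stdev(weights)      # sample standard deviation (divisor n - 1)
df = n - 1              # degrees of freedom = 11

# Critical values t_c copied from Table 2, row d.f. = 11.
T_TABLE_DF11 = {0.95: 2.201, 0.99: 3.106}


def t_interval(c: float) -> tuple[float, float]:
    """100c% confidence interval (x_bar - E, x_bar + E) with E = t_c * s / sqrt(n)."""
    e = T_TABLE_DF11[c] * s / sqrt(n)
    return x_bar - e, x_bar + e


ci_95 = t_interval(0.95)
ci_99 = t_interval(0.99)
print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, d.f. = {df}")
print(f"95%: ({ci_95[0]:.2f}, {ci_95[1]:.2f})")
print(f"99%: ({ci_99[0]:.2f}, {ci_99[1]:.2f})")
```

As expected, the 99% interval is wider than the 95% interval for the same data.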

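9.4 Minimum Sample Size to Estimate Population Mean µ

• Sometimes we will need to determine the sample size required before we conduct an experiment. Given a pre-determined confidence level 100c% and margin of error E, the minimum sample size n is obtained from

$n = \left(\frac{z_c \sigma}{E}\right)^2$ or $n = \left(\frac{z_c s}{E}\right)^2$ or $n = \left(\frac{t_c s}{E}\right)^2$

rounded up to the next whole number.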

Example 9.4-1 You want to estimate the mean number of sentences in a magazine advertisement.

How many magazine advertisements must be included in the sample if you want to be

95% confident that the sample mean is within one sentence of the population mean?

Assume that the population standard deviation is 5.0 and the number of sentences is

normally distributed.

Solution:

Given E = 1, σ = 5.0, and $z_{0.95} =$

Hence the number of advertisements required in the sample is at least:

$n = \left(\frac{z_c \sigma}{E}\right)^2 =$
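The sample-size calculation in Example 9.4-1 can be checked as follows. Because n must be a whole number and the margin of error must not exceed E, the result is rounded up. The function name `min_sample_size` is an assumption made for this sketch.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(c: float, sigma: float, e: float) -> int:
    """Smallest whole number n with n >= (z_c * sigma / E)^2."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    return ceil((z_c * sigma / e) ** 2)


n_required = min_sample_size(0.95, 5.0, 1.0)
print(n_required)
```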


In summary,

[Flowchart: choosing the method for a confidence interval for the mean, with YES/NO decision branches.]

Table 2: t-Distribution

Level of confidence, c: 0.50 0.80 0.90 0.95 0.98 0.99
One tail, α: 0.25 0.10 0.05 0.025 0.01 0.005
Two tails, α: 0.50 0.20 0.10 0.05 0.02 0.01

d.f.
1 1.000 3.078 6.314 12.706 31.821 63.657
2 0.816 1.886 2.920 4.303 6.965 9.925
3 0.765 1.638 2.353 3.182 4.541 5.841
4 0.741 1.533 2.132 2.776 3.747 4.604
5 0.727 1.476 2.015 2.571 3.365 4.032
6 0.718 1.440 1.943 2.447 3.143 3.707
7 0.711 1.415 1.895 2.365 2.998 3.499
8 0.706 1.397 1.860 2.306 2.896 3.355
9 0.703 1.383 1.833 2.262 2.821 3.250
10 0.700 1.372 1.812 2.228 2.764 3.169
11 0.697 1.363 1.796 2.201 2.718 3.106
12 0.695 1.356 1.782 2.179 2.681 3.055
13 0.694 1.350 1.771 2.160 2.650 3.012
14 0.692 1.345 1.761 2.145 2.624 2.977
15 0.691 1.341 1.753 2.131 2.602 2.947
16 0.690 1.337 1.746 2.120 2.583 2.921
17 0.689 1.333 1.740 2.110 2.567 2.898
18 0.688 1.330 1.734 2.101 2.552 2.878
19 0.688 1.328 1.729 2.093 2.539 2.861
20 0.687 1.325 1.725 2.086 2.528 2.845
21 0.686 1.323 1.721 2.080 2.518 2.831
22 0.686 1.321 1.717 2.074 2.508 2.819
23 0.685 1.319 1.714 2.069 2.500 2.807
24 0.685 1.318 1.711 2.064 2.492 2.797
25 0.684 1.316 1.708 2.060 2.485 2.787
26 0.684 1.315 1.706 2.056 2.479 2.779
27 0.684 1.314 1.703 2.052 2.473 2.771
28 0.683 1.313 1.701 2.048 2.467 2.763
29 0.683 1.311 1.699 2.045 2.462 2.756
∞ 0.674 1.282 1.645 1.960 2.326 2.576

Appendix A Confidence intervals using EXCEL

• In EXCEL, select the tab “Formulas”, then “More Functions”, then “Statistical”. You can calculate the margin of error using the functions CONFIDENCE.NORM (z table) or CONFIDENCE.T (t table).

• Suppose we wish to construct a 95% confidence interval from the z table with population standard deviation 4 and sample size 20:

“Alpha” = 1 – 0.95 = 0.05

Hence the margin of error = 1.7530
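The same margin of error can be reproduced outside EXCEL. The sketch below mirrors the three inputs used above (alpha, standard deviation, sample size); the Python function `confidence_norm` is a stand-in written for this illustration and is not an EXCEL or library function.

```python
from math import sqrt
from statistics import NormalDist


def confidence_norm(alpha: float, standard_dev: float, size: int) -> float:
    """Margin of error z * sd / sqrt(n), where z leaves alpha / 2 in each tail."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / sqrt(size)


margin = confidence_norm(0.05, 4, 20)
print(f"{margin:.4f}")
```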


Tutorial 9: Estimation of parameters


1 Determine the $z_c$ value for the following:

(a) 90% confidence interval for $\mu$.

(b) 95% confidence interval for $\mu$.

(c) 98% confidence interval for $\mu$.

(d) 99% confidence interval for $\mu$.

2 Determine the $t_c$ value for the following:

(a) $n = 10$, 90% confidence interval for $\mu$.

(b) $n = 22$, 95% confidence interval for $\mu$.

(c) $n = 25$, 98% confidence interval for $\mu$.

(d) $n = 18$, 99% confidence interval for $\mu$.

3 Given that $\bar{x} = 10$, calculate the 98% confidence interval for $\mu$ when

(a) X is a normal random variable, $n = 20$, $\sigma = 3$.

(b) X is not a normal random variable, $n = 50$, $\sigma = 3$.

(c) X is a normal random variable, $n = 10$, $\sigma$ is unknown and $s = 2$.


1 In a particular factory, the quantity of mineral water dispensed by automated machines into plastic bottles is approximately normally distributed with standard deviation of 24 millilitres. A random sample of 25 such bottles was found to have a mean quantity of 503 millilitres.

(a) Find the standard error of the mean.

(b) Find a 90 % confidence interval for the mean quantity of mineral water dispensed by the machines.

(c) Find a 98 % confidence interval for the mean quantity of mineral water dispensed by the machines.

2 The heights of a random sample of 40 NYP students yield a mean of 173.8 cm and a standard deviation of 6.8 cm. Assume population is normally distributed.

(a) Construct a 95 % confidence interval for mean height of all NYP students.

(b) With reference to the 95 % confidence interval, what is the maximum possible error of using the sample mean as an estimate of the population mean?

3 One of the objectives of a large medical study was to estimate the mean physician fee for cataract removal. For n randomly selected cases the mean fee was found to be $1550 with a standard deviation of $125.

(a) Find a 99 % confidence interval on µ , the mean fee for all physicians when n = 35.

(b) Find a 99 % confidence interval on µ , the mean fee for all physicians when n = 25 and the distribution of the fees is normally distributed.


4 A researcher selected a random sample of 8 chick embryos to study the development of thymus gland. He weighed the glands of these 8 chick embryos after 12 days of incubation. The thymus weights (in mg) were as follows:

28.4 20.8 27.6 33.0 40.8 36.5 29.1 31.8

(a) Using your calculator, find the sample mean and the sample standard deviation.

(b) Construct a 90% confidence interval for the population mean.

(c) State whether any assumption is required on the distribution of the embryos.

C Conceptual Questions

Determine whether the following statements are true or false. Explain your reasoning.

1 For a given standard error, lower confidence levels produce wider confidence intervals.

2 If you increase sample size, the width of the confidence interval will increase.

3 To reduce the width of a confidence interval by half, we have to increase the sample size by four times.

Answers

A1 a 1.645 b 1.96 c 2.33 d 2.575

A2 a 1.833 b 2.080 c 2.492 d 2.898

A3 a (8.44, 11.6) b (9.01, 11.0) c (8.22, 11.8)

B1 a 4.8 b (495, 511) c (492, 514)

B2 a (172,176) b 2.11

B3 a (1495.49, 1604.51) b (1480.08, 1619.93)

B4 a 31, 6.06 b (26.9,35.1)

C1 F C2 F C3 T


Topic 10 : Hypothesis Testing with One Sample

Objectives :

At the end of this lesson, the student should be able to: 1. formulate a hypothesis test by using its characteristics such as formulating the

null and alternate hypothesis, identifying the correct test statistics and applying the critical regions

2. evaluate its reliability by explaining the type I and II errors

3. evaluate the hypothesis of a population mean by using the z-test or t-test


TOPIC 10: Hypothesis Testing with One Sample

10.1 Introduction to Hypothesis Testing

• Suppose a car manufacturer advertises that its new hybrid car has a mean mileage of 50 miles per gallon. This statement may be true but it has yet to be proven. Such a statement is known as a statistical hypothesis.

• One way of testing the above hypothesis is to literally test all the hybrid cars made by this manufacturer, which is both impractical and uneconomical. The more sensible approach is to test the validity of the claim by considering random samples taken from the population of these hybrid cars.

• In this chapter, you will learn how to test a claim or a hypothesis about a population parameter, based on the information obtained from a random sample. In this module, we are only concerned with testing the population mean.

10.2.1 Stating a Hypothesis

• A statement about a population parameter is called a statistical hypothesis. To test a population parameter, you must state a pair of hypotheses: one that represents the claim and the other its complement. When one of these hypotheses is false, the other must be true. Either hypothesis, the null hypothesis or the alternative hypothesis, may represent the original claim.

Definition

1. A null hypothesis $H_0$ is a statistical hypothesis that contains a statement of equality, such as ≤, = or ≥.

2. The alternative hypothesis $H_a$ (also written $H_1$) is the complement of the null hypothesis. It is a statement that must be true if $H_0$ is false, and it contains a statement of strict inequality, such as >, ≠ or <.

• To write the null and alternative hypotheses, translate the claim made about the population parameter from a verbal statement to a mathematical statement. Then write its complement. For instance, if the claim value is k and the population parameter is µ, then some possible pairs of null and alternative hypotheses are:

1st possible pair: $H_0: \mu \le k$, $H_1: \mu > k$

2nd possible pair: $H_0: \mu \ge k$, $H_1: \mu < k$

3rd possible pair: $H_0: \mu = k$, $H_1: \mu \ne k$

• Thereafter, we will examine the sampling distribution and determine whether or not

a sample statistic is unusual.

Example 10.2-1

Write the following claims as a mathematical sentence. State the null and alternative

hypotheses and identify which represents the claim.

(a) A university publicises that the proportion of its students who graduate in 4 years

is 82%.

(b) A water faucet manufacturer announces that the mean flow rate of a certain type

of faucet is less than 2.5 gallons per minute.

(c) A cereal company advertises that the mean weight of the contents of its 20-ounce

size cereal boxes is more than 20 ounces.

Solution:

(a) The claim “the proportion … is 82%” can be written as $p = 0.82$. Its complement is $p \ne 0.82$. Since $p = 0.82$ contains the statement of equality, it becomes the null hypothesis. In this case, the null hypothesis is also the claim. Hence,

$H_0: p = 0.82$ (claim)

$H_1: p \ne 0.82$


10.2.2 Types of Errors

• No matter which hypothesis represents the claim, we always begin a hypothesis

test by assuming that the equality condition in the null hypothesis is true. So, when

we perform a hypothesis test, we make one of two decisions:

1. Reject the null hypothesis; or

2. Fail to reject the null hypothesis.

Since our decision is based on a sample rather than the entire population, there is

always the possibility that we will make the wrong decision.

• The only way to be absolutely certain of whether $H_0$ is true or false is to test the entire population. Otherwise, we might reject $H_0$ when it is actually true or fail to reject $H_0$ when it is actually false.

The following table shows the four possible outcomes of a hypothesis test.

| Decision \ Truth of $H_0$ | $H_0$ is true | $H_0$ is false |
|---|---|---|
| Do not reject $H_0$ | Correct decision | Type II error |
| Reject $H_0$ | Type I error | Correct decision |

Definition 1. A type I error occurs if the null hypothesis is rejected when it is true.

2. A type II error occurs if the null hypothesis is not rejected when it is false.


Example 10.2-2

The USDA limit for salmonella contamination for chicken is 20%. A meat inspector

reports that the chicken produced by a company exceeds the USDA limit. You perform

a hypothesis test to determine whether the meat inspector’s claim is true. When will a

type I or II error occur? Which is more serious?

Solution:

Let p be the proportion of chicken that is contaminated.

$H_0: p \le 0.2$

$H_1: p > 0.2$ (claim)

A type I error occurs when the actual proportion of contaminated chicken is less than or equal to 0.2 but we reject the null hypothesis.

A type II error occurs when the actual proportion of contaminated chicken is greater than 0.2 but we do not reject the null hypothesis.

A type II error is more serious because chicken that exceeds the USDA contamination limit would be allowed to be sold to consumers, which could result in sickness and death.

Example 10.2-3

A company specialising in parachute assembly states that its main parachute failure

rate is not more than 1%. You perform a hypothesis test to determine if its claim is

false. When will a type I or type II error occur? Which is more serious?

Solution:


10.2.3 Level of Significance

• By setting the level of significance at a small value, you are saying that you want the probability of rejecting a true null hypothesis to be small. Three commonly used levels of significance are $\alpha = 0.10$, $\alpha = 0.05$ and $\alpha = 0.01$.

• The probability of a type II error is denoted by β.

10.2.4 Types of Test and the Rejection Criteria

• Knowing the type of hypothesis test helps us to decide the criteria for rejecting the null hypothesis. The region of the sampling distribution that favours the alternative hypothesis $H_a$ (i.e. the rejection of $H_0$) determines the type of test. There are three types of hypothesis tests: a left-, right- or two-tailed test.

• Type 1: Left-tailed test

$H_0: \mu \ge k$

$H_a: \mu < k$

• Type 2: Right-tailed test

$H_0: \mu \le k$

$H_a: \mu > k$

• Type 3: Two-tailed test

$H_0: \mu = k$

$H_a: \mu \ne k$

Definition In a hypothesis test, the level of significance is your maximum allowable probability

of making a type I error. It is denoted by α.


• To find the critical value(s) that define the rejection region, we need to establish the type of hypothesis test, the level of significance and the sampling distribution. The critical value is denoted by:

1. $z_c$ if the sampling distribution follows a normal distribution

2. $t_c$ if the sampling distribution follows a Student's t-distribution

• Case 1: Left-tailed test

The rejection region is the area to the left of the critical value, i.e. $z < z_c$ or $t < t_c$.

• Case 2: Right-tailed test

The rejection region is the area to the right of the critical value, i.e. $z > z_c$ or $t > t_c$.

• Case 3: Two-tailed test

The rejection region is the area to the left of the negative critical value and to the right of the positive critical value, i.e. $\{z < -z_c \text{ or } z > z_c\}$ or $\{t < -t_c \text{ or } t > t_c\}$.

Definition A rejection region of the sampling distribution is the range of values for which the

null hypothesis is not probable. A critical value separates the rejection region from

the non-rejection region.
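The critical values for a z-test can also be obtained numerically instead of from a table. The sketch below is only an illustration: the function name `z_critical` and the tail labels "left", "right" and "two" are hypothetical, and it assumes a standard normal sampling distribution. It relies on `statistics.NormalDist.inv_cdf` from the Python standard library.

```python
from statistics import NormalDist


def z_critical(alpha, tail):
    """Return the critical value(s) of a z-test as a tuple.

    tail is "left", "right" or "two" (labels chosen for this sketch).
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "left":
        # Area alpha lies to the left of the critical value.
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        # Area alpha lies to the right of the critical value.
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        # Area alpha / 2 lies in each tail.
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right' or 'two'")


for alpha, tail in [(0.05, "right"), (0.03, "left"), (0.10, "two")]:
    values = ", ".join(f"{z:.3f}" for z in z_critical(alpha, tail))
    print(f"alpha = {alpha}, {tail}-tailed: {values}")
```

Values computed this way may differ in the last digit from those read off a printed z-table, because the table is rounded.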

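[Figures: sketches of the rejection regions for the left-tailed, right-tailed and two-tailed tests, marking the critical value(s) (negative and positive for the two-tailed test) and the rejection region in which $H_0$ is rejected.]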

Example 10.2-4

In each of the claims, state the null and alternative hypotheses and determine if the test is a left-, right- or two-tailed test. At $\alpha = 0.10$, sketch a normal sampling distribution and find the critical value(s). Assume that the population follows a normal distribution.

(i) A consumer analyst reports that the mean life of a certain type of automobile

battery is 74 months.

(ii) A radio station publicises that its proportion of the local listening audience is

greater than 39%.

Solution: (i) Null hypothesis : ___________________________

Alternative hypothesis : ___________________________

Type of test : ___________________________

(ii)

10.2.5 Test Statistics and Making Decision

• To use the rejection region to make a conclusion in a hypothesis test:

Case 1: If a test statistic falls in the rejection region, we reject the null hypothesis.

Case 2: If a test statistic falls outside of the rejection region, we fail to reject the null hypothesis.

• The following table will help you to interpret your decision:

| Decision | Claim is $H_0$ | Claim is $H_a$ |
|---|---|---|
| Reject $H_0$ | There is enough evidence to reject the claim. | There is enough evidence to support the claim. |
| Fail to reject $H_0$ | There is not enough evidence to reject the claim. | There is not enough evidence to support the claim. |

• The test statistic for the statistical test for a population mean is the sample mean, $\bar{x}$, and the standardized test statistic is denoted by

(i) z if the sampling distribution follows a normal distribution (or $n \ge 30$)

(ii) t if the sampling distribution follows a Student's t-distribution (or $n < 30$)

(iii) The standardized test statistic $= \dfrac{\text{sample mean} - \text{hypothesized mean}}{\text{standard error}}$

• When testing a population mean,


Example 10.2-5 (Large Sample)

The CEO of a firm claims that the mean work day of the firm’s accountants is less than

8.5 hours. A random sample of 35 of the firm’s accountants has a mean work day of

8.2 hours with a standard deviation of 0.5 hour. At $\alpha = 0.01$, test the CEO’s claim.

Solution:
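One possible way to check the arithmetic for this example is sketched below. It is not the official worked solution: the function name `one_sample_z` is hypothetical, the sample standard deviation is used in place of the population standard deviation because the sample is large, and the critical value is computed with `statistics.NormalDist` rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, hypothesized_mean, sample_sd, n):
    """Standardized test statistic: (sample mean - hypothesized mean) / standard error."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# The claim "mean work day is less than 8.5 hours" is H_a: mu < 8.5,
# so H_0: mu >= 8.5 and the test is left-tailed.
alpha = 0.01
z = one_sample_z(sample_mean=8.2, hypothesized_mean=8.5, sample_sd=0.5, n=35)
z_c = NormalDist().inv_cdf(alpha)  # left-tailed critical value

reject_h0 = z < z_c  # rejection region lies to the left of z_c
print(f"z = {z:.3f}, critical value = {z_c:.3f}, reject H0: {reject_h0}")
```

Because the claim here is the alternative hypothesis, rejecting $H_0$ would be read as "there is enough evidence to support the claim".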

Example 10.2-6 (Small Sample)

A used car dealer says that the mean price of a 2010 Honda Pilot LX is at least $23,900.

You suspect this claim is incorrect and find that a random sample of 14 similar vehicles

has a mean price of $23,000 and a standard deviation of $1113. Is there enough

evidence to reject the dealer’s claim at $\alpha = 0.05$? Assume the population is normally

distributed.

Solution:
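A minimal sketch of the calculation for this small-sample example follows. The function name `one_sample_t` is hypothetical, and the critical value is an assumption taken from a t-table for $\alpha = 0.05$ with 13 degrees of freedom, since the Python standard library has no t-distribution.

```python
from math import sqrt


def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """Return the standardized test statistic t and its degrees of freedom."""
    standard_error = sample_sd / sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error, n - 1


# The claim "mean price is at least $23,900" is H_0: mu >= 23900,
# so H_a: mu < 23900 and the test is left-tailed.
t, df = one_sample_t(sample_mean=23000, hypothesized_mean=23900, sample_sd=1113, n=14)

# Assumption: critical value read from a t-table (alpha = 0.05, d.f. = 13),
# made negative because the test is left-tailed.
t_c = -1.771

reject_h0 = t < t_c
print(f"t = {t:.3f}, d.f. = {df}, critical value = {t_c}, reject H0: {reject_h0}")
```

Here the claim is the null hypothesis, so rejecting $H_0$ would be read as "there is enough evidence to reject the claim".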


In summary, the steps for hypothesis testing are:


Tutorial 10: Hypothesis Testing with One Sample


A.1 Finding Critical Values for Normal Distribution

Find the critical value(s) for the indicated z-test and level of significance.

(a) right-tailed, $\alpha = 0.05$ (b) right-tailed, $\alpha = 0.08$

(c) left-tailed, $\alpha = 0.03$ (d) left-tailed, $\alpha = 0.09$

(e) two-tailed, $\alpha = 0.02$ (f) two-tailed, $\alpha = 0.10$

A.2 Finding Critical Value(s) for Student t-distribution

Find the critical value(s) for the indicated t-test, level of significance and sample.

(a) right-tailed, $\alpha = 0.05$, $n = 23$ (b) right-tailed, $\alpha = 0.01$, $n = 11$

(c) left-tailed, $\alpha = 0.025$, $n = 19$ (d) left-tailed, $\alpha = 0.05$, $n = 14$

(e) two-tailed, $\alpha = 0.01$, $n = 27$ (f) two-tailed, $\alpha = 0.05$, $n = 10$

A.3 Testing the Claim

Test the claim about the population mean µ at the given level of significance using the given sample statistics. Assume population is normally distributed for (iii) and (iv).

(i) Claim: $\mu = 40$; $\alpha = 0.05$. Sample statistics: $\bar{x} = 39.2$, $s = 3.23$, $n = 75$

(ii) Claim: $\mu > 1030$; $\alpha = 0.05$. Sample statistics: $\bar{x} = 1035$, $s = 23$, $n = 50$

(iii) Claim: $\mu \ne 52200$; $\alpha = 0.05$. Sample statistics: $\bar{x} = 53200$, $s = 1200$, $n = 4$

(iv) Claim: $\mu \ge 8000$; $\alpha = 0.01$. Sample statistics: $\bar{x} = 7700$, $s = 450$, $n = 25$



B1. A report claims that an adult has an average of 130 Facebook friends. A random sample of 50 adults revealed that the average number of Facebook friends was 142 with a standard deviation of 38.2. At 5% significance level, is there enough evidence to reject the claim?

B2. An officer from the utility department claims that the average water usage per

household is more than 12 cubic meters per month. To check the claim, a random sample of 40 households was selected and found that the average monthly water usage was 13 cubic meters with a standard deviation of 3 cubic meters. At 1% significance level, is there enough evidence to support the officer’s claim?

B3. The management of weight loss club claims that its members lose an average

of 3 kg or more within the first month after joining the club. A consumer agency that wanted to check this claim took a random sample of 36 members of this club and found that they lost an average of 2.9 kg with a standard deviation of 0.6 kg within the first month of membership. Test, at 10% significance level, on whether the management’s claim is true.

B4. A psychologist claims that the mean age at which children start walking is 12.5

months. To check this claim, you took a random sample of 18 children and found that the mean age at which these children started walking was 12.9 months with a standard deviation of 0.7 month. Using the 10% significance level, can you conclude that the mean age at which all children start walking is 12.5 months? Assume that the ages at which all children start walking have an approximately normal distribution.

B5. A pharmaceutical company claims that the average selling price per tablet of

its new drug is less than 45 cents. You have been asked to challenge the claim and so you conducted a random sampling of prices at 10 pharmacies across the country. The results (in cents) are as follow:

33.45 28.99 27.45 42.89 53.91 37.95 48.55 36.80 35.95 40.45

Is there sufficient evidence to support the claim that the average price per tablet

is less than 45 cents at the 1% level of significance? Assume that the selling price per tablet is approximately normally distributed.

B6. The average monthly telephone bill was reported to be more than $50.07. A

random sample of 10 people was taken and the following were the monthly charges (in dollars):

55.83, 49.88, 62.98, 70.42, 60.47, 52.45, 49.20, 50.02, 58.60, 51.29

At the 5% significance level, can the claim be supported? Assume all telephone bills to be approximately normal.


Answers

A1 (a) $z_{0.95} = 1.645$ (b) $z_{0.92} = 1.41$

(c) $z_{0.03} = -1.88$ (d) $z_{0.09} = -1.34$

(e) $z_{0.01} = -2.33$ (f) $z_{0.05} = -1.645$

A2 (a) $t_{0.05,22} = 1.717$ (b) $t_{0.01,10} = 2.764$

(c) $t_{0.025,18} = 2.101$ (d) $t_{0.05,13} = 1.771$

(e) $t_{0.01,26} = 2.779$ (f) $t_{0.05,9} = 2.262$

A3 (i) $z = -2.145 < -1.96$, reject $H_0$

(ii) $z = 1.537 < 1.645$, do not reject $H_0$

(iii) $t = 1.667 < 3.182$, do not reject $H_0$

(iv) $t = -3.333 < -2.492$, reject $H_0$

B1 Reject $H_0$ B2 Do not reject $H_0$

B3 Do not reject $H_0$ B4 Reject $H_0$

B5 Do not reject $H_0$ B6 Reject $H_0$

Topic 11 : Hypothesis Testing with Two Samples

Objectives :

At the end of this lesson, the student should be able to:

1. distinguish between independent and dependent samples

2. compare the means of 2 independent samples using the hypothesis testing approach

3. compare the means of 2 dependent samples using the hypothesis testing approach


TOPIC 11: Hypothesis Testing with Two Samples

11.1 Introduction

• Oftentimes, we hear people say ‘The kids these days are taller than before’. In

general, teenagers do seem taller than the adults who are now in their 30s. How

can we justify this assumed comparison with sufficient evidence?

• We know that it is both impractical and non-economical to measure the heights of

all teenagers and the adults in their 30s to make the comparison. The more sensible

approach to compare the difference in their heights is to take random samples from

the two different populations and compare their sample means.

• In reality, it is very common to make comparison between two or more distinct

populations. In this module, we will be exploring the comparison of the sample

means of two populations, although in other situations, it might be necessary to

compare other parameters such as the standard deviation and shape of the

distributions.

11.2 Independent and Dependent Samples

• In comparing two means, we want to see how different one mean (let’s call this $\bar{x}_1$) is from the other (let’s call this $\bar{x}_2$), so the most natural thing to do is to observe the difference between the two means, $\bar{x}_1 - \bar{x}_2$.

• The hypothesis testing approach in the comparison of two means allows us to test

and see if there is enough evidence to conclude that

o two means differ from each other.

o one mean is greater/lesser than the other.

• The approach differs for comparison between independent and dependent samples.


Example 11.2-1

Classify each pair of samples as independent or dependent. Justify your answer.

(a) Sample 1: Resting heart rates of 35 individuals before drinking coffee.

Sample 2: Resting heart rates of the same individuals after drinking two cups of coffee.

(b) Sample 1: Test scores for 35 statistics students.

Sample 2: Test scores for 42 biology students who do not study statistics.

Solution:

11.3 Hypothesis Testing for Two Independent Samples

• The hypothesis test of two independent samples follows the same 6 steps as the hypothesis test of one sample. The difference lies in step 3, which requires us to know the distribution of the difference in sample means $\bar{x}_1 - \bar{x}_2$.

• General Steps for a Hypothesis Test between 2 Independent Samples are:

Step 1: State the claim mathematically. Identify the null, $H_0$, and alternative, $H_a$, hypotheses. The possible hypotheses are:

$H_0: \mu_1 \ge \mu_2$, $H_a: \mu_1 < \mu_2$

$H_0: \mu_1 \le \mu_2$, $H_a: \mu_1 > \mu_2$

$H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \ne \mu_2$

Definitions:

1. Two samples are independent if members of one sample are unrelated to members of the other sample.

2. Two samples are dependent when each member of one sample is related to a member of the other sample. Dependent samples are also called paired or matched samples.


Regardless of which pair of hypotheses is used, we always assume that the population means are the same, i.e. $\mu_1 = \mu_2 \Rightarrow \mu_1 - \mu_2 = 0$.

Step 2: Identify (a) the type of test and (b) the level of significance, α, of the hypothesis test.

Step 3: State the type of distribution that the difference in sample means $\bar{x}_1 - \bar{x}_2$ follows.

(i) The sampling distribution of $\bar{x}_1 - \bar{x}_2$ follows a normal distribution,

$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)$

if three conditions are met:

(a) The samples must be randomly selected;

(b) The samples must be independent;

(c) Each sample size must be large ($n \ge 30$) or each population follows a normal distribution with known standard deviation.

(ii) The sampling distribution of $\bar{x}_1 - \bar{x}_2$ follows a t-distribution if

(a) each sample size is small ($n < 30$); and either

(b) the population variances are equal but unknown; then $\sigma_{\bar{x}_1 - \bar{x}_2} = \hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$ with d.f. $= n_1 + n_2 - 2$; or

(c) the population variances are unknown and unequal; then $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$ with d.f. = smaller of $n_1 - 1$ or $n_2 - 1$

Step 4: Determine the rejection criteria using the rejection region.

Step 5: Find the standardized test statistic $= \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$ with $\mu_1 - \mu_2 = 0$ since $\mu_1 = \mu_2$.

Step 6: Decide whether to reject or fail to reject $H_0$ and interpret the decision in the context of the original claim.


Example 11.3-1

121 boys and 144 girls sat for the PSLE in 2013. The mean PSLE scores for the boys and girls are 237 and 240 respectively. Assuming a common population standard deviation score of 12, test whether the results provide significant evidence, at the 1% level, that the academic standard of boys is inferior to that of the girls.

Solution:

Step 1: $H_0: \mu_B \ge \mu_G$

$H_a: \mu_B < \mu_G$ (claim)

Step 2: It is a left-tailed test and $\alpha = 0.01$.

Step 3: Since $n_B = 121 > 30$ and $n_G = 144 > 30$ with known $\sigma_B$, $\sigma_G$, $\bar{X}_B - \bar{X}_G$ follows a normal distribution with $\mu_{\bar{X}_B - \bar{X}_G} = 0$ and

$\sigma_{\bar{X}_B - \bar{X}_G} = \sqrt{\dfrac{\sigma_B^2}{n_B} + \dfrac{\sigma_G^2}{n_G}} = \sqrt{\dfrac{12^2}{121} + \dfrac{12^2}{144}} = \sqrt{\dfrac{265}{121}}$

Step 4: Reject $H_0$ if $z < z_\alpha = -2.33$.

Step 5: Standardized test statistic,

$z = \dfrac{(\bar{x}_B - \bar{x}_G) - (\mu_B - \mu_G)}{\sigma_{\bar{X}_B - \bar{X}_G}} = \dfrac{(237 - 240) - 0}{\sqrt{265/121}} = -2.03$

Step 6: Since $z = -2.03 > z_\alpha = -2.33$, we do not reject $H_0$.

At $\alpha = 0.01$, there is not enough evidence to support the claim that the academic standard of boys is inferior to that of the girls.
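The arithmetic in Steps 3 to 6 can be reproduced with a short script. This is only a sketch: the function name `two_sample_z` is hypothetical, and the critical value is computed with `statistics.NormalDist` instead of being read from the z-table, so it has more decimal places than the rounded table value.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """z statistic for the difference of two means under H_0: mu_1 - mu_2 = 0."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return (mean1 - mean2) / standard_error


alpha = 0.01
# Boys are sample 1, girls are sample 2; both share the population sd of 12.
z = two_sample_z(mean1=237, mean2=240, sigma1=12, sigma2=12, n1=121, n2=144)
z_c = NormalDist().inv_cdf(alpha)  # left-tailed critical value

reject_h0 = z < z_c
print(f"z = {z:.2f}, critical value = {z_c:.2f}, reject H0: {reject_h0}")
```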


Example 11.3-2

The braking distances of 8 Volkswagen GTIs and 10 Ford Focuses were tested when travelling at 60 miles per hour on dry pavement. The results are shown below.

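| | Sample mean | Sample standard deviation | Sample size |
|---|---|---|---|
| GTI | $\bar{x}_1 = 134$ | $s_1 = 6.9$ | $n_1 = 8$ |
| FOCUS | $\bar{x}_2 = 143$ | $s_2 = 2.6$ | $n_2 = 10$ |

Can you conclude that there is a difference in the mean braking distances of the two types of cars? Use $\alpha = 0.01$. Assume the populations are normally distributed and the population variances are not equal.

Solution: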
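A possible numerical check for this example is sketched below, using the unequal-variance case with d.f. equal to the smaller of $n_1 - 1$ and $n_2 - 1$. The function name `two_sample_t_unequal` is hypothetical, and the critical value is an assumption taken from a t-table (two-tailed, $\alpha = 0.01$, d.f. = 7).

```python
from math import sqrt


def two_sample_t_unequal(mean1, s1, n1, mean2, s2, n2):
    """t statistic and d.f. for two independent samples with unequal variances."""
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    t = (mean1 - mean2) / standard_error  # mu_1 - mu_2 = 0 under H_0
    df = min(n1 - 1, n2 - 1)  # smaller of n1 - 1 and n2 - 1
    return t, df


# H_0: mu_1 = mu_2, H_a: mu_1 != mu_2 (claim), so the test is two-tailed.
t, df = two_sample_t_unequal(mean1=134, s1=6.9, n1=8, mean2=143, s2=2.6, n2=10)

# Assumption: critical value read from a t-table (two-tailed, alpha = 0.01, d.f. = 7).
t_c = 3.499

reject_h0 = abs(t) > t_c  # reject if t < -t_c or t > t_c
print(f"t = {t:.3f}, d.f. = {df}, critical values = ±{t_c}, reject H0: {reject_h0}")
```

The test statistic lies very close to the critical value here, so the decision is sensitive to rounding in the table value.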


Example 11.3-3

A study sought to find out if playing soft classical music to plants helps in plant growth. 40 plants grown from the same batch of seeds are divided equally into two samples, A and B. Sample A is grown for a month under the sound of soft classical music while sample B acts as the control group. The growth (in mm) of each plant in both samples is shown below:

Sample A:

36 38 33 39 31 34 40 33 36 35 35 34 36 38 33 32 39 45 41 34

Sample B:

32 36 31 38 29 32 38 31 34 33 33 32 34 36 31 31 37 39 29 32

Assuming the populations are normally distributed and the population variances are equal, test at the 5% level whether music has indeed helped in plant growth.

Solution:
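One way to work through the numbers for this example is sketched below. Several things are assumptions rather than part of the notes: the function name `pooled_t` is hypothetical, $\hat{\sigma}$ is taken to be the usual pooled standard deviation $\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$, and the critical value comes from a t-table (right-tailed, $\alpha = 0.05$, d.f. = 38).

```python
from math import sqrt
from statistics import mean, variance

sample_a = [36, 38, 33, 39, 31, 34, 40, 33, 36, 35,
            35, 34, 36, 38, 33, 32, 39, 45, 41, 34]
sample_b = [32, 36, 31, 38, 29, 32, 38, 31, 34, 33,
            33, 32, 34, 36, 31, 31, 37, 39, 29, 32]


def pooled_t(x1, x2):
    """t statistic and d.f. for two independent samples with equal variances."""
    n1, n2 = len(x1), len(x2)
    # Assumption: sigma-hat is the pooled standard deviation of the two samples.
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    t = (mean(x1) - mean(x2)) / standard_error  # mu_A - mu_B = 0 under H_0
    return t, n1 + n2 - 2


# H_0: mu_A <= mu_B, H_a: mu_A > mu_B (claim), so the test is right-tailed.
t, df = pooled_t(sample_a, sample_b)

# Assumption: critical value read from a t-table (right-tailed, alpha = 0.05, d.f. = 38).
t_c = 1.686

reject_h0 = t > t_c
print(f"t = {t:.2f}, d.f. = {df}, critical value = {t_c}, reject H0: {reject_h0}")
```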


11.4 Hypothesis Testing for Two Dependent Samples

• In the hypothesis test of 2 dependent samples or paired data, we are interested in the difference between the 2 values within each data pair $(X_1, X_2)$. The difference, denoted by $d$, is defined as $d = X_1 - X_2$. The mean of the differences between paired data entries in the dependent samples is calculated using $\bar{d} = \dfrac{\sum d}{n}$, where n is the number of data pairs.

• DISTRIBUTION OF SAMPLE MEAN OF THE DIFFERENCE:

$\bar{d}$ follows approximately a t-distribution with $n - 1$ degrees of freedom, if the following conditions are satisfied:

(a) the samples are randomly selected

(b) the samples are dependent (paired)

(c) both populations are normally distributed.

• General Steps for a Hypothesis Test between 2 Dependent Samples:

Step 1: State the claim mathematically. Identify the null, $H_0$, and alternative, $H_a$, hypotheses. The possible hypotheses are:

$H_0: \mu_d \ge k$, $H_a: \mu_d < k$

$H_0: \mu_d \le k$, $H_a: \mu_d > k$

$H_0: \mu_d = k$, $H_a: \mu_d \ne k$

Step 2: Identify (a) the type of test and (b) the level of significance, α, of the hypothesis test.

Step 3: State that $\bar{d}$ follows a t-distribution with d.f. $= n - 1$.

Step 4: Determine the rejection criteria using the rejection region.

Step 5: Find the standardized test statistic, $t = \dfrac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$

Step 6: Decide whether to reject or fail to reject $H_0$ and interpret the decision in the context of the original claim.


Example 11.4-1

An advertisement states that a particular lymphatic massage program will help participants lose weight after one month. The table shows the weights of 12 adults before and after participating in the program. At $\alpha = 0.10$, can you conclude that the massage program helps participants lose weight? Assume the weights are normally distributed.

| Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Weight (Before) | 157 | 185 | 120 | 212 | 230 | 165 | 207 | 251 | 196 | 140 | 137 | 172 |
| Weight (After) | 150 | 181 | 121 | 206 | 215 | 169 | 210 | 232 | 188 | 138 | 145 | 172 |

Solution:

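| Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $d$ | 7 | 4 | -1 | 6 | 15 | -4 | -3 | 19 | 8 | 2 | -8 | 0 |
| $d^2$ | 49 | 16 | 1 | 36 | 225 | 16 | 9 | 361 | 64 | 4 | 64 | 0 |

$\bar{d} = \dfrac{\sum d}{n} = \dfrac{45}{12} = 3.75$

$s_d = \sqrt{\dfrac{\sum (d - \bar{d})^2}{n - 1}} = \sqrt{\dfrac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}} = \sqrt{\dfrac{845 - \frac{45^2}{12}}{11}} = 7.8407$

Step 1: $H_0: \mu_d \le 0$

$H_a: \mu_d > 0$ (claim)

Step 2: It is a right-tailed test and $\alpha = 0.10$.

Step 3: $\bar{d}$ follows a t-distribution with d.f. $= 12 - 1 = 11$, $\mu_d = 0$ and

$\sigma_{\bar{d}} = \dfrac{s_d}{\sqrt{n}} = \dfrac{7.8407}{\sqrt{12}} = 2.2634$

Step 4: Reject $H_0$ if $t > t_{0.10} = 1.363$.

Step 5: Standardized test statistic,

$t = \dfrac{\bar{d} - \mu_d}{\sigma_{\bar{d}}} = \dfrac{3.75 - 0}{2.2634} = 1.657$

Step 6: Since $t = 1.657 > t_{0.10} = 1.363$, we reject $H_0$. At $\alpha = 0.10$, there is enough evidence to support the claim that the massage program helps participants lose weight.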
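The calculations in this solution can be reproduced with a short script. The function name `paired_t` and its parameter names are hypothetical, and the critical value 1.363 is simply the table value quoted in Step 4.

```python
from math import sqrt
from statistics import mean, stdev

before = [157, 185, 120, 212, 230, 165, 207, 251, 196, 140, 137, 172]
after = [150, 181, 121, 206, 215, 169, 210, 232, 188, 138, 145, 172]


def paired_t(x1, x2, hypothesized_mean_diff=0):
    """t statistic and d.f. for paired samples, with d = x1 - x2."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    d_bar = mean(d)  # mean of the differences
    s_d = stdev(d)  # sample standard deviation of the differences
    t = (d_bar - hypothesized_mean_diff) / (s_d / sqrt(n))
    return t, n - 1


# H_0: mu_d <= 0, H_a: mu_d > 0 (claim), so the test is right-tailed.
t, df = paired_t(before, after)
t_c = 1.363  # table value used in Step 4 (alpha = 0.10, d.f. = 11)

reject_h0 = t > t_c
print(f"t = {t:.3f}, d.f. = {df}, critical value = {t_c}, reject H0: {reject_h0}")
```

The same function could be reused for other paired data by passing different lists; for a two-tailed test the decision rule would compare `abs(t)` with the two-tailed critical value instead.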


Example 11.4-2

The table gives the blood pressures (in mm Hg) of seven adults before and after the completion of a special dietary plan.

| Individual | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Before | 210 | 180 | 195 | 220 | 231 | 199 | 224 |
| After | 193 | 186 | 186 | 223 | 220 | 183 | 233 |

Let $\mu_d$ be the mean of the differences between the systolic blood pressures before and after completing this special dietary plan for the population of all adults. Using the 5% significance level, can you conclude that the mean of the paired differences $\mu_d$ is different from zero? Assume the blood pressures are normally distributed.

Solution:


Tutorial 11: Hypothesis Testing with Two Populations

A Independent Samples

QUESTION 1

A study was designed to investigate the effect of a calcium-deficient diet on lead consumption in rats. One hundred rats were randomly divided into 2 groups of 50 each. One group served as a control group and the other was the experimental, or calcium-deficient, group. The response recorded was the amount of lead consumed per rat. The results are summarized below:

| | Sample mean | Sample standard deviation | Sample size |
|---|---|---|---|
| CONTROL | $\bar{x}_1 = 5.2$ | $s_1 = 1.1$ | $n_1 = 50$ |
| EXPERIMENTAL | $\bar{x}_2 = 5.6$ | $s_2 = 1.3$ | $n_2 = 50$ |

At $\alpha = 0.05$, is there sufficient evidence to suggest that the calcium-deficient diet results in increased lead consumption in rats?

QUESTION 2

A study was conducted to assess whether teenage boys worry more than teenage girls. A scale called the Anxiety Scale was used to measure the level of anxiety experienced by an individual. A higher value on the Anxiety Scale corresponds to a higher level of anxiety. The results obtained are summarized in the table below:

| | Sample size | Sample mean | Sample standard deviation |
|---|---|---|---|
| Boys | 102 | 66.78 | 9.2 |
| Girls | 76 | 65.33 | 9.3 |

Is there sufficient evidence at the 5% level that teenage boys score higher on the Anxiety Scale than teenage girls?

QUESTION 3

An insurance company wants to know if the average speed at which men drive cars is higher than that of women drivers. The company took a random sample of 20 cars driven by men on an expressway and found the mean speed to be 89 km/h with a standard deviation of 3 km/h. Another sample of 18 cars driven by women on the same expressway gave a mean speed of 86 km/h with a standard deviation of 2.5 km/h. Assume that the speeds at which all men and all women drive cars on this expressway are normally distributed with unequal population standard deviations. Test at the 10% significance level whether the mean speed of cars driven by all men drivers on this expressway is higher than that of cars driven by all women drivers.


B Dependent Samples

QUESTION 1

Triglyceride is a type of fat found in fatty tissue. Individuals found with a high level of triglyceride in their blood have a higher risk of contracting heart diseases. To determine if regular exercise can reduce triglyceride levels, researchers measured the triglyceride level of 8 individuals with mild high cholesterol before and after attending 3 months of an intensive aerobics exercise program.

| Individual | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Before | 200 | 226 | 218 | 246 | 195 | 278 | 254 | 237 |
| After | 135 | 206 | 146 | 172 | 175 | 224 | 233 | 192 |

Test, at the 5% significance level, if the aerobics exercise program has been effective in reducing triglyceride level in blood serum. Assume triglyceride levels are normally distributed.

QUESTION 2

A dietitian wishes to see if a person’s cholesterol level (in mg/dL) will change if the diet is supplemented by a certain mineral. Six subjects were pretested and then they took the mineral supplement for a six-week period. The results are shown in the table below.

| Subject | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Before | 210 | 235 | 208 | 190 | 172 | 244 |
| After | 190 | 170 | 210 | 188 | 173 | 228 |

a. State the underlying assumptions needed to perform a hypothesis test in this context.

b. Test, at the 10% significance level, whether there is a change in cholesterol level when the mineral supplements the diet.

QUESTION 3

Susan, the receiving clerk of a chemical distributor, is faced with a continuing problem of broken glassware which includes test tubes, petri dishes and flasks. Susan imposed some additional shipping precautions which she believes can prevent further breakage on these types of glassware. After a month of implementing the precautionary measures, she requested the purchasing clerk to provide her the information on the average number of broken items per shipment. Data from eight different suppliers given to the purchasing clerk are given below.

| Supplier | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Before | 16 | 12 | 18 | 7 | 14 | 19 | 6 | 17 |
| After | 14 | 13 | 12 | 6 | 9 | 15 | 8 | 15 |

Does the data indicate, at $\alpha = 0.05$, that the new measures have lowered the average number of broken items? Assume the number of broken glassware items is normally distributed.


Answers

A1 Reject $H_0$ A2 Do not reject $H_0$ A3 Reject $H_0$

B1 Reject $H_0$ B2 Do not reject $H_0$ B3 Reject $H_0$
