School of Engineering · 2018-04-11 · 1 . School of Engineering . Course : Diploma in Electronic...
School of Engineering
Course :
Diploma in Electronic Systems
Diploma in Telematics & Media Technology
Diploma in Aerospace Systems & Management
Diploma in Electrical Engineering with Eco-Design
Diploma in Mechatronics Engineering
Diploma in Digital & Precision Engineering
Diploma in Aeronautical & Aerospace Technology
Diploma in Biomedical Engineering
Diploma in Nanotechnology & Materials Science
Diploma in Engineering with Business
Module :
Engineering Mathematics 2B − EG2008/2681/2916/2961
Mathematics 2B − EGB/D/F/H/J/M207
LECTURE NOTES Revised since Apr 18
Content Page
• Study Plan, Assessment Plan, Formula List
• Topic 1: Descriptive Statistics
• Topic 2: Linear Regression and Correlation
• Topic 3: Principles of Counting
• Topic 4: Probability
• Topic 5: Discrete Probability Distribution
• Topic 6: Binomial and Poisson Distributions
• Topic 7: Normal Distribution
• Topic 8: Distribution of Sample Means
• Topic 9: Estimation of Parameters and the t-Distribution
• Topic 10: Hypothesis Testing with One Sample
• Topic 11: Hypothesis Testing with Two Samples
Suggested Study Plan

Week 1: Descriptive Statistics
Study materials: Topic 1: Descriptive Statistics
◊ Interpret frequency distributions
◊ Describe measures of central tendency
◊ Describe measures of variability
Practice: Tutorial 1

Week 2: Scatter diagram, correlation & simple linear regression
Study materials: Topic 2: Linear Regression & Correlation
◊ Explain relationship between two variables by observing scatter diagram & using the correlation coefficient
◊ Perform regression analysis by using the least square regression line
◊ Evaluate the reliability of an estimation by using the correlation coefficient & the range of dataset
** Task for assignment 1 released to students **
Practice: Tutorial 2

Week 3: Principles of Counting
Study materials: Topic 3: Principles of Counting
◊ Fundamental Counting Principle
◊ Permutation and Combination
** Quiz 1: Chapters 1 and 2 **
Practice: Tutorial 3

Week 4: Probability
Study materials: Topic 4: Probability
◊ Probability experiments, types of probability
◊ Properties of Probability & its Applications
◊ Conditional Probability
◊ Types of events (independent, mutually exclusive)
◊ Multiplication & Addition Rules
Practice: Tutorial 4

Week 5: Discrete Random Variables
Study materials: Topic 5: Discrete Probability Distribution
◊ Define random variables
◊ Distinguish discrete and continuous random variables
◊ Define a discrete probability distribution
◊ Compute mean, variance & standard deviation of discrete random variable
Practice: Tutorial 5

Week 6: Binomial Distribution
Study materials: Topic 6A: Binomial Distribution
◊ Binomial Experiments
◊ Binomial probability function & applications
◊ Calculate mean & variance
** Quiz 2: Chapters 3, 4 and 5 **
Practice: Tutorial 6

Week 7: Poisson Distribution
Study materials: Topic 6B: Poisson Distribution
◊ Define Poisson random variable & conditions
◊ Poisson probability function & applications
◊ Calculate mean and variance
Practice: Tutorial 6

Week 8: Normal Distribution
Study materials: Topic 7: Normal Distribution
◊ Properties of Normal & Standard Normal Distribution
◊ Compute probabilities using tables
** Due for assignment submission **
Practice: Tutorial 7

Weeks 9-10: Term Break

Week 11: Sampling Distribution
Study materials: Topic 8: Distribution of Sample Means
◊ Describe sampling distribution of sample means
◊ Apply Central Limit Theorem
Practice: Tutorial 8

Week 12: Estimation of the population mean
Study materials: Topic 9A: Estimation of Parameters
◊ Calculate point estimators of population parameters
◊ Construct & interpret confidence intervals for the population mean for known population standard deviation
◊ Calculate the sample size necessary for estimating population mean with specified margin of error
** E-Quiz: Chapters 7 and 8 **
Practice: Tutorial 9

Week 13: The t-distribution
Study materials: Topic 9B: Estimation of Parameters
◊ Describe the properties of the t-distribution
◊ Construct confidence interval to estimate population mean when sample size is small with unknown population standard deviation
Practice: Tutorial 9

Week 14: Introduction to hypothesis testing
Study materials: Topic 10: Introduction to hypothesis testing
◊ Formulate a hypothesis test
◊ Evaluate its reliability by explaining type I and II errors
** Written assignment: Chapter 10 **
Practice: Tutorial 10

Week 15: Testing a population mean
Study materials: Topic 10: Testing a population mean
◊ Evaluate the hypothesis of a population mean by using the z-test or t-test
Practice: Tutorial 10

Week 16: Testing the difference between two population means
Study materials: Topic 11: Testing the difference between two population means
◊ Evaluate the hypothesis for the difference between two population means by using the z-test or t-test
Practice: Tutorial 11

Week 17: Revision
Study materials: Final Exam Revision
Assessment Plan AY18 S1
Assessment Methods / Components    Percentage
Quizzes                            20 %
Assignments                        30 %
Semester Exam                      50 %
Total                              100 %
Formula Tables
Probability
$P(E) = \dfrac{n(E)}{n(S)}$, E is an event & S a sample space
$P(E') = 1 - P(E)$, E′ is the complement event of E
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$ or $P(A \cap B) = P(A \mid B)\,P(B)$
A, B mutually exclusive $\Leftrightarrow P(A \cap B) = 0$
A, B independent $\Leftrightarrow P(A \cap B) = P(A)\,P(B)$
Counting
Permutation: ${}^{n}P_{r} = \dfrac{n!}{(n-r)!}$
Repeated objects, $k_i$: ${}^{n}P_{k} = \dfrac{n!}{k_1!\,k_2!\cdots k_m!}$
Combination: ${}^{n}C_{r} = \dfrac{n!}{r!\,(n-r)!}$
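As a quick check of the counting formulas, the sketch below evaluates them with Python's standard `math` module (Python 3.8 or later is assumed for `math.perm` and `math.comb`). The helper name `arrangements_with_repeats` and the sample word are illustrative assumptions, not part of these notes.

```python
from math import comb, factorial, perm


def arrangements_with_repeats(counts):
    """Return n! / (k_1! k_2! ... k_m!), where counts = [k_1, ..., k_m]
    and n is the sum of the counts (hypothetical helper)."""
    total = factorial(sum(counts))
    for k in counts:
        total //= factorial(k)  # each partial quotient is still an integer
    return total


n_p_r = perm(5, 2)  # 5! / (5 - 2)! = 20
n_c_r = comb(5, 2)  # 5! / (2! * 3!) = 10

# Letters of "MISSISSIPPI": M x 1, I x 4, S x 4, P x 2 (illustrative example)
mississippi = arrangements_with_repeats([1, 4, 4, 2])

print(n_p_r, n_c_r, mississippi)
```

Integer division is used throughout so the results stay exact whole numbers.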
Linear Regression Line
$\hat{y} = mx + C$, m is the slope/gradient and C is the y-intercept
Binomial Distribution
$X \sim B(n, p)$
$P(X = x) = {}^{n}C_{x}\, p^{x} (1-p)^{n-x}$, $x = 0, 1, 2, \ldots, n$
$E(X) = np$
$\mathrm{Var}(X) = npq$, where $q = 1 - p$
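The binomial formulas can be tried out numerically. The following is a minimal sketch; the function name `binomial_pmf` and the values n = 10, p = 0.5 are assumptions chosen only for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.5  # illustrative values, not from the notes
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = n * p                # E(X) = np
variance = n * p * (1 - p)  # Var(X) = npq, with q = 1 - p

print(binomial_pmf(5, n, p), mean, variance)
```

The probabilities over x = 0, 1, ..., n should add up to 1, which is a useful sanity check on any hand calculation.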
Poisson Distribution
$X \sim P_o(\mu)$
$P(X = x) = \dfrac{e^{-\mu}\mu^{x}}{x!}$, $x = 0, 1, 2, 3, \ldots$
$E(X) = \mu$
$\mathrm{Var}(X) = \mu$
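A short numerical sketch of the Poisson probability function follows. The name `poisson_pmf` and the value µ = 2 are assumptions for illustration only.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P(X = x) = e^(-mu) * mu^x / x! for X ~ Po(mu)."""
    return exp(-mu) * mu**x / factorial(x)


mu = 2.0  # illustrative mean, not from the notes
p0 = poisson_pmf(0, mu)                                  # P(X = 0)
p_at_most_2 = sum(poisson_pmf(x, mu) for x in range(3))  # P(X <= 2)

print(round(p0, 4), round(p_at_most_2, 4))
```

Cumulative probabilities such as P(X ≤ 2) are obtained by adding the individual terms, exactly as one would by hand.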
Mean and Variance of a Random Variable
The Mean of a discrete random variable X,
$\mu = \sum k \cdot P(X = k)$
The Variance of a discrete random variable is
$\sigma^2 = \sum k^2 \cdot P(X = k) - \mu^2$
Measures of Central Tendency
Sample mean: $\bar{x} = \dfrac{\sum x}{n}$
Sample variance: $s^2 = \dfrac{1}{n-1}\left[\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}\right]$
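Confidence Interval for Population Mean
Population variance, $\sigma^2$    Sample size, n    Confidence interval
known                             any               $\left(\bar{x} - z_c\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_c\dfrac{\sigma}{\sqrt{n}}\right)$
unknown                           $n \ge 30$        $\left(\bar{x} - z_c\dfrac{s}{\sqrt{n}},\ \bar{x} + z_c\dfrac{s}{\sqrt{n}}\right)$
unknown                           $n < 30$          $\left(\bar{x} - t_c\dfrac{s}{\sqrt{n}},\ \bar{x} + t_c\dfrac{s}{\sqrt{n}}\right)$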
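To see how the first row of the table is used, here is a minimal sketch for the case of known population standard deviation. The function name and the sample figures (x̄ = 50, σ = 10, n = 25) are hypothetical; the critical value 1.960 is the 95 % entry in the ∞ row of Table 2.

```python
from math import sqrt


def z_confidence_interval(sample_mean, sigma, n, z_c):
    """Return (lower, upper) = x_bar -/+ z_c * sigma / sqrt(n)."""
    margin = z_c * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample: mean 50, known sigma 10, size 25, 95 % confidence
low, high = z_confidence_interval(sample_mean=50.0, sigma=10.0, n=25, z_c=1.960)

print(round(low, 2), round(high, 2))
```

For the other two rows, the same arithmetic applies with s in place of σ, and with $t_c$ (read from Table 2 with n − 1 degrees of freedom) in place of $z_c$ when n < 30.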
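Testing a Mean
Test                              Population variance, $\sigma^2$    Sample size, n    Test statistic
Testing a single sample value     known                                               $z = \dfrac{x - \mu}{\sigma}$
Testing a mean                    known                             any               $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Testing a mean                    unknown                           $n \ge 30$        $z = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
Testing a mean                    unknown                           $n < 30$          $t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$ with $df = n - 1$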
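The test statistics for a mean all share the same shape, which the sketch below makes explicit. The function name and the numbers are hypothetical and serve only to illustrate the arithmetic.

```python
from math import sqrt


def mean_test_statistic(sample_mean, mu, spread, n):
    """(x_bar - mu) / (spread / sqrt(n)).

    `spread` is sigma when the population variance is known, otherwise
    the sample standard deviation s."""
    return (sample_mean - mu) / (spread / sqrt(n))


# Hypothetical case: x_bar = 52, hypothesised mu = 50, sigma = 10, n = 25
z = mean_test_statistic(52.0, 50.0, 10.0, 25)

# Small sample with unknown variance: same formula, read as t with df = n - 1
n_small = 16
t = mean_test_statistic(52.0, 50.0, 8.0, n_small)
df = n_small - 1

print(z, t, df)
```

Whether the result is compared against Table 1 (z) or Table 2 (t) depends on the row of the table that applies.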
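Testing the difference of two means (Independent Samples)

Population variance: Known (Unknown, $\sigma \approx s$)
Sample size: $n_1, n_2 \ge 30$
Test statistic: $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$

Population variance: Unknown & $\sigma_1 \ne \sigma_2$
Sample size: $n_1, n_2 < 30$
Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
with the smaller of $df = n_1 - 1$ or $df = n_2 - 1$

Population variance: Unknown & $\sigma_1 = \sigma_2$
Sample size: $n_1, n_2 < 30$
Test statistic: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
where $df = n_1 + n_2 - 2$
and $\hat{\sigma} = \sqrt{\dfrac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}}$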
Testing the difference of two means (Dependent Samples)
$t = \dfrac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$ where $s_d = \sqrt{\dfrac{\sum d^2 - \dfrac{\left(\sum d\right)^2}{n}}{n - 1}}$
Table 1: Standard Normal Distribution
[Figure: standard normal curve; the table gives the area to the left of z]
   z   0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
-2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
-2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
-1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
-1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
-0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
-0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
-0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
-0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
-0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
-0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
-0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
-0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
 0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641

   z   0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
 0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
 0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
 0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
 0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
 0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
 0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
 0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
 0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
 0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
 0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
 1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
 1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
 1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
 1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
 1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
 1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
 1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
 1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
 1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
 1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
 2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
 2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
 2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
 2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
 2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
 2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
 2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
 2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
 2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
 2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
 3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
 3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
 3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
 3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
 3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
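Entries of Table 1 can be cross-checked numerically. The sketch below assumes the table gives the area to the left of z (its values are consistent with this) and uses the standard identity Φ(z) = ½[1 + erf(z/√2)] with Python's `math.erf`; the function name is an illustrative choice.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Compare with the table rows z = -1.0 (column 0.00) and z = 1.9 (column 0.06)
left_of_minus_1 = round(standard_normal_cdf(-1.00), 4)
left_of_1_96 = round(standard_normal_cdf(1.96), 4)

print(left_of_minus_1, left_of_1_96)
```

The two printed values agree with the table entries 0.1587 and 0.9750.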
Table 2: t – Distribution
Level of confidence, c   0.50   0.80   0.90   0.95    0.98    0.99
One tail, α              0.25   0.10   0.05   0.025   0.01    0.005
Two tails, α             0.50   0.20   0.10   0.05    0.02    0.01
d.f.
1                        1.000  3.078  6.314  12.706  31.821  63.657
2                        0.816  1.886  2.920  4.303   6.965   9.925
3                        0.765  1.638  2.353  3.182   4.541   5.841
4                        0.741  1.533  2.132  2.776   3.747   4.604
5                        0.727  1.476  2.015  2.571   3.365   4.032
6                        0.718  1.440  1.943  2.447   3.143   3.707
7                        0.711  1.415  1.895  2.365   2.998   3.499
8                        0.706  1.397  1.860  2.306   2.896   3.355
9                        0.703  1.383  1.833  2.262   2.821   3.250
10                       0.700  1.372  1.812  2.228   2.764   3.169
11                       0.697  1.363  1.796  2.201   2.718   3.106
12                       0.695  1.356  1.782  2.179   2.681   3.055
13                       0.694  1.350  1.771  2.160   2.650   3.012
14                       0.692  1.345  1.761  2.145   2.624   2.977
15                       0.691  1.341  1.753  2.131   2.602   2.947
16                       0.690  1.337  1.746  2.120   2.583   2.921
17                       0.689  1.333  1.740  2.110   2.567   2.898
18                       0.688  1.330  1.734  2.101   2.552   2.878
19                       0.688  1.328  1.729  2.093   2.539   2.861
20                       0.687  1.325  1.725  2.086   2.528   2.845
21                       0.686  1.323  1.721  2.080   2.518   2.831
22                       0.686  1.321  1.717  2.074   2.508   2.819
23                       0.685  1.319  1.714  2.069   2.500   2.807
24                       0.685  1.318  1.711  2.064   2.492   2.797
25                       0.684  1.316  1.708  2.060   2.485   2.787
26                       0.684  1.315  1.706  2.056   2.479   2.779
27                       0.684  1.314  1.703  2.052   2.473   2.771
28                       0.683  1.313  1.701  2.048   2.467   2.763
29                       0.683  1.311  1.699  2.045   2.462   2.756
∞                        0.674  1.282  1.645  1.960   2.326   2.576
Topic 1 : Descriptive Statistics
Objectives :
At the end of this lesson, the student should be able to:
1 Identify the type of variables
2 Interpret frequency distributions
3 Organise the data and represent it through various graphical methods
4 Describe measures of central tendency – mean, median and mode
5 Describe measures of variability – range, variance and standard deviation
Topic 1: Descriptive Statistics
1.1.1 Variables
• A variable is the characteristic of an object that can be assigned a value or a category. Variables can be classified as two broad types: Categorical and Quantitative.
• Categorical variables refer to those that cannot be measured numerically. Quantitative variables are those that can be measured numerically.
• Categorical variables can be further divided into two sub-categories: Ordinal and Nominal. Ordinal variables have categories with a natural ranking while nominal variables do not.
• Quantitative variables can be further divided into two sub-categories: discrete and
continuous. Discrete variables have a countable number of values (usually integers) while continuous variables have an uncountable number of values.
The following chart illustrates the relationship between the various types of variables.
Variables
  Categorical
    Ordinal
      • Size of shirt: S, M, L.
      • Survey results: Poor, Good, Excellent.
    Nominal
      • Hair colour: Brown, Black, …
      • Type of drinks: Cola, coffee, …
  Quantitative
    Discrete
      • Number of girls: 1, 2, 3, …
    Continuous
      • Body Weight (kg): 71.5, 77.2, …
1.1.2 Population and sample
• A population is a set of all the possible observations that can be made. A
sample is a subset of the population.
• Taking samples is usually necessary as it is very tedious to obtain every single data value from the population.
• Usually we represent a variable with capital letters such as : X, Y, M,…. We
represent observed values of the variable using small letters. For example x =
750.
Example 1.1.2-1
The Manpower Ministry wishes to conduct a survey on a Singapore citizen’s monthly
salary. A representative from the ministry conducted the survey with 100 people out
of 3.34 million Singapore citizens. Let X represent the salary of a Singapore citizen.
_____________ is the variable of interest
_____________ is the observational unit
_____________ is the sample size
x = 5000 is an ________________
1.2 Organising data
• The raw data given will need to be organised neatly before we can draw basic conclusions from the findings. For example, the data below do not give a clear indication of the distribution of the various blood groups.
A  A  AB O  B  B  O  A  O  A
O  O  O  A  O  A  O  O  B  A
B  O  A  O  AB O  A  O  O  O
O  A  O  O  A  O  O  O  B  B
AB O  B  O  B  O  A  A  A  AB
1.2.1 Bar chart, histogram
• Data can be presented in two forms: ungrouped or grouped. Ungrouped data are given as individual data points, while grouped data are given in intervals.
• A frequency distribution is commonly used to organise data into well-specified categories. Data are usually grouped when a variable takes many different values. We can use a bar chart or histogram to visualise the data.
Example 1.2.1-1 (Ungrouped data)
Using the data set on blood groups of 50 people above, use the frequency table to complete the bar chart below:
Blood Type   Frequency
A            14
AB           4
B            8
O            24
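The frequency table can be reproduced by tallying the raw blood group data. This is a minimal sketch using Python's `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# The 50 blood groups listed in Section 1.2
blood_groups = (
    "A A AB O B B O A O A "
    "O O O A O A O O B A "
    "B O A O AB O A O O O "
    "O A O O A O O O B B "
    "AB O B O B O A A A AB"
).split()

frequency = Counter(blood_groups)

for group in ("A", "AB", "B", "O"):
    print(f"{group:<4}{frequency[group]:>3}")
```

The printed counts match the table, and the heights of the bars in the bar chart are exactly these frequencies.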
[Bar chart to be completed: Frequency (axis from 0 to 30) against Blood Type (A, AB, B, O)]
• To create a histogram for grouped data, we need to calculate the common interval length of each class (also known as the “bin width”). To ensure there are no gaps between classes, each class interval is extended by half a unit of measurement on the left and on the right.
Example 1.2.1-2 (Grouped data)
A dentist measured the width (in mm) of the last lower molar of 60 female adults. The results were as follows:
7.6  10.6 8.2  10.3 9.6  7.8  10.1 8.7  9.1  7.7
8.2  9.9  10.9 9.5  10.4 8.8  9.4  9.1  9.7  9.2
8.7  9.4  9.1  7.9  9.5  9.3  8.5  10.8 8.3  8.6
10.1 9.8  8.3  10.5 8.7  9.8  7.6  9.7  10.7 10.4
9.2  9.7  8.6  8.7  8.1  9.2  9.6  10.2 8.9  9.3
8.0  9.3  8.4  9.9  8.7  11.0 8.9  10.0 8.6  8.4
The frequency distribution table is shown below:
Class Interval   Class Boundaries   Class Midpoint   Frequency
7.6 – 8.0        7.55 – 8.05        7.8              6
8.1 – 8.5        8.05 – 8.55        8.3              8
8.6 – 9.0        8.55 – 9.05        8.8              11
9.1 – 9.5        9.05 – 9.55        9.3              13
9.6 – 10.0       9.55 – 10.05       9.8              10
10.1 – 10.5      10.05 – 10.55      10.3             7
10.6 – 11.0      10.55 – 11.05      10.8             5
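The grouped frequencies can be checked by sorting each measurement into the class whose boundaries contain it. The sketch below is one possible way to do this with the standard `bisect` module; it relies on the fact that no measurement falls exactly on a class boundary.

```python
from bisect import bisect_right

widths = [
    7.6, 10.6, 8.2, 10.3, 9.6, 7.8, 10.1, 8.7, 9.1, 7.7,
    8.2, 9.9, 10.9, 9.5, 10.4, 8.8, 9.4, 9.1, 9.7, 9.2,
    8.7, 9.4, 9.1, 7.9, 9.5, 9.3, 8.5, 10.8, 8.3, 8.6,
    10.1, 9.8, 8.3, 10.5, 8.7, 9.8, 7.6, 9.7, 10.7, 10.4,
    9.2, 9.7, 8.6, 8.7, 8.1, 9.2, 9.6, 10.2, 8.9, 9.3,
    8.0, 9.3, 8.4, 9.9, 8.7, 11.0, 8.9, 10.0, 8.6, 8.4,
]

# Class boundaries: each class interval extended by half a unit (0.05 mm)
boundaries = [7.55, 8.05, 8.55, 9.05, 9.55, 10.05, 10.55, 11.05]

counts = [0] * (len(boundaries) - 1)
for w in widths:
    # Index of the class whose lower boundary is the largest one below w
    counts[bisect_right(boundaries, w) - 1] += 1

midpoints = [round((lo + hi) / 2, 1) for lo, hi in zip(boundaries, boundaries[1:])]

for mid, count in zip(midpoints, counts):
    print(mid, count)
```

Because the boundaries end in 5 in the second decimal place while the data are recorded to one decimal place, every value belongs to exactly one class.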
Complete the histogram below:
[Histogram to be completed: Frequency (axis from 0 to 15) against Width of lower molar (in mm), with class boundaries 7.55, 8.05, 8.55, 9.05, 9.55, 10.05, 10.55, 11.05]
1.2.2 Stem and Leaf Plot
• A stem-and-leaf plot is a method for showing the frequency with which certain classes of values occur. One common approach is to let the last digit of each value be the “leaf” and the remaining digits form the “stem”.
• Unlike the histogram or bar chart, we can still observe the individual data value in
the stem and leaf plot.
Example 1.2.2-1
The following are the scores of 20 students on a Statistics test:
83 84 77 64 71 87 72 92 57 92
75 52 80 65 79 71 87 93 96 95
Construct a stem-and-leaf plot.
Solution:
First we organise the scores in ascending order:
52, 57, 64, 65, 71, 71, 72, 75, 77, 79, 80, 83, 84, 87, 87, 92, 92, 93, 95, 96
The first digit will form the “stems”, while the second digit will form the “leaves”.
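The construction just described (sort the scores, split each into a tens digit and a units digit) can be sketched in a few lines of Python. The variable names are illustrative.

```python
from collections import defaultdict

scores = [83, 84, 77, 64, 71, 87, 72, 92, 57, 92,
          75, 52, 80, 65, 79, 71, 87, 93, 96, 95]

stems = defaultdict(list)
for score in sorted(scores):
    # Tens digit is the stem, units digit is the leaf
    stems[score // 10].append(score % 10)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```

Sorting first ensures that the leaves on each stem appear in ascending order, which is the usual convention.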
5 |
6 |
7 |
8 |
9 |
1.3 Measures of Central Tendency
1.3.1 Mean
• Given a set of numerical data, $x_1, x_2, \ldots, x_n$,
Mean $= \dfrac{\sum_{i=1}^{n} x_i}{n}$.
• Notation: Population mean: $\mu$ (data set contains the entire population information).
Sample mean: $\bar{x}$ (data set contains only a sample’s information).
Example 1.3.1-1 (Refer to Appendix 1.2)
Given a sample data set: 2, 5, 7, 10, 11, 13, calculate the sample mean.
Solution:
1.3.2 Median
• Median refers to a value such that 50% of the data is below this value and 50% of the data is above this value. It is the middle value of the ordered data set.
• Suppose our data set has n values.
Step 1: Arrange the values in ascending order.
Step 2: If n is odd, then the median is the $\left(\dfrac{n+1}{2}\right)$th number.
If n is even, then the median is the average of the $\left(\dfrac{n}{2}\right)$th and $\left(\dfrac{n}{2} + 1\right)$th numbers.
Example 1.3.2-1
For each of the following data sets, find its median.
(a) 2, 5, 6, 4, 7, 4, 7, 2, 8, 9, 4, 11, 9, 1, 3
(b) 3, 1, 4, 7, 9, 5, 6, 8, 3, 1, 2, 9, 12, 4, 4, 15
Solution:
a) Arrange the numbers in ascending order:
1, 2, 2, 3, 4, 4, 4, 5, 6, 7, 7, 8, 9, 9, 11
Since there are 15 (odd) data values, median = ____ th value =
b) Arrange the numbers in ascending order:
1, 1, 2, 3, 3, 4, 4, 4, 5, 6, 7, 8, 9, 9, 12, 15
Since there are 16 (even) data values, median is the average of ____th and _____ th values =
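The two-step rule for the median translates directly into code. The sketch below applies it to both data sets of this example so that a hand calculation can be checked; the function name `median_by_position` is an illustrative choice.

```python
def median_by_position(values):
    """Median using the rule from Section 1.3.2 (positions counted from 1)."""
    data = sorted(values)
    n = len(data)
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th value
        return data[(n + 1) // 2 - 1]
    # n even: average of the (n / 2)th and (n / 2 + 1)th values
    return (data[n // 2 - 1] + data[n // 2]) / 2


set_a = [2, 5, 6, 4, 7, 4, 7, 2, 8, 9, 4, 11, 9, 1, 3]
set_b = [3, 1, 4, 7, 9, 5, 6, 8, 3, 1, 2, 9, 12, 4, 4, 15]

median_a = median_by_position(set_a)
median_b = median_by_position(set_b)

print(median_a, median_b)
```

The `- 1` adjustments appear only because Python lists are indexed from 0 while the rule counts positions from 1.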
1.3.3 Mode
• Mode refers to the data value that appears most frequently (highest frequency). A set of data can have more than one mode.
Example 1.3.3-1
For each of the following data sets, find the mode.
(a) 2, 5, 6, 4, 7, 4, 7, 2, 8, 9, 4, 11, 9, 1, 3
(b) 2, 3, 5, 9, 6, 4, 7, 4, 7, 2, 8, 9, 2, 9, 1, 3
Solution:
(a) Mode(s) is / are ____________
(b) Mode(s) is / are ____________
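Since a data set may have more than one mode, a check of the answers needs a function that returns every value with the highest frequency. The sketch below uses `statistics.multimode` from the Python standard library (Python 3.8 or later is assumed).

```python
from statistics import multimode

set_a = [2, 5, 6, 4, 7, 4, 7, 2, 8, 9, 4, 11, 9, 1, 3]
set_b = [2, 3, 5, 9, 6, 4, 7, 4, 7, 2, 8, 9, 2, 9, 1, 3]

# multimode returns all values tied for the highest frequency
modes_a = multimode(set_a)
modes_b = multimode(set_b)

print(modes_a, modes_b)
```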
1.4 Measures of Dispersion
1.4.1 Range, Interquartile Range
• Range of a data set = Largest value – Smallest value.
• Quartiles: Q1, Q2, Q3. Q1 refers to the value such that 25 % of the data values are below it. Q2 is the median of the data and Q3 is the value such that 75 % of the data values are below it.
• Interquartile range = Q3 – Q1
• Procedure:
Step 1: Find Q2 (the median).
Step 2: Find Q1 (the median of the data values below Q2).
Step 3: Find Q3 (the median of the data values above Q2).
Step 4: Interquartile range = Q3 – Q1.
Example 1.4.1-1
The annual profits (rounded to millions of dollars) of 12 randomly selected companies in 2013 are as follows:
8 12 7 17 14 45 10 13 17 13 9 11
Find the values of the range, the three quartiles and the interquartile range.
Solution:
Arrange the numbers in ascending order:
7, 8, 9, 10, 11, 12, 13, 13, 14, 17, 17, 45
Range =
Q2 (median) = $\dfrac{12 + 13}{2} = 12.5$
Q1 = median of {7, 8, 9, 10, 11, 12} =
Q3 = median of {13, 13, 14, 17, 17, 45} =
Hence IQR = Q3 – Q1 =
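The four-step procedure for the quartiles can be sketched as follows. The function name `quartiles_by_halves` is illustrative, and the sketch assumes that when n is odd the median itself is left out of both halves, which is how "the data values below Q2" and "above Q2" are read here. Other software may use a different quartile convention and give slightly different values.

```python
from statistics import median


def quartiles_by_halves(values):
    """Q1, Q2, Q3 as medians of the lower half, whole set and upper half."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n, skip the middle value so it belongs to neither half
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)


profits = [8, 12, 7, 17, 14, 45, 10, 13, 17, 13, 9, 11]

q1, q2, q3 = quartiles_by_halves(profits)
data_range = max(profits) - min(profits)
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)
```

Note how the single large value 45 inflates the range but has no effect on the interquartile range.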
1.4.2 Variance and Standard deviation
• Variance is a measure of the spread of the data, in particular how far each data value deviates from the mean.
[Figure: two distributions, one labelled “Higher variance” and one labelled “Lower variance”]
• Notation:
Population variance: $\sigma^2 = \dfrac{\sum x^2}{N} - \mu^2$, where N is the total population size.
Sample variance: $s^2 = \dfrac{1}{n-1}\left[\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}\right]$, where n is the sample size.
• Standard deviation $= \sqrt{\text{variance}}$.
** In this course we will focus on using a scientific calculator to obtain the values of sample mean and variance from a sample data set. **
Example 1.4.2-1 (Refer to Appendix 1.2)
Consider the sample marks for a mathematics class test (out of 10):
1, 5, 4, 2, 6, 2, 1, 1, 5, 3
Calculate the sample standard deviation.
Solution:
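Although the course uses a calculator for this step, the sample variance formula can also be evaluated directly. The sketch below applies it to the marks in this example so the calculator result can be cross-checked; the variable names are illustrative.

```python
from math import sqrt

marks = [1, 5, 4, 2, 6, 2, 1, 1, 5, 3]

n = len(marks)
sum_x = sum(marks)                  # sum of the values
sum_x2 = sum(x * x for x in marks)  # sum of the squared values

# s^2 = [sum(x^2) - (sum x)^2 / n] / (n - 1)
sample_variance = (sum_x2 - sum_x**2 / n) / (n - 1)
sample_sd = sqrt(sample_variance)

print(sum_x, sum_x2, round(sample_variance, 4), round(sample_sd, 4))
```

Dividing by n − 1 rather than n is what distinguishes the sample variance from the population variance.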
Appendix 1.1 Using Excel for Descriptive Statistics
A1.1.1 Create Bar Chart and Histogram
In Excel, go to Options → Add-ins → Analysis ToolPak.
• Refer to Example 1.2.1-1 (Bar Chart)
Step 1: Enter two columns of information, one column for categories and the other
for the frequency.
Step 2: To create a heading for the bar chart, you may key in the heading at the cell above the frequency values’ column.
Step 3: Highlight both columns, proceed to the “Insert” tab, select “Column” and then “Clustered Column”.
• Refer to Example 1.2.1-2 (Histogram)
Step 1: Copy and paste the data values into an Excel spreadsheet. Also copy the upper limits of the class boundaries into a separate column.
Step 2: Proceed to the “Data” tab, click on the “Data Analysis” option and choose
“Histogram”.
Step 3: Under “Input Range”, highlight all the cells containing the data values.
Under the “Bin Range”, highlight the cells containing the upper limits of the class boundaries (including the header).
Also select the option “Chart Output”.
Step 4: Once the diagram is generated, highlight one of the bars and right click to select “Format Data Series”. Set the gap width to 0.
Step 5: You can relabel the class intervals and also the header, etc.
A1.1.2 Calculate mean, median, standard deviation and quartiles
• Refer to Example 1.4.1-1. To calculate the mean of the data:
Step 1: Key in the data values in a column, choose the “Formulas” tab, then “Statistical” and the option “Average”.
Step 2: Highlight the values to be computed.
• To obtain the median, repeat the same steps as above, choose the function “Median” instead of “Average”.
• To obtain the standard deviation, repeat the same steps as above, choose the function “STDEV.P” if the data is the population data or “STDEV.S” if the data comes from a sample.
• To find the quartiles:
Step 1: Arrange the values in ascending order by highlighting the column of data values, right clicking and selecting “Sort” followed by “Smallest to Largest”.
Step 2: Use the “Median” function to find Q2. Now apply the “Median” function again on the set of data values lower than Q2 to obtain Q1. Lastly apply the “Median” function again on the set of data values higher than Q2 to obtain Q3.
Appendix 1.2 Using Calculator to obtain mean and standard deviation
Step 1: To enter data into a list:
• Press “Mode”, then 2: STAT, then 1: 1-Var.
• Key in a number and press “ = ” to input the value.
• If a frequency list is required: press “Shift”, “Setup”, the down arrow, then 4: Frequency.
Step 2: To calculate sample standard deviation
• Exit the data input screen by pressing “AC”.
• Press “Shift”, then 1: STAT, 4: VAR, 4: $s_x$.
Step 3: To calculate sample mean
• Exit the data input screen by pressing “AC”.
• Press “Shift”, then 1: STAT, 4: VAR, 2: $\bar{x}$.
Step 1: To key in data in the calculator
• Press “Mode”, then 1: STAT, then 0: SD.
• Key in a number and press “ DATA ” to store the value in the list.
Step 2: To calculate sample standard deviation and sample mean
• Press “ALPHA” + “5” for sample standard deviation $s_x$.
• Press “ALPHA” + “4” for sample mean $\bar{x}$.
Tutorial 1: Descriptive Statistics
1 Classify each of the variables below by placing a “✓” in the correct categories.
E.g. Flavour of milk Categorical Nominal Ordinal Quantitative Discrete Continuous
(i) Age of a driver (in whole numbers)
Categorical Nominal Ordinal Quantitative Discrete Continuous
(ii) Gender of a driver Categorical Nominal Ordinal Quantitative Discrete Continuous
(iii) Colour of bag Categorical Nominal Ordinal Quantitative Discrete Continuous
(iv) Volume of drink Categorical Nominal Ordinal Quantitative Discrete Continuous
(v) Size of a shirt (S, M, L, XL)
Categorical Nominal Ordinal Quantitative Discrete Continuous
(vi) Number of students
Categorical Nominal Ordinal Quantitative Discrete Continuous
(vii) Examination grades
Categorical Nominal Ordinal Quantitative Discrete Continuous
2 Consider the following 2 sets of sample data
A: 3, 4, 5, 5, 5, 6, 8, 10, 11, 11, 11, 12, 12, 14, 18
B: 3, 4, 5, 5, 5, 5, 6, 8, 10, 11, 11, 11, 12, 12, 14, 18
For each sample, find the following by using the calculator:
(i) mean, (ii) median, (iii) mode, (iv) interquartile range, (v) range, (vi) standard deviation
3 The daily number of Internet system crashes is observed over 30 days at a university computer centre. The daily numbers of crashes are shown below:
1 3 1 1 0 1 0 1 1 0
2 2 0 0 0 1 2 1 2 0
0 1 6 4 3 3 1 2 4 0
Complete the frequency table and compute the mean, median and mode.
Value, x    0  1  2  3  4  5  6
Frequency
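For data summarised in a frequency table, the mean is the sum of (value × frequency) divided by the total frequency. The sketch below tallies the 30 daily counts and applies that idea; the variable names are illustrative and the approach is one of several that would work.

```python
from collections import Counter
from statistics import median

crashes = [
    1, 3, 1, 1, 0, 1, 0, 1, 1, 0,
    2, 2, 0, 0, 0, 1, 2, 1, 2, 0,
    0, 1, 6, 4, 3, 3, 1, 2, 4, 0,
]

frequency = Counter(crashes)
n = sum(frequency.values())

# Mean from the frequency table: sum of (value x frequency) / total frequency
mean = sum(value * count for value, count in frequency.items()) / n
middle = median(crashes)
mode = max(frequency, key=frequency.get)  # value with the highest frequency

for value in range(7):
    print(value, frequency[value])
print(round(mean, 2), middle, mode)
```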
Answers
1
E.g. Flavour of milk: Categorical, Nominal
(i) Age of a driver (in whole numbers): Quantitative, Discrete
(ii) Gender of a driver: Categorical, Nominal
(iii) Colour of bag: Categorical, Nominal
(iv) Volume of drink: Quantitative, Continuous
(v) Size of a shirt (S, M, L, XL): Categorical, Ordinal
(vi) Number of students: Quantitative, Discrete
(vii) Examination grades: Categorical, Ordinal
2
(i) $\bar{x}_A = 9$, $\bar{x}_B = 8.75$
(ii) median A = 10, median B = 9
(iii) mode A = 5, 11; mode B = 5
(iv) IQR A = 7, IQR B = 6.5
(v) range A = range B = 15
(vi) $s_A = 4.28$, $s_B = 4.25$
3
Value, x    0  1   2  3  4  5  6
Frequency   9  10  5  3  2  0  1
(i) mean = 1.43 (ii) median = 1 (iii) mode = 1
Topic 2 : Linear Regression and Correlation
Objectives :
At the end of this lesson, the student should be able to:
1 explain the line of best fit and linear correlation between two variables
2 find the correlation coefficient and the equation of regression line
3 interpret the output from simple linear regression analysis and find the predicted values
Topic 2: Linear Regression and Correlation
2.1 Introduction
• In the previous chapter we have been dealing with data of one variable. In this chapter, we will study data with two variables and the relationship between them.
• Examples of data with two variables:
(a) Class test results vs final exam results,
(b) Blood pressure vs age of a person,
(c) Price of a car vs price of a 3-room HDB flat.
• An overview of this chapter is shown below:
[Overview chart: Scatter diagram, Correlation Coefficient, Linear Regression, Estimation / Prediction]
2.2 Scatter Diagram
• Given pairs of observed data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we can plot them on the x – y axes to obtain a scatter diagram.
• Scatter diagrams are useful as they show visually whether variables X and Y share any special relationship. Some examples of scatter diagrams are shown below:
[Diagrams 2.2.1, 2.2.2, 2.2.3 and 2.2.4: example scatter diagrams]
• Variables X and Y have a positive relationship if Y increases as X increases (i.e. there is an upward trend). Variables X and Y have a negative relationship if Y decreases as X increases (i.e. there is a downward trend).
• Variables X and Y have a linear relationship if the observed values of X and Y can be described using a straight line equation $y = mx + c$.
• Diagrams 2.2.1 and 2.2.3 illustrate a positive and a negative linear relationship between X and Y respectively. Diagram 2.2.2 shows that X and Y have a curvilinear relationship while Diagram 2.2.4 shows that X and Y have no obvious relationship.
Example 2.2-1
For each of the scatter plots shown below, describe whether
(i) the relationship between X and Y is positive or negative,
(ii) the relationship between X and Y is linear, curvilinear or not obvious.
(a) [Scatter plot: Graph of Blood Pressure vs Age; Pressure, y against Age, x]
(b) [Scatter plot: Graph of Grade vs Absence; Grade, y against Absence, x]
Solution:
a) The relationship between X and Y is _____________ and ________________.
b) The relationship between X and Y is _____________ and ________________.
• In some situations we may have outliers (observation points that are distant from the rest) that distort the shape of the scatter plot. We may need to apply a transformation to the data values so that the relationship between the variables is more visible.
Example 2.2-2
We want to investigate the relationship between the body and brain weights of different animals. The scatter diagram of the brain weights (in grams) against the body weights (in kilograms) of 28 animals is shown below:
[Scatter diagram: Graph of Brain Weight vs Body Weight; x-axis: Body Weight (in kilograms); y-axis: Brain Weight (in grams); a distant point is labelled "outlier"]
We observe at least one point that is distant from the rest. It distorts the shape of the scatter diagram, so we may not see any clear relationship between the variables.
We can apply a transformation by taking the logarithm of the observed values of both body and brain weight, and plot the scatter diagram shown below:
[Scatter diagram: Graph of Log Brain Weight vs Log Body Weight; x-axis: Log of Body Weight; y-axis: Log of Brain Weight]
Now we observe that the new variables exhibit a clearer linear relationship.
2.3 Correlation Coefficient
• Sometimes it is difficult to judge by eye from the scatter diagram whether there is indeed a linear relationship between the variables. (Refer to Example 2.2-1.)
• Therefore we need a precise mathematical calculation to determine the degree of linearity in the relationship between two variables. This is done through the Pearson product moment correlation coefficient, r.
• The table below shows how the correlation coefficient r, where $-1 \le r \le 1$, indicates the linear relationship between two variables.
| Strength of linear relationship | Positive | Negative |
| --- | --- | --- |
| Perfect | $r = 1$ | $r = -1$ |
| Very strong | $0.8 \le r < 1$ | $-1 < r \le -0.8$ |
| Strong | $0.4 \le r < 0.8$ | $-0.8 < r \le -0.4$ |
| Weak | $0.2 \le r < 0.4$ | $-0.4 < r \le -0.2$ |
| Little / no relationship | $0 \le r < 0.2$ | $-0.2 < r \le 0$ |
• The examples below show the shape of various scatter diagrams and r values.
[Six scatter diagrams: very strong positive, strong positive and weak positive correlation; very strong negative, strong negative and weak negative correlation]
Source of data: http://www.seeingstatistics.com/seeing1999/resources/opening.html
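The notes obtain r from EXCEL rather than by hand. As a minimal sketch, the standard formula $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$ (not stated in these notes, so treat it as an assumption about how r is computed) can be applied to the GPA data of Example 2.4-1. The function names `pearson_r` and `strength` are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient of paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def strength(r):
    """Strength label for r, following the bands in the table above."""
    size = abs(r)
    if size >= 1:
        return "perfect"
    if size >= 0.8:
        return "very strong"
    if size >= 0.4:
        return "strong"
    if size >= 0.2:
        return "weak"
    return "little / no relationship"

# Data from Example 2.4-1 (high school GPA x, college GPA y).
high_school = [2.7, 3.1, 2.1, 3.2, 2.4, 3.4, 2.6, 2.0, 3.1, 2.5]
college = [2.2, 2.8, 2.4, 3.8, 1.9, 3.5, 3.1, 1.4, 3.4, 2.5]

r = pearson_r(high_school, college)
sign = "positive" if r > 0 else "negative"
print(round(r, 6), strength(r), sign)
```

The value agrees with the "Multiple R" entry of the summary output in Example 2.4-1, with the sign taken from the slope.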
• We can obtain the correlation coefficient of a data set through EXCEL. The
correlation coefficient is obtained from “Multiple R” and the sign of “X variable
coefficient”. The two examples below illustrate how to interpret the summary output
from EXCEL.
Example 2.3-1
For each of the summary outputs below, state the value of the correlation coefficient and describe the relationship between X and Y.
(a)
SUMMARY OUTPUT
Regression Statistics
| Multiple R | 0.896673 |
| R Square | 0.804022 |
| Adjusted R Square | 0.755028 |
| Standard Error | 5.641091 |
| Observations | 6 |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Low 95.0 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Intercept | 81.04809 | 13.88088 | 5.838829 | 0.004289 | 42.50858 | 119.5876 | 42.5 |
| X Variable 1 | 0.964381 | 0.238061 | 4.050984 | 0.015463 | 0.303418 | 1.625344 | 0.30 |
(b)
SUMMARY OUTPUT
Regression Statistics
| Multiple R | 0.944215 |
| R Square | 0.891542 |
| Adjusted R Square | 0.869851 |
| Standard Error | 6.054643 |
| Observations | 7 |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Intercept | 102.4925 | 5.138068 | 19.94768 | 5.85E-06 | 89.28471 | 115.7004 | 89.284 |
| X Variable 1 | -3.62189 | 0.564949 | -6.411 | 0.00137 | -5.07414 | -2.16964 | -5.074 |
Solution:
a) The correlation coefficient, r = ______________. X and Y have a _____________, ____________ linear relationship.
b) The correlation coefficient, r = ______________. X and Y have a _____________, ____________ linear relationship.
2.4 Simple Linear Regression
• If the scatter diagram and the correlation coefficient indicate that two variables share a linear relationship, we model them using a straight line equation and see how one variable (the dependent variable) changes its value according to another variable (the independent variable).
• Some examples of dependent and independent variables:
| Dependent variable Y | Independent variable X |
| --- | --- |
| Blood pressure | Age |
| Sales of cold drinks | Climate temperature |
| Price of house bought | Monthly salary |
| Final exam score | Number of lessons missed |
• Usually we denote the independent variable as X and the dependent variable as Y.
• A linear regression line Y on X is of the form $y = mx + c$, where the values of m and c can be obtained from the EXCEL summary output (refer to Example 2.3-1):
o m = coefficient of X variable,
o c = coefficient of Intercept.
• The equation of the linear regression line (best fit line) is obtained using the principle of least squared error (refer to Appendix 2.2).
Example 2.4-1
The table below shows the high school GPA and the college GPA at the end of the 1st year for 10 different students:
| Student | High School GPA, x | College GPA, y |
| --- | --- | --- |
| 1 | 2.7 | 2.2 |
| 2 | 3.1 | 2.8 |
| 3 | 2.1 | 2.4 |
| 4 | 3.2 | 3.8 |
| 5 | 2.4 | 1.9 |
| 6 | 3.4 | 3.5 |
| 7 | 2.6 | 3.1 |
| 8 | 2.0 | 1.4 |
| 9 | 3.1 | 3.4 |
| 10 | 2.5 | 2.5 |
(a) Using the summary output,
(i) state the correlation coefficient and the relationship between the two variables.
(ii) write the equation of the regression line Y on X.
SUMMARY OUTPUT
Regression Statistics
| Multiple R | 0.843923 |
| R Square | 0.712206 |
| Adjusted R Square | 0.676232 |
| Standard Error | 0.433342 |
| Observations | 10 |
ANOVA
| | df | SS | MS | F | Significance F |
| --- | --- | --- | --- | --- | --- |
| Regression | 1 | 3.717716 | 3.717716 | 19.79767 | 0.002141 |
| Residual | 8 | 1.502284 | 0.187786 | | |
| Total | 9 | 5.22 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | −0.95037 | 0.831773 | -1.14258 | 0.286254 | -2.86844 | 0.967706 |
| X Variable 1 | 1.346999 | 0.302733 | 4.449458 | 0.002141 | 0.648895 | 2.045103 |
[Scatter diagram: College GPA, y against High School GPA, x]
(b) Using the equation of the line in part (aii), find the college GPA if the High School GPA is 3.6.
(c) Using the line of best fit in part (aii), find the High School GPA if the College GPA is 2.3.
Solution:
(ai) The correlation coefficient is _______________. The two variables have a ___________ , ______________ and _____________ relationship.
(aii) The linear regression line Y on X is _______________________.
(b) When $x = 3.6$,
(c) When $y = 2.3$,
Example 2.4-2
The Financial World magazine uses its own complex formula to estimate how much
the following brand names would be worth in cash. The table gives the brand name,
its value in billions of dollars, Y and the company’s revenue in billions, X:
(a) Using the summary output, write the equation of line of best fit.
(b) Using the equation in (a), find the value of the brand name if the company’s revenue is $5 billion, $10 billion and $25 billion.
| Brand Name | Revenue | Value |
| --- | --- | --- |
| Marlboro | 15.4 | 31.2 |
| Coca-Cola | 0.4 | 4.4 |
| Budweiser | 6.2 | 10.1 |
| Pepsi-Cola | 5.5 | 9.6 |
| Nescafe | 4.3 | 8.5 |
| Kellogg | 4.7 | 8.4 |
| Winston | 3.6 | 6.1 |
| Pampers | 4 | 6.1 |
| Camel | 2.3 | 4.4 |
| Campbell | 2.4 | 3.9 |
| Nestle | 6 | 3.7 |
| Hennessy | 0.9 | 3 |
| Heineken | 3.5 | 2.7 |
| Johnnie Walker | 1.5 | 2.6 |
| Louis Vuitton | 0.9 | 2.6 |
| Hershey | 2.6 | 2.3 |
| Guinness | 1.8 | 2.3 |
| Barbie | 0.8 | 2.2 |
| Kraft | 2.8 | 2.2 |
| Smirnoff | 1 | 2.2 |
| Del Monte | 2.3 | 1.6 |
| Wrigley's | 1 | 1.5 |
| Schweppes | 1.3 | 1.4 |
| Tampax | 0.6 | 1.4 |
| Heinz | 0.8 | 1.3 |
| Quaker | 1.1 | 1.2 |
| Colgate | 1.1 | 1.2 |
| Gordon's | 0.6 | 1.1 |
| Hermes | 0.5 | 1 |
| Kleenex | 0.7 | 0.8 |
| Carlsberg | 0.8 | 0.7 |
| Haagen-Dazs | 0.5 | 0.6 |
| Fisher-Price | 0.6 | 0.6 |
| Nivea | 0.9 | 0.6 |
| Sara Lee | 0.8 | 0.5 |
| Oil of Olay | 0.6 | 0.5 |
| Planters | 0.7 | 0.5 |
| Green Giant | 1 | 0.4 |
| Jell-o | 0.3 | 0.4 |
| Band-Aid | 0.2 | 0.2 |
| Ivory | 0.4 | 0.2 |
| Birds Eye | 0.3 | 0.2 |
Source of data: Financial World, August 12, 1992
ANOVA
| | df | SS | MS | F | Significance F |
| --- | --- | --- | --- | --- | --- |
| Regression | 1 | 968.6591 | 968.6591 | 337.9662 | 4.11E-21 |
| Residual | 40 | 114.6457 | 2.866142 | | |
| Total | 41 | 1083.305 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% |
| --- | --- | --- | --- | --- | --- |
| Intercept | -0.55226 | 0.333114 | -1.65787 | 0.105167 | -1.22551 |
| X Variable 1 | 1.819783 | 0.098988 | 18.38386 | 4.11E-21 | 1.61972 |
Solution:
(a) The regression line is: $y$ = _______ + ________ $x$
(b) Substituting the values 5, 10 and 25 in for x and computing the values of y yield
the following predicted values of brand names when the revenues are $5, $10
and $25 billion:
Revenue Predicted value
$5 billion −0.55226 +1.819783 (____) = $___________ billion
$10 billion −0.55226 +1.819783 (____) = $___________ billion
$25 billion −0.55226 +1.819783 (____) = $___________ billion
2.5 Reliability of estimation / prediction
(a) Regression and correlation analysis only attempt to find a relationship between two variables. Even if there is a very strong linear relationship, we cannot conclude any causation between the variables (e.g. if there is a strong positive linear relationship between blood pressure and age, we cannot conclude that age causes high blood pressure).
(b) Given a value of the independent variable X, we say the estimation of Y is reliable if
• the x value is within the data range of X provided, and
• the correlation coefficient r is close to 1 or −1.
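The two reliability conditions can be sketched in code. The example below uses the slope, intercept and r from the summary output of Example 2.4-1, where the observed x values range from 2.0 to 3.4. The cut-off `r_threshold = 0.8` is an assumption borrowed from the "very strong" band in Section 2.3; the notes only say "close to 1 or −1". The function names are illustrative.

```python
def predict(m, c, x):
    """Estimate y from the regression line y = m*x + c."""
    return m * x + c

def is_reliable(x, x_min, x_max, r, r_threshold=0.8):
    """True when x lies inside the observed range and |r| is large.

    r_threshold is an assumed cut-off, not a value given in the notes.
    """
    return x_min <= x <= x_max and abs(r) >= r_threshold

# Values read from the summary output of Example 2.4-1.
m, c, r = 1.346999, -0.95037, 0.843923
x_min, x_max = 2.0, 3.4  # smallest and largest high school GPA in the table

for x in (3.0, 3.6):
    y_hat = predict(m, c, x)
    print(x, round(y_hat, 2), is_reliable(x, x_min, x_max, r))
```

Here x = 3.0 is inside the data range, while x = 3.6 lies outside it, so the second estimate is an extrapolation and should be treated with caution.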
Appendix 2.1 Using EXCEL for Regression and Correlation
A2.1.1 Plot scatter diagram and regression line
• Refer to Example 2.4-1.
Step 1: Highlight the data values for x and y (x in the left column, y in the right column), go to the “Insert” tab and select “Scatter”. You may use the chart tools to add titles or adjust other settings.
Step 2: Select the scatter diagram. Under “Chart Tools”, choose the “Layout” tab and select “Trendline”, then “Linear Trendline”. Under “Trendline Options” you may check the box “display equation on chart” to obtain the equation of the linear regression line.
A2.1.2 Generate Summary Output
• Using Example 2.4-1
Step 1: Ensure you have the Analysis ToolPak installed in Excel (Chapter 1). Under the “Data” tab, select “Data Analysis”, then “Regression”.
Step 2: Highlight the data cells for the X and Y variables respectively. Click “OK” and the summary output will be generated.
Appendix 2.2 Principle of Least Squared Error
• Given a scatter diagram, there can be several possible straight lines to model the linear pattern. To avoid ambiguity, statisticians adopt the “best fit” line, which meets the criterion of having the least squared error value.
• The “dots” on the scatter diagram represent the actual observed values of the variables, while the values on the best fit line are just estimations.
• The errors $e_1, \ldots, e_n$ represent the differences between the actual and estimated values. The line that has the least value of $\sum_{r=1}^{n} e_r^2$ is the best fit line.
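The least squared error principle can be illustrated numerically. The sketch below uses the standard closed-form solution $m = S_{xy} / S_{xx}$, $c = \bar{y} - m\bar{x}$ (a well-known result that these notes do not derive) on the data of Example 2.4-1, and then checks that shifting the line increases the sum of squared errors. The function names are illustrative.

```python
def least_squares_line(xs, ys):
    """Slope m and intercept c of the line minimising the sum of squared errors."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    c = y_bar - m * x_bar
    return m, c

def sum_squared_error(xs, ys, m, c):
    """Sum of e_r^2, where e_r = actual y minus the value estimated by the line."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

# Data from Example 2.4-1.
xs = [2.7, 3.1, 2.1, 3.2, 2.4, 3.4, 2.6, 2.0, 3.1, 2.5]
ys = [2.2, 2.8, 2.4, 3.8, 1.9, 3.5, 3.1, 1.4, 3.4, 2.5]

m, c = least_squares_line(xs, ys)
best = sum_squared_error(xs, ys, m, c)
steeper = sum_squared_error(xs, ys, m + 0.1, c)   # a different candidate line
shifted = sum_squared_error(xs, ys, m, c + 0.1)   # another candidate line
print(round(m, 6), round(c, 5), round(best, 6))
print(best < steeper, best < shifted)
```

The slope and intercept agree with the "X Variable 1" and "Intercept" coefficients in the summary output of Example 2.4-1, and the minimum squared error agrees with the Residual SS in its ANOVA table.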
[Figure: Graph of Blood Pressure vs Age; x-axis: Age, x; y-axis: Blood Pressure, y]
Tutorial 2: Linear Regression and Correlation
A Self Practice Questions
1 Each table gives the summary output of the linear regression analysis of y on x. Write down the correlation coefficient and comment on the relationship between x and y.
(a)
(b)
2 Write down the equation of the regression line in the form $y = mx + c$ for Question 1 (a). Explain what the values of m and c represent.
3 Using your answer to Question 2, estimate the value of y when $x = 20$.
B Discussion Questions
1 A survey is conducted on the relationship between the maximum height in feet of the roller coasters and their top speeds in miles per hour. The scatter diagram and the Excel summary output of the data are given below:
[Scatter diagram: Top Speed vs Height of Roller Coaster; x-axis: Height of Roller Coaster, x; y-axis: Top Speed, y]
SUMMARY OUTPUT
Regression Statistics
| Multiple R | 0.89321 |
| R Square | 0.79782 |
| Adjusted R Square | 0.77760 |
| Standard Error | 6.29004 |
| Observations | 12 |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | 39.06121 | 8.16739 | 4.7826 | 0.0007 | 20.8631 | 57.2593 |
| X Variable 1 | 0.170724 | 0.02718 | 6.2818 | 9.1E-05 | 0.11017 | 0.23128 |
(i) State the correlation coefficient and comment on the relationship between the height of roller coaster (x) and the top speed (y).
(ii) Find the equation of the line of best fit.
(iii) What is the predicted top speed for a new roller coaster of height 325 feet?
(iv) What must be the height of a new roller coaster if it is designed to go at a top speed of 90 miles per hour?
2 A study is carried out to investigate the relationship between the mid-parent’s height and the daughter’s height. Mid-parent’s height is the average of father’s and mother’s heights. The heights of eleven female students and their mid-parent’s heights in inches were collected. The scatter diagram and the Excel summary output of the data are given below:
[Scatter diagram: Daughter's Height vs Mid-parent's Heights; x-axis: Mid-parent's Heights (inches), with tick labels 65, 67, 68, 69, 70, 71; y-axis: Daughter's Heights (inches)]
SUMMARY OUTPUT
Regression Statistics
| Multiple R | 0.8504 |
| R Square | 0.7232 |
| Adjusted R Square | 0.6924 |
| Standard Error | 1.4506 |
| Observations | 11.000 |

| | Coefficients | Standard Error | t Stat |
| --- | --- | --- | --- |
| Intercept | 1.6497 | 13.363 | 0.1235 |
| X Variable 1 | 0.9555 | 0.1971 | 4.8487 |
(a) State the correlation coefficient and comment on the relationship between mid-parent’s height (x) and daughter’s height (y).
(b) Find the equation of the line of best fit, $y = a + bx$.
(c) Predict the daughter’s height if the mid-parent’s height is 69 inches.
(d) Briefly state the physical significance of the coefficients a and b.
(e) Comment on the reliability of the estimate of daughter’s height when the mid-parent’s height is 73 inches.
3 In an experiment involving two chemicals x and y, a researcher recorded observations of values of y for controlled values of x. The summary output and scatter diagram are shown below:
[Scatter diagram of y against x; values marked on the axes: 4.4, 12.3, 2.7, 6.3]
(a) Explain whether a linear model is appropriate.
(b) The researcher realised that some of the observations came from contaminated materials. He then considered only the seven pairs of observations for which the values of x exceeded 6 and discarded the other observations.
(i) On the scatter diagram, circle the data points that are to be removed.
(ii) The researcher proposed two models for the remaining seven pairs of data:
Model A: $y = ax^2 + b$, correlation coefficient $r = -0.912961$
Model B: $y = a \ln x + b$, correlation coefficient $r = -0.970794$
where a and b are constants.
State which model is a better choice, giving a reason for your choice.
(iii) Hence, using the better model with $a = -2.69$, $b = 11.0$, estimate the value of x when y = 6.1. For this model, comment on the validity of this estimated value.
Answers
A1 (a) $r = -0.972$ (b) $r = 0.948$
A2 $y = -0.0636x + 22.7$
A3 estimated value of $y = 21.4$
B1 (i) $r = 0.893$ (ii) $y = 0.171x + 39.1$ (iii) 94.5 mph (iv) 298 feet
B2 (a) $r = 0.850$ (b) $y = 0.956x + 1.65$ (c) 67.6 inches
B3 (bi) (4.4, 3.2), (5.1, 2.7) (bii) Model B (biii) estimated $x = 6.18$
Topic 3 : Principles of Counting
Objectives :
At the end of this lesson, the student should be able to:
1 understand the counting principles
2 explain the difference between permutations and combinations
3 use permutations and combinations in counting problems
3.1.1 Multiplication Principle
• Suppose that the hardware you wish to create consists of three parts: (a) a battery,
(b) a LED screen and (c) a memory card. There are two types of batteries, three
types of LED screens and two types of memory cards available. How many
products of different specifications can you create?
In total there are _____________ specifications of the product we can create.
I.e. ______ x _______ x _________ = 12 ways.
[Tree diagram: Battery A and Battery B each branch into LED 1, LED 2 and LED 3; each LED branches into Card 1 and Card 2, giving Spec 1 to Spec 12]
• Suppose a counting event can be broken down into n stages. If there are $m_1$ ways for stage 1, $m_2$ ways for stage 2, …, $m_n$ ways for stage n, then by the multiplication principle there are a total of $m_1 \times m_2 \times \cdots \times m_n$ ways.
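As a small illustration of the multiplication principle, the hardware example above can be enumerated directly. This is only a sketch: the lists mirror the labels used in the tree diagram, and the enumeration is compared with the product of the stage counts.

```python
from itertools import product
import math

batteries = ["Battery A", "Battery B"]
screens = ["LED 1", "LED 2", "LED 3"]
cards = ["Card 1", "Card 2"]

# Every specification is one choice from each stage.
specs = list(product(batteries, screens, cards))

# Multiplication principle: multiply the number of ways at each stage.
count_by_rule = math.prod(len(stage) for stage in (batteries, screens, cards))

print(len(specs), count_by_rule)
print(specs[0])
```

Both counts come to 12, matching the twelve specifications at the ends of the tree diagram.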
Example 3.1.1-1
A female student is preparing for her graduation dance party. Her wardrobe contains 4 blouses, 7 skirts, 6 pairs of shoes, 3 sets of jewellery and 5 handbags. Assuming that all the items can be matched in terms of colours and styles, in how many possible ways could she dress herself up?
Solution:
Example 3.1.1-2
A typical PIN (personal identification number) consists of any four letters followed by two numeric digits. How many different PINs are possible if
(a) repetition of letters and digits is allowed;
(b) such repetition is not allowed.
Solution:
Example 3.1.2-3
The diagram below shows the keypad for an automatic teller machine. The same sequence of keys represents a variety of different PINs. For instance, 2133, AZDE and BQ3F are all keyed in exactly the same way.
1 QZ    2 ABC   3 DEF
4 GHI   5 JKL   6 MNO
7 PRS   8 TUV   9 WXY
        0
(a) How many different PINs are represented by the same sequence of keys as 2133?
(b) How many different PINs are represented by the same sequence of keys as 6809?
Solution:
3.1.2 Addition Principle
• Suppose you are considering buying a laptop from one of three brands: Acer, Lenovo and Dell. Acer, Lenovo and Dell have three, five and two different models available respectively. How many choices do you have for your laptop?
Case 1: Buy from Acer: _________ choices
Case 2: Buy from Lenovo: __________ choices
Case 3: Buy from Dell: _________ choices
In total there are __________________________ choices.
• Suppose a counting event can be broken down into n non-overlapping cases, with $m_1$ ways for case 1, $m_2$ ways for case 2, …, $m_n$ ways for case n. Then by the addition principle, there are a total of $m_1 + m_2 + \cdots + m_n$ ways.
3.2 Permutation
• Permutation involves the arrangement of objects, where switching their positions gives a different outcome.
• Examples include:
(a) Arranging letters / digits to form words or codes,
(b) Arranging people in a formation to take photographs.
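A short sketch may help make "order matters" concrete. It enumerates arrangements of a hypothetical set of four letters, and then counts arrangements of words with repeated letters using the $n! / (k_1!\,k_2! \cdots k_m!)$ formula from the summary that follows. The words PAPAYA and PERMUTATIONS are taken from Tutorial 3, Question A3; the helper name `distinct_arrangements` is illustrative.

```python
from collections import Counter
from itertools import permutations
from math import factorial, perm

letters = ["A", "B", "C", "D"]  # hypothetical distinct objects

# Switching positions gives a different outcome: AB and BA are both counted.
pairs = list(permutations(letters, 2))
print(len(pairs), perm(4, 2))          # arrange 2 of 4 distinct objects
print(len(list(permutations(letters))), factorial(4))  # arrange all 4

def distinct_arrangements(word):
    """Arrangements of all letters when some letters are identical."""
    total = factorial(len(word))
    for k in Counter(word).values():
        total //= factorial(k)  # divide out each group of identical letters
    return total

print(distinct_arrangements("PAPAYA"))
print(distinct_arrangements("PERMUTATIONS"))
```

The last two values agree with the answers listed for Tutorial 3, Question A3.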
• A summary of the various scenarios on permutation is shown below:
Given n objects, arrange r of them, $0 \le r \le n$:
| Scenario | Number of arrangements | Examples |
| --- | --- | --- |
| n objects all distinct | ${}^nP_r = \dfrac{n!}{(n-r)!}$; in particular ${}^nP_n = n!$ | Example 3.2-1, 3.2-2 |
| r = n, n objects not all distinct ($k_1$ objects identical of type 1, $k_2$ objects identical of type 2, …, $k_m$ objects identical of type m) | $\dfrac{n!}{k_1!\,k_2! \cdots k_m!}$ | Example 3.2-3 |
Example 3.2-1
A class of 10 students consists of 6 men and 4 women.
(a) In how many ways can all of them arrange themselves in a row?
(b) In how many ways can we arrange 6 of them in a row?
Solution:
Example 3.2-2
A biologist has decided to use colours to label the collection of cell specimens in the
laboratory. If he has 5 colours (red, blue, green, yellow and pink) to choose from, how
many 3-colour codes can he make with no repetitions of each colour selected?
Solution:
Example 3.2-3
(a) How many ways can we arrange all the letters in the word “RANDOM”?
(b) How many ways can we arrange all the letters in the word “ENGINEERING”?
Solution:
3.3 Combination
• Combination involves the selection of objects, where the order of the objects does not matter.
• Examples include:
(a) Forming a team of 5 people out of 10 people,
(b) Choosing 5 balls from a box full of balls in various colours.
• Given n distinct objects, select r of them, $0 \le r \le n$. Number of selections:
${}^nC_r = \dfrac{n!}{(n-r)!\,r!}$
• Alternate notation: $C^n_r$ or $\binom{n}{r}$.
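The formula above can be checked against a direct enumeration. In this sketch the five people are hypothetical, and the last two counts reproduce the figures quoted later in Example 4.2-3 for a four-member team chosen from 6 men and 5 women (more men than women means 4 men, or 3 men and 1 woman).

```python
from itertools import combinations
from math import comb, factorial

def n_choose_r(n, r):
    """Number of selections of r objects from n distinct objects."""
    return factorial(n) // (factorial(n - r) * factorial(r))

people = ["P1", "P2", "P3", "P4", "P5"]  # hypothetical distinct objects

# Order does not matter: (P1, P2) appears once, (P2, P1) is not listed again.
pairs = list(combinations(people, 2))
print(len(pairs), n_choose_r(5, 2), comb(5, 2))

# Four-member team from 6 men and 5 women.
no_restriction = comb(11, 4)
more_men = comb(6, 4) + comb(6, 3) * comb(5, 1)
print(no_restriction, more_men)
```

The second line of output matches the counts 330 and 115 used in Example 4.2-3.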
Example 3.3-1
3 different species of orchid are to be selected from 20 unique species for cross-breeding. How many possible selections can be made?
Solution:
Example 3.3-2
The manager of a marketing department wants to form a four-person committee from the 15 employees in the department. In how many ways can the manager form this committee?
Solution:
3.4 More counting examples
Example 3.4-1 (Cases)
A four-member research team is to be chosen from 6 men and 5 women.
(a) How many teams can be formed if there are no restrictions?
(b) How many teams can be formed if there must be more men than women?
Solution:
Example 3.4-2
Eleven cards each bear a letter, and together they can be made to spell the word
“EXAMINATION”. Three cards are selected from the eleven cards and the order of
selection is not important. Find how many selections can be made
(i) if the three cards all bear different letters,
(ii) if two of the three cards bear the same letter.
Solution:
Example 3.4-3 (Complement & Slot – in Method)
In how many ways can the letters of the word “EXCELLENCE” be arranged if
(i) the four E’s are not all together,
(ii) the four E’s are all separated.
Solution:
Tutorial 3: Counting
A Self Practice Questions
Permutation
1 Using the letters from the word COMPUTER, find
(i) the number of words that can be formed using all the letters,
(ii) the number of 4 – letter words that can be formed.
2 Find the number of 3-digit PIN codes that can be formed using the digits 1, 2, 3, 4, 5, 6 if
(i) no repetitions are allowed,
(ii) repetitions are allowed.
3 In how many distinguishable ways can the letters in the following words be arranged?
(a) PAPAYA (b) PERMUTATIONS
Combination
4 Space shuttle astronauts each consume an average of 3000 calories per day. One meal normally consists of a main dish, a vegetable dish, and two different desserts. The astronauts can choose from 10 main dishes, 8 vegetable dishes, and 13 desserts. How many different meals are possible?
5 In a class of 20 people there are 13 girls and 7 boys. Find the number of ways to form a committee of 8 members if
(i) there are no restrictions,
(ii) the committee is made up of all girls,
(iii) there is exactly 1 boy in the committee,
(iv) there are fewer than 2 boys in the committee.
6 In a box there are 3 green, 5 red, 7 yellow and 6 blue balls. The balls are identical except for the colours. Find the number of ways to select 2 balls of different colours.
B Discussion Questions
1 (a) How many different ways can three of the letters of the word BYTES be chosen and written in a row?
(b) How many different ways can this be done if the first letter must be “B”?
2 Janet has 10 different books that she is going to put on her bookshelf. Of these, 4 are Chemistry books, 3 are Biology books, 2 are Statistics books, and 1 Physics book. Janet wants to arrange her books so that all the books dealing with the same subject are together on the shelf. How many different arrangements are possible?
3 In how many ways can three distinct letters and two distinct digits be arranged if
(i) there is no restriction, (ii) the letters must come first, (iii) the digits must always be together.
4 Find the number of distinguishable ways the word STATISTICS can be arranged
(i) without conditions, (ii) if the letter “T”s must be together,
(iii) if no two “T”s are together.
5 A sample of 5 mice is to be chosen from 7 male and 6 female mice. In how many ways can the sample be selected if it must have at least 2 male and 1 female mice?
6 A shipment of 10 microwave ovens contains two defective units. In how many ways can a restaurant buy three of these units and receive
(a) no defective units? (b) one defective unit? (c) at least two non-defective units?
7 Four sales representatives for a company are to be chosen to participate in a training program. The company has eight sales representatives, two in each of four regions. In how many ways can the four representatives be chosen if
(a) there are no restrictions? (b) the selection must include a sales representative from each region? (c) the selection must be from only two of the four regions?
8 There are 10 students who are going to spend the evenings in 2 groups; one group goes to the Library and the other plays football. In how many ways can the group for football be selected if there must be at least 4 people in each group?
Answers
A1 i 40320 ii 1680
A2 i 120 ii 216
A3 a 60 b 239 500 800
A4 6240
A5 i 125970 ii 1287 iii 12012 iv 13299
A6 161
B1 a 60 b 12
B2 6912
B3 i 120 ii 12 iii 48
B4 i 50400 ii 3360 iii 23520
B5 1155
B6 a 56 b 56 c 112
B7 a 70 b 16 c 6
B8 672
Topic 4 : Probability
Objectives :
At the end of this lesson, the student should be able to:
1 describe probability experiments
2 calculate classical and conditional probabilities
3 distinguish between independent and dependent events
4 apply the multiplication rule
5 identify mutually exclusive events
6 apply the addition rule
4.1 Sample space and events
• Suppose we throw a die once and look at the number that appears on the top face.
The set of all possible outcomes is {1, 2, 3, 4, 5, 6}. Suppose we are interested
in getting an even number, then the set of outcomes of this event is {2, 4, 6}.
• The sample space is the set of all possible outcomes in an experiment.
• An event is a subset of the sample space.
Example 4.1-1
An unbiased coin is tossed three times. List the sample space and the event in which there is exactly one head.
Solution:
4.2 Probability of single events
• Probability is a measure of how likely an event will happen in an experiment.
• In an experiment, if each outcome in the sample space is equally likely to happen:
$P(\text{Event}) = \dfrac{\text{no. of outcomes in event}}{\text{no. of outcomes in sample space}}$
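The formula can be applied directly to the die example of Section 4.1. This is a minimal sketch that assumes all six faces are equally likely; the helper name `probability` is illustrative, and `Fraction` is used so the result stays exact.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}                  # one throw of a die
even = {x for x in sample_space if x % 2 == 0}     # the event {2, 4, 6}

def probability(event, sample_space):
    """Classical probability for equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

p_even = probability(even, sample_space)
print(sorted(even), p_even)
```

Three of the six outcomes are even, so the probability reduces to 1/2.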
Example 4.2-1 (List of outcomes)
Following Example 4.1-1, calculate the probability that there is exactly one head.
Solution:
Example 4.2-2 (Frequency Table)
The table below shows the gender and blood pressure categories of 300 participants.
| Blood Pressure | Female | Male | Row Total |
| --- | --- | --- | --- |
| Normal | 39 | 25 | 64 |
| Pre-hypertension | 61 | 50 | 111 |
| High Stage 1 | 42 | 47 | 89 |
| High Stage 2 | 20 | 16 | 36 |
| Column Total | 162 | 138 | 300 |
A participant is randomly chosen. Calculate the probability that
(a) the participant is male,
(b) the participant has high stage 2 blood pressure.
Solution:
Example 4.2-3 (Counting)
A four-member research team is to be chosen from 6 men and 5 women. What is the
probability that the team formed has more men than women?
(Refer to Example 3.4-1)
Solution:
Number of teams without restriction = 330
Number of teams with more men than women = 115
P (team has more men than women) =
4.3 Probability involving multiple events
4.3.1 Complement Event
• Given an event E, its complement event, $E'$, is the set of outcomes in the sample space that are not in E.
• $P(E) = 1 - P(E')$
Example 4.3.1-1
Referring to Example 4.2-3, find the probability that the team formed has at most two
men.
Solution:
Let E = { team of 4 has at most two men }. Observe that E’ = { team of 4 has more men
than women }.
$\therefore P(E) =$
4.3.2 Intersection and union of two events
• Let A, B be two events. Their intersection, known as “A and B” (notation: $A \cap B$), refers to the set of outcomes that are common to both A and B.
• The union of A and B, known as “A or B” (notation: $A \cup B$), refers to the set of outcomes that are in either A or B.
• Addition formula: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
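The addition formula can be verified on a small sample space. In this sketch the events A ("even number") and B ("number greater than 3") on one throw of a fair die are hypothetical choices made for illustration; Python's set operators `&` and `|` play the roles of intersection and union.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # hypothetical event: even number
B = {4, 5, 6}   # hypothetical event: number greater than 3

def prob(event):
    """Classical probability, assuming equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

p_union_direct = prob(A | B)                        # count outcomes in A or B
p_union_formula = prob(A) + prob(B) - prob(A & B)   # addition formula
print(p_union_direct, p_union_formula)
```

Subtracting $P(A \cap B)$ removes the outcomes 4 and 6, which would otherwise be counted twice.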
Example 4.3.2-1
Refer to the table in Example 4.2-2, find the probability that
(i) a participant is a female and has high stage 2 blood pressure,
(ii) a participant is a male or has pre-hypertension.
Solution:
4.3.3 Mutually exclusive events
• Two events A and B are mutually exclusive if they share no common outcome, i.e. $P(A \cap B) = 0$.
Example 4.3.3-1
Suppose we draw a card from a standard deck of poker cards. Find the probability that the card is a “4” or an ace.
Solution:
4.3.4 Conditional events
• The conditional event A given B (notation: $A \mid B$) refers to the event that A will occur based on the knowledge that B has occurred.
• For example, we draw two cards from a deck of 52 poker cards without replacement. Let B be the event that the first card is the ace of hearts, and A be the event that the second card drawn is an ace. Then $A \mid B$ = {ace of spades, ace of diamonds, ace of clubs}.
• $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$.
Example 4.3.4-1
Two ordinary dice are thrown. Let A be the event that the numbers shown on both dice are equal, and B be the event that the sum of the two numbers is 8. Calculate $P(A \mid B)$ and $P(B \mid A)$.
Solution:
A = { (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6) }
B = { (2, 6), (6, 2), (3, 5), (5, 3), (4, 4) }
$A \cap B$ = { (4, 4) }
$P(A \mid B) =$
$P(B \mid A) =$
4.3.5 Independent Events
• Two events A and B are independent if the probability of one event occurring does not affect the probability of the other event occurring.
• Mathematically, A and B are independent if and only if either condition holds:
(a) $P(A \cap B) = P(A)\,P(B)$
(b) $P(A \mid B) = P(A)$
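Conditions (a) and (b) can be tested by enumeration. The sketch below reuses the two-dice events of Example 4.3.4-1 (A: both numbers equal, B: sum is 8), computes the conditional probabilities from the definition in Section 4.3.4, and then checks condition (a). The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of throwing two dice.
sample_space = set(product(range(1, 7), repeat=2))
A = {(a, b) for (a, b) in sample_space if a == b}       # numbers are equal
B = {(a, b) for (a, b) in sample_space if a + b == 8}   # sum is 8

def prob(event):
    return Fraction(len(event), len(sample_space))

def conditional(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(event & given) / prob(given)

p_a_given_b = conditional(A, B)
p_b_given_a = conditional(B, A)
independent = prob(A & B) == prob(A) * prob(B)   # condition (a)
print(p_a_given_b, p_b_given_a, independent)
```

Since $P(A \mid B)$ differs from $P(A)$, knowing that the sum is 8 changes the chance of a double, so these two events are not independent.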
Example 4.3.5-1
The probability of a successful appendicitis operation is 98%. Find the probability that
(i) out of three operations, all are successful.
(ii) out of two operations, at least one is unsuccessful.
Solution:
Assume that the outcomes of the operations are independent of each other.
4.4 Tree diagram and multiplication rule
• Recall from Section 4.3.4: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$. We can rearrange the terms to obtain
$P(A \cap B) = P(A \mid B) \times P(B)$
This is also known as the multiplication rule.
• A tree diagram is useful for calculating probabilities in experiments that happen in stages, with events occurring one after another. For example, we may want to calculate the probability of whether a person smokes followed by whether he or she has lung cancer.
• An example of a tree diagram looks like this:
[Tree diagram: the first stage branches into A with probability $P(A)$ and A' with probability $P(A')$; from A the second stage branches into B with probability $P(B \mid A)$ and B' with probability $P(B' \mid A)$; from A' it branches into B with probability $P(B \mid A')$ and B' with probability $P(B' \mid A')$]
We can use the tree diagram to compute various probabilities such as
$P(A \cap B) = P(A) \times P(B \mid A)$
$P(B) = P(A \cap B) + P(A' \cap B) = P(A) \times P(B \mid A) + P(A') \times P(B \mid A')$
(i.e. add up all the “branches” leading to event B).
Example 4.4.1
15% of Singaporean adults smoke cigarettes. It is found that 62% of the smokers and 12% of the non-smokers develop lung problems by age 60.
(a) Find the probability that a randomly selected 60-year-old adult has a lung problem.
(b) Given that a randomly selected 60-year-old adult has a lung problem, what is the probability that he smokes?
Solution:
(a) $P(\text{lung problem}) =$
(b) $P(\text{smokes} \mid \text{lung problem}) =$
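The two parts of this example follow the "add up the branches" rule and the conditional probability formula, and can be checked with a short sketch. The variable names are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

p_smokes = Fraction(15, 100)
p_lung_given_smokes = Fraction(62, 100)
p_lung_given_not = Fraction(12, 100)

# Multiplication rule along each branch that ends in "lung problem".
branch_smoker = p_smokes * p_lung_given_smokes
branch_non_smoker = (1 - p_smokes) * p_lung_given_not

# (a) Add up all the branches leading to "lung problem".
p_lung = branch_smoker + branch_non_smoker

# (b) Conditional probability: P(smokes and lung problem) / P(lung problem).
p_smokes_given_lung = branch_smoker / p_lung

print(float(p_lung), float(p_smokes_given_lung))
```

Part (b) divides the smoker branch by the total from part (a), which is why the tree diagram is usually completed before attempting it.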
[Tree diagram: smokes (0.15) branches into lung problem (0.62) and no lung problem (0.38); doesn’t smoke (0.85) branches into lung problem (0.12) and no lung problem (0.88)]
Example 4.4-2
Machines A, B and C make components. Machine A makes 20% of the components, machine B makes 30% of the components and machine C makes the rest. The probability that a component is faulty is 0.07 when it is made by machine A, 0.06 when made by machine B and 0.05 when made by machine C. A component is picked at random. Calculate the probability that the component is
(a) made by machine A and is faulty.
(b) made by machine B given that it is faulty.
Solution:
4.5 More probability examples (Self Practice)
Example 4.5-1
The blood samples given by donors over one week were catalogued according to the types of blood, including the positive and negative Rhesus factor. The 2 by 4 matrix of Rhesus factor against blood type is given below:
| Rhesus Factor \ Blood Type | O | A | B | AB | Total |
| --- | --- | --- | --- | --- | --- |
| Positive | 156 | 139 | 37 | 12 | 344 |
| Negative | 28 | 25 | 8 | 4 | 65 |
| Total | 184 | 164 | 45 | 16 | 409 |
Find the probability that a randomly selected donor has
(i) type A blood,
(ii) positive Rhesus factor,
(iii) type A blood and positive Rhesus factor,
(iv) type O blood or negative Rhesus factor,
(v) type B blood given that it is positive Rhesus factor,
(vi) positive Rhesus factor given that it is type AB blood.
Solution:
Ans: (i) $\frac{164}{409}$, (ii) $\frac{344}{409}$, (iii) $\frac{139}{409}$, (iv) $\frac{221}{409}$, (v) $\frac{37}{344}$, (vi) $\frac{3}{4}$
Example 4.5-2
A, B and C are three random events. A and B are mutually exclusive, and A and C are
independent. $P(A) = \frac{1}{5}$, $P(B) = \frac{1}{10}$, $P(A \text{ or } C) = \frac{7}{15}$ and $P(B \text{ or } C) = \frac{23}{60}$ are given.
(a) Find $P(A \text{ or } B)$, $P(C)$ and $P(B \text{ and } C)$.
(b) State whether B and C are independent.
Solution:
(a) A and B are mutually exclusive $\Rightarrow P(A \cap B) =$
$P(A \text{ or } B) =$
A and C are independent $\Rightarrow P(A \cap C) = P(A) \cdot P(C)$
(b) $P(B \text{ and } C) =$
$P(B) \cdot P(C) =$
Ans: (a) $P(A \text{ or } B) = \frac{3}{10}$, $P(C) = \frac{1}{3}$, $P(B \text{ and } C) = \frac{1}{20}$, (b) not independent
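The chain of reasoning in Example 4.5-2 can be traced step by step with exact fractions. This is only a sketch of one route to the stated answers; the rearranged formulas in the comments follow from the addition rule and the definition of independence.

```python
from fractions import Fraction

# Given values from Example 4.5-2.
p_a, p_b = Fraction(1, 5), Fraction(1, 10)
p_a_or_c, p_b_or_c = Fraction(7, 15), Fraction(23, 60)

# A and B are mutually exclusive, so P(A and B) = 0 and P(A or B) = P(A) + P(B).
p_a_or_b = p_a + p_b

# A and C are independent, so P(A or C) = P(A) + P(C) - P(A)P(C).
# Rearranged for P(C): P(C) = (P(A or C) - P(A)) / (1 - P(A)).
p_c = (p_a_or_c - p_a) / (1 - p_a)

# Addition rule rearranged: P(B and C) = P(B) + P(C) - P(B or C).
p_b_and_c = p_b + p_c - p_b_or_c

# B and C are independent only if P(B and C) equals P(B)P(C).
independent = (p_b_and_c == p_b * p_c)

print(p_a_or_b, p_c, p_b_and_c, independent)
```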
Tutorial 4: Probability
A Self Practice Questions
1 A quiz has 3 true/false questions. Suppose you are randomly selecting the answers and have an equal chance of being correct for each question. Let CCW indicate that you were correct on the first two questions and wrong on the third.
(a) List the sample space.
(b) List the possible outcomes with at least two questions answered correctly.
2 A pair of unbiased dice is tossed. Find the probability of getting
(i) a total of 7; (ii) at most a total of 6.
3 Given that $P(A) = \frac{3}{7}$, $P(B) = \frac{1}{3}$ and $P(A \cap B) = \frac{1}{9}$. Find
(i) $P(A')$ (ii) $P(A \cup B)$ (iii) $P(A' \cap B')$ (iv) $P(A \cap B')$ (v) $P(A \mid B)$
4 A group of files in a medical clinic classifies the patients by gender and by type of diabetes (I or II). The groupings may be shown as follows. The table gives the number in each classification.
            Type of Diabetes
Gender        I     II
Male         25     20
Female       35     20
If one file is selected at random, find the probability that the individual is
(a) female.
(b) Type II.
(c) Type II, given that the patient is a male.
(d) Are the events “Type II” and “a male” independent?
(e) Are the events “Type I” and “a female” mutually exclusive?
5 A study showed that one out of every ten women will get breast cancer. Among
those who do, one out of four will die of it.
(i) Complete the tree diagram below.
[Tree diagram to complete: first-stage branches “Has breast cancer” (probability 1/10) and “Does not have breast cancer”; second-stage branches “Dies from cancer” and “Does not die from cancer”]
(ii) Calculate the probability that a randomly chosen woman gets breast cancer and does not die of it.
B Discussion Questions
1 In a group of 10 persons, 4 have a type A personality and 6 have a type B personality. If two persons are selected at random from this group, what is the probability that the two will have different personality types?
2 If 3 books are picked at random from a shelf containing 6 novels, 5 cook books and 1 computer book, what is the probability that (a) the computer book is selected? (b) 2 novels and 1 cook book are selected?
3 In a road show, the compere holds a bag containing 4 movie tickets and 6 concert tickets. 4 tickets are to be drawn at random and given away to 4 lucky winners on stage. Find the probability that (a) all 4 drawn are concert tickets. (b) the 4 tickets are not all of the same type. (c) at least 2 movie tickets are drawn.
4 Independent events A and B are such that $P(A) = P(B) = p$ and $P(A \cup B) = \frac{5}{9}$.
Find p and $P(A \cap B)$.
5 Events A and B are such that $P(A) = \frac{1}{3}$, $P(B \mid A) = \frac{1}{4}$ and $P(A' \cap B') = \frac{1}{6}$. Find
(i) $P(A \cup B)$,
(ii) $P(B)$.
6 The probability that a family owns a car is 0.48, that it owns a 5-room flat is 0.35, and that it owns both a car and a 5-room flat is 0.21. What is the probability that a randomly selected family owns a car or a 5-room flat?
7 1000 people were randomly selected and asked whether they are right-handed or left-handed. The following table shows the result of the survey:
                Men   Women
Left-handed      63      50
Right-handed    462     425
(a) A person is selected at random from the sample. Find the probability that the person is
(i) left-handed or a woman;
(ii) right-handed or a man;
(iii) not right-handed, given that the person is a man;
(iv) a right-handed woman.
(b) Are the events “being right-handed” and “being a woman” mutually exclusive? Explain.
8 Two thousand randomly selected adults were asked if they think they are better off financially than their parents. The following table gives the two-way classification of the responses based on the education levels of the adults and whether they are financially better off, the same, or worse off than their parents.
              Primary   Secondary   Tertiary
Better off      140        450         420
Same             60        250         110
Worse off       200        300          70
Suppose one adult is selected at random from these 2000 adults. Find the probability that the adult is
(i) better off and has secondary education,
(ii) not the same financially,
(iii) worse off or has primary education,
(iv) not better off given secondary education.
9 The table below shows the results of a survey of the 120 cars in a carpark, in which the colour of each car and the gender of the driver were recorded.
          Male   Female
Green      18      12
Blue       48      22
Red         6      14
One of the cars is selected at random.
M is the event that the car selected has a male owner.
G is the event that the car selected is green.
B is the event that the car selected is blue.
R is the event that the car selected is red.
Find the following probabilities:
(i) $P(M \cup B)$, (ii) $P(M \mid R')$.
(iii) Determine whether the events M and G are independent, justifying your answer.
10 A shipment of two boxes, each containing 6 calculators is received by a store. Box 1 contains one defective calculator and box 2 contains two defective calculators. After the boxes are unpacked, a calculator is selected and found to be defective. Find the probability that it came from box 2.
11 A certain virus infects 0.5 % of the population. A test will be positive 80% of the
time if the person has the virus and 5 % of the time if the person does not have the virus. Suppose A is the event “the person is infected” and B is the event “the person tests positive”.
(a) Draw a tree diagram to show the outcomes of the tests.
(b) Find the probability that
(i) the person is infected and is tested positive,
(ii) the person is tested positive.
12 Two children, Tan and Mui, are each to be given a pen from a box containing 3 red pens and 5 blue pens. One pen is chosen at random and given to Tan. A green pen is then put in the box. A second pen is chosen at random from the box and given to Mui.
(i) Draw a tree diagram to represent the possible outcomes.
(ii) Find the conditional probability that Mui’s pen is blue, given that Tan’s pen is red.
(iii) Find the probability that Mui’s pen is red.
(iv) Find the conditional probability that Tan’s pen is red, given that Mui’s pen is blue.
Answers
A1 i { CCC, CCW, CWC, WCC, CWW, WCW, WWC, WWW }
ii { CCC, CCW, CWC, WCC }
A2 i 1/6 ii 5/12
A3 i 4/7 ii 41/63 iii 22/63 iv 20/63 v 1/3
A4 a 11/20 b 2/5 c 4/9 d No e No
A5 ii 3/40
B1 8/15
B2 a 1/4 b 15/44
B3 a 1/14 b 97/105 c 23/42
B4 p = 1/3; P(A ∩ B) = 1/9
B5 i 5/6 ii 7/12
B6 0.62
B7 ai 269/500 aii 19/20 aiii 3/25 aiv 17/40 b No
B8 i 9/40 ii 79/100 iii 77/200 iv 11/20
B9 i 47/60 ii 33/50 iii Independent
B10 2/3
B11 bi 0.004 bii 0.05375
B12 ii 5/8 iii 21/64 iv 3/7
Topic 5 : Discrete Probability Distribution
Objectives :
At the end of this lesson, the student should be able to:
1 define random variables
2 distinguish between discrete and continuous random variables
3 define a discrete probability distribution
4 compute the mean, variance and standard deviation of a discrete random variable
Topic 5: Discrete Probability Distribution
5.1 Random variable
• A variable is an alphabetical representation of a quantity that can take on various numerical values.
• A random variable, usually denoted by X, Y, …, is a variable that takes on different values due to a random phenomenon (or by chance).
• A random variable can be discrete or continuous (refer to Chapter 1).
Example 5.1-1
(a) A coin is tossed three times.
If X is a random variable representing the number of heads, then
$X = 0, 1, 2, 3$
There can be no head, 1 head, 2 heads or 3 heads in the three tosses.
(b) Suppose Y is a random variable representing the time a salesperson
spends on making calls per day.
The time spent on making calls can be any value (e.g. 2.4 minutes, 49.5
minutes, etc.), so Y is said to be a continuous random variable.
The values of a continuous random variable can be represented as an interval
on a number line.
[Figure: number line marked from 0 to 24 in steps of 3]
5.2 Discrete probability distribution
• We may not know the exact value of a random variable at any specific moment.
However we may calculate the likelihood (probability) that a random variable may
take a specific value.
• A probability distribution is a table or an equation that links each value of a
random variable with its probability of occurrence. The probability distribution of a
discrete random variable may be represented using a table.
Example 5.2-1
A fair coin is tossed twice. If X is a random variable representing the number of heads, then construct the probability distribution for X.
Solution:
$P(X = 0) = P(TT) = \frac{1}{2} \times \frac{1}{2} =$
$P(X = 1) = P(TH) + P(HT) = 2 \times \frac{1}{2} \times \frac{1}{2} =$
$P(X = 2) = P(HH) = \frac{1}{2} \times \frac{1}{2} =$
X = k        0      1      2
P(X = k)    ___    ___    ___
A probability distribution must satisfy the following conditions:
(a) $0 \le P(X = k) \le 1$ for all values of k,
(b) $\sum_{\text{all } k} P(X = k) = 1$ (sum of all probabilities is 1).
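The construction in Example 5.2-1 can also be carried out by listing every equally likely outcome and counting heads. The sketch below does this for two tosses of a fair coin and then checks the two conditions stated above; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))

# Count how many outcomes give each number of heads.
heads = Counter(outcome.count("H") for outcome in outcomes)

# P(X = k) = (number of outcomes with k heads) / (total number of outcomes).
distribution = {k: Fraction(n, len(outcomes)) for k, n in sorted(heads.items())}

for k, prob in distribution.items():
    print(f"P(X = {k}) = {prob}")

# Condition (a): every probability lies between 0 and 1.
condition_a = all(0 <= prob <= 1 for prob in distribution.values())
# Condition (b): the probabilities sum to 1.
condition_b = sum(distribution.values()) == 1
```

Changing `repeat=2` to `repeat=3` gives the distribution for the three-toss variable in Example 5.1-1(a).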
Example 5.2-2
Explain whether each of the following is a discrete probability distribution function.
(a)
X = k        5      6      7      8
P(X = k)    1/16   5/8    1/4    1/16
(b)
X = k        1      2      3      4
P(X = k)    0.09   0.36   0.49   0.05
Solution:
(a) It is a discrete probability distribution function since
(i) $\_\_\_\_ \le P(X = k) \le \_\_\_\_$
(ii) $\sum_{k=5}^{8} P(X = k) = P(5) + P(6) + P(7) + P(8) = \_\_\_\_$
(b) It is not a discrete probability distribution function since
$\sum_{k=1}^{4} P(X = k) = \_\_\_\_ + \_\_\_\_ + \_\_\_\_ + \_\_\_\_ \ne 1$
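The two conditions can be wrapped in a small checking function and applied to both tables of Example 5.2-2. This is a sketch only: the function name `is_probability_distribution` and the tolerance are assumptions, chosen so that decimal inputs are compared sensibly.

```python
from fractions import Fraction
import math


def is_probability_distribution(probs, tol=1e-9):
    """Return True if every value is in [0, 1] and the values sum to 1."""
    in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = math.isclose(float(sum(probs)), 1.0, abs_tol=tol)
    return in_range and sums_to_one


# Table (a): probabilities for X = 5, 6, 7, 8.
table_a = [Fraction(1, 16), Fraction(5, 8), Fraction(1, 4), Fraction(1, 16)]
# Table (b): probabilities for X = 1, 2, 3, 4.
table_b = [0.09, 0.36, 0.49, 0.05]

print("(a)", is_probability_distribution(table_a), "sum =", sum(table_a))
print("(b)", is_probability_distribution(table_b), "sum =", round(sum(table_b), 2))
```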
5.3 Mean and Variance of a discrete probability distribution
• In Chapter 1 (Section 1.3.1 and 1.4.2), we learnt to calculate the mean and
variance for a set of data values.
• In this chapter, we will learn to calculate the theoretical population mean $\mu$ and
population variance $\sigma^2$ from a discrete probability distribution.
• The expectation of a random variable (or expected value) is the same as the
population mean.
$\mu = \sum k\,P(X = k)$
$\sigma^2 = \sum k^2\,P(X = k) - \mu^2$ or $\sigma^2 = \sum (k - \mu)^2\,P(X = k)$
Example 5.3-1
Find the mean, variance and standard deviation of the random variable in the following probability distribution:
X = k        1      2      3      4      5
P(X = k)    0.16   0.22   0.28   0.20   0.14
Solution:
Mean, $\mu = \sum k\,P(X = k) = 1(0.16) + 2(0.22) + 3(0.28) + 4(0.20) + 5(0.14) =$
$\sum k^2\,P(X = k) = 1^2(0.16) + 2^2(0.22) + 3^2(0.28) + 4^2(0.20) + 5^2(0.14) =$
Variance, $\sigma^2 = \sum k^2\,P(X = k) - \mu^2 =$
Standard deviation, $\sigma =$
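The working for Example 5.3-1 can be checked numerically. The sketch below evaluates the mean and both forms of the variance formula from Section 5.3, using the table values; the variable names are illustrative.

```python
import math

# Probability distribution from Example 5.3-1.
values = [1, 2, 3, 4, 5]
probs = [0.16, 0.22, 0.28, 0.20, 0.14]

# Mean: sum of k * P(X = k).
mean = sum(k * p for k, p in zip(values, probs))

# Variance, first form: sum of k^2 * P(X = k), minus the mean squared.
mean_of_squares = sum(k**2 * p for k, p in zip(values, probs))
variance = mean_of_squares - mean**2

# Variance, second form: sum of (k - mean)^2 * P(X = k).
variance_alt = sum((k - mean)**2 * p for k, p in zip(values, probs))

std_dev = math.sqrt(variance)

print(f"mean = {mean:.2f}, variance = {variance:.4f}, std dev = {std_dev:.3f}")
```

Both variance forms agree up to floating-point rounding, which is a useful self-check when doing the calculation by hand.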
Example 5.3-2
The random variable X represents the number of defective tires. The probability
distribution of X is given below:
k            0      1      2      3      4
P(X = k)     m     0.16   0.06   0.04   0.20
(a) Find the value of m.
(b) Compute
(i) the expectation of X,
(ii) the standard deviation of the distribution.
Solution:
(a) For a probability distribution, $m + 0.16 + 0.06 + 0.04 + 0.20 = 1$
$m =$
(b)(i) $E(X) = \sum k\,P(X = k) =$
(b)(ii) $\sigma^2 = \sum k^2\,P(X = k) - \mu^2 =$
Standard deviation, $\sigma =$
Example 5.3-3
The following table shows the distribution of household sizes in a small town.
k            1       2       3       4       5       6
P(X = k)    0.266   0.330   0.166   0.140   0.064   0.034
(i) Show that the distribution is a probability distribution.
(ii) What is the expected size of a household in the town?
Solution:
(i) Since $0.266 + 0.330 + 0.166 + 0.140 + 0.064 + 0.034 =$
(ii) $E(X) = \sum k\,P(X = k) =$
Tutorial 5: Discrete Probability Distribution
1 Randomly selected households from a particular estate were asked about the number of children they have, and the following frequency distribution shows the result of the survey:
Number of children    0     1     2     3
Households          300   280    95    20
(a) Construct a probability distribution table.
(b) Let X denote the number of children in a household from the particular estate. Find the following probabilities:
(i) $P(X = 1)$ (ii) $P(X \ge 2)$
(iii) $P(X < 1)$ (iv) $P(1 \le X \le 3)$
2 An electrical appliance company offers its customers a number of different instalment plans. Let the random variable X represent the number of instalments for a randomly selected customer. The probability distribution for X is given below:
x            6      12     24     36
P(X = x)    0.20   0.30    k     0.15
(a) Find the value of the constant k.
(b) Find the mean of the distribution of X.
3 Let the random variable X be the number of errors that a randomly selected page of a book contains. The following table lists the probability distribution of X.
x            0      1      2      3      4
P(X = x)    0.73   0.16    k     0.04   0.01
Find the value of k; hence, find the mean and standard deviation of X.
4 A charity organisation is selling $4 raffle tickets as part of a fund-raising programme. The first prize is a computer valued at $3150, and the second prize is a vacuum cleaner valued at $450. The remaining 15 prizes are $25 gift vouchers. The number of tickets sold is 5000.
(a) Find the expected net gain to the player for one play of the game.
(b) Interpret your answer to part (a).
Answers
1 (a)
No. of children, k    0        1        2        3
P(X = k)            60/139   56/139   19/139   4/139
(b) (i) 0.403 (ii) 0.165 (iii) 0.432 (iv) 0.568
2 (a) 0.35 (b) 18.6 months
3 k = 0.06; mean = 0.44; standard deviation = 0.852
4 −$3.21
Topic 6 : Binomial and Poisson Distributions
Objectives :
At the end of this lesson, the student should be able to:
1 list the conditions of a binomial experiment
2 explain the binomial probability function $P(x) = {}^{n}C_{x}\,p^{x} q^{n-x}$
3 calculate the mean, variance and standard deviation of a binomial distribution
4 use the binomial probability distribution in problem solving
5 define Poisson random variable
6 list the conditions of the Poisson distribution
7 explain the Poisson probability function $P(x) = \dfrac{\lambda^{x} e^{-\lambda}}{x!}$
8 calculate the mean and variance of a Poisson distribution
9 use the Poisson probability distribution in problem solving
Topic 6: Binomial and Poisson Distributions
6.1 Binomial Distribution
• Suppose we throw a die ten times and we want to create a probability distribution
for the number of times a “6” appears. It is reasonable to make the following
assumptions:
(a) Each trial has only two outcomes: success or failure (a “6” or not a “6”).
(b) The outcome of each trial is independent of other trials. (The next throw is not affected by previous throws.)
(c) The probability of success, p, for each trial is the same
(e.g. $p = P(\text{get a “6”}) = \frac{1}{6}$).
• In general, if we conduct an experiment of n trials and the experiment satisfies
conditions (a) to (c), then we can model the number of successes using a Binomial Distribution.
• Notation: Let X denote the number of successes in a Binomial experiment with n trials
and p = probability of success for each trial. Then
$X \sim B(n, p)$
$P(X = k) = {}^{n}C_{k}\,p^{k}(1 - p)^{n-k}$, $k = 0, 1, 2, \ldots, n$
** Note that X is a discrete random variable.**
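The binomial formula above translates directly into a few lines of Python. The sketch below applies it to the die example (ten throws, success = getting a “6”); the function name `binomial_pmf` is an illustrative choice, and `math.comb` supplies the number of combinations.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): nCk * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


# Die thrown ten times; success means getting a "6".
n, p = 10, 1 / 6
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"P(X = {k:2d}) = {prob:.4f}")

# The probabilities over k = 0, 1, ..., n should add up to 1.
print("total =", round(sum(pmf), 10))
```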
Example 6.1-1
A certain surgical procedure has an 85% chance of success. A doctor performs the
procedure on eight patients. The random variable X represents the number of
successful surgeries. State the distribution of X.
Solution:
Let X be the number of successful surgeries out of 8 patients. n = 8, p = 0.85
$X \sim$
Example 6.1-2
A jar contains 5 red marbles, 9 blue marbles, and 6 green marbles. You randomly
select 3 marbles from the jar, without replacement. The random variable X
represents the number of red marbles. Explain whether X is a Binomial random
variable.
Solution:
P(first marble is red) =
Since the probability of obtaining a red marble for each draw is
___________________
X is NOT a Binomial random variable.
Example 6.1-3
Microfracture knee surgery has a 75% chance of success on patients with
degenerative knees. The surgery is performed on three patients. Find the probability
of the surgery being successful on exactly two patients.
Solution:
Let X be the number of successful surgeries out of 3 patients.
$X \sim B(\quad,\quad)$
$P(X = 2) =$
Example 6.1-4
Childhood asthma is a public health problem in country A. It is known that one out of
10 children in country A has asthmatic problems. In a randomly chosen group of 14
children from the population, what is the probability that
(a) exactly 3 have asthmatic problems?
(b) at most 1 has asthmatic problems?
(c) more than 1 have asthmatic problems?
Solution:
Let X be the number of children with asthmatic problems out of 14 children.
$X \sim B(\quad,\quad)$
(a) $P(X = 3) =$
(b) $P(X \le 1) = P(X = 0) + P(X = 1) =$
(c) $P(X > 1) = 1 - P(X \le 1) =$
6.2 Mean and Variance of a Binomial Distribution
• For a Binomial random variable, $X \sim B(n, p)$:
Expectation or population mean, $\mu = np$
Population variance, $\sigma^2 = np(1 - p)$
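The shortcut formulas for the binomial mean and variance can be compared against the general definitions from Topic 5. The sketch below uses the setting of Example 6.1-4 (n = 14, p = 0.1) as an illustration and also evaluates the three probabilities asked for there.

```python
from math import comb, isclose

# Setting of Example 6.1-4: 14 children, each with probability 0.1 of asthma.
n, p = 14, 0.1
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

p_exactly_3 = pmf[3]               # (a) P(X = 3)
p_at_most_1 = pmf[0] + pmf[1]      # (b) P(X <= 1)
p_more_than_1 = 1 - p_at_most_1    # (c) complement of (b)

# Mean and variance from the Topic 5 definitions ...
mean = sum(k * pk for k, pk in enumerate(pmf))
variance = sum(k**2 * pk for k, pk in enumerate(pmf)) - mean**2

# ... agree with the shortcut formulas np and np(1 - p).
print(round(mean, 6), n * p)
print(round(variance, 6), n * p * (1 - p))
print(round(p_exactly_3, 4), round(p_at_most_1, 4), round(p_more_than_1, 4))
```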
Example 6.2-1
5% of workers at construction sites are known to suffer from hearing impairment
due to unhealthy noise levels. If we randomly select 28 workers from construction
sites, find
(a) the probability that exactly 4 of them suffer from hearing impairment,
(b) the mean and standard deviation of the number of workers suffering from
hearing impairment.
Solution:
Let X be the number of workers with hearing impairment out of 28 workers.
$X \sim B(\quad,\quad)$
(a) $P(X = 4) =$
(b) $E(X) =$
$Var(X),\ \sigma^2 =$
Standard deviation, $\sigma =$
Example 6.2-2
The random variable X, which follows a Binomial distribution, is such that the mean is
2 and the variance is $\frac{24}{13}$. Find the values of n and p.
Solution:
$X \sim B(\quad,\quad)$
$E(X) = np = 2$ --- (1)
$Var(X) = np(1 - p) = \frac{24}{13}$ --- (2)
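One way to finish Example 6.2-2 is to divide equation (2) by equation (1), which eliminates n and leaves 1 − p. The sketch below carries this out with exact fractions; it is a check on the algebra rather than the only possible method.

```python
from fractions import Fraction

mean = Fraction(2)           # equation (1): np = 2
variance = Fraction(24, 13)  # equation (2): np(1 - p) = 24/13

# (2) divided by (1): np(1 - p) / np = 1 - p.
one_minus_p = variance / mean
p = 1 - one_minus_p

# Substitute p back into (1): n = mean / p.
n = mean / p

print("p =", p, " n =", n)
```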
6.3 Poisson Distribution
• Suppose in a country it is known that cyclones arrive at a rate of 1.5 times
every 2 years. We want to create a probability distribution for the number of times
a cyclone arrives in a specific time period. It is reasonable to make the following
assumptions:
(a) The mean rate of events, $\mu$, occurring in a unit interval / region is the same
for every other unit interval / region.
(E.g. the mean rate of cyclones arriving is the same across any interval of 2 years.)
(b) Events occurring in an interval / region are independent of events occurring in
other non-overlapping intervals / regions.
(E.g. the number of cyclones in years 2013 to 2014 is independent of the number
of cyclones in years 2011 to 2012.)
(c) No two events can occur at the same time.
(E.g. we assume that no two cyclones can happen together.)
• In general, if we are counting the number of events occurring in an interval / region
and conditions (a) to (c) are satisfied, we can model the number of events occurring
using a Poisson Distribution.
• Notation: Let X denote the number of events occurring in an interval / region with a mean
rate $\mu$. Then
$X \sim Po(\mu)$
$P(X = k) = e^{-\mu}\,\dfrac{\mu^{k}}{k!}$, $k = 0, 1, 2, \ldots$
** Note that X is a discrete random variable.**
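The Poisson formula above can be evaluated in the same way as the binomial one. The sketch below uses the cyclone example (mean rate 1.5 per 2-year interval); the function name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Po(mu): e^(-mu) * mu^k / k!."""
    return exp(-mu) * mu**k / factorial(k)


# Cyclones arrive at a mean rate of 1.5 per 2-year interval.
mu = 1.5
for k in range(6):
    print(f"P(X = {k}) = {poisson_pmf(k, mu):.4f}")

# Unlike the binomial case, k has no upper limit, but the terms shrink quickly,
# so a partial sum over the first few dozen values is already very close to 1.
partial_sum = sum(poisson_pmf(k, mu) for k in range(50))
print("partial sum =", round(partial_sum, 10))
```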
Example 6.3-1
The mean number of accidents per month at a certain intersection is 3. What is the
probability that in any given month,
(a) 4 accidents will occur at this intersection?
(b) more than 1 accident will occur at this intersection?
Solution:
Let X be the number of accidents per month.
$X \sim Po(3)$
(a) $P(X = 4) = e^{-3}\,\dfrac{3^{4}}{4!} =$
(b) $P(X > 1) = 1 - P(X = 0) - P(X = 1) =$
Example 6.3-2
2000 brown trout are introduced into a small lake. The lake has a volume of 20000
cubic meters. Find the probability that
(a) 3 brown trout are found in any given cubic meter of the lake.
(b) fewer than 2 brown trout are found in any 10 cubic meters of the lake.
Solution:
Let X be the number of trout per cubic meter of lake.
$X \sim Po(\quad)$
(a) $P(X = 3) =$
(b) Let Y be the number of trout per 10 cubic meters of lake.
$Y \sim Po(\quad)$
$P(Y < 2) = P(Y = 0) + P(Y = 1) =$
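Example 6.3-2 relies on rescaling the mean rate when the size of the region changes. The sketch below shows that step explicitly: the rate per cubic meter is the number of trout divided by the lake volume, and the rate for 10 cubic meters is ten times that. The names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Po(mu)."""
    return exp(-mu) * mu**k / factorial(k)


trout, volume = 2000, 20000             # trout in the lake, lake volume in cubic meters

rate_per_m3 = trout / volume            # mean number of trout per cubic meter
rate_per_10_m3 = rate_per_m3 * 10       # mean rate scales with the size of the region

p_a = poisson_pmf(3, rate_per_m3)                                      # (a) P(X = 3)
p_b = poisson_pmf(0, rate_per_10_m3) + poisson_pmf(1, rate_per_10_m3)  # (b) P(Y < 2)

print("rate per cubic meter:", rate_per_m3)
print("rate per 10 cubic meters:", rate_per_10_m3)
print(f"(a) {p_a:.6f}   (b) {p_b:.4f}")
```

The same scaling applies to time intervals, for example converting a rate per hour into a rate per 45 minutes.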
6.4 Mean and Variance of Poisson Distribution
• For a Poisson random variable, $X \sim Po(\mu)$:
Expectation or population mean $= \mu$
Population variance, $\sigma^2 = \mu$
Example 6.4-1
A school “Lost and Found” department receives an average of 3.7 reports per week of
lost student ID cards.
(a) Find the probability that at most 2 such reports will be received during a given
week by this department.
(b) Find the probability that there will be 1 to 3 (inclusive) such reports received
during a given week by this department.
(c) Find the variance and standard deviation of the probability distribution.
Solution:
Let X be the number of reports per week.
$X \sim Po(\quad)$
(a) $P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) =$
(b) $P(1 \le X \le 3) = P(X = 1) + P(X = 2) + P(X = 3) =$
(c) $E(X) = Var(X) =$
Appendix
Binomial and Poisson Distributions using Excel
• Under the tab “Formulas” → “More Functions” → “Statistical” there are 2 options to
calculate probabilities for Binomial and Poisson distributions.
(a) BINOM.DIST: For Binomial distribution.
(b) POISSON.DIST: For Poisson distribution.
• To compute $P(X = 4)$ and $P(X \le 4)$, given that $X \sim B(6, 0.3)$, select BINOM.DIST.
For $P(X = 4)$: Key “FALSE” under the “CUMULATIVE” option.
For $P(X \le 4)$: Key “TRUE” under the “CUMULATIVE” option.
$P(X = 4) = 0.0595$, $P(X \le 4) = 0.989$
• To compute $P(X = 4)$ and $P(X \le 4)$, given that $X \sim Po(1.8)$, select POISSON.DIST.
Under the option “MEAN”, enter 1.8. The rest are the same as the Binomial
distribution shown above.
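For readers without Excel, the behaviour of the two spreadsheet functions can be imitated in Python. The sketch below is not Excel's implementation; `binom_dist` and `poisson_dist` are hypothetical helpers whose arguments simply follow the order of the Excel dialog, with the last argument playing the role of the “CUMULATIVE” option.

```python
from math import comb, exp, factorial


def binom_dist(number_s, trials, probability_s, cumulative):
    """Imitates BINOM.DIST: P(X = number_s), or P(X <= number_s) if cumulative."""
    def pmf(k):
        return comb(trials, k) * probability_s**k * (1 - probability_s)**(trials - k)
    if cumulative:
        return sum(pmf(k) for k in range(number_s + 1))
    return pmf(number_s)


def poisson_dist(x, mean, cumulative):
    """Imitates POISSON.DIST: P(X = x), or P(X <= x) if cumulative."""
    def pmf(k):
        return exp(-mean) * mean**k / factorial(k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)


# X ~ B(6, 0.3), as in the appendix.
print(round(binom_dist(4, 6, 0.3, False), 4))  # P(X = 4)
print(round(binom_dist(4, 6, 0.3, True), 3))   # P(X <= 4)

# X ~ Po(1.8), as in the appendix.
print(round(poisson_dist(4, 1.8, False), 4))   # P(X = 4)
print(round(poisson_dist(4, 1.8, True), 4))    # P(X <= 4)
```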
Tutorial 6: Binomial and Poisson Distributions
A Self Practice Questions
A.1 Binomial Distribution
1 Given that $X \sim B(10, 0.3)$, calculate the following:
(i) mean, $\mu$ (ii) variance, $\sigma^2$
(iii) $P(X = 4)$ (iv) $P(X \le 2)$
(v) $P(X < 2)$ (vi) $P(X \ge 2)$
(vii) $P(X > 8)$
2 If $X \sim B\left(n, \frac{4}{5}\right)$ and $P(X = 0) = \frac{1}{15625}$,
(a) How many outcomes are there in each trial?
(b) How many trials are there?
(c) How many possible values can X take?
(d) Find the mean and standard deviation of this distribution.
A.2 Poisson Distribution
3 Given that $X \sim Po(3)$, calculate the following:
(i) mean, $\mu$ (ii) variance, $\sigma^2$
(iii) $P(X = 4)$ (iv) $P(X \le 2)$
(v) $P(X \ge 3)$
4 The number of calls arriving, X, is Poisson distributed with a rate of 2 per hour. Write the distribution of the number of calls arriving in
(i) 3 hours, (ii) 45 minutes.
B Discussion Questions
1 10% of drivers do not wear seat-belts. Find the probability that, in the next 10 cars to pass, less than 2 drivers will not be wearing seat-belts.
2 A telephone enquiry service is so busy that only 80% of calls to it are
successfully connected. It may be assumed that all calls are independent. Twelve calls are made at random to the service. Find the probability that at least 10 are successfully connected.
3 The probability of a patient recovering from a heart operation is 0.85. In a particular hospital, 10 patients went through such an operation in a particular month. What is the probability that
(i) exactly 4 survive the operation? (ii) the actual number of survivors is more than the expected value? (iii) exactly 2 do not survive the operation?
4 Coach A has four wheels and is equipped with two spare tires, and coach B has six wheels and is equipped with three spare tires. These coaches travel from town A to town B independently. The probability that a tire needs to be replaced during the journey is 0.1.
(i) State an assumption required for the Binomial distribution to be a suitable model.
(ii) Determine whether coach A or coach B has the higher probability for a successful journey.
5 On average, a household receives 1.8 junk mails per day. Using the Poisson
formula, find the probability that a randomly selected household receives
(a) exactly 3 junk mails on a certain day,
(b) at most 2 junk mails on a certain day.
6 A budget airline receives an average of 9.7 complaints per day from its passengers. Using the Poisson formula, find the probability that on a certain day this airline will receive
(a) exactly 5 complaints.
(b) at least 3 complaints.
7 A customer service department receives an average of 1.6 telephone calls in any 10-minute interval. Find the probability that the department receives
(a) no calls in any 10-minute interval.
(b) at most 1 call in any 5-minute interval.
(c) more than 2 calls in any 15-minute interval.
Answers
A1 i 3, ii 2.1, iii 0.200, iv 0.383, v 0.149, vi 0.851, vii 0.000144
A2 a 2, b 6, c 7, d 4.8, 0.980
A3 i 3, ii 3, iii 0.168, iv 0.423, v 0.577
A4 i $X \sim Po(6)$, ii $X \sim Po\left(\frac{6}{4}\right)$
B1 0.736
B2 0.558
B3 i 0.00125, ii 0.544, iii 0.276
B4 i independent, ii Coach B
B5 a 0.161, b 0.731
B6 a 0.0439, b 0.996
B7 a 0.202, b 0.809, c 0.430
Topic 7 : Normal Distribution
Objectives :
At the end of this lesson, the student should be able to:
1 describe the characteristics of a normal distribution including its shape and the relationship among its mean, median and mode
2 define normal random variable and standard normal random variable
3 compute normal probabilities using standard normal tables
4 use the normal probability distribution to approximate the binomial probabilities (including correction for continuity)
Topic 7: Normal Distribution
7.1 Introduction
• Many continuous random variables can be modelled using the normal distribution. Examples include:
(a) Students’ examination scores,
(b) Height and weight of people.
• The normal distribution can be described using two features: mean $\mu$ and
variance $\sigma^2$. Notation: $X \sim N(\mu, \sigma^2)$.
• The normal distribution can be represented using a bell-shaped curve with the
properties:
o The curve is symmetrical about the mean.
o The mean, median and mode are the same.
o Approximately 95% of the distribution lies within 2 standard deviations of the mean. This is sometimes known as the ‘2σ rule’.
[Figure: two bell-shaped curves, $X \sim N(1, 1)$ centred at $\mu = 1$ and $X \sim N(2, 6)$ centred at $\mu = 2$, with the mean, median and mode marked at the centre]
7.2 Probability for normal distribution
• For normal distribution, the probability is interpreted as area under the curve. For example, $X \sim N(1, 2)$:
[Figure: four normal curves with $\mu = 1$, shading the areas for $P(X < 1.5)$, $P(X > 0.5)$, $P(0.2 < X < 1.8)$ and $P(X = 0.7)$]
**Note that $P(X = k) = 0$. Hence for normal distribution, $P(X \le k) = P(X < k)$.**
7.3 Standard normal random variable
• A normal random variable can have many different values of mean and variance. When the mean is 0 and variance is 1, we call it a standard normal random variable, denoted as Z.
$Z \sim N(0, 1)$
• To convert from $X \sim N(\mu, \sigma^2)$ to $Z \sim N(0, 1)$, we apply the formula: $Z = \dfrac{X - \mu}{\sigma}$
This procedure is also known as standardization.
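The effect of standardization can be seen numerically: the probability computed directly from X matches the probability computed from Z after converting the boundary value. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8) with the $X \sim N(1, 2)$ example from Section 7.2; note that `NormalDist` takes the standard deviation, not the variance.

```python
from math import sqrt
from statistics import NormalDist

mu, variance = 1, 2
sigma = sqrt(variance)        # NormalDist expects sigma, so take the square root

X = NormalDist(mu, sigma)     # X ~ N(1, 2)
Z = NormalDist(0, 1)          # Z ~ N(0, 1)

x = 1.5
z = (x - mu) / sigma          # standardization: Z = (X - mu) / sigma

p_direct = X.cdf(x)           # P(X <= 1.5), area under the curve of X
p_standardized = Z.cdf(z)     # P(Z <= z), area under the standard normal curve

print(round(z, 4), round(p_direct, 4), round(p_standardized, 4))

# The '2 sigma rule' from Section 7.1: area within 2 standard deviations of the mean.
within_2_sigma = Z.cdf(2) - Z.cdf(-2)
print(round(within_2_sigma, 4))
```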
Example 7.3-1
Given that $X \sim N(2, 5)$, rewrite the following probabilities in the form $P(Z \le k)$.
(a) $P(X \le 3)$, (b) $P(X \ge 1.5)$, (c) $P(1.5 < X < 3)$
Solution:
(a) $P(X \le 3) = P\left(\dfrac{X - 2}{\sqrt{5}} \le \dfrac{3 - 2}{\sqrt{5}}\right) = P(Z \le 0.45)$
(b) $P(X \ge 1.5) =$
(c) $P(1.5 < X < 3) = P(X < 3) - P(X \le 1.5) =$
7.4 Standard Normal Table
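• To calculate probabilities involving normal distribution, we will obtain the probability value via the standard normal table (on pages 112 and 113).
Step 1: Apply standardization from $X \sim N(\mu, \sigma^2)$ to $Z \sim N(0, 1)$.
Step 2: Ensure the probability is expressed in the form $P(Z \le k)$.
Step 3: Obtain the required probability value from the standard normal table.
• The following example illustrates how the standard normal table is to be read:
(a) Suppose we want to find $P(Z \le 0.52)$:
[Figure: extract of the standard normal table with the row for the 1st decimal place (0.5), the column for the 2nd decimal place (0.02) and the probability value highlighted]
$P(Z \le 0.52) = 0.6985$
(b) Suppose we want to find the value of k such that $P(Z \le k) = 0.0020$:
[Figure: extract of the standard normal table with the probability value 0.0020 highlighted, giving the 1st decimal place (−2.8) from the row and the 2nd decimal place (0.08) from the column]
$k = -2.88$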
Example 7.4-1
Let $Z \sim N(0, 1)$. Use the standard normal table on pages 112 and 113 to evaluate the
following probabilities:
(a) $P(Z < -0.99)$, (b) $P(Z > 1.06)$, (c) $P(-1.5 < Z < -1.25)$
Solution:
(a)
(b)
(c)
Example 7.4-2
Given the normally distributed variable X with mean 20 and standard deviation 4, find
(a) $P(X > 28)$
(b) $P(17.5 < X < 22.5)$
(c) the value of k such that $P(X > k) = 0.1539$
Solution:
(a) Step 1: Switch the inequality sign to “<” or “≤”
$P(X > 28) = 1 - P(X \le 28)$
Step 2: Convert to standard normal random variable, Z
$P(X > 28) = 1 - P(X \le 28) = 1 - P\left(\dfrac{X - 20}{4} \le \dfrac{28 - 20}{4}\right) = 1 - P(Z \le 2)$
Step 3: Obtain the probability value from standard normal table
$P(X > 28) = 1 - P(Z \le 2) = 1 - 0.9772 = 0.0228$
(b) $P(17.5 < X < 22.5) = P(X < 22.5) - P(X \le 17.5) =$
(c) $P(X > k) = 0.1539 \Rightarrow 1 - P(X \le k) = 0.1539 \Rightarrow P(X \le k) = 0.8461$
$\Rightarrow P\left(Z \le \dfrac{k - 20}{4}\right) = 0.8461$
From standard normal table,
$\dfrac{k - 20}{4} = 1.02 \Rightarrow k = 24.08$
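Table look-ups for Example 7.4-2 can be cross-checked with `statistics.NormalDist` from the Python standard library. In this sketch, `cdf` plays the role of reading a probability from the table, and `inv_cdf` plays the role of reading the table in reverse to find k. Results may differ from the table in the last digit because the table is rounded to four decimal places.

```python
from statistics import NormalDist

X = NormalDist(mu=20, sigma=4)   # mean 20, standard deviation 4

# (a) P(X > 28) = 1 - P(X <= 28)
p_a = 1 - X.cdf(28)

# (b) P(17.5 < X < 22.5) = P(X < 22.5) - P(X <= 17.5)
p_b = X.cdf(22.5) - X.cdf(17.5)

# (c) P(X > k) = 0.1539 means P(X <= k) = 1 - 0.1539 = 0.8461
k = X.inv_cdf(1 - 0.1539)

print(f"(a) {p_a:.4f}   (b) {p_b:.4f}   (c) k = {k:.2f}")
```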
Example 7.4-3
The serum cholesterol levels of a certain population of 40-year-old male adults follow
approximately a normal distribution with mean 185 mg/dl and standard deviation 36
mg/dl. If a 40-year-old male adult is chosen at random from this population, what is the
probability that he has a serum cholesterol level
(a) greater than 195 mg/dl?
(b) less than 178 mg/dl?
(c) between 178 and 195 mg/dl?
Solution:
Let X be the cholesterol level of a 40-year-old male.
$X \sim N(185, 36^2)$
(a) $P(X > 195) = 1 - P(X \le 195) = 1 - P\left(Z \le \dfrac{195 - 185}{36}\right)$
=
(b) $P(X < 178) = P\left(Z \le \dfrac{178 - 185}{36}\right)$
=
(c) $P(178 \le X \le 195) = P(X \le 195) - P(X < 178) =$
Example 7.4-4
The weights of a certain batch of obese male recruits are approximately normally
distributed with mean 88 kg and standard deviation 9 kg. The lightest 15% of the recruits
receive a classification of A whilst the heaviest 12.5% receive a classification of F.
Find
(i) the minimum weight required to obtain a classification of F,
(ii) the weight of the heaviest recruit in classification A.
Solution:
Let X be the weight of an obese male recruit.
$X \sim N(88, 9^2)$
(i) Let m be the minimum weight to be in classification F.
$P(X \ge m) = 0.125 \Rightarrow P(X < m) = 0.875$
$\Rightarrow$
(ii) Let k be the largest weight to be in classification A.
$P(X \le k) = 0.15 \Rightarrow$
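Both parts of Example 7.4-4 ask for a weight given a probability, which is the reverse of the usual table look-up. The sketch below uses `inv_cdf` from `statistics.NormalDist` to find the cut-off weights; values read from the printed table may differ slightly because the nearest tabulated z-value has to be used.

```python
from statistics import NormalDist

X = NormalDist(mu=88, sigma=9)   # weights in kg

# (i) Heaviest 12.5%: P(X >= m) = 0.125, so P(X < m) = 0.875.
m = X.inv_cdf(0.875)

# (ii) Lightest 15%: P(X <= k) = 0.15.
k = X.inv_cdf(0.15)

# The corresponding z-values show which rows of the table are involved.
z_m = (m - X.mean) / X.stdev
z_k = (k - X.mean) / X.stdev

print(f"(i) m = {m:.2f} kg (z = {z_m:.3f})")
print(f"(ii) k = {k:.2f} kg (z = {z_k:.3f})")
```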
Standard Normal Table
Entries give $P(Z \le z)$, the area under the standard normal curve to the left of z.
z     0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
-3.4  0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
-3.3  0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
-3.2  0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
-3.1  0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
-3.0  0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
-2.9  0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
-2.8  0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
-2.7  0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
-2.6  0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
-2.5  0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
-2.4  0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
-2.3  0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
-2.2  0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
-2.1  0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
-2.0  0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
-1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
-1.8  0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
-1.7  0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
-1.6  0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
-1.5  0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
-1.4  0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
-1.3  0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
-1.2  0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
-1.1  0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
-1.0  0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
-0.9  0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
-0.8  0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
-0.7  0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
-0.6  0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
-0.5  0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
-0.4  0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
-0.3  0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
-0.2  0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
-0.1  0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.0   0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641

z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0    0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1    0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2    0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3    0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4    0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5    0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6    0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7    0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8    0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9    0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0    0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1    0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2    0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3    0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4    0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5    0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6    0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7    0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8    0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9    0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0    0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1    0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2    0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3    0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4    0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5    0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6    0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7    0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8    0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9    0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0    0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
3.1    0.9990  0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993
3.2    0.9993  0.9993  0.9994  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995
3.3    0.9995  0.9995  0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997
3.4    0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9997  0.9998
Appendix A Normal approximation to Binomial
• Suppose $X \sim B(100, 0.45)$ and we wish to calculate $P(X \le 60)$. If we use the Binomial distribution, we have to calculate $P(X = k)$ for $k = 0, 1, 2, \ldots, 60$ before adding up the probabilities. This is quite tedious, so we may want to use an approximation method instead.
• Given $X \sim B(n, p)$, if $n > 30$, $np > 5$ and $n(1-p) > 5$, then $X \sim N(np, np(1-p))$ approximately.
• Since a Binomial random variable is discrete but a normal random variable is continuous, we need to adjust the calculation of the probabilities. This adjustment is known as continuity correction.
Step 1: Rewrite the probability in the form $P(X \le k)$, where $X \sim B(n, p)$.
Step 2: Using the approximating normal distribution $X \sim N(np, np(1-p))$, calculate the probability with 0.5 added to k, i.e. calculate $P(X \le k + 0.5)$.
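The two steps above can be sketched in Python for the opening example $X \sim B(100, 0.45)$, $P(X \le 60)$. This is only an illustrative sketch: the function names `binomial_cdf` and `normal_approx_cdf` are made up for this note, and the standard-library `statistics.NormalDist` is used in place of the normal table.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p), summing P(X = i) for i = 0..k."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) using N(np, np(1-p)) with continuity correction."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))          # NormalDist takes the standard deviation
    return NormalDist(mean, sd).cdf(k + 0.5)


n, p, k = 100, 0.45, 60

# Conditions for the approximation: n > 30, np > 5, n(1-p) > 5
assert n > 30 and n * p > 5 and n * (1 - p) > 5

exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```

The two printed values should agree closely, which is the point of the approximation: one normal-table lookup replaces a sum of 61 binomial terms.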
Example A-1 A Binomial random variable is given by $X \sim B(50, 0.45)$.
(a) State reasons why the normal distribution can be used as an approximation.
(b) Find
(i) $P(X \le 14)$
(ii) $P(X > 26)$
(iii) $P(15 \le X \le 26)$
Solution:
(a) Since n = 50 is large, $np = 50 \times 0.45 = 22.5 > 5$ and $n(1-p) = 50 \times 0.55 = 27.5 > 5$, X can be approximated using a normal distribution.
(bi) Mean of X $= np = 22.5$, variance of X $= np(1-p) = 12.375$
$X \sim N(22.5, 12.375)$ approximately
$P(X \le 14) \overset{\text{c.c.}}{\approx} P(X < 14.5) = P\left(Z < \frac{14.5 - 22.5}{\sqrt{12.375}}\right) =$
(bii) $P(X > 26) = 1 - P(X \le 26) \overset{\text{c.c.}}{\approx} 1 - P(X < 26.5) =$
(biii) $P(15 \le X \le 26) = P(X \le 26) - P(X \le 14) =$
Ans: (bi) 0.0116, (bii) 0.1271, (biii) 0.8613
Example A-2 Sing-Chip produces computer chips. On average, 2% of all computer chips produced are defective. In a sample of 500 chips, the quality-control inspector accepts the batch if less than 1% of the chips tested are defective.
(i) Explain why the number of defective computer chips, X, can be approximated by a normal distribution. Hence determine the mean and standard deviation of X.
(ii) Using the normal approximation of X, find the probability that a batch is accepted.
Solution: (i) Let X be the number of defective chips. $X \sim B(500, 0.02)$
Since n = 500 is large, $np = 500 \times 0.02 = 10 > 5$ and $n(1-p) = 500 \times 0.98 = 490 > 5$, X can be approximated using a normal distribution.
Mean $= np = 500 \times 0.02 = 10$, variance $= np(1-p) = 9.8 \Rightarrow$ standard deviation $= \sqrt{9.8}$
(ii) $X \sim N(10, 9.8)$ approx.
The batch is accepted if there are fewer than $1\% \times 500 = 5$ defects.
$P(X < 5) = P(X \le 4) \overset{\text{c.c.}}{\approx} P(X < 4.5) =$
Ans: (i) $\mu = 10$, $\sigma = 3.13$, (ii) 0.0392
Appendix B Normal Distribution using Excel
• Under the tab “Formulas” → “More Functions” → “Statistical” there are 2 functions related to the normal distribution.
When $X \sim N(\mu, \sigma^2)$:
(a) NORM.DIST: calculates the probability $P(X \le k)$, with k known.
(b) NORM.INV: given the value of the probability $P(X \le k)$, finds k.
• To compute $P(X \le 4)$, given that $X \sim N(3, 5)$, select NORM.DIST.
$P(X \le 4) = 0.673$
• To find the value of k such that $X \sim N(3, 5)$ and $P(X \le k) = 0.388$, select NORM.INV.
$k = 2.36$
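For readers without Excel, the same two calculations can be sketched with Python's standard-library `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods play the roles of NORM.DIST (cumulative) and NORM.INV. Note that $N(3, 5)$ specifies the variance, while `NormalDist` (like Excel's `standard_dev` argument) expects the standard deviation $\sqrt{5}$.

```python
from math import sqrt
from statistics import NormalDist

# X ~ N(3, 5): mean 3, variance 5, so the standard deviation is sqrt(5)
X = NormalDist(mu=3, sigma=sqrt(5))

p = X.cdf(4)           # P(X <= 4), the NORM.DIST (cumulative) calculation
k = X.inv_cdf(0.388)   # k with P(X <= k) = 0.388, the NORM.INV calculation

print(f"P(X <= 4) = {p:.3f}")
print(f"k = {k:.2f}")
```

The results should agree with the values 0.673 and 2.36 quoted above.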
Tutorial 7: Normal Distribution
A Self Practice Questions
1 Let Z be a standard normal random variable. Use the normal table provided to find:
(i) $P(Z < 2.11)$ (ii) $P(Z < -0.35)$
(iii) $P(Z > 1.02)$ (iv) $P(Z > -0.99)$
(v) $P(-0.35 < Z < 2.11)$ (vi) $P(Z > 1.02 \text{ or } Z < -0.35)$
2 Given that $X \sim N(3, 4)$, use the normal table provided to find:
(i) $P(X < 1)$ (ii) $P(X \le 4)$
(iii) $P(X > 0.5)$ (iv) $P(X \ge 3.5)$
(v) $P(1 \le X < 4)$ (vi) $P(X < 1 \text{ or } X \ge 3.5)$
3 Let Z be a standard normal random variable. Find m such that
(i) $P(Z < m) = 0.9082$ (ii) $P(Z > m) = 0.0096$
4 Given that $X \sim N(3, 4)$, find m such that:
(i) $P(X \le m) = 0.6217$ (ii) $P(X > m) = 0.7734$
B Discussion Questions
1 The brain weights of a certain population of 18-year-olds follow a normal distribution with mean 1380 gm and standard deviation 80 gm. Suppose an 18-year-old is chosen at random. Find the probability that the person’s brain weight is
(i) less than 1300 gm, (ii) more than 1400 gm, (iii) between 1320 and 1420 gm.
2 The random variable X has the distribution N(1, 20). Find a such that P(X < a) = 2 P(X > a).
3 The masses of articles are normally distributed such that 4.36% are under 30 kg and 6.3% are over 60 kg. Calculate the mean and standard deviation of the distribution.
4 A recent survey on a group of adults shows that the average daily calorie intake of an adult is normally distributed with mean 1380 calories and standard deviation 320 calories.
(i) Find the probability that an adult chosen at random from this group consumes less than 1000 calories per day.
(ii) What should be the recommended daily calorie intake if 90% of the group has an average daily calorie intake below this recommended daily intake?
(iii) If 12 000 adults participated in the survey, find the expected number of people, to the nearest integer, who consume more than 1200 calories per day.
5 The mass, in kilograms, of an apple sold in a supermarket has a normal distribution with mean 0.15 and standard deviation 0.03. Suppose the apples are sold at $9 per kilogram. Find
(i) the probability that a single apple costs between $1.30 and $1.50;
(ii) the minimum price set for an apple such that the probability of an apple being sold for less than this minimum price is at least 0.9.
Answers
A1 i 0.9826 ii 0.3632 iii 0.1539 iv 0.8389
v 0.6194 vi 0.5171
A2 i 0.1587 ii 0.6915 iii 0.8944 iv 0.4013
v 0.5328 vi 0.560
A3 i 1.33 ii 2.34
A4 i 3.62 ii 1.50
B1 i 0.1587 ii 0.4013 iii 0.4649
B2 a = 2.92
B3 $\mu = 45.8$, $\sigma = 9.26$
B4 i 0.1170 ii 1789.6 iii 8548
B5 i 0.2876 ii $1.70
Topic 8 : Distribution of Sample Means
Objectives :
At the end of this lesson, the student should be able to:
1 identify the distribution of sample means
2 apply the Central Limit Theorem to find the probability of a sample mean for sufficiently large samples
Topic 8: Distribution of Sample Means
8.1 Introduction to $\bar{X}$
• In Chapter 1 we introduced the concept of the sample mean, $\bar{x}$, for a single sample data set. In this case $\bar{x}$ is a single value. In this chapter we will look at the sample means of multiple samples and the distribution of the sample means.
• Introductory example:
Let X denote the weight of a single peanut from a packet of peanuts. If we weigh each peanut in that packet, we can calculate the sample mean of the weight of the peanuts, $\bar{x}$, in that single packet.
Suppose we have n packets of peanuts and we calculate the sample mean of each packet of peanuts. The sample mean for one packet of peanuts will most likely be different from the sample means of the other packets.
[Diagram: 1 packet gives a single sample mean $\bar{x}$; n packets give sample means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n$.]
Hence in general, the sample mean, $\bar{X}$, is a random variable (as we cannot determine the actual value of $\bar{x}$ for a randomly chosen sample).
8.2 Distribution of $\bar{X}$
• Since $\bar{X}$ is a random variable, we can calculate probabilities involving $\bar{X}$ once we know its distribution, which is shown below.
• Let $\mu$ and $\sigma^2$ be the mean and variance of a random variable X (single quantity). If we have a sample of n objects,
(a) If $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
(b) If the distribution of X is not normal (or unknown) and $n \ge 30$, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately. This is known as the Central Limit Theorem.
• From above, we see that the mean of $\bar{X}$ is $\mu$ and the variance of $\bar{X}$ is $\frac{\sigma^2}{n}$.
• The standard deviation of $\bar{X}$, $\frac{\sigma}{\sqrt{n}}$, is also known as the standard error of the mean.
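As a small illustration of the result above, the following Python sketch builds the distribution of $\bar{X}$ from $\mu$, $\sigma$ and n. The numbers ($\mu = 50$, $\sigma = 8$, $n = 64$) and the function name `sample_mean_distribution` are hypothetical and chosen only so that the standard error comes out to a round value.

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_distribution(mu, sigma, n):
    """Distribution of the sample mean: N(mu, sigma^2 / n).

    Exact if X is normal; approximate (by the CLT) if n >= 30.
    NormalDist takes the standard deviation, i.e. the standard error sigma / sqrt(n).
    """
    return NormalDist(mu, sigma / sqrt(n))


# Hypothetical population: mean 50, standard deviation 8, sample size 64
x_bar = sample_mean_distribution(mu=50, sigma=8, n=64)

standard_error = x_bar.stdev      # 8 / sqrt(64)
p = x_bar.cdf(51)                 # P(sample mean < 51)

print(f"standard error = {standard_error}")
print(f"P(X-bar < 51) = {p:.4f}")
```

Here the standard error is $8/\sqrt{64} = 1$, so $P(\bar{X} < 51) = P(Z < 1)$, which can be checked against the standard normal table.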
Example 8.2-1 The mass of garlic bulbs produced by a particular farm is approximately normally distributed, with a mean of 60 g and a standard deviation of 5 g. State the distribution of the sample mean of a random sample of 16 garlic bulbs.
Solution:
Let X be the mass of a garlic bulb. $X \sim N(60, 5^2)$
The sample mean of 16 garlic bulbs, $\bar{X} \sim N(\quad, \quad)$
Example 8.2-2
The waistline of forty-year-old male Singaporeans is known to have a mean of 33 inches and a variance of 9 square inches. A random sample of 36 forty-year-old male Singaporeans was selected. Find the probability that the sample mean
(a) is greater than 31.5 inches,
(b) lies between 32 and 34 inches,
(c) differs from the population mean by more than one inch.
Solution: Let X be the waistline of a forty-year-old male.
$n = 36$
Sample mean of 36 males, $\bar{X} \sim N\left(33, \frac{9}{36}\right)$ approx. $\Rightarrow \bar{X} \sim N\left(33, \frac{1}{4}\right)$ approx. (by CLT)
a) $P(\bar{X} > 31.5) = 1 - P(\bar{X} \le 31.5) = 1 - P\left(Z \le \frac{31.5 - 33}{\sqrt{1/4}}\right) =$
b) $P(32 \le \bar{X} \le 34) = P(\bar{X} \le 34) - P(\bar{X} < 32) = P\left(Z \le \frac{34 - 33}{\sqrt{1/4}}\right) - P\left(Z < \frac{32 - 33}{\sqrt{1/4}}\right) =$
c) $P(\bar{X} < 32 \text{ or } \bar{X} > 34) = 1 - P(32 \le \bar{X} \le 34) =$
Example 8.2-3
The body length (excluding the tail) of a particular species of mice is approximately normally distributed, with a mean of 12 cm and a standard deviation of 2.4 cm.
(a) If a random sample of 16 mice is selected, what is the probability that it will have an average body length of between 11 and 13 cm?
(b) If a random sample of 25 mice is selected, what is the probability that it will have an average body length of between 11 and 13 cm?
(c) Comment on the answers obtained in parts (a) and (b).
Solution:
Let X be the body length of a mouse. $X \sim N(12, 2.4^2)$
a) $n = 16$
Sample mean of 16 mice, $\bar{X} \sim N\left(12, \frac{2.4^2}{16}\right) \Rightarrow \bar{X} \sim N(12, 0.36)$
$P(11 \le \bar{X} \le 13) = P(\bar{X} \le 13) - P(\bar{X} < 11) = P\left(Z \le \frac{13 - 12}{\sqrt{0.36}}\right) - P\left(Z < \frac{11 - 12}{\sqrt{0.36}}\right) =$
b) $n = 25$
Sample mean of 25 mice, $\bar{X} \sim N\left(12, \frac{2.4^2}{25}\right) \Rightarrow \bar{X} \sim N(12, 0.2304)$
$P(11 \le \bar{X} \le 13) = P(\bar{X} \le 13) - P(\bar{X} < 11) = P\left(Z \le \frac{13 - 12}{\sqrt{0.2304}}\right) - P\left(Z < \frac{11 - 12}{\sqrt{0.2304}}\right) =$
c) The required probability becomes ___________ when sample size ___________.
Tutorial 8: Distribution of Sample Means
A Self Practice Questions
1 Let $X_1, X_2, \ldots, X_n$ be independent random variables. Write down the mean and variance of $\bar{X}$ for each of the following:
(i) n = 15, mean of X = 4, variance of X = 7.
(ii) n = 30, mean of X = 5, standard deviation of X = 3.
2 Let $X_1, X_2, \ldots, X_n$ be independent random variables with mean 3 and variance 5. Write down the distribution of $\bar{X}$ (with explanation if necessary) when:
(i) the $X_i$’s are normal, n = 10, (ii) the $X_i$’s are normal, n = 60,
(iii) the distribution of the $X_i$’s is unknown, n = 35.
3 Calculate $P(\bar{X} < 4)$ for Question 2(iii).
1 In a certain population of swordtail fish, the lengths of the individual fish follow approximately a normal distribution with mean 52.0 mm and standard deviation of 6.0 mm. Find the probability that a random sample of 25 swordtail fishes will have an average length of
(i) less than 48.6 mm
(ii) between 52.4 and 54.4 mm.
2 According to an article, root-canal therapy costs from $200 to $700. Suppose the mean cost for root-canal therapy is $450 and the standard deviation is $125. If a sample of 100 dentists was selected across the country, find the probability that the mean cost per root canal for the sample would fall between $425 and $475.
3 The average number of days spent in a particular hospital for a coronary bypass in 2013 was 9 days and the standard deviation was 4 days. What is the probability that a random sample of 30 patients will have an average stay longer than 9.5 days? State any assumptions required on the distribution of the days spent.
4 The intelligence quotient (IQ) score of a certain population of children is approximately normally distributed with a mean of 102 and a standard deviation of 10. Let Y be the random variable ‘the IQ score of children’.
(i) If a random sample of n children is selected, find the value of n given that $P(\bar{Y} > 103) = 0.3446$.
(ii) Using the value of n found in part (i), find the value of k if $P(k < \bar{Y} < 105) = 0.6730$.
5 The heartbeat rate of a certain population of babies follows a normal distribution with mean 70 beats/min and standard deviation of 10 beats/min.
(i) Find the probability that a baby randomly selected from this population has a heartbeat rate of less than 66 beats/min.
(ii) If a sample of 8 babies is randomly selected, find the probability that 3 of them will have a heartbeat rate of less than 66 beats/min.
(iii) If a random sample of 36 babies is selected, what is the probability that it will have a mean heartbeat rate of more than 68 beats/min?
6 The masses of Giant apples follow a normal distribution with mean 700 g and standard deviation of 100 g.
(i) Find the probability that the total mass of 10 Giant apples will be more than 7.2 kg.
(ii) A random sample of n apples is chosen. Find the least value of n such that there is a probability of not more than 0.25 that the sample mean differs from the population mean mass by more than 20 g.
Answers
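A1 i mean = 4, variance = $\frac{7}{15}$, ii mean = 5, variance = $\frac{3}{10}$.
A2 i $\bar{X} \sim N\left(3, \frac{1}{2}\right)$ ii $\bar{X} \sim N\left(3, \frac{1}{12}\right)$ iii $\bar{X} \sim N\left(3, \frac{1}{7}\right)$ approx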
A3 0.9960
B1 i 0.0023 ii 0.3479
B2 0.9544
B3 0.2483
B4 i n = 16 ii k = 100
B5 i 0.3446 ii 0.2771 iii 0.8849
B6 i 0.2643 ii least n = 34
Topic 9 : Estimation of Parameters
Objectives :
At the end of this lesson, the student should be able to:
1 calculate the point estimators of population parameters
2 construct and interpret confidence intervals for the population mean using the appropriate distributions (standard normal or t-distribution)
3 explain how the confidence interval is related to sample size and confidence level
Topic 9: Estimation of Parameters
9.1 Estimation of the Population Mean
• It is common that we do not know the population mean for a random variable
we are interested in. Using the example in Chapter 8, it would be impossible for us
to determine the population mean of all the peanuts in this world.
• Hence a common approach is to take a sample and use the information from it to
estimate the population mean.
Example 9.1-1 The following set of data is the heights (in cm) of 16 children:
101 118 125 116 113 102 117 126
106 114 109 121 107 119 116 115
Find a point estimate of the mean height, $\mu$, of all the children (the population).
Solution:
Definition: A point estimate is a single value estimate for a population parameter. The most unbiased point estimate of the population mean $\mu$ is the sample mean $\bar{x}$.
9.2 Confidence Interval for the Mean (Large Samples)
• In Example 9.1-1, the probability that the population mean height of the children is exactly 114.0625 cm is virtually nil. So instead of using a point estimate to estimate $\mu$ to be exactly 114.0625 cm, we can estimate that $\mu$ lies in an interval.
Definition: An interval estimate is an interval, or a range of values, used to estimate a population parameter.
Although the point estimate in Example 9.1-1 is not equal to the actual population mean, it is probably close to it. To form an interval estimate, use the point estimate as the centre of the interval, then add and subtract a margin of error.
If the margin of error is 3.95, then the interval estimate in Example 9.1-1 will be computed as $114.0625 \pm 3.95$ or $110.1 < \mu < 118.0$.
• Before finding a margin of error for an interval estimate, we must first determine how confident we need to be that our interval estimate contains the population mean $\mu$.
Definition: A confidence level $100c\%$ refers to the percentage of the intervals from all possible samples that we can expect to contain the true population mean.
For example, the diagram shows 10 intervals obtained from 10 samples. If these are 90% confidence intervals, then it is expected that 9 out of the 10 intervals contain the population mean.
[Diagram: 10 confidence intervals obtained from 10 samples, drawn against the population mean.]
• When the sample size is large, i.e. $n \ge 30$, by the Central Limit Theorem, the sampling distribution of sample means is approximately a normal distribution. The level of confidence $100c\%$ is the area under the standard normal curve between the critical values $-z_c$ and $z_c$.
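Since the central area is c, each tail has area $(1 - c)/2$, so $z_c$ satisfies $P(Z \le z_c) = (1 + c)/2$. A minimal Python sketch of this lookup is given below; the function name `critical_z` is made up for this note, and `statistics.NormalDist().inv_cdf` is used instead of reading the normal table backwards. The confidence levels shown are chosen for illustration only.

```python
from statistics import NormalDist


def critical_z(c):
    """Critical value z_c such that P(-z_c < Z < z_c) = c for standard normal Z."""
    # Area to the left of z_c is the central area c plus one tail (1 - c) / 2
    return NormalDist().inv_cdf((1 + c) / 2)


for c in (0.90, 0.95, 0.98, 0.99):
    print(f"c = {c:.2f}: z_c = {critical_z(c):.3f}")
```

Values read from the printed normal table may differ slightly in the last digit, because the table is rounded to 4 decimal places.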
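Example 9.2-1
Find the critical values $z_c$ necessary to form a confidence interval at each of the following levels of confidence: (i) 80%, (ii) 85%, (iii) 97%.
Solution:
[Diagram: standard normal curve with central area $100c\%$ between $-z_c$ and $z_c$, and area $\frac{(100 - 100c)}{2}\%$ in each tail.]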
• Given a level of confidence $100c\%$, the margin of error E is the greatest possible distance between the point estimate and the value of the parameter it is estimating. E is also known as the maximum error of estimate or error tolerance.
• The margin of error E can be calculated as follows:
$E = z_c \frac{\sigma}{\sqrt{n}}$ or $E = z_c \frac{s}{\sqrt{n}}$
Use $\sigma$ if the population standard deviation is known. If $\sigma$ is unknown and $n \ge 30$, the sample standard deviation s is used in place of $\sigma$.
Example 9.2-2
Find the margin of error for the given values of c, s and n.
(i) $c = 0.90$, $s = 2.5$, $n = 36$;
(ii) $c = 0.95$, $s = 3.0$, $n = 60$;
(iii) $c = 0.975$, $s = 4.6$, $n = 100$
Solution:
Note: In general, the margin of error decreases as the sample size increases.
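The note above can be seen numerically with a short Python sketch. The values used here (95% confidence, standard deviation 10, sample sizes 25, 100 and 400) are hypothetical, and `margin_of_error` is a name made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(c, sd, n):
    """E = z_c * sd / sqrt(n), where sd is sigma (or s when n >= 30)."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    return z_c * sd / sqrt(n)


# Hypothetical values: the same c and sd, with the sample size quadrupled each time
for n in (25, 100, 400):
    print(f"n = {n:3d}: E = {margin_of_error(0.95, 10.0, n):.3f}")
```

Because E is proportional to $1/\sqrt{n}$, each fourfold increase in n halves the margin of error.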
• Using a point estimate and a margin of error, an interval estimate for a population parameter such as $\mu$ can be constructed. This interval estimate is called a confidence interval.
• Hence, a $100c\%$ confidence interval for the population mean $\mu$ is given as:
$\bar{x} - E < \mu < \bar{x} + E$ or $(\bar{x} - E, \bar{x} + E)$
where the probability that the confidence interval contains $\mu$ is $100c\%$.
• Steps for constructing a confidence interval for a population mean ($n \ge 30$, or $\sigma$ is known with a normally distributed population) are:
1. Find the sample statistics n and $\bar{x} = \frac{1}{n}\sum x$.
2. Specify $\sigma$ if known. Otherwise, if $n \ge 30$, find the sample standard deviation $s = \sqrt{\frac{1}{n-1}\sum (x - \bar{x})^2}$ and use it as an estimate for $\sigma$.
3. Find the critical value $z_c$ that corresponds to the given level of confidence.
4. Find the margin of error, $E = z_c \frac{\sigma}{\sqrt{n}}$.
5. Form the confidence interval $(\bar{x} - E, \bar{x} + E)$.
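The five steps above can be sketched as a single Python function. This is an illustrative sketch only: the function name `z_confidence_interval` is made up, the data set is artificial (36 made-up readings), and `statistics.stdev` is used because it applies the same $n - 1$ divisor as the formula for s in step 2.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_confidence_interval(data, c, sigma=None):
    """100c% confidence interval for the population mean using z_c.

    Pass sigma if the population standard deviation is known (population
    assumed normal); otherwise the sample needs n >= 30 and s is used.
    """
    n = len(data)
    x_bar = mean(data)                          # step 1
    if sigma is None:                           # step 2
        if n < 30:
            raise ValueError("sigma unknown and n < 30: use the t-distribution")
        sigma = stdev(data)                     # sample standard deviation s
    z_c = NormalDist().inv_cdf((1 + c) / 2)     # step 3
    E = z_c * sigma / sqrt(n)                   # step 4
    return x_bar - E, x_bar + E                 # step 5


# Artificial data: 36 readings taking the values 48, 49, ..., 56
data = [48 + (7 * i) % 9 for i in range(36)]
low, high = z_confidence_interval(data, 0.95)
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")
```

The interval is centred on the sample mean, and its half-width is the margin of error E from step 4.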
Example 9.2-3
After a few rainy days, numerous tadpoles appeared on a wet field. 12 tadpoles were randomly picked and their lengths measured. It is found that the sample mean is 11.1 mm. If this sample came from a normally distributed population with variance 4, calculate a 95% confidence interval for the mean length of all the tadpoles in the field.
Solution:
[Diagram: standard normal curve with central area 95% and upper-tail area 2.5% beyond $z_{0.95}$.]
$P(Z \le z_{0.95}) = 0.975 \Rightarrow z_{0.95} =$
95% confidence interval $= \left(\bar{x} - z_{0.95}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{0.95}\frac{\sigma}{\sqrt{n}}\right) =$
Example 9.2-4 Fifty 2-year-old cows were injected with an antibiotic A, at a dosage of 12 mg/kg body weight. It is found that the sample mean of the blood serum concentrations (µg/ml) of the antibiotic 2 hrs after injection is 25.5 and the sample standard deviation is 3.03. Construct a 90% confidence interval for the population mean.
Solution:
[Diagram: standard normal curve with central area 90% and upper-tail area 5% beyond $z_{0.90}$.]
$P(Z \le z_{0.90}) = 0.95 \Rightarrow z_{0.90} =$
90% confidence interval $= \left(\bar{x} - z_{0.90}\frac{s}{\sqrt{n}},\ \bar{x} + z_{0.90}\frac{s}{\sqrt{n}}\right) =$
Example 9.2-5 A random sample of 150 readings was taken from a population with mean $\mu$ and variance $\sigma^2$. Given that $\sum x = 1623$ and $\sum x^2 = 17814.36$,
(a) calculate $\bar{x}$ and $s$.
(b) construct a 95 % confidence interval for the population mean.
Solution:
a) Recall $\bar{x} = \frac{\sum x}{n} =$ , $s^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{\left(\sum x\right)^2}{n}\right) =$
b)
$P(Z \le z_{0.95}) = 0.975 \Rightarrow z_{0.95} =$
95% confidence interval $= \left(\bar{x} - z_{0.95}\frac{s}{\sqrt{n}},\ \bar{x} + z_{0.95}\frac{s}{\sqrt{n}}\right) =$
9.3 Confidence Interval for the Mean (Small Samples)
• In many real-life situations, the population standard deviation is unknown. Moreover, due to constraints such as cost and time, it is often not practical to collect samples of size 30 or more. If the random variable is normally or approximately normally distributed, we can use a t-distribution.
• When X is a normal random variable with the population standard deviation $\sigma$ unknown, the random variable
$T = \frac{\bar{X} - \mu}{s / \sqrt{n}}$
follows a t-distribution with degrees of freedom d.f. $= n - 1$.
• The value of $t_c$ can be obtained from the t-distribution table.
• For example, if $n = 7$ and we want to construct a 95% confidence interval, we can obtain the value of $t_{0.95}$ as follows:
d.f. $= 7 - 1 = 6$
$\therefore t_{0.95} = 2.447$
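The table lookup and the resulting interval can be sketched in Python as follows. The sketch reuses the 16 heights from Example 9.1-1 and assumes, for illustration only, that the heights come from an approximately normal population. The dictionary `T_95` is a small excerpt of the c = 0.95 column of Table 2, and the function name is made up for this note.

```python
from math import sqrt
from statistics import mean, stdev

# Excerpt of Table 2 (t-distribution), column c = 0.95, keyed by degrees of freedom
T_95 = {6: 2.447, 15: 2.131, 24: 2.064}


def t_margin_95(data):
    """Return (sample mean, margin of error) for a 95% t-based confidence interval."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)            # sample standard deviation (n - 1 divisor)
    t_c = T_95[n - 1]          # d.f. = n - 1
    return x_bar, t_c * s / sqrt(n)


# Heights (in cm) of the 16 children in Example 9.1-1
heights = [101, 118, 125, 116, 113, 102, 117, 126,
           106, 114, 109, 121, 107, 119, 116, 115]

x_bar, E = t_margin_95(heights)
print(f"x-bar = {x_bar}, E = {E:.2f}")
print(f"95% confidence interval: ({x_bar - E:.1f}, {x_bar + E:.1f})")
```

With d.f. = 15 the margin of error comes out at about 3.95, which agrees with the interval $114.0625 \pm 3.95$ quoted in Section 9.2.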
Example 9.3-1 Twelve packets of a particular brand of sweets are selected at random and their weights noted. The weights obtained (in grams) are
407.3, 409.6, 391.0, 402.9, 406.8, 390.0, 407.6, 402.1, 390.8, 390.6, 396.8, 400.2.
Assuming that the sample is taken from an approximately normal population with mean mass $\mu$, calculate
(a) the 95% confidence interval for $\mu$,
(b) the 99% confidence interval for $\mu$.
Solution:
Using calculator, $\bar{x} =$ , $s =$ , d.f. $= n - 1 =$
a) Since sample size $n < 30$ and the population variance is unknown, $t_{0.95} =$
95% confidence interval is $\left(\bar{x} - t_{0.95}\frac{s}{\sqrt{n}},\ \bar{x} + t_{0.95}\frac{s}{\sqrt{n}}\right) =$
b) Since sample size $n < 30$ and the population variance is unknown, $t_{0.99} =$
99% confidence interval is $\left(\bar{x} - t_{0.99}\frac{s}{\sqrt{n}},\ \bar{x} + t_{0.99}\frac{s}{\sqrt{n}}\right) =$
9.4 Minimum Sample Size to Estimate Population Mean $\mu$
• Sometimes we need to determine the sample size required before we conduct an experiment. Given a pre-determined confidence level $100c\%$ and margin of error E, the minimum sample size n is given by
$n = \left(\frac{z_c \sigma}{E}\right)^2$ or $n = \left(\frac{z_c s}{E}\right)^2$ or $n = \left(\frac{t_c s}{E}\right)^2$
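A short Python sketch of the first formula is given below. The inputs (90% confidence, $\sigma = 12$, $E = 3$) are hypothetical, and `min_sample_size` is a name made up for this note. Since n must be a whole number and the formula gives the smallest value that achieves the margin of error, the result is rounded up.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(c, sigma, E):
    """Smallest n with z_c * sigma / sqrt(n) <= E, i.e. n >= (z_c * sigma / E)^2."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    return ceil((z_c * sigma / E) ** 2)   # round up to the next whole number


# Hypothetical values: 90% confidence, sigma = 12, margin of error 3
n = min_sample_size(c=0.90, sigma=12.0, E=3.0)
print(n)
```

Here $(1.645 \times 12 / 3)^2 \approx 43.3$, so at least 44 observations are needed.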
Example 9.4-1 You want to estimate the mean number of sentences in a magazine advertisement. How many magazine advertisements must be included in the sample if you want to be 95% confident that the sample mean is within one sentence of the population mean? Assume that the population standard deviation is 5.0 and the number of sentences is normally distributed.
Solution:
Given $E = 1$, $\sigma = 5.0$ and $z_{0.95} =$
Hence the number of advertisements required in the sample is at least:
$n = \left(\frac{z_c \sigma}{E}\right)^2 =$
In summary,
[Flowchart: decision tree with YES/NO branches.]
Table 2: t-Distribution

Level of confidence, c   0.50    0.80    0.90    0.95    0.98    0.99
One tail, α              0.25    0.10    0.05    0.025   0.01    0.005
Two tails, α             0.50    0.20    0.10    0.05    0.02    0.01
d.f.
1                        1.000   3.078   6.314   12.706  31.821  63.657
2                        0.816   1.886   2.920   4.303   6.965   9.925
3                        0.765   1.638   2.353   3.182   4.541   5.841
4                        0.741   1.533   2.132   2.776   3.747   4.604
5                        0.727   1.476   2.015   2.571   3.365   4.032
6                        0.718   1.440   1.943   2.447   3.143   3.707
7                        0.711   1.415   1.895   2.365   2.998   3.499
8                        0.706   1.397   1.860   2.306   2.896   3.355
9                        0.703   1.383   1.833   2.262   2.821   3.250
10                       0.700   1.372   1.812   2.228   2.764   3.169
11                       0.697   1.363   1.796   2.201   2.718   3.106
12                       0.695   1.356   1.782   2.179   2.681   3.055
13                       0.694   1.350   1.771   2.160   2.650   3.012
14                       0.692   1.345   1.761   2.145   2.624   2.977
15                       0.691   1.341   1.753   2.131   2.602   2.947
16                       0.690   1.337   1.746   2.120   2.583   2.921
17                       0.689   1.333   1.740   2.110   2.567   2.898
18                       0.688   1.330   1.734   2.101   2.552   2.878
19                       0.688   1.328   1.729   2.093   2.539   2.861
20                       0.687   1.325   1.725   2.086   2.528   2.845
21                       0.686   1.323   1.721   2.080   2.518   2.831
22                       0.686   1.321   1.717   2.074   2.508   2.819
23                       0.685   1.319   1.714   2.069   2.500   2.807
24                       0.685   1.318   1.711   2.064   2.492   2.797
25                       0.684   1.316   1.708   2.060   2.485   2.787
26                       0.684   1.315   1.706   2.056   2.479   2.779
27                       0.684   1.314   1.703   2.052   2.473   2.771
28                       0.683   1.313   1.701   2.048   2.467   2.763
29                       0.683   1.311   1.699   2.045   2.462   2.756
∞                        0.674   1.282   1.645   1.960   2.326   2.576
Appendix A Confidence intervals using EXCEL
• In EXCEL, select the tab “Formulas” → “More Functions” → “Statistical”.
You can calculate the margin of error using the functions CONFIDENCE.NORM (z table) or CONFIDENCE.T (t table).
• Suppose we want to construct a 95 % confidence interval from the z table with population standard deviation 4 and sample size 20:
“Alpha” = 1 – 0.95 = 0.05
Hence the margin of error = 1.7530
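The same margin of error can be reproduced without Excel. The Python function below is a sketch that mirrors the three arguments of CONFIDENCE.NORM (alpha, standard deviation, sample size); the function name `confidence_norm` is made up for this note and is not part of any library.

```python
from math import sqrt
from statistics import NormalDist


def confidence_norm(alpha, standard_dev, size):
    """Margin of error z * sd / sqrt(n), with z cutting off alpha / 2 in each tail."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / sqrt(size)


# Same inputs as the Excel example: alpha = 0.05, standard deviation 4, n = 20
E = confidence_norm(alpha=0.05, standard_dev=4, size=20)
print(f"margin of error = {E:.4f}")
```

The printed value should agree with the 1.7530 obtained from Excel above.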
Tutorial 9: Estimation of Parameters
A Self Practice Questions
1 Determine the $z_c$ value for the following:
(a) 90 % confidence interval for $\mu$.
(b) 95 % confidence interval for $\mu$.
(c) 98 % confidence interval for $\mu$.
(d) 99 % confidence interval for $\mu$.
2 Determine the $t_c$ value for the following:
(a) $n = 10$, 90 % confidence interval for $\mu$.
(b) $n = 22$, 95 % confidence interval for $\mu$.
(c) $n = 25$, 98 % confidence interval for $\mu$.
(d) $n = 18$, 99 % confidence interval for $\mu$.
3 Given that $\bar{x} = 10$, calculate the 98 % confidence interval for $\mu$ when
(a) X is a normal random variable, $n = 20$, $\sigma = 3$.
(b) X is not a normal random variable, $n = 50$, $\sigma = 3$.
(c) X is a normal random variable, $n = 10$, $\sigma$ is unknown and $s = 2$.
B Discussion Questions
1 In a particular factory, the quantity of mineral water dispensed by automated machines into plastic bottles is approximately normally distributed with standard deviation of 24 millilitres. A random sample of 25 such bottles was found to have a mean quantity of 503 millilitres.
(a) Find the standard error of the mean.
(b) Find a 90 % confidence interval for the mean quantity of mineral water dispensed by the machines.
(c) Find a 98 % confidence interval for the mean quantity of mineral water dispensed by the machines.
2 The heights of a random sample of 40 NYP students yield a mean of 173.8 cm and a standard deviation of 6.8 cm. Assume population is normally distributed.
(a) Construct a 95 % confidence interval for mean height of all NYP students.
(b) With reference to the 95 % confidence interval, what is the maximum possible error of using the sample mean as an estimate of the population mean?
3 One of the objectives of a large medical study was to estimate the mean physician fee for cataract removal. For n randomly selected cases the mean fee was found to be $1550 with a standard deviation of $125.
(a) Find a 99 % confidence interval on µ , the mean fee for all physicians when n = 35.
(b) Find a 99 % confidence interval on µ , the mean fee for all physicians when n = 25 and the distribution of the fees is normally distributed.
4 A researcher selected a random sample of 8 chick embryos to study the development of thymus gland. He weighed the glands of these 8 chick embryos after 12 days of incubation. The thymus weights (in mg) were as follows:
28.4 20.8 27.6 33.0 40.8 36.5 29.1 31.8
(a) Using your calculator, find the sample mean and the sample standard deviation.
(b) Construct a 90% confidence interval for the population mean.
(c) State whether any assumption is required on the distribution of the embryos.
C Conceptual Questions
Determine whether the following statements are true or false. Explain your reasoning.
1 For a given standard error, lower confidence levels produce wider confidence intervals.
2 If you increase sample size, the width of the confidence interval will increase.
3 To reduce the width of a confidence interval by half, we have to increase the sample size to four times its original size.
Answers
A1 a 1.645 b 1.96 c 2.33 d 2.575
A2 a 1.833 b 2.080 c 2.492 d 2.898
A3 a (8.44, 11.6) b (9.01, 11.0) c (8.22, 11.8)
B1 a 4.8 b (495, 511) c (492, 514)
B2 a (172,176) b 2.11
B3 a (1495.49, 1604.51) b (1480.08, 1619.93)
B4 a 31, 6.06 b (26.9,35.1)
C1 F C2 F C3 T
Topic 10 : Hypothesis Testing with One Sample
Objectives :
At the end of this lesson, the student should be able to:
1. formulate a hypothesis test by using its characteristics such as formulating the null and alternate hypotheses, identifying the correct test statistic and applying the critical regions
2. evaluate its reliability by explaining the type I and II errors
3. evaluate the hypothesis of a population mean by using the z-test or t-test
TOPIC 10: Hypothesis Testing with One Sample
10.1 Introduction to Hypothesis Testing
• Suppose a car manufacturer advertises that its new hybrid car has a mean mileage of 50 miles per gallon. This statement may be true but it has yet to be proven. Such a statement is known as a statistical hypothesis.
• One way of testing the above hypothesis is to literally test all the hybrid cars made by this manufacturer, which is both impractical and uneconomical. The more sensible approach is to test the validity of the claim by considering random samples taken from the population of these hybrid cars.
• In this chapter, you will learn how to test a claim or a hypothesis about a population parameter, based on the information obtained from a random sample. In this module, we are only concerned with testing the population mean.
10.2.1 Stating a Hypothesis
• A statement about a population parameter is called a statistical hypothesis. To test a population parameter, you must state a pair of hypotheses: one that represents the claim and the other its complement. When one of these hypotheses is false, the other must be true. Either hypothesis, the null hypothesis or the alternative hypothesis, may represent the original claim.
Definition
1. A null hypothesis $H_0$ is a statistical hypothesis that contains a statement of equality, such as ≤, = or ≥.
2. The alternative hypothesis $H_a$ is the complement of the null hypothesis. It is a statement that must be true if $H_0$ is false, and it contains a statement of strict inequality, such as >, ≠ or <.
• To write the null and alternative hypotheses, translate the claim made about the population parameter from a verbal statement to a mathematical statement. Then write its complement. For instance, if the claim value is k and the population parameter is µ, then the possible pairs of null and alternative hypotheses are:
1st possible pair: $H_0: \mu \le k$, $H_a: \mu > k$
2nd possible pair: $H_0: \mu \ge k$, $H_a: \mu < k$
3rd possible pair: $H_0: \mu = k$, $H_a: \mu \ne k$
• Thereafter, we will examine the sampling distribution and determine whether or not
a sample statistic is unusual.
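The rule for pairing a claim with its complement can be sketched in a few lines of Python. This is only an illustration: the function name `state_hypotheses`, the ASCII relation strings and the symbol `mu` are assumptions made for the sketch, not notation from these notes.

```python
# Complement of each relation. Relations containing equality belong in H0.
COMPLEMENT = {"=": "!=", "!=": "=", "<=": ">", ">": "<=", ">=": "<", "<": ">="}
HAS_EQUALITY = {"=", "<=", ">="}


def state_hypotheses(relation, k, symbol="mu"):
    """Return (H0, Ha, which one is the claim) for the claim `symbol relation k`."""
    other = COMPLEMENT[relation]
    if relation in HAS_EQUALITY:
        # The claim contains equality, so the claim itself is H0.
        return f"{symbol} {relation} {k}", f"{symbol} {other} {k}", "H0"
    # The claim is a strict inequality, so it becomes Ha.
    return f"{symbol} {other} {k}", f"{symbol} {relation} {k}", "Ha"


# Claim "the mean mileage is 50" and claim "the mean is more than 20".
for relation, k in [("=", 50), (">", 20)]:
    h0, ha, claim = state_hypotheses(relation, k)
    print(f"H0: {h0}   Ha: {ha}   (claim is {claim})")
```

The lookup table mirrors the three possible pairs listed above: whichever statement carries the equality sign is placed in the null hypothesis.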
Example 10.2-1
Write the following claims as a mathematical sentence. State the null and alternative
hypotheses and identify which represents the claim.
(a) A university publicises that the proportion of its students who graduate in 4 years
is 82%.
(b) A water faucet manufacturer announces that the mean flow rate of a certain type
of faucet is less than 2.5 gallons per minute.
(c) A cereal company advertises that the mean weight of the contents of its 20-ounce
size cereal boxes is more than 20 ounces.
Solution:
(a) The claim “the proportion … is 82%” can be written as $p = 0.82$. Its complement is $p \ne 0.82$. Since $p = 0.82$ contains the statement of equality, it becomes the null hypothesis. In this case, the null hypothesis is also the claim. Hence,
$H_0: p = 0.82$ (claim)
$H_a: p \ne 0.82$
10.2.2 Types of Errors
• No matter which hypothesis represents the claim, we always begin a hypothesis
test by assuming that the equality condition in the null hypothesis is true. So, when
we perform a hypothesis test, we make one of two decisions:
1. Reject the null hypothesis; or
2. Fail to reject the null hypothesis.
Since our decision is based on a sample rather than the entire population, there is
always the possibility that we will make the wrong decision.
• The only way to be absolutely certain of whether $H_0$ is true or false is to test the entire population. Otherwise, we might reject $H_0$ when it is actually true or fail to reject $H_0$ when it is actually false.
The following table shows the four possible outcomes of a hypothesis test.
| Decision | $H_0$ is true | $H_0$ is false |
|---|---|---|
| Do not reject $H_0$ | Correct decision | Type II error |
| Reject $H_0$ | Type I error | Correct decision |
Definition
1. A type I error occurs if the null hypothesis is rejected when it is true.
2. A type II error occurs if the null hypothesis is not rejected when it is false.
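The four outcomes can also be written as a small lookup, which some readers find easier to remember than the table. This is a minimal sketch; the function name `classify_outcome` is hypothetical.

```python
def classify_outcome(h0_is_true, reject_h0):
    """Name the outcome of a test given the (usually unknown) truth of H0."""
    if reject_h0:
        return "Type I error" if h0_is_true else "Correct decision"
    return "Correct decision" if h0_is_true else "Type II error"


# Print all four combinations of truth and decision.
for h0_is_true in (True, False):
    for reject_h0 in (False, True):
        print(h0_is_true, reject_h0, classify_outcome(h0_is_true, reject_h0))
```

In practice the truth of the null hypothesis is not known, so the function only describes what kind of mistake a given decision would be.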
Example 10.2-2
The USDA limit for salmonella contamination for chicken is 20%. A meat inspector
reports that the chicken produced by a company exceeds the USDA limit. You perform
a hypothesis test to determine whether the meat inspector’s claim is true. When will a
type I or II error occur? Which is more serious?
Solution:
Let p be the proportion of chicken that is contaminated.
$H_0: p \le 0.2$
$H_a: p > 0.2$ (claim)
A type I error occurs when the actual proportion of contaminated chicken is less than or equal to 0.2 but we reject the null hypothesis.
A type II error occurs when the actual proportion of contaminated chicken is greater than 0.2 but we do not reject the null hypothesis.
A type II error is more serious because chicken that exceeds the USDA contamination limit would be allowed to be sold to consumers, which could result in sickness and death.
Example 10.2-3
A company specialising in parachute assembly states that its main parachute failure
rate is not more than 1%. You perform a hypothesis test to determine if its claim is
false. When will a type I or type II error occur? Which is more serious?
Solution:
10.2.3 Level of Significance
Definition
In a hypothesis test, the level of significance is your maximum allowable probability of making a type I error. It is denoted by α.
• By setting the level of significance at a small value, you are saying that you want the probability of rejecting a true null hypothesis to be small. Three commonly used levels of significance are $\alpha = 0.10$, $\alpha = 0.05$ and $\alpha = 0.01$.
• The probability of a type II error is denoted by β.
10.2.4 Types of Test and the Rejection Criteria
• Knowing the type of hypothesis test helps us to decide the criteria for rejecting the null hypothesis. The region of the sampling distribution that favours the alternative hypothesis $H_a$ (i.e. the rejection of $H_0$) determines the type of test. There are three types of hypothesis tests: left-tailed, right-tailed and two-tailed.
• Type 1: Left-tailed test
$H_0: \mu \ge k$
$H_a: \mu < k$
• Type 2: Right-tailed test
$H_0: \mu \le k$
$H_a: \mu > k$
• Type 3: Two-tailed test
$H_0: \mu = k$
$H_a: \mu \ne k$
• To find the critical value(s) that define the rejection region, we need to establish the type of hypothesis test, the level of significance and the sampling distribution. The critical value is denoted by:
1. $z_c$ if the sampling distribution follows a normal distribution
2. $t_c$ if the sampling distribution follows a Student's t-distribution
Definition
A rejection region of the sampling distribution is the range of values for which the null hypothesis is not probable. A critical value separates the rejection region from the non-rejection region.
• Case 1: Left-tailed test
The rejection region is the area to the left of the critical value, i.e. $z < z_c$ or $t < t_c$.
• Case 2: Right-tailed test
The rejection region is the area to the right of the critical value, i.e. $z > z_c$ or $t > t_c$.
• Case 3: Two-tailed test
The rejection region is the area to the left of the negative critical value and to the right of the positive critical value, i.e. $z < -z_c$ or $z > z_c$ (or $t < -t_c$ or $t > t_c$).
[Figures: sampling distribution curves for each case, marking the critical value(s) and the rejection region(s) in which $H_0$ is rejected.]
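For a normal sampling distribution, the critical values for the three cases can be read from the standard normal table or computed. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` and the tail labels are assumptions made for illustration. Critical values for the t-distribution are not covered here and would still come from a t-table.

```python
from statistics import NormalDist


def z_critical(alpha, tail):
    """Return the critical z value(s) as a tuple for a left, right or two-tailed test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)  # alpha is split between both tails
        return (-z, z)
    raise ValueError("tail must be 'left', 'right' or 'two'")


print([round(z, 3) for z in z_critical(0.05, "right")])  # [1.645]
print([round(z, 2) for z in z_critical(0.02, "two")])    # [-2.33, 2.33]
```

Note that in a two-tailed test each tail receives an area of α/2, which is why the two-tailed critical values at α = 0.02 coincide with the one-tailed value at α = 0.01.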
Example 10.2-4
In each of the claims, state the null and alternative hypotheses and determine if the test is a left-, right- or two-tailed test. At $\alpha = 0.10$, sketch a normal sampling distribution and find the critical value(s). Assume that the population follows a normal distribution.
(i) A consumer analyst reports that the mean life of a certain type of automobile
battery is 74 months.
(ii) A radio station publicises that its proportion of the local listening audience is
greater than 39%.
Solution: (i) Null hypothesis : ___________________________
Alternative hypothesis : ___________________________
Type of test : ___________________________
(ii)
10.2.5 Test Statistics and Making Decision
• To use the rejection region to make a conclusion in a hypothesis test:
Case 1: If the test statistic falls in the rejection region, we reject the null hypothesis.
Case 2: If the test statistic falls outside the rejection region, we fail to reject the null hypothesis.
• The following table will help you to interpret your decision:
| Decision | Claim is $H_0$ | Claim is $H_a$ |
|---|---|---|
| Reject $H_0$ | There is enough evidence to reject the claim. | There is enough evidence to support the claim. |
| Fail to reject $H_0$ | There is not enough evidence to reject the claim. | There is not enough evidence to support the claim. |
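• The test statistic for the statistical test for a population mean is the sample mean $\bar{x}$, and the standardized test statistic is denoted by
(i) z if the sampling distribution follows a normal distribution (or $n \ge 30$)
(ii) t if the sampling distribution follows a Student's t-distribution (or $n < 30$)
(iii) The standardized test statistic = (sample mean − hypothesized mean) / standard error
• When testing a population mean µ with a sample of size n, the standard error is $\sigma/\sqrt{n}$ (with the sample standard deviation s in place of σ when σ is unknown), so
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ or $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ with d.f. $= n - 1$.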
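As a quick numerical check of the standardized test statistic, the sketch below computes (sample mean − hypothesized mean) / standard error for two sets of sample statistics taken from Tutorial 10, questions A.3(i) and A.3(iv). The function name `standardized_statistic` is hypothetical, and the standard error is assumed to be the standard deviation divided by $\sqrt{n}$.

```python
from math import sqrt


def standardized_statistic(sample_mean, hypothesized_mean, sd, n):
    """(sample mean - hypothesized mean) / standard error, with standard error sd / sqrt(n)."""
    standard_error = sd / sqrt(n)
    return (sample_mean - hypothesized_mean) / standard_error


# A.3(i): n = 75 is large, so the statistic is treated as z.
z = standardized_statistic(39.2, 40, 3.23, 75)
# A.3(iv): n = 25 is small, so the statistic is treated as t with 24 d.f.
t = standardized_statistic(7700, 8000, 450, 25)
print(round(z, 3), round(t, 3))  # -2.145 -3.333
```

The same formula serves both cases. Only the reference distribution used to find the critical value changes.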
Example 10.2-5 (Large Sample)
The CEO of a firm claims that the mean work day of the firm’s accountants is less than
8.5 hours. A random sample of 35 of the firm’s accountants has a mean work day of
8.2 hours with a standard deviation of 0.5 hour. At $\alpha = 0.01$, test the CEO’s claim.
Solution:
Example 10.2-6 (Small Sample)
A used car dealer says that the mean price of a 2010 Honda Pilot LX is at least $23,900.
You suspect this claim is incorrect and find that a random sample of 14 similar vehicles
has a mean price of $23,000 and a standard deviation of $1113. Is there enough
evidence to reject the dealer’s claim at $\alpha = 0.05$? Assume the population is normally
distributed.
Solution:
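In summary, the steps for hypothesis testing are:
Step 1: State the claim mathematically. Identify the null hypothesis $H_0$ and the alternative hypothesis $H_a$.
Step 2: Identify (a) the type of test and (b) the level of significance α.
Step 3: State the sampling distribution of the sample mean (normal or Student's t).
Step 4: Determine the rejection criteria using the rejection region.
Step 5: Find the standardized test statistic.
Step 6: Decide whether to reject or fail to reject $H_0$ and interpret the decision in the context of the original claim.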
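A large-sample test of a mean can be run end to end in code: compute the standardized statistic, compare it with the critical value for the chosen tail, and word the conclusion according to whether the claim is the null or the alternative hypothesis. The sketch below is one possible arrangement, not a prescribed method. The names `z_test_mean` and `interpret` are hypothetical, and the data are from Tutorial 10, question B2 (claim: µ > 12, so the claim is the alternative hypothesis and the test is right-tailed).

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu0, s, n, alpha, tail):
    """Large-sample z-test for a mean. Returns (z, reject_h0)."""
    z = (x_bar - mu0) / (s / sqrt(n))
    std_normal = NormalDist()
    if tail == "left":
        reject_h0 = z < std_normal.inv_cdf(alpha)
    elif tail == "right":
        reject_h0 = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "two":
        reject_h0 = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, reject_h0


def interpret(reject_h0, claim_is_h0):
    """Word the conclusion in terms of the original claim."""
    strength = "enough" if reject_h0 else "not enough"
    action = "reject" if claim_is_h0 else "support"
    return f"There is {strength} evidence to {action} the claim."


z, reject_h0 = z_test_mean(x_bar=13, mu0=12, s=3, n=40, alpha=0.01, tail="right")
print(round(z, 3), reject_h0)
print(interpret(reject_h0, claim_is_h0=False))
```

For this data the statistic is about 2.108, which is below the critical value of about 2.33, so the null hypothesis is not rejected. This agrees with the tutorial answer for B2.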
Tutorial 10: Hypothesis Testing with One Sample
A Self Practice Questions
A.1 Finding Critical Values for Normal Distribution
Find the critical value(s) for the indicated z-test and level of significance.
(a) right-tailed, $\alpha = 0.05$ (b) right-tailed, $\alpha = 0.08$
(c) left-tailed, $\alpha = 0.03$ (d) left-tailed, $\alpha = 0.09$
(e) two-tailed, $\alpha = 0.02$ (f) two-tailed, $\alpha = 0.10$
A.2 Finding Critical Value(s) for Student t-distribution
Find the critical value(s) for the indicated t-test, level of significance and sample.
(a) right-tailed, $\alpha = 0.05$, $n = 23$ (b) right-tailed, $\alpha = 0.01$, $n = 11$
(c) left-tailed, $\alpha = 0.025$, $n = 19$ (d) left-tailed, $\alpha = 0.05$, $n = 14$
(e) two-tailed, $\alpha = 0.01$, $n = 27$ (f) two-tailed, $\alpha = 0.05$, $n = 10$
A.3 Testing the Claim
Test the claim about the population mean µ at the given level of significance using the given sample statistics. Assume the population is normally distributed for (iii) and (iv).
(i) Claim: $\mu = 40$; $\alpha = 0.05$. Sample statistics: $\bar{x} = 39.2$, $s = 3.23$, $n = 75$
(ii) Claim: $\mu > 1030$; $\alpha = 0.05$. Sample statistics: $\bar{x} = 1035$, $s = 23$, $n = 50$
(iii) Claim: $\mu \ne 52200$; $\alpha = 0.05$. Sample statistics: $\bar{x} = 53200$, $s = 1200$, $n = 4$
(iv) Claim: $\mu \ge 8000$; $\alpha = 0.01$. Sample statistics: $\bar{x} = 7700$, $s = 450$, $n = 25$
B Discussion Questions
B1. A report claims that an adult has an average of 130 Facebook friends. A random sample of 50 adults revealed that the average number of Facebook friends was 142 with a standard deviation of 38.2. At 5% significance level, is there enough evidence to reject the claim?
B2. An officer from the utility department claims that the average water usage per
household is more than 12 cubic meters per month. To check the claim, a random sample of 40 households was selected and found that the average monthly water usage was 13 cubic meters with a standard deviation of 3 cubic meters. At 1% significance level, is there enough evidence to support the officer’s claim?
B3. The management of a weight loss club claims that its members lose an average
of 3 kg or more within the first month after joining the club. A consumer agency that wanted to check this claim took a random sample of 36 members of this club and found that they lost an average of 2.9 kg with a standard deviation of 0.6 kg within the first month of membership. Test, at the 10% significance level, whether the management’s claim is true.
B4. A psychologist claims that the mean age at which children start walking is 12.5
months. To check this claim, you took a random sample of 18 children and found that the mean age at which these children started walking was 12.9 months with a standard deviation of 0.7 month. Using the 10% significance level, can you conclude that the mean age at which all children start walking is 12.5 months? Assume that the ages at which all children start walking have an approximately normal distribution.
B5. A pharmaceutical company claims that the average selling price per tablet of
its new drug is less than 45 cents. You have been asked to challenge the claim and so you conducted a random sampling of prices at 10 pharmacies across the country. The results (in cents) are as follows:
33.45 28.99 27.45 42.89 53.91 37.95 48.55 36.80 35.95 40.45
Is there sufficient evidence to support the claim that the average price per tablet
is less than 45 cents at the 1% level of significance? Assume that the selling price per tablet is approximately normally distributed.
B6. The average monthly telephone bill was reported to be more than $50.07. A
random sample of 10 people was taken and the following were the monthly charges (in dollars):
55.83, 49.88, 62.98, 70.42, 60.47, 52.45, 49.20, 50.02, 58.60, 51.29
At the 5% significance level, can the claim be supported? Assume all telephone bills to be approximately normal.
Answers
A1a $z_{0.95} = 1.645$ A1b $z_{0.92} = 1.41$
A1c $z_{0.03} = -1.88$ A1d $z_{0.09} = -1.34$
A1e $z_{0.01} = -2.33$ and $z_{0.99} = 2.33$ A1f $z_{0.05} = -1.645$ and $z_{0.95} = 1.645$
A2a $t_{0.05,22} = 1.717$ A2b $t_{0.01,10} = 2.764$
A2c $-t_{0.025,18} = -2.101$ A2d $-t_{0.05,13} = -1.771$
A2e $\pm t_{0.01,26} = \pm 2.779$ (two-tailed α) A2f $\pm t_{0.05,9} = \pm 2.262$ (two-tailed α)
A3i $z = -2.145 < -1.96$, reject $H_0$
A3ii $z = 1.537 < 1.645$, do not reject $H_0$
A3iii $t = 1.667 < 3.182$, do not reject $H_0$
A3iv $t = -3.333 < -2.492$, reject $H_0$
B1 Reject $H_0$ B2 Do not reject $H_0$
B3 Do not reject $H_0$ B4 Reject $H_0$
B5 Do not reject $H_0$ B6 Reject $H_0$
Topic 11 : Hypothesis Testing with Two Samples
Objectives :
At the end of this lesson, the student should be able to:
1. distinguish between independent and dependent samples
2. compare the means of 2 independent samples using the hypothesis testing approach
3. compare the means of 2 dependent samples using the hypothesis testing approach
TOPIC 11: Hypothesis Testing with Two Samples
11.1 Introduction
• Oftentimes, we hear people say ‘The kids these days are taller than before’. In
general, teenagers do seem taller than the adults who are now in their 30s. How
can we justify this assumed comparison with sufficient evidence?
• We know that it is both impractical and uneconomical to measure the heights of
all teenagers and the adults in their 30s to make the comparison. The more sensible
approach to compare the difference in their heights is to take random samples from
the two different populations and compare their sample means.
• In reality, it is very common to make comparisons between two or more distinct
populations. In this module, we will be exploring the comparison of the sample
means of two populations, although in other situations, it might be necessary to
compare other parameters such as the standard deviation and shape of the
distributions.
11.2 Independent and Dependent Samples
• In comparing two means, we want to see how different one mean (let’s call this $\bar{x}_1$) is from the other (let’s call this $\bar{x}_2$), so the most natural thing to do is to observe the difference between the two means, $\bar{x}_1 - \bar{x}_2$.
• The hypothesis testing approach in the comparison of two means allows us to test
and see if there is enough evidence to conclude that
o two means differ from each other.
o one mean is greater/lesser than the other.
• The approach differs for comparison between independent and dependent samples.
Definitions:
1. Two samples are independent if members of one sample are unrelated to members of the other sample.
2. Two samples are dependent when each member of one sample is related to a member of the other sample. Dependent samples are also called paired or matched samples.
Example 11.2-1
Classify each pair of samples as independent or dependent. Justify your answer.
(a) Sample 1: Resting heart rates of 35 individuals before drinking coffee.
Sample 2: Resting heart rates of the same individuals after drinking two cups of coffee.
(b) Sample 1: Test scores for 35 statistics students.
Sample 2: Test scores for 42 biology students who do not study statistics.
Solution:
11.3 Hypothesis Testing for Two Independent Samples
• The hypothesis test of two independent samples follows the same 6 steps as the hypothesis test of one sample. The difference lies in step 3, which requires us to know the distribution of the difference in sample means $\bar{x}_1 - \bar{x}_2$.
• General Steps for a Hypothesis Test between 2 Independent Samples are:
Step 1: State the claim mathematically. Identify the null hypothesis $H_0$ and the alternative hypothesis $H_a$. The possible hypotheses are:
$H_0: \mu_1 \ge \mu_2$, $H_a: \mu_1 < \mu_2$
$H_0: \mu_1 \le \mu_2$, $H_a: \mu_1 > \mu_2$
$H_0: \mu_1 = \mu_2$, $H_a: \mu_1 \ne \mu_2$
Regardless of which hypothesis represents the claim, we always begin by assuming that the population means are the same, i.e. $\mu_1 = \mu_2 \Rightarrow \mu_1 - \mu_2 = 0$.
Step 2: Identify (a) the type of test and (b) the level of significance α of the hypothesis test.
Step 3: State the type of distribution that the difference in sample means $\bar{x}_1 - \bar{x}_2$ follows.
(i) The sampling distribution of $\bar{x}_1 - \bar{x}_2$ follows a normal distribution,
$$\bar{x}_1 - \bar{x}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$
if three conditions are met:
(a) The samples must be randomly selected;
(b) The samples must be independent;
(c) Each sample size must be large ($n \ge 30$) or each population follows a normal distribution with known standard deviation.
(ii) The sampling distribution of $\bar{x}_1 - \bar{x}_2$ follows a t-distribution if
(a) each sample size is small ($n < 30$); and either
(b) the population variances are equal but unknown; then
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
with d.f. $= n_1 + n_2 - 2$, where $\hat{\sigma} = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$ is the pooled estimate of the common standard deviation; or
(c) the population variances are unknown and unequal; then
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
with d.f. = smaller of $n_1 - 1$ or $n_2 - 1$.
Step 4: Determine the rejection criteria using the rejection region.
Step 5: Find the standardized test statistic
$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$
with $\mu_1 - \mu_2 = 0$ since $\mu_1 = \mu_2$ is assumed.
Step 6: Decide whether to reject or fail to reject $H_0$ and interpret the decision in the context of the original claim.
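The two small-sample standard errors in Step 3(ii) differ only in how the sample variances are combined and in the degrees of freedom. The sketch below puts them side by side. The function names `se_pooled` and `se_unequal` are hypothetical, the pooled estimate is assumed to be the usual weighted average of the two sample variances, and the critical value 1.333 is assumed to be read from a t-table for α = 0.10 with 17 d.f. The data are from Tutorial 11, section A, Question 3.

```python
from math import sqrt


def se_pooled(s1, n1, s2, n2):
    """Standard error and d.f. when the population variances are assumed equal."""
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return pooled_sd * sqrt(1 / n1 + 1 / n2), n1 + n2 - 2


def se_unequal(s1, n1, s2, n2):
    """Standard error and d.f. when the population variances are assumed unequal."""
    return sqrt(s1**2 / n1 + s2**2 / n2), min(n1 - 1, n2 - 1)


# Men: mean 89, s = 3, n = 20. Women: mean 86, s = 2.5, n = 18. Claim: mu_1 > mu_2.
se, df = se_unequal(3, 20, 2.5, 18)
t = ((89 - 86) - 0) / se   # mu_1 - mu_2 = 0 under the null hypothesis
t_crit = 1.333             # assumed t-table value for alpha = 0.10, d.f. = 17
print(df, round(t, 2), t > t_crit)
```

Here the statistic is about 3.36 with 17 degrees of freedom, which lies in the right-tailed rejection region, in line with the tutorial answer "Reject $H_0$".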
Example 11.3-1
121 boys and 144 girls sat for the PSLE in 2013. The mean PSLE scores for the boys and girls are 237 and 240 respectively. Assuming a common population standard deviation score of 12, test whether the results provide significant evidence, at the 1% level, that the academic standard of boys is inferior to that of the girls.
Solution:
Step 1: $H_0: \mu_B \ge \mu_G$
$H_a: \mu_B < \mu_G$ (claim)
Step 2: It is a left-tailed test and $\alpha = 0.01$.
Step 3: Since $n_B = 121 > 30$ and $n_G = 144 > 30$ with known $\sigma_B$, $\sigma_G$, $\bar{X}_B - \bar{X}_G$ follows a normal distribution with $\mu_{\bar{X}_B - \bar{X}_G} = 0$ and
$$\sigma_{\bar{X}_B - \bar{X}_G} = \sqrt{\frac{\sigma_B^2}{n_B} + \frac{\sigma_G^2}{n_G}} = \sqrt{\frac{12^2}{121} + \frac{12^2}{144}} = \sqrt{\frac{265}{121}}$$
Step 4: Reject $H_0$ if $z < z_\alpha = -2.33$.
Step 5: Standardized test statistic,
$$z = \frac{(\bar{x}_B - \bar{x}_G) - (\mu_B - \mu_G)}{\sigma_{\bar{X}_B - \bar{X}_G}} = \frac{(237 - 240) - 0}{\sqrt{265/121}} = -2.03$$
Step 6: Since $z = -2.03 > z_\alpha = -2.33$, we do not reject $H_0$.
At $\alpha = 0.01$, there is not enough evidence to support the claim that the academic standard of boys is inferior to that of the girls.
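The arithmetic in Example 11.3-1 can be reproduced with a few lines of Python. This is only a check of the worked solution, and the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

n_b, n_g = 121, 144        # sample sizes for boys and girls
mean_b, mean_g = 237, 240  # sample mean scores
sigma = 12                 # common population standard deviation
alpha = 0.01

# Standard error of the difference in sample means (known sigma, large samples).
se = sqrt(sigma**2 / n_b + sigma**2 / n_g)
# Standardized statistic, with mu_B - mu_G = 0 under the null hypothesis.
z = ((mean_b - mean_g) - 0) / se
# Left-tailed critical value.
z_crit = NormalDist().inv_cdf(alpha)
reject_h0 = z < z_crit

print(round(z, 2), round(z_crit, 2), reject_h0)  # -2.03 -2.33 False
```

The computed statistic lies to the right of the critical value, so it falls outside the left-tailed rejection region, matching Step 6.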
Example 11.3-2
The braking distances of 8 Volkswagen GTIs and 10 Ford Focuses were tested when travelling at 60 miles per hour on dry pavement. The results are shown below.
| GTI | Focus |
|---|---|
| $\bar{x}_1 = 134$ | $\bar{x}_2 = 143$ |
| $s_1 = 6.9$ | $s_2 = 2.6$ |
| $n_1 = 8$ | $n_2 = 10$ |
Can you conclude that there is a difference in the mean braking distances of the two types of cars? Use $\alpha = 0.01$. Assume the populations are normally distributed and the population variances are not equal.
Solution:
Example 11.3-3
A study sought to find out if playing soft classical music to plants helps in plant growth. 40 plants grown from the same batch of seeds are divided equally into two samples, A and B. Sample A is grown for a month under the sound of soft classical music while sample B acts as the control group. The growth (in mm) of each plant in both samples is shown below:
Sample A:
36 38 33 39 31 34 40 33 36 35 35 34 36 38 33 32 39 45 41 34
Sample B:
32 36 31 38 29 32 38 31 34 33 33 32 34 36 31 31 37 39 29 32
Assuming the populations are normally distributed and the population variances are equal, test at the 5% level whether music has indeed helped in plant growth.
Solution:
11.4 Hypothesis Testing for Two Dependent Samples
• In the hypothesis test of 2 dependent samples or paired data, we are interested in the difference between the 2 values within each data pair $(X_1, X_2)$. The difference, denoted by d, is defined as $d = X_1 - X_2$. The mean of the differences between paired data entries in the dependent samples is calculated using $\bar{d} = \dfrac{\sum d}{n}$, where n is the number of data pairs.
• DISTRIBUTION OF SAMPLE MEAN OF THE DIFFERENCE:
$\bar{d}$ follows approximately a t-distribution with $n - 1$ degrees of freedom if the following conditions are satisfied:
(a) the samples are randomly selected
(b) the samples are dependent (paired)
(c) both populations are normally distributed.
• General Steps for a Hypothesis Test between 2 Dependent Samples:
Step 1: State the claim mathematically. Identify the null hypothesis $H_0$ and the alternative hypothesis $H_a$. The possible hypotheses are:
$H_0: \mu_d \ge k$, $H_a: \mu_d < k$
$H_0: \mu_d \le k$, $H_a: \mu_d > k$
$H_0: \mu_d = k$, $H_a: \mu_d \ne k$
Step 2: Identify (a) the type of test and (b) the level of significance α of the hypothesis test.
Step 3: State that $\bar{d}$ follows a t-distribution with d.f. $= n - 1$.
Step 4: Determine the rejection criteria using the rejection region.
Step 5: Find the standardized test statistic, $t = \dfrac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$.
Step 6: Decide whether to reject or fail to reject $H_0$ and interpret the decision in the context of the original claim.
Example 11.4-1
An advertisement states that a particular lymphatic massage program will help participants lose weight after one month. The table shows the weights of 12 adults before and after participating in the program. At $\alpha = 0.10$, can you conclude that the massage program helps participants lose weight? Assume the weights are normally distributed.
| Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Weight (Before) | 157 | 185 | 120 | 212 | 230 | 165 | 207 | 251 | 196 | 140 | 137 | 172 |
| Weight (After) | 150 | 181 | 121 | 206 | 215 | 169 | 210 | 232 | 188 | 138 | 145 | 172 |
Solution:
| Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $d$ | 7 | 4 | -1 | 6 | 15 | -4 | -3 | 19 | 8 | 2 | -8 | 0 |
| $d^2$ | 49 | 16 | 1 | 36 | 225 | 16 | 9 | 361 | 64 | 4 | 64 | 0 |
$$\bar{d} = \frac{\sum d}{n} = \frac{45}{12} = 3.75$$
and
$$s_d = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}} = \sqrt{\frac{845 - \frac{45^2}{12}}{11}} = 7.8407$$
Step 1: $H_0: \mu_d \le 0$
$H_a: \mu_d > 0$ (claim)
Step 2: It is a right-tailed test and $\alpha = 0.10$.
Step 3: $\bar{d}$ follows a t-distribution with d.f. $= 12 - 1 = 11$, $\mu_{\bar{d}} = 0$ and
$$\sigma_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \frac{7.8407}{\sqrt{12}} = 2.2634$$
Step 4: Reject $H_0$ if $t > t_{0.10} = 1.363$.
Step 5: Standardized test statistic,
$$t = \frac{\bar{d} - \mu_d}{\sigma_{\bar{d}}} = \frac{3.75 - 0}{2.2634} = 1.657$$
Step 6: Since $t = 1.657 > t_{0.10} = 1.363$, we reject $H_0$. At $\alpha = 0.10$, there is enough evidence to support the claim that the massage program helps participants lose weight.
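The paired-sample calculation in Example 11.4-1 can be checked with the Python standard library. The sketch below recomputes the differences, their mean and sample standard deviation, and the standardized statistic. The critical value 1.363 is taken from the worked solution (t-table, α = 0.10, 11 d.f.) rather than computed, and the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import mean, stdev

before = [157, 185, 120, 212, 230, 165, 207, 251, 196, 140, 137, 172]
after = [150, 181, 121, 206, 215, 169, 210, 232, 188, 138, 145, 172]

# Difference within each pair: d = before - after.
d = [b - a for b, a in zip(before, after)]
n = len(d)
d_bar = mean(d)   # mean of the differences
s_d = stdev(d)    # sample standard deviation (divides by n - 1)

# Standardized statistic with mu_d = 0 under the null hypothesis.
t = (d_bar - 0) / (s_d / sqrt(n))
t_crit = 1.363    # from the t-table: right-tailed, alpha = 0.10, d.f. = 11
reject_h0 = t > t_crit

print(d_bar, round(s_d, 4), round(t, 3), reject_h0)  # 3.75 7.8407 1.657 True
```

`statistics.stdev` uses the $n - 1$ divisor, so it agrees with the formula for $s_d$ used in the solution.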
Example 11.4-2
The table gives the systolic blood pressures (in mm Hg) of seven adults before and after the completion of a special dietary plan.
| Individual | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Before | 210 | 180 | 195 | 220 | 231 | 199 | 224 |
| After | 193 | 186 | 186 | 223 | 220 | 183 | 233 |
Let $\mu_d$ be the mean of the differences between the systolic blood pressures before and after completing this special dietary plan for the population of all adults. Using the 5% significance level, can you conclude that the mean of the paired differences $\mu_d$ is different from zero? Assume the blood pressures are normally distributed.
Solution:
Tutorial 11: Hypothesis Testing with Two Populations
A Independent Samples
QUESTION 1
A study was designed to investigate the effect of a calcium-deficient diet on lead consumption in rats. One hundred rats were randomly divided into 2 groups of 50 each. One group served as a control group and the other was the experimental, or calcium-deficient, group. The response recorded was the amount of lead consumed per rat. The results were summarized by:
| CONTROL | EXPERIMENTAL |
|---|---|
| $\bar{x}_1 = 5.2$ | $\bar{x}_2 = 5.6$ |
| $s_1 = 1.1$ | $s_2 = 1.3$ |
| $n_1 = 50$ | $n_2 = 50$ |
At $\alpha = 0.05$, is there sufficient evidence to suggest that the calcium-deficient diet results in increased lead consumption in rats?
QUESTION 2
A study was conducted to assess whether teenage boys worry more than teenage girls. A scale called the Anxiety Scale was used to measure the level of anxiety experienced by an individual. A higher value on the Anxiety Scale corresponds to a higher level of anxiety. The results obtained are summarized in the table below:
| | Sample size | Sample mean | Sample standard deviation |
|---|---|---|---|
| Boys | 102 | 66.78 | 9.2 |
| Girls | 76 | 65.33 | 9.3 |
Is there sufficient evidence at the 5% level that teenage boys score higher on the Anxiety Scale than teenage girls?
QUESTION 3
An insurance company wants to know if the average speed at which men drive cars is higher than that of women drivers. The company took a random sample of 20 cars driven by men on an expressway and found the mean speed to be 89 km/h with a standard deviation of 3 km/h. Another sample of 18 cars driven by women on the same expressway gave a mean speed of 86 km/h with a standard deviation of 2.5 km/h. Assume that the speeds at which all men and all women drive cars on this expressway are normally distributed with unequal population standard deviations. Test at the 10% significance level whether the mean speed of cars driven by all men drivers on this expressway is higher than that of cars driven by all women drivers.
B Dependent Samples
QUESTION 1
Triglyceride is a type of fat found in fatty tissue. Individuals with a high level of triglyceride in their blood have a higher risk of contracting heart disease. To determine if regular exercise can reduce triglyceride levels, researchers measured the triglyceride levels of 8 individuals with mildly high cholesterol before and after they attended a 3-month intensive aerobics exercise program.
| Individual | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Before | 200 | 226 | 218 | 246 | 195 | 278 | 254 | 237 |
| After | 135 | 206 | 146 | 172 | 175 | 224 | 233 | 192 |
Test, at the 5% significance level, if the aerobics exercise program has been effective in reducing the triglyceride level in blood serum. Assume triglyceride levels are normally distributed.
QUESTION 2
A dietitian wishes to see if a person’s cholesterol level (in mg/dL) will change if the diet is supplemented by a certain mineral. Six subjects were pretested and then they took the mineral supplement for a six-week period. The results are shown in the table below.
| Subject | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Before | 210 | 235 | 208 | 190 | 172 | 244 |
| After | 190 | 170 | 210 | 188 | 173 | 228 |
a. State the underlying assumptions needed to perform a hypothesis test in this context.
b. Test, at the 10% significance level, whether there is a change in cholesterol level when the mineral supplements the diet.
QUESTION 3
Susan, the receiving clerk of a chemical distributor, is faced with a continuing problem of broken glassware, which includes test tubes, petri dishes and flasks. Susan imposed some additional shipping precautions which she believes can prevent further breakage of these types of glassware. After a month of implementing the precautionary measures, she requested the purchasing clerk to provide her with information on the average number of broken items per shipment. Data from eight different suppliers given to the purchasing clerk are shown below.
| Supplier | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Before | 16 | 12 | 18 | 7 | 14 | 19 | 6 | 17 |
| After | 14 | 13 | 12 | 6 | 9 | 15 | 8 | 15 |
Does the data indicate, at $\alpha = 0.05$, that the new measures have lowered the average number of broken items? Assume that the number of broken items is normally distributed.
Answers
A1 Reject $H_0$ A2 Do not reject $H_0$ A3 Reject $H_0$
B1 Reject $H_0$ B2 Do not reject $H_0$ B3 Reject $H_0$