Random Variable Qualitative (categorical) Quantitative (numeric) NominalOrdinal Ratio Interval...

Post on 17-Jan-2016


Random Variable

Qualitative (categorical): Nominal, Ordinal

Quantitative (numeric): Interval, Ratio; Discrete, Continuous

SUMMARIZING NUMERIC DATA

• Simple Frequency Table

• Grouped Frequency Table

• Histogram

• Frequency Polygon

• Cumulative Frequency Distribution

• Arithmetic Mean

• Median

• Mode.

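Measures of Central Location

Mean for grouped data:

Population mean: $\mu = \frac{\sum fx}{N}$

Sample mean: $\bar{x} = \frac{\sum fx}{n}$

Median for grouped data:

$\text{median} = L + c\,\frac{\frac{n}{2} - F}{f_m}$

Mode for grouped data:

$\text{mode} = L + c\,\frac{f_m - f_{m-1}}{2f_m - f_{m-1} - f_{m+1}}$

Here L is the lower limit of the median (or modal) class, c the class width, F the less-than cumulative frequency of the preceding class, f_m the frequency of the median (or modal) class, and f_{m-1}, f_{m+1} the frequencies of the classes immediately before and after it.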

Measures of Dispersion (Variability)

• Range

• Variance and Standard Deviation

• Coefficient of Variation

• Non-central Locations: Inter-fractile Ranges

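Standard Deviation

Grouped data:

$s = \sqrt{\frac{n\sum fx^2 - \left(\sum fx\right)^2}{n(n-1)}}$

Ungrouped data:

$s = \sqrt{\frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)}}$

Coefficient of variation:

$CV = \frac{s}{\bar{x}}\,(100\%)$

Empirical Rule: for a symmetric, bell-shaped distribution, about 68% of the observations lie within one standard deviation of the mean, about 95% within two, and about 99.7% within three.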

The Relative Positions of the Mean, Median, and Mode:

Symmetric distribution (zero skewness): Mean = Median = Mode

Positively skewed: Mean > Median > Mode

Negatively skewed: Mean < Median < Mode

Non-Central Location Measures (Fractiles or Quantiles)

• Quartiles

• Sextiles

• Octiles

• Deciles

• Percentiles

Calculating Quartiles for Grouped Data

The jth quartile for grouped data is given by:

$Q_j = L + c\,\frac{\frac{jn}{4} - F}{f_{Q_j}}$

where

n = sample size

L = lower limit of the jth quartile class

c = class width

F = less-than cumulative frequency of the immediately preceding class

f_{Q_j} = frequency of the jth quartile class

Example

A sample of 20 randomly-selected hospitals in the US revealed the following daily charges (in $) for a semiprivate room.

153 159 142 146

141 140 130 148

142 163 134 151

122 167 137 152

143 168 159 141

1.1 Using class intervals of width 10 units, construct a less-than cumulative frequency distribution of the above data. Let 120 units be the lower limit of the smallest class.

1.2 Draw a less-than ogive and use it to estimate the 80th percentile.

1.3 For the grouped data of question 1.1 above, calculate:

1.3.1 The mean, median and mode.

1.3.2 The interquartile range.

1.3.3 The coefficient of variation. Interpret the result obtained.

Solution

1.1

Class Freq, f <cum freq, F
120 - 130 1 1
130 - 140 3 4
140 - 150 8 12
150 - 160 5 17
160 - 170 3 20
  ∑ = 20

1.2 From the less-than ogive, read off the value at a cumulative frequency of 0.80 × 20 = 16:

80th percentile = 158

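1.3.1

Class Freq, f <cum freq, F midpt, x fx
120 - 130 1 1 125 125
130 - 140 3 4 135 405
140 - 150 8 12 145 1160
150 - 160 5 17 155 775
160 - 170 3 20 165 495
  ∑ = 20 ∑ = 2960

$\bar{x} = \frac{\sum fx}{\sum f} = \frac{2960}{20} = 148$

$x_{med} = L + c\,\frac{\frac{n}{2} - F}{f_{med}} = 140 + 10 \times \frac{10 - 4}{8} = 147.5$

$x_{mod} = L + c\,\frac{f_{mod} - f_{mod-1}}{2f_{mod} - f_{mod-1} - f_{mod+1}} = 140 + 10 \times \frac{8 - 3}{16 - 3 - 5} \approx 146.3$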

1.3.2 Using the cumulative frequencies from 1.1:

$Q_1 = 140 + 10 \times \frac{5 - 4}{8} \approx 141.3$

$Q_3 = 150 + 10 \times \frac{15 - 12}{5} = 156$

$IQR = Q_3 - Q_1 = 156 - 141.3 = 14.7$

1.3.3

Class Midpt, x fx fx²
120 - 130 125 125 15625
130 - 140 135 405 54675
140 - 150 145 1160 168200
150 - 160 155 775 120125
160 - 170 165 495 81675
  ∑ = 2960 ∑ = 440300

$s = \sqrt{\frac{\sum fx^2 - \left(\sum fx\right)^2/n}{n-1}} = \sqrt{\frac{440300 - 2960^2/20}{19}} \approx 10.8$

CV = standard deviation/mean

→ CV = 10.8/148 ≈ 0.073 ≡ 7.3% → the data are clustered closely around the mean.
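The grouped-data calculations in this example can be cross-checked with a short Python sketch. It is only an illustration under stated assumptions: the helper name `grouped_fractile` and all variable names are introduced here and do not come from the slides, and equal class widths are assumed.

```python
import math

# Class lower limits, common class width and frequencies from the example.
lower_limits = [120, 130, 140, 150, 160]
width = 10
freqs = [1, 3, 8, 5, 3]

n = sum(freqs)
midpoints = [low + width / 2 for low in lower_limits]


def grouped_fractile(position):
    """Interpolate the value at cumulative position `position` (n/2 for the median)."""
    cumulative = 0
    for low, f in zip(lower_limits, freqs):
        if cumulative + f >= position:
            return low + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("position exceeds the total frequency")


sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

mean = sum_fx / n
median = grouped_fractile(n / 2)
q1 = grouped_fractile(n / 4)
q3 = grouped_fractile(3 * n / 4)
p80 = grouped_fractile(0.8 * n)

# Mode: interpolate inside the class with the highest frequency.
m = freqs.index(max(freqs))
f_prev = freqs[m - 1] if m > 0 else 0
f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
mode = lower_limits[m] + width * (freqs[m] - f_prev) / (2 * freqs[m] - f_prev - f_next)

# Sample standard deviation and coefficient of variation (in percent).
s = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))
cv = 100 * s / mean

print(mean, median, mode, q1, q3, q3 - q1, p80, round(s, 2), round(cv, 1))
```

The unrounded quartiles give an interquartile range of 14.75; rounding Q1 to 141.3 first gives the 14.7 quoted in the worked solution.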

BASIC PROBABILITY CONCEPTS

• Random Experiment

• Sample Space

• Event

• Collectively Exhaustive Events

• Dependent Events

• Independent Events

• Marginal Probability

• Joint Probability: P(A∩B) = P(B∩A)

• Conditional Probability: P(A|B) = P(A∩B)/P(B); P(B|A) = P(A∩B)/P(A)

Complement Rule:

P(A’) = 1 – P(A) or P(A) = 1 – P(A’)

General Multiplication Rule:

P(A and B) = P(A∩B) = P(A)P(B|A) or

P(A and B) = P(A∩B) = P(B)P(A|B)

Special Multiplication Rule (A and B independent):

P(A and B) = P(A)P(B) = P(B)P(A)

Special Addition Rule (A and B mutually exclusive):

P(A or B) = P(A) + P(B)

General Addition Rule:

P(A or B) = P(A) + P(B) – P(A and B)

Example

A company manufactures a total of 8000 motorcycles a month in three plants A, B and C. Of these, plant A manufactures 4000, and plant B manufactures 3000. At plant A, 85 out of 100 motorcycles are of standard quality or better. At plant B, 65 out of 100 motorcycles are of standard quality or better and at plant C, 60 out of 100 motorcycles are of standard quality or better. The quality controller randomly selects a motorcycle and finds it to be of substandard quality. Calculate the probability that it has come from plant B.

Solution

P(B | substd) = No. of substd items from B / Total no. of substd items

Plant C manufactures 8000 – 4000 – 3000 = 1000 motorcycles.

No. of substd items from A = 4000 × (100 – 85)/100 = 40 × 15 = 600

No. of substd items from B = 3000 × (100 – 65)/100 = 30 × 35 = 1050

No. of substd items from C = 1000 × (100 – 60)/100 = 10 × 40 = 400

Total number of substd items = 600 + 1050 + 400 = 2050

P(B | substd) = 1050/2050 = 0.512
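The same answer follows from Bayes' rule written with probabilities instead of counts. The sketch below is a minimal illustration; the dictionary layout and the function name `posterior_substandard` are assumptions made for this example.

```python
# Monthly output per plant; plant C makes whatever A and B do not.
output = {"A": 4000, "B": 3000}
output["C"] = 8000 - sum(output.values())

# Share of each plant's output that is of standard quality or better.
standard_rate = {"A": 0.85, "B": 0.65, "C": 0.60}


def posterior_substandard(plant):
    """P(plant | substandard) = P(plant) P(substd | plant) / P(substd)."""
    total = sum(output.values())
    joint = {k: (output[k] / total) * (1 - standard_rate[k]) for k in output}
    return joint[plant] / sum(joint.values())


print(round(posterior_substandard("B"), 3))
```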

PROBABILITY DISTRIBUTIONS

• Properties

• Discrete distributions

• Normal distributions

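Binomial Probability Distribution

$P(x) = \frac{n!}{x!\,(n-x)!}\,\pi^x (1-\pi)^{n-x}$

where n is the number of trials, x the number of successes and π the probability of success on each trial.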

Example

According to a leading newspaper, the largest cellular phone service in the US has about 36 million subscribers out of a total of 180 million cell phone users. If six cell phone users are randomly selected, what is the probability that at least two of them subscribe to this service?

Solution

n = 6; π = 36/180 = 0.2

$P(x) = \frac{n!}{x!\,(n-x)!}\,\pi^x (1-\pi)^{n-x}$

$P(x \ge 2) = 1 - P(0) - P(1)$

$P(0) = \frac{6!}{0!\,(6-0)!}\,(0.2)^0 (1-0.2)^6 = 0.262$

$P(1) = \frac{6!}{1!\,(6-1)!}\,(0.2)^1 (1-0.2)^5 = 0.393$

$P(x \ge 2) = 1 - 0.262 - 0.393 = 0.345$
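A quick numerical check of the binomial example, sketched in Python. The function name `binomial_pmf` is an assumption for illustration; `math.comb` is from the standard library (Python 3.8 or later).

```python
from math import comb


def binomial_pmf(x, n, pi):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)


n, pi = 6, 36 / 180

# "At least two" is the complement of "zero or one".
p_at_least_two = 1 - binomial_pmf(0, n, pi) - binomial_pmf(1, n, pi)

print(round(binomial_pmf(0, n, pi), 3), round(binomial_pmf(1, n, pi), 3), round(p_at_least_two, 3))
```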

Poisson Probability Distribution

$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$

Example

Customers arrive randomly and independently at a service point at an average rate of 30 per hour.

1. Calculate the probability that exactly 20 customers arrive at the service point during any given hour.

2. Calculate the probability that during any 5 minute period at least 3 customers arrive at the service point.

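Solution

$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$

1. λ = 30/hr

$P(20) = \frac{30^{20} e^{-30}}{20!} = 0.0134$

2. λ = 30/60 min = 2.5/5 min

$P(x \ge 3) = 1 - P(0) - P(1) - P(2)$

$P(0) = \frac{2.5^0 e^{-2.5}}{0!} = 0.0821$

$P(1) = \frac{2.5^1 e^{-2.5}}{1!} = 0.2052$

$P(2) = \frac{2.5^2 e^{-2.5}}{2!} = 0.2565$

→ P(x ≥ 3) = 1 - 0.0821 - 0.2052 - 0.2565 = 0.456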
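The two Poisson probabilities can be evaluated directly with the standard library. This is a minimal sketch; `poisson_pmf` and the variable names are illustrative assumptions, and the only modelling step is rescaling the hourly rate to a 5-minute rate.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Probability of exactly x arrivals when the mean number of arrivals is lam."""
    return lam ** x * exp(-lam) / factorial(x)


# Part 1: exactly 20 arrivals in an hour, at a mean rate of 30 per hour.
p_exactly_20 = poisson_pmf(20, 30)

# Part 2: rescale the rate to a 5-minute window, then take the complement of 0, 1 or 2 arrivals.
lam_5min = 30 * 5 / 60
p_at_least_3 = 1 - sum(poisson_pmf(x, lam_5min) for x in range(3))

print(p_exactly_20, p_at_least_3)
```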

Normal probability distribution

Standard normal or z-distribution:

$z = \frac{x - \mu}{\sigma}$

[Figure: standard normal curve f(x), with μ = 0 and σ² = 1]

Normal Distribution

• Normal curve is symmetrical

• Mean, median, and mode are equal

• Theoretically, curve extends to infinity

Area between 0 and z

z  0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359

0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753

0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141

0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517

0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879

0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224

0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549

0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852

0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133

0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389

1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621

1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830

1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015

1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177

1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319

1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441

1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545

1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633

1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706

1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767

2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817

2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857

2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890

2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916

2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936

2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952

2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964

2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974

2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981

2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986

3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

Example

Six hundred candidates wrote an entrance test for admission to a management course. The marks obtained by the candidates were found to be normally distributed with a mean of 132 marks and a standard deviation of 18 marks.

1. How many candidates scored between 140 and 160 marks?

2. If the top 60 performers were given confirmed admission, calculate the minimum mark (to the nearest integer) above which a candidate would be guaranteed admission?

Solution

$z = \frac{x - \mu}{\sigma}$

1. z1 = (140 - 132)/18 = 0.4444 → P1 ≈ 0.172

z2 = (160 - 132)/18 = 1.5556 → P2 ≈ 0.440

→ P(140 < X < 160) ≈ 0.440 – 0.172 = 0.268 → 0.268 × 600 candidates ≈ 161 candidates

2. Let x_c denote the minimum mark.

60/600 = 0.1 = 10%. P(0 < z < z_c) = 0.50 - 0.10 = 0.4 → z_c = 1.28

$z_c = \frac{x_c - 132}{18} = 1.28 \;\Rightarrow\; x_c = 132 + 1.28 \times 18 \approx 155$
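Instead of reading the z table, both parts of this example can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The variable names below are illustrative assumptions.

```python
from statistics import NormalDist

marks = NormalDist(mu=132, sigma=18)
candidates = 600

# Part 1: share of candidates scoring between 140 and 160.
share_between = marks.cdf(160) - marks.cdf(140)
expected_count = candidates * share_between

# Part 2: the top 60 of 600 are the top 10%, so the cut-off is the 90th percentile.
cutoff = marks.inv_cdf(1 - 60 / candidates)

print(round(expected_count), round(cutoff))
```

Small differences from table-based answers are expected, because the table only resolves z to two decimal places.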

HYPOTHESIS TESTING

• What is a Hypothesis?

• What is Hypothesis Testing?

Basic Terms

• Null hypothesis

• Alternative hypothesis

• Level of significance

• Type I error

• Type II error

• Critical value

• Test statistic

• Rejection area

• Acceptance area

• One-tailed test

• Two-tailed test

Five-Step Procedure for Hypothesis Testing

Step 1: State the null and alternative hypotheses

Step 2: Determine the critical value associated with the level of significance

Step 3: Identify and calculate the test statistic

Step 4: Formulate and apply the decision rule

Step 5: Draw a conclusion

Testing a Single Population Mean

Large sample (n > 30)

Test statistic:

$z_{test} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

Small sample (n < 30)

Test statistic:

$t_{test} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

t table with right tail probabilities

df\p 0.4 0.25 0.1 0.05 0.025 0.01 0.005 0.0005

1 0.32492 1 3.077684 6.313752 12.7062 31.82052 63.65674 636.6192

2 0.288675 0.816497 1.885618 2.919986 4.30265 6.96456 9.92484 31.5991

3 0.276671 0.764892 1.637744 2.353363 3.18245 4.5407 5.84091 12.924

4 0.270722 0.740697 1.533206 2.131847 2.77645 3.74695 4.60409 8.6103

5 0.267181 0.726687 1.475884 2.015048 2.57058 3.36493 4.03214 6.8688

6 0.264835 0.717558 1.439756 1.94318 2.44691 3.14267 3.70743 5.9588

7 0.263167 0.711142 1.414924 1.894579 2.36462 2.99795 3.49948 5.4079

8 0.261921 0.706387 1.396815 1.859548 2.306 2.89646 3.35539 5.0413

9 0.260955 0.702722 1.383029 1.833113 2.26216 2.82144 3.24984 4.7809

10 0.260185 0.699812 1.372184 1.812461 2.22814 2.76377 3.16927 4.5869

11 0.259556 0.697445 1.36343 1.795885 2.20099 2.71808 3.10581 4.437

12 0.259033 0.695483 1.356217 1.782288 2.17881 2.681 3.05454 4.3178

13 0.258591 0.693829 1.350171 1.770933 2.16037 2.65031 3.01228 4.2208

14 0.258213 0.692417 1.34503 1.76131 2.14479 2.62449 2.97684 4.1405

15 0.257885 0.691197 1.340606 1.75305 2.13145 2.60248 2.94671 4.0728

16 0.257599 0.690132 1.336757 1.745884 2.11991 2.58349 2.92078 4.015

17 0.257347 0.689195 1.333379 1.739607 2.10982 2.56693 2.89823 3.9651

18 0.257123 0.688364 1.330391 1.734064 2.10092 2.55238 2.87844 3.9216

19 0.256923 0.687621 1.327728 1.729133 2.09302 2.53948 2.86093 3.8834

20 0.256743 0.686954 1.325341 1.724718 2.08596 2.52798 2.84534 3.8495

21 0.25658 0.686352 1.323188 1.720743 2.07961 2.51765 2.83136 3.8193

22 0.256432 0.685805 1.321237 1.717144 2.07387 2.50832 2.81876 3.7921

23 0.256297 0.685306 1.31946 1.713872 2.06866 2.49987 2.80734 3.7676

24 0.256173 0.68485 1.317836 1.710882 2.0639 2.49216 2.79694 3.7454

25 0.25606 0.68443 1.316345 1.708141 2.05954 2.48511 2.78744 3.7251

26 0.255955 0.684043 1.314972 1.705618 2.05553 2.47863 2.77871 3.7066

27 0.255858 0.683685 1.313703 1.703288 2.05183 2.47266 2.77068 3.6896

28 0.255768 0.683353 1.312527 1.701131 2.04841 2.46714 2.76326 3.6739

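Testing a Single Population Proportion:

Large sample (n > 30)

Test statistic:

$z_{test} = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$

Small sample (n < 30)

Test statistic:

$t_{test} = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$

where p is the sample proportion and π the hypothesised population proportion.

Tests Involving Two Sample Means

$z_{test} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$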

Example

A union representing workers at a large industrial concern accused management of paying discriminatory wages to the workers in two production facilities, A and B. It claimed that workers in facility A were being paid less than those in facility B. The company investigated the claim by examining the pay of 70 workers from each production facility. The results were as follows.

Facility A Facility B

Mean salary $455.00 $463.00

Std deviation $10.00 $13.00

What conclusion did the company reach? Investigate at the 5% level of significance.

Solution

H0: μ_A = μ_B

H1: μ_A ≠ μ_B

→ two-tailed test; n_A, n_B > 30 → z test. α = 5% → z_crit = 1.96

$z_{test} = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{s_A^2/n_A + s_B^2/n_B}} = \frac{455 - 463}{\sqrt{100/70 + 169/70}} = -4.081$

Since |z_test| > |z_crit|, reject H0.

→ Sufficient statistical evidence to suggest a significant difference in the salaries.
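The two-sample z statistic for the salary example can be reproduced as follows. This is a sketch only: `two_sample_z` is a hypothetical helper name, and the critical value is taken from `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2, hypothesised_diff=0.0):
    """z statistic for the difference between two large-sample means."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return ((mean1 - mean2) - hypothesised_diff) / standard_error


# Facility A versus facility B, 70 workers each.
z_test = two_sample_z(455, 10, 70, 463, 13, 70)

# Two-tailed critical value at the 5% level of significance.
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)
reject_h0 = abs(z_test) > z_crit

print(round(z_test, 3), round(z_crit, 2), reject_h0)
```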

Tests Involving Two Sample Proportions

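$z_{test} = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

$p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$

$q = 1 - p$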

Example

Surveys were conducted in two major cities “A” and “B” to ascertain viewer habits regarding a popular television channel. In city “A”, 1000 people were interviewed and 680 said they viewed the channel. In city “B”, 600 people were interviewed and 444 said they viewed the channel. Investigate, at the 5% level of significance, whether there is a significant difference between the viewing habits in the two cities.

Solution

H0: π_A = π_B

H1: π_A ≠ π_B

→ two-tailed test; α = 5% → z_crit = 1.96

$p = \frac{n_A p_A + n_B p_B}{n_A + n_B} = \frac{680 + 444}{1000 + 600} = 0.7025$

q = 1 – p = 0.2975

$z_{test} = \frac{p_A - p_B}{\sqrt{pq\,(1/n_A + 1/n_B)}} = \frac{680/1000 - 444/600}{\sqrt{0.7025 \times 0.2975 \times (1/1000 + 1/600)}} = -2.54$

Since |z_test| > |z_crit|, reject H0 at the 5% level of significance.

→ Sufficient statistical evidence to suggest a significant difference in the viewing habits.
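The pooled two-proportion z statistic for the television survey can be checked with a few lines of Python. The function name `two_proportion_z` and its argument names are assumptions made for this sketch.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference between two sample proportions (pooled estimate)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# City A: 680 of 1000 viewers; city B: 444 of 600 viewers.
z_test = two_proportion_z(680, 1000, 444, 600)
reject_h0 = abs(z_test) > 1.96  # two-tailed test at the 5% level

print(round(z_test, 2), reject_h0)
```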

Chi-square Applications

Major Characteristics:

• positively skewed

• non-negative

• family of chi-square distributions

H0: There is no difference between the observed and expected frequencies.

H1: There is a difference between the observed and the expected frequencies.

Test statistic:

$\chi^2_{stat} = \sum \frac{(f_o - f_e)^2}{f_e}$

The critical value is a chi-square value with (k-1) degrees of freedom, where k is the number of categories.

Right tail areas for the Chi-square Distribution

df\area 0.995 0.99 0.975 0.95 0.90 0.75 0.5 0.25 0.10 0.05 0.025 0.01 0.005

1 0.00004 0.00016 0.00098 0.00393 0.01579 0.10153 0.45494 1.3233 2.70554 3.84146 5.02389 6.6349 7.87944

2 0.01003 0.0201 0.05064 0.10259 0.21072 0.57536 1.38629 2.77259 4.60517 5.99146 7.37776 9.21034 10.5966

3 0.07172 0.11483 0.2158 0.35185 0.58437 1.21253 2.36597 4.10834 6.25139 7.81473 9.3484 11.3449 12.8382

4 0.20699 0.29711 0.48442 0.71072 1.06362 1.92256 3.35669 5.38527 7.77944 9.48773 11.1433 13.2767 14.8603

5 0.41174 0.5543 0.83121 1.14548 1.61031 2.6746 4.35146 6.62568 9.23636 11.0705 12.8325 15.0863 16.7496

6 0.67573 0.87209 1.23734 1.63538 2.20413 3.4546 5.34812 7.8408 10.6446 12.5916 14.4494 16.8119 18.5476

7 0.98926 1.23904 1.68987 2.16735 2.83311 4.25485 6.34581 9.03715 12.017 14.0671 16.0128 18.4753 20.2777

8 1.34441 1.6465 2.17973 2.73264 3.48954 5.07064 7.34412 10.2189 13.3616 15.5073 17.5346 20.0902 21.955

9 1.73493 2.0879 2.70039 3.32511 4.16816 5.89883 8.34283 11.3888 14.6837 16.919 19.0228 21.666 23.5894

10 2.15586 2.55821 3.24697 3.9403 4.86518 6.7372 9.34182 12.5489 15.9872 18.307 20.4832 23.2093 25.1882

11 2.60322 3.05348 3.81575 4.57481 5.57778 7.58414 10.341 13.7007 17.275 19.6751 21.9201 24.725 26.7569

12 3.07382 3.57057 4.40379 5.22603 6.3038 8.43842 11.3403 14.8454 18.5494 21.0261 23.3367 26.217 28.2995

13 3.56503 4.10692 5.00875 5.89186 7.0415 9.29907 12.3398 15.9839 19.8119 22.362 24.7356 27.6883 29.8195

14 4.07467 4.66043 5.62873 6.57063 7.78953 10.1653 13.3393 17.1169 21.0641 23.6848 26.119 29.1412 31.3194

15 4.60092 5.22935 6.26214 7.26094 8.54676 11.0365 14.3389 18.2451 22.3071 24.9958 27.4884 30.5779 32.8013

16 5.14221 5.81221 6.90766 7.96165 9.31224 11.9122 15.3385 19.3689 23.5418 26.2962 28.8454 31.9999 34.2672

17 5.69722 6.40776 7.56419 8.67176 10.0852 12.7919 16.3382 20.4887 24.769 27.5871 30.191 33.4087 35.7185

18 6.2648 7.01491 8.23075 9.39046 10.8649 13.6753 17.3379 21.6049 25.9894 28.8693 31.5264 34.8053 37.1565

19 6.84397 7.63273 8.90652 10.117 11.6509 14.562 18.3377 22.7178 27.2036 30.1435 32.8523 36.1909 38.5823

20 7.43384 8.2604 9.59078 10.8508 12.4426 15.4518 19.3374 23.8277 28.412 31.4104 34.1696 37.5662 39.9969

Example

A certain drug is claimed to be effective in curing the common cold. In a clinical trial involving 500 patients having the common cold, 250 were given the drug and the rest were given sugar pills. The patients’ reactions to the treatment are recorded in the table below.

Treatment Helped Harmed No Effect Total
Drug 150 30 70 250
Sugar Pills 130 40 80 250
Total 280 70 150 500

On the basis of the above data, can it be concluded, at the 5% significance level, that there is a significant difference in the effect of the drug and sugar pills?

Solution

H0: No significant difference in effect of drug and sugar pills.

H1: There is a significant difference in effect of drug and sugar pills.

α = 0.05, df = (2-1)(3-1) = 2 → χ²_crit = 5.991

Each expected frequency is (row total × column total)/grand total, e.g. 250 × 280/500 = 140.

fo fe fo – fe (fo – fe)²/fe
150 140 10 0.7143
30 35 -5 0.7143
70 75 -5 0.3333
130 140 -10 0.7143
40 35 5 0.7143
80 75 5 0.3333
∑ = 3.524

χ²_calc = 3.524 < χ²_crit = 5.991

Hence do not reject H0 at α = 0.05.

→ insufficient statistical evidence to suggest that there is a significant difference between drug and sugar pills.
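The expected frequencies and the chi-square statistic for the drug trial can be computed from the observed table alone. The sketch below uses plain Python lists; the variable names are illustrative, and the critical value is copied from the df = 2 row of the table above rather than computed.

```python
# Observed frequencies: rows are Drug and Sugar Pills; columns are Helped, Harmed, No Effect.
observed = [[150, 30, 70], [130, 40, 80]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = row total x column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2_calc = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
chi2_crit = 5.99146  # table value for df = 2, right-tail area 0.05
reject_h0 = chi2_calc > chi2_crit

print(expected, round(chi2_calc, 3), df, reject_h0)
```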

LINEAR REGRESSION AND CORRELATION

• Correlation analysis

• Scatterplot

• Correlation coefficient

• Dependent and independent variables

• The coefficient of determination

• Linear regression equation

Correlation Coefficient Formula:

$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$

The coefficient of determination = r²

The regression equation: Y' = a + bX

Y' = average predicted value of Y for any X.

a = Y-intercept = estimated Y value when X = 0

b = slope of the line.

$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$

$a = \frac{\sum y - b\sum x}{n}$

Example

The following data relate to the training periods and average weekly sales of seven randomly selected salesmen in a large company.

Salesman Training (hours) Ave weekly sales ($’000)

A 20 44

B 5 22

C 10 35

D 13 32

E 12 27

F 8 26

G 15 35

1. Calculate the correlation coefficient. Comment on the value obtained.

2. Determine the coefficient of determination and interpret the value obtained.

3. Assuming a linear relation between the variables in the given data, obtain the regression equation connecting the variables.

4. Estimate the weekly sales of a salesman who had 22h of training. Is the result reliable? Explain.

Solution

x y x² y² xy

20 44 400 1936 880

5 22 25 484 110

10 35 100 1225 350

13 32 169 1024 416

12 27 144 729 324

8 26 64 676 208

15 35 225 1225 525

83 221 1127 7299 2813

1. Let x denote training period (in hours) and let y denote sales (in $’000)

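$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{7 \times 2813 - 83 \times 221}{\sqrt{(7 \times 1127 - 83^2)(7 \times 7299 - 221^2)}} = 0.9$

→ strong positive linear relationship between x and y

2. r² = 0.81 → 81% of the variation in y is explained by variation in x. The remaining 19% is due to other factors.

3.

$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{7 \times 2813 - 83 \times 221}{7 \times 1127 - 83^2} = 1.35$

$a = \frac{\sum y - b\sum x}{n}$ = 221/7 – 1.35 × 83/7 = 15.56 → y = 15.56 + 1.35x

4. When x = 22 hours, y = 15.56 + 1.35 × 22 = 45.3 → 45.3 × $1000 = $45300

No. The regression equation is valid only in the domain 5 ≤ x ≤ 20, and x = 22 lies outside the observed range of training hours.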
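The correlation coefficient and least-squares line can be recomputed from the raw data with the same sum formulas. This Python sketch uses illustrative variable names that are not from the slides. It keeps the slope unrounded, so the intercept comes out near 15.59 rather than the 15.56 obtained with the slope rounded to 1.35.

```python
from math import sqrt

hours = [20, 5, 10, 13, 12, 8, 15]   # training period, x
sales = [44, 22, 35, 32, 27, 26, 35]  # average weekly sales in $'000, y

n = len(hours)
sum_x, sum_y = sum(hours), sum(sales)
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in sales)
sum_xy = sum(x * y for x, y in zip(hours, sales))

# Shared numerator of r and b.
numerator = n * sum_xy - sum_x * sum_y

r = numerator / sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
b = numerator / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Prediction at 22 hours is an extrapolation beyond the observed 5 to 20 hours.
predicted_at_22 = a + b * 22

print(round(r, 3), round(r ** 2, 2), round(b, 3), round(a, 2), round(predicted_at_22, 1))
```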

TIME SERIES AND FORECASTING

Components

• The Secular Trend (T)

• The Cyclical Variation (C)

• The Seasonal Variation (S)

• The Irregular Variation (I)

Multiplicative Model: Y = T.C.S.I

The linear trend equation:

T = a + bt

Seasonal Indices

• Moving average

• Centred moving average

• Ratio to centred moving average

• Adjusted seasonal average

• Deseasonalizing a series

Example

The following table gives the quarterly healthcare claims (in R millions) against all healthcare claims for the period 2008 to 2010.

Year Q1 Q2 Q3 Q4
2008 14.0 15.6 21.5 18.3
2009 13.1 14.7 24.8 19.4
2010 14.4 17.3 25.6 15.8

1. Represent the above data in a time series plot.

2. Calculate the quarterly seasonal indices for healthcare claims using the ratio-to-moving-average method. Interpret the results.

3. Derive a trend line using the method of least squares.

4. Estimate the seasonally-adjusted trend value of healthcare claims for the third quarter of 2011.

Solution

1. [Figure: time series plot, "Quarterly Healthcare Claims (in Rm) for the period 2008 - 2010"; horizontal axis: Q1 to Q4 of 2008, 2009 and 2010]

2.

Season Data (Rm) 4MA (Rm) Centred 4MA (Rm) Unadj. SI (%)
2008 Q1 14.0 - - -
Q2 15.6 - - -
Q3 21.5 17.350 17.238 124.7
Q4 18.3 17.125 17.013 107.6
2009 Q1 13.1 16.900 17.313 75.7
Q2 14.7 17.725 17.863 82.3
Q3 24.8 18.000 18.163 136.5
Q4 19.4 18.325 18.650 104.0
2010 Q1 14.4 18.975 19.075 75.5
Q2 17.3 19.175 18.725 92.4
Q3 25.6 18.275 - -
Q4 15.8 - - -

Year Q1 Q2 Q3 Q4
2008 - - 124.7 107.6
2009 75.7 82.3 136.6 104.0
2010 75.6 92.4 - -
Mean SI 75.7 87.4 130.7 105.8
Adj. SI 75.7 87.5 130.9 106.0

The annual seasonal influences are as follows:

Q1: substantial decrease of 24.3%

Q2: decrease of 12.5%

Q3: substantial increase of 30.9%

Q4: increase of 6.0%

3.

t T t² tT
1 14.0 1 14.0
2 15.6 4 31.2
3 21.5 9 64.5
4 18.3 16 73.2
5 13.1 25 65.5
6 14.7 36 88.2
7 24.8 49 173.6
8 19.4 64 155.2
9 14.4 81 129.6
10 17.3 100 173.0
11 25.6 121 281.6
12 15.8 144 189.6
∑ = 78 ∑ = 214.5 ∑ = 650 ∑ = 1439.2

T(t) = 15.9 + 0.31t

4. Adj. Estimate for Q3 of 2011 (t = 15):

Y(2011, Q3) = T(15) × 1.309 = (15.9 + 0.31 × 15) × 1.309 = 26.9 ≡ R26.9m
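The ratio-to-moving-average steps and the least-squares trend can be sketched in Python as below. Variable names are illustrative assumptions. The code keeps full precision throughout, so individual indices may differ in the last digit from values that were rounded at intermediate steps.

```python
claims = [14.0, 15.6, 21.5, 18.3, 13.1, 14.7, 24.8, 19.4, 14.4, 17.3, 25.6, 15.8]

# 4-quarter moving averages, then the average of each adjacent pair to centre them.
ma4 = [sum(claims[i:i + 4]) / 4 for i in range(len(claims) - 3)]
centred = [(ma4[i] + ma4[i + 1]) / 2 for i in range(len(ma4) - 1)]

# centred[i] lines up with claims[i + 2]; group the percentage ratios by quarter (0 = Q1).
ratios = {q: [] for q in range(4)}
for i, cma in enumerate(centred):
    ratios[(i + 2) % 4].append(100 * claims[i + 2] / cma)

mean_si = [sum(ratios[q]) / len(ratios[q]) for q in range(4)]
# Rescale so that the four indices sum to 400.
adjusted_si = [si * 400 / sum(mean_si) for si in mean_si]

# Least-squares trend T = a + b t, with t = 1 for 2008 Q1.
t = list(range(1, len(claims) + 1))
n = len(t)
sum_t, sum_y = sum(t), sum(claims)
sum_t2 = sum(ti * ti for ti in t)
sum_ty = sum(ti * yi for ti, yi in zip(t, claims))
b = (n * sum_ty - sum_t * sum_y) / (n * sum_t2 - sum_t ** 2)
a = (sum_y - b * sum_t) / n

# 2011 Q3 is period t = 15; apply the Q3 seasonal index to the trend value.
forecast_q3_2011 = (a + b * 15) * adjusted_si[2] / 100

print([round(si, 1) for si in adjusted_si], round(a, 2), round(b, 3), round(forecast_q3_2011, 1))
```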

STATISTICAL DECISION THEORY

Components to Decision-Making Situation

• Decision alternatives or acts

• States of nature

• Payoffs

Decision Making Without Probabilities

• Maximax Strategy

• Maximin Strategy

• Minimax Regret Strategy

Decision Making with Probabilities

• Payoff table

• Expected Payoff or Expected Monetary Value (EMV)

Decision Trees

• Decision nodes

• Event nodes

• Tree Structure

• EMV calculations

Example

A large corporation arranged to use an ocean liner as a floating hotel for its annual convention. The shipping company had to make a decision whether or not to lease the ship. If leased, the company would get a flat fee and an additional percentage of profits from the convention, which could attract as many as 50000 people. The company’s analysts estimated that if the ship were leased there would be a 50% chance of realizing a profit of $700000, a 30% chance of making a profit of $800000, a 15% chance of making a profit of $900000 and a 5% chance of making a profit of $1m. If the ship were not leased, it could be used for its usual voyage over the convention duration. In this case there would be a 90% probability of making a profit of $750000 and a 10% probability that profits would be $780000.

The company has one additional option. If the ship were leased, and it became clear within the first few days of the convention that the profits were going to be in the $700000 range, the company could choose to promote the convention on its own by offering participants discounts on the ocean liner’s cruises. The company’s analysts believe that if this action were chosen there would be a 60% chance that profits would increase to $740000 and a 40% chance that the promotion would fail, lowering profits to $680000.

4.1 Draw a decision tree to depict the above problem.

4.2 What decision should the shipping company take? Show all working.

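4.1 [Decision tree, described from the figure labels]

Do not lease → event node A:
    0.9: $750000
    0.1: $780000

Lease → event node B:
    0.5: $700000 range → decision node C:
        Do not promote: $700000
        Promote → event node D:
            0.6: $740000
            0.4: $680000
    0.3: $800000
    0.15: $900000
    0.05: $1000000

4.2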

EMV = max[EMV(A), EMV(B)]

EMV(A) = $780000 x 0.1 + $750000 x 0.9 = $753000

EMV(B) = $1000000x0.05 + $900000x0.15 + $800000x0.3 + 0.5xEMV(C) = $425000+0.5xEMV(C)

EMV(C) = max[$700000, EMV(D)]

= max[$700000, $680000x0.4 + $740000x0.6] = $716000 → promote

Hence EMV (B) = $425000 + $716000x0.5 = $783000 → EMV = $783000

Decision: Lease and then promote the convention if profits from lease are in the $700000 range.
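The roll-back of the tree can be written out in a few lines of Python. The helper `expected_value` and the variable names are assumptions for illustration. Decision nodes take the maximum over their branches and event nodes take the probability-weighted average.

```python
def expected_value(outcomes):
    """EMV of an event node given (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in outcomes)


# Event node A: do not lease.
emv_do_not_lease = expected_value([(0.9, 750_000), (0.1, 780_000)])

# Event node D: promote the convention.
emv_promote = expected_value([(0.6, 740_000), (0.4, 680_000)])

# Decision node C: promote only if it beats keeping the $700000 outcome.
best_at_700k = max(700_000, emv_promote)

# Event node B: lease, with the $700000 branch replaced by the value of node C.
emv_lease = expected_value(
    [(0.5, best_at_700k), (0.3, 800_000), (0.15, 900_000), (0.05, 1_000_000)]
)

decision = "lease" if emv_lease > emv_do_not_lease else "do not lease"

print(emv_do_not_lease, emv_promote, emv_lease, decision)
```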