Lecture 3

ECON1203/ECON2292 Business and Economic Statistics Week 3

Transcript of Lecture 3

Page 1: Lecture 3

ECON1203/ECON2292 Business and Economic Statistics

Week 3

Page 2: Lecture 3

Week 3 topics

Data collection and sampling
  Random sampling
Probability distributions
  Marginal, conditional & joint distributions
  Sampling with & without replacement

Key references
  Keller 5.1-5.4, 6.1-6.3

Page 3: Lecture 3

Stages of statistical analysis

Define & understand the problem
  E.g., a firm wants to determine the effectiveness of its advertising
Collect data
Analyse data
  Use descriptive statistics to organize the data
    What are the sales before and after the ads?
  Extract information about population parameters
    Can we conclude that sales after the ads are higher than sales before the ads?
Communicate results

Class project will provide experience in such analyses

Page 4: Lecture 3

Data collection

In practice can often find secondary data
  Data collected by someone else (e.g., ABS), possibly for some other purpose
Alternatively could collect primary data
  E.g., market researchers using a mail survey or phone interview to determine the impact of an ad

Page 5: Lecture 3

Data collection...

In an observational study, the researcher observes the behaviour of the individuals in the study without trying to influence the outcome of the study. The data obtained are called observational data.

E.g., a researcher at a pharmaceutical company wants to determine the effect of a drug on heart attacks. The researcher may select a sample of people and ask whether they have taken the drug in the last 2 years and how many heart attacks they had over the same period.

Page 6: Lecture 3

Data collection...

In an experimental study, the researcher manipulates the independent (explanatory) variables and tries to determine how they influence the dependent (response) variable. The data obtained are called experimental data.

E.g., the researcher may randomly select some people and divide them into two groups. One group takes the drug regularly (treatment group) and the other does not (control group). The researcher then observes the number of heart attacks in the two groups over 2 years. Statistical methods can be used to determine the influence of the drug.

Page 7: Lecture 3

Threats to comparing: confoundment

Does radiation from mobile phones cause cancer?
An observational study would compare a sample of users with non-users
  If users have a higher incidence of cancer, is this evidence that mobile phones cause cancer?
  Note: There is no consistent evidence that this is the case
But even if a relationship were found, you would need to control for other factors that might explain the finding
Possible confounding factors?
  Higher use of mobile phones in cities, where exposure to other forms of radiation is higher

Page 8: Lecture 3

Threats to comparing: confoundment …

Suppose an experimental study of the mobile phone cancer link is designed
  Subjects randomly assigned to
    Control group of non-users
    Treatment group of users
  Wait a few years & observe differences in cancer rates
Why is group allocation done at random?
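A minimal Python sketch of random allocation, assuming a hypothetical list of subject IDs. The function name `assign_groups` and the seed are illustrative and do not come from the lecture. Shuffling before splitting means no characteristic of a subject (age, location, health) can systematically decide which group they join, which is what guards against confounding.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into (control, treatment) groups of near-equal size."""
    rng = random.Random(seed)      # local generator so results are reproducible
    shuffled = list(subjects)      # copy: leave the caller's data untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject IDs 1..20
control, treatment = assign_groups(range(1, 21), seed=1203)
print("Control:  ", sorted(control))
print("Treatment:", sorted(treatment))
```

Every subject has the same chance of landing in either group, so any remaining difference between the groups is due to chance rather than to how they were chosen.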

Page 9: Lecture 3

Data collection…

Data collection requires
  Definition of target population
  Method of sampling
Method of sampling must avoid the problem of bias, where the design of the sample systematically favours certain outcomes
  What’s wrong with phone-in polls on talkback radio?
    Example of a self-selected sample
Method of simple random sampling: process by which all samples of the same size n are equally likely to be chosen from the population of interest

Page 10: Lecture 3

Data collection…

Producing a simple random sample
  List and number all the individuals in the population of interest from 1 to N
  Use a random number table or computer to randomly generate n distinct numbers between 1 and N
Stratified random sampling: separate the population into mutually exclusive sets (strata), then draw a simple random sample from each stratum
  How to draw a sample of households in Australia?
Cluster sample: separate the population into mutually exclusive sets (clusters), then draw a simple random sample of the clusters
  How to draw a sample of the students in today’s class?
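The three sampling schemes can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed procedure: the function names, the population of 100 numbered individuals, and the four equal-sized groups are all hypothetical, and the cluster version assumes one-stage cluster sampling, where every member of a chosen cluster is kept.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of size n_per_stratum from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Draw a simple random sample of clusters and keep every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(3)                      # seed chosen arbitrarily
population = list(range(1, 101))            # individuals numbered 1..N, N = 100

# Hypothetical strata/clusters: four blocks of 25 individuals each
groups = {f"group{k}": population[25 * k: 25 * (k + 1)] for k in range(4)}

print(simple_random_sample(population, 5, rng))
print(stratified_sample(groups, 2, rng))
print(cluster_sample(groups, 2, rng))
```

A stratified sample contains individuals from every group, whereas a cluster sample contains whole groups and omits the others entirely.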

Page 11: Lecture 3

Probability

Mathematical means of studying uncertainty
  E.g., setting premiums for insurance policies
Provides logical foundation of statistical inference
  Making decisions on basis of partial information
  E.g., what is the probability of a sell-out at an upcoming concert given data on past concerts, weather forecasts, etc.?

Page 12: Lecture 3

Probability review

A random experiment is an action or process that has several possible outcomes
The set of all possible outcomes is a sample space
An event consists of one or several outcomes
Probability of event E, denoted by P(E), measures the chance/likelihood that E will occur
When all outcomes are equally likely:

$$P(E) = \frac{\text{number of outcomes in } E}{\text{total number of possible outcomes}}, \qquad 0 \le P(E) \le 1$$

Page 13: Lecture 3

Probability review...


Page 14: Lecture 3

Probability review…

Often easier to work with probability of the event complementary to event E, ‘not E’, or E^c
  P(E^c) = 1 - P(E)
Conditional probability that E occurs, given that F has occurred, is defined by
  P(E|F) = P(E and F) / P(F)
  Similarly, P(F|E) = P(E and F) / P(E)
Rearranging yields the multiplication rule
  P(E and F) = P(F)P(E|F)
If P(E|F) = P(E), then E & F are said to be independent
  i.e., conditioning has no effect
  Equivalently, P(F|E) = P(F), or P(E and F) = P(E)P(F)
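These rules can be checked on a small example. The sketch below assumes one roll of a fair die (an illustration not taken from the slides), with E = "roll is even" and F = "roll is at most 4"; the helper `prob` is a hypothetical name.

```python
from fractions import Fraction

sample_space = set(range(1, 7))            # outcomes of one roll of a fair die

def prob(event):
    """P(event) for equally likely outcomes: |event| / |sample space|."""
    return Fraction(len(event & sample_space), len(sample_space))

E = {2, 4, 6}          # roll is even
F = {1, 2, 3, 4}       # roll is at most 4

p_E_given_F = prob(E & F) / prob(F)            # conditional probability P(E|F)
print(prob(E), 1 - prob(E))                    # P(E) and its complement P(E^c)
print(p_E_given_F)                             # P(E|F)
print(prob(F) * p_E_given_F == prob(E & F))    # multiplication rule holds
print(prob(E & F) == prob(E) * prob(F))        # E and F are independent
```

Here P(E|F) equals P(E), so knowing that the roll is at most 4 does not change the chance that it is even.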

Page 15: Lecture 3

SIA: Private health insurance

Abbott defends private health rebate
July 17, 2006 - 3:16PM
http://www.smh.com.au/articles/2006/07/17/1152988457337.html

Health Minister Tony Abbott has hit out at critics of the Federal Government's private health insurance rebate. "I think that there has been a consistent line of ideological hostility to the private health insurance rebate from a certain style of academic and indeed from the Labor Party," Mr Abbott told ABC Radio today. Mr Abbott's comments came after University of NSW professor Denzil Fiebig released a study of the rebate's effectiveness. Professor Fiebig claimed the rebate was unsustainable and was no guarantee that people with private health insurance would use the private health system over the public one. But Mr Abbott said the rebate took pressure off public hospitals. "If we didn't have the rebate we'd have a lot more pressure on the public hospital system," Mr Abbott said. He said Professor Fiebig was part of an ideological group who would never support private health care. Giving people choice between private and public health care was vital, Mr Abbott said.

Page 16: Lecture 3


SIA: Private health insurance…

Page 17: Lecture 3

SIA: Private health insurance…

What is the relationship between having private health insurance (PHI) & being admitted to hospital as a public or private patient?
Data from the ABS 2001 National Health Survey were used to derive the following table of relative frequencies (%)

Admission to hospital in last 12 months

          Not admitted   Admitted as public patient   Admitted as private patient   Totals
Have PHI  40.3           1.5                          5.4                           47.2
No PHI    45.3           6.6                          0.9                           52.8
Totals    85.6           8.1                          6.3                           100.0

Page 18: Lecture 3

SIA: Private health insurance…

Treat relative frequencies as probabilities
Find probability of the following events:
  Have PHI
  Have PHI & admitted to hospital
  Admitted as a private patient given have PHI
  Admitted as a public patient given have PHI
Are “admission as private” & “have PHI” independent events?

Marginal probability
  P(PHI) = 0.472
Joint probability
  P(PHI and admitted) = 0.015 + 0.054 = 0.069
Conditional probabilities
  P(admitted as private | PHI) = 0.054/0.472 = 0.114
  P(admitted as public | PHI) = 0.015/0.472 = 0.032
Independence
  Admission & PHI status are related (dependent)
  P(admitted as private | PHI) = 0.114 ≠ P(admitted as private) = 0.063
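The same marginal, joint and conditional probabilities can be computed directly from the relative-frequency table on the previous slide. The sketch below is illustrative only: the dictionary layout and the helper names `marginal_insurance` and `marginal_admission` are assumptions, while the numbers are the ones in the table.

```python
# Relative frequencies (percent) from the PHI table, treated as probabilities.
table = {
    "PHI":    {"not admitted": 40.3, "public": 1.5, "private": 5.4},
    "no PHI": {"not admitted": 45.3, "public": 6.6, "private": 0.9},
}
joint = {(ins, adm): pct / 100
         for ins, row in table.items() for adm, pct in row.items()}

def marginal_insurance(ins):
    """Sum the joint probabilities across admission categories (row total)."""
    return sum(p for (i, _), p in joint.items() if i == ins)

def marginal_admission(adm):
    """Sum the joint probabilities across insurance categories (column total)."""
    return sum(p for (_, a), p in joint.items() if a == adm)

p_phi = marginal_insurance("PHI")
p_phi_and_admitted = joint["PHI", "public"] + joint["PHI", "private"]
p_private_given_phi = joint["PHI", "private"] / p_phi
p_public_given_phi = joint["PHI", "public"] / p_phi
p_private = marginal_admission("private")

print(f"P(PHI)                = {p_phi:.3f}")
print(f"P(PHI and admitted)   = {p_phi_and_admitted:.3f}")
print(f"P(private | PHI)      = {p_private_given_phi:.3f}")
print(f"P(public | PHI)       = {p_public_given_phi:.3f}")
print(f"P(admitted as private) = {p_private:.3f}")
```

Comparing the last two conditional and marginal values for private admission shows the dependence: conditioning on PHI changes the probability.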

Page 19: Lecture 3

Auditing example

Let’s draw on both topics of probability & sampling
Background
  Local auditing firm services 100 companies
  10 of these companies are known to have overdue accounts
If 2 separate firms are chosen at random
  What is the probability distribution for the number of overdue accounts?

Page 20: Lecture 3

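Auditing example…

Let $e_i$ denote the event that the $i$th firm chosen has an overdue account.

$$
\begin{aligned}
P(\text{none overdue}) &= P(e_1^C \text{ and } e_2^C) = P(e_1^C)\,P(e_2^C \mid e_1^C) = \frac{90}{100} \times \frac{89}{99} = 0.809 \\
P(\text{one overdue}) &= P(e_1 \text{ and } e_2^C) + P(e_1^C \text{ and } e_2) = \frac{10}{100} \times \frac{90}{99} + \frac{90}{100} \times \frac{10}{99} = 0.182 \\
P(\text{two overdue}) &= 1 - 0.809 - 0.182 = 0.009
\end{aligned}
$$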
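The calculation on this slide can be verified with exact fractions. The sketch below is a check under the slide's assumptions (100 companies, 10 with overdue accounts, 2 drawn without replacement); the variable names and the counting cross-check `count_check` are illustrative additions, not part of the lecture.

```python
from fractions import Fraction
from math import comb

N, K, n = 100, 10, 2    # companies, companies with overdue accounts, sample size

# Sequential calculation using the multiplication rule
p_none = Fraction(N - K, N) * Fraction(N - K - 1, N - 1)
p_one = (Fraction(K, N) * Fraction(N - K, N - 1)
         + Fraction(N - K, N) * Fraction(K, N - 1))
p_two = 1 - p_none - p_one

def count_check(k):
    """Cross-check: count the equally likely samples of size n with k overdue."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

for k, p in enumerate([p_none, p_one, p_two]):
    assert p == count_check(k)          # both methods must agree exactly
    print(f"P({k} overdue) = {p} = {float(p):.3f}")
```

The three probabilities sum to 1, which is why the slide can obtain the last one by subtraction.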

Page 21: Lecture 3

Sampling

Auditing example illustrates distinction between sampling with & without replacement
  Because problem specified separate firms, sampling was done without replacement
  Such sampling induces dependence across events
    P(e2) depends on type of firm sampled in first draw
Independence & hence random sampling requires sampling with replacement
  Distinction not of practical importance if population is large
Redo auditing example assuming sampling with replacement
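One way to redo the auditing example with replacement is sketched below. It assumes the same 100 companies with 10 overdue accounts and that the same company may be drawn twice, so the two draws are independent with P(overdue) = 10/100 on each. The dictionary names are illustrative.

```python
from fractions import Fraction

p = Fraction(10, 100)   # P(overdue) on every draw when sampling with replacement

with_repl = {
    0: (1 - p) * (1 - p),          # neither draw overdue
    1: p * (1 - p) + (1 - p) * p,  # exactly one draw overdue, in either order
    2: p * p,                      # both draws overdue
}
# Without-replacement values from the original example, for comparison
without_repl = {
    0: Fraction(90, 100) * Fraction(89, 99),
    1: 2 * Fraction(10, 100) * Fraction(90, 99),
    2: Fraction(10, 100) * Fraction(9, 99),
}
for k in range(3):
    print(f"{k} overdue: with = {float(with_repl[k]):.3f}, "
          f"without = {float(without_repl[k]):.3f}")
```

The two distributions differ only in the third decimal place, which illustrates why the distinction matters little when the population is large relative to the sample.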

Page 22: Lecture 3

Sampling…

Events in experiments such as auditing example may be represented by probability trees
  These are diagrams (that resemble trees)
Keller Ex 6.5 looks at selecting 2 students (without replacement) from 10 students (3 F, 7 M)
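A probability tree can be enumerated in code by multiplying branch probabilities along each path. The sketch below uses only the setup stated on the slide (2 students drawn without replacement from 3 F and 7 M); it is not a reproduction of Keller's worked solution, and `path_probability` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

counts = {"F": 3, "M": 7}          # 10 students: 3 female, 7 male

def path_probability(path):
    """Multiply branch probabilities along one path of the tree."""
    remaining = dict(counts)
    total = sum(remaining.values())
    p = Fraction(1)
    for outcome in path:
        p *= Fraction(remaining[outcome], total)
        remaining[outcome] -= 1    # no replacement: one fewer of this type
        total -= 1
    return p

tree = {path: path_probability(path) for path in product("FM", repeat=2)}
for path, p in tree.items():
    print("".join(path), p)
```

Each key is one root-to-leaf path (FF, FM, MF, MM), and the path probabilities sum to 1 because the paths cover every possible outcome.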

Page 23: Lecture 3

Auditing example revisited

Assumed probability of an overdue account was known
  Enabled determination of probability distribution of number of overdue accounts
Consider a different problem
  Suppose probability of an overdue account is not known
  Given sample of companies, auditing firm can determine sample proportion of these that have overdue accounts
  How can this statistic be used to infer something about the true population proportion of companies with overdue accounts?
This is an inference problem
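To make the idea of a sample proportion concrete, the sketch below draws a few samples from a hypothetical population and computes the statistic for each. The population (10 overdue out of 100), the sample size of 20 and the seed are assumptions for illustration; in a real inference problem the true proportion would be unknown.

```python
import random

def sample_proportion(sample):
    """Fraction of sampled companies that have an overdue account."""
    return sum(sample) / len(sample)

# Hypothetical population: 1 = overdue, 0 = not overdue.
# The true proportion (0.10 here) would not be known in practice.
population = [1] * 10 + [0] * 90

rng = random.Random(2292)                     # arbitrary seed for reproducibility
for _ in range(5):
    sample = rng.sample(population, 20)       # 20 companies, without replacement
    print(sample_proportion(sample))
```

The sample proportion changes from sample to sample, so any statement about the population proportion has to account for that sampling variability.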

Page 24: Lecture 3

Progress report #2

In SIA admission distributions
  Assign probabilities to qualitative characteristics
  Public or private patient
In auditing example
  Probabilities assigned to quantitative characteristics
  Probabilities of number of overdue accounts
  Topic of random variables
Need to introduce theoretical distributions that are useful in representing/modelling actual data
  Initially discrete distribution
  Binomial