Lecture 3
ECON1203/ECON2292 Business and Economic Statistics
Week 3
Week 3 topics
Data collection and sampling: random sampling
Probability distributions: marginal, conditional & joint distributions
Sampling with & without replacement
Key references: Keller 5.1-5.4, 6.1-6.3
Stages of statistical analysis
Define & understand the problem
E.g., a firm wants to determine the effectiveness of its advertising
Collect data
Analyse data
Use descriptive statistics to organize the data: what are the sales before and after the ads?
Extract information about population parameters: can we conclude that sales after the ads are higher than sales before the ads?
Communicate results
The class project will provide experience in such analyses
Data collection
In practice, we can often find secondary data: data collected by someone else (e.g., the ABS), possibly for some other purpose
Alternatively, we could collect primary data, e.g., market researchers using a mail survey or phone interview to determine the impact of an ad
Data collection...
In an observational study, the researcher observes the behaviour of the individuals in the study without trying to influence the outcome of the study. The data obtained is called observational data.
E.g., a researcher at a pharmaceutical company wants to determine the effect of a drug on heart attacks. The researcher may select a sample of people, ask whether they have taken the drug in the last 2 years, and record the number of heart attacks they had over the same period.
Data collection...
In an experimental study, the researcher manipulates the independent (explanatory) variables and tries to determine how they influence the dependent (response) variable. The data obtained is called experimental data.
E.g., the researcher may randomly select some people and divide them into two groups. One group takes the drug regularly (treatment group) and the other does not (control group). The researcher then observes the number of heart attacks in the two groups over 2 years. Statistical methods can be used to determine the influence of the drug.
Threats to comparison: confounding
Does radiation from mobile phones cause cancer? An observational study would compare a sample of users with non-users
If users have a higher incidence of cancer, is this evidence that mobile phones cause cancer?
Note: there is no consistent evidence that this is the case
But even if a relationship were found, you would need to control for other factors that might explain the finding
Possible confounding factors?
Higher use of mobile phones in cities, where exposure to other forms of radiation is higher
Threats to comparison: confounding…
Suppose an experimental study of the link between mobile phones and cancer is designed
Subjects are randomly assigned to either a control group of non-users or a treatment group of users
Wait a few years & observe differences in cancer rates
Why is group allocation done at random?
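A minimal Python sketch of random allocation for a study like this, assuming ten hypothetical subjects split evenly between the two groups; `assign_groups` and the subject labels are made up for illustration. Shuffling the list means every possible split into control and treatment groups has the same chance of being used, so the researcher's choices play no part in who ends up in which group.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects into a control group (non-users) and a
    treatment group (users). Hypothetical helper for illustration only."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)              # put subjects in a random order
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}

# Ten hypothetical subjects; the labels are made up.
subjects = [f"subject_{i}" for i in range(1, 11)]
groups = assign_groups(subjects, seed=2024)
print("Control (non-users):", groups["control"])
print("Treatment (users):  ", groups["treatment"])
```

The seed is only there to make a run reproducible; leaving it as `None` gives a fresh allocation each time.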
Data collection…
Data collection requires a definition of the target population and a method of sampling
The method of sampling must avoid the problem of bias, where the design of the sample systematically favours certain outcomes
What's wrong with phone-in polls on talkback radio? They are an example of a self-selected sample
Simple random sampling: a process by which all samples of the same size n are equally likely to be chosen from the population of interest
Data collection…
Producing a simple random sample:
List and number all the individuals in the population of interest from 1 to N
Use a random number table or computer to randomly generate n distinct numbers between 1 and N; the individuals with those numbers form the sample
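The two steps above can be sketched in Python as follows. The helper `simple_random_sample` and the 50-student population are hypothetical, and the computer's pseudo-random generator stands in for a random number table. Drawing the n numbers without repetition (`random.Random.sample`) makes every subset of size n equally likely, which is the defining property of a simple random sample.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Simple random sample of size n, drawn without replacement.
    Hypothetical helper mirroring the two steps listed above."""
    rng = random.Random(seed)
    # Step 1: number the individuals from 1 to N.
    numbered = {i: unit for i, unit in enumerate(population, start=1)}
    N = len(numbered)
    # Step 2: generate n distinct random numbers between 1 and N.
    chosen = rng.sample(range(1, N + 1), n)
    return [numbered[i] for i in chosen]

# Hypothetical population of 50 students.
population = [f"student_{i}" for i in range(1, 51)]
sample = simple_random_sample(population, 5, seed=7)
print(sample)
```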
Stratified random sampling: separate the population into mutually exclusive sets (strata), then draw a simple random sample from each stratum
How would you draw a sample of households in Australia?
Cluster sampling: separate the population into mutually exclusive sets (clusters), then draw a simple random sample of the clusters
How would you draw a sample of the students in today's class?
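A rough Python sketch of the two designs, assuming hypothetical strata (households grouped by state, with sample sizes roughly proportional to stratum size) and hypothetical clusters (students grouped by seating row). The cluster version keeps every unit in each selected cluster, i.e. one-stage cluster sampling; that is an assumption, since the definition above only says the clusters are sampled. Both helper names are made up for illustration.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of sizes[name] units from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, seed=None):
    """Draw a simple random sample of clusters and keep every unit in each
    chosen cluster (one-stage cluster sampling; an assumption)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical strata: households grouped by state.
strata = {
    "NSW": [f"NSW-{i}" for i in range(1, 101)],
    "VIC": [f"VIC-{i}" for i in range(1, 81)],
    "TAS": [f"TAS-{i}" for i in range(1, 21)],
}
by_state = stratified_sample(strata, {"NSW": 10, "VIC": 8, "TAS": 2}, seed=1)

# Hypothetical clusters: students grouped by the row they sit in.
rows = {f"row-{r}": [f"row-{r}-seat-{s}" for s in range(1, 9)] for r in "ABCDEF"}
by_row = cluster_sample(rows, 2, seed=1)

print({state: len(units) for state, units in by_state.items()})
print(sorted(by_row))
```

The stratified design guarantees every state is represented; the cluster design only needs a list of rows, not of every student.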
Probability
A mathematical means of studying uncertainty, e.g., setting premiums for insurance policies
Provides the logical foundation of statistical inference: making decisions on the basis of partial information
E.g., what is the probability of a sell-out at an upcoming concert, given data on past concerts, weather forecasts, etc.?
Probability review
A random experiment is an action or process that has several possible outcomes
The set of all possible outcomes is the sample space
An event consists of one or several outcomes
The probability of event E, denoted by P(E), measures the chance/likelihood that E will occur
If all outcomes are equally likely:
$$P(E) = \frac{\text{number of outcomes in } E}{\text{total number of possible outcomes}}$$
$$0 \le P(E) \le 1$$
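For equally likely outcomes, the counting formula can be checked directly. A minimal sketch, using a fair die as an illustrative sample space; `classical_probability` is a hypothetical helper and `Fraction` keeps the probability exact.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(E) = (number of outcomes in E) / (total number of possible outcomes).
    Only valid when every outcome in sample_space is equally likely."""
    in_event = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(in_event), len(sample_space))

die = [1, 2, 3, 4, 5, 6]          # sample space for one roll of a fair die
even = {2, 4, 6}                  # event E: the roll is even
p_even = classical_probability(even, die)
print(p_even)                     # 1/2
assert 0 <= p_even <= 1
```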
Probability review…
It is often easier to work with the probability of the event complementary to event E, 'not E', or E^c:
P(E^c) = 1 - P(E)
The conditional probability that E occurs, given that F has occurred, is defined by P(E | F) = P(E and F) / P(F), provided P(F) > 0
Similarly, P(F | E) = P(E and F) / P(E)
Rearranging yields the multiplication rule P(E and F) = P(F)P(E | F)
If P(E | F) = P(E), then E & F are said to be independent, i.e., conditioning has no effect
Equivalently, P(F | E) = P(F), or P(E and F) = P(E)P(F)
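These definitions can be checked by brute force on a sample space of equally likely outcomes. The sketch below uses two fair dice purely as an illustration (it is not an example from the lecture): it computes P(E | F) from the definition and compares it with P(E). 'Sum is 7' turns out to be independent of 'first die is 6', while 'sum is 8' is not.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely ordered pairs.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(E) for an event given as a predicate on outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def cond_prob(e, f):
    """P(E | F) = P(E and F) / P(F); assumes P(F) > 0."""
    return prob(lambda o: e(o) and f(o)) / prob(f)

def first_is_six(o):
    return o[0] == 6

def sum_is_seven(o):
    return o[0] + o[1] == 7

def sum_is_eight(o):
    return o[0] + o[1] == 8

# Conditioning on the first die leaves P(sum is 7) unchanged: independent.
print(cond_prob(sum_is_seven, first_is_six), prob(sum_is_seven))  # 1/6 1/6
# Conditioning changes P(sum is 8): dependent.
print(cond_prob(sum_is_eight, first_is_six), prob(sum_is_eight))  # 1/6 5/36
```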
SIA: Private health insurance
Abbott defends private health rebate July 17, 2006 - 3:16PM http://www.smh.com.au/articles/2006/07/17/1152988457337.html Health Minister Tony Abbott has hit out at critics of the Federal Government's private health insurance rebate. "I think that there has been a consistent line of ideological hostility to the private health insurance rebate from a certain style of academic and indeed from the Labor Party," Mr Abbott told ABC Radio today. Mr Abbott's comments came after University of NSW professor Denzil Fiebig released a study of the rebate's effectiveness. Professor Fiebig claimed the rebate was unsustainable and was no guarantee that people with private health insurance would use the private health system over the public one. But Mr Abbott said the rebate took pressure off public hospitals. "If we didn't have the rebate we'd have a lot more pressure on the public hospital system," Mr Abbott said. He said Professor Fiebig was part of an ideological group who would never support private health care. Giving people choice between private and public health care was vital, Mr Abbott said.
SIA: Private health insurance…
What is the relationship between having private health insurance (PHI) & being admitted to hospital as a public or private patient?
Data from the ABS 2001 National Health Survey were used to derive the following table of relative frequencies
Admission to hospital in last 12 months (%)
              Not admitted   Admitted as public patient   Admitted as private patient   Totals
Have PHI          40.3                  1.5                           5.4                 47.2
No PHI            45.3                  6.6                           0.9                 52.8
Totals            85.6                  8.1                           6.3                100.0
SIA: Private health insurance…
Treat the relative frequencies as probabilities and find the probability of the following events:
Have PHI
Have PHI & admitted to hospital
Admitted as a private patient given have PHI
Admitted as a public patient given have PHI
Are "admission as private" & "have PHI" independent events?
Marginal probability: P(PHI) = 0.472
Joint probability: P(PHI and admitted) = 0.015 + 0.054 = 0.069
Conditional probabilities:
P(admitted as private | PHI) = 0.054/0.472 = 0.114
P(admitted as public | PHI) = 0.015/0.472 = 0.032
Independence: admission & PHI status are related (dependent), since P(admitted as private | PHI) = 0.114 ≠ P(admitted as private) = 0.063
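The same calculations can be reproduced from the table in a few lines of Python. This is a sketch: the dictionary layout and the helper `marginal` are illustrative choices, and the percentages are simply divided by 100 to treat relative frequencies as probabilities.

```python
# Relative frequencies (%) from the ABS 2001 table, converted to probabilities.
table = {
    "Have PHI": {"not admitted": 40.3, "public": 1.5, "private": 5.4},
    "No PHI": {"not admitted": 45.3, "public": 6.6, "private": 0.9},
}
joint = {(row, col): pct / 100 for row, cols in table.items() for col, pct in cols.items()}

def marginal(row=None, col=None):
    """Sum joint probabilities over the cells matching the given row and/or column."""
    return sum(p for (r, c), p in joint.items()
               if (row is None or r == row) and (col is None or c == col))

p_phi = marginal(row="Have PHI")
p_phi_and_admitted = joint[("Have PHI", "public")] + joint[("Have PHI", "private")]
p_private_given_phi = joint[("Have PHI", "private")] / p_phi
p_public_given_phi = joint[("Have PHI", "public")] / p_phi
p_private = marginal(col="private")

print(f"P(PHI) = {p_phi:.3f}")
print(f"P(PHI and admitted) = {p_phi_and_admitted:.3f}")
print(f"P(private | PHI) = {p_private_given_phi:.3f}, P(public | PHI) = {p_public_given_phi:.3f}")
# Independence would require P(private | PHI) == P(private).
print(f"P(private | PHI) = {p_private_given_phi:.3f} vs P(private) = {p_private:.3f}")
```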
Auditing example
Let's draw on both topics: probability & sampling
Background: a local auditing firm services 100 companies, 10 of which are known to have overdue accounts
If 2 separate firms are chosen at random, what is the probability distribution for the number of firms with overdue accounts?
Auditing example…
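Let e_i denote the event that the ith firm has an overdue account
$$P(\text{none overdue}) = P(e_1^c \text{ and } e_2^c) = P(e_1^c)\,P(e_2^c \mid e_1^c) = \frac{90}{100}\cdot\frac{89}{99} = 0.809$$
$$P(\text{one overdue}) = P(e_1 \text{ and } e_2^c) + P(e_1^c \text{ and } e_2) = \frac{10}{100}\cdot\frac{90}{99} + \frac{90}{100}\cdot\frac{10}{99} = 0.182$$
$$P(\text{two overdue}) = 1 - 0.809 - 0.182 = 0.009$$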
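A short Python check of the distribution above, using exact fractions. It computes P(two overdue) directly as (10/100)(9/99) instead of by complement; both routes give the same answer. The variable names are illustrative.

```python
from fractions import Fraction

N, K = 100, 10      # companies serviced; companies with overdue accounts
OK = N - K          # companies without overdue accounts

# Two separate firms: the second draw is made from the 99 that remain.
p_none = Fraction(OK, N) * Fraction(OK - 1, N - 1)
p_one = Fraction(K, N) * Fraction(OK, N - 1) + Fraction(OK, N) * Fraction(K, N - 1)
p_two = Fraction(K, N) * Fraction(K - 1, N - 1)

for k, p in enumerate([p_none, p_one, p_two]):
    print(k, p, f"{float(p):.3f}")
```

Exactly, the three probabilities are 89/110, 2/11 and 1/110, which sum to 1.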
Sampling
The auditing example illustrates the distinction between sampling with & without replacement
Because the problem specified separate firms, sampling was done without replacement
Such sampling induces dependence across events: P(e_2) depends on the type of firm sampled in the first draw
Independence, and hence random sampling, requires sampling with replacement
The distinction is not of practical importance if the population is large
Redo the auditing example assuming sampling with replacement
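A sketch of the suggested redo, assuming the same 10 overdue accounts out of 100 and that the first firm drawn is returned before the second draw. With replacement each draw has P(overdue) = 0.1 regardless of the first draw, so joint probabilities are products of marginal ones; the without-replacement figures from the auditing example are recomputed alongside for comparison.

```python
from fractions import Fraction

p = Fraction(10, 100)   # P(overdue) on every draw, since the firm is put back
q = 1 - p

# With replacement the draws are independent, so P(e1 and e2) = P(e1)P(e2).
with_replacement = {0: q * q, 1: p * q + q * p, 2: p * p}

# Without replacement (the auditing example) for comparison.
without_replacement = {
    0: Fraction(90, 100) * Fraction(89, 99),
    1: Fraction(10, 100) * Fraction(90, 99) + Fraction(90, 100) * Fraction(10, 99),
    2: Fraction(10, 100) * Fraction(9, 99),
}

for k in range(3):
    print(k, f"{float(with_replacement[k]):.3f}", f"{float(without_replacement[k]):.3f}")
```

With replacement the distribution is 0.81, 0.18 and 0.01, against 0.809, 0.182 and 0.009 without replacement. The gap is small because removing one firm from 100 barely changes the proportion overdue, in line with the remark that the distinction matters little when the population is large.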
Sampling…
Events in experiments such as the auditing example may be represented by probability trees: diagrams (that resemble trees) whose branches show each possible outcome of a draw and its probability
Keller Ex 6.5 looks at selecting 2 students (without replacement) from 10 students (3 F, 7 M)
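A probability tree multiplies branch probabilities along each path. The sketch below enumerates the four paths for the setup quoted from Keller (2 students drawn without replacement from 3 F and 7 M); it is an illustration under that setup, not a reproduction of the textbook's solution, and `path_probability` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

counts = {"F": 3, "M": 7}     # 10 students: 3 female, 7 male

def path_probability(path, counts):
    """Multiply branch probabilities along one path of the tree,
    removing each selected student before the next draw."""
    remaining = dict(counts)
    total = sum(remaining.values())
    prob = Fraction(1)
    for sex in path:
        prob *= Fraction(remaining[sex], total)
        remaining[sex] -= 1
        total -= 1
    return prob

tree = {"".join(path): path_probability(path, counts) for path in product("FM", repeat=2)}
for path, prob in tree.items():
    print(path, prob)
```

The paths give P(FF) = 1/15, P(FM) = P(MF) = 7/30 and P(MM) = 7/15, which sum to 1.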
Auditing example revisited
We assumed the probability of an overdue account was known
This enabled us to determine the probability distribution of the number of overdue accounts
Consider a different problem: suppose the probability of an overdue account is not known
Given a sample of companies, the auditing firm can determine the sample proportion of these that have overdue accounts
How can this statistic be used to infer something about the true population proportion of companies with overdue accounts?
This is an inference problem
Progress report #2
In the SIA admission distributions, probabilities were assigned to qualitative characteristics (public or private patient)
In the auditing example, probabilities were assigned to quantitative characteristics (the number of overdue accounts): the topic of random variables
We need to introduce theoretical distributions that are useful in representing/modelling actual data, initially a discrete distribution: the binomial