Stat 31, Section 1, Last Time
• Correlation
• Linear Regression
– Idea
– Graphics
– Computation
– Interpretation
Midterm I
Coming up: Tuesday, Feb. 15
Material: HW Assignments 1 – 4
Extra Office Hours:
Mon. Feb. 14, 8:30 – 12:00, 2:00 – 3:30
Bring Along:
One 8.5” x 11” sheet of paper with formulas
(front & back OK, but no newspapers)
Chapter 3: Producing Data
(how this is done is critical to conclusions)
Section 3.1: Statistical Settings
2 Main Types:
I. Observational Study
Simply “see what happens, no intervention”
(to individuals or variables of interest)
e.g. Political Polls, Supermarket Scanners
II. Experiment
(Make changes & study the effect)
Apply a “treatment” to individuals & measure “responses”
e.g. Clinical trials for drugs (safe? effective?), agricultural trials (max yield?)
Caution: Thinking (common sense) is required for each type,
both when you do statistics & when you need to understand somebody else’s results
Helpful Distinctions (Critical Issue of “Good” vs. “Bad”)
I. Observational Studies:
A. Anecdotal Evidence
Idea: Study just a few cases
Problem: may not be representative
(or worse: cases chosen precisely because they are unusual)
e.g. Cures for hiccups
Key Question: How were the data chosen?
(early medicine: this gave crazy attempts at cures)
Helpful Distinctions (cont.)
I. Observational Studies:
B. Sampling
Idea: Seek sample representative of population
HW:
(old) 3.1, 3.3, 3.5, 3.7
Challenge: How to sample?
(turns out: not easy)
How to sample?
History of Presidential Election Polls
During campaigns, we constantly hear in the news “polls say …”. How good are these? Why?
1936: Landon vs. Roosevelt
Literary Digest Poll: 43% for R
Result: 62% for R
What happened?
Sample size not big enough? No: 2.4 million
Biggest Poll ever done (before or since)
Bias in Sampling
Bias: Systematically favoring one outcome
(need to think carefully)
Selection Bias: Addresses came from L. D. readers, phone books, club memberships
(representative of population?)
Non-Response Bias: Return-mail survey
(who had time?)
Bias in Sampling
1936 Presidential Election (cont.)
Interesting Alternative Poll:
Gallup: 56% for R (sample size ~ 50,000)
Gallup’s prediction of the L.D. poll: 44% for R (sample size ~ 3,000)
Predicted both the correct result (62% for R)
and the L. D. error (43% for R)!
(what was better?)
Improved Sampling
Gallup’s Improvements:
(i) Personal Interviews
(attacks non-response bias)
(ii) Quota Sampling
(attacks selection bias)
Quota Sampling
Idea: make the “sample like the population”
So the surveyor chooses people to give:
i. Right % male
ii. Right % “young”
iii. Right % “blue collar”
iv. …
This worked well, until …
How to sample?
1948 Election Polls:
Poll       Dewey   Truman   Sample size
Crossley   50%     45%
Gallup     50%     44%      50,000
Roper      53%     38%      15,000
Actual     45%     50%      -
Note: Embarrassing for the polls; famous photo of Truman holding up a newspaper with the headline “Dewey Defeats Truman”
What went wrong?
Problem: Unintentional Bias
(surveyors understood bias,
but still made choices)
Lesson: Human choice cannot give a representative sample
Surprising Improvement: Random Sampling
Now called “scientific sampling”
Random = Scientific???
Random Sampling
Key Idea: “random error” is smaller than
“unintentional bias”, for large enough sample sizes
How large?
Current sample sizes: ~1,000 - 3,000
Note: now far below (<<) the 50,000 used in 1948.
So surveys are much cheaper
(thus many more are done now…)
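To see why a modest random sample can beat a huge biased one, here is a small Python simulation. It is a sketch under hypothetical assumptions: the 62% for R comes from the 1936 slides, but the reach rates (R supporters reached only half as often as everyone else) are invented for illustration and are not the actual 1936 mechanism. The helper names are also hypothetical.

```python
import random

def sample_proportion(n, p, rng):
    """Fraction of n independent respondents who favor R, when each favors R with probability p."""
    return sum(rng.random() < p for _ in range(n)) / n

def reached_share(p, reach_r, reach_other):
    """Share of R supporters among the people actually reached, when R supporters are
    reached at rate reach_r and everyone else at rate reach_other (hypothetical bias)."""
    return p * reach_r / (p * reach_r + (1 - p) * reach_other)

rng = random.Random(1936)
p_true = 0.62  # actual 1936 share for R, from the slides

# Hypothetical selection bias: R supporters are only half as likely to be reached.
p_reached = reached_share(p_true, reach_r=0.5, reach_other=1.0)

huge_biased = sample_proportion(50_000, p_reached, rng)  # big sample, biased selection
small_random = sample_proportion(1_000, p_true, rng)     # small sample, random selection

print(f"huge biased sample:  {huge_biased:.3f}  (error {abs(huge_biased - p_true):.3f})")
print(f"small random sample: {small_random:.3f}  (error {abs(small_random - p_true):.3f})")
```

The 50,000-person biased sample pins down the wrong answer (about 45%) very precisely, while the 1,000-person random sample typically lands within a few points of 62%: more data does not remove bias.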
Random Sampling
How Accurate?
• Can (& will) calculate using “probability”
• Justifies term “scientific sampling”
• 2nd improvement over quota sampling
Random Sampling
What is random?
Simple Random Sampling:
Each member of population is
equally likely to be in sample
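In Python, the standard library's `random.Random.sample` draws a simple random sample without replacement. The sketch below (the 25-student population and the seed are made-up assumptions) draws one sample of 5, then repeats the draw many times to check that every member ends up in the sample about 5/25 = 20% of the time.

```python
import random
from collections import Counter

# Hypothetical population: a class of 25 students, labeled S01 to S25.
population = [f"S{i:02d}" for i in range(1, 26)]
rng = random.Random(31)

# rng.sample draws k distinct members without replacement.
sample = rng.sample(population, 5)
print(sample)

# Check "equally likely": repeat many times and record how often each member is chosen.
trials = 20_000
counts = Counter()
for _ in range(trials):
    counts.update(rng.sample(population, 5))
inclusion = {name: counts[name] / trials for name in population}
print(min(inclusion.values()), max(inclusion.values()))  # both should be near 0.2
```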
Key Idea: Different from “just choose some”
Random Sampling
An old (but still fun?) experiment:
Choose a number among 1,2,3,4
Old typical results: about 70% choose “3”
(perhaps you have seen this before…)
Main lesson: human choice does not give “equally likely” (i.e., it does not give a random sample)
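For contrast with the roughly 70% who pick “3”, a computer's pseudo-random choices among 1, 2, 3, 4 come out close to 25% each. A minimal Python sketch (the seed and the number of draws are arbitrary choices):

```python
import random
from collections import Counter

rng = random.Random(4)
picks = [rng.randint(1, 4) for _ in range(10_000)]  # randint includes both endpoints
freq = {k: count / len(picks) for k, count in sorted(Counter(picks).items())}
print(freq)  # each share should be close to 0.25
```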
Random Sampling
How to choose a random sample?
Old Approaches:
– Random Number Table
– Roll Dice
Modern Approach:
– Computer Generated
Random Sampling
EXCEL generation of random samples:
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg15.xls
Goal 1: Generate Random Numbers
EXCEL approaches:
• RAND function
• Tools > Data Analysis > Random Number Generation
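Outside EXCEL, Goal 1 can be sketched in Python, whose `random.random()` returns a uniform number in [0, 1). This is only an analogue of the EXCEL approaches above, not a description of how EXCEL computes them.

```python
import random

rng = random.Random()  # seeded from the operating system, so results differ on each run

# Like an EXCEL column of RAND() values: each number is uniform on [0, 1).
random_numbers = [rng.random() for _ in range(10)]
for x in random_numbers:
    print(round(x, 4))
```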
EXCEL Random Sampling
Goal 2: Randomly Reorder List
EXCEL approach:
• Highlight block with list & random num’s
• Sort whole thing on numbers
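The highlight-and-sort recipe translates directly into Python: pair each item with a random number and sort on the numbers. A sketch using the letters A to L (the helper name `random_reorder` and the seed are hypothetical). Python's built-in `random.shuffle` would also give a random order; this version just mirrors the sort-on-random-numbers idea.

```python
import random
import string

def random_reorder(items, rng):
    """Attach one uniform random number to each item, then sort on those numbers
    (the same idea as sorting a list together with a column of random numbers)."""
    keyed = [(rng.random(), item) for item in items]
    keyed.sort(key=lambda pair: pair[0])  # sort on the random numbers only
    return [item for _, item in keyed]

letters = list(string.ascii_uppercase[:12])  # A through L
rng = random.Random(2005)
print(random_reorder(letters, rng))
```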
Goal 3: Random Sample from List
• Choose the 1st subset (e.g. the first k entries) from the random re-ordering
• Valid since each item is equally likely to be in each spot
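As a self-contained Python sketch of Goal 3 (the helper `random_subset` and the seed are hypothetical), keep the first k items of a random re-ordering. The loop checks the “equally likely” claim empirically: taking 6 of 12 letters, each letter should be chosen about half the time.

```python
import random
import string

def random_subset(items, k, rng):
    """Randomly reorder by sorting on attached random numbers, then keep the first k."""
    keyed = sorted(((rng.random(), item) for item in items), key=lambda pair: pair[0])
    return [item for _, item in keyed[:k]]

letters = list(string.ascii_uppercase[:12])  # A through L
rng = random.Random(15)
print(random_subset(letters, 6, rng))

# Each letter should land in the first 6 spots about 6/12 = 50% of the time.
trials = 10_000
hits = {letter: 0 for letter in letters}
for _ in range(trials):
    for letter in random_subset(letters, 6, rng):
        hits[letter] += 1
print({letter: round(h / trials, 3) for letter, h in hits.items()})
```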
EXCEL Details
RAND:
• Not available among “Statistical” functions
• But can find on “All” menu
• Note no (explicit) inputs
• Just put in desired cell
• Drag downwards for several random #s
• Caution: these change on each re-computation
• Thus not recommended for this
EXCEL Details
Tools > Data Analysis > Random Number Generation:
• Set: # Variables: 1
Distribution: Uniform (over [0,1])
• Generates a fixed list
(doesn’t change with re-computation)
(note entries are “just numbers”)
• Thus stable for later interpretation
• Recommended for random sample choice
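In Python, the analogue of the RAND-versus-fixed-list distinction is whether the generator is seeded. An unseeded generator (like RAND) gives new numbers on each run; a seeded one reproduces the same list, which keeps a chosen sample stable for later interpretation. A sketch (the helper name and the seed value are arbitrary assumptions):

```python
import random

def fixed_uniform_list(n, seed):
    """Generate n uniform [0, 1) numbers from a seeded generator, so the list can be recreated."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

first = fixed_uniform_list(5, seed=168)
again = fixed_uniform_list(5, seed=168)
print(first == again)  # True: same seed, same "fixed list"
```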
EXCEL Details
Sorting Lists:
• Highlight Block with Both:
– Names to sort
– Random numbers
• Data > Sort > choose the column (sort on the random numbers)
• Result is random re-ordering of List
Random Sampling HW
HW:
C8: For the letters A – L, use EXCEL to:
(a) Put in a random order.
(b) Choose a random sample of 6.
(Hints: for (a), want each equally likely,
for (b), reorder, and choose a subset)
Random Sampling HW
Interesting Question:
What is the % of Male Students at UNC?
(Your chance of a date?
Or take 100% minus this to get your chance)
HW:
C9: Print Class Handout
https://www.unc.edu/~marron/UNCstat31-2005/Stat31HWC9.doc
Random Sampling HW
Notes on HW C9:
• 3 dumb ways to sample, 1 good one
• Goal is to learn about sampling, not to “get the right answer”
• Part 1: put a symbol for yourself, Ms and Fs for others
• Put both count & % (% = 100 x count / 25)
• Part 2, “tally” is:
• Part 4: student phone directory available in Student Union?
Random Sampling HW
Notes on HW C9 (cont.):
• Hints on Part 4:
– For each draw, first draw a “random page”
– Tools > Data Analysis > Random Number Generation > Uniform is one way to do this
– In “Uniform”, you need to set the “Parameters” to 0 and “number of pages”
– This gives a random decimal; to get an integer, round up using CEILING
– In CEILING, set “significance” to 1
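The random-page step can be sketched in Python with `random.uniform` and `math.ceil` standing in for EXCEL's Uniform generator and CEILING with significance 1. The page count of 40 is a made-up assumption; substitute the real directory's page count. Rounding up maps (0, 1] to page 1, (1, 2] to page 2, and so on, so every page gets an equal-length interval.

```python
import math
import random

def random_page(n_pages, rng):
    """Mimic Uniform(0, n_pages) followed by CEILING(x, 1) to get a page in 1..n_pages."""
    while True:
        x = rng.uniform(0, n_pages)  # random decimal between 0 and n_pages
        page = math.ceil(x)          # round up to an integer
        if page >= 1:                # x == 0 exactly (vanishingly rare) would give page 0, so redraw
            return page

rng = random.Random(195)
n_pages = 40  # hypothetical number of directory pages
print([random_page(n_pages, rng) for _ in range(10)])
```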
Random Sampling HW
Notes on HW C9 (cont.):
• Hints on Part 4 (cont.):
– Next, choose a random column
– Next, choose a random name
– Caution: different numbers of names on each page
– Challenge: still make each name equally likely
– Approach: choose from a larger range of numbers (larger than any page needs)
– Approach: when no name is there, just toss it out
– Approach: then do a “redraw”
– Also redraw if you can’t tell the gender
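The redraw approach is what is usually called rejection sampling: draw uniformly from a larger, rectangular set of spots (every page × column × slot), and throw away draws that do not land on a usable name. A Python sketch under assumptions: the tiny directory, its layout as nested lists, and the helper name `draw_name` are all hypothetical.

```python
import random
from collections import Counter

def draw_name(directory, max_columns, max_per_column, rng):
    """directory[page][column] is a list of (name, gender) entries, gender None if unclear.
    Every (page, column, slot) triple is equally likely; empty spots and unclear genders
    are tossed out and redrawn, so every usable name is equally likely."""
    while True:
        page = rng.randrange(len(directory))
        column = rng.randrange(max_columns)
        slot = rng.randrange(max_per_column)
        if column >= len(directory[page]):
            continue  # this page has fewer columns: toss out, redraw
        entries = directory[page][column]
        if slot >= len(entries):
            continue  # no name in that spot: toss out, redraw
        name, gender = entries[slot]
        if gender is None:
            continue  # can't tell gender: redraw
        return name, gender

# Tiny hypothetical directory: 2 pages, uneven numbers of names per column.
directory = [
    [[("A", "M"), ("B", "F")], [("C", "F")]],
    [[("D", "M")], [("E", None), ("F", "F"), ("G", "M")]],
]
rng = random.Random(197)
counts = Counter(draw_name(directory, 2, 3, rng)[0] for _ in range(30_000))
print(sorted(counts.items()))  # A, B, C, D, F, G each about 5,000 draws; E never
```

Because every spot in the rectangle is equally likely before rejection, every name that survives is equally likely after it, even though the columns hold different numbers of names.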