Statistics 2/6/2013


Quiz 1

1. Explain the difference between a census and a sample.

2. Explain the difference between numerical and categorical data.

3. Give an example of data that is at the nominal level of measurement.

Quiz 1 Solutions

1. Explain the difference between a census and a sample. A census is a collection of data from the entire population, while a sample is a collection of data from a subset of the population.

2. Explain the difference between numerical and categorical data. Numerical data consist of numbers. Categorical data consist of names or labels (categories).

3. Give an example of data that is at the nominal level of measurement. Political affiliation.

Critical Thinking

We must think carefully about the context of the data, the source of the data, the method used in data collection, the conclusions reached, and the practical implications.

 

Fun Quotes

• "There are three kinds of lies: lies, damned lies, and statistics." - Benjamin Disraeli

• "Figures don't lie; liars figure." - Mark Twain

• "There are two kinds of statistics, the kind you look up, and the kind you make up." - Rex Stout

Bad Statistics

Bad statistics result either from evil intent or from unintentional errors. How to Lie with Statistics, a book written by Darrell Huff in 1954, is the classic text on this topic and has many examples of intentional and unintentional misuses of statistics. Misuse of graphs is a common way to misrepresent data or results.

Bad samples may also lead to incorrect findings. A bad sample arises when the method used to collect the data produces a biased sample, one that does not represent the population from which it was obtained.

A voluntary response sample (or self-selected sample) is one in which the respondents themselves decide whether to be included.

Ex: Any poll or survey where the readers or listeners decide to participate

Correlation and Causality

Another way to misinterpret statistical data is to find a statistical association between two variables and to conclude that one of the variables causes the other. A statistical association between two variables is called a correlation. When one of the variables actually causes a change in the other, we have causality.

Reported results

When collecting data from people, it is better to take the measurements yourself rather than rely on subjects to report results.

Ex. 3 Voting Behavior: When surveyed about whether they had voted, about 70% of 1000 eligible voters reported that they had. Voting records showed that only 61% had actually voted.

Small Samples

Conclusions should not be drawn from samples that are far too small.

Ex. 4 The Children's Defense Fund published an article, Children Out of School in America, reporting that in one school district 67% of the students had been suspended at least three times. That figure is based on a sample of only 3 students.
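A quick way to see why 3 students is far too small: with n = 3, the only percentages a sample can produce are 0%, 33.3%, 66.7%, and 100%, so a single student moves the figure by about 33 percentage points. A minimal Python sketch (the variable names are illustrative only):

```python
# With a sample of n students, the reported percentage must be k/n for some
# whole number k of students, so only n + 1 values are possible.
n = 3
possible_percentages = [round(100 * k / n, 1) for k in range(n + 1)]
print(possible_percentages)  # [0.0, 33.3, 66.7, 100.0]

# The 67% in the article corresponds to 2 of the 3 students.
print(round(100 * 2 / n))  # 67
```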

Percentages

Percentages can be cited in a manner that is unclear or misleading. Since 100% of something is all of that something, if you see a percentage above 100% cited for a part of a whole, it is probably not justified.

Other Sampling Considerations

Wording of a question

97% yes: "Should the President have the line item veto to eliminate waste?"

57% yes: "Should the President have the line item veto, or not?"

Order of Questions

• Would you say that traffic contributes more or less to air pollution than industry?

• Would you say that industry contributes more or less to air pollution than traffic?

Traffic first: 45% blamed traffic and 27% blamed industry

Industry first: 24% blamed traffic and 57% blamed industry

Nonresponse

 

Other Sampling Considerations

Missing Data: Phone surveys miss people without phones.

Self-Interest Study: Kiwi Shoe Polish, getting a job

Precise Numbers: You cannot assume precise numbers are accurate.

Deliberate Distortions: Avis vs. Hertz

Collecting Sample Data

If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical torturing can salvage them.

In an Observational Study, we observe and measure specific characteristics, but we don't attempt to modify the subjects being studied.

In an Experiment, we apply some treatment and then proceed to observe its effects on the subjects. (Subjects in experiments are called experimental units.)

Types of Samples

A Simple random sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance of being chosen.
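In Python, the standard-library function `random.sample` draws members without replacement so that every subset of size n is equally likely, which matches the definition of a simple random sample. A minimal sketch; the population of 100 labels and the helper name `simple_random_sample` are hypothetical:

```python
import random

# Hypothetical population: 100 senators labeled "S001" through "S100".
population = [f"S{i:03d}" for i in range(1, 101)]

def simple_random_sample(population, n, rng=random):
    """Draw n distinct members so that every subset of size n is equally likely."""
    # random.sample samples without replacement, uniformly over all
    # size-n selections, which is the defining property of a simple random sample.
    return rng.sample(population, n)

sample = simple_random_sample(population, 2)
print(sample)  # two distinct labels; varies from run to run
```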

In a random sample, members from the population are selected in such a way that each individual member in the population has an equal chance of being selected.

 

A probability sample involves selecting members from a population in such a way that each member of the population has a known (but not necessarily the same) chance of being selected.

Types of Samples Example

Each of the 50 states sends two senators to Congress, so there are exactly 100 senators. Suppose that we write the name of each state on a separate index card, mix the 50 cards in a bowl, and then select one card. If we consider the two senators from the selected state to be sampled, is this result a random sample? Yes, since each individual senator has an equal chance of being picked.

Simple random sample? No, not all samples of size two have the same chance of being picked. (A sample of two senators from different states cannot be picked at all.)

Probability sample? Yes, since each senator has a known chance of being selected.
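Exact probabilities make the three answers concrete. This Python sketch labels each senator with a hypothetical (state, seat) pair and computes the chance that a given senator, and a given pair of senators, ends up in the sample under the card-drawing scheme:

```python
from fractions import Fraction
from itertools import combinations

STATES = 50
# Hypothetical labels: each senator is (state number, seat number).
senators = [(state, seat) for state in range(STATES) for seat in (0, 1)]

def prob_senator_selected(senator):
    # One of the 50 state cards is drawn, and both of that state's senators
    # are sampled, so any given senator is sampled with probability 1/50.
    return Fraction(1, STATES)

def prob_pair_selected(a, b):
    # A pair is the sample only if both senators share the drawn state.
    return Fraction(1, STATES) if a[0] == b[0] else Fraction(0)

pairs = list(combinations(senators, 2))
pair_probabilities = {prob_pair_selected(a, b) for a, b in pairs}
# Under a simple random sample of size 2, every pair would have this chance:
srs_pair_probability = Fraction(1, len(pairs))

print(len(senators), len(pairs))                         # 100 4950
print(prob_senator_selected(senators[0]) == Fraction(2, 100))  # True
print(srs_pair_probability)                              # 1/4950
```

Every senator has the same known chance (1/50, the same as 2 out of 100), so the sample is random and a probability sample; but pairs have chance either 1/50 or 0 rather than 1/4950 each, so it is not a simple random sample.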

Other Sampling Methods

In Systematic sampling, we select some starting point and then select every kth element in the population.
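A minimal Python sketch of systematic sampling, assuming the population is stored as a list and (when no start is given) the starting point is chosen at random among the first k positions; the function name `systematic_sample` is hypothetical:

```python
import random

def systematic_sample(population, k, start=None):
    """Take every kth element, beginning at a starting position in the first k."""
    if start is None:
        # Assumption: the starting point is chosen at random among the first k positions.
        start = random.randrange(k)
    if not 0 <= start < k:
        raise ValueError("start must be between 0 and k - 1")
    # Slicing with a step of k picks positions start, start + k, start + 2k, ...
    return population[start::k]

numbers = list(range(1, 21))  # hypothetical population: 1, 2, ..., 20
print(systematic_sample(numbers, 5, start=2))  # [3, 8, 13, 18]
```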

 

With convenience sampling, we simply use the results that are very easy to get.

 

With Stratified sampling, we subdivide the population into at least two different subgroups (or strata) so that subjects within the same subgroup share the same characteristics (such as gender or age bracket), then we draw a sample from each subgroup (or stratum).
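A minimal Python sketch of stratified sampling. The function name, the hypothetical population of 20 people tagged by gender, and the choice to draw the same number from every stratum are all assumptions for illustration; other designs draw in proportion to stratum size:

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, rng=random):
    """Split the population into strata, then draw a random sample from each one."""
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        # Assumption: the same number is drawn from every stratum.
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

# Hypothetical population of 20 people tagged by gender.
people = [(i, "F" if i % 2 else "M") for i in range(20)]
chosen = stratified_sample(people, lambda person: person[1], 3)
print(chosen)  # 3 women and 3 men; which ones varies from run to run
```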

 

In Cluster sampling, we first divide the population area into sections (or clusters), then randomly select some of those clusters, and then choose all the members from those selected clusters.
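The key contrast with stratified sampling is that whole clusters are chosen at random and every member of a chosen cluster is kept. A minimal Python sketch; the block and household names are hypothetical:

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly choose some clusters, then keep every member of each chosen cluster."""
    chosen = rng.sample(list(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: city blocks and the households on each block.
blocks = {
    "Block A": ["A1", "A2", "A3"],
    "Block B": ["B1", "B2"],
    "Block C": ["C1", "C2", "C3", "C4"],
    "Block D": ["D1"],
}
picked = cluster_sample(blocks, 2)
households = [h for members in picked.values() for h in members]
print(households)  # all households from 2 randomly chosen blocks
```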

Other Sampling Methods

Multistage sampling occurs when pollsters collect data using a combination of the basic sampling methods. In a multistage sample design, pollsters select a sample in different stages, and each stage might use different methods of sampling.

Group Quiz 2

1. The Statistical Abstract of the United States includes the average per capita income for each of the 50 states. When those 50 values are added, then divided by 50, the result is $29,672.52. Is $29,672.52 the average per capita income for all individuals in the United States? Why or why not?

Frequency Distributions

We recorded the pulses of 40 women. Here it is!

76 64 72 80 88 76 60 76 72 76 68 80 80 104 64 88 68 60 68 76 80 72 76 72 68 88 72 80 96 60 72 72 68 88 72 88 64 124 80 64

These data are hard to make sense of, so we (you) are going to organize them using a frequency distribution (table).

Frequency Distributions

A frequency distribution shows how a data set is partitioned among several categories (or classes) by listing all of the categories along with the number of data values in each of the categories.

 

Lower class limits are the smallest numbers that can belong to the different classes.

 

Upper class limits are the largest numbers that can belong to the different classes.  

Class boundaries are the numbers used to separate the classes, but without the gaps created by class limits.

Frequency Distributions

Class midpoints are the values in the middle of the classes.

Class width is the difference between two consecutive lower class limits.
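For the pulse-rate classes used below (first lower limit 60, width 10, 7 classes), these definitions can be checked with a short Python sketch. It assumes whole-number data, so each upper limit is one unit below the next lower limit, boundaries fall halfway across that gap of 1, and the midpoint is taken as the average of the lower and upper limits. The function name `class_summary` is hypothetical:

```python
def class_summary(first_lower, width, n_classes):
    """List limits, boundaries, and midpoints for classes of whole-number data."""
    rows = []
    for i in range(n_classes):
        lower = first_lower + i * width          # consecutive lower limits differ by the width
        upper = lower + width - 1                # assumes whole-number data (gap of 1)
        boundaries = (lower - 0.5, upper + 0.5)  # split the gap of 1 between classes
        midpoint = (lower + upper) / 2
        rows.append((lower, upper, boundaries, midpoint))
    return rows

for row in class_summary(60, 10, 7):
    print(row)
# first row printed: (60, 69, (59.5, 69.5), 64.5)
```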

 

 

Procedure for Constructing a Frequency Distribution

1. Determine the number of classes.

2. Calculate the class width.

class width = (max data value - min data value) / number of classes, rounded up to a convenient number.

3. Choose either the min data value or a convenient value below the min data value as the first lower class limit.

4. Using the first lower class limit and the class width, list the other lower class limits. Do this vertically, then add in the upper class limits.

5. Tally up the data values in each class.
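The five steps can be sketched in Python for whole-number data such as the 40 pulse rates. This is a minimal sketch under stated assumptions: the function name `frequency_distribution` is hypothetical, the width is rounded up to the next whole number (when the division comes out exact, it still goes up by one so the maximum fits in the last class), and each upper class limit sits one unit below the next lower limit.

```python
def frequency_distribution(data, n_classes, first_lower=None):
    """Build a frequency distribution for whole-number data, following steps 1-5."""
    lo, hi = min(data), max(data)
    # Step 2: width = (max - min) / number of classes, rounded UP to a whole number.
    # Assumption: an exact division still goes up by one so the maximum fits.
    width = (hi - lo) // n_classes + 1
    # Step 3: default to the minimum data value as the first lower class limit.
    if first_lower is None:
        first_lower = lo
    # Step 4: lower limits step by the width; each upper limit is one below the next lower limit.
    classes = [(first_lower + i * width, first_lower + (i + 1) * width - 1)
               for i in range(n_classes)]
    # Step 5: tally each value into the class whose limits contain it.
    counts = [0] * n_classes
    for x in data:
        i = (x - first_lower) // width
        if not 0 <= i < n_classes:
            raise ValueError(f"{x} falls outside the classes; choose another first lower limit")
        counts[i] += 1
    return list(zip(classes, counts))

pulses = [76, 64, 72, 80, 88, 76, 60, 76, 72, 76, 68, 80, 80, 104, 64, 88, 68, 60, 68, 76,
          80, 72, 76, 72, 68, 88, 72, 80, 96, 60, 72, 72, 68, 88, 72, 88, 64, 124, 80, 64]
for (lower, upper), count in frequency_distribution(pulses, 7):
    print(f"{lower}-{upper} {count}")
```

Applied to the 40 pulse rates with 7 classes, it gives a width of 10 and the same counts as the hand tally: 12, 14, 11, 1, 1, 0, 1.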

Example 1 Frequency table by hand.

76 64 72 80 88 76 60 76 72 76 68 80 80 104 64 88 68 60 68 76

80 72 76 72 68 88 72 80 96 60 72 72 68 88 72 88 64 124 80 64

1. Let's have 7 classes.

2. Find the width: 124 - 60 = 64, and 64/7 ≈ 9.14, which we round up to 10.

3. Use the min data value, 60, as the first lower class limit.

4. List the lower class limits by adding the width of 10 each time, then add in the upper class limits:

60-69
70-79
80-89
90-99
100-109
110-119
120-129

Tally Ho!

Pulse Rate   Frequency
60-69        12
70-79        14
80-89        11
90-99        1
100-109      1
110-119      0
120-129      1

Relative Frequency

In a relative frequency distribution, each class frequency is replaced with a relative frequency (proportion) or a percentage frequency (percent).

relative frequency = class frequency / sum of all frequencies

percentage frequency = (class frequency / sum of all frequencies) * 100%
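Applying both formulas to the pulse-rate frequencies takes only a few lines of Python. This sketch uses `fractions.Fraction` so each proportion stays exact (11/40 is 0.275, not 0.27); the helper names are hypothetical:

```python
from fractions import Fraction

classes = ["60-69", "70-79", "80-89", "90-99", "100-109", "110-119", "120-129"]
freqs = [12, 14, 11, 1, 1, 0, 1]   # female pulse-rate frequencies from the table

def relative_frequencies(freqs):
    total = sum(freqs)
    # Fraction keeps each proportion exact, e.g. 11/40 rather than a rounded 0.27.
    return [Fraction(f, total) for f in freqs]

def percentage_frequencies(freqs):
    total = sum(freqs)
    return [100 * f / total for f in freqs]

for label, rel, pct in zip(classes, relative_frequencies(freqs), percentage_frequencies(freqs)):
    print(f"{label:>7}  {float(rel)}  {pct}%")
# the 80-89 row shows 0.275 and 27.5%
```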

Change into a relative frequency

Pulse Rate   Relative Frequency
60-69        12/40 = 0.3
70-79        14/40 = 0.35
80-89        11/40 = 0.275
90-99        1/40 = 0.025
100-109      1/40 = 0.025
110-119      0/40 = 0
120-129      1/40 = 0.025


Change into cumulative frequency

Pulse Rate    Cumulative Frequency
69 or less    12
79 or less    12+14 = 26
89 or less    12+14+11 = 37
99 or less    12+14+11+1 = 38
109 or less   12+14+11+1+1 = 39
119 or less   12+14+11+1+1+0 = 39
129 or less   12+14+11+1+1+0+1 = 40
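Each cumulative frequency is a running total of the class frequencies, which Python's standard `itertools.accumulate` computes directly. A minimal sketch using the pulse-rate frequencies:

```python
from itertools import accumulate

upper_limits = [69, 79, 89, 99, 109, 119, 129]
freqs = [12, 14, 11, 1, 1, 0, 1]  # class frequencies for female pulse rates

# accumulate produces running totals: 12, 12+14, 12+14+11, ...
cumulative = list(accumulate(freqs))
rows = [(f"{upper} or less", total) for upper, total in zip(upper_limits, cumulative)]
for label, total in rows:
    print(label, total)
```

The last running total always equals the total number of data values (here 40).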

Frequency Distributions

Last Digit of Female Pulses   Frequency
0                             9
1                             0
2                             8
3                             0
4                             6
5                             0
6                             7
7                             0
8                             10
9                             0
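The last-digit table can be reproduced by taking each pulse rate modulo 10 and counting with the standard `collections.Counter`. A minimal sketch using the 40 pulse rates listed earlier:

```python
from collections import Counter

pulses = [76, 64, 72, 80, 88, 76, 60, 76, 72, 76, 68, 80, 80, 104, 64, 88, 68, 60, 68, 76,
          80, 72, 76, 72, 68, 88, 72, 80, 96, 60, 72, 72, 68, 88, 72, 88, 64, 124, 80, 64]

last_digits = Counter(pulse % 10 for pulse in pulses)
# Counter returns 0 for digits that never occur, so every digit 0-9 gets a row.
table = [(digit, last_digits[digit]) for digit in range(10)]
for digit, count in table:
    print(digit, count)
```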

Frequency Distributions

IQ scores of 1000 randomly selected adults are summarized below. Notice that the frequencies start low, increase, and then decrease.

IQ Score   Frequency
50-69      24
70-89      228
90-109     490
110-129    232
130-149    26

Histograms

A histogram is a graph consisting of bars of equal width drawn adjacent to each other (without gaps). The horizontal scale represents classes of quantitative data values, and the vertical scale represents frequencies. The heights of the bars correspond to the frequency values.

 

[Figure: Female Pulse Rates. Histogram with pulse rate classes 60-69 through 120-129 on the horizontal axis and frequency (0 to 16) on the vertical axis.]
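Without plotting software, a rough text version of the same histogram can be printed, one bar per class with length equal to its frequency. Note the bars run horizontally here, rotated relative to the slide's vertical bars; the function name `text_histogram` is hypothetical:

```python
def text_histogram(labels, freqs, symbol="*"):
    """Return one line per class: the label, then a bar whose length equals the frequency."""
    width = max(len(label) for label in labels)
    return [f"{label:>{width}} | {symbol * count}" for label, count in zip(labels, freqs)]

classes = ["60-69", "70-79", "80-89", "90-99", "100-109", "110-119", "120-129"]
freqs = [12, 14, 11, 1, 1, 0, 1]
for line in text_histogram(classes, freqs):
    print(line)
```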

Relative Frequency Histogram

A relative frequency histogram is the same as a histogram with relative frequencies instead of frequencies.

[Figure: Female Pulse Rates. Relative frequency histogram with pulse rate classes 60-69 through 120-129 on the horizontal axis and relative frequency (0 to 0.4) on the vertical axis.]

Cumulative Histogram

[Figure: Cumulative Frequency Distribution of the Pulse Rates of Females. Classes "69 or less" through "129 or less" on the horizontal axis and cumulative frequency (0 to 45) on the vertical axis.]

Because of their bell shape, these IQ data are said to have an (approximately) normal distribution.

[Figure: IQ Scores. Histogram with IQ score classes 50-69 through 130-149 on the horizontal axis and frequency (0 to 600) on the vertical axis.]

Histograms

[Figure: Weights of Pennies. Histogram with weight-of-penny classes 2.40-2.49 through 3.10-3.19 on the horizontal axis and frequency (0 to 30) on the vertical axis.]