Discrete Data

Posted on 30-Dec-2015

Discrete Data

Distributions and Summary Statistics

Terms: histogram, mode, mean, range, standard deviation, outlier

Discrete vs. Continuous Data

dis·crete adj.

1. Constituting a separate thing. See Synonyms at distinct.

2. Consisting of unconnected distinct parts.

3. Mathematics: Defined for a finite or countable set of values; not continuous.

con·tin·u·ous adj.

1. Uninterrupted in time, sequence, substance, or extent. See Synonyms at continual.

2. Attached together in repeated units: a continuous form fed into a printer.

3. Mathematics: Of or relating to a line or curve that extends without a break or irregularity.

Discrete vs. Continuous Data

discrete

Usually related to counts.

Variable values for different units often tie.

Averaging two values does not necessarily yield another possible value.

continuous

Any value in some interval.

A tie among different units is in theory virtually impossible; in practice, ties (due to rounding) are infrequent.

The average of any two different values is another (and different) possible value.

Distribution

The distribution of a variable tells us what values it takes and how often it takes those values.

MAKE A PICTURE!

For discrete quantitative data, use a relative frequency chart / histogram* to display the distribution.

* Fundamentally these are the same thing.
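As a rough illustration, a relative frequency table for discrete data can be built by counting how often each value occurs and dividing by the number of observations. The data below are made up for this sketch (they are not from the slides), and `relative_frequencies` is a name introduced here.

```python
from collections import Counter


def relative_frequencies(values):
    """Map each distinct value to its share of the observations."""
    counts = Counter(values)
    n = len(values)
    return {value: counts[value] / n for value in sorted(counts)}


# Hypothetical data: number of siblings reported by 10 students.
siblings = [0, 1, 1, 2, 1, 0, 3, 2, 1, 1]
dist = relative_frequencies(siblings)

for value, share in dist.items():
    # One '#' per 10% of the data gives a crude text histogram.
    print(f"{value}: {share:.0%} {'#' * round(share * 10)}")
```

The tallest bar marks the most common value, and the shares always add up to 1.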

Left Skewed Distribution

Right Skewed Distribution

Symmetric Distribution

Outlier

outlier noun

1: something that is situated away from or classed differently from a main or related body

2: a statistical observation that is markedly different in value from the others of the sample

Measures of Center

Median

Half the data are above the median and half are below.

Not well suited to highly discrete data. More later about this.

(Sample) Mean

Sum all the data x, then divide by how many (n)

Denoted \(\bar{x}\) (“x bar”)

Both have the same measurement units as the data.
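A minimal sketch of both measures of center, using the five age guesses that appear in the standard deviation calculation later in these slides. The variable names are chosen here for illustration.

```python
import statistics

# Five age guesses from the large section (used later in the slides).
ages = [43, 48, 42, 44, 47]

mean = sum(ages) / len(ages)       # sum all the data, then divide by n
median = statistics.median(ages)   # middle value once the data are sorted

print(f"mean = {mean:.1f}, median = {median}")
```

Both results are in years, the same units as the guesses themselves.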

Less Important Measures of Center

Midrange

Average the minimum and maximum

For highly skewed data, the midrange is often a value that is quite atypical.

Mode

Most common value: the one with the highest proportion of occurrence

There can be 2 (or more) modes if there are ties in relative frequencies.

Generally found by graphical inspection.

Sometimes not anywhere near any “center.”

Both have the same measurement units as the data.
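The sketch below shows how the midrange and mode might be computed, under the assumption of two small made-up data sets (not from the slides). `midrange` is a helper defined here; `statistics.multimode` returns every value tied for most common.

```python
import statistics


def midrange(data):
    """Average of the minimum and the maximum."""
    return (min(data) + max(data)) / 2


# Hypothetical right-skewed data: one large value drags the midrange upward.
skewed = [0, 0, 0, 1, 1, 2, 3, 9]
# Hypothetical data with a tie for the most common value.
tied = [1, 1, 2, 2, 3]

print("midrange:", midrange(skewed))
print("mean:    ", statistics.mean(skewed))
print("mode(s): ", statistics.multimode(skewed))
print("tied modes:", statistics.multimode(tied))
```

In the skewed data set the midrange lands well above most of the observations, which is the sense in which it can be atypical.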

Measures of spread / variation (these are the same thing)

Range = Max – Min

In statistics, the range is a single number.

Interquartile Range

Better suited to continuous data. More later about this.

Variance / Standard Deviation

All but variance have the same measurement units as the data.

Variance \(S^2\)

Roughly the mean of the squared deviations from the mean (the sum is divided by n – 1 rather than n)

1. Obtain the Mean.

2. Determine, for each value, the deviation from the Mean.

3. Square each of these deviations

4. Sum these squares

5. Divide this sum by one fewer than the number of observations to get the Variance

Measure of squared variation from the mean

Standard Deviation \(S\)

Square root of the Variance

Measure of spread / variation (from the mean)

Same measurement units as the data.
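The five steps for the variance, followed by the square root for the standard deviation, can be sketched as follows. The function name `sample_variance` is introduced here; the data are the five large-section age guesses used in the calculation slides.

```python
import math
import statistics


def sample_variance(data):
    mean = sum(data) / len(data)             # 1. obtain the mean
    deviations = [x - mean for x in data]    # 2. deviation of each value
    squares = [d ** 2 for d in deviations]   # 3. square each deviation
    total = sum(squares)                     # 4. sum the squares
    return total / (len(data) - 1)           # 5. divide by n - 1


ages = [43, 48, 42, 44, 47]
variance = sample_variance(ages)
sd = math.sqrt(variance)

print(f"variance = {variance:.2f}, SD = {sd:.2f}")
# The standard library should agree with the hand-rolled version.
print(f"statistics.stdev = {statistics.stdev(ages):.2f}")
```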

Comparing Means & Standard Deviations

Small: Mean = 41.60 SD = 2.07

Large: Mean = 44.80 SD = 2.59

[Figure: dotplots of Age Guess for the Small Class and the Large Class, axis from 38 to 50]

Comparing Means & Standard Deviations

Mean 44.80 SD 2.59

Add a 40 and a 50…

Mean 44.86 SD 3.58

Comparing Means & Standard Deviations

Mean 44.80 SD 2.59

Add a 42 and a 48…

Mean 44.86 SD 2.73

Comparing Means & Standard Deviations

Mean 44.80 SD 2.59

Add 45 and 45…

Mean 44.86 SD 2.12
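The three additions can be checked with a short script. It assumes the starting data are the five large-section guesses from the calculation slides (43, 48, 42, 44, 47), which have mean 44.80 and SD 2.59; `summarize` is a helper named here.

```python
import statistics

large = [43, 48, 42, 44, 47]


def summarize(data):
    """Return (mean, sample SD), each rounded to two decimals."""
    return round(statistics.mean(data), 2), round(statistics.stdev(data), 2)


print("original:", summarize(large))
for pair in [(40, 50), (42, 48), (45, 45)]:
    mean, sd = summarize(large + list(pair))
    print(f"add {pair[0]} and {pair[1]}: mean = {mean}, SD = {sd}")
```

Each pair averages 45, so the mean moves the same small amount every time. Only the SD responds to how far the new values sit from the mean.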

Comparing Means & Standard Deviations

[Figure: two histograms on a common horizontal axis from 0 to 16]

First histogram: Mean = 4.0, SD = 3.0

Second histogram: Mean = 8.0, SD = 3.0

Comparing Means & Standard Deviations

[Figure: two histograms on a common horizontal axis from 0 to 16]

First histogram: Mean = 8.0, SD = 3.0

Second histogram: Mean = 8.0, SD = 6.0
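A small sketch of the two comparisons: sliding every value by the same amount moves the mean but not the SD, while stretching the values away from the mean changes the SD but not the mean. The three-value data set is hypothetical, chosen only so that the summaries match the numbers above; it is not the data behind the histograms.

```python
import statistics

# Hypothetical data with mean 4 and SD 3.
base = [1, 4, 7]
# Shift every value up by 4: the center moves, the spread does not.
shifted = [x + 4 for x in base]
# Double each shifted value's distance from the mean of 8: the spread doubles.
stretched = [8 + 2 * (x - 8) for x in shifted]

for name, data in [("base", base), ("shifted", shifted), ("stretched", stretched)]:
    print(f"{name:9s} mean = {statistics.mean(data):.1f}  SD = {statistics.stdev(data):.1f}")
```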

Computing Mean & Standard Deviation

Data listed by unit

1. By hand with calculator support (UGH)

2. Using your calculator’s built in statistics functionality

• 60 second quiz: Determine and write down the mean and standard deviation of at most 10 data values in under 1 minute

3. Using Excel

4. Using Minitab

Z = # of St Devs from Mean

“…within Z standard deviations of the mean…”

Determine Z × SD.

Find the values

Mean – Z × SD and Mean + Z × SD

This means:

“…between __________ and ______________.”
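The recipe can be sketched as a small function. The name `within_z` is introduced here, and the example plugs in the large-section summary from these slides (mean 44.80, SD 2.59) with Z = 2 as an arbitrary choice.

```python
def within_z(mean, sd, z):
    """Return the interval 'within z standard deviations of the mean'."""
    margin = z * sd                  # determine Z x SD
    return mean - margin, mean + margin


low, high = within_z(mean=44.80, sd=2.59, z=2)
print(f"between {low:.2f} and {high:.2f}")
```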

Mean & Standard Deviation

Where the data are

In general you’ll find that about

68% of the data fall within 1 standard deviation of the mean

95% fall within 2

nearly all fall within 3

There are exceptions.

These guidelines hold fairly precisely for data that have a bell-shaped (Normal) histogram.
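For an exactly Normal distribution the three shares can be computed rather than estimated. The sketch below uses Python's `statistics.NormalDist` (available in Python 3.8 and later); `share_within` is a helper named here. Real data sets will only approximate these figures.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(z):
    """Share of a Normal distribution within z SDs of its mean."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)


for z in (1, 2, 3):
    print(f"within {z} SD: {share_within(z):.2%}")
```

The share within 3 standard deviations is very close to, but not exactly, 100%.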

Range Rule of Thumb

To guess the standard deviation, take the usual range of data and divide by four.

Most homes for sale in the Oswego City School District are listed at prices between $50,000 and $200,000. What would you guess for the standard deviation of prices?

$50,000 to $200,000

Range about $200,000 – $50,000 = $150,000

Apply the RRoT…

$150,000 / 4 = $37,500
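The rule of thumb is a one-line calculation. The sketch below wraps it in a function, `range_rule_sd`, a name introduced here, and applies it to the home price example.

```python
def range_rule_sd(usual_min, usual_max):
    """Guess the standard deviation as (usual range) / 4."""
    return (usual_max - usual_min) / 4


# Most listings fall between $50,000 and $200,000.
guess = range_rule_sd(50_000, 200_000)
print(f"guessed SD of prices: ${guess:,.0f}")
```

The result is only a guess. It depends on what is taken to be the "usual" minimum and maximum.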

Students are asked to complete a survey online. This assignment is made on a Monday at about noon. The survey closes Wednesday at midnight.

Since each student’s submission is accompanied by a time stamp, it is simple to figure how early, relative to the deadline, each student submitted the work.

For the data set of amount of time early, guess the standard deviation. Give results in both days and hours.

This assignment is made on a Monday at about noon. The survey closes Wednesday at midnight.

That’s 2.5 days, or 60 hours. People will hand it in between immediately (2.5 days / 60 hours early) and at the last minute (0 early). The range is about 2.5 days or 60 hours.

Apply the RRoT…

2.5 / 4 = 0.625 days

60 / 4 = 15 hours

(These are the same: 0.625 days × 24 hours per day = 15 hours.)

Consider GPAs of graduating seniors.

Guess the standard deviation.

GPAs. You can’t graduate under 2.0. All As gives 4.0.

Min about 2.0; Max probably exactly 4.0

Range about 4.0 – 2.0 = 2.0

Apply the RRoT…

2.0 / 4 = 0.5

Example

An instructor asked students in two sections of the same course to guess the instructor’s age. Students in the first class (in a large lecture hall) had no other knowledge of the instructor’s personal life. Students in the second class (in a small classroom) knew that the instructor was the father of a young girl.

Variable

Guess of instructor’s age

Quantitative

Units

The students

Guess of instructor’s age varies from student to student.

Variable

Class (or Which class?)

Categorical

Units

The students

Which class varies from student to student.

This is a fairly symmetric distribution.

Mode = 42

Range = 54 – 32 = 22

[Figure: Dotplot of Age_Large, axis from 28 to 54]

This is a symmetric distribution.

Mode = 42

Mean = 42.0

Symmetry: Typically Mean ≈ Mode (“nearly equal”)

[Figure: Dotplot of Age_Large, axis from 28 to 54]

Mean = 42.0

Mean = 39.0

[Figure: Dotplot of Age_Small and Dotplot of Age_Large, each on an axis from 28 to 54]

Mean = 42.0, St Dev ≈ 22 / 4 = 5.5

Mean = 39.0, St Dev ≈ 22 / 4 = 5.5

[Figure: Dotplot of Age_Large and Dotplot of Age_Small, each on an axis from 28 to 54]

Mean = 40.25 St Dev = 4.33 (guess 4.25)

Mean = 38.15 St Dev = 4.14 (guess 3.75)

[Figure: dotplots for the Large Class and the Small Class, each on an axis from 30 to 45]

Properties: Mean & Standard Deviation

They don’t really “depend” (in the usual sense) on how much data there is. They depend on the relative frequency (percent) of occurrence of each value.

Adding a new unit…

Sometimes the mean will go up; sometimes down. But on average it will stay the same.

Same for standard deviation.
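One way to see this is to copy a data set many times, which changes how much data there is but not the relative frequency of any value. The sketch below replicates the five large-section guesses 20 and 200 times. The replication is purely artificial and is an assumption of this example, not something done in the slides.

```python
import statistics

ages = [43, 48, 42, 44, 47]
smaller = ages * 20     # 100 observations
larger = ages * 200     # 1000 observations, same relative frequencies

for name, data in [("n = 100", smaller), ("n = 1000", larger)]:
    mean = statistics.mean(data)
    sample_sd = statistics.stdev(data)       # divides by n - 1
    population_sd = statistics.pstdev(data)  # divides by n
    print(f"{name}: mean = {mean:.2f}, "
          f"sample SD = {sample_sd:.3f}, population SD = {population_sd:.3f}")
```

The mean and the divide-by-n standard deviation are identical for both sizes. The sample standard deviation differs slightly because of the n – 1 divisor, and that difference shrinks as n grows.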

Standard Deviation Calculation

Standard Deviation Calculation for the Large Section

Age     Mean     Deviation from Mean     Deviation squared
43      44.8     43 – 44.8 = -1.8        (-1.8)² = 3.24
48      44.8     48 – 44.8 = +3.2        3.2² = 10.24
42      44.8     42 – 44.8 = -2.8        (-2.8)² = 7.84
44      44.8     44 – 44.8 = -0.8        (-0.8)² = 0.64
47      44.8     47 – 44.8 = +2.2        2.2² = 4.84
Sums    224      224.0    0              26.80

Mean = 224 / 5 = 44.8

Variance = 26.8 / 4 = 6.7

SD = √6.7 = 2.59

The deviations from the mean sum to 0: ALWAYS, for every data set.

Standard Deviation Calculation

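Sample Mean: \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\)

Sample Standard Deviation: \(S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}\)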

\(\bar{x} = 44.80\)

\(S = 2.59\)