Analyze Data USE MEAN & MEDIAN TO COMPARE THE CENTER OF DATA SETS. IDENTIFY OUTLIERS AND THEIR EFFECT ON DATA SETS.


4: In addition to level 3.0 and above and beyond what was taught in class, the student may:
· Make connections with other concepts in math.
· Make connections with other content areas.

3: The student will summarize, represent, and interpret data on a single count or measurement variable.
- Comparing data includes analyzing center of data (mean/median), interquartile range, shape distribution of a graph, standard deviation and the effect of outliers on the data set.
- Read, interpret and write summaries of two-way frequency tables, which includes calculating joint, marginal and relative frequencies.

2: The student will be able to:
- Make dot plots, histograms, box plots and two-way frequency tables.
- Calculate standard deviation.
- Identify normal distribution of data (bell curve) and convey what it means.

1: With help from the teacher, the student has partial success with summarizing and interpreting data displayed in a dot plot, histogram, box plot or frequency table.

0: Even with help, the student has no success understanding statistical data.

Focus 6 Learning Goal – (HS.S-ID.A.1, HS.S-ID.A.2, HS.S-ID.A.3, HS.S-ID.B.5) = Students will summarize, represent and interpret data on a single count or measurement variable.

Reminder:

To find the median, you put all the numbers in order from least to greatest. The middle number is the median. If there are two middle numbers, the median is halfway between them (add them and divide by 2).

To find the mean, you add up all of the numbers then divide by how many numbers are in the data set.
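The two reminders can be sketched in a few lines of Python. This is only an illustration: the function names `find_mean` and `find_median` and the sample numbers are made up for this example, and the even-count case uses the usual convention of averaging the two middle values.

```python
def find_mean(values):
    """Add up all of the numbers, then divide by how many there are."""
    return sum(values) / len(values)


def find_median(values):
    """Put the numbers in order and take the middle one.

    With an even count there are two middle numbers, so this sketch
    averages them (the usual convention).
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical sample data, not taken from the lesson.
odd_set = [7, 3, 11]
even_set = [7, 3, 11, 5]

print(find_mean(odd_set), find_median(odd_set))
print(find_mean(even_set), find_median(even_set))
```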

Measure of Central Tendency

Mean & median are both measures of central tendency. This means they identify the “middle” of the data.

A measure of central tendency attempts to describe the whole set of data with a single value that represents the middle or center of its distribution.

Median: Advantage: Is less affected by outliers and skewed data. It is the preferred measure of center when the distribution is not symmetrical.

Mean: Advantage: Can be used for both continuous and discrete numeric data.

Limitations: Is influenced by outliers and skewed distribution.

The shape of data distributions.

A normal distribution is mound-shaped and symmetric. As a rule of thumb: if the mean and median are equal, the data is symmetric. If the mean is greater than the median, the data is skewed right. If the mean is less than the median, the data is skewed left.
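The comparison of mean and median can be written as a small Python check. This is a sketch of the rule of thumb only, not a full test of symmetry. The function name `describe_shape`, the `tolerance` parameter, and the sample lists are all assumptions made for this illustration.

```python
from statistics import mean, median


def describe_shape(values, tolerance=0.0):
    """Apply the mean-versus-median rule of thumb.

    `tolerance` is a hypothetical knob for treating nearly equal
    values as equal; the lesson itself compares them exactly.
    """
    m = mean(values)
    med = median(values)
    if abs(m - med) <= tolerance:
        return "symmetric"
    if m > med:
        return "skewed right"
    return "skewed left"


# Hypothetical data sets chosen to show each case.
print(describe_shape([1, 2, 3, 4, 5]))      # mean 3, median 3
print(describe_shape([1, 2, 3, 4, 20]))     # mean 6, median 3
print(describe_shape([1, 17, 18, 19, 20]))  # mean 15, median 18
```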

Test your memory…

The mean of a data set is 12 and the median is 12. What are the possible shapes for this data set? A. Mound B. Symmetric C. Skewed Right D. Skewed Left E. Both A & B

The mean of a data set is 12 and the median is 10. What is the data shape? A. Octagonal B. Symmetric C. Skewed Right D. Skewed Left

Outliers

The shape of the data helps us find and identify outliers.

An outlier is something that sticks out from the rest of the data.

It is a data point that has an “extreme value” when compared with the rest of the data set.

Mathematically speaking, an outlier is defined as any point that falls more than 1.5 times the IQR below the lower quartile (Q1) or more than 1.5 times the IQR above the upper quartile (Q3).

Data: 37, 37, 38, 38, 40, 40, 42, 42, 42, 62

The median is: 40

Q1: 38

Q3: 42

IQR = Q3 – Q1 = 42 – 38 = 4

The box plot looks like this:

[Figure: box plot of the data]

The lower limit on outliers is Q1 – (1.5)(IQR).

38 – (1.5)(4) = 32

This means an outlier would be any number less than 32.

The upper limit on outliers is Q3 + (1.5)(IQR).

42 + (1.5)(4) = 48

This means an outlier would be any number greater than 48.
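The quartile and limit calculations for this data set can be checked with a short Python sketch. It assumes the median-of-halves method, which appears to be what the slides use. Other quartile conventions exist and can give slightly different values. The function names `quartiles` and `outlier_limits` are made up for this example.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    For an odd count, the overall median is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(upper)


def outlier_limits(values):
    """Return the lower and upper limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


data = [37, 37, 38, 38, 40, 40, 42, 42, 42, 62]
q1, q3 = quartiles(data)
low, high = outlier_limits(data)

print("Q1:", q1, "Q3:", q3, "IQR:", q3 - q1)
print("limits:", low, high)
print("outliers:", [x for x in data if x < low or x > high])
```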

Data: 37, 37, 38, 38, 40, 40, 42, 42, 42, 62

The outlier for this data set is 62. It surpasses the cutoff of 48.

When there is an outlier on one side of the data set, we can chop off the “whisker” at the limit and then record the outlier as a separate data point.

The final box plot would look like this.

[Figure: final box plot]

Calculate the mean of the data set.

418 ÷ 10 = 41.8

Calculate the mean of the data set without the outlier.

356 ÷ 9 ≈ 39.6

Removing the outlier changes the mean significantly.

Removing the outlier does not change the median significantly (it stays at 40).
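A quick Python check of the effect of the outlier, using the standard library's `statistics` module. The variable names are chosen for this example only.

```python
from statistics import mean, median

data = [37, 37, 38, 38, 40, 40, 42, 42, 42, 62]

# Drop the single outlier identified above (62).
without_outlier = [x for x in data if x != 62]

mean_all = mean(data)
mean_trimmed = round(mean(without_outlier), 1)

print("mean with outlier:", mean_all)
print("mean without outlier:", mean_trimmed)
print("median with outlier:", median(data))
print("median without outlier:", median(without_outlier))
```

The mean drops by about 2.2 while the median stays at 40, which is why the median is the safer measure of center when outliers are present.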

Going Fishing

A fisherman records the length, in centimeters, of 10 bass caught in a stream:

15 22 19 18 15 45 27 18 18 51

He wants to know the average length of a fish he can catch.

Determine the mean and median of the data.

Mean: 248 ÷ 10 = 24.8 cm

Median: 15 15 18 18 18 19 22 27 45 51, so (18 + 19) ÷ 2 = 18.5 cm

Going Fishing

Are there any outliers? Divide the data into quarters to find the IQR.

15 15 18 18 18 19 22 27 45 51

Q1 = 18, Q3 = 27

IQR = 27 – 18 = 9

The lower limit on outliers is Q1 – (1.5)(IQR).

18 – (1.5)(9) = 4.5

The upper limit on outliers is Q3 + (1.5)(IQR).

27 + (1.5)(9) = 40.5

Any number less than 4.5 or greater than 40.5 is an outlier. 45 and 51 are outliers.

Going Fishing

Remove the outliers and recalculate the mean and median.

15 15 18 18 18 19 22 27

Mean: 152 ÷ 8 = 19 cm

Median: 18 cm

With the outliers removed, the mean is now closer to the center of the data.

The average length of a fish caught in this stream is ________.
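The whole fishing example can be retraced in Python as a check on the arithmetic. This is a sketch under the same median-of-halves assumption for the quartiles. The helper name `find_outliers` is made up for this example, and it expects at least two values.

```python
from statistics import mean, median


def find_outliers(values):
    """Return the values beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR.

    Quartiles are the medians of the lower and upper halves of the
    sorted data (the middle value is skipped when the count is odd).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in ordered if x < low or x > high]


lengths = [15, 22, 19, 18, 15, 45, 27, 18, 18, 51]  # cm, from the slide

outliers = find_outliers(lengths)
kept = [x for x in lengths if x not in outliers]

print("all fish:  mean", mean(lengths), "median", median(lengths))
print("outliers: ", outliers)
print("remaining: mean", mean(kept), "median", median(kept))
```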