Unit 4 Describing Data: Displaying and Exploring...

Chapter 4 Describing Data Displaying and Exploring Data Dr.Manahil Kamal M.Eltib

Transcript of Unit 4 Describing Data: Displaying and Exploring...

Page 1: Unit 4 Describing Data: Displaying and Exploring Datafac.ksu.edu.sa/sites/default/files/chapter_04_0.pdf · 2015-08-30 · Skewness is measured using the Pearson coefficient The coefficient

Chapter 4

Describing Data: Displaying and Exploring Data

Dr. Manahil Kamal M. Eltib


GOALS

Develop and interpret a stem-and-leaf display.

Compute and understand quartiles, deciles, percentiles and the coefficient of skewness.

Construct and interpret box plots.

Draw and interpret a scatter diagram.


STEM-AND-LEAF

A stem-and-leaf display is a statistical technique for presenting a set of data. Each numerical value is divided into two parts: the leading digit(s) become the stem and the trailing digit becomes the leaf. The stems are listed along the vertical axis, and the leaf values are stacked against each other along the horizontal axis.

Advantage of the stem-and-leaf display over a frequency distribution: the identity of each observation is not lost.


Example (1) :

Make a stem-and-leaf plot of the algebra test scores given below, then answer each question.

56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59, 69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73

Solution:

Put the scores in numerical order

56, 59, 64, 65, 69, 70, 71, 73, 76, 77, 78, 80, 82, 82, 85, 86, 91, 91, 92, 92, 95, 98, 99

Since the data range from 56 to 99, the stems range from 5 to 9. To plot the data, make a vertical list of the stems. Each number is assigned to the graph by pairing its units digit, or leaf, with the correct stem. For example, the score 56 is plotted by placing the units digit, 6, to the right of stem 5.
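The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name stem_and_leaf is hypothetical, and it assumes non-negative whole numbers whose leaf is the units digit.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's units digit (leaf) under its leading digits (stem)."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting first keeps the leaves in order
        stems[v // 10].append(v % 10)
    return dict(sorted(stems.items()))

# Algebra test scores from Example (1)
scores = [56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59,
          69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73]

display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each printed row is one stem followed by its leaves, so the length of a row shows how many scores fall in that interval. Stems with no observations are simply omitted in this sketch.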


What was the lowest score on the algebra test? 56

What was the highest score on the algebra test? 99

In which interval did most students score? The 90s: 91 to 99 (7 students)

What is the sample size, judging from the display? 23 students


Example (2) : (Textbook, p. 102)

Listed in the following table is the number of 30-second radio advertising spots purchased by each of the 45 members of the Greater Buffalo Automobile Dealers Association last year. Organize the data into a stem-and-leaf display. Around what values do the numbers of advertising spots tend to cluster? What is the smallest number of spots purchased by a dealer? The largest number purchased?


Example (3) : Use a Stem-and-Leaf Plot to Find the Mean, Median and Mode of a Set of Data


Solution:

Reading the values back from the stem-and-leaf display, the original data are:

35, 36, 37, 38, 40, 40, 41, 42, 43, 55, 55, 55, 56, 57, 58, 59

1. The mean = 747/16 = 46.69

2. The median = (42 + 43)/2 = 42.5

3. The mode = 55
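As a quick check of these three values, the sketch below recomputes them with Python's standard statistics module. The variable names are illustrative only.

```python
from statistics import mean, median, mode

# Values read back from the stem-and-leaf display in Example (3)
data = [35, 36, 37, 38, 40, 40, 41, 42, 43, 55, 55, 55, 56, 57, 58, 59]

data_mean = mean(data)      # sum of the values divided by how many there are
data_median = median(data)  # average of the two middle values (n is even)
data_mode = mode(data)      # the most frequent value

print(round(data_mean, 2), data_median, data_mode)
```

With 16 observations the median is the average of the 8th and 9th sorted values, which is why it is not itself one of the data values.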


QUARTILES, DECILES AND PERCENTILES

Alternative ways of describing the spread of data include determining the location of values that divide a set of observations into equal parts. These measures include quartiles, deciles, and percentiles.


Percentiles

A percentile is a value at or below which the stated percentage of the data values fall. For example, 90% of the data values lie at or below the 90th percentile, whereas 10% of the data values lie at or below the 10th percentile.

Quartiles

Quartiles are values that divide a set of data into four groups containing an approximately equal number of observations. The total of 100% is split into four equal parts, with cut points at 25%, 50% and 75%.

First quartile (lower quartile): the 25th percentile.

Median (second quartile): the 50th percentile.

Third quartile (upper quartile): the 75th percentile.


To find the P-th Percentile:

1. Sort all observations in ascending order (computing percentiles for non-sorted data is the most common mistake).

2. Compute the position L = (P/100) * (n+1), where n is the number of observations.
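The two steps above can be sketched as a small Python function. The function name percentile is hypothetical, and one point is an assumption not stated on the slide: when L is not a whole number, this sketch interpolates linearly between the two neighbouring sorted values. The data are the algebra scores from Example (1).

```python
def percentile(values, p):
    """P-th percentile using the position L = (P/100) * (n + 1)."""
    data = sorted(values)              # step 1: sort in ascending order
    n = len(data)
    pos = p * (n + 1) / 100            # step 2: 1-based position L
    lower = int(pos)                   # whole part of L
    frac = pos - lower                 # fractional part of L
    if lower < 1:                      # L falls before the first value
        return data[0]
    if lower >= n:                     # L falls at or after the last value
        return data[-1]
    # Assumption: linear interpolation between the neighbouring values
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

scores = [56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59,
          69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73]

for p in (25, 50, 75):
    print(p, percentile(scores, p))
```

For these 23 scores the positions of the quartiles are L = 6, 12 and 18, so each quartile is an actual data value and no interpolation is needed.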


Example (4): Consider the following cotinine levels of 40 smokers:

Find the quartiles and the 40th percentile.



Solution:

First note that before we start our computations we must sort the data.


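Lower Quartile:

Location of LQ: L = (25/100) * (40+1) = 10.25

By reference to the data, element No. 10 = 86 and No. 11 = 87, so the lower quartile lies between 86 and 87.

Second Quartile: (Median)

Location of SQ: L = (50/100) * (40+1) = 20.5

By reference to the data, element No. 20 = 167 and No. 21 = 173, so the median lies halfway between them, at 170.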


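Upper Quartile:

Location of UQ: L = (75/100) * (40+1) = 30.75

By reference to the data, element No. 30 = 250 and No. 31 = 253, so the upper quartile lies between 250 and 253.

40th Percentile

Location of 40th Percentile: L = (40/100) * (40+1) = 16.4, so the 40th percentile lies between elements No. 16 and No. 17.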
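The locations in Example (4) can be turned into values once a rule for fractional positions is chosen. The sketch below assumes linear interpolation between the two neighbouring elements, which is a common convention but is not stated in the transcript. The helper names are hypothetical, and only the neighbouring values quoted in the solution are used.

```python
def position(p, n):
    """1-based location L = (P/100) * (n + 1) of the P-th percentile."""
    return p * (n + 1) / 100

def interpolate(pos, lower_value, upper_value):
    """Assumed rule: move the fractional part of L from the lower to the upper neighbour."""
    frac = pos - int(pos)
    return lower_value + frac * (upper_value - lower_value)

n = 40  # number of smokers in Example (4)

# Neighbouring sorted values quoted in the solution: (element int(L), element int(L) + 1)
neighbours = {25: (86, 87), 50: (167, 173), 75: (250, 253)}

quartiles = {}
for p, (lo, hi) in neighbours.items():
    quartiles[p] = interpolate(position(p, n), lo, hi)
    print(p, position(p, n), quartiles[p])
```

The 40th percentile is left out because the values of elements No. 16 and No. 17 do not appear in the transcript.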


BOXPLOT

A box plot is a way of summarizing a set of data measured on an interval scale. It is often used in exploratory data analysis. It is a type of graph that shows the shape of the distribution, its central value, and its variability, using five values: the minimum and maximum, the lower and upper quartiles, and the median.
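The five values that a box plot draws can be computed directly. The sketch below uses statistics.quantiles from the Python standard library (Python 3.8 or later) with method="exclusive", which places the quartiles at positions based on n + 1. The function name five_number_summary is illustrative, and the data are the algebra scores from Example (1) rather than the Example (4) data.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    return (min(values), q1, q2, q3, max(values))

scores = [56, 65, 98, 82, 64, 71, 78, 77, 86, 95, 91, 59,
          69, 70, 80, 92, 76, 82, 85, 91, 92, 99, 73]

summary = five_number_summary(scores)
print(summary)
```

In the plot, the box runs from the lower to the upper quartile with a line at the median, and the whiskers extend to the minimum and maximum.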


Example (5): The following box plot represents the data of Example 4.


SKEWNESS

The first thing you usually notice about a distribution’s shape is whether it has one mode (peak) or more than one. If it’s unimodal (has just one peak), like most data sets, the next thing you notice is whether it’s symmetric or skewed to one side.

If the bulk of the data is at the left and the right tail is longer, we say that the distribution is skewed right or positively skewed; if the peak is toward the right and the left tail is longer, we say that the distribution is skewed left or negatively skewed.


Types of Skewness

Symmetric

Positively skewed

Bimodal

Negatively skewed


COEFFICIENT OF SKEWNESS

Skewness is measured using the Pearson coefficient of skewness: SK = 3 * (mean - median) / s, where s is the standard deviation.

The coefficient of skewness can range from -3 up to 3.

A value near -3 indicates considerable negative skewness.

A value near 3 indicates considerable positive skewness.

A value of 0, which will occur when the mean and median are equal, indicates the distribution is symmetrical and that there is no skewness present.


Example (6)

Calculate the coefficient of skewness of the following data by using Pearson's method:

2, 3, 3, 4, 4, 6, 6

Solution:

The mean = 28/7 = 4

The median = 4

The standard deviation s = sqrt(14/6) = 1.53

SK = 3(4-4)/1.53 = 0

The data show no skewness (symmetric).
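The calculation in Example (6) can be reproduced with a short Python sketch. The function name pearson_skewness is hypothetical, and the sketch assumes s is the sample standard deviation (divisor n - 1), which matches the value 1.53 used in the solution.

```python
from statistics import mean, median, stdev

def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    return 3 * (mean(values) - median(values)) / stdev(values)

example_6 = [2, 3, 3, 4, 4, 6, 6]
print(round(stdev(example_6), 2))   # sample standard deviation used in the solution
print(pearson_skewness(example_6))  # mean equals median, so the coefficient is 0

# Data from Example (3): the mean exceeds the median, so the coefficient is positive
example_3 = [35, 36, 37, 38, 40, 40, 41, 42, 43, 55, 55, 55, 56, 57, 58, 59]
print(pearson_skewness(example_3) > 0)
```

The sign of the coefficient comes entirely from mean minus median: a mean above the median gives positive skewness, a mean below it gives negative skewness.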


DESCRIBING RELATIONSHIP BETWEEN TWO VARIABLES

One graphical technique we use to show the relationship between variables is called a scatter diagram.

To draw a scatter diagram we need two variables. We scale one variable along the horizontal axis (X-axis) of a graph and the other variable along the vertical axis (Y-axis).


Example (7) : The local ice cream shop keeps track of how much ice cream it sells versus the noon temperature on that day. Here are its figures for the last 11 days:

Ice Cream Sales vs Temperature

(X) Temperature °C    (Y) Ice Cream Sales
14.2°                 $215
11.9°                 $185
15.2°                 $332
18.5°                 $406
22.1°                 $522
19.5°                 $412
25.1°                 $416
23.4°                 $544
18.1°                 $421
22.6°                 $445
17.5°                 $408

And here is the same data as a Scatter Plot:
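As a rough stand-in for a drawn scatter diagram, the sketch below places each (temperature, sales) pair from Example (7) on a coarse text grid, with temperature along the horizontal axis and sales along the vertical axis. The function name ascii_scatter and the grid size are arbitrary choices for illustration, and the sketch assumes the x values are not all equal and the y values are not all equal.

```python
temps = [14.2, 11.9, 15.2, 18.5, 22.1, 19.5, 25.1, 23.4, 18.1, 22.6, 17.5]
sales = [215, 185, 332, 406, 522, 412, 416, 544, 421, 445, 408]

def ascii_scatter(xs, ys, width=40, height=12):
    """Return a list of text rows with a '*' at each scaled (x, y) point."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate onto the grid; column 0 is the smallest x
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # flip so larger y appears higher up
    return ["".join(line) for line in grid]

plot = ascii_scatter(temps, sales)
print("\n".join(plot))
```

The points rise from the lower left to the upper right, which suggests that warmer days tend to go with higher sales.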
