Lecture 7 Sections 2.3 – 2.4 Objectives: More Detailed Summary Quantities − Quartiles and IQR...

Lecture 7
Sections 2.3 – 2.4

Objectives:

• More Detailed Summary Quantities
− Quartiles and IQR
− Boxplots
− Quantile Plots

More Detailed Summary Quantities

Percentiles

The median divides a data set into two equal parts. A finer partition is obtained by dividing the data set into more than two parts. For 0 < p < 1, the (100p)th percentile separates the smallest 100p% of the data or distribution from the remaining values.

Quartiles and the Interquartile Range

Certain percentiles are particularly important. The quartiles (first quartile, median, third quartile) separate a data set or distribution into four equal parts:

25th percentile = first quartile or lower quartile, denoted by Q1.
50th percentile = median.
75th percentile = third quartile or upper quartile, denoted by Q3.

Sample quartiles

Separate the n ordered sample observations into a lower half and an upper half. If n is odd, include the median in each half. Then,

Q1 = median of the lower half of the data
Q3 = median of the upper half of the data

Note that there are several different sensible ways to define the sample quartiles. R offers several of these definitions, so its results may differ slightly from those given by the rule above.
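The lecture points to R. As a stand-in, the sketch below uses Python's standard-library `statistics.quantiles`, which offers two interpolation conventions, to show how definitions can disagree on the same data. The sample is the n = 15 data set from the Examples slide; the choice of Python and of these two conventions is an assumption made for illustration, not something taken from the lecture.

```python
from statistics import quantiles

# n = 15 sample from the Examples slide
data = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]

# Two of the quartile conventions available in the standard library
inclusive = quantiles(data, n=4, method="inclusive")
exclusive = quantiles(data, n=4, method="exclusive")

print("inclusive:", inclusive)  # [27.5, 34.0, 47.0]
print("exclusive:", exclusive)  # [27.0, 34.0, 50.0]
```

Both conventions agree on the median but give different values for Q1 and Q3. For this particular sample the "inclusive" result coincides with the halves rule described above.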

Examples

Example. n = 15

20 25 25 27 28 31 33 34 36 37 44 50 59 85 86

Find Q1, Median and Q3.

Example. n=14

20 25 25 27 28 31 33 34 36 37 44 50 59 85

Find Q1, Median and Q3.
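The two exercises above can be checked with a short sketch of the halves rule from the Sample quartiles slide (if n is odd, the median goes into both halves). The function name `halves_quartiles` is hypothetical and chosen only for this illustration.

```python
from statistics import median

def halves_quartiles(data):
    """Return (Q1, median, Q3) using the lower-half / upper-half rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the middle observation belongs to both halves
        lower, upper = xs[:half + 1], xs[half:]
    else:
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)

sample_15 = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]
sample_14 = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85]

print(halves_quartiles(sample_15))  # (27.5, 34, 47.0)
print(halves_quartiles(sample_14))  # (27, 33.5, 44)
```

Dropping the single largest observation (86) moves Q3 from 47 to 44 but leaves the median almost unchanged, which shows how little these summaries depend on the extreme values.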

Population Quartiles

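For a distribution with density f(x), the population quartiles Q1 and Q3 satisfy

$$\int_{-\infty}^{Q_1} f(x)\,dx = 0.25, \qquad \int_{Q_3}^{\infty} f(x)\,dx = 0.25$$

More generally, the (100p)th percentile, written here as $x_p$, satisfies

$$p = \int_{-\infty}^{x_p} f(x)\,dx$$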
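As a worked illustration, take the exponential density $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ with rate $\lambda > 0$. This distribution is chosen here only as an example and does not appear on the slide; $x_p$ is an assumed symbol for the (100p)th percentile. Because the density is zero for $x < 0$, the integral starts at 0:

$$
\begin{aligned}
\int_{0}^{x_p} \lambda e^{-\lambda x}\,dx &= 1 - e^{-\lambda x_p} = p
  \quad\Longrightarrow\quad x_p = -\frac{\ln(1-p)}{\lambda},\\
Q_1 &= x_{0.25} = \frac{\ln(4/3)}{\lambda},\qquad
Q_3 = x_{0.75} = \frac{\ln 4}{\lambda},\qquad
Q_3 - Q_1 = \frac{\ln 3}{\lambda}.
\end{aligned}
$$

The median, $x_{0.5} = \ln 2/\lambda$, lies closer to Q1 than to Q3, as expected for a right-skewed distribution.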

IQR and Outlier Detection

Interquartile range (IQR)

IQR = Q3 - Q1
• Resistant to the effect of outliers.
• Useful for estimating variability when the distribution is skewed.

Determining outliers

Suspected (mild) outlier – an observation is a suspected outlier if it lies more than 1.5 IQR from the closest quartile (i.e., it falls below Q1 - 1.5 IQR or above Q3 + 1.5 IQR).

Highly suspected (extreme) outlier – an observation is an extreme outlier if it lies more than 3 IQR from the nearest quartile (i.e., it falls below Q1 - 3 IQR or above Q3 + 3 IQR).
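The two outlier rules can be sketched as a small function. Here an observation counts as an outlier when it lies below Q1 - k IQR or above Q3 + k IQR, with k = 1.5 for mild and k = 3 for extreme. The function name `classify_outliers` and the toy data are hypothetical; the quartiles passed in (11 and 15) are those of the toy data under the halves rule.

```python
def classify_outliers(data, q1, q3):
    """Split observations into mild and extreme outliers (1.5 IQR and 3 IQR rules)."""
    iqr = q3 - q1
    mild_lo, mild_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    ext_lo, ext_hi = q1 - 3 * iqr, q3 + 3 * iqr
    mild, extreme = [], []
    for x in data:
        if x < ext_lo or x > ext_hi:
            extreme.append(x)      # more than 3 IQR beyond a quartile
        elif x < mild_lo or x > mild_hi:
            mild.append(x)         # between 1.5 IQR and 3 IQR beyond a quartile
    return mild, extreme

# Toy data for illustration only (not from the lecture)
toy = [2, 10, 11, 12, 13, 14, 15, 22, 40]
mild, extreme = classify_outliers(toy, q1=11, q3=15)
print(mild, extreme)  # [2, 22] [40]
```

With IQR = 4 the mild limits are 5 and 21 and the extreme limits are -1 and 27, so 2 and 22 are mild outliers and 40 is extreme.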

Boxplots

A boxplot is a visual display of data based on the following five-number summary:

Min, Q1, Median, Q3, Max

Note: Boxplots are drawn either vertically (bottom to top) or horizontally (left to right). A central box spans Q1 to Q3, and a line in the box marks the median. The upper whisker extends to the largest data value within the upper limit, Q3 + 1.5 IQR, and the lower whisker extends to the smallest data value within the lower limit, Q1 - 1.5 IQR. Observations beyond these limits are outliers and are marked with “o”. When there are no outliers, the whiskers end at Min and Max.

Boxplot Examples

Ultrasound was used to gather the accompanying corrosion data on the thickness of the floor plate of an aboveground tank used to store crude oil (“Statistical Analysis of UT Corrosion Data from Floor Plates of a Crude Oil Aboveground Storage Tank”, Material Eval., 1994: 846-849). Each observation is the largest pit depth in the plate, expressed in milli-in.

40 52 55 60 70 75 85 85 90 90 92 94 94 95 98 100 115 125 125

Find the five-number summary and plot the boxplot.

The effects of partial discharges on the degradation of insulation cavity material have important implications for the lifetimes of high-voltage components. Consider the following sample of n=25 pulse widths from slow discharges in a cylindrical cavity made of polyethylene:

5.3 8.2 13.8 74.1 85.3 88.0 90.2 91.5 92.4 92.9 93.6 94.3 94.8 94.9 95.5 95.8 95.9 96.6 96.7 98.1 99.0 101.4 103.7 106.0 113.5

Find the five-number summary and plot the boxplot.
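The numbers needed to draw both boxplots can be sketched as follows, assuming the halves rule for the quartiles and the 1.5 IQR limits for the whiskers. The function name `boxplot_stats` and the dictionary keys are hypothetical; only the standard library is used, so the drawing itself is left out.

```python
from statistics import median

def boxplot_stats(data):
    """Five-number summary, whisker ends, and outliers for one sample."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half + 1] if n % 2 else xs[:half]   # median in both halves if n is odd
    upper = xs[half:]
    q1, med, q3 = median(lower), median(xs), median(upper)
    iqr = q3 - q1
    lo_limit, hi_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if lo_limit <= x <= hi_limit]
    return {
        "five_number": (xs[0], q1, med, q3, xs[-1]),
        "whiskers": (inside[0], inside[-1]),         # most extreme values within the limits
        "outliers": [x for x in xs if x < lo_limit or x > hi_limit],
    }

corrosion = [40, 52, 55, 60, 70, 75, 85, 85, 90, 90, 92, 94, 94, 95, 98,
             100, 115, 125, 125]
pulse = [5.3, 8.2, 13.8, 74.1, 85.3, 88.0, 90.2, 91.5, 92.4, 92.9, 93.6, 94.3,
         94.8, 94.9, 95.5, 95.8, 95.9, 96.6, 96.7, 98.1, 99.0, 101.4, 103.7,
         106.0, 113.5]

for name, sample in [("corrosion", corrosion), ("pulse widths", pulse)]:
    stats = boxplot_stats(sample)
    print(name, stats["five_number"], stats["whiskers"], stats["outliers"])
```

Under these assumptions the corrosion data have no outliers, so the whiskers run from 40 to 125. For the pulse widths, the whiskers stop at 85.3 and 106.0, and five observations (5.3, 8.2, 13.8, 74.1 and 113.5) fall beyond the 1.5 IQR limits.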

Comparative Boxplots

A comparative boxplot (or side-by-side boxplot) provides a very effective way of revealing similarities and differences between two or more data sets consisting of observations on the same variable.

Example. The article “Compression of Single-Wall Corrugated Shipping Containers Using Fixed and Floating Test Platens” (J. of Testing and Evaluation, 1992: 318-320) describes an experiment in which several different types of boxes were compared with respect to compression strength. Consider the following observations on four different types of boxes:

Type of Box   Compression Strength (lb)
1             655.5  788.3  734.3  721.4  679.1  699.4
2             789.2  772.5  786.9  686.1  732.1  774.8
3             737.1  639.0  696.3  671.7  717.2  727.1
4             535.1  628.7  542.4  559.0  586.9  520.0
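A rough text rendering of the comparative boxplot for these four samples can be sketched with the standard library alone. The helper names `five_number` and `text_boxplot` are hypothetical, the quartiles use the halves rule, and the whiskers are drawn to the minimum and maximum because none of the four samples has a value beyond the 1.5 IQR limits.

```python
from statistics import median

# Compression strength (lb) by box type, from the table above
strength = {
    1: [655.5, 788.3, 734.3, 721.4, 679.1, 699.4],
    2: [789.2, 772.5, 786.9, 686.1, 732.1, 774.8],
    3: [737.1, 639.0, 696.3, 671.7, 717.2, 727.1],
    4: [535.1, 628.7, 542.4, 559.0, 586.9, 520.0],
}

def five_number(data):
    """Min, Q1, median, Q3, max, with the halves rule for the quartiles."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half + 1] if len(xs) % 2 else xs[:half]
    return xs[0], median(lower), median(xs), median(xs[half:]), xs[-1]

def text_boxplot(summary, lo, hi, width=60):
    """Draw one summary on the common scale [lo, hi], e.g. |---====M===---|."""
    def col(value):
        return round((value - lo) / (hi - lo) * (width - 1))
    mn, q1, med, q3, mx = (col(v) for v in summary)
    row = [" "] * width
    row[mn:mx + 1] = "-" * (mx - mn + 1)   # whiskers (no outliers in these samples)
    row[q1:q3 + 1] = "=" * (q3 - q1 + 1)   # box from Q1 to Q3
    row[mn] = row[mx] = "|"                # whisker ends
    row[med] = "M"                         # median
    return "".join(row)

summaries = {t: five_number(v) for t, v in strength.items()}
lo = min(s[0] for s in summaries.values())
hi = max(s[4] for s in summaries.values())
for t, s in summaries.items():
    print(f"Type {t}  {text_boxplot(s, lo, hi)}  median = {s[2]:.2f}")
```

Placing all four rows on one scale makes the main feature easy to see: every type 4 observation lies below every observation of the other three types.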

Quantile Plots

An investigator frequently wishes to know whether the data were drawn from a particular type of population distribution (e.g., a normal distribution). For one thing, many inferential procedures are based on the assumption that the underlying distribution is of a specified type. The use of such procedures is inappropriate if the actual distribution differs greatly from the assumed type. Additionally, understanding the underlying distribution can sometimes give insight into the physical mechanisms involved in generating the data. An effective way to check a distributional assumption is to construct a quantile plot (or probability plot).

Idea: Plot the sample quantiles against the theoretical (population) quantiles of the assumed distribution. If the data come from that distribution, the points in the plot will fall close to a straight line. If the actual distribution is quite different from the one used to construct the plot, the points should depart substantially from a linear pattern.

Normal Quantile Plot

A Normal Quantile Plot is a plot of the (z quantile, sample quantile) pairs.

Example. The accompanying sample consisting of n=20 observations on dielectric breakdown voltage of a piece of epoxy resin appeared in the article “Maximum Likelihood Estimation in the 3-Parameter Weibull Distribution” (IEEE Trans on Dielectrics and Elec. Insul., 1996: 43-55).

24.46 25.61 26.25 26.42 26.66 27.15 27.31 27.54 27.74 27.94 27.98 28.04 28.28 28.49 28.50 28.87 29.11 29.13 29.50 30.88

Is the population distribution of dielectric breakdown voltage normal?
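The (z quantile, sample quantile) pairs for this example can be computed with a short sketch. Two assumptions are made here that the slide does not state: the i-th smallest observation is paired with the standard normal quantile at probability (i - 0.5)/n, which is one common convention, and the correlation of the pairs is used as a rough numerical companion to the visual straight-line check.

```python
from statistics import NormalDist, mean

voltage = [24.46, 25.61, 26.25, 26.42, 26.66, 27.15, 27.31, 27.54, 27.74, 27.94,
           27.98, 28.04, 28.28, 28.49, 28.50, 28.87, 29.11, 29.13, 29.50, 30.88]

xs = sorted(voltage)
n = len(xs)

# Assumed plotting positions: (i - 0.5) / n for i = 1, ..., n
z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
pairs = list(zip(z, xs))   # points of the normal quantile plot

def pearson_r(a, b):
    """Sample correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

r = pearson_r(z, xs)
for zq, xq in pairs:
    print(f"{zq:7.3f}  {xq:6.2f}")
print(f"correlation of the pairs: {r:.3f}")
```

A correlation near 1 is consistent with a roughly linear pattern and hence with a normal population. It does not replace looking at the plot itself, where curvature or isolated points in the tails are easier to spot.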

Review of Concepts