Lecture Notes 2 - Colorado State Universityvollmer/stat307pdfs/LN2_2017.pdfpie charts compare...


Lecture Notes 2: Variables and graphics

Highlights:

• Quantitative vs. qualitative variables

• Continuous vs. discrete and ordinal vs. nominal variables

• Frequency distributions

• Pie charts

• Bar charts

• Histograms and distribution shape

• Box plots


Variable (Data) Types

• Variables can be either qualitative or quantitative.

• Quantitative: Numeric - height, weight, number of customers, blood alcohol level

• Quantitative variables have values that we can do sensible math with. Numbers which do not represent quantities are not quantitative.

• Qualitative: Names or categories - eye color, type of car, political affiliation, breed of dog. Sometimes qualitative variables are also referred to as categorical.

• Non-quantitative numbers are categorical.
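As a small illustration of the point above, the sketch below contrasts a quantitative variable with numbers that only act as labels. The values and the names `heights_in` and `zip_codes` are invented for this example and do not come from the course data.

```python
from collections import Counter
from statistics import mean

# Hypothetical data, invented for illustration only.
heights_in = [65, 67, 66, 69]              # quantitative: heights in inches
zip_codes = [80521, 80523, 80521, 80525]   # numeric labels, not quantities

# Arithmetic on a quantitative variable gives a meaningful summary.
print("Average height:", mean(heights_in))

# Averaging zip codes would run without error, but the result would describe
# nothing. Counting how often each label occurs is the sensible summary.
print("Most common zip code:", Counter(zip_codes).most_common(1)[0])
```

The zip codes are treated exactly like eye colors or car types: they are tallied, not added or averaged.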


Variable Levels

• Quantitative variables come in two levels, continuous and discrete:

• Quantitative Continuous: Numeric variables which can be given to an arbitrary number of decimal places. Typically, continuous variables are measured. Examples:


Variable Levels

• Quantitative Discrete: Numeric variables where only integer responses make sense. Typically, discrete variables are counted. Examples:

• Note that when a continuous variable is rounded to the nearest integer, it is still considered continuous.

• For instance, rounding temperature to the nearest integer is a common thing to do, but temperature is still considered continuous.


Variable Levels

• Qualitative variables also come in two levels, ordinal and nominal.

• Qualitative Ordinal: These are qualitative variables that are typically placed in a set order.

• If placing the values of a qualitative variable “out of order” would be confusing, then it should probably be treated as ordinal. Examples:


Variable Levels

• Qualitative Nominal: These are qualitative variables in which order does not matter. Most qualitative variables are nominal. Examples:


Data Graphics

• We will look at pie charts, bar charts, histograms, and box plots.

• All of the graphs we will look at show frequency distributions of data. Often this is shortened to just distribution.

• A distribution tells you the values a variable takes on, and the frequency with which those values are taken on.

• So, if we are interested in the distribution of blood types from a bank of donors, I could first show you the data like this…


B B B B B B A B O O A AB B O B AB A B B B B AB B AB AB O B O AB AB A AB A AB AB O O AB O B AB A O A B B A B A AB B AB A O A B AB A AB B B B AB B B B O A B A B A A B A A AB

Blood Types from a Group of 77 Donors

(This is raw data, not a distribution)


…or like this:

Blood Type     A     AB    B     O
# of Donors    18    18    30    11

This is a frequency distribution, because it tells you the different values that the variable “Blood Type” takes on, as well as how often it takes each value on.
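Going from the raw data to the frequency distribution is just a tally. The sketch below is one way to do it in Python using the 77 donor blood types listed earlier; the variable names are chosen here for illustration.

```python
from collections import Counter

# The 77 raw blood types from the donor group, in the order they were listed.
raw = ("B B B B B B A B O O A AB B O B AB A B B B B AB B AB AB O B O AB AB "
       "A AB A AB AB O O AB O B AB A O A B B A B A AB B AB A O A B AB A AB "
       "B B B AB B B B O A B A B A A B A A AB")

# Tally how often each value occurs: this is the frequency distribution.
counts = Counter(raw.split())

for blood_type in sorted(counts):
    print(f"{blood_type:<3}{counts[blood_type]:>4}")
print(f"Total {sum(counts.values())}")
```

The tally reproduces the counts in the table: the values the variable takes on, and how often each one occurs.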


I could show it to you like this:

This is also a frequency distribution. Does the visual aspect help give meaning to the distribution?


Relative Frequency Distribution

• Sometimes it is useful to show relative frequency rather than just frequency.

• Relative frequency shows the different values a variable takes on, and how often it takes each value on as a proportion of the total.

• Proportions are often denoted as “p”, and given as:

\[ \text{relative freq.} = p = \frac{\#\text{ of observations of interest}}{\text{total } \#\text{ of observations}} \]


Relative Frequency Distribution

Relative frequency example:

Blood Type               A     AB    B     O     Total
# of Donors              18    18    30    11
Relative frequency (p)
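One way to fill in the empty row is to divide each count by the total number of donors. The following is a minimal sketch using the counts from the table; the names `donors` and `rel_freq` are chosen here for illustration.

```python
# Frequency distribution of blood types for the donor group.
donors = {"A": 18, "AB": 18, "B": 30, "O": 11}

# Total number of observations (the denominator of p).
total = sum(donors.values())

# Relative frequency: count for each blood type divided by the total.
rel_freq = {blood_type: n / total for blood_type, n in donors.items()}

for blood_type, p in rel_freq.items():
    print(f"{blood_type:<6}{donors[blood_type]:>4}   p = {p:.3f}")
print(f"{'Total':<6}{total:>4}   p = {sum(rel_freq.values()):.3f}")
```

The relative frequencies always add up to 1 (apart from rounding), which is a quick check on the arithmetic.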


Pie Charts

[Figure: pie chart “ST301 Student Attitudes”, survey results for students' attitudes towards statistics. Slices: Hate stats (20), Like stats (43), Open mind (198)]

• Pie charts can be used to summarize one qualitative variable

• Pie slices represent the proportion of observations in a class

• Sometimes frequency results are also included

• The more categories you have, the more difficult the pie chart will be to read.
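To make the link between proportions and slices concrete, the sketch below converts the ST301 survey counts into proportions and slice angles, assuming the usual convention that a slice's angle is its proportion of 360 degrees. The function name `slice_angles` is made up for this example.

```python
# Survey counts from the ST301 Student Attitudes pie chart.
attitudes = {"Hate stats": 20, "Like stats": 43, "Open mind": 198}

def slice_angles(counts):
    """Return the angle in degrees of each pie slice (proportion * 360)."""
    total = sum(counts.values())
    return {label: 360 * n / total for label, n in counts.items()}

total = sum(attitudes.values())
angles = slice_angles(attitudes)

for label, n in attitudes.items():
    print(f"{label:<11} p = {n / total:.3f}   angle = {angles[label]:.1f} degrees")
```

With only three categories the slices are easy to tell apart; with many small categories the angles become too similar to compare by eye.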


Two pie charts

[Figure: two pie charts of attitudes toward the appropriate punishment for murder. Females: Death (11), Life in prison (14), No opinion (2), Neither (1), Depends (1). Males: Death (22), Life in prison (3), No opinion (1), Neither (2), Depends (3)]

• Multiple pie charts can be used to compare two different groups. Here, the pie charts compare attitudes of females and males toward the appropriate punishment for murder.

• Often it is tough to make direct comparisons using pie charts.
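Because the two groups are not the same size, one way to compare them directly is to put the relative frequencies side by side. The sketch below does this with the counts from the two pie charts; the helper name `proportions` is an assumption made for this example.

```python
# Counts read from the two pie charts.
females = {"Death": 11, "Life in prison": 14, "No opinion": 2,
           "Neither": 1, "Depends": 1}
males = {"Death": 22, "Life in prison": 3, "No opinion": 1,
         "Neither": 2, "Depends": 3}

def proportions(counts):
    """Convert a frequency distribution into relative frequencies."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

p_female = proportions(females)
p_male = proportions(males)

print(f"{'Response':<16}{'Females':>9}{'Males':>9}")
for label in females:
    print(f"{label:<16}{p_female[label]:>9.3f}{p_male[label]:>9.3f}")
```

Reading across a row gives the comparison that is hard to make by eye between two separate pies.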


Bar charts

• Bar charts can be used wherever pie charts are used.

• Like pie charts, they are used to show the distribution of a qualitative variable.

• Each bar in a bar chart shows you the frequency (or count) for the group it is associated with.

[Figure: bar chart of Number Caught (y-axis, 0 to 50) by Species (x-axis): Brown, Brook, Rainbow, Cutthroat, Lrg. Mth., Small Mth., Walleye, Salmon, Sunfish, Bluegill, Perch]

The chart above shows the frequency of catches for different species of fish.


Bar charts vs. pie charts

Here is the blood type distribution bar chart from before, alongside a pie chart of the same data.

• Bar charts make comparing categories easier

• For example, it isn’t immediately obvious that the “A” slice is the same size as the “AB” slice. But it is obvious that the “A” bar is the same height as the “AB” bar.
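A rough text version of a bar chart shows the same effect. The sketch below draws one row of `#` characters per blood type, using the donor counts from the earlier frequency table; `text_bar_chart` is a hypothetical helper written for this example, not a library function.

```python
# Donor counts from the blood type frequency table.
donors = {"A": 18, "AB": 18, "B": 30, "O": 11}

def text_bar_chart(counts):
    """Return a horizontal bar chart with one '#' per observation."""
    lines = []
    for label, n in counts.items():
        lines.append(f"{label:<3}| {'#' * n} {n}")
    return "\n".join(lines)

print(text_bar_chart(donors))
```

The bars for A and AB end at exactly the same position, which is the comparison that is hard to make from two pie slices.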


The Histogram

[Figure: “Histogram of Stat311 Heights (Inches)”; x-axis: height in inches (ticks 60 to 70), y-axis: Frequency (ticks 0 to 15)]


The Histogram

• A histogram displays the distribution of a quantitative variable.

• The difference between a histogram and a bar chart is that bar charts are for qualitative data and histograms are for quantitative data.

• With bar charts, each bar represents a different distinct group. With histograms, each bar represents the number of observations which fall into an interval, also known as a bin.


The Histogram

• Note that the number of bins on a histogram is arbitrary. The larger the bin size, the fewer bins there will be.

• Changing the number of bins can produce different looking histograms, even if the underlying data is exactly the same.

• The following four histograms represent the exact same data:


[Figure: four histograms of the same data drawn with different numbers of bins. Each panel has Frequency on the y-axis and an x-axis running from 0 to 60; the y-axes top out at 40, 80, 60, and 14 respectively]

Note: If an observation falls directly on a bin endpoint, it is typical to place that observation in the bar to the left of its value. But this is not a hard and fast rule.


Histogram example

Let’s briefly construct two different histograms that represent the same simple dataset below:

Heights of 10 randomly selected statistics students

65 67 66 69 69 66 64 64 63 72
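One possible construction is sketched below. The bin choices (width 2 starting at 62, and width 5 starting at 60) are assumptions made for this example, not the bins used in lecture. An observation that falls exactly on a bin endpoint is placed in the bin to its left, so each bin has the form (low, high].

```python
import math

# Heights (inches) of the 10 statistics students.
heights = [65, 67, 66, 69, 69, 66, 64, 64, 63, 72]

def bin_counts(values, start, width, n_bins):
    """Count values in the bins (start, start+width], (start+width, start+2*width], ...

    A value exactly on an endpoint goes into the bin to its left.
    Values outside the covered range (including start itself) are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        idx = math.ceil((v - start) / width) - 1
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts

def show(values, start, width, n_bins):
    """Print a text histogram, one row per bin."""
    for i, c in enumerate(bin_counts(values, start, width, n_bins)):
        low = start + i * width
        print(f"({low}, {low + width}]  {'#' * c}")
    print()

narrow = bin_counts(heights, start=62, width=2, n_bins=5)
wide = bin_counts(heights, start=60, width=5, n_bins=3)

show(heights, start=62, width=2, n_bins=5)
show(heights, start=60, width=5, n_bins=3)
```

Both histograms account for all 10 students, but the narrow bins show more detail while the wide bins give a smoother, coarser picture of the same data.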


Distribution Shape

• Looking at a histogram allows us to discern a distribution’s shape.

• When there are lots of low values and just a few high values the distribution is said to be skewed to the right, or positively skewed.


Distribution Shape

• When there are lots of high values and just a few low values the distribution is said to be skewed to the left, or negatively skewed.

• The skewness of this histogram is less dramatic than that of the previous histogram.


Distribution Shape

• When the two halves of the histogram look approximately like mirror images the distribution is said to be (almost) symmetrical.

• We say “almost” symmetrical because it is unlikely that a histogram of data will be perfectly symmetrical.


Distribution Shape

• When there are two peaks in a histogram, we say that the data is bimodal.

• The mode is the most common value in a distribution.

• Bimodality may indicate that there are two distinct groups being combined into one dataset.
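Finding a mode amounts to looking for the value with the largest frequency. As a minimal sketch, the code below does this for the blood type frequency distribution from earlier in these notes, where the most common value is easy to check by eye.

```python
from collections import Counter

# Frequency distribution of blood types for the 77 donors.
donors = Counter({"A": 18, "AB": 18, "B": 30, "O": 11})

# most_common(1) returns a list holding the (value, count) pair
# with the largest count.
mode, count = donors.most_common(1)[0]

print(f"Mode: {mode} (observed {count} times)")
```

For a histogram the same idea applies to bins: a peak is a bin that is taller than its neighbors, and a bimodal histogram has two such peaks.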

[Figure: “Histogram of Heights (Inches)”; x-axis: height in inches (ticks 60 to 75), y-axis: Frequency (ticks 0 to 25)]


Boxplots

• Like histograms, boxplots are used to display the distribution of a quantitative variable.

• The shape of the distribution as well as the presence of any possible outliers is easily discerned from the boxplot. These outliers are drawn as dots.

• Boxplots are also useful for comparing multiple groups of data side by side.


Boxplot graphics

• The boxplot is sometimes called the box and whiskers plot.

• In a boxplot, half the data lies above the thick black line and half lies below it.

• Also, half the data lies inside the box, and half lies outside

• The dots are outliers.

We will discuss boxplots in detail in the next set of notes


Graphic Summaries

• Graphs are tools that allow us to give meaning to a set of data. They should give the reader a better understanding of what is going on than can be achieved by just looking at the raw data.

• “A picture is worth a thousand words.” In the case of statistics, a picture is also worth a whole lot of numbers.

• In the next set of notes, we will discuss some common statistics that can be used to summarize a set of data.