1 Probability and Statistics What is probability? What is statistics?
1-1
What is Statistics?
GOALS
When you have completed this Part, you will be able to:
ONE Understand why we study statistics.
TWO Explain what is meant by descriptive statistics and inferential statistics.
THREE Distinguish between a qualitative variable and a quantitative variable.
FOUR Distinguish between a discrete variable and a continuous variable.
FIVE Distinguish among the nominal, ordinal, interval, and ratio levels of measurement.
SIX Define the terms mutually exclusive and exhaustive.
1-2
What is Meant by Statistics?
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to assist in making more effective decisions.
1-3
Who Uses Statistics?
Statistical techniques are used extensively by marketing, accounting, quality control, consumers, professional sports people, hospital administrators, educators, politicians, physicians, and many others.
1-4
Types of Statistics
Descriptive Statistics: Methods of organizing, summarizing, and presenting data in an informative way.
EXAMPLE 1: A Gallup poll found that 49% of the people in a survey knew the name of the first book of the Bible. The statistic 49 describes the number out of every 100 persons who knew the answer.
EXAMPLE 2: According to Consumer Reports, General Electric washing machine owners reported 9 problems per 100 machines during 2001. The statistic 9 describes the number of problems out of every 100 machines.
1-5
Types of Statistics
A Population is a collection of all possible individuals, objects, or measurements of interest.
A Sample is a portion, or part, of the population of interest.
Inferential Statistics: A decision, estimate, prediction, or generalization about a population, based on a sample.
1-6
Types of Statistics (examples of inferential statistics)
Example 1: TV networks constantly monitor the popularity of their programs by hiring Nielsen and other organizations to sample the preferences of TV viewers.
Example 2: Wine tasters sip a few drops of wine to make a decision with respect to all the wine waiting to be released for sale.
Example 3: The accounting department of a large firm will select a sample of the invoices to check the accuracy of all the invoices of the company.
1-7
Types of Variables
For a Qualitative or Attribute Variable, the characteristic being studied is nonnumeric.
Examples: gender, eye color, type of car, state of birth.
1-8
Types of Variables
In a Quantitative Variable, information is reported numerically.
Examples: number of children in a family, balance in your checking account, minutes remaining in class.
1-9
Types of Variables
Quantitative variables can be classified as either Discrete or Continuous.
Discrete Variables can only assume certain values, and there are usually "gaps" between values.
Example: the number of bedrooms in a house, or the number of hammers sold at the local Home Depot (1, 2, 3, ..., etc.).
1-10
Types of Variables
A Continuous Variable can assume any value within a specified range.
Examples: the height of students in a class, the pressure in a tire, the weight of a pork chop.
1-11
Summary of Types of Variables
DATA
  Qualitative or attribute (type of car owned)
  Quantitative or numerical
    discrete (number of children)
    continuous (time taken for an exam)
1-12
Levels of Measurement
There are four levels of data: Nominal, Ordinal, Interval, and Ratio.
1-13
Nominal data
Nominal level: Data that are classified into categories and cannot be arranged in any particular order.
Examples: gender, eye color.
1-14
Levels of Measurement
Nominal level variables must be:
Mutually exclusive: An individual, object, or measurement is included in only one category.
Exhaustive: Each individual, object, or measurement must appear in one of the categories.
1-15
Levels of Measurement
Ordinal level: involves data arranged in some order, but the differences between data values cannot be determined or are meaningless.
Example: During a taste test of 4 soft drinks, Coca Cola was ranked number 1, Dr. Pepper number 2, Pepsi number 3, and Root Beer number 4.
1-16
Levels of Measurement
Interval level: Similar to the ordinal level, with the additional property that meaningful amounts of differences between data values can be determined. There is no natural zero point.
Example: temperature on the Fahrenheit scale.
1-17
Levels of Measurement
Ratio level: the interval level with an inherent zero starting point. Differences and ratios are meaningful for this level of measurement.
Examples: monthly income of surgeons, miles traveled by a sales representative in a month.
1-18
Describing Data: Frequency Distributions and Graphic Presentation
GOALS
When you have completed this Part, you will be able to:
ONE Organize data into a frequency distribution.
TWO Portray a frequency distribution in a histogram, frequency polygon, and cumulative frequency polygon.
THREE Present data using such graphic techniques as line charts, bar charts, and pie charts.
1-19
Frequency Distribution
A Frequency Distribution is a grouping of data into mutually exclusive categories showing the number of observations in each class.
1-20
Constructing a frequency distribution
Constructing a frequency distribution involves:
Determining the question to be addressed
Collecting raw data
Organizing data (frequency distribution)
Presenting data (graph)
Drawing conclusions
[Figure: sample histogram. Horizontal axis marked 1.5 to 13.5 in steps of 2; vertical axis marked 5 to 20.]
1-26
Definitions
Class Midpoint: A point that divides a class into two equal parts. This is the average of the upper and lower class limits.
Class Frequency: The number of observations in each class.
Class Interval: The class interval is obtained by subtracting the lower limit of a class from the lower limit of the next class. The class intervals should be equal.
1-27
EXAMPLE 1
Dr. Tillman is Dean of the School of Business at Socastee University. He wishes to prepare a report showing the number of hours per week students spend studying. He selects a random sample of 30 students and determines the number of hours each student studied last week.
15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6.
Organize the data into a frequency distribution.
1-28
Example 1 continued
Step One: Decide on the number of classes using the formula
$2^k > n$
where k = number of classes and n = number of observations.
- There are 30 observations, so n = 30.
- Two raised to the fifth power is 32, which exceeds 30.
- Therefore, we should have at least 5 classes, i.e., k = 5.
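The rule in Step One can be checked with a short Python sketch. The function name `min_classes` is hypothetical, chosen here only for illustration; it simply searches for the smallest k with 2^k > n.

```python
def min_classes(n: int) -> int:
    """Return the smallest k such that 2**k > n (the '2 to the k' rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

# Dr. Tillman's sample has 30 observations.
k = min_classes(30)
print(k)  # 5, because 2**4 = 16 <= 30 and 2**5 = 32 > 30
```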
1-29
Example 1 continued
Step Two: Determine the class interval or width using the formula
$i \ge \frac{H - L}{k} = \frac{33.8 - 10.3}{5} = 4.7$
where H = highest value, L = lowest value, and k = number of classes.
Round up for an interval of 5 hours.
Set the lower limit of the first class at 7.5 hours, giving a total of 6 classes.
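As a rough illustration of Step Two, the following Python sketch computes the raw width (H - L)/k and rounds it up to a whole number of hours. The helper name `class_width` is an assumption made for this example, and rounding up to the next integer is only one convenient choice of "round" width.

```python
import math

def class_width(high: float, low: float, k: int):
    """Return the raw width (H - L) / k and the width rounded up to an integer."""
    raw = (high - low) / k
    return raw, math.ceil(raw)

raw, width = class_width(33.8, 10.3, 5)
print(round(raw, 1), width)  # raw width is about 4.7; rounded up to 5 hours
```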
1-30
Example 1 continued
Step Three: Set the individual class limits.
Steps Four and Five: Tally and count the number of items in each class.
Hours studying      Frequency, f
 7.5 up to 12.5      1
12.5 up to 17.5     12
17.5 up to 22.5     10
22.5 up to 27.5      5
27.5 up to 32.5      1
32.5 up to 37.5      1
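The tally in Steps Three to Five can be reproduced with a minimal Python sketch. It assumes each class includes its lower limit and excludes its upper limit ("7.5 up to 12.5"); the function name `tally` is hypothetical.

```python
hours = [15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7,
         17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9,
         10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6]

def tally(data, lower, width, k):
    """Count observations per class; class j covers [lower + j*width, lower + (j+1)*width)."""
    counts = [0] * k
    for x in data:
        idx = int((x - lower) // width)
        if not 0 <= idx < k:
            raise ValueError(f"{x} falls outside the class limits")
        counts[idx] += 1
    return counts

counts = tally(hours, lower=7.5, width=5, k=6)
for j, f in enumerate(counts):
    lo = 7.5 + 5 * j
    print(f"{lo:4.1f} up to {lo + 5:4.1f}  {f}")
```

The printed frequencies match the table: 1, 12, 10, 5, 1, 1.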
1-31
Example 1 continued
Class Midpoint: to find the midpoint of each interval, use the following formula:
$\text{Midpoint} = \frac{\text{Upper limit} + \text{Lower limit}}{2}$
Hours studying      Midpoint                  f
 7.5 up to 12.5     (12.5 + 7.5)/2 = 10.0      1
12.5 up to 17.5     (17.5 + 12.5)/2 = 15.0    12
17.5 up to 22.5     (22.5 + 17.5)/2 = 20.0    10
22.5 up to 27.5     (27.5 + 22.5)/2 = 25.0     5
27.5 up to 32.5     (32.5 + 27.5)/2 = 30.0     1
32.5 up to 37.5     (37.5 + 32.5)/2 = 35.0     1
1-32
Example 1 continued
A Relative Frequency Distribution shows the proportion (or percent) of observations in each class.
Hours               f     Relative Frequency
 7.5 up to 12.5      1     1/30 = .0333
12.5 up to 17.5     12    12/30 = .4000
17.5 up to 22.5     10    10/30 = .3333
22.5 up to 27.5      5     5/30 = .1667
27.5 up to 32.5      1     1/30 = .0333
32.5 up to 37.5      1     1/30 = .0333
TOTAL               30    30/30 = 1
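A small Python sketch of the relative frequency calculation, using the class frequencies from the table. The variable names are illustrative only.

```python
freqs = [1, 12, 10, 5, 1, 1]        # class frequencies from Example 1
n = sum(freqs)                      # 30 observations in total

# Each relative frequency is the class frequency divided by n.
rel_freqs = [f / n for f in freqs]

for f, r in zip(freqs, rel_freqs):
    print(f"{f:2d}/{n} = {r:.4f}")
```

The relative frequencies sum to 1, which is a quick check that no observation was missed or counted twice.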
1-33
Graphic Presentation of a Frequency Distribution
The three commonly used graphic forms are Histograms, Frequency Polygons, and a Cumulative Frequency distribution.
A Histogram is a graph in which the class midpoints or limits are marked on the horizontal axis and the class frequencies on the vertical axis.
The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent to each other.
1-34
[Figure: Histogram for Hours Spent Studying. Horizontal axis: hours spent studying (class midpoints 10 to 35); vertical axis: frequency (0 to 14).]
1-35
Graphic Presentation of a Frequency Distribution
A Frequency Polygon consists of line segments connecting the points formed by the class midpoint and the class frequency.
1-36
[Figure: Frequency Polygon for Hours Spent Studying. Horizontal axis: hours spent studying (class midpoints 10 to 35); vertical axis: frequency (0 to 14).]
1-37
Cumulative Frequency Distribution
A Cumulative Frequency Distribution is used to determine how many or what proportion of the data values are below or above a certain value.
To create a cumulative frequency polygon, scale the upper limit of each class along the X-axis and the corresponding cumulative frequencies along the Y-axis.
1-38
Cumulative Frequency Table for Hours Spent Studying
Hours studying      Upper limit    f     Cumulative frequency
 7.5 up to 12.5     12.5            1     1
12.5 up to 17.5     17.5           12    13 (1 + 12)
17.5 up to 22.5     22.5           10    23 (13 + 10)
22.5 up to 27.5     27.5            5    28 (23 + 5)
27.5 up to 32.5     32.5            1    29 (28 + 1)
32.5 up to 37.5     37.5            1    30 (29 + 1)
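The running totals in the cumulative frequency table can be sketched in Python with the standard library's `itertools.accumulate`. The variable names are illustrative.

```python
from itertools import accumulate

upper_limits = [12.5, 17.5, 22.5, 27.5, 32.5, 37.5]
freqs = [1, 12, 10, 5, 1, 1]

# accumulate() yields the running sum of the class frequencies.
cum_freqs = list(accumulate(freqs))

for upper, cf in zip(upper_limits, cum_freqs):
    print(f"fewer than {upper} hours: {cf} students")
```

Each (upper limit, cumulative frequency) pair is one point of the cumulative frequency polygon.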
1-39
[Figure: Cumulative Frequency Distribution for Hours Studying. Horizontal axis: hours spent studying (10 to 35); vertical axis: cumulative frequency (0 to 35).]
1-40
Line Graphs
Line graphs are typically used to show the change or trend in a variable over time.
Year    Males   Females
1992    30.5    32.9
1993    30.8    33.2
1994    31.1    33.5
1995    31.4    33.8
1996    31.6    34.0
1997    31.9    34.3
1998    32.2    34.6
1999    32.5    34.9
2000    32.8    35.2
2001    33.2    35.5
2002    33.5    35.8
1-41
Example 3 continued
[Figure: line graph of U.S. median age by gender. Vertical axis: median age (25 to 40); two lines: Males and Females.]
1-42
Bar Chart
A Bar Chart can be used to depict any of the levels of measurement (nominal, ordinal, interval, or ratio).
Construct a bar chart for the number of unemployed per 100,000 population for selected cities during 2001.
City                 Number of unemployed per 100,000 population
Atlanta, GA          7300
Boston, MA           5400
Chicago, IL          6700
Los Angeles, CA      8900
New York, NY         8200
Washington, D.C.     8900
1-43
[Figure: Bar Chart for the Unemployment Data. Horizontal axis: cities (Atlanta, Boston, Chicago, Los Angeles, New York, Washington); vertical axis: # unemployed/100,000 (0 to 10,000). Bar values: 7300, 5400, 6700, 8900, 8200, 8900.]
1-44
Pie Chart
A Pie Chart is useful for displaying a relative frequency distribution. A circle is divided proportionally to the relative frequency, and portions of the circle are allocated for the different groups.
A sample of 200 runners were asked to indicate their favorite type of running shoe. Draw a pie chart based on the following information.
Type of shoe    # of runners    % of total
Nike            92              46.0
Adidas          49              24.5
Reebok          37              18.5
Asics           13               6.5
Other            9               4.5
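To see how the circle is divided, the sketch below converts the runner counts into percentages and slice angles. The 360-degree conversion is standard arithmetic rather than something stated on the slide, and the variable names are illustrative.

```python
shoe_counts = {"Nike": 92, "Adidas": 49, "Reebok": 37, "Asics": 13, "Other": 9}
total = sum(shoe_counts.values())   # 200 runners

# Percent of total and the corresponding slice angle in degrees.
percents = {shoe: 100 * c / total for shoe, c in shoe_counts.items()}
angles = {shoe: 360 * c / total for shoe, c in shoe_counts.items()}

for shoe in shoe_counts:
    print(f"{shoe:7s} {percents[shoe]:5.1f}%  {angles[shoe]:6.1f} degrees")
```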
1-45
[Figure: Pie Chart for Running Shoes. Slices: Nike 46%, Adidas 24.5%, Reebok 18.5%, Asics 6.5%, Other 4.5%.]
1-46
Describing Data: Numerical Measures
GOALS
When you have completed this Part, you will be able to:
ONE Calculate the arithmetic mean, median, mode, weighted mean, and the geometric mean.
TWO Explain the characteristics, uses, advantages, and disadvantages of each measure of location.
THREE Identify the position of the arithmetic mean, median, and mode for both a symmetrical and a skewed distribution.
1-47
Describing Data: Numerical Measures (Goals, continued)
FOUR Compute and interpret the range, the mean deviation, the variance, and the standard deviation of ungrouped data.
FIVE Explain the characteristics, uses, advantages, and disadvantages of each measure of dispersion.
SIX Understand Chebyshev's theorem and the Empirical Rule as they relate to a set of observations.
1-48
Characteristics of the Mean
The Arithmetic Mean is the most widely used measure of location and shows the central value of the data.
The major characteristics of the mean are:
It is calculated by summing the values and dividing by the number of values.
It requires at least the interval scale.
All values are used.
It is unique.
The sum of the deviations from the mean is 0.
1-49
Population Mean
For ungrouped data, the Population Mean is the sum of all the population values divided by the total number of population values:
$\mu = \frac{\sum X}{N}$
where µ is the population mean, N is the total number of observations, X is a particular value, and Σ indicates the operation of adding.
1-50
Example 1
A Parameter is a measurable characteristic of a population.
The Kiers family owns four cars. The following is the current mileage on each of the four cars: 56,000; 23,000; 42,000; 73,000.
Find the mean mileage for the cars.
$\mu = \frac{\sum X}{N} = \frac{56{,}000 + \cdots + 73{,}000}{4} = 48{,}500$
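The mean mileage of the four Kiers family cars can be verified with Python's standard `statistics` module. Treating the four cars as the whole population follows the slide; the variable names are illustrative.

```python
import statistics

mileage = [56_000, 23_000, 42_000, 73_000]   # the four Kiers family cars

# Population mean: sum of all values divided by the number of values.
mu = sum(mileage) / len(mileage)
print(mu)                        # 48500.0
print(statistics.mean(mileage))  # same value from the standard library
```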
1-51
Sample Mean
For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:
$\bar{X} = \frac{\sum X}{n}$
where n is the total number of values in the sample.
1-52
Example 2
A statistic is a measurable characteristic of a sample.
A sample of five executives received the following bonuses last year ($000):
14.0, 15.0, 17.0, 16.0, 15.0
$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \cdots + 15.0}{5} = \frac{77}{5} = 15.4$
1-53
Properties of the Arithmetic Mean
Every set of interval-level and ratio-level data has a mean.
All the values are included in computing the mean.
A set of data has a unique mean.
The mean is affected by unusually large or small data values.
The arithmetic mean is the only measure of location where the sum of the deviations of each value from the mean is zero.
1-54
Example 3
Consider the set of values: 3, 8, and 4. The mean is 5. Illustrating the fifth property:
$\sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = 0$
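A brief Python check of the fifth property for the values 3, 8, and 4. The variable names are illustrative.

```python
values = [3, 8, 4]
mean = sum(values) / len(values)          # 5.0

# Deviation of each value from the mean: -2, 3, -1.
deviations = [x - mean for x in values]
print(deviations, sum(deviations))        # the deviations sum to 0
```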
1-55
Weighted Mean
The Weighted Mean of a set of numbers X1, X2, ..., Xn, with corresponding weights w1, w2, ..., wn, is computed from the following formula:
$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n}$
1-56
Example 4
During a one-hour period on a hot Saturday afternoon, cabana boy Chris served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the weighted mean of the price of the drinks.
$\bar{X}_w = \frac{5(\$0.50) + 15(\$0.75) + 15(\$0.90) + 15(\$1.15)}{5 + 15 + 15 + 15} = \frac{\$44.50}{50} = \$0.89$
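The weighted mean of the drink prices can be sketched as follows. The last price is taken as $1.15, the value used in the slide's worked computation (which totals $44.50); the function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

prices = [0.50, 0.75, 0.90, 1.15]   # price per drink in dollars (last price assumed $1.15)
drinks = [5, 15, 15, 15]            # number of drinks sold at each price

x_w = weighted_mean(prices, drinks)
print(round(x_w, 2))  # 0.89
```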
1-57
The Median
The Median is the midpoint of the values after they have been ordered from the smallest to the largest.
There are as many values above the median as below it in the data array.
For an even set of values, the median will be the arithmetic average of the two middle numbers and is found at the (n+1)/2 ranked observation.
1-58
The median (continued)
The ages for a sample of five college students are: 21, 25, 19, 20, 22.
Arranging the data in ascending order gives:
19, 20, 21, 22, 25.
Thus the median is 21.
1-59
Example 5
The heights of four basketball players, in inches, are: 76, 73, 80, 75.
Arranging the data in ascending order gives:
73, 75, 76, 80
The median is found at the (n+1)/2 = (4+1)/2 = 2.5th data point, halfway between the second and third ordered values.
Thus the median is (75 + 76)/2 = 75.5.
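Both median examples can be reproduced in Python. The hand-written `median` function below is a sketch of the procedure described (sort, then take the middle value or average the two middle values); `statistics.median` from the standard library gives the same results.

```python
import statistics

def median(data):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

ages = [21, 25, 19, 20, 22]
heights = [76, 73, 80, 75]
print(median(ages), median(heights))  # 21 75.5
```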
1-60
Properties of the Median
There is a unique median for each data set.
It is not affected by extremely large or small values and is therefore a valuable measure of location when such values occur.
It can be computed for ratio-level, interval-level, and ordinal-level data.
It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.
1-61
The Mode: Example 6
The Mode is another measure of location and represents the value of the observation that appears most frequently.
Example 6: The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Because the score of 81 occurs the most often, it is the mode.
Data can have more than one mode. If a data set has two modes, it is referred to as bimodal; with three modes, trimodal; and so on.
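A short Python sketch of finding the mode of the exam scores, using `collections.Counter` to count occurrences and `statistics.multimode` (Python 3.8+) to return every value tied for the highest count.

```python
import statistics
from collections import Counter

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]

# Count how often each score appears.
counts = Counter(scores)
print(counts[81], counts[75], counts[87])   # 3 2 2

# multimode returns all values that share the highest frequency.
modes = statistics.multimode(scores)
print(modes)  # [81]
```

For bimodal or trimodal data, `multimode` returns two or three values instead of one.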
1-62
The Relative Positions of the Mean, Median, and Mode
Symmetric distribution: A distribution having the same shape on either side of the center.
Skewed distribution: One whose shapes on either side of the center differ; a nonsymmetrical distribution.
A distribution can be positively or negatively skewed, or bimodal.
4-63
Skewness is the measurement of the lack of symmetry of the distribution.
The coefficient of skewness can range from -3.00 up to 3.00 when using the following formula:
$sk = \frac{3(\bar{X} - \text{Median})}{s}$
A value of 0 indicates a symmetric distribution.
Some software packages use a different formula, which results in a wider range for the coefficient.
1-64
The Relative Positions of the Mean, Median, and Mode: Symmetric Distribution
Zero skewness: Mean = Median = Mode
[Figure: symmetric distribution with the mode, median, and mean at the same point.]
1-65
The Relative Positions of the Mean, Median, and Mode: Right Skewed Distribution
Positively skewed: Mean and median are to the right of the mode.
Mean > Median > Mode
[Figure: right-skewed distribution with the mode, median, and mean marked from left to right.]
1-66
The Relative Positions of the Mean, Median, and Mode: Left Skewed Distribution
Negatively skewed: Mean and median are to the left of the mode.
Mean < Median < Mode
[Figure: left-skewed distribution with the mean, median, and mode marked from left to right.]
1-67
Geometric Mean
The Geometric Mean (GM) of a set of n numbers is defined as the nth root of the product of the n numbers. The formula is:
$GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)}$
The geometric mean is used to average percents, indexes, and relatives.
1-68
Example 7
The interest rates on three bonds were 5, 21, and 4 percent.
The arithmetic mean is (5 + 21 + 4)/3 = 10.0.
The geometric mean is
$GM = \sqrt[3]{(5)(21)(4)} = 7.49$
The GM gives a more conservative profit figure because it is not heavily weighted by the rate of 21 percent.
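The comparison of the two means for the bond rates can be checked with a Python sketch. It uses `math.prod` and, as a cross-check, `statistics.geometric_mean` (both available from Python 3.8).

```python
import math
import statistics

rates = [5, 21, 4]   # interest rates in percent

arithmetic = sum(rates) / len(rates)
# nth root of the product of the n values.
geometric = math.prod(rates) ** (1 / len(rates))

print(round(arithmetic, 2), round(geometric, 2))  # 10.0 7.49
```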
1-69
Geometric Mean continued
Another use of the geometric mean is to determine the percent increase in sales, production, or other business or economic series from one time period to another.
$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$
[Figure: Growth in Sales 1999-2004. Horizontal axis: year (1999 to 2004); vertical axis: sales in millions ($), 0 to 50.]
1-70
Example 8
The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000.
$GM = \sqrt[8]{\frac{835{,}000}{755{,}000}} - 1 = .0127$
That is, the geometric mean rate of increase is 1.27% per year.
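The enrollment growth rate can be sketched in Python. The function name `growth_rate` is hypothetical, and n is taken as the number of yearly periods between 1992 and 2000, which is 8.

```python
def growth_rate(start_value: float, end_value: float, periods: int) -> float:
    """Geometric mean rate of increase per period: (end / start) ** (1 / periods) - 1."""
    return (end_value / start_value) ** (1 / periods) - 1

rate = growth_rate(755_000, 835_000, periods=2000 - 1992)
print(f"{rate:.4f}")  # 0.0127, i.e. about 1.27% per year
```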
1-71
Variance and Standard Deviation
Variance: the arithmetic mean of the squared deviations from the mean.
Standard deviation: The square root of the variance.
1-72
Population Variance
The major characteristics of the Population Variance are:
All values are used in the calculation.
Because the deviations are squared, extreme values can influence it strongly.
The units are awkward: they are the square of the original units.
1-73
Variance and standard deviation
Population Variance formula:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
X is the value of an observation in the population
µ is the arithmetic mean of the population
N is the number of observations in the population
Population Standard Deviation formula:
$\sigma = \sqrt{\sigma^2}$
1-74
Example 9 continued
In Example 9, the variance and standard deviation are:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{(-8.1 - 6.62)^2 + (-5.1 - 6.62)^2 + \cdots + (22.1 - 6.62)^2}{25} = 42.227$
$\sigma = \sqrt{42.227} = 6.498$
1-75
Sample variance and standard deviation
Sample variance (s²):
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Sample standard deviation (s):
$s = \sqrt{s^2}$
1-76
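Example 11
The hourly wages earned by a sample of five students are:
$7, $5, $11, $8, $6.
Find the sample variance and standard deviation.
$\bar{X} = \frac{\sum X}{n} = \frac{37}{5} = 7.40$
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{(7 - 7.4)^2 + \cdots + (6 - 7.4)^2}{5 - 1} = \frac{21.2}{5 - 1} = 5.30$
$s = \sqrt{s^2} = \sqrt{5.30} = 2.30$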
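The sample variance and standard deviation of the five hourly wages can be reproduced step by step in Python, with `statistics.variance` and `statistics.stdev` (which also divide by n - 1) as a cross-check.

```python
import math
import statistics

wages = [7, 5, 11, 8, 6]          # hourly wages in dollars
n = len(wages)
x_bar = sum(wages) / n            # 37 / 5 = 7.4

# Sum of squared deviations from the sample mean, divided by n - 1.
ss = sum((x - x_bar) ** 2 for x in wages)
s2 = ss / (n - 1)
s = math.sqrt(s2)

print(round(x_bar, 2), round(ss, 1), round(s2, 2), round(s, 2))  # 7.4 21.2 5.3 2.3
```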
4-77
Example 2 revisited
Using the twelve stock prices, we find the mean to be 84.42, the standard deviation 7.18, and the median 84.5.
Coefficient of variation:
$CV = \frac{s}{\bar{X}}(100\%) = 8.5\%$
Coefficient of skewness:
$sk = \frac{3(\bar{X} - \text{Median})}{s} = -.035$
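The two coefficients can be sketched in Python from the rounded summary values quoted on the slide. The function names are hypothetical. Because the inputs are already rounded, the skewness computed here comes out near -0.033 rather than the slide's -.035; the slide's figure may have been computed from unrounded values, which is an assumption.

```python
def coefficient_of_variation(s: float, mean: float) -> float:
    """Standard deviation as a percent of the mean."""
    return s / mean * 100

def pearson_skewness(mean: float, median: float, s: float) -> float:
    """Pearson's coefficient of skewness: 3 * (mean - median) / s."""
    return 3 * (mean - median) / s

mean, s, median = 84.42, 7.18, 84.5   # rounded summary values from the slide

cv = coefficient_of_variation(s, mean)
sk = pearson_skewness(mean, median, s)
print(f"CV = {cv:.1f}%")   # CV = 8.5%
print(f"sk = {sk:.3f}")    # slightly negative: the mean is just below the median
```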
1-78
Chebyshev's theorem
Chebyshev's theorem: For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least:
$1 - \frac{1}{k^2}$
where k is any constant greater than 1.
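A minimal Python sketch of the Chebyshev bound for a few values of k. The function name `chebyshev_bound` is hypothetical.

```python
def chebyshev_bound(k: float) -> float:
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_bound(k):.1%} of the observations")
```

For k = 2 the bound is 75% and for k = 3 it is about 88.9%, both lower than the Empirical Rule's figures because Chebyshev's theorem makes no assumption about the shape of the distribution.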
1-79
Interpretation and Uses of the Standard Deviation
Empirical Rule: For any symmetrical, bell-shaped distribution:
About 68% of the observations will lie within 1s of the mean
About 95% of the observations will lie within 2s of the mean
Virtually all the observations will be within 3s of the mean
1-80
Interpretation and Uses of the Standard Deviation
[Figure: bell-shaped curve showing the relationship between σ and µ. About 68% of the area lies within 1σ of µ, 95% within 2σ, and 99.7% within 3σ.]
1-81
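The Mean of Grouped Data
The Mean of a sample of data organized in a frequency distribution is computed by the following formula:
$\bar{X} = \frac{\sum Xf}{n}$
where X is the midpoint of each class, f is the class frequency, and n is the total number of observations.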
Example 12
A sample of ten movie theaters in a large metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing.
Movies showing
frequency f
class midpoint
X
(f)(X)
1 up to 3 1 2 2
3 up to 5 2 4 8
5 up to 7 3 6 18
7 up to 9 1 8 8
9 up to 11
3 10 30
Total 10 66
6.610
66
n
XX
3- 82
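The grouped mean for the movie theater data can be sketched in Python by weighting each class midpoint by its frequency. The variable names are illustrative.

```python
lower_limits = [1, 3, 5, 7, 9]
upper_limits = [3, 5, 7, 9, 11]
freqs = [1, 2, 3, 1, 3]

# Class midpoint: average of the lower and upper class limits.
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
grouped_mean = sum_fx / n
print(sum_fx, n, grouped_mean)  # 66.0 10 6.6
```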
1-83
The Median of Grouped Data
The Median of a sample of data organized in a frequency distribution is computed by:
$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i)$
where L is the lower limit of the median class, CF is the cumulative frequency preceding the median class, f is the frequency of the median class, and i is the median class interval.
1-84
Finding the Median Class
To determine the median class for grouped data:
Construct a cumulative frequency distribution.
Divide the total number of data values by 2.
Determine which class will contain this value. For example, if n=50, 50/2 = 25, then determine which class will contain the 25th value.
1-85
Example 12 continued
Movies showing    Frequency    Cumulative Frequency
1 up to 3         1            1
3 up to 5         2            3
5 up to 7         3            6
7 up to 9         1            7
9 up to 11        3            10
1-86
Example 12 continued
From the table, L=5, n=10, f=3, i=2, CF=3
$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i) = 5 + \frac{\frac{10}{2} - 3}{3}(2) = 6.33$
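The grouped median procedure (locate the median class, then interpolate) can be sketched in Python. The function name `grouped_median` is hypothetical, and the sketch assumes equal class widths and takes the median class to be the first class whose cumulative frequency reaches n/2.

```python
def grouped_median(lower_limits, freqs, width):
    """Median = L + ((n/2 - CF) / f) * i, using the first class whose
    cumulative frequency reaches n/2 as the median class."""
    n = sum(freqs)
    cf_before = 0                      # cumulative frequency preceding the class
    for lower, f in zip(lower_limits, freqs):
        if cf_before + f >= n / 2:
            return lower + (n / 2 - cf_before) / f * width
        cf_before += f
    raise ValueError("frequencies must not be empty")

lower_limits = [1, 3, 5, 7, 9]
freqs = [1, 2, 3, 1, 3]

med = grouped_median(lower_limits, freqs, width=2)
print(round(med, 2))  # 6.33, with L = 5, CF = 3, f = 3, i = 2
```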
1-87
The Mode of Grouped Data
The Mode for grouped data is approximated by the midpoint of the class with the largest class frequency.
In Example 12, the classes 5 up to 7 and 9 up to 11 share the largest frequency (3), so the modes are their midpoints, 6 and 10, and the distribution is bimodal.