UGRC 120 Numeracy Skills: Measures of Central Location
College of Education
School of Continuing and Distance Education 2014/2015 – 2016/2017
UGRC 120
Numeracy Skills
Session 6
SUMMARY STATISTICS
Lecturer: Dr. Ezekiel N. N. Nortey/Mr. Enoch Nii Boi Quaye, Statistics Contact Information: [email protected]/[email protected]
Session Overview
• Real-life datasets are often large, and their key characteristics are implicit: details such as the mean, mode, median and other measures that describe the data remain concealed until they are computed.
• Session 6 introduces students to summary statistics, the fundamental statistical concept used to describe data structure for useful inferences.
• We will learn about the different forms of distribution that a dataset may follow. An accurate description of the data informs the student's choice of advanced statistical tests for data analysis.
Dr. Ezekiel N. N. Nortey, Department Of Statistics Slide 2
Goals and Objectives
At the end of the session, the student will be able to:
• Understand and apply the index and sigma notations.
• Compute the measures of central tendency.
• Compare the mean, mode and median as they relate to the shape of the curve (Normal and Skewed Distributions).
• Carry out computations involving the range, semi-inter-quartile range, the mean deviation and the standard deviation.
• Give practical interpretation of each of these measures.
• Compute the above measures in Microsoft Excel.
Session Outline
The key topics to be covered in the session are as follows:
• Measures of Central Location and Other Location Measures
• Measures of Dispersion
• Some Standardized Measures and their Uses
• Measures of Symmetry and Skewness
• The Five Number Summary and the Box-and-Whisker Plot
• Exploratory Data Analysis using Microsoft Excel
Reading List
• Refer to Unit 5 of Recommended Text:
Nortey, E. and Afrim, J. (2013). Numeracy skills: The basics and beyond. Accra: Dieco Ventures
• Refer to Chapter 2 of:
Mendenhall, W., Beaver R. J. and Beaver, B. M. (2009). Introduction to probability and statistics. USA: Brooks/Cole
Topic One
MEASURES OF CENTRAL LOCATION AND OTHER LOCATION MEASURES
INTRODUCTION
• Quantitative data arranged in the form of frequency distributions generally exhibit certain common characteristics:
– they have the tendency to concentrate at certain values, usually somewhere near the center of the distribution.
• This tendency of observations to cluster near the central portion of the distribution is known as Central Tendency and can be measured statistically.
INTRODUCTION - Continued
• A measure of central tendency, or an average, is regarded as the most representative value of the given data or observed values.
• This is because it is located where the concentration of the values is greatest, i.e., where the frequency is highest on the scale of measurement.
• A measure of central tendency is thus a precise yet simple expression representing a series of divergent individual values; in other words, it is the consolidated essence of a complex distribution.
INTRODUCTION - Continued
• Central tendencies serve the following purposes:
1. They provide a condensed or consolidated formulation of a large quantity of numerical data which ordinarily cannot be easily interpreted. For example, it may not be possible to memorise the individual performances of students in an examination, but it is quite simple to remember their average performance.
INTRODUCTION - Continued
2. They afford us a basis for comparison with other similar grouped data. For example, it is impractical to compare the weight of each baby born in a particular hospital with that of every baby born in another hospital, but it is possible to compare the mean weights of babies born in the two hospitals.
The following types of central tendencies will be discussed:
– The arithmetic mean;
– The median;
– The mode.
These are the central tendencies that are in common use.
Notations and Symbols
• Computations in statistics make use of basic knowledge of index notation and the summation notation. We will learn how to develop and use index and summation notations.
Index or Subscript Notation
• The index notation is a symbolic terminology that statisticians use to represent the variables they use in computations. To develop symbols for a set of variables we have to start with some assumptions.
Notations and Symbols - Continued
• Let the symbol $X_j$ (read “$X$ sub $j$”) denote any of the $N$ values $X_1, X_2, X_3, \ldots, X_N$ assumed by a variable $X$.
• The letter $j$ in $X_j$, which can stand for any of the numbers $1, 2, 3, \ldots, N$, is called an index or subscript.
• Any other lowercase letter, for example $i$, $k$ or $l$, can also be used as an index or subscript.
Summation Notation
• Using the index notation, we can develop the summation notation. The symbol $\sum_{j=1}^{N} X_j$ denotes the sum of all the $X_j$ values from $j = 1$ to $j = N$, i.e.
$$\sum_{j=1}^{N} X_j = X_1 + X_2 + \cdots + X_N$$
• When the range of the index is clear from the context, this sum can be written simply as $\sum X$.
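The summation notation can be tried out in a few lines of Python. The sketch below uses made-up values for $X_1, \ldots, X_N$ (they are not taken from the course examples) and expands the sum term by term before comparing it with Python's built-in `sum`.

```python
# Hypothetical sample values X_1, ..., X_N (not from the course materials).
X = [2, 5, 7, 10]
N = len(X)

# Expand the sum term by term: X_1 + X_2 + ... + X_N.
# Python lists are indexed from 0, so X_j in the slides is X[j - 1] here.
total = 0
for j in range(1, N + 1):
    total += X[j - 1]

# The built-in sum() gives the same result in one step.
assert total == sum(X)
print(total)  # 24
```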
Summation Notation - Continued
• The symbol Σ is the Greek capital letter called sigma, which statisticians use to denote “sum up values”.
• Refer to “session 6 examples and activities” NOW and practice with example 1.1 and 1.2
• Refer to “session 6 examples and activities” NOW and complete Activity 1.1
Summation Notation - Continued
NOTE
• We will now use our ability to expand an expression and write in symbols to begin computations involving measures of central location.
Measures of Central Location
• As stated in the introduction to this section, a measure of central location, or average, is a value that is typical or representative of a set of data. The value tends to lie centrally within a set of data arranged according to magnitude.
• Such measures of central tendency serve two purposes:
Measures of Central Location
a) They give a concise, brief and economical description of a mass of data; and
b) They provide a simple measure that represents all the values in a sample and enable us to compare two or more distributions.
Several types of measures of central location/tendency exist. The most common is the arithmetic mean. Others are the median, the mode, the geometric mean and the harmonic mean. Each of them has advantages and disadvantages depending on the data and the intended purpose of the data.
Measures of Central Location
The Arithmetic Mean
The arithmetic mean is what is commonly called the average. The arithmetic mean, or simply the “mean”, is calculated by adding up all the values and dividing the total by the number of values.
Therefore the mean of a set of $N$ values $X_1, X_2, X_3, \ldots, X_N$ is denoted by $\bar{X}$ (read “$X$ bar”) and defined as follows:
$$\bar{X} = \frac{\sum X}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
• Refer to “session 6 examples and activities” NOW and practice with example 1.3
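As a quick illustration of the definition, the following sketch computes the arithmetic mean of a small set of raw values. The marks are invented for illustration only and are not the data of example 1.3.

```python
# Hypothetical marks for five students (illustrative data only).
marks = [12, 15, 9, 18, 16]

N = len(marks)           # number of values
total = sum(marks)       # X_1 + X_2 + ... + X_N
x_bar = total / N        # arithmetic mean

print(total)  # 70
print(x_bar)  # 14.0
```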
Measures of Central Location
Arithmetic Mean for Grouped Data
Where the data has been grouped in a frequency table the calculation is different.
If $X$ is a variable having values $x_1, x_2, x_3, \ldots, x_k$ occurring with respective frequencies $f_1, f_2, f_3, \ldots, f_k$, then the arithmetic mean of the given data may be obtained by the formula:
$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum f x}{\sum f} = \frac{\sum f x}{N} \qquad (5.1)$$
where $N = \sum f$ is the total frequency.
• Refer to “session 6 examples and activities” NOW and practice with example 1.4
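Formula 5.1 can be sketched in Python as follows. The values and frequencies below are hypothetical (they are not the data of example 1.4); each value is weighted by its frequency before dividing by the total frequency.

```python
# Hypothetical frequency table: value x occurs f times (illustrative data only).
x = [1, 2, 3, 4, 5]
f = [2, 5, 8, 4, 1]

N = sum(f)                                       # total frequency, sum of f
sum_fx = sum(fi * xi for fi, xi in zip(f, x))    # sum of f * x
x_bar = sum_fx / N                               # formula 5.1

print(N)       # 20
print(sum_fx)  # 57
print(x_bar)   # 2.85
```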
Measures of Central Location
Quite often, we have data in the form of frequency distributions in which there are regular class intervals.
If we assume that each class interval can be represented by its midpoint, then the formula above can easily be applied and the procedure can be simplified appropriately as illustrated in the following example.
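The midpoint approach can be sketched as below. The class intervals and frequencies are assumptions made for illustration, not the data of example 1.5; each class is replaced by its midpoint and formula 5.1 is then applied.

```python
# Hypothetical class intervals (lower, upper) with their frequencies.
classes = [(10, 20), (20, 30), (30, 40), (40, 50)]
f = [3, 7, 6, 4]

# Represent each class by its midpoint.
midpoints = [(lower + upper) / 2 for lower, upper in classes]

N = sum(f)
sum_fx = sum(fi * xi for fi, xi in zip(f, midpoints))
x_bar = sum_fx / N   # formula 5.1 applied to the midpoints

print(midpoints)  # [15.0, 25.0, 35.0, 45.0]
print(x_bar)      # 30.5
```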
Measures of Central Location
• Refer to “session 6 examples and activities” NOW and practice with example 1.5
• Refer to “session 6 examples and activities” NOW and complete Activity 1.2
The Assumed Mean Method
• If $A$ is any guessed or assumed mean (any value of $x$), let $d = x - A$ be the deviation of $x$ from $A$. Then $\bar{X}$ is computed as follows:
$$\bar{X} = A + \frac{\sum f d}{N} \qquad (5.2)$$
• Refer to “session 6 examples and activities” NOW and practice with example 1.6
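A minimal sketch of the assumed mean method (formula 5.2) is shown below. The class midpoints, frequencies and the choice $A = 25$ are all assumptions made for illustration; the result is checked against the direct formula 5.1 to show that both give the same mean.

```python
# Hypothetical class midpoints and frequencies (illustrative data only).
x = [15, 25, 35, 45]
f = [3, 7, 6, 4]
A = 25                                    # assumed mean, chosen from the x values

N = sum(f)
d = [xi - A for xi in x]                  # deviations d = x - A
sum_fd = sum(fi * di for fi, di in zip(f, d))
x_bar = A + sum_fd / N                    # formula 5.2

# Cross-check with the direct method (formula 5.1).
direct = sum(fi * xi for fi, xi in zip(f, x)) / N

print(d)       # [-10, 0, 10, 20]
print(sum_fd)  # 110
print(x_bar)   # 30.5
assert x_bar == direct
```

Any other choice of $A$ would give the same mean; a value near the middle of the data simply keeps the deviations small.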
Coding Method
• Instead of using $x$ directly as in formula 5.1, we create a new variable $u$ such that
$$u = \frac{x - A}{c}$$
where $c$ is the common class interval size of the data provided and $A$ is an assumed mean, which is any of the $x$ values picked from the data. Usually, a value close to the middle of the data is taken.
Then
$$\bar{u} = \frac{\sum f u}{\sum f} = \frac{\sum f u}{N}$$
To find the arithmetic mean of the data, we use the formula
$$\bar{X} = A + c\bar{u} \qquad (5.3)$$
• Refer to “session 6 examples and activities” NOW and practice with example 1.7
• Refer to “session 6 examples and activities” NOW and complete Activity 1.3
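The coding method (formula 5.3) might be sketched as follows. The midpoints, frequencies, assumed mean $A = 25$ and class width $c = 10$ are hypothetical values chosen for illustration and are not the data of example 1.7.

```python
import math

# Hypothetical class midpoints and frequencies with equal class width.
x = [15, 25, 35, 45]
f = [3, 7, 6, 4]
A = 25      # assumed mean, one of the x values near the middle
c = 10      # common class interval size

N = sum(f)
u = [(xi - A) / c for xi in x]                     # coded values u = (x - A) / c
sum_fu = sum(fi * ui for fi, ui in zip(f, u))
u_bar = sum_fu / N                                 # mean of the coded values
x_bar = A + c * u_bar                              # formula 5.3

print(u)                 # [-1.0, 0.0, 1.0, 2.0]
print(round(x_bar, 2))   # 30.5

# The coded result agrees with the direct method (formula 5.1).
direct = sum(fi * xi for fi, xi in zip(f, x)) / N
assert math.isclose(x_bar, direct)
```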
Coding Method
• When the class intervals are of equal size, the assumed mean method or the coding method is advisable.
• When the class intervals are of unequal size, the direct method of formula 5.1 is recommended.
• The mean is used when the greatest reliability of the data is required. We also use it when the distribution is reasonably symmetrical and when subsequent statistical calculations are to be made.
Properties of the Arithmetic Mean
i. The mean is affected by all the values in the data set, which makes it a very representative measure of central tendency. It may, however, not represent the data well if there are large deviations from the central value.
ii. It is easy to compute and is capable of algebraic manipulation.
iii. It is determinate, i.e. unique.
iv. It is a typical representative for all the values in the data set.
v. Its value may be substituted for each value in the data set without changing the total, i.e. $\sum f x = \sum f \bar{x} = N\bar{x}$.
vi. The algebraic sum of the deviations of a set of numbers from their arithmetic mean is zero, i.e. $\sum f (x - \bar{x}) = \sum f x - \sum f \bar{x} = N\bar{x} - N\bar{x} = 0$.
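The last two properties can be checked numerically. The sketch below uses a hypothetical frequency table and exact fractions (Python's `fractions.Fraction`) so that rounding error does not obscure the result.

```python
from fractions import Fraction

# Hypothetical frequency table (illustrative data only).
x = [1, 2, 3, 4, 5]
f = [2, 5, 8, 4, 1]

N = sum(f)
sum_fx = sum(fi * xi for fi, xi in zip(f, x))
x_bar = Fraction(sum_fx, N)   # exact mean, here 57/20

# Substituting the mean for every value leaves the total unchanged.
assert N * x_bar == sum_fx

# The frequency-weighted deviations from the mean sum to zero.
total_deviation = sum(fi * (xi - x_bar) for fi, xi in zip(f, x))
assert total_deviation == 0
print(x_bar, total_deviation)  # 57/20 0
```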
Advantages of the Mean
The arithmetic mean is the most widely used measure of central tendency.
i. Its definition is clear and precise. It corresponds to the centre of gravity of the observations.
ii. It is simple to understand and easy to compute.
iii. It uses every item in its computation.
iv. It has a determinate value and is rigidly defined.
v. It can be subjected to further algebraic treatment and advanced statistical theory is based on it.
vi. It can be found even if only the sum of the values is known and the individual values are not known.
vii. It provides a good standard of comparison since extreme values can cancel each other out when the number of observations is large.
Disadvantages of the Mean
i. It may be influenced greatly by unrepresentative values. In such cases the representative character of the mean is lost.
ii. It gives greater importance to larger values and less importance to smaller values, i.e. it has an upward bias.
iii. It cannot be computed if one or two values are missing in the data set.
iv. It cannot be located just by inspection like the median and the mode.
v. It may conceal facts and may lead to distorted conclusions.
The Median
Raw (Ungrouped) Data
• The median of a set of raw data arranged in order of magnitude is the middle value when the number of values is odd, or it is the mean of the two middle values, when the number of values is even.
• For example, the median of the values 3, 4, 6, 8, 10, 12, 14 is 8, which is the 4th ranked value.
The Median
Note that the 4th ranked value is the $\left(\frac{7 + 1}{2}\right)^{\text{th}}$ ranked value.
Similarly, for the data 5, 7, 9, 11, 12, 15, 18, 20, the median is
$$\frac{11 + 12}{2} = 11.5$$
The Median
• The median is defined as that value which divides a distribution so that an equal number of items or values occur on either side of it.
• Note also that the median cannot be read directly from unordered data. We must first arrange the data in order of magnitude.
The Median
• Generally, if there are $n$ observations, the median is the value of the $\left(\frac{n + 1}{2}\right)^{\text{th}}$ ranked item.
• Rule 1: If $\frac{n + 1}{2}$ is a whole number, then the median is the value that occurs at that rank.
The Median
• Rule 2: If $\frac{n + 1}{2}$ ends in a half (2.5, 3.5, etc.), then the median is equal to the average of the two ranked values between which the result falls.
• Refer to “session 6 examples and activities” NOW and complete Activity 1.4
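The two rules can be sketched as a small Python function. The function name `median_by_rank` is a hypothetical helper introduced here for illustration; the two data sets are the ones used in the slides above.

```python
def median_by_rank(values):
    """Median of raw data using the (n + 1) / 2 ranked-item rule."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)          # the data must first be arranged in order
    n = len(ordered)
    rank = (n + 1) / 2                # position of the median, counting from 1
    if rank.is_integer():
        # Rule 1: whole-number rank, so take the value at that rank.
        return ordered[int(rank) - 1]
    # Rule 2: rank ends in .5, so average the two neighbouring ranked values.
    lower = ordered[int(rank) - 1]
    upper = ordered[int(rank)]
    return (lower + upper) / 2


print(median_by_rank([3, 4, 6, 8, 10, 12, 14]))        # 8
print(median_by_rank([5, 7, 9, 11, 12, 15, 18, 20]))   # 11.5
```

Python's standard library offers `statistics.median`, which gives the same results for these inputs.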
Median for Grouped Data
• When data have been grouped into a frequency table, we cannot determine the median using the method for raw data.
• To determine the median of grouped data (for example, the age distribution of children in a frequency table), we first find $\frac{n}{2}$ of the cases and, working from either end of the table, interpolate to determine the value of the median at $\frac{n}{2}$.
• If we decide to start from the lower end of the table, the median is computed from the following expression:
Median for Grouped Data
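$$\text{Median} = L_1 + \left(\frac{\frac{n}{2} - \left(\sum f\right)_1}{f_{\text{med}}}\right) \times c \qquad (5.4)$$
where:
– $L_1$ is the lower class boundary of the class in which the median falls;
– $n$ is the total frequency;
– $\left(\sum f\right)_1$ is the sum of the frequencies in the classes below the median class;
– $f_{\text{med}}$ is the frequency of the median class;
– $c$ is the class width for the table.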
Median for Grouped Data
• Refer to “session 6 examples and activities” NOW and practice with example 1.8
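Formula 5.4 can be sketched in Python as below. The function name `grouped_median`, the class boundaries and the frequencies are assumptions made for illustration (they are not the data of example 1.8), and the sketch assumes equal class widths and works from the lower end of the table.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median of grouped data by interpolation (formula 5.4)."""
    n = sum(frequencies)              # total frequency
    half = n / 2
    below = 0                         # sum of frequencies below the current class
    for L1, f_med in zip(lower_boundaries, frequencies):
        # The median class is the first class whose cumulative frequency reaches n / 2.
        if f_med > 0 and below + f_med >= half:
            return L1 + (half - below) / f_med * width
        below += f_med
    raise ValueError("frequencies must contain at least one positive count")


# Hypothetical table: classes 10-20, 20-30, 30-40, 40-50 with these frequencies.
lower_boundaries = [10, 20, 30, 40]
frequencies = [2, 6, 8, 4]

# n = 20, n / 2 = 10; the median class is 30-40 with 8 cases below it.
print(grouped_median(lower_boundaries, frequencies, width=10))  # 32.5
```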
Estimating Median by Graph
• The median may also be located using a graph of the cumulative frequency (ogive). This is done as follows:
– Draw a perpendicular to the vertical axis at $\frac{n}{2}$.
– Extend the perpendicular until it intersects the “less than” cumulative frequency curve.
– From the point of intersection, draw another perpendicular to meet the horizontal axis.
– The point of intersection on the horizontal axis is the value of the median.
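Reading the median off an ogive drawn with straight line segments amounts to linear interpolation between the plotted points. The sketch below imitates the graphical steps numerically; the class boundaries and frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical class boundaries and frequencies (illustrative data only).
boundaries = [10, 20, 30, 40, 50]
frequencies = [2, 6, 8, 4]

# Points of the "less than" ogive: (boundary, cumulative frequency).
cumulative = [0] + list(accumulate(frequencies))   # [0, 2, 8, 16, 20]
target = cumulative[-1] / 2                        # n / 2 on the vertical axis

median = None
for i in range(len(boundaries) - 1):
    x0, x1 = boundaries[i], boundaries[i + 1]
    y0, y1 = cumulative[i], cumulative[i + 1]
    # Find the segment that the horizontal line at n / 2 crosses.
    if y1 > y0 and y0 <= target <= y1:
        # Drop a perpendicular from the crossing point to the horizontal axis.
        median = x0 + (target - y0) / (y1 - y0) * (x1 - x0)
        break

print(cumulative)  # [0, 2, 8, 16, 20]
print(median)      # 32.5
```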
Uses of the Median
i. The median is used when the distributions are badly skewed.
ii. It is also used when a quick and easily calculated average is required.
iii. Finally, when extreme values are likely to affect the mean disproportionately, the median is the more useful measure.
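The effect of an extreme value on the two measures can be seen in a small example. The incomes below are invented for illustration; the last value is deliberately extreme.

```python
from statistics import median

# Hypothetical incomes; the last value is an extreme case.
incomes = [20, 22, 23, 25, 200]

mean_income = sum(incomes) / len(incomes)
median_income = median(incomes)

print(mean_income)    # 58.0
print(median_income)  # 23
```

The single extreme value pulls the mean well above four of the five observations, while the median stays at the middle value.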
Properties of the Median
i. Geometrically, the median divides a histogram or a frequency curve into two parts with equal areas.
ii. It is unaffected by the magnitude of extreme deviations from the average.
iii. It may be located even when the observations in the data cannot be measured quantitatively, so long as they can be ranked or arranged in order of magnitude.
Advantages of the Median
i. It is easily understood and simple to compute.
ii. It can be obtained even with incomplete data, since it depends on only a few central observations.
iii. It balances the number of observations in a distribution, which makes it useful in describing scores, ratios and grades.
iv. It is useful in the case of skewed distributions like those of incomes and prices.
v. It can be used for qualitative data that can be ranked.
vi. In the case of open-ended classes, the median can be calculated but the mean cannot.
vii. It can easily be determined graphically (see “Estimating Median by Graph”).
Disadvantages of the Median
i. Re-arrangement of data values is necessary to compute the median.
ii. The median is not easily capable of algebraic manipulations. As such it is not used much in advanced statistical studies.
iii. The empirical formula for the median based on interpolation may not always give correct results.
iv. It ignores significant extreme values.
v. Weighting cannot be used in the case of the median, as is the case with the mean. The scope of operations with the median is narrowed.
vi. It cannot be computed as exactly as the mean.
• One of the most important practical applications of the concept of the median is to be found in road construction. It is used to divide the road into two equal halves or into dual carriageways whereby a road median is created in the middle of the road using concrete or hedges.
• Examples of such dual carriageways include the following:
– Accra – Tema Motorway
The Median
• Refer to “session 6 examples and activities” NOW and complete Activity 1.5