SCM300 Survey Design
Lecture 3: Summarising Data
For use in fall semester 2015. Lecture notes were originally designed by Nigel Halpern. This lecture set may be modified during the semester.
Last modified: 4-8-2015
Lecture Aim & Objectives
Aim
• To investigate pictorial & statistical methods of analysing quantitative data
Objectives
• Pictorial representation of quantitative data
• Statistical representation of quantitative data
Pictorial Representation
• Levels of measurement
• Tables & frequency distributions
• Charts, plots, graphs & pie-charts
3 (4) Levels of Measurement
• Nominal variables
• Ordinal variables
• Interval (& ratio) variables
Nominal
• Categories
  – e.g. gender (m/f), responses (y/n), class of travel (b/l)
• Usually presented as frequencies & categories or %’s
  – e.g. 45% male, 55% female
• Measure the existence (or not) of a characteristic
  – But contain limited information
Ordinal
• Ordered categories or preferences
  – e.g. ranked responses from a Likert scale
  – e.g. finishers in a race (1st, 2nd, 3rd, etc.)
  – e.g. preferred aircraft
• Measure intensity, order or degree
  – But still limited as they don’t imply distances
    • i.e. the size of the gap between 1st & 2nd is unknown
Interval & Ratio
• Ordered & scaled (on equal intervals)
  – e.g. age in years, temperature
• Measures differences between values
  – Interval: arbitrary zero
    • e.g. temperature (+/-)
  – Ratio: absolute zero indicating absence of that variable
    • e.g. age, income
• High analytical capabilities
  – e.g. can compare means, unlike for nominal or ordinal data
Levels of Measurement Summary
Variable type    Description                                                       Examples
Nominal          Classification of responses into mutually exclusive categories   Male/Female, Yes/No
Ordinal          Categories are rank ordered                                       1st/2nd/3rd, Likert
Interval/Ratio   Distances between items on scale are equal                        Temperature, Age
Your turn…
• What levels of measurement would be derived from each of the following questions?
  1. Gender (male/female)
  2. Age in years and months (state years/months)
  3. Do you smoke (yes/no)
  4. How many cigarettes, on average, do you smoke a day (state no.)
  5. Number of full years you’ve been smoking (state no.)
  6. How many minutes exercise do you do, on average, each day (less than 30mins / 30-59mins / 60+mins)
  7. To what extent do you think that smoking is bad for your health (Strongly agree / tend to agree / neither / tend to disagree / strongly disagree)
  8. Rank the cigarette brands in order of quality (B&H, Silk Cut, Marlboro)
Variable type    Description
Nominal          Classification of responses into mutually exclusive categories
Ordinal          Categories are rank ordered
Interval/Ratio   Distances between items on scale are equal
Tables
• Most straightforward pictorial representation
• Good method of storing information
• Summarises &/or shows patterns in data
• Easily made using word-processing or spreadsheets
• Confusing if constructed poorly
• Confusing if they try to show too much
Table Considerations
• Should be clear & appropriate
• Should be chosen with a purpose in mind
  – Not just for the sake of it
• Must include a title & a source of data
• Must be referenced & discussed in the text
  – Don’t assume that everyone will understand them
Table Clarity
• Use a common system of data presentation
• Use percentages rather than raw scores for clarity & comparative capabilities
The above points are particularly relevant if the table includes more than one variable calculated using different units of measurement (AKA ‘cross-tabulation’)
Table 1. Passengers at LGW, LHR & MAN, 1999
Socio-economic status   Business passengers   Leisure passengers   Total
A/B                     18,607                43,407               62,014
C1                      14,345                52,400               66,745
C2                      1,386                 21,508               22,894
D/E                     312                   13,035               13,347
Total                   34,650                130,350              165,000
Data from a survey of pax at LGW, LHR & MAN (CAA, 2000):
– 34,650 Business Pax: A/B=18,607; C1=14,345; C2=1,386; D/E=312
– 130,350 Leisure Pax: A/B=43,407; C1=52,400; C2=21,508; D/E=13,035
Use percentages instead?
Table 1. Passengers at LGW, LHR & MAN, 1999 (%)
Socio-economic status   Business passengers   Leisure passengers   Total
A/B                     54                    33                   38
C1                      41                    40                   40
C2                      4                     17                   14
D/E                     1                     10                   8
Total                   21                    79                   100
Data from a survey of pax at LGW, LHR & MAN (CAA, 2000):
– 34,650 Business Pax: A/B=18,607; C1=14,345; C2=1,386; D/E=312
– 130,350 Leisure Pax: A/B=43,407; C1=52,400; C2=21,508; D/E=13,035
Easier to interpret?
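The step from raw counts to percentages in Table 1 can be sketched in a few lines of Python. This is an illustration only: the names `counts` and `pct` are assumptions, not part of the lecture. It also assumes that the A/B to D/E rows are percentages of each column total, while the Total row gives each passenger type's share of all 165,000 passengers.

```python
# Raw counts from Table 1: (business, leisure) per socio-economic status.
counts = {
    "A/B": (18607, 43407),
    "C1": (14345, 52400),
    "C2": (1386, 21508),
    "D/E": (312, 13035),
}

business_total = sum(b for b, _ in counts.values())
leisure_total = sum(l for _, l in counts.values())
grand_total = business_total + leisure_total


def pct(part, whole):
    """Percentage of `whole` accounted for by `part`, rounded to a whole number."""
    return round(100 * part / whole)


# Column percentages: each cell as a share of its own column total.
column_pct = {
    status: (pct(b, business_total), pct(l, leisure_total), pct(b + l, grand_total))
    for status, (b, l) in counts.items()
}

# Total row: each passenger type as a share of all passengers.
total_row = (pct(business_total, grand_total), pct(leisure_total, grand_total), 100)

for status, row in column_pct.items():
    print(status, *row)
print("Total", *total_row)
```

Working from the counts rather than typing percentages by hand keeps the two versions of the table consistent with each other.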
Frequency Distributions
• Standard frequency distribution
• Univariate frequency distribution
• Grouped frequency distribution
• Relative & cumulative frequency distribution
Standard Frequency
• Standard frequency distribution
  – Presents data
    • e.g. “How many return flights did you take last year?”
    • Answers from 50 pax as a standard frequency distribution:
Number of return flights taken last year:
7 3 10 3 2 4 3 3 6 3 5 2 3 4 2 5 4 3 6 8 4 12 1 3 4 15 5 1 3 1 4 2 3 5 2 3 8 3 4 4 6 3 5 2 4 2 3 2 5 1
Univariate Frequency
• Univariate frequency distribution
  – Lists data more clearly & with their frequency
  – Important for large sample sizes
Flights   Frequency
1         4
2         8
3         14
4         9
5         6
6         3
7         1
8         2
10        1
12        1
15        1
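As a hedged illustration, the univariate frequency distribution can be tallied from the 50 raw answers with Python's standard `collections.Counter`. The variable names `flights` and `frequency` are assumptions made for this sketch.

```python
from collections import Counter

# The 50 answers to "How many return flights did you take last year?"
flights = [
    7, 3, 10, 3, 2, 4, 3, 3, 6, 3, 5, 2, 3, 4, 2, 5, 4, 3, 6, 8,
    4, 12, 1, 3, 4, 15, 5, 1, 3, 1, 4, 2, 3, 5, 2, 3, 8, 3, 4, 4,
    6, 3, 5, 2, 4, 2, 3, 2, 5, 1,
]

# Count how often each distinct value occurs.
frequency = Counter(flights)

print("Flights Frequency")
for value in sorted(frequency):
    print(value, frequency[value])
```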
Grouped Frequency
• Grouped frequency distribution
  – Groups all data according to categories
  – Further improves clarity
Flights   Grouped frequency
1-3       26
4-6       18
7-9       3
10-12     2
13+       1
Total     50
Relative & Cumulative Frequency
• Relative & cumulative frequency distributions
  – Relative: each category as a % of the total
  – Cumulative: each relative frequency added to the sum of the preceding ones
Flights   Grouped   Relative (%)   Cumulative (%)
1-3       26        52             52
4-6       18        36             88
7-9       3         6              94
10-12     2         4              98
13+       1         2              100
Total     50        100
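The grouped, relative and cumulative columns can all be derived from the univariate frequencies. The sketch below is one possible way to do it; the names `frequency`, `groups` and `rows` are assumptions, and the open-ended "13+" group is represented with an upper bound of `None`.

```python
# Univariate frequencies: number of flights -> number of passengers.
frequency = {1: 4, 2: 8, 3: 14, 4: 9, 5: 6, 6: 3, 7: 1, 8: 2, 10: 1, 12: 1, 15: 1}

# (label, lowest value, highest value); None means "no upper limit".
groups = [("1-3", 1, 3), ("4-6", 4, 6), ("7-9", 7, 9), ("10-12", 10, 12), ("13+", 13, None)]

total = sum(frequency.values())
rows = []
cumulative = 0.0
for label, low, high in groups:
    # Grouped frequency: add up the counts of every value inside the group.
    count = sum(
        n for value, n in frequency.items()
        if value >= low and (high is None or value <= high)
    )
    relative = 100 * count / total   # share of the total, in %
    cumulative += relative           # running sum of the relative shares
    rows.append((label, count, relative, cumulative))

for label, count, relative, cum in rows:
    print(f"{label:6} {count:3} {relative:5.0f} {cum:5.0f}")
```

The last cumulative value should always reach 100%, which is a quick check that no answer has been left out of the groups.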
Charts, Plots, Graphs & Pie-charts
• Simple bar charts
• Compound bar charts
• Histograms
• Scatter or dot plots
• Line graphs
• Pie-charts
Charts, Plots, Graphs & Pie-charts: Pros & Cons
• Easily made using word-processing or spreadsheets
• Ease of creation can lead to over-elaborate charts at the expense of clarity
Charts, Plots, Graphs & Pie-charts: Considerations
• Should be clear & appropriate
• Should be chosen with a purpose in mind
  – Not just for the sake of it
• Typically include
  – Title
  – Labelled axes
  – Key that explains the different segments
  – Source of data
• Must be referenced & discussed in the text
  – Do not assume that everyone will understand them
• Data type will restrict which method is chosen
Simple Bar Charts
• Simple bar charts
  – Horizontal or vertical charts of separate bars that represent size of data
Student results for SCM300 in 2007
Grade      0-39%   40-49%   50-59%   60-69%   70%+
Students   5       9        15       7        3
Simple Bar Charts
Figure 1. Student results for SCM300 in 2007
[Simple bar chart; x-axis: Grade (%) in bands 0-39, 40-49, 50-59, 60-69, 70+; y-axis: Number of students, 0 to 16]
Compound Bar Charts
• Compound bar charts
  – Show proportions/relative size of groups
  – Bars will always have the same height when % are used but not when raw figures are used
  – For 3+ components, pie-charts may be better
Student results for SCM300 in 2007
Grade    0-39%   40-49%   50-59%   60-69%   70%+
Male     4       7        7        2        0
Female   1       2        8        5        3
Compound Bar Charts
Figure 1. Student results for SCM300 in 2007
[Compound bar chart; x-axis: Grade (%) in bands 0-39, 40-49, 50-59, 60-69, 70+; y-axis: Number of students, shown as 0% to 100%; key: Female, Male]
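To see why every bar in a percentage-based compound bar chart has the same height, the male and female counts in each grade band can be converted to shares of that band. This is a minimal sketch; `grades`, `male`, `female` and `shares` are illustrative names only.

```python
grades = ["0-39", "40-49", "50-59", "60-69", "70+"]
male = [4, 7, 7, 2, 0]
female = [1, 2, 8, 5, 3]

# For each grade band, express male and female counts as % of the band total.
shares = {}
for grade, m, f in zip(grades, male, female):
    band_total = m + f
    shares[grade] = (round(100 * m / band_total, 1), round(100 * f / band_total, 1))

for grade, (male_pct, female_pct) in shares.items():
    print(f"{grade:6} male {male_pct:5.1f}%  female {female_pct:5.1f}%")
```

Each pair of shares sums to 100%, so the stacked bars are equally tall and only the split between the segments changes.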
Histograms
• Histograms
  – Similar to bar charts but a better indication of variation & distribution
  – Bars are connected instead of separate
Histograms
Figure 1. Student results for SCM300 in 2007
[Histogram; x-axis: Grade (%) in bands 0-39, 40-49, 50-59, 60-69, 70+; y-axis: Number of students, 0 to 16]
This figure indicates repeat visits to Norway & tourists’ interest in returning, but is it easy to understand?
Scatter or Dot Plots
• Scatter or dot plots
  – Illustrate the exact distribution of data
  – Can be used to illustrate continuous data
    • BUT a line graph may be better
  – Effective for 2 related variables
Scatter or Dot Plots
Figure 1. Passengers & Aircraft Movements at HiMolde Airport
[Scatter plot; axes: Aircraft movements and Passengers; axis scale 0 to 600 000]
Line Graphs
• Line graphs
  – Show trends over time
    • e.g. patterns, peaks & troughs, rates of incline/decline
  – Can show more than 1 variable at a time
    • This can indicate possible relationships
    • e.g. see next slide
Pie Charts
• Pie-charts
  – Segments represent cases in each category
  – Best for 3-6 categories (no more, no less)
  – Labelling & shading sometimes difficult
  – Combining categories may improve clarity but loses detail
Charts, Plots, Graphs & Pie-charts: Summary
Variable type    Bar                Pie                Line
Nominal          Yes                Yes                No
Ordinal          Yes                Yes                No
Interval/ratio   Yes (if grouped)   Yes (if grouped)   Yes
Statistical Representation
• Measures of central tendency
• Measures of dispersion
• Normal distribution & skew
Measures of Central Tendency
• Raw data can be confusing & meaningless
• Measures of central tendency
  – AKA measures of location or average
  – Present the data in 1 single number
• 3 different measures depend on intention or data
  – See next slide
Measures of Central Tendency
• Mode
  – Definition: most commonly occurring value in a data set
  – Misleading if it happens to be an extreme value, & there may be multiple modes (e.g. a bimodal distribution)
  – Data: any
• Median
  – Definition: central value representing the central point of a data set
  – When there is an even number of values, take the two middle values and find the mid-point between them
  – Extremes don’t distort it, but data has to be ordered from lowest to highest in order to calculate it
  – Data: ordinal or interval/ratio
• Mean
  – Definition: average value in a data set
  – Advantage is that it uses all values in a data set
  – Disadvantage is that it can only be used with interval/ratio data and, when there are few values in the data set, it can be distorted by extremes
  – Data: interval/ratio
Example: Age of students
Mean     22
Median   20
Mode     19
19 20 36 19 19 24 37 20 21 20
19 19 19 19 20 25 20 26 20 19
19 19 19 19 20 19 24 25 20 20
26 25 19 20 19 18 19 28 22 19
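The three measures for the 40 student ages can be checked with Python's standard `statistics` module. This is a sketch for illustration; the variable names are assumptions. The exact mean works out at 21.5, which the slide reports rounded up to 22.

```python
from statistics import mean, median, mode

# Ages of the 40 students in the example.
ages = [
    19, 20, 36, 19, 19, 24, 37, 20, 21, 20,
    19, 19, 19, 19, 20, 25, 20, 26, 20, 19,
    19, 19, 19, 19, 20, 19, 24, 25, 20, 20,
    26, 25, 19, 20, 19, 18, 19, 28, 22, 19,
]

age_mean = mean(ages)      # sum of all values divided by the number of values
age_median = median(ages)  # mid-point of the two middle values (n is even)
age_mode = mode(ages)      # most commonly occurring value

print("Mean:", age_mean)
print("Median:", age_median)
print("Mode:", age_mode)
```

The mean sits above the median and mode because a few much older students (36 and 37) pull it upwards.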
Measures of Dispersion
• Measures of central tendency don’t show:
  – How closely related values are (i.e. clustered)
  – How representative they are of the data set
  – The range of values
  – The degree of distortion by extreme values
Salaries of office staff at HiMolde Airways:
– £11k, £15k, £15k, £18k, £25k, £30k, £32k, £38k
Salaries of office staff at HiMolde Airport:
– £20k, £21k, £22k, £23k, £23k, £24k, £25k, £26k
Mean salary at HiMolde Airways = £23k (£184k/8)
Mean salary at HiMolde Airport = £23k (£184k/8)
Range
• Simplest & crudest measure of dispersion
• Indicates spread of data
  – Places values in ascending order
  – Then subtracts smallest from the largest value
• Extreme values affect (determine) the outcome
• Range gives a greater insight into a data set
  – But gives no indication of the clustering of individual values
Range
Salaries of office staff at HiMolde Airways:
– £11k, £15k, £15k, £18k, £25k, £30k, £32k, £38k
Salaries of office staff at HiMolde Airport:
– £20k, £21k, £22k, £23k, £23k, £24k, £25k, £26k
Range of salaries at HiMolde Airways = £38k - £11k = £27k
Range of salaries at HiMolde Airport = £26k - £20k = £6k
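A short sketch makes the point that two data sets can share a mean yet differ in spread. Salaries are entered in £k; `airways`, `airport` and `value_range` are names assumed for this example.

```python
from statistics import mean

# Salaries in £k.
airways = [11, 15, 15, 18, 25, 30, 32, 38]
airport = [20, 21, 22, 23, 23, 24, 25, 26]


def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


for name, salaries in (("Airways", airways), ("Airport", airport)):
    print(f"{name}: mean = £{mean(salaries)}k, range = £{value_range(salaries)}k")
```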
Inter-Quartile Range
• Most appropriate when using ordinal data
• Divides values into 4 equal parts (quartiles)
  – Is an extension of the idea of the median
• Represents the middle 50% of the values that fall between the 1st & 3rd quartiles
• Not affected by extremes
  – BUT doesn’t utilise all values
    • It discards 50% of the values & therefore provides a limited picture of the degree of clustering
Inter-Quartile Range
[Diagram: values ordered from Min. value to Max. value and split at Q1, Q2 (median value) & Q3 into four groups, each holding 25% of cases (1st, 2nd, 3rd & 4th); the inter-quartile range spans Q1 to Q3]
Inter-Quartile Range
Salaries of office staff at HiMolde Airways:
– £11k, £15k, £15k, £18k, £25k, £30k, £32k, £38k
Salaries of office staff at HiMolde Airport:
– £20k, £21k, £22k, £23k, £23k, £24k, £25k, £26k
IQ Range of salaries at HiMolde Airways = £15k-£31k
IQ Range of salaries at HiMolde Airport = £22k-£24k
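Quartiles can be computed in several slightly different ways, and the lecture does not say which one it uses. The sketch below assumes the "median of each half" convention (Q1 is the median of the lower half of the ordered values, Q3 the median of the upper half); the function name `quartiles` is hypothetical. Under that assumption the Airways figures match the slide, while the Airport quartiles come out at £21.5k and £24.5k, so the slide's £22k-£24k appears to be rounded or based on a different convention.

```python
from statistics import median

# Salaries in £k.
airways = [11, 15, 15, 18, 25, 30, 32, 38]
airport = [20, 21, 22, 23, 23, 24, 25, 26]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    For an odd number of values the middle value is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])


for name, salaries in (("Airways", airways), ("Airport", airport)):
    q1, q3 = quartiles(salaries)
    print(f"{name}: Q1 = £{q1}k, Q3 = £{q3}k, Q3 - Q1 = £{q3 - q1}k")
```

Whichever convention is chosen, it is worth stating it alongside the result, because other tools may give slightly different quartiles for small samples.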
Standard Deviation
• Widely used in quantitative research
• Most useful measure of dispersion
• Utilises all data in the distribution
• Compares each value in the distribution with the mean
  – It examines the variance of the data around the mean
  – Therefore saying something about how representative the mean is for the data set
Standard Deviation
• Smaller SD = less variation
  – i.e. data is more concentrated around the mean
  – Greater SD = greater variation
• However
  – Size of SD is in part a reflection of the size of the mean
    • So a large SD may simply be the product of a large mean
    • Because of this, both figures should be quoted
• Extreme numbers can distort the outcome
  – BUT have less of an impact than when using the range
Standard Deviation
Salaries of office staff at HiMolde Airways:
– £11k, £15k, £15k, £18k, £25k, £30k, £32k, £38k
Salaries of office staff at HiMolde Airport:
– £20k, £21k, £22k, £23k, £23k, £24k, £25k, £26k
Standard deviation of salaries at HiMolde Airways ≈ £10k
Standard deviation of salaries at HiMolde Airport = £2k
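The two standard deviations can be reproduced with `statistics.stdev`, which computes the sample standard deviation (dividing by n - 1). That choice is an assumption: the lecture does not state whether the sample or population formula was used, but the sample version gives about 9.7 for Airways (10 when rounded) and exactly 2 for Airport.

```python
from statistics import mean, stdev

# Salaries in £k.
airways = [11, 15, 15, 18, 25, 30, 32, 38]
airport = [20, 21, 22, 23, 23, 24, 25, 26]

for name, salaries in (("Airways", airways), ("Airport", airport)):
    # Quote the mean together with the SD, since the size of the SD
    # partly reflects the size of the mean.
    print(f"{name}: mean = £{mean(salaries)}k, SD = £{stdev(salaries):.2f}k")
```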
Central Tendency & Dispersion Summary
                   Nominal       Ordinal                Interval/Ratio
Example            Male/Female   1st/2nd/3rd            Temperature
Central tendency   Mode          Median                 Mean
Dispersion         N/a           Inter-quartile range   Standard deviation
Normal Distribution
• Normal if
  – Mean, median & mode coincide
  – Distribution is the same either side of the central values
    • e.g. see next slide
• Often referred to as a bell-shaped curve
  – 50% of the cases can be found either side of the central value
  – Values tend to be clustered around the mean
    • i.e. very few extreme values
Normal Distribution
• A normal distribution has certain properties
  – About 68% of cases fall within 1 SD either side of the mean
  – About 95% within 2 SDs
  – About 99.7% within 3 SDs
    • e.g. see next slide
• Other % values can be calculated using statistical tables
  – Found in some statistics books
• Normal distribution is important for sampling & hypothesis testing
  – Many statistical tests assume data will be normally distributed
[Figure: normal curve; horizontal axis from -3sd through Mean to +3sd; 68.26% of cases within ±1sd, 95.44% within ±2sd, 99.7% within ±3sd]
• Normal distribution is an ‘ideal’ type of distribution
• However, it is unlikely that data sets will be normal
• When they are not normal, they are ‘skewed’
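The percentages on the normal curve do not have to be looked up in tables. For a normal distribution, the share of cases within k standard deviations of the mean equals erf(k/√2), which Python's standard `math.erf` can evaluate. The function name `share_within` is an assumption for this sketch. The results are roughly 68.27%, 95.45% and 99.73%; the figure's 68.26% and 95.44% are the slightly different values obtained from rounded table entries.

```python
from math import erf, sqrt


def share_within(k):
    """Share of a normal distribution lying within k SDs either side of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"Within {k} SD: {100 * share_within(k):.2f}%")
```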
Skew
• +ve skew
  – Data set has a few very large values
    • i.e. most values cluster to the left
  – The mean will be larger than the median
• -ve skew
  – Data set has a few very small values
    • i.e. most values cluster to the right
  – The mean will be smaller than the median
Skew
• Skew is typically found where
  – Sample sizes are small
  – Bias has been introduced in the sampling process
• Skewed distributions can be determined
  – Visually using a histogram
  – Statistically by calculating a co-efficient of skewness (sk)
Co-efficient of Skewness
• Indicates the direction of the skew (+ve or -ve)
• Greater co-efficient = greater skew
• Normal distribution will have a co-efficient of 0
\[ sk = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}} \]
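As a worked illustration of the formula, the sketch below applies it to the two salary data sets used earlier in the lecture (in £k). The function name `skewness` is hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the lecture does not specify which version to use.

```python
from statistics import mean, median, stdev

# Salaries in £k.
airways = [11, 15, 15, 18, 25, 30, 32, 38]
airport = [20, 21, 22, 23, 23, 24, 25, 26]


def skewness(values):
    """Co-efficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(values) - median(values)) / stdev(values)


for name, salaries in (("Airways", airways), ("Airport", airport)):
    print(f"{name}: sk = {skewness(salaries):.2f}")
```

The Airport salaries are symmetric around £23k, so mean and median coincide and the co-efficient is 0. For Airways the mean (£23k) exceeds the median (£21.5k), giving a positive co-efficient and therefore a positive skew.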
Summary
• Pictorial representation of quantitative data
  – 3 (4) levels of measurement
    • Nominal
    • Ordinal
    • Interval / ratio
  – Range of pictorial representation available
    • Choice is determined by the level of measurement
Summary
• Statistical representation of quantitative data
  – 3 measures of central tendency
    • Mean, median, mode
    • Choice is determined by the level of measurement
  – 3 measures of dispersion
    • Range, inter-quartile range, SD
    • Choice is determined by the level of measurement
  – Normal distribution & skew represent the distribution of responses
Recommended Reading
• Chapters 1-3 in Gaur, A.S. and Gaur, S.S. (2006). Statistical Methods for Practice and Research: A Guide to Data Analysis Using SPSS. New Delhi: Response Books.