Data Handling Collecting Data
Uploaded by axel-townsend
Learning Outcomes
Understand terms: sample, population, discrete, continuous and variable
Understand the need for different sampling techniques including random and stratified sampling and be able to generate random numbers with a calculator or computer to obtain a sample
Be able to design a questionnaire (taking bias into account)
Understand the need for grouping data and the importance of class limits and class boundaries when doing so
DH - Collecting Data Data Handling
Sample:
A sample is a subset of the population. For example, class 11A is a subset of
each of the following populations: Year 11, senior pupils, pupils of St Mary’s.
Population:
The complete set of individuals or objects being analysed; the investigator
defines it. E.g. pupils in a school, people in a town, people in a postal
code.
Discrete:
A discrete variable is usually associated with a count; it can only take
certain values, usually whole numbers.
E.g. number of children in a family, number of cars in a street, number of
people in a class.
Continuous:
A continuous variable is usually associated with a measurement; it can
take any value in a given range.
E.g. height, weight, time.
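Variable:
A characteristic that can take different values from one individual to the
next. A variable is either discrete or continuous (see above).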
Random Sampling:
In simple random sampling every member of the population is given a
number. If the population has 1000 members, they are each given a
number between 000 and 999 (inclusive); 3-digit random numbers are then
used to select the sample (ignore repeats).
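The procedure above can be sketched on a computer. This is a minimal sketch, assuming a population of 1000 members numbered 000 to 999 and a sample of 30; the function name `simple_random_sample` and the `seed` parameter are illustrative and do not come from the notes.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Pick distinct member numbers from 0 .. population_size - 1."""
    rng = random.Random(seed)  # a seed makes the run repeatable
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randrange(population_size)  # e.g. a number 000-999
        if number not in chosen:  # ignore repeats
            chosen.append(number)
    return chosen


# Assumed sizes: 1000 members, sample of 30.
sample = simple_random_sample(population_size=1000, sample_size=30, seed=1)
print([f"{number:03d}" for number in sample])  # show as 3-digit numbers
```

Python's built-in `random.sample(range(1000), 30)` draws without repeats in a single call; the loop is written out here only to mirror the "ignore repeats" step.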
Stratified Sample:
Often data is collected in sections (strata), e.g. the number of pupils in
each year of a school. In selecting such a sample, data is taken from each
stratum in proportion to its share of the total population. Here we should
sample twice as many pupils in year 10 as in year 8.

Year    No. of Pupils
8       100
9       50
10      200
11      200
12      150
Total   700
To obtain a sample of 70 pupils out of the 700, we construct the
following table:

Year    No. of Pupils   Proportion of total   No. of pupils to be sampled
8       100             100/700 = 1/7         1/7 × 70 = 10
9       50              50/700 = 1/14         1/14 × 70 = 5
10      200             200/700 = 2/7         2/7 × 70 = 20
11      200             200/700 = 2/7         2/7 × 70 = 20
12      150             150/700 = 3/14        3/14 × 70 = 15
Total   700                                   70
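The proportional allocation in the table can be checked with a short script. This is a sketch only; the function name `stratified_allocation` is illustrative, and exact fractions are used so the arithmetic matches the table.

```python
from fractions import Fraction


def stratified_allocation(strata_sizes, sample_size):
    """Share the sample between strata in proportion to their size."""
    total = sum(strata_sizes.values())
    return {
        name: Fraction(size, total) * sample_size
        for name, size in strata_sizes.items()
    }


# Year -> number of pupils, as in the table.
pupils = {8: 100, 9: 50, 10: 200, 11: 200, 12: 150}
allocation = stratified_allocation(pupils, sample_size=70)
for year, count in allocation.items():
    print(f"Year {year}: {count} pupils")
```

In this example every share is a whole number. In general the shares would need rounding, and the rounded values should be checked so that they still add up to the required sample size.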
DH - Collecting Data Questionnaires
1. The sample should represent the population.
2. The sample must be of a reasonable size to represent the population
(at least 30), so that the sample mean is close to the population mean.
3. Questions should:
i) be as short as possible
ii) use tick boxes
iii) avoid bias
iv) avoid leading questions
Data Handling Analysing Data
Learning Outcomes
Understand that in order to gain a mental picture of a collection of data it is necessary to obtain a measure of average and range
Be able to determine the mean, median and mode for a set of raw scores and an ungrouped frequency table
Be able to obtain the median and interquartile range for grouped data from a cumulative frequency graph
Understand the advantages and disadvantages of each average and measure of spread
DH - Analysing Data Measures of Central Tendency
Mean
Sum of all measures divided by total number of measures.
$\bar{x} = \frac{\sum x}{n}$
✓ every value included
× affected by extremes
Mode
Most popular / most frequent occurrence.
× not every value included
✓ not affected by extremes
Median
Arrange data in ascending order; the median is the middle measure. Position = ½ (n + 1)
× not every value included
✓ not affected by extremes
Examples
Calculate the Mean, Median and Mode for:
a) 3, 4, 5, 6, 6,
b) 2.4, 2.4, 2.5, 2.6
* In a normal distribution the mean, median and mode coincide; for roughly symmetric data they are close together, e.g. example b)
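The three averages can be checked with Python's standard `statistics` module. This is a sketch using the two data sets exactly as printed above; the helper name `averages` is illustrative.

```python
import statistics

# Data sets as printed in examples a) and b).
examples = {
    "a": [3, 4, 5, 6, 6],
    "b": [2.4, 2.4, 2.5, 2.6],
}


def averages(data):
    """Return the mean, median and mode of a list of raw scores."""
    return {
        "mean": statistics.mean(data),      # sum of values / number of values
        "median": statistics.median(data),  # middle value of the sorted data
        "mode": statistics.mode(data),      # most frequent value
    }


results = {label: averages(data) for label, data in examples.items()}
for label, result in results.items():
    print(label, result)
```

For an even number of values, as in b), `statistics.median` averages the two middle values, which agrees with the position rule ½ (n + 1).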
DH - Analysing Data Frequency Distribution
The number of children in each of 30 families surveyed is recorded.
The results are given below.
Calculate
a) The mean number of children per family
b) The median

No. of children (x)   0   1   2   3   4   5
No. of families (f)   4   5   10  6   3   2
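One way to check answers to this exercise is sketched below. It assumes the mean of a frequency table is Σfx ÷ Σf and that the median sits at position ½ (n + 1) as defined earlier; the function names are illustrative.

```python
import math


def frequency_mean(values, frequencies):
    """Mean of an ungrouped frequency table: sum of f*x over sum of f."""
    total_fx = sum(x * f for x, f in zip(values, frequencies))
    return total_fx / sum(frequencies)


def frequency_median(values, frequencies):
    """Median found at position (n + 1) / 2 in the ordered data."""
    ordered = [x for x, f in zip(values, frequencies) for _ in range(f)]
    position = (len(ordered) + 1) / 2           # 1-based position
    lower = ordered[math.floor(position) - 1]   # value just below the position
    upper = ordered[math.ceil(position) - 1]    # value just above the position
    return (lower + upper) / 2


children = [0, 1, 2, 3, 4, 5]   # x
families = [4, 5, 10, 6, 3, 2]  # f

mean_children = frequency_mean(children, families)
median_children = frequency_median(children, families)
print(round(mean_children, 2), median_children)
```

With 30 families the median position is the 15.5th value, so the function averages the 15th and 16th values in the ordered list.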
DH - Analysing Data Grouped Frequency Distribution
Often data is grouped so that patterns and the shape of the distribution can be seen. Class widths can be the same, although no rule requires it.
Find the mean of:
Mark      Frequency (f)   Midpoint (x)   fx
30 – 34   7
40 – 49   14
50 – 59   21
60 – 69   9
          ∑f = 51
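The midpoint method for estimating the mean of grouped data can be sketched as follows. The class limits are copied exactly as printed in the table, and the midpoint is assumed to be halfway between the lower and upper limit of each class; the function name `grouped_mean` is illustrative.

```python
def grouped_mean(classes):
    """Estimate the mean of grouped data from the class midpoints."""
    total_f = 0
    total_fx = 0
    for (lower, upper), frequency in classes:
        midpoint = (lower + upper) / 2  # x: assumed halfway between limits
        total_f += frequency            # running sum of f
        total_fx += frequency * midpoint  # running sum of fx
    return total_fx / total_f


# ((lower limit, upper limit), frequency), as printed in the table.
marks = [((30, 34), 7), ((40, 49), 14), ((50, 59), 21), ((60, 69), 9)]

estimated_mean = grouped_mean(marks)
print(round(estimated_mean, 2))
```

The result is only an estimate, because grouping replaces every mark in a class by the class midpoint.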
DH - Analysing Data Cumulative Frequency Curves
Find the median of the following grouped frequency distribution.
Length    Frequency   Cumulative Frequency   Upper Limit
21 – 24   3
25 – 28   7
29 – 32   12
33 – 36   6
37 – 40   4
[Figure: cumulative frequency curve. Vertical axis: cumulative frequency; horizontal axis: upper limit. The positions of Q1, Q2 and Q3 are marked.]
Median = measure of central location
Interquartile range = measure of spread = Q3 – Q1
Q1 = 25th percentile
Q3 = 75th percentile
Q1 = ¼ (n + 1) = 8.25th → 26
Q2 = ½ (n + 1) = 16.5th → 30
Q3 = ¾ (n + 1) = 24.75th → 33
Interquartile Range = Q3 – Q1
= 33 – 26 = 7
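The cumulative frequencies and quartile positions used above can be reproduced with a short script. This sketch only finds the running totals, the positions ¼, ½ and ¾ of (n + 1), and the class each quartile falls in; the values 26, 30 and 33 in the notes are read from the curve and are not computed here. The function names are illustrative.

```python
from itertools import accumulate

# Classes and frequencies from the length table.
classes = ["21 – 24", "25 – 28", "29 – 32", "33 – 36", "37 – 40"]
frequencies = [3, 7, 12, 6, 4]

cumulative = list(accumulate(frequencies))  # running totals
n = cumulative[-1]                          # total frequency


def quartile_positions(total):
    """Positions of Q1, Q2 and Q3 using the (n + 1) rule."""
    return {
        "Q1": (total + 1) / 4,
        "Q2": (total + 1) / 2,
        "Q3": 3 * (total + 1) / 4,
    }


def class_containing(position):
    """Return the first class whose cumulative frequency reaches the position."""
    for label, running_total in zip(classes, cumulative):
        if position <= running_total:
            return label
    raise ValueError("position is beyond the total frequency")


positions = quartile_positions(n)
for name, position in positions.items():
    print(name, position, class_containing(position))
```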
Data Handling Presenting Data
Learning Outcomes
Revise drawing of pie charts, line graphs and bar charts
Be able to present data using a stem and leaf diagram and determine the mean, median and quartiles
Be able to draw a boxplot for a set of values and compare two or more box and whisker plots with reference to their average, spread and skewness
Be able to draw a histogram to represent groups with unequal widths
Know which diagram to use to represent data, the advantages and disadvantages of each type.
Be aware of the shape of a normal distribution and understand the concept of skewness
DH - Presenting Data Box & Whisker Plots
A box & whisker plot illustrates:
a) The range of data
b) The median of data
c) The quartiles and interquartile range of data
d) Any indication of skew within the data
[Figure: box and whisker plot drawn above a scale, with Q1, Q2 and Q3 marked]
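The five values needed to draw a box and whisker plot can be found as sketched below. The data set is hypothetical and is not taken from these notes. The sketch uses `statistics.quantiles` with `method="exclusive"`, which places the quartiles at positions ¼, ½ and ¾ of (n + 1) in the ordered data; the function name `five_number_summary` is illustrative.

```python
import statistics


def five_number_summary(data):
    """Minimum, quartiles and maximum: the values a box plot displays."""
    ordered = sorted(data)
    # method="exclusive" uses positions based on (n + 1).
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    return {"min": ordered[0], "Q1": q1, "Q2": q2, "Q3": q3, "max": ordered[-1]}


# Hypothetical data, chosen only for illustration.
values = [12, 15, 17, 20, 22, 25, 31]

summary = five_number_summary(values)
data_range = summary["max"] - summary["min"]
interquartile_range = summary["Q3"] - summary["Q1"]
print(summary, data_range, interquartile_range)
```

Comparing Q2 – Q1 with Q3 – Q2 gives a rough indication of skew: if the two distances are very different, the box is lopsided.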
DH - Presenting Data Scatter Diagrams
[Figure: three scatter diagrams of y against x]
Positive Correlation: as x increases, y increases
Negative Correlation: as x increases, y decreases
No Correlation: x & y are independent
* The closer the points lie to a straight line, the stronger the correlation
DH - Presenting Data Histograms
32 packages were brought to the local post office. The masses of the packages were recorded as follows:
Mass (g)          0 < m ≤ 30   30 < m ≤ 40   40 < m ≤ 50   50 < m ≤ 90
No. of packages   3            10            12            7
With unequal class widths we draw a histogram.
There are 2 important differences between a bar chart and a histogram:
1. In a bar chart the height of the bar represents the frequency; in a histogram the area of the bar represents the frequency.
2. In a histogram the ‘x’ axis is a continuous scale.
Group          Frequency   Class Width   Frequency Density
0 < m ≤ 30     3           30
30 < m ≤ 40    10          10
40 < m ≤ 50    12          10
50 < m ≤ 90    7           40
When the classes are of unequal width we calculate and plot frequency density.
$\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}$
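The frequency density column can be filled in with a short calculation. This is a sketch using the package data from the table; the function name `frequency_densities` is illustrative.

```python
def frequency_densities(classes):
    """Frequency density = frequency / class width, for each class."""
    return [
        frequency / (upper - lower)
        for lower, upper, frequency in classes
    ]


# (lower bound, upper bound, frequency) for each mass class in grams.
packages = [(0, 30, 3), (30, 40, 10), (40, 50, 12), (50, 90, 7)]

densities = frequency_densities(packages)
for (lower, upper, frequency), density in zip(packages, densities):
    print(f"{lower} < m <= {upper}: {density:.3f}")

# The bar areas (density x width) add back up to the total frequency.
total_area = sum(d * (upper - lower) for d, (lower, upper, _) in zip(densities, packages))
print(round(total_area))
```

The final check illustrates why frequency density is plotted: the area of each bar, not its height, represents the frequency.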
DH - Presenting Data Stem & Leaf Diagram
When data are grouped to draw a histogram or a cumulative frequency distribution, individual results are lost. The advantage of grouping is that patterns (distribution) can be seen. In a stem and leaf diagram individual results are retained and the spread / distribution of the data can be seen.
Draw a stem and leaf diagram for the data:
10, 11, 12, 15, 23, 26, 29, 32, 33, 34, 35, 36, 42, 43, 44, 56, 57
Stem Leaf
1
2
3
4
5
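A stem and leaf diagram for two-digit data can be built by splitting each value into its tens digit (stem) and units digit (leaf). This is a sketch using the data listed above; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

data = [10, 11, 12, 15, 23, 26, 29, 32, 33, 34, 35, 36, 42, 43, 44, 56, 57]


def stem_and_leaf(values):
    """Group sorted values by tens digit (stem), keeping units digits (leaves)."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 23 -> stem 2, leaf 3
        rows[stem].append(leaf)
    return dict(rows)


diagram = stem_and_leaf(data)
print("Stem | Leaf")
for stem, leaves in diagram.items():
    print(f"{stem:>4} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 1 | 0 means 10")
```

Because every individual value is retained, the median and quartiles can be read directly from the diagram by counting along the leaves.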