Data Handling Collecting Data


Data Handling: Collecting Data

Learning Outcomes

Understand terms: sample, population, discrete, continuous and variable

Understand the need for different sampling techniques including random and stratified sampling and be able to generate random numbers with a calculator or computer to obtain a sample

Be able to design a questionnaire (taking bias into account)

Understand the need for grouping data and the importance of class limits and class boundaries when doing so

DH - Collecting Data

Sample:
A sample is a subset of the population. For example, class 11A would be a sample from any of the following populations: Year 11, senior pupils, or pupils of St Mary’s.

Population:
The complete set of individuals or objects being studied; the investigator decides what it is. E.g. pupils in a school, people in a town, people in a postal code.

Discrete:
A discrete variable is often associated with a count; it can take only certain values, usually whole numbers.
E.g. number of children in a family, number of cars in a street, number of people in a class.

DH - Collecting Data

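Continuous:
A continuous variable is often associated with a measurement; it can take any value within a given range.
E.g. height, weight, time.

Variable:
A variable is a quantity or characteristic that can change from one member of the population to another. A variable may be discrete or continuous (see above).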

DH - Collecting Data

Random Sampling:
In simple random sampling every member of the population has an equal chance of being selected. Each member of the population is given a number. If the population has 1000 members, they are each given a number from 000 to 999 (inclusive); 3-digit random numbers, generated with a calculator or computer, are then used to select the sample, ignoring any repeats.
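The by-hand method above can be sketched in Python. This is a minimal illustration, assuming the members are already numbered 000 to 999; the function name `simple_random_sample`, the sample size of 20 and the seed are hypothetical choices made for the example. In practice, Python's built-in `random.sample(range(1000), 20)` picks distinct members in a single call.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct member numbers from 0 .. population_size - 1.

    Mimics the by-hand method: keep generating random numbers and
    ignore any number that has already been chosen.
    """
    if sample_size > population_size:
        raise ValueError("sample cannot be larger than the population")
    rng = random.Random(seed)               # seeded generator so results can be reproduced
    chosen = []
    seen = set()
    while len(chosen) < sample_size:
        number = rng.randint(0, population_size - 1)   # e.g. a 3-digit number 000-999
        if number in seen:                  # ignore repeats
            continue
        seen.add(number)
        chosen.append(number)
    return chosen


# Hypothetical example: population of 1000 members numbered 000-999, sample of 20.
sample = simple_random_sample(1000, 20, seed=1)
print([f"{n:03d}" for n in sample])         # member numbers shown as 3 digits
```

Because every number from 000 to 999 is equally likely on each draw, and repeats are simply skipped, every member has the same chance of ending up in the sample.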

Stratified Sample:
Often a population is divided into sections (strata), e.g. the pupils in a school divided into year groups. In a stratified sample, the number sampled from each stratum is in proportion to that stratum's share of the total population. Here Year 10 has twice as many pupils as Year 8, so we should sample twice as many pupils from Year 10 as from Year 8.

Year    No. of Pupils
8       100
9       50
10      200
11      200
12      150
Total   700

DH - Collecting Data

Stratified Sample:
To obtain a sample of 70 pupils out of the 700, we construct the following table:

Year    No. of Pupils   Proportion of Total   No. of Pupils to be Sampled
8       100             100/700 = 1/7         1/7 × 70 = 10
9       50              50/700 = 1/14         1/14 × 70 = 5
10      200             200/700 = 2/7         2/7 × 70 = 20
11      200             200/700 = 2/7         2/7 × 70 = 20
12      150             150/700 = 3/14        3/14 × 70 = 15
Total   700                                   70
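The working in the table can be checked with a short Python sketch. It assumes the strata sizes are given as a dictionary, and the function name `stratified_allocation` is hypothetical. Fractions are kept exact (`fractions.Fraction`), so 100/700 is shown as 1/7 just as in the table.

```python
from fractions import Fraction


def stratified_allocation(strata_sizes, sample_size):
    """Return how many members to sample from each stratum.

    Each stratum gets (stratum size / population size) x sample size,
    kept as an exact fraction so the table's working can be checked.
    """
    total = sum(strata_sizes.values())
    allocation = {}
    for stratum, size in strata_sizes.items():
        proportion = Fraction(size, total)      # e.g. 100/700 = 1/7
        allocation[stratum] = proportion * sample_size
    return allocation


pupils_per_year = {8: 100, 9: 50, 10: 200, 11: 200, 12: 150}
allocation = stratified_allocation(pupils_per_year, 70)
for year, count in allocation.items():
    print(f"Year {year}: {count}")
```

Here every allocation comes out as a whole number. When it does not, the numbers have to be rounded so that they still add up to the required sample size, and the pupils within each year group are then chosen by simple random sampling.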

DH - Collecting Data: Questionnaires

1. The sample should represent the population.
2. The sample must be of a reasonable size to represent the population (at least 30), so that the sample mean is a good estimate of the population mean.
3. Questions should:
i) be as short as possible
ii) use tick boxes
iii) avoid bias
iv) avoid leading questions


Data Handling: Collecting Data

Learning Outcomes: At the end of the topic I will be able to
Can / Revise / Do Further

Understand terms: sample, population, discrete, continuous and variable

Understand the need for different sampling techniques including random and stratified sampling and be able to generate random numbers with a calculator or computer to obtain a sample

Be able to design a questionnaire (taking bias into account)

Understand the need for grouping data and the importance of class limits and class boundaries

Data Handling: Analysing Data

Learning Outcomes

Understand that in order to gain a mental picture of a collection of data it is necessary to obtain a measure of average and range

Be able to determine the mean, median and mode for a set of raw scores and an ungrouped frequency table

Be able to obtain the median and interquartile range for grouped data from a cumulative frequency graph

Understand the advantages and disadvantages of each average and measure of spread

DH - Analysing Data: Measures of Central Tendency

Mean
The sum of all the values divided by the number of values: $\bar{x} = \frac{\sum x}{n}$
✓ every value is included    × affected by extreme values

Mode
The most popular (most frequent) value.
× not every value is included    ✓ not affected by extreme values

Median
Arrange the data in ascending order; the median is the middle value, at position ½(n + 1).
× not every value is included    ✓ not affected by extreme values
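The three definitions can be written as short Python functions. This is a sketch for raw (ungrouped) data only; the function names are hypothetical, and `modes` returns a list because a data set can have more than one mode. Python's standard `statistics` module also provides `mean`, `median` and `multimode`.

```python
from collections import Counter


def mean(values):
    """Sum of all the values divided by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data, at position (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position, may end in .5
    lower = int(position)               # whole part of the position
    if position == lower:               # odd n: a single middle value
        return ordered[lower - 1]
    # even n: halfway between the two middle values
    return (ordered[lower - 1] + ordered[lower]) / 2


def modes(values):
    """All values that occur most often (there may be more than one)."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


data = [3, 4, 5, 6, 6]
print(mean(data), median(data), modes(data))   # → 4.8 5 [6]
```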

DH - Analysing Data: Measures of Central Tendency

Examples

Calculate the Mean, Median and Mode for:

a) 3, 4, 5, 6, 6

b) 2.4, 2.4, 2.5, 2.6

* In a normal distribution the mean, median and mode are equal; in example b) they are close together.
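Working for the two examples, using $\bar{x} = \frac{\sum x}{n}$ and the median position ½(n + 1):

```latex
\begin{aligned}
&\text{a) } \bar{x} = \frac{3+4+5+6+6}{5} = \frac{24}{5} = 4.8, \quad
  \text{median: position } \tfrac{1}{2}(5+1) = 3 \Rightarrow 5, \quad
  \text{mode} = 6 \\
&\text{b) } \bar{x} = \frac{2.4+2.4+2.5+2.6}{4} = \frac{9.9}{4} = 2.475, \quad
  \text{median: position } \tfrac{1}{2}(4+1) = 2.5 \Rightarrow \frac{2.4+2.5}{2} = 2.45, \quad
  \text{mode} = 2.4
\end{aligned}
```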

DH - Analysing Data: Frequency Distribution

The numbers of children in 30 families were surveyed. The results are given below.
Calculate
a) the mean number of children per family
b) the median

No. of children (x):   0   1   2   3   4   5
No. of families (f):   4   5  10   6   3   2
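In a frequency table each value x is counted f times, so the mean is Σfx ÷ Σf and the median is found by counting along the cumulative frequencies. For the survey data, Σfx = 0 + 5 + 20 + 18 + 12 + 10 = 65, so the mean is 65 ÷ 30 ≈ 2.17; the median is at position ½(30 + 1) = 15.5, and the 15th and 16th values are both 2, so the median is 2. A minimal Python sketch of both calculations (function names hypothetical):

```python
def frequency_mean(values, frequencies):
    """Mean of an ungrouped frequency table: sum(f * x) / sum(f)."""
    total_fx = sum(f * x for x, f in zip(values, frequencies))
    return total_fx / sum(frequencies)


def frequency_median(values, frequencies):
    """Median found by running through the cumulative frequencies.

    Assumes the values are listed in ascending order.
    """
    n = sum(frequencies)
    position = (n + 1) / 2                  # e.g. 15.5 for n = 30

    def value_at(rank):
        """Return the value of the item at 1-based rank in the ordered data."""
        cumulative = 0
        for x, f in zip(values, frequencies):
            cumulative += f
            if rank <= cumulative:
                return x
        raise ValueError("rank is larger than the number of items")

    lower = int(position)
    if position == lower:
        return value_at(lower)
    return (value_at(lower) + value_at(lower + 1)) / 2


children = [0, 1, 2, 3, 4, 5]
families = [4, 5, 10, 6, 3, 2]
print(round(frequency_mean(children, families), 2))   # 65 / 30
print(frequency_median(children, families))
```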

DH - Analysing Data: Grouped Frequency Distribution

Data are often grouped so that patterns and the shape of the distribution can be seen. Groups are often of equal width, although there is no rule that they must be.

Find the mean of:

Mark      Frequency (f)   Midpoint (x)   fx
30 – 34   7
40 – 49   14
50 – 59   21
60 – 69   9
          ∑f = 51
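To find the mean, each class is represented by its midpoint x, the fx column is filled in, and the mean is estimated as Σfx ÷ Σf. A minimal Python sketch (function name hypothetical), using the class limits exactly as given in the table; with these classes it gives Σfx = 224 + 623 + 1144.5 + 580.5 = 2572 and an estimated mean of 2572 ÷ 51 ≈ 50.4.

```python
def grouped_mean(classes):
    """Estimate the mean of grouped data: sum(f * midpoint) / sum(f).

    classes is a list of (lower limit, upper limit, frequency) tuples.
    The result is only an estimate, because each value in a class is
    treated as if it were equal to the class midpoint.
    """
    total_f = 0
    total_fx = 0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2      # e.g. (40 + 49) / 2 = 44.5
        total_f += f
        total_fx += f * midpoint
    return total_fx / total_f


marks = [(30, 34, 7), (40, 49, 14), (50, 59, 21), (60, 69, 9)]
print(round(grouped_mean(marks), 2))
```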

DH - Analysing Data: Cumulative Frequency Curves

Find the median of the following grouped frequency distribution.

Length    Frequency   Cumulative Frequency   Upper Limit
21 – 24   3
25 – 28   7
29 – 32   12
33 – 36   6
37 – 40   4
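Filling in the cumulative frequency column gives 3, 10, 22, 28, 32, so n = 32. By hand, a smooth cumulative frequency curve is drawn through the plotted points and the quartiles are read off it. The Python sketch below instead joins the points with straight lines (a cumulative frequency polygon) and interpolates linearly, so its estimates will differ somewhat from values read off a hand-drawn curve. It assumes lengths were recorded to the nearest whole unit, giving class boundaries 20.5, 24.5, 28.5, 32.5, 36.5 and 40.5, and uses the ¼(n + 1), ½(n + 1), ¾(n + 1) positions; the function names are hypothetical.

```python
from itertools import accumulate


def cumulative_frequencies(frequencies):
    """Running totals of the frequencies, e.g. [3, 7, 12] -> [3, 10, 22]."""
    return list(accumulate(frequencies))


def estimate_from_cumulative(boundaries, cumulative, position):
    """Read a value off the cumulative frequency polygon by linear interpolation.

    boundaries: lower boundary of the first class followed by each upper boundary
    cumulative: cumulative frequency at each upper boundary
    position:   the item to locate, e.g. (n + 1) / 2 for the median
    """
    points = list(zip(boundaries, [0] + cumulative))   # (boundary, cum. freq.)
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= position <= c1 and c1 > c0:
            return x0 + (position - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("position lies outside the data")


frequencies = [3, 7, 12, 6, 4]
boundaries = [20.5, 24.5, 28.5, 32.5, 36.5, 40.5]   # assumed class boundaries
cumulative = cumulative_frequencies(frequencies)    # [3, 10, 22, 28, 32]
n = cumulative[-1]

q1 = estimate_from_cumulative(boundaries, cumulative, (n + 1) / 4)
q2 = estimate_from_cumulative(boundaries, cumulative, (n + 1) / 2)
q3 = estimate_from_cumulative(boundaries, cumulative, 3 * (n + 1) / 4)
print(cumulative, round(q1, 2), round(q2, 2), round(q3, 2), round(q3 - q1, 2))
```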

DH - Analysing Data: Cumulative Frequency Curves

[Figure: cumulative frequency curve, cumulative frequency (vertical axis) against upper limit (horizontal axis), with Q1, Q2 and Q3 marked]

Median (Q2) = measure of central location
Interquartile range = Q3 – Q1 = measure of spread
Q1 = lower quartile = 25th percentile
Q3 = upper quartile = 75th percentile

Here n = 32:
Q1: position ¼(n + 1) = 8.25th value → 26
Q2: position ½(n + 1) = 16.5th value → 30
Q3: position ¾(n + 1) = 24.75th value → 33

Interquartile Range = Q3 – Q1 = 33 – 26 = 7


Data Handling: Analysing Data

Learning Outcomes: At the end of the topic I will be able to
Can / Revise / Do Further

Understand that in order to gain a mental picture of a collection of data it is necessary to obtain a measure of average and range

Be able to determine the mean, median and mode for a set of raw scores and an ungrouped frequency table

Be able to obtain the median and interquartile range for grouped data from a cumulative frequency graph

Understand the advantages and disadvantages of each average and measure of spread

Data Handling: Presenting Data

Learning Outcomes

Revise drawing of pie charts, line graphs and bar charts

Be able to present data using a stem and leaf diagram and determine the mean, median and quartiles

Be able to draw a boxplot for a set of values and compare two or more box and whisker plots with reference to their average, spread and skewness

Be able to draw a histogram to represent groups with unequal widths

Know which diagram to use to represent data, and the advantages and disadvantages of each type.

Be aware of the shape of a normal distribution and understand the concept of skewness

DH - Presenting Data: Box & Whisker Plots

A box and whisker plot shows:
a) the range of the data
b) the median of the data
c) the quartiles and interquartile range of the data
d) any indication of skew within the data

[Figure: box and whisker plot drawn against a scale, with Q1, Q2 (median) and Q3 marked]
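A box and whisker plot is drawn from five numbers: the minimum, Q1, the median, Q3 and the maximum. The Python sketch below computes them with the same (n + 1) position rules used for the cumulative frequency curve, interpolating when a position falls between two values; other textbooks and software use slightly different quartile conventions. It uses the data from the stem and leaf example. The skew test (comparing Q2 – Q1 with Q3 – Q2) is a common rule of thumb rather than a formal measure, and all function names are hypothetical.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position such as 4.5, interpolating between neighbours."""
    position = min(max(position, 1), len(ordered))   # keep within the data
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum: the five values a box plot shows."""
    ordered = sorted(values)
    n = len(ordered)
    return {
        "min": ordered[0],
        "Q1": value_at_position(ordered, (n + 1) / 4),
        "Q2": value_at_position(ordered, (n + 1) / 2),
        "Q3": value_at_position(ordered, 3 * (n + 1) / 4),
        "max": ordered[-1],
    }


def skew_indication(summary):
    """Compare the two halves of the box (a common rule of thumb)."""
    lower_half = summary["Q2"] - summary["Q1"]
    upper_half = summary["Q3"] - summary["Q2"]
    if upper_half > lower_half:
        return "positive skew"
    if upper_half < lower_half:
        return "negative skew"
    return "symmetric"


data = [10, 11, 12, 15, 23, 26, 29, 32, 33, 34, 35, 36, 42, 43, 44, 56, 57]
summary = five_number_summary(data)
print(summary)
print("range:", summary["max"] - summary["min"], "IQR:", summary["Q3"] - summary["Q1"])
print(skew_indication(summary))
```

For these 17 values Q1 is the 4.5th value (19), the median is the 9th value (33) and Q3 is the 13.5th value (42.5), so the range is 47 and the interquartile range is 23.5.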

DH - Presenting Data: Scatter Diagrams

[Figure: three scatter diagrams of y against x, showing positive, negative and no correlation]

Positive correlation: as x increases, y increases (x ▲ y ▲)
Negative correlation: as x increases, y decreases (x ▲ y ▼)
No correlation: there is no clear relationship between x and y

* The closer the points lie to a straight line, the stronger the correlation.

DH - Presenting Data: Histograms

32 packages were brought to the local post office. The masses of the packages were recorded as follows:

Mass, m (g)    No. of Packages
0 < m ≤ 30     3
30 < m ≤ 40    10
40 < m ≤ 50    12
50 < m ≤ 90    7

Because the class widths are unequal, we draw a histogram.

There are 2 important differences between a bar chart and a histogram:
1. In a bar chart the height of each bar represents the frequency; in a histogram the area of each bar represents the frequency.
2. In a histogram the x-axis is a continuous scale.

DH - Presenting Data: Histograms

Group          Frequency   Class Width   Frequency Density
0 < m ≤ 30     3           30
30 < m ≤ 40    10          10
40 < m ≤ 50    12          10
50 < m ≤ 90    7           40

When the classes are of unequal width, we calculate and plot the frequency density:

$\text{Frequency density} = \frac{\text{Frequency}}{\text{Class width}}$
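Filling in the last column gives 3 ÷ 30 = 0.1, 10 ÷ 10 = 1, 12 ÷ 10 = 1.2 and 7 ÷ 40 = 0.175. The same calculation as a short Python sketch (function name hypothetical), which also checks that the bar areas (frequency density × class width) add back up to the 32 packages:

```python
def frequency_densities(classes):
    """Frequency density = frequency / class width for each class.

    classes is a list of (lower boundary, upper boundary, frequency) tuples.
    """
    densities = []
    for lower, upper, frequency in classes:
        width = upper - lower
        densities.append(frequency / width)
    return densities


packages = [(0, 30, 3), (30, 40, 10), (40, 50, 12), (50, 90, 7)]
densities = frequency_densities(packages)
print(densities)

# Area of each bar (density x width) gives back the frequency.
total = sum(d * (upper - lower) for d, (lower, upper, _) in zip(densities, packages))
print(round(total))
```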

DH - Presenting Data: Stem & Leaf Diagram

When data are grouped to draw a histogram or a cumulative frequency distribution, individual results are lost, although the advantage of grouping is that patterns (the shape of the distribution) can be seen. In a stem and leaf diagram individual results are retained and the spread and distribution of the data can still be seen.

Draw a stem and leaf diagram for the data:
10, 11, 12, 15, 23, 26, 29, 32, 33, 34, 35, 36, 42, 43, 44, 56, 57

Stem | Leaf
1    |
2    |
3    |
4    |
5    |
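A minimal Python sketch that builds the diagram, assuming non-negative two-digit whole numbers with the tens digit as the stem and the units digit as the leaf (the function name is hypothetical). Sorting first puts the leaves in order, and a key is printed because a diagram without one is ambiguous.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Split each whole number into a stem (tens) and a leaf (units)."""
    diagram = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)      # e.g. 23 -> stem 2, leaf 3
        diagram[stem].append(leaf)
    # include empty stems so gaps in the data stay visible
    return {stem: diagram[stem] for stem in range(min(diagram), max(diagram) + 1)}


data = [10, 11, 12, 15, 23, 26, 29, 32, 33, 34, 35, 36, 42, 43, 44, 56, 57]
for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
print("Key: 1 | 0 means 10")
```

For this data the rows are 1 | 0 1 2 5, 2 | 3 6 9, 3 | 2 3 4 5 6, 4 | 2 3 4 and 5 | 6 7.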


Data Handling: Presenting Data

Can / Revise / Do Further

Revise drawing of pie charts, line graphs and bar charts

Be able to present data using a stem and leaf diagram and determine the mean, median and quartiles

Be able to draw a boxplot for a set of values and compare two or more box and whisker plots with reference to their average, spread and skewness

Be able to draw a histogram to represent groups with unequal widths

Know which diagram to use to represent data, and the advantages and disadvantages of each type.

Be aware of the shape of a normal distribution and understand the concept of skewness