
Page 1: S1: Chapters 2-3 Data: Location and Spread Dr J Frost (jfrost@tiffin.kingston.sch.uk) Last modified: 5 th September 2014.

S1: Chapters 2-3
Data: Location and Spread
Dr J Frost (jfrost@tiffin.kingston.sch.uk)

Last modified: 5th September 2014

Page 2:

Types of variables

In statistics, we can use a variable (e.g. $x$) to represent some quantity, e.g. height, age. This could be qualitative (e.g. favourite colour) or quantitative (i.e. numerical).

Variables are often used differently in statistics than they are in algebra.

In statistics, $\Sigma x$ would mean: “Sum over the values of the variable we’ve collected (i.e. our data).”

2 types of variable:

Discrete variables

Have specific values. e.g. Shoe size, colour, website visits in an hour period, number of siblings, …

Continuous variables

Can have any value in a range. e.g. Height, distance, weight, time, wavelength, …

Page 3:

Quartiles for large numbers of items

What item do we use for each quartile when the number of items $n$ is:

n     LQ     Median         UQ
31    8th    16th           24th
19    5th    10th           15th
6     2nd    3rd and 4th    5th
14    4th    7th and 8th    11th

Under what circumstances do we not round?

When we have a grouped frequency table involving a continuous variable.

Rule: Find 1/4 or 1/2 or 3/4 of $n$. Then:
• If not whole, round up.
• If whole, use this item and the one after.
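The rounding rule can be sketched in a few lines of Python. This is only an illustration: the function name `quartile_items` is made up here, and positions are counted from 1 as on the slide.

```python
def quartile_items(n):
    """Return the 1-indexed item positions used for the LQ, median and UQ."""
    def items(numerator):
        # Work out n * numerator / 4 with integers to avoid rounding error.
        whole, remainder = divmod(n * numerator, 4)
        if remainder:                  # not whole: round up
            return (whole + 1,)
        return (whole, whole + 1)      # whole: this item and the one after
    return {"LQ": items(1), "Median": items(2), "UQ": items(3)}

print(quartile_items(31))  # {'LQ': (8,), 'Median': (16,), 'UQ': (24,)}
print(quartile_items(6))   # {'LQ': (2,), 'Median': (3, 4), 'UQ': (5,)}
```

When two positions are returned, the quartile is taken halfway between those two items.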

Page 4:

3

2.51.5 3.5

1.5 2 4.5

2 3.5 5

21? ? ?


Quickfire Quartiles

1, 2, 3

LQ Median UQ

1, 2, 3, 4

1, 2, 3, 4, 5

1, 2, 3, 4, 5, 6

Page 5:

Notation for quartiles/percentiles

Lower Quartile: $Q_1$

Median: $Q_2$

Upper Quartile: $Q_3$

57th Percentile: $P_{57}$

Page 6:

Grouped Frequency Data Recap

Height of bear (in metres) Frequency

This type of data is continuous.

Estimate of Mean: $\bar{x} = \frac{\Sigma fx}{\Sigma f}$

What does the variable $x$ represent?
The midpoints of each interval. They’re effectively a sensible single value used to represent each interval.

Why the ‘bar’ (horizontal line) over the $x$?
It’s the sample mean of $x$. It indicates that our mean is just based on a sample, rather than the whole population.

Why is our mean just an estimate?
Because we don’t know the exact heights within each group. Grouping data loses information.
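As a rough sketch of the midpoint method in Python: the bear table itself did not survive in this transcript, so the class intervals below are an assumption. The first two rows are inferred from the midpoints 0.25 and 0.85 quoted in the summary slide, and the last two were chosen so that the totals agree with the quoted Σf = 40 and Σfx = 46.75.

```python
# Class intervals as (lower bound, upper bound, frequency).
# ASSUMPTION: this table is reconstructed, not copied from the slide.
bear_heights = [
    (0.0, 0.5, 4),
    (0.5, 1.2, 20),
    (1.2, 1.5, 5),    # assumed row
    (1.5, 2.5, 11),   # assumed row
]

def estimated_mean(table):
    """Estimate the mean of grouped data using the midpoint of each class."""
    total_f = sum(f for _, _, f in table)
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in table)
    return total_fx / total_f

print(round(estimated_mean(bear_heights), 2))
```

The result is only an estimate because every bear in a class is treated as if its height were the class midpoint.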

Page 7:

Grouped Frequency Data Recap

Height of bear (in metres) Frequency

Modal class interval: $0.5 \le h < 1.2$ (‘modal’ means ‘most’)

Median class interval: $0.5 \le h < 1.2$. There are 40 items, so determine where the 20th item is.

Page 8:

Using STATS mode on your calculator

Height of bear (in metres) Frequency

1. Go to SETUP (SHIFT MODE). Press down for the second page of the menu, and select STAT. You want Frequency ‘ON’. (Note that you won’t have to do this again in future.)
2. MODE STAT
3. Select 1-VAR (as there is only “1 variable” here – frequency is not a variable!)
4. Enter your x values, pressing = after each one. Navigate to the top of your table to enter your frequencies.
5. Press AC to ‘bank’ your table.
6. SHIFT 1 for ‘STAT’. Select either ‘Sum’ or ‘Var’. Once you’ve selected a statistic to use, it’ll appear in your calculation. Once you want to calculate the value, press =. Try entering $\bar{x}$. (For this example: 1.16875)
7. MODE COMP to go back to normal computation mode.

Warning: You still need to show working in the exam.

Work out the mean for this example first using proper workings.

Important note: Confusingly, your calculator means $\Sigma fx$ when you enter $\Sigma x$, and $\Sigma fx^2$ when you enter $\Sigma x^2$, i.e. it’s interpreting the data as if it was listed out with duplicates.

Page 9:

Weight of cat to nearest kg Frequency

What’s different about the intervals here?

There are GAPS between intervals! What interval does 10−12 actually represent?

9.5−12.5 (lower class boundary 9.5, upper class boundary 12.5)

Class width = 3
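The boundary adjustment can be written as a small Python helper. The name `class_boundaries` and the `precision` parameter are made up for illustration; `precision=1` corresponds to data recorded “to the nearest kg”.

```python
def class_boundaries(lower, upper, precision=1):
    """True class boundaries for an interval recorded to the nearest `precision`."""
    half = precision / 2
    return lower - half, upper + half

low, high = class_boundaries(10, 12)
print(low, high, high - low)  # 9.5 12.5 3.0
```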

Page 10:

Identify the class width

Distance travelled (in m) …
Lower class boundary = 200
Class width = 10

Time taken (in seconds) …
Lower class boundary = 3.5
Class width = 3

Weight in kg …
Lower class boundary = 29
Class width = 2

Speed (in mph) …
Lower class boundary = 30.5
Class width = 10

Page 11:

Interpolation

S2 – Chapters 2/3

Page 12:

RECAP: Quartiles of Frequency Table

Age of squirrel    Frequency    Cumulative Freq
1                  5            5
2                  8            13
3                  11           24
4                  5            29

LQ: 29 squirrels. 29 ÷ 4 = 7.25, so look at 8th squirrel. Occurs within second group, so $Q_1 = 2$.

Median: 29 ÷ 2 = 14.5, so use 15th squirrel. Occurs in third group, so $Q_2 = 3$.

UQ: 3/4 × 29 = 21.75, so use 22nd squirrel. Still in third group, so $Q_3 = 3$.
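The cumulative-frequency lookup for the squirrel table can be sketched in Python. The helper name `value_of_item` is illustrative only.

```python
import math
from itertools import accumulate

ages = [1, 2, 3, 4]
freqs = [5, 8, 11, 5]

def value_of_item(position, values, frequencies):
    """Return the value of the item at a 1-indexed position in a frequency table."""
    for value, running_total in zip(values, accumulate(frequencies)):
        if position <= running_total:
            return value
    raise ValueError("position exceeds the total frequency")

n = sum(freqs)  # 29 squirrels
# n/4, n/2 and 3n/4 are not whole numbers here, so each one is rounded up.
positions = [math.ceil(n * k / 4) for k in (1, 2, 3)]
quartiles = [value_of_item(p, ages, freqs) for p in positions]
print(positions, quartiles)  # [8, 15, 22] [2, 3, 3]
```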

Page 13:

Estimating the median

Answer = 13.5 + 8 = 21.5

GCSE Question

Page 14:

Estimating the median

At GCSE, you were only required to give the median class interval when dealing with grouped data. Now, we want to estimate a value within that class interval.

Weight of cat to nearest kg Frequency

Median?

Frequency up until this interval: 9
Frequency at end of this interval: 18
Item number we’re interested in: 11
Weight at start of interval: 15.5kg
Weight at end of interval: 18.5kg

(Why not the 11.5 item?)
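Linear interpolation within the median class can be sketched in Python. It assumes the slide's numbers read as: cumulative frequency 9 before the interval, 18 at its end, item 11 wanted, and class boundaries 15.5kg to 18.5kg. The function name `interpolate` is illustrative.

```python
def interpolate(item, freq_before, freq_at_end, lower, upper):
    """Estimate the value of an item inside a class by linear interpolation."""
    # How far through the class the item lies, as a fraction of its frequency.
    fraction = (item - freq_before) / (freq_at_end - freq_before)
    return lower + fraction * (upper - lower)

median = interpolate(item=11, freq_before=9, freq_at_end=18, lower=15.5, upper=18.5)
print(round(median, 2))  # 16.17
```

The item is 2 of the way through the 9 cats in the class, so the estimate sits 2/9 of the way along the 3kg class width.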

Page 15:

Estimating other values

Weight of cat to nearest kg Frequency

LQ ?

UQ ?

34th Percentile ?

Page 16:

You should have a sheet in front of you

1a years
1b years
1c years
1d Interquartile Range: years
2a cm
2b cm
2c cm

Page 17:

Exercises

Page 34 Exercise 3A Q4, 5, 6

Page 36 Exercise 3B Q1, 3, 5

Page 18:

Variance and Standard Deviation

S2 – Chapters 2/3

Page 19:

What is variance?

[Figure: two plots, “Distribution of IQs in L6Ms4” and “Distribution of IQs in L6Ms5”, each with IQ on the horizontal axis (110 marked) and Frequency on the vertical axis.]

Here are the distributions of IQs in two classes. What’s the same, and what’s different?

Page 20:

Variance

Variance is how spread out data is. Variance, by definition, is the average squared distance from the mean.

$\sigma^2 = \frac{\Sigma (x - \bar{x})^2}{n}$

$x - \bar{x}$: distance from mean…
$(x - \bar{x})^2$: squared distance from mean…
$\frac{\Sigma (x - \bar{x})^2}{n}$: average squared distance from mean…

Page 21:

Simpler formula for variance

“The mean of the squares minus the square of the mean (‘msmsm’)”

Variance

$\text{Variance} = \frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2$

Standard Deviation

$\sigma = \sqrt{\text{Variance}}$

The standard deviation can ‘roughly’ be thought of as the average distance from the mean.
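A quick numerical check in Python that the definition and the ‘msmsm’ shortcut agree. The function names are illustrative, and the data are the five heights from the Starter slide.

```python
import math

def variance_definition(data):
    """Average squared distance from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n

def variance_msmsm(data):
    """Mean of the squares minus the square of the mean."""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2

heights = [2, 3, 3, 5, 7]
print(round(variance_definition(heights), 3), round(variance_msmsm(heights), 3))
print(round(math.sqrt(variance_msmsm(heights)), 3))  # standard deviation
```

Both functions divide by n, matching the formulas on these slides rather than the n − 1 version used for some sample estimates.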

Page 22:

Starter

Calculate the variance and standard deviation of the following heights:

2cm 3cm 3cm 5cm 7cm

Variance = 3.2 cm²

Standard Deviation = 1.79 cm (to 3sf)

Page 23:

Practice

Find the variance and standard deviation of the following sets of data.

2 4 6
Variance = 2.67 Standard Deviation = 1.63

1 2 3 4 5
Variance = 2 Standard Deviation = 1.41

Page 24:

Extending to frequency/grouped frequency tables

We can just mull over our mnemonic again:

Variance: “The mean of the squares minus the square of the mean (‘msmsm’)”

$\text{Variance} = \frac{\Sigma fx^2}{\Sigma f} - \left(\frac{\Sigma fx}{\Sigma f}\right)^2$

Bro Tip: It’s better to try and memorise the mnemonic than the formula itself – you’ll understand what’s going on better, and the mnemonic will be applicable when we come onto random variables in Chapter 8.

Page 25:

Example

Height of bear (in metres) Frequency

$\Sigma fx = 46.75 \quad \Sigma fx^2 = 67.81 \quad \Sigma f = 40$

$\text{Variance} = \frac{67.81}{40} - \left(\frac{46.75}{40}\right)^2 = 0.33$

Page 26:

Sometimes we’re helpfully given summed data:

Shoe Size Frequency

$\Sigma ft = 252 \quad \Sigma ft^2 = 2914 \quad \Sigma f = 22$

$\text{Variance} = \frac{2914}{22} - \left(\frac{252}{22}\right)^2 = 1.25$
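When only the sums are given, the same mnemonic translates directly into Python. The function name `variance_from_sums` is illustrative, and the two calls use the bear and shoe-size totals quoted on these slides.

```python
def variance_from_sums(sum_f, sum_fx, sum_fx2):
    """Variance as the mean of the squares minus the square of the mean."""
    mean = sum_fx / sum_f
    mean_of_squares = sum_fx2 / sum_f
    return mean_of_squares - mean ** 2

print(round(variance_from_sums(40, 46.75, 67.81), 2))  # 0.33 (bear heights)
print(round(variance_from_sums(22, 252, 2914), 2))     # 1.25 (shoe sizes)
```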

Page 27:

Exercises

Page 40 Exercise 3C Q1, 2, 4, 6

Page 44 Exercise 3D Q1, 4, 5

Page 28:

Recap


Page 29:

Coding

S2 – Chapters 2/3

Page 30:

Starter

What do you reckon is the mean height of people in this room?

Now, stand on your chair, as per the instructions below.

INSTRUCTIONAL VIDEO

Is there an easy way to recalculate the mean based on your new heights? And the variance of your heights?

Page 31:

Starter

Suppose that, after a bout of ‘stretching you to your limits’, you’re all now 3 times your original height.

What do you think happens to the standard deviation of your heights?

It becomes 3 times larger (i.e. your heights are 3 times as spread out!)

What do you think happens to the variance of your heights?

It becomes 9 times larger.

(Can you prove the latter using the formula for variance?)
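One possible proof substitutes the stretched heights into the ‘mean of the squares minus the square of the mean’ formula. The symbol $y = 3x$ for the stretched heights is introduced here for illustration only.

```latex
\begin{align*}
\text{Var}(y) &= \frac{\Sigma y^2}{n} - \left(\frac{\Sigma y}{n}\right)^2
  \quad \text{where } y = 3x \\
&= \frac{\Sigma (3x)^2}{n} - \left(\frac{\Sigma 3x}{n}\right)^2 \\
&= \frac{9\,\Sigma x^2}{n} - 9\left(\frac{\Sigma x}{n}\right)^2 \\
&= 9\left[\frac{\Sigma x^2}{n} - \left(\frac{\Sigma x}{n}\right)^2\right]
 = 9\,\text{Var}(x)
\end{align*}
```

Taking the square root of both sides gives a standard deviation 3 times larger, which agrees with the first answer.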

Page 32:

The point of coding

Cost of diamond ring (£): £1010 £1020 £1030 £1040 £1050

We ‘code’ our variable using the following:

$y = \frac{x - 1000}{10}$

New values $y$: £1 £2 £3 £4 £5

Standard deviation of $y$ ($\sigma_y$): $\sqrt{2} \approx 1.41$, therefore…

Standard deviation of $x$ ($\sigma_x$): $10\sqrt{2} \approx 14.1$
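The diamond ring example can be checked numerically in Python. The helper `std_dev` is illustrative and uses the ‘msmsm’ formula from earlier in these slides.

```python
import math

def std_dev(data):
    """Standard deviation via mean of the squares minus square of the mean."""
    n = len(data)
    return math.sqrt(sum(x * x for x in data) / n - (sum(data) / n) ** 2)

costs = [1010, 1020, 1030, 1040, 1050]
coded = [(x - 1000) / 10 for x in costs]   # y = (x - 1000) / 10

sigma_y = std_dev(coded)
# Undo the division by 10. Subtracting 1000 does not change the spread.
sigma_x = 10 * sigma_y
print(round(sigma_y, 2), round(sigma_x, 2), round(std_dev(costs), 2))
```

The last two printed values should match, showing that working with the small coded values and scaling back gives the same answer as working with the original prices.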

Page 33:

Finding the new mean/variance

Old mean    Old variance    Coding              New mean    New variance
36          4               y = x − 20          16          4
36          4               y = 2x              72          16
35          4               y = 3x − 20         85          36
40          6               y = x/2             20          3/2
11          27              y = (x + 10)/3      7           3
300         125             y = (x − 100)/5     40          5
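Every row of the table follows one pattern, sketched here in Python: for a coding of the form y = ax + b, the mean becomes a × mean + b and the variance becomes a² × variance. The function name `code_stats` and its parameters are illustrative, and `Fraction` is used only to keep the divisions exact.

```python
from fractions import Fraction

def code_stats(mean, variance, scale, shift):
    """Mean and variance of y = scale * x + shift."""
    return scale * mean + shift, scale ** 2 * variance

print(code_stats(35, 4, 3, -20))  # (85, 36)

# y = (x - 100) / 5 is the same as y = (1/5)x - 20
new_mean, new_var = code_stats(300, 125, Fraction(1, 5), -20)
print(new_mean, new_var)  # 40 5
```

The shift changes only the mean, and the scale factor is squared for the variance.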

Page 34:

Exercises

Page 26 Exercise 2E Q3, 4

Page 47 Exercise 3E Q2, 3, 5, 7

Page 35:

Chapters 2-3 Summary

I have a list of 30 heights in the class. What item do I use for:

• $Q_1$? 8th
• $Q_2$? Between 15th and 16th
• $Q_3$? 23rd

For the following grouped frequency table, calculate:

Height of bear (in metres) Frequency

a) The estimated mean: $\bar{h} = \frac{(0.25 \times 4) + (0.85 \times 20) + \dots}{40} = \frac{46.75}{40} = 1.17\text{m to 3sf}$

b) The estimated median: $0.5 + \left(\frac{16}{20} \times 0.7\right) = 1.06\text{m}$

c) The estimated variance (you’re given $\Sigma fh^2 = 67.8125$): $\sigma^2 = \frac{67.8125}{40} - \left(\frac{46.75}{40}\right)^2 = 0.329 \text{ to 3sf}$

Page 36:

Chapters 2-3 Summary

What is the standard deviation of the following lengths: 1cm, 2cm, 3cm

The mean of a variable is 11 and the variance . The variable is coded using . What is:

a) The mean of ?
b) The variance of ?

A variable is coded using . For this new variable , the mean is 15 and the standard deviation 8. What is:

c) The mean of the original data?
d) The standard deviation of the original data?