Time Series in Official Stats: Statistical Thinking and Communication
about Variation over Time
STOR 481: 14 Oct 2015
Emma Mawby & Sonya McGlone: Statistics New Zealand
Contents (green: activities, on paper, to discuss):
1. Introduction
2. Two fascinating series, and reflections on them
3. What are TS, and what do you do with them?
   What do OS people do: filtering and seasonal adjustment
   Electronic card transactions
4. iNZight: smart new software
(Break)
5. Births per quarter, and the Poisson distribution
6. Assignment 5 Time Series questions
7. The challenges and opportunities in Official Stats
8. Summary: signal and noise
And, if we have time:
9. Big ideas in Time Series and Official Stats
10. Earnings and OSS issues
11. Term Test (Richard Arnold)
A 1 minute challenge
• Write down all the ways that you have ever accessed outputs produced by Statistics New Zealand.
• e.g. looked at a media release on http://www.stats.govt.nz/
Some tweets about Official Statistics
1: Introduction: Aims:
1. The world of ‘Official Stats’ time series (essential, exhilarating, accessible)
2. iNZight
3. Apply statistical thinking and communication skills to variation in time series
4. Access and enjoy Assignment 5 questions
CO2 at Baring Head (Wellington)
[Figure: CO2 at Baring Head, 1973 to 2003 (values roughly 320 to 380), with a model fitted by linear regression: y = 1.4749x - 2584.7, R² = 0.9956]
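The slide reports a straight-line trend fitted by ordinary least squares. This is a minimal sketch using only the Python standard library. It evaluates the slide's own fitted equation and fits such a line to any pair of lists. The function names are hypothetical. The slides do not include the Baring Head readings, so the fitting function is checked on a made-up exact line rather than on real data. The slide also does not state units; parts per million is the usual unit for CO2 records of this kind.

```python
from typing import List, Tuple

# Coefficients as printed on the slide: y = 1.4749x - 2584.7 (x = year)
SLOPE, INTERCEPT = 1.4749, -2584.7


def fitted_co2(year: float) -> float:
    """Evaluate the slide's fitted regression line at a given year."""
    return SLOPE * year + INTERCEPT


def ols_line(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept; returns (slope, intercept, R^2)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


if __name__ == "__main__":
    # The fitted line rises 1.4749 units per year: about 325.3 in 1973 and 369.5 in 2003
    print(round(fitted_co2(1973), 2), round(fitted_co2(2003), 2))
```

The slope is the average annual increase. An R² close to 1 means the straight line accounts for almost all of the variation in the series over this period.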
Learning objectives for STOR 481:
1. key aspects of Official Statistics
2. legal and ethical constraints on organisations producing Official Statistics
3. principal methods for data collection, analysis and interpretation of health, social and economic data, including spatial data
4. methods for presenting and preparing commentaries on Official Statistics
Resources
• Statistics New Zealand homepage: http://www.stats.govt.nz/
• Stories about data, e.g. Labour Market Statistics: http://www.stats.govt.nz/browse_for_stats/income-and-work/employment_and_unemployment/LabourMarketStatistics_HOTPJun15qtr.aspx
• Data:
  NZ.Stat: http://nzdotstat.stats.govt.nz/wbos/Index.aspx
  Infoshare (to eventually be replaced by NZ.Stat): http://www.stats.govt.nz/infoshare/
• Demonstration: http://www2.stats.govt.nz/domino/external/web/aboutsnz.nsf/htmldocs/Seasonal+decomposition+demonstration
• Background on TS and Seasonal Adjustment: http://www.stats.govt.nz/surveys_and_methods/methods/data-analysis/seasonal-adjustment.aspx
• Software: iNZight and its time series module: http://www.stat.auckland.ac.nz/~wild/iNZight/
2. What is Statistics all about? One answer is:
_ _ _ _ _ t _ _ n
What is Statistics all about? One answer is:
Variation
Which occurs:
• in estimates from samples
• across time (that’s us today)
• across a population or sample
Where does ‘variation’ arise in OS? What does it look like?
Nov 2011
• Cross-sectional data: Income (NZIS)
• Series data: Guest nights: backpackers
• Inference: Income: 100 means: SuperSURF
[Figure: Guest nights, backpackers, monthly, 1996 M07 to 2011 M07 (0 to 600,000): Actual, Seas Adj and Trend series]
Official Stats and Time Series:
[Diagram labels: Official Stats; Stats; Admin data; Time Series stats]
[Figure: Graph of my happiness score for Tuesday 13th October 2015: score (2 to 10) versus time (0 to 25)]
Activity 1: Two fascinating series: Unemployment rates, quarterly, Male and Female, 1986 Q1 to 2013 Q2
1.1 What has happened over these 27 years?
1.2 How does this data get collected? Does it have sampling error?
1.3 Why are there high values in some of the Q1s (first quarters)?
1.4 What are these series going to do next?
1.5 What are these ‘Quarter’ things that official stats folk are so keen on?
[Figure: Unemployment rates (%), quarterly, 1986 Q1 to 2013 Q2: UnempRateMale, UnempRateFemale]
Yes, the unemployment rate does have a sampling error (SE). SEs are published from 1990 Q2 and are found via resampling (the jackknife).
[Figure: Male and Female Unemployment Rates ± Sampling Errors, 1986 Q1 to 2013: M-SE, M+SE, F-SE, F+SE]
http://asq.org/quality-progress/2008/07/statistics-roundtable/statistics-roundtable-the-trusty-jackknife.html
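The jackknife recomputes an estimate with each sampling unit left out in turn. It then uses the spread of those recomputed estimates to measure sampling error. This is a minimal sketch of the simple delete-one version, applied to a hypothetical list of labour-force respondents (1 = unemployed, 0 = employed). It is not Statistics NZ's production method, which must respect the survey's sample design. The names `jackknife_se` and `unemployment_rate` are illustrative.

```python
import math
from typing import Callable, List, Sequence


def jackknife_se(data: Sequence[float], estimator: Callable[[List[float]], float]) -> float:
    """Delete-one jackknife standard error of estimator(data)."""
    values = list(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    # Leave-one-out estimates: drop observation i, then re-estimate
    loo = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    # Jackknife variance: (n - 1) / n times the sum of squared deviations
    variance = (n - 1) / n * sum((est - loo_mean) ** 2 for est in loo)
    return math.sqrt(variance)


def unemployment_rate(flags: List[float]) -> float:
    """Percent of a (hypothetical) labour-force sample flagged as unemployed."""
    return 100 * sum(flags) / len(flags)


if __name__ == "__main__":
    sample = [1] * 6 + [0] * 94  # hypothetical: 6 unemployed among 100 respondents
    rate = unemployment_rate(sample)
    print(f"rate = {rate:.1f}%, jackknife SE = {jackknife_se(sample, unemployment_rate):.2f}")
```

For a simple mean or proportion, the delete-one jackknife reproduces the familiar s/√n. Its usefulness is that the same code works unchanged for estimators with no simple SE formula.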
3: What are time series?...
A time series is a statistical record of a particular social or economic activity, with the data usually measured at regular intervals over a period of time.
… and what do we do with them? Time series are analysed to:
• understand the past
• predict the future
• A time series analysis quantifies the main features in the data (the “signal”) and the random variation (the “noise”)
So what are TS?? E.g.: http://www.stats.govt.nz/infoshare/SelectVariables.aspx?pxID=d294d46d-0a80-4628-8095-b1273a8186c5
1878 - 1957 - 2010: Use with caution
Year  Recorded offences  Resolved offences
1957             81,998             49,473
1958             85,153             54,592
1959             88,071             52,994
1960            102,792             66,857
1961             96,384             56,170
1962            115,921             62,014
1963            113,942             66,992
1964            118,422             71,914
1965            132,311             73,294
1966            135,374             77,465
1967            139,737             79,409
1968            149,103             85,025
1969            153,914             88,773
1970            165,859             94,785
1971            177,924             91,301
1972            189,283             96,625
1973            192,079             98,778
1974            206,115            101,593
1975            223,362            105,389
1976            232,376            109,937
1977            243,619            104,982
1978            245,640             88,110
1979            257,922             99,121
1980            286,789            107,235
1981            294,015            108,557
1982            309,843            114,857
1983            336,155            122,976
1984            347,453            125,426
1985            370,844            120,714
1986            376,558            124,461
1987            368,712            125,527
1988            378,122            129,822
1989            384,928            128,330
1990            409,747            124,984
1991            446,417            133,441
1992            464,596            141,301
1993            462,536            162,854
1994            447,525            171,453
1995            465,052            170,649
1996            477,596            175,751
1997            473,547            176,299
1998            461,677            175,176
1999            438,074            170,299
2000            427,230            177,034
2001            426,526            179,007
2002            440,129            184,465
2003            442,489            192,540
2004            406,363            181,344
2005            407,496            176,362
2006            424,137            185,227
2007            426,384            194,768
2008            431,383            201,419
2009            451,405            215,618
2010            426,345            202,545
Police recorded crime
So what do we do with it now???
Crime data: Obvious things to do:
• Graph the series:
[Figure: Recorded offences and Resolved offences, 1957 onwards (0 to 500,000)]
• Divide, to get Percent Resolved:
[Figure: Percent Resolved, 1957 onwards (0% to 100%)]
• And divide by Population to get Rates (offences per person) (from 1991)
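These divisions are simple to script. This is a minimal sketch that uses three rows copied from the Police recorded crime table (1957, 1993 and 2010). The slides do not give the population figures needed for rates, so `offences_per_1000` is a hypothetical helper, exercised only with made-up inputs.

```python
from typing import List, Tuple

# (year, recorded offences, resolved offences), copied from the Police recorded crime table
CRIME: List[Tuple[int, int, int]] = [
    (1957, 81_998, 49_473),
    (1993, 462_536, 162_854),
    (2010, 426_345, 202_545),
]


def percent_resolved(recorded: int, resolved: int) -> float:
    """Resolved offences as a percentage of recorded offences."""
    return 100 * resolved / recorded


def offences_per_1000(offences: int, population: int) -> float:
    """Hypothetical helper: offences per 1,000 people, given a population estimate."""
    return 1000 * offences / population


if __name__ == "__main__":
    for year, recorded, resolved in CRIME:
        print(year, f"{percent_resolved(recorded, resolved):.1f}% resolved")
```

For these three rows, the percentage resolved is about 60% in 1957, 35% in 1993 and 48% in 2010.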
Components of a time series
The actual values of a time series are made up of the following components:
• Trend
• Long term cycle
• Seasonal component
• Irregular component
We assume that some relationship exists between them. It is either multiplicative: A = C × S × I, or additive: A = C + S + I. Here A is the actual series, C the trend-cycle (trend plus long term cycle), S the seasonal component and I the irregular component.
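The additive model can be made concrete with a classical decomposition, which has three steps:
1. Estimate the trend-cycle C with a centred moving average.
2. Average the detrended values A - C by quarter to get S.
3. Subtract S from A to seasonally adjust.

This is a textbook sketch for a quarterly series that is assumed to start in a Q1 and to contain at least two full years. It is not the X-13-ARIMA-SEATS procedure mentioned later in these slides. For the multiplicative model, one simple option is to apply the same steps to the logarithms of the series. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def centred_ma4(y: List[float]) -> List[Optional[float]]:
    """2x4 centred moving average for quarterly data; None where the window runs off the ends."""
    n = len(y)
    trend: List[Optional[float]] = [None] * n
    for t in range(2, n - 2):
        trend[t] = (0.5 * y[t - 2] + y[t - 1] + y[t] + y[t + 1] + 0.5 * y[t + 2]) / 4
    return trend


def decompose_additive(y: List[float]) -> Tuple[List[Optional[float]], List[float], List[float]]:
    """Classical additive decomposition A = C + S + I of a quarterly series starting in Q1.

    Returns (trend-cycle C, seasonal effects S for Q1..Q4, seasonally adjusted A - S).
    """
    if len(y) < 8:
        raise ValueError("need at least two full years of quarterly data")
    trend = centred_ma4(y)
    sums, counts = [0.0] * 4, [0] * 4
    for t, (actual, c) in enumerate(zip(y, trend)):
        if c is not None:
            sums[t % 4] += actual - c  # detrended value: S + I
            counts[t % 4] += 1
    raw = [s / k for s, k in zip(sums, counts)]
    # Centre the seasonal effects so they sum to zero over a year
    offset = sum(raw) / 4
    seasonal = [r - offset for r in raw]
    adjusted = [actual - seasonal[t % 4] for t, actual in enumerate(y)]
    return trend, seasonal, adjusted


if __name__ == "__main__":
    # Synthetic series: linear trend plus a fixed quarterly pattern, with no irregular
    pattern = [3.0, -1.0, -2.0, 0.0]
    series = [10 + 0.5 * t + pattern[t % 4] for t in range(20)]
    _, seasonal, adjusted = decompose_additive(series)
    print([round(v, 3) for v in seasonal])
```

On this synthetic series, the moving average passes the straight-line trend through unchanged and cancels the quarterly pattern. The estimated seasonal effects therefore recover the pattern, and the adjusted series is the trend line.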
Filtering, seasonal adjustment and decomposition
Statistics New Zealand time series tend to be one of:
• The “actual” series
• Seasonally adjusted series: with the regular seasonal component removed
• Trend series: just the trend-cycle component
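Trend series are produced by filtering. In X-11-style methods, including the X-11 option in X-13-ARIMA-SEATS, the trend-cycle is estimated by smoothing the seasonally adjusted series with Henderson moving averages. This sketch applies only the symmetric 5-term Henderson filter, with its commonly published weights rounded to three decimals. Real trend estimates also need asymmetric end weights for the most recent points; this sketch simply leaves those points as None. The function name is hypothetical.

```python
from typing import List, Optional

# Symmetric 5-term Henderson weights, rounded to three decimals; they sum to 1
HENDERSON_5 = [-0.073, 0.294, 0.558, 0.294, -0.073]


def henderson_trend(series: List[float], weights: List[float] = HENDERSON_5) -> List[Optional[float]]:
    """Apply a symmetric moving-average filter; points without a full window are None."""
    half = len(weights) // 2
    n = len(series)
    trend: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        trend[t] = sum(w * series[t + j - half] for j, w in enumerate(weights))
    return trend


if __name__ == "__main__":
    # A straight line passes through a symmetric filter whose weights sum to 1 unchanged
    line = [2.0 * t + 1.0 for t in range(10)]
    print(henderson_trend(line))
```

The missing end values show why published trend figures for the latest periods are provisional: once new data arrive, the end points get a full window and are revised.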
Activity 2: A monthly series, filtered and seasonally adjusted (by Stats NZ):
2.1 Describe the features of the variation in debit card transactions.
2.2 Why does Stats NZ publish the Seasonally Adjusted series?
2.3 Imagine that you own a business that receives mainly debit card transactions, and you get Stats NZ’s latest information release. Of the three series (Actual, Seasonally adjusted, Trend), which might you use, and why?
2.4 What do you expect to happen next in the series?
[Figure: Time series plot for Debit ($million), plotted on a 2004 to 2016 axis (1,500 to 4,000)]
4: iNZight: an intro (8 slides): http://www.stat.auckland.ac.nz/~wild/iNZight/
[Screenshot annotations: Get some data; TS, and other goodies, here; TS here]
Our unemployment TS [screenshot annotations: Use; Ignore]
Results:
[Figure: Time series plot for Total.Both.Sexes, 1985 to 2015]
Decomposition:
Seasonal features:
For 2 or more series: use Multi-Plot
Results: multiplicative
5: Activity 3: Births per quarter
3.1 What do you think the two series (male births, female births) look like? What features might they have? Sketch your guesses in.
3.2 Can you think of a sensible way to model this? Which distribution would be appropriate? Assumptions?
[Figure: Births per quarter, actual data]
Births per quarter and the Poisson distribution
If we assume the number of male or female births per quarter is Poisson with lambda = 7,077, then the two births series would look like this:
[Figure: Births per Quarter, Poisson, 1976 Q1 to 2013 Q2: BirthsPoisson (axis 5,000 to 9,000)]
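To see what this Poisson model implies, one can simulate it. This is a minimal sketch that assumes, as the slide does, a constant mean λ = 7,077 births per quarter for each series over 1976 Q1 to 2013 Q2. Python's standard `random` module has no Poisson sampler, so the helpers below use Knuth's multiplication method on chunks of λ. This works because a sum of independent Poisson counts is itself Poisson. All names and seeds are arbitrary choices for illustration.

```python
import math
import random
from typing import List


def poisson_small(lam: float, rng: random.Random) -> int:
    """Knuth's method: count uniforms multiplied before the product drops to exp(-lam).

    Only suitable for small lam, since exp(-lam) underflows for large values.
    """
    limit = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def poisson(lam: float, rng: random.Random, chunk: float = 50.0) -> int:
    """Poisson(lam) draw built as a sum of independent Poisson pieces of at most `chunk`."""
    total = 0
    while lam > chunk:
        total += poisson_small(chunk, rng)
        lam -= chunk
    return total + poisson_small(lam, rng)


def quarter_labels(start_year: int, start_q: int, end_year: int, end_q: int) -> List[str]:
    """Labels such as '1976Q1' from the start quarter to the end quarter inclusive."""
    labels = []
    year, q = start_year, start_q
    while (year, q) <= (end_year, end_q):
        labels.append(f"{year}Q{q}")
        year, q = (year + 1, 1) if q == 4 else (year, q + 1)
    return labels


def simulate_births(lam: float, n_quarters: int, seed: int) -> List[int]:
    """One simulated series of quarterly counts under a constant-mean Poisson model."""
    rng = random.Random(seed)
    return [poisson(lam, rng) for _ in range(n_quarters)]


if __name__ == "__main__":
    quarters = quarter_labels(1976, 1, 2013, 2)
    male = simulate_births(7077, len(quarters), seed=1)
    female = simulate_births(7077, len(quarters), seed=2)
    print(len(quarters), min(male + female), max(male + female))
```

With λ = 7,077, the Poisson standard deviation is only √7,077 ≈ 84. The simulated series therefore stay within a few hundred of 7,077. Much larger swings in the actual births data, such as trend or seasonality, are more than Poisson variation around a constant mean can explain.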
6. Four slides: STOR 481: 2015: Assignment 5: Time Series questions (shortened version):
Note: Assignment 5 will include questions from the Data Visualisation, Time Series and Macroeconomic Statistics lectures
Please install iNZight: http://www.stat.auckland.ac.nz/~wild/iNZight/, and try its Time Series option. You’ll find this under the Advanced tab. In iNZight’s Data folder, you’ll find time series datasets for practice.
To use a time series dataset from Infoshare (from the Statistics NZ website) in iNZight, you need to simplify it so that it contains only simple headings and the columns of data, and then save it as a csv file.
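The simplification step can also be scripted. This is a minimal sketch that assumes an export whose data rows start with a period label such as 1996M07 (monthly) or 2013Q2 (quarterly). It also assumes the title, heading and footnote rows should be dropped. The exact Infoshare layout may differ, so the pattern and the `simplify_export` name are hypothetical and should be checked against a real download. The example input below is made up.

```python
import csv
import io
import re
from typing import List

# Assumed period-label formats: 1996M07 (monthly) or 2013Q2 (quarterly)
PERIOD = re.compile(r"^\d{4}(M\d{2}|Q[1-4])$")


def simplify_export(raw_text: str, headings: List[str]) -> str:
    """Keep only data rows (first cell is a period label) under simple, user-chosen headings."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(headings)
    for row in csv.reader(io.StringIO(raw_text)):
        if row and PERIOD.match(row[0].strip()):
            writer.writerow([cell.strip() for cell in row[: len(headings)]])
    return out.getvalue()


if __name__ == "__main__":
    # Made-up example of an export with a title row, a heading row and a footnote row
    raw = "Accommodation Survey\n,Hotels,Motels\n1996M07,100,200\n1996M08,110,210\nTable information:\n"
    print(simplify_export(raw, ["Time", "Hotels", "Motels"]), end="")
```

Saving the returned text with a .csv extension gives a file that contains only simple headings and the columns of data.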
3: Number of Guest nights from the Accommodation Survey
The Accommodation Survey consists of several series describing the number of guest nights spent in different types of accommodation in New Zealand. These series are found in the Industry sectors section of the Statistics New Zealand website: www.stats.govt.nz.
Statistics NZ Home > Browse for statistics > Industry sectors > Accommodation
Please read all sections of the “Accommodation Survey: August 2015” release. Also, please examine the second download, which contains tables and components of Accommodation Survey data for the last twelve months. Also, note the short Media Release.
3.1 (12 marks) Choose one accommodation type from Hotels, Motels or Backpackers. Describe the behaviour of the “Number of guest nights” for the period July 1996 to August 2015 for this accommodation type. You’ll need to discuss the usual components of time series and any other feature or features that the number of guest nights shows.
Now describe the behaviour of the “Number of guest nights” for the period July 1996 to August 2015 for “Holiday parks”.
Now describe the differences between the two series.
3.2 (2 marks) Why do you think the series “total excluding holiday parks” is published as well as the series “total”?
3.3 (4 marks) As an Official Statistics agency, Statistics NZ aims to convey information about very complex situations to very wide audiences. Discuss and give examples of the communication methods that Statistics NZ uses to tell the stories that come from Accommodation Survey Statistics.
End of Assignment 5 slides.
More media coverage…
http://www.stuff.co.nz/national/politics/5932813/Stats-NZ-anger-at-Labours-bias-claim (09/11/11)
The end: enjoy the assignment!
http://xkcd.com/418/
AND IF WE HAVE TIME…: Supplementary slides
7: Challenges in TS for Official Stats
• The monitoring of TS production: dealing with the unexpected
• Outliers: detecting them, finding causes for them, dealing with them, assessing their effect on seasonals
• Level shifts: distinguishing them from noise or seasonal movements
• Trading day and holiday effects
• Timeliness: incoming data with a tail (e.g. tax); publish fast and revise a lot, or publish slower and revise a little, and …
[Figure: A fake quarterly series with rogue outliers, quarters 1 to 20 (values 0 to 35)]
Are they:
• the new trend
• the new seasonal
• just one-offs??
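One simple way to start on detecting outliers in a seasonal series has two steps. First, compare each value with the median of nearby values from the same quarter. Then flag residuals that are large relative to a robust spread, the median absolute deviation (MAD). This is an illustrative sketch with arbitrary settings (four neighbours, threshold 3.5). It is not the regression-based outlier testing in X-13-ARIMA-SEATS. On its own, it cannot decide whether a flagged point is a one-off or the start of a new level or seasonal pattern. The function names and the test series are hypothetical.

```python
import statistics
from typing import List


def same_period_residuals(y: List[float], period: int = 4, n_neighbours: int = 4) -> List[float]:
    """Residual of each value from the median of its nearest same-quarter values."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    resid = []
    for t, value in enumerate(y):
        others = [s for s in range(t % period, len(y), period) if s != t]
        others.sort(key=lambda s: abs(s - t))  # nearest same-quarter observations first
        neighbours = [y[s] for s in others[:n_neighbours]]
        resid.append(value - statistics.median(neighbours))
    return resid


def flag_outliers(y: List[float], period: int = 4, k: float = 3.5) -> List[int]:
    """Indices whose residual is more than k robust standard deviations from the median residual."""
    resid = same_period_residuals(y, period)
    centre = statistics.median(resid)
    mad = statistics.median([abs(r - centre) for r in resid])
    scale = 1.4826 * mad  # MAD rescaled to match a standard deviation for normal data
    if scale == 0:
        return [t for t, r in enumerate(resid) if r != centre]
    return [t for t, r in enumerate(resid) if abs(r - centre) > k * scale]


if __name__ == "__main__":
    # Fake quarterly series: seasonal pattern plus trend, with one rogue value
    pattern = [10, 20, 15, 12]
    y = [pattern[t % 4] + 0.5 * t for t in range(20)]
    y[9] += 30
    print(flag_outliers(y))
```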
Research:
• Use of ARIMA models to reduce revisions of seasonally adjusted estimates
• Error bounds on seasonally adjusted series
Implementation of new tools:
• New seasonal adjustment user interface
• X-13-ARIMA-SEATS
• Sensitivity analysis tool
Collaboration with our Australian counterparts:
• Cross-centre training
• Regular meetings
7: Challenges cont’d: hot issues:
7… the Official Statisticians’ TS Dilemma: Do we:
A: keep the variable definition the same forever, and watch it go out of date
B: update the definition and break the series
C: do something smart: what???
EG: ANZSIC 1996 -> ANZSIC 2006 Aust and NZ Standard Industrial Classification
EG: Employment-related series
8: Summary: signal and noise
Time Series in Official Stats:
• great opportunities to apply Statistical Thinking to Variation
• using: intuition-based concepts and powerful software tools
• great new data sources: admin and others
• for vitally important issues that are: social, environmental, economic, scientific.
9: Big ideas in TS and/or OSS
1. Data visualisation and time
2. Longitudinal collections
3. Admin data
4. Integrated data: IDI
5. Census linkage
9.1 Data visualisation and time
Nature gave us 3 spatial dimensions (and R agrees!)
Isn’t there a 4th dimension? (Hans Rosling agrees) http://www.gapminder.org/
That’s all about TS, and mostly international OS TS
A different DV using TS: www.christchurchquakemap.co.nz
A dynamic view of the Australian population etc etc etc etc:
http://www.abs.gov.au/websitedbs/d3310114.nsf/home/Population%20Pyramid%20-%20Australia
http://www.abs.gov.au/websitedbs/D3310114.nsf/home/Interact+with+our+data
http://betaworks.abs.gov.au/betaworks/betaworks.nsf/projects/dual_pyramid/frame.htm
9.3 Longitudinal collections 1: There are 2 sorts of dataset:
1: cross-sectional
2: time-series
Is there a third sort??
Yes! And most Official Stats collections are like that!
That’s us today
Longitudinal collections 2: e.g.
• SoFIE (Survey of Family, Income and Employment), with 8 waves (2001 … 2008): 10k people
• LISNZ (Longitudinal Immigration Survey NZ), with 3 waves
• Integrated Data Infrastructure (IDI)
Longit. collections 3: millions of TS!
[Figure: Fake quarterly earnings series from tax data, for 100 people]
9.4 Administrative Data: Stats NZ intends to become an Admin Data First agency.
E.g.s of Admin Data sources:
• Tax
• Benefits
• Student Loans and Allowances
• Educational Outcomes
• Migration
• Electronic Card Transactions
• Retail barcode scanning
• Births, deaths
• and plenty more
Most sources are: longitudinal, administrative, full-coverage, other people’s.
9.5 The ultimate integrated system?: Integrated Data Infrastructure: IDI
[Diagram elements: Central Linking Concordance (CLC); Longitudinal Business database; Person to business link; Education, secondary & tertiary: Ministry of Education; Tax data: Inland Revenue; Student loans & allowances: Inland Revenue & Ministry of Social Development; Labour Force, Income Surveys; Benefits: Ministry of Social Development; Migration data: Department of Labour; Outputs: Relevant releases, Dynamic datasets, Cutting edge cubes, Rich research]
9.6 Census linkage
• UK’s Office of National Stats (ONS): in 10-yearly censuses from 1971
• Australian Bureau of Stats (ABS): one linked pair, 2006-2011
• Stats NZ: five linked pairs, spanning 1981 to 2006
10: Activity 4: Earnings: issues in official stats: sources, quality, measures
4.1 Where do we get data detailed enough for us to use Industry Classification Level 3?
4.2 Is it admin data, or data collected for statistical purposes?
4.3 Is the collection full-coverage or sampled?
4.4 What errors might it have?
4.5 What’s happening to the earnings? Why??
4.6 What’s the start date for the LEED (Linked Employer Employee Database) data?
4.7 What analyses and transformations would you do to this?
4.8 Were you thinking of going into the fungus trade?
[Figure: Earnings, per Quarter, 1999Q2 to 2012Q1, LEED, ANZSIC06 Level 3 (axis 0 to 10,000): Median: mushroom & veg growers; Mean: mushroom & veg growers; Mean: All industries; Median: All industries]
iNZight and forecasting: 2 slides:
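The slides do not say which forecasting method iNZight uses. As a stand-in, this is one of the simplest benchmarks for a seasonal series: a seasonal naive forecast with drift. It repeats the latest value for the same quarter and adds the average year-on-year change. This is a minimal sketch; the function name and the example history are hypothetical.

```python
from typing import List


def seasonal_naive_drift(y: List[float], h: int, period: int = 4) -> List[float]:
    """Forecast h steps ahead: latest same-season value plus the average annual change."""
    n = len(y)
    if n < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    # Average change from one year to the next (observations `period` steps apart)
    changes = [y[t] - y[t - period] for t in range(period, n)]
    drift = sum(changes) / len(changes)
    forecasts = []
    for step in range(1, h + 1):
        years_ahead = (step + period - 1) // period  # ceil(step / period)
        base = y[n - 1 + step - period * years_ahead]  # same quarter in the last observed year
        forecasts.append(base + years_ahead * drift)
    return forecasts


if __name__ == "__main__":
    # Hypothetical quarterly history: linear growth plus a fixed seasonal pattern
    history = [100 + 2 * t + [5, -3, 1, -3][t % 4] for t in range(12)]
    print(seasonal_naive_drift(history, 4))  # [129.0, 123.0, 129.0, 127.0]
```

For a series that really is a straight line plus a repeating seasonal pattern, this benchmark continues it exactly. Comparing a more elaborate method against such a baseline shows whether the extra complexity earns its keep.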
Comparisons: More to think about:
[Figure: Mean and Median Earnings: Auckland and NZ: Quarterly: 1999 Q2 to 2007 Q2 (axis 5,000 to 15,000): Mean Earnings - Ak; Median Earnings - Ak; Mean Earnings - NZ; Median Earnings - NZ]