The Many Ways of Improving the Industrial Coding for Statistics Canada’s Business Register
Yanick Beaucage
ICES III
June 2007
Overview
Background
Automatic Coding
Manual Coding
Quality Evaluation of Classification Updates
Quality Assurance Survey
Conclusion
Background
STC’s Business Register Redesign:
Improve administrative data links
Improve treatment of births/deaths
Reflect business reality
Give update privileges to a larger set of people
Develop a quality assurance program
Part of the quality assurance program is ensuring good industrial classification
Background
Good industrial classification:
Leads to better population identification
Leads to smaller sample size
Leads to reduced collection cost
Leads to better precision
Prevents frustration among respondents (and interviewers)
Background
[Diagram, built up over several slides. Labels: Canada Revenue Agency; Automatic; Manual; Business Register; Updates; QE; QAS; Statistics Canada]
Automatic Coding
New businesses apply for a Business Number (BN) at the Canada Revenue Agency (CRA)
In person, over the phone, over the internet, ...
What is the description of the main business activity?
Decision tree tool used by CRA
Prompts for details needed for coding
Returns a robot-phrase to Statistics Canada
Automatic Coding
Assign classification based on robot-phrase
Improving decision tree tool and usage:
Re-developed on micro (originally mainframe)
Expand use for Web BN application (currently used for phone or in-person registration)
Develop questions for all sectors
Currently used for 75% of all industrial sectors
Covers 90% of all descriptions to be coded
Automatic Coding
Automated Coding by Text Recognition (ACTR)
If description is too general: manual coding
Used to assign classification based on descriptions
Reference file (French and English)
Parsing strategy
Word weighting algorithm
Score derived
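The components listed on this slide (reference file, parsing, word weighting, score) can be illustrated with a minimal sketch. It is not ACTR's actual algorithm: the parsing rules, the rarity-based weights, the weighted-overlap score, the 0.6 threshold, and the reference phrases and codes below are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical reference file: phrase -> industry code (illustrative values only).
REFERENCE = {
    "retail bakery": "A1",
    "commercial bakery": "A2",
    "hair salon": "B1",
    "residential house painting": "C1",
}

STOP_WORDS = {"a", "an", "and", "of", "the", "for", "in"}


def parse(text):
    """Assumed parsing strategy: lowercase, keep letter runs, drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


PARSED = {phrase: parse(phrase) for phrase in REFERENCE}
DOC_FREQ = Counter(w for words in PARSED.values() for w in words)


def weight(word):
    """Assumed weighting: words that are rare in the reference file count more."""
    return math.log(1 + len(PARSED) / DOC_FREQ.get(word, 1))


def score(desc_words, ref_words):
    """Weighted overlap between a description and a reference phrase (0 to 1)."""
    union = desc_words | ref_words
    if not union:
        return 0.0
    shared = desc_words & ref_words
    return sum(weight(w) for w in shared) / sum(weight(w) for w in union)


def code_description(description, threshold=0.6):
    """Return (code, score); code is None when the best match is too weak,
    meaning the description would go to manual coding."""
    words = parse(description)
    best = max(PARSED, key=lambda phrase: score(words, PARSED[phrase]))
    best_score = score(words, PARSED[best])
    code = REFERENCE[best] if best_score >= threshold else None
    return code, best_score
```

In this sketch, "Hair salon" matches a reference phrase exactly and is coded, while "bakery" alone matches two phrases only partially and falls below the threshold, which mirrors the "too general, send to manual coding" path.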
Automatic Coding
Improving use of ACTR:
Improve reference file
Each year new phrases are added
Currently 7 000 phrases
Study score needed for match
Opening the weighting algorithm
Improve parsing rules
Revisit the rules
Create an environment for testing purposes
Evaluate impact of changing input/rules/score
Automatic Coding
40 000 new businesses a month to code
45% are coded using robot-phrases
5% are coded using ACTR
Leaves 20 000 new businesses to code
Need manual coding
Done at Statistics Canada
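The arithmetic behind these figures can be checked with a short sketch. The function name and its parameters are illustrative; the 40 000, 45% and 5% inputs are the figures from this slide.

```python
def manual_workload(monthly_new, auto_percent):
    """Units left for manual coding when auto_percent of new units are coded automatically."""
    coded_automatically = monthly_new * auto_percent // 100
    return monthly_new - coded_automatically


robot_phrase_percent = 45  # coded using robot-phrases
actr_percent = 5           # coded using ACTR
remaining = manual_workload(40_000, robot_phrase_percent + actr_percent)
print(remaining)  # 20000
```

The same function shows how the manual workload would shrink if the automatic share rose, for example `manual_workload(40_000, 80)` gives 8000 units a month.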
Manual Coding
Other units to code manually:
Survey feedback
New operating entity found when profiling
Tool:
Search engine for industrial coding
Improve manual coding:
Add on-line ACTR or ACTR results
Add decision tree tool
Manual Coding
New businesses:
Goal: code all of them
Reality: do as many as we can
Result: backlog of businesses to code
Manual Coding
[Diagram. Labels: CRA May batch; CRA June batch; Automatic; Manual; Backlog; Business Register]
Manual Coding
Which units should be coded first?
First in, first out?
Economic activity signal?
Economic activity is determined by administrative data
Both! Select a sample from backlog:
Take-all (large economic activity)
Take-some 1 (economic activity / older units)
Take-some 2 (economic activity / newer units)
Take-none (no economic activity)
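The four strata above can be sketched as a simple assignment rule followed by a random draw within each stratum. The slide does not give the activity threshold, the age cut-off or the sampling fractions, so the values below, along with the `Unit` fields, are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Unit:
    unit_id: str
    activity: float          # hypothetical economic activity signal from administrative data
    months_in_backlog: int   # hypothetical measure of how long the unit has waited


# Assumed cut-offs and sampling fractions (not from the presentation).
LARGE_ACTIVITY = 1_000_000
OLDER_MONTHS = 6
SAMPLING_FRACTION = {
    "take-all": 1.0,
    "take-some-1": 0.5,
    "take-some-2": 0.25,
    "take-none": 0.0,
}


def stratum(unit):
    """Assign a backlog unit to one of the four strata."""
    if unit.activity <= 0:
        return "take-none"
    if unit.activity >= LARGE_ACTIVITY:
        return "take-all"
    if unit.months_in_backlog >= OLDER_MONTHS:
        return "take-some-1"
    return "take-some-2"


def select_sample(backlog, seed=0):
    """Draw a simple random sample within each stratum."""
    rng = random.Random(seed)
    by_stratum = {}
    for unit in backlog:
        by_stratum.setdefault(stratum(unit), []).append(unit)
    sample = []
    for name, units in by_stratum.items():
        n = math.ceil(len(units) * SAMPLING_FRACTION[name])
        sample.extend(rng.sample(units, n))
    return sample
```

Giving older units a higher fraction than newer ones is one way to combine the "first in, first out" and "economic activity" criteria; the actual design may differ.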
Manual Coding
Prioritize units to code
Can produce under-coverage estimates of the backlog by industrial sector
Ultimate goal:
Improve automatic coding
80% - 90%?
Code all remaining active units
Quality Evaluation of Classification Updates
Update privileges will be expanded:
Subject-matter specialists
Collection personnel
Need to evaluate the quality of updates:
Prevent systematic errors
Where to focus training
Quality Evaluation of Classification Updates
Two processes:
Notification and sample selection
1- Notification
Specialist determines set of enterprises to look at
Every update to a targeted enterprise is sent to the specialist
Agree/Disagree/Do nothing
Make use of expertise of specialist
Specialists keep up-to-date with their frame
Quality Evaluation of Classification Updates
2- Sample selection and evaluation
Based on industry, source of industry, size and complexity of enterprise
Re-code and compare
Minimize respondent input when re-coding
Using notification and sample:
Produce error rate for industrial coding
Target specific problems
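The "re-code and compare" step leads to an error rate that can be broken down by the source of the update. A minimal, unweighted sketch follows; the record layout, the source labels and the codes are illustrative assumptions, and a production estimate would also apply the sampling weights.

```python
from collections import defaultdict


def error_rates(records):
    """Share of sampled updates whose code differs from the independent re-code.

    records: iterable of (source, updated_code, recoded_code) tuples.
    Returns a dict mapping each source to its unweighted error rate.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for source, updated_code, recoded_code in records:
        totals[source] += 1
        if updated_code != recoded_code:
            errors[source] += 1
    return {source: errors[source] / totals[source] for source in totals}


# Hypothetical sample of evaluated updates.
sample = [
    ("specialist", "A1", "A1"),
    ("specialist", "B1", "B1"),
    ("collection", "A1", "A2"),
    ("collection", "C1", "C1"),
]
print(error_rates(sample))  # {'specialist': 0.0, 'collection': 0.5}
```

Grouping by source is what makes it possible to target specific problems, such as deciding where to focus training.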
Quality Assurance Survey
Goal: assess the quality of classification on the BR on an on-going basis
Assess dead/alive status as well
Point-in-time surveys done in the past:
1993, 1995, 1997, 2002
Implement a continuous survey:
Produce overall results monthly
Produce detailed results combining 12 months
Quality Assurance Survey
Stratification:
Industrial sectors
2 or 3 size strata
Higher sampling fraction for larger sizes
Recently contacted
Considered to have valid classification
Sample allocation:
Target 3.5% standard error for annual industrial classification error rate
550 units a month
Quality Assurance Survey
Currently doing a pilot test
Monthly estimates produced
Yearly estimates based on weighted average of 12 monthly measures
Weighted average based on 1/12
Weighted average based on population ratio over the year (Nm/(N1+...+N12))
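The two weighting schemes above can be compared with a short sketch. The function name and the input values are illustrative; only the weights themselves (1/12, and Nm/(N1+...+N12)) come from the slide.

```python
def yearly_estimate(monthly_rates, monthly_pops=None):
    """Combine 12 monthly estimates into a yearly estimate.

    With monthly_pops=None, each month gets a weight of 1/12.
    Otherwise month m gets the weight N_m / (N_1 + ... + N_12).
    """
    if len(monthly_rates) != 12:
        raise ValueError("expected 12 monthly estimates")
    if monthly_pops is None:
        weights = [1 / 12] * 12
    else:
        if len(monthly_pops) != 12:
            raise ValueError("expected 12 monthly population counts")
        total = sum(monthly_pops)
        weights = [n / total for n in monthly_pops]
    return sum(w * r for w, r in zip(weights, monthly_rates))


# Hypothetical monthly error rates and population counts.
rates = [0.10] * 6 + [0.20] * 6
pops = [1000] * 6 + [3000] * 6
equal_weighted = yearly_estimate(rates)        # approximately 0.15
pop_weighted = yearly_estimate(rates, pops)    # approximately 0.175
```

The two schemes coincide when the population is stable over the year and diverge when it grows or shrinks, as in the hypothetical figures above.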
Quality Assurance Survey
Survey will be used to Clean-up the register as an independent source
Evaluate industrial in and out-of-scope rate
Evaluate industrial error rate for non-surveyed portion of the register (e.g. small enterprises)
Evaluate death rate in order to adjust sample sizes
Potential use:
Evaluate frame quality for new surveys
Clean up part of the register
Conclusion
Classification is essential to the BR
Redesign provides an opportunity:
To improve coding
To standardize tools used for coding
To measure quality of coding adequately
To set up good practices/good reports
Results:
Better quality of business survey frames
More efficient surveys
For more information, please contact
Yanick Beaucage 613-951-4622
Visit our web site at www.statcan.ca