ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.
-
Upload
nicholas-bradford -
Category
Documents
-
view
224 -
download
1
Transcript of ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.
ESSnet on Datawarehousing - the business register
Pieter Vlag – Statistics Netherlands
2
Outline of the presentation
• DataWareHouse and importance population frame
• relationship population frame - business register
- (default) target population, statistical units
• other crucial datasources: “backbones”
- turnover + employment
• datalinking : the statistical unit base
• conflicting information between datasources
- when correcting in statistical DWH
- when correcting in backbones
- when feedback to business register
ESSnet DWH – business register
3
Definition of a statistical Datawarehouse (according to the FPA)
ESSnet DWH – business register
The broad definition of a data warehouse to be used in this ESSnet is therefore:
‘A common conceptual model for managing all available data of interest, enabling the NSI to (re)use this data to create new data/new outputs, to produce the necessary information and perform reporting and analysis, regardless of the data’s source.’
4
A DataWarehouse: the general idea
ESSnet DWH – business registerAs staging area is “core business” for NSIs, term statistical DWH is used for staging area + WareHouse
5
The statistical DataWarehouse: architecture and layers
ESSnet DWH – business register
6
The statistical DataWareHouse:processing steps the GSBPM model
ESSnet DWH – business register
process
input DWH / int. data
5.1a: link data5.1b: integrate datasee presentation Fursova
Calculate aggregates
Titel van de presentatie 7datasource 1 datasource 2 datasource 3
Output 1 Output 2 Output 3
Linking
Processing (integration layer)
Integrated data
p.analysep.analyse
4
GSBPM-step
5.1
5.2-
5.6
5.7
6
7
8
A datawarehouse without population frame
ESSnet DWH – business register
Datasource I:Admin data
Datasource I:Survey 1
Datasource I:Survey 2
Datasource I:BIG DATA
• different sources cover different enterprises -> information about ?• timing of availability sources differs -> when complete desc. available ?
9
A Datawarehouse with a population frame
ESSnet DWH – business register
Pop
ulat
ion.
Dat
asou
rce
1: a
dmin
dat
a 1
Dat
asou
rce
2: B
IG D
ATA
Dat
asou
rce
3: s
urve
y 1
Dat
asou
rce
4:
surv
ey 2
ADVANTAGE:the coverage of DWH is known (e.g. which enterprises are included in a DWH)
10
Units and target population
The population should be known for the
• datawarehouse; e.g. “about which enterprises info”
• its preparation phase ; e.g. when linking data sources
Challenges are:
• units may differ between the data sources
- decision: which unit used for linking
• what is the reference population- decision: how is the default target population defined
ESSnet DWH – business register
11
Proposals
• Only statistical unit (=enterprise) is used
- for data-linking
- in processing phase of the statistical - DWH
- justification: most obvious, ESSnet on Consistency, maintenance
• Default target population :
all enterprises with economic activity in reference period (e.g. year)
- justification: SBS-regulation
- widest definition of enterprises from which flexible outputs for subpopulations can be derived
- term default is used: as subpopulations do have a target population, tooESSnet DWH – business register
Titel van de presentatie 124
GSBPM-step
5.1
5.2-
5.6
5.7
6
7
Linked data
Integrated data
Processing on stat. unit + default target population only
flexible datasources with different populations and units
Weighting to flexible pop.
flexible output for different populations, and units
13
Population frame and the Business Register
Determination of the default target population in SDWH in 2 steps:
• the population frame, i.e. a list of enterprises with a certain kind of activity during a period.
• confirmation which enterprises of the list really performed economic activities during a period
The business register provides information for the population frame.
Therefore, the statistical Business Register is an indirect datasource for the statistical-DWH
ESSnet DWH – business register
14
Information needed from stat.business register
Recommended information for the population frame :• the frame reference year
• the statistical enterprises unit, including national ID and EGR ID
• the name and address of the enterprise
• the national identification number (ID) of the enterprise
• the date in population (mm/yr)
• the date out of population (mm/yr)
• the NACE-code
• the institutional sector code
• a size class
ESSnet DWH – business register
15
Other backbones
ESSnet AdminData: VAT and social security admin • almost complete for quarter and annual
• can be used for high-quality estimates for turnover + employment respectively.
ESSnet DWH: VAT and social security data are crucial• to confirm the activity status of enterprises
implictly to determine the default target population
• to integrate data suitable for flexible outputs
measurement errors are reduced of sample survey (or data about subpopulation) if weighting to pop.numbers + VAT-turnover + employment
Proposal: to include these admin data as backbones in a stat-DWH
ESSnet DWH – business register
16
Source layer
Int. + Analyses layer
Access layer
Integration layer
SBR
Pop-framedata 1 data 2
VAT empl.
GSBPM 5.1: link & integrate
GSBPM 5.2-5.6: “process”
GSBPM 5.7-5.8: calculate aggregates
Check processing
GSBPM 6: analyse / “DATAWAREHOUSE”
GSBPM 7-9: disseminate
Backbones in a statistical-DWH
Backbones are crucial for data-linking and data-integration;-> need to be checked/cleaned by source in the source layer
17
Source layer
Int. + Analyses layerIntegration layer
SBR
Pop-framedata 1 data 2
VAT
empl.
GSBPM 5.1: link & integrate
GSBPM 5.2-5.6: “process”
GSBPM 5.7-5.8: calculate aggregates
Check processing
GSBPM 6: analyse
GSBPM 7-9: disseminate
Observed: admin data incorporated in BR
When choosing this option, - important part of linking process outside the S-DWH- unless S-DWH integral part of S-DWH (maintenance ?)
18
Determining default target population
ESSnet DWH – business register
If statistical-DWH covers annual statistics only
• relatively straightforward
- derive population frame from business register at the end of reference year t
- determine active or non-active as soon as VAT and/or employment data become available
If STS included in statistical-DWH more complicated:
- updating necessary !
19
Updating population
ESSnet DWH – business registernov aug
jan apr jul okt jan apr jul
FATS survey population year T
FATS frame population of active
units year T
Undercoverage
Overcoverage
In both frame populations
FATS frame population of active
units year T (revised)
Frame error procedure
novRelease 3
julRelease 4
decRelease 1
julRelease 2
20SBR
Pop-frame
data 1 data 2
VAT empl.
GSBPM 5.1: link & integrate
GSBPM 5.2-5.6: “process”
GSBPM 5.7-5.8: calculate aggregates
Check processing
“DATAWAREHOUSE”
The largest enterprises
L.E.
output 1output 2
output 3
If a team within a NSI produces consistent microdata for largest enterprises-> consider this source as backbone
21
Units: ideal situation
ESSnet DWH – business register
• enterprise has a unique ID
• enterprise group has a unique ID
enterprise and enterprise group
• correspond with statistical definitions
• are used in all data sources
In practice more complex situations do exist (especially when using more admin data)
Titel van de presentatie 224
GSBPM-step
5.1
5.2-
5.6
5.7
6
7
Linked data
Integrated data
processing on one unit + one population only
flexible datasources with different population and units
Flexible output for different populations, and units
Key question: how to manage these different in- and output units and their relationships to the statistical unit
ESSnet DWH – business register 23
ENTERPRISE(=statistical unit)
ENTERPRISE GROUP
Legal unit
Legal unit
“Accountìng” unit
“Accountìng” unit
“VAT-unit”
other units
“other tax”units
enterprise
Enterprise
Local unit
LKAU
KAU
Enterprise group
INPUT IN S-DWHprocessing
OUTPUT
24
The unit base
ESSnet DWH – business register
Some remarks:
• Complexity of unit base depends on
- scope of statistical-DWH
- national legislation (practices) with respect to enterprise units
• Unit base closely related to Business Register. Main motivation to place this base outside the Business registers
- more flexible in case of new in- and outputs
- more transparent in case of linking errors
25SBR
Pop-frame
VAT empl.
GSBPM 5.1: link & integrate
GSBPM 5.2-5.6: “process”
GSBPM 5.7-5.8: calculate aggregates
Check processing
“DATAWAREHOUSE”
Position of Business Register in stat -DWH
L.E.
output 1output 2
output 3
survey
units
tax BIG DATA
other
26
Feedback to Business Register
ESSnet DWH – business register
In case of conflicting information between datasources and conclusion is influential error in backbones (and indirectly SBR)
• When incorporating corrections in statistical DWH ?
• When incorporating corrections in backbones ?
• When incorporating corrections in SBR?
27SBR
Pop-frame
survey otherunits
VAT empl.
GSBPM 5.1: link & integrate
GSBPM 5.1-5.6: “process”
GSBPM 5.7-5.8: calculate aggregates
Check processing
L.E.
Correction of information
In SDWH: corrections at 5.6
In backbones themselves: timing most important revisions
In SBR: after end of year (for consistency) – exception major impact
“DATAWAREHOUSE”
output 1output 2
output 3
28
Conclusions
ESSnet DWH – business register
Requirements for statistical-DWH
• Population well defined
• Use of one unit in processing
Backbones desired for
• populations, VAT-turnover, admin data employment, large enterprises
Business Register is indirect input for statistical DWH
• population frame, unit base, survey
Timing of corrections errors (backbone information)
• in DWH: before weighting
• in backbone: when revising
• in Business Register: end of year