ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

28
ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands

Transcript of ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

Page 2: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

2

Outline of the presentation

• DataWareHouse and importance population frame

• relationship population frame - business register

- (default) target population, statistical units

• other crucial datasources: “backbones”

- turnover + employment

• datalinking : the statistical unit base

• conflicting information between datasources

- when correcting in statistical DWH

- when correcting in backbones

- when feedback to business register

ESSnet DWH – business register

Page 3: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

3

Definition of a statistical Datawarehouse (according to the FPA)

ESSnet DWH – business register

The broad definition of a data warehouse to be used in this ESSnet is therefore:

‘A common conceptual model for managing all available data of interest, enabling the NSI to (re)use this data to create new data/new outputs, to produce the necessary information and perform reporting and analysis, regardless of the data’s source.’

Page 4: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

4

A DataWarehouse: the general idea

ESSnet DWH – business registerAs staging area is “core business” for NSIs, term statistical DWH is used for staging area + WareHouse

Page 5: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

5

The statistical DataWarehouse: architecture and layers

ESSnet DWH – business register

Page 6: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

6

The statistical DataWareHouse:processing steps the GSBPM model

ESSnet DWH – business register

process

input DWH / int. data

5.1a: link data5.1b: integrate datasee presentation Fursova

Calculate aggregates

Page 7: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

Titel van de presentatie 7datasource 1 datasource 2 datasource 3

Output 1 Output 2 Output 3

Linking

Processing (integration layer)

Integrated data

p.analysep.analyse

4

GSBPM-step

5.1

5.2-

5.6

5.7

6

7

Page 8: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

8

A datawarehouse without population frame

ESSnet DWH – business register

Datasource I:Admin data

Datasource I:Survey 1

Datasource I:Survey 2

Datasource I:BIG DATA

• different sources cover different enterprises -> information about ?• timing of availability sources differs -> when complete desc. available ?

Page 9: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

9

A Datawarehouse with a population frame

ESSnet DWH – business register

Pop

ulat

ion.

Dat

asou

rce

1: a

dmin

dat

a 1

Dat

asou

rce

2: B

IG D

ATA

Dat

asou

rce

3: s

urve

y 1

Dat

asou

rce

4:

surv

ey 2

ADVANTAGE:the coverage of DWH is known (e.g. which enterprises are included in a DWH)

Page 10: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

10

Units and target population

The population should be known for the

• datawarehouse; e.g. “about which enterprises info”

• its preparation phase ; e.g. when linking data sources

Challenges are:

• units may differ between the data sources

- decision: which unit used for linking

• what is the reference population- decision: how is the default target population defined

ESSnet DWH – business register

Page 11: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

11

Proposals

• Only statistical unit (=enterprise) is used

- for data-linking

- in processing phase of the statistical - DWH

- justification: most obvious, ESSnet on Consistency, maintenance

• Default target population :

all enterprises with economic activity in reference period (e.g. year)

- justification: SBS-regulation

- widest definition of enterprises from which flexible outputs for subpopulations can be derived

- term default is used: as subpopulations do have a target population, tooESSnet DWH – business register

Page 12: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

Titel van de presentatie 124

GSBPM-step

5.1

5.2-

5.6

5.7

6

7

Linked data

Integrated data

Processing on stat. unit + default target population only

flexible datasources with different populations and units

Weighting to flexible pop.

flexible output for different populations, and units

Page 13: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

13

Population frame and the Business Register

Determination of the default target population in SDWH in 2 steps:

• the population frame, i.e. a list of enterprises with a certain kind of activity during a period.

• confirmation which enterprises of the list really performed economic activities during a period

The business register provides information for the population frame.

Therefore, the statistical Business Register is an indirect datasource for the statistical-DWH

ESSnet DWH – business register

Page 14: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

14

Information needed from stat.business register

Recommended information for the population frame :• the frame reference year

• the statistical enterprises unit, including national ID and EGR ID

• the name and address of the enterprise

• the national identification number (ID) of the enterprise

• the date in population (mm/yr)

• the date out of population (mm/yr)

• the NACE-code

• the institutional sector code

• a size class

ESSnet DWH – business register

Page 15: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

15

Other backbones

ESSnet AdminData: VAT and social security admin • almost complete for quarter and annual

• can be used for high-quality estimates for turnover + employment respectively.

ESSnet DWH: VAT and social security data are crucial• to confirm the activity status of enterprises

implictly to determine the default target population

• to integrate data suitable for flexible outputs

measurement errors are reduced of sample survey (or data about subpopulation) if weighting to pop.numbers + VAT-turnover + employment

Proposal: to include these admin data as backbones in a stat-DWH

ESSnet DWH – business register

Page 16: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

16

Source layer

Int. + Analyses layer

Access layer

Integration layer

SBR

Pop-framedata 1 data 2

VAT empl.

GSBPM 5.1: link & integrate

GSBPM 5.2-5.6: “process”

GSBPM 5.7-5.8: calculate aggregates

Check processing

GSBPM 6: analyse / “DATAWAREHOUSE”

GSBPM 7-9: disseminate

Backbones in a statistical-DWH

Backbones are crucial for data-linking and data-integration;-> need to be checked/cleaned by source in the source layer

Page 17: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

17

Source layer

Int. + Analyses layerIntegration layer

SBR

Pop-framedata 1 data 2

VAT

empl.

GSBPM 5.1: link & integrate

GSBPM 5.2-5.6: “process”

GSBPM 5.7-5.8: calculate aggregates

Check processing

GSBPM 6: analyse

GSBPM 7-9: disseminate

Observed: admin data incorporated in BR

When choosing this option, - important part of linking process outside the S-DWH- unless S-DWH integral part of S-DWH (maintenance ?)

Page 18: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

18

Determining default target population

ESSnet DWH – business register

If statistical-DWH covers annual statistics only

• relatively straightforward

- derive population frame from business register at the end of reference year t

- determine active or non-active as soon as VAT and/or employment data become available

If STS included in statistical-DWH more complicated:

- updating necessary !

Page 19: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

19

Updating population

ESSnet DWH – business registernov aug

jan apr jul okt jan apr jul

FATS survey population year T

FATS frame population of active

units year T

Undercoverage

Overcoverage

In both frame populations

FATS frame population of active

units year T (revised)

Frame error procedure

novRelease 3

julRelease 4

decRelease 1

julRelease 2

Page 20: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

20SBR

Pop-frame

data 1 data 2

VAT empl.

GSBPM 5.1: link & integrate

GSBPM 5.2-5.6: “process”

GSBPM 5.7-5.8: calculate aggregates

Check processing

“DATAWAREHOUSE”

The largest enterprises

L.E.

output 1output 2

output 3

If a team within a NSI produces consistent microdata for largest enterprises-> consider this source as backbone

Page 21: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

21

Units: ideal situation

ESSnet DWH – business register

• enterprise has a unique ID

• enterprise group has a unique ID

enterprise and enterprise group

• correspond with statistical definitions

• are used in all data sources

In practice more complex situations do exist (especially when using more admin data)

Page 22: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

Titel van de presentatie 224

GSBPM-step

5.1

5.2-

5.6

5.7

6

7

Linked data

Integrated data

processing on one unit + one population only

flexible datasources with different population and units

Flexible output for different populations, and units

Key question: how to manage these different in- and output units and their relationships to the statistical unit

Page 23: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

ESSnet DWH – business register 23

ENTERPRISE(=statistical unit)

ENTERPRISE GROUP

Legal unit

Legal unit

“Accountìng” unit

“Accountìng” unit

“VAT-unit”

other units

“other tax”units

enterprise

Enterprise

Local unit

LKAU

KAU

Enterprise group

INPUT IN S-DWHprocessing

OUTPUT

Page 24: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

24

The unit base

ESSnet DWH – business register

Some remarks:

• Complexity of unit base depends on

- scope of statistical-DWH

- national legislation (practices) with respect to enterprise units

• Unit base closely related to Business Register. Main motivation to place this base outside the Business registers

- more flexible in case of new in- and outputs

- more transparent in case of linking errors

Page 25: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

25SBR

Pop-frame

VAT empl.

GSBPM 5.1: link & integrate

GSBPM 5.2-5.6: “process”

GSBPM 5.7-5.8: calculate aggregates

Check processing

“DATAWAREHOUSE”

Position of Business Register in stat -DWH

L.E.

output 1output 2

output 3

survey

units

tax BIG DATA

other

Page 26: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

26

Feedback to Business Register

ESSnet DWH – business register

In case of conflicting information between datasources and conclusion is influential error in backbones (and indirectly SBR)

• When incorporating corrections in statistical DWH ?

• When incorporating corrections in backbones ?

• When incorporating corrections in SBR?

Page 27: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

27SBR

Pop-frame

survey otherunits

VAT empl.

GSBPM 5.1: link & integrate

GSBPM 5.1-5.6: “process”

GSBPM 5.7-5.8: calculate aggregates

Check processing

L.E.

Correction of information

In SDWH: corrections at 5.6

In backbones themselves: timing most important revisions

In SBR: after end of year (for consistency) – exception major impact

“DATAWAREHOUSE”

output 1output 2

output 3

Page 28: ESSnet on Datawarehousing - the business register Pieter Vlag – Statistics Netherlands.

28

Conclusions

ESSnet DWH – business register

Requirements for statistical-DWH

• Population well defined

• Use of one unit in processing

Backbones desired for

• populations, VAT-turnover, admin data employment, large enterprises

Business Register is indirect input for statistical DWH

• population frame, unit base, survey

Timing of corrections errors (backbone information)

• in DWH: before weighting

• in backbone: when revising

• in Business Register: end of year