GSBPM – a proposed evolution of the model (unece.org)

Telling Canada’s story in numbers

Paul Holness, Senior Analyst
Jackey Mayda, Director
International Cooperation and Corporate Statistical Methods

April 2018


Evolving data ecosystem


Data revolution, ingenuity and innovation

Proliferation of data and data providers

Increased expectations and demand for “real-time” and micro/detailed data

Rapidly changing and increasingly complex economy and society

Statistical Organizations – Trends

• Agency transformation
• Agility, Flexibility, Quality, Efficiency, Relevance
• New modes of engagement and delivery: products, services, partnerships, collaboration, leadership, education
• Provider Partnerships, Administrative and Big Data
• Digital platforms and shared services
• Cost optimization
• Advanced methods and tools
• Cross-agency, domain, levels of government, statistics to support new policy, service delivery initiatives
• Locally relevant statistics (small area) for local government service delivery
• Sustainable development goals and world collaboration
• Internal innovation programs
• Open data platforms

Statistics Canada Modernization Vision Statements

• Pillar: User-centric Service Delivery
  Vision: Users have the information and data they need, when they need it, in the ways they want to access it, with the tools and knowledge to make full use of it.
• Pillar: Leading-edge Methods & Data Integration
  Vision: Access to new or untapped data; modify the role of surveys; greater reliance on modelling and integration; capacity through R&D environment.
• Pillar: Statistical Capacity Building & Leadership
  Vision: To be leaders in identifying, building and fostering savvy information and critical analysis skills beyond our own perimeters.
• Pillar: Sharing & Collaboration
  Vision: Statistics Canada has developed and nurtured strategic, innovative partnerships that allow for the open sharing of data, expertise and best practices. We are proactive, flexible and responsive to partner needs.
• Pillar: Modern Workforce and Flexible Workplace
  Vision: Have the talent and environment required to fulfill our business needs at the time and be open and nimble to continue to position ourselves for the future.

Statistics Canada's Data Model Vision

Metadata-driven

BUSINESS PROCESS
• GATHER: Discovery, Data needs, Negotiation, Preliminary files, Ingestion
• GUARD: Temporary Repository, Pre-processing, De-identification, Statistical Identification, Corporate Repository, Registers
• GROW: Integration, Programs, Analysis, Update registers, Direct tabulation
• GIVE: Dissemination, Open Government

Other labels on the diagram: DATA CHARACTERISTICS, STATISTICAL SYSTEM, Information Management, IM Access and Security Rules, Data Governance

This is aligned with architectures such as UK, Netherlands, Australia

Why do we want to enhance the GSBPM?

Explore opportunities where changes in the data ecosystem have exposed gaps and challenges in the existing model

• Encompass all activities undertaken in the production of official statistics that result in data outputs
• Applicable to all types of data sources, not just survey data:
  • Administrative sources / register-based statistics
  • Non-survey sources (Big Data, earth observations, sensor data, scanner data)
  • Mixed sources
• Cover the comprehensive data lifecycle (including data preparation and integration)
• Support multiple input/output streams and data types
  • Structured, semi-structured and unstructured
• Built-in data science platform
  • Support profiling & discovery, visualisation, integration and data analytics and decision processing
• Support data management and data quality
• Provide built-in framework for performance measurement
• Increase collaboration and promote the use of common statistical production architecture

Applying data visualization to the GSBPM

A few conventions (legend labels from the slide):
• New: New Activity
• ME: Macro Economic Accounts
• STC Modernization Objective
• Text Change
• Continuation

Transition from GSBPM 5.0

• Overview of the data lifecycle
• Collect becomes Acquisition
• Process becomes
  • Data Preparation (with sub-processes Profile & Discover and Clean & Transform)
  • Integration (Join, Link, Model)

[Figure: GSBPM NL, with Data Preparation and Integration (Join, Link, Model)]

Specify Needs

1.1 Identify needs
1.2 Consult and confirm needs
1.3 Establish output objectives
1.4 Identify concepts & sensitive data elements
1.5 Check data & intelligence availability (Environmental Scan) & initial input data quality assessment
1.6 Prepare business case & seek approval

Changes from GSBPM V 5.0:
• GSBPM V 5.0: 1.4 Identify concepts
  GSBPM Proposal: 1.4 Identify concepts & sensitive data elements
  Why change? Identify & protect sensitive information throughout the lifecycle
• GSBPM V 5.0: 1.5 Check data availability
  GSBPM Proposal: 1.5 Check data & intelligence availability & initial data quality
  Why change? Check availability, metadata, initial data quality
• GSBPM V 5.0: 1.6 Prepare business case
  GSBPM Proposal: 1.6 Prepare business case & seek approval

Design

2.1 Design outputs
2.2 Design variable descriptions
2.3 Design data input channels
2.4 Design sample & target data strategy
2.5 Design processing and analysis
2.6 Design production systems and work flow

Changes from GSBPM V 5.0:
• GSBPM V 5.0: 2.3 Design data collection
  GSBPM Proposal: 2.3 Design data input channels
  Why change? Multiple input data sources: survey, admin, streaming, earth observation, sensors, etc.; greater use of unstructured data
• GSBPM V 5.0: 2.4 Design frame & sample
  GSBPM Proposal: 2.4 Design frame & sample
  Why change? Changed underlying text to support alternative data types including survey, admin, web-based etc.

Build

3.1 Build or enhance acquisition instrument
3.2 Build or enhance process components
3.3 Build or enhance dissemination components
3.4 Configure workflows
3.5 Test production system
3.6 Test statistical business process
3.7 Finalise production systems

Changes from GSBPM V 5.0:
• GSBPM V 5.0: 3.1 Build collection instrument
  GSBPM Proposal: 3.1 Build or enhance acquisition instrument
  Why change? Different sources & types require alternative instruments

Acquisition

4.1 Select sample & target data
4.2 Set up data acquisition (collection & ingestion)
4.3 Acquire (collect & ingest) data
4.4 Monitor acquisition, report, visualize & adjust to support data quality
4.5 Finalise acquisition (collection & ingestion)

Changes from GSBPM V 5.0:
• GSBPM Proposal: 4.4 Monitor acquisition, report, visualize & adjust to support data quality
  Why change? This sub-process refers to the monitoring and remediation of the acquisition process towards optimizing the quality of data collection

Data Preparation: Profile & Discover

Profiling and Discovery
• The analysis of information for use (in a data warehouse) in order to clarify the structure, content, relationships, and derivation rules of the data
• Profiling helps to not only understand anomalies and assess data quality, but also to discover, register, and assess enterprise metadata
• The result of the analysis is used to determine the suitability of the candidate source systems, usually giving the basis for an early go/no-go decision, and also to identify problems for later solution design

5.1 Procure access to raw data & intelligence
5.2 Perform profile & overlap analysis
5.3 Locate, classify and mask sensitive data
5.4 Explore matching variables, features & merge analysis
5.5 Discover & map transformations from source to target
5.6 Conflict analysis (concept, definition, convention); feedback to/from data provider
5.7 Build & evaluate data / statistical models; create unified data models (MDM)
5.8 Document profile & transformations & prepare treatment strategy; share program code
5.9 Export data objects
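As a rough illustration of what profile and overlap analysis (5.2) can look like in practice, the sketch below summarizes each column of a small table (record count, missing values, distinct values, observed types) and measures how many key values two sources share. The function names, the sample records and the choice of statistics are assumptions made for illustration; they are not part of the GSBPM proposal.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: size, missing values, distinct values, observed types."""
    present = [v for v in values if v is not None and v != ""]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "types": dict(Counter(type(v).__name__ for v in present)),
    }


def profile_table(rows):
    """Profile every column of a table held as a list of dicts."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


def overlap(rows_a, rows_b, key):
    """Share of distinct key values in rows_a that also appear in rows_b."""
    keys_a = {r[key] for r in rows_a if r.get(key) not in (None, "")}
    keys_b = {r[key] for r in rows_b if r.get(key) not in (None, "")}
    return len(keys_a & keys_b) / len(keys_a) if keys_a else 0.0


# Hypothetical sample records, invented for this example
source_1 = [
    {"id": 1, "name": "Ana", "postal": "K1A 0T6"},
    {"id": 2, "name": "", "postal": "K1A 0T6"},
    {"id": 3, "name": "Li", "postal": None},
]
source_2 = [{"id": 2}, {"id": 3}, {"id": 4}]

report = profile_table(source_1)
shared = overlap(source_1, source_2, "id")
```

A report like this is the kind of evidence that could feed the early go/no-go decision mentioned above: a column with many missing values or a low key overlap between sources is a signal to discuss with the data provider before integration.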

Data Preparation: Clean & Transform

6.1 Standardize attribute formats
6.2 Parse, tokenize & map attributes to fields or concepts
6.3 Normalize abbreviations, honorifics & stopwords
6.4 Classify & code attributes
6.5 Review, validate attributes
6.6 Edit & impute attributes
6.7 Derive new variables & units
6.8 Finalize unified source data files
6.9 Measure & document the impact of cleansing & transformation & lineage

Examples of cleansing & transformation
• Convert all letters to lower case
• Remove all punctuation marks (avoid if seeking emojis)
• Remove all numerals (avoid when mining for quantities)
• Remove all extraneous white space
• Remove characters within brackets
• Replace all numerals with words
• Replace abbreviations
• Replace contractions
• Replace all symbols with words
• Remove stop words and uninformative words
• Stem words and complete stems to remove empty variation
• Phonetic accent representation
• Neologisms and portmanteaus
• Poor translations or foreign words
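A few of the cleansing steps listed above can be chained into a small text-normalization function. This is a minimal sketch, not a production cleaner: the lookup tables `CONTRACTIONS`, `ABBREVIATIONS` and `STOP_WORDS` are tiny hypothetical examples, and the order of the steps is one reasonable choice among several.

```python
import re
import string

# Hypothetical, deliberately tiny lookup tables for illustration only
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot"}
ABBREVIATIONS = {"st": "street", "ave": "avenue", "dr": "doctor"}
STOP_WORDS = {"the", "a", "an", "of", "is", "and", "on"}


def clean_text(text):
    """Apply a subset of the cleansing steps to one free-text value."""
    text = text.lower()                                   # convert to lower case
    text = re.sub(r"\([^)]*\)|\[[^\]]*\]", " ", text)     # remove characters within brackets
    for short, full in CONTRACTIONS.items():              # replace contractions
        text = text.replace(short, full)
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    tokens = text.split()                                 # also drops extraneous white space
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]    # replace abbreviations
    tokens = [t for t in tokens if t not in STOP_WORDS]   # remove stop words
    return " ".join(tokens)


example = clean_text("It's on  Main St. (rear entrance), don't knock!")
# example == "it main street do not knock"
```

Contractions are expanded before punctuation is stripped because the apostrophe is itself a punctuation mark; reversing the order would leave "its" and "dont" behind. As the slide notes, steps such as removing punctuation or numerals should be skipped when those characters carry the information being mined.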

Data Integration

Integration (Join, Link, Model)

7.1 Identify potential record pairs
7.2 Reduce comparison space
7.3 Compare & classify candidate record pairs
7.4 Create new or update existing integrated datasets
7.5 Assess join & linkage quality & performance
7.6 Calculate weights for unit data
7.7 Calculate aggregates, seasonality, deflation, benchmarking
7.8 Assess data quality, balance, adjust & recalculate
7.9 Document & report methods & outcomes & metadata to Picasso

[Figure: Generic Record Linkage Process. Labels: Data Source 1, Data Source 2, Profile & Discover, Clean & Transform, Data Reduction (Blocking/Index), Field Comparison, Classification, Match, Possible Match, Unmatched, Clerical Review, Staging data, Retrieve Analytical Variables, Assess data quality, Analytical data]

Type of Integration, Method, Example Identifier:
• Transactional Joins: Primary/Foreign Key; Record Number
• Record Linkage: Imperfect identifiers; Name, Address, Postal
• Statistical Linkages: Statistical & Model-based Matching; Statistical Attributes
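The generic record linkage process (blocking, field comparison, classification into match, possible match and unmatched) can be sketched in a few lines. Everything specific here is an assumption for illustration: blocking on postal code, comparing names with the standard library's `difflib.SequenceMatcher`, the two thresholds, and the sample records. A real linkage would normally compare several fields and use a probabilistic or model-based classifier.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(source_1, source_2, upper=0.85, lower=0.6):
    """Link source_1 records to source_2 using blocking and name comparison."""
    # Data reduction: only compare records that share a postal code (the block key)
    blocks = defaultdict(list)
    for rec in source_2:
        blocks[rec["postal"]].append(rec)

    results = {"match": [], "possible_match": [], "unmatched": []}
    for rec in source_1:
        # Field comparison: keep the best-scoring candidate within the block
        best_score, best = 0.0, None
        for cand in blocks.get(rec["postal"], []):
            score = similarity(rec["name"], cand["name"])
            if score > best_score:
                best_score, best = score, cand
        # Classification by two thresholds (assumed values)
        if best is not None and best_score >= upper:
            results["match"].append((rec["id"], best["id"]))
        elif best is not None and best_score >= lower:
            results["possible_match"].append((rec["id"], best["id"]))  # for clerical review
        else:
            results["unmatched"].append(rec["id"])
    return results


# Hypothetical sample records, invented for this example
data_source_1 = [
    {"id": "A1", "name": "Jon Smith", "postal": "K1A 0T6"},
    {"id": "A2", "name": "Catherine Lee", "postal": "M5V 2T6"},
    {"id": "A3", "name": "Omar Haddad", "postal": "V6B 1A1"},
]
data_source_2 = [
    {"id": "B1", "name": "John Smith", "postal": "K1A 0T6"},
    {"id": "B2", "name": "Kathy Lee", "postal": "M5V 2T6"},
    {"id": "B3", "name": "Paula Gomez", "postal": "K1A 0T6"},
]

linked = link_records(data_source_1, data_source_2)
```

Blocking is what keeps the comparison space small (sub-process 7.2): record A3 is never compared with anything because no source 2 record shares its postal code. The pairs in `possible_match` are the ones that would go to clerical review in the process shown on the slide.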

Data Analytics & Decision Process

This phase is broken down into eight sub-processes, which are generally sequential, from left to right, but can also occur in parallel, and can be iterative. It includes 3 new sub-processes.

8.1 Procure access to analytical dataset
8.2 Exploratory data analysis, visualization, measure, diagnose
  • Analyzing data sets to summarize their main characteristics, often with visual methods, e.g. self-service dashboards
8.3 Data analytics, consisting of three distinct types:
  • 8.3a Descriptive analytics or observe: uses data aggregation and data mining to provide insight into the past and answer “What has happened?” (mean, median, mode, etc.)
  • 8.3b Predictive analytics or predict: encompasses statistical techniques such as predictive modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events
  • 8.3c Prescriptive analytics or influence: the area of business analytics (BA) dedicated to finding the best course of action for a given situation. Prescriptive analytics is related to both descriptive and predictive analytics.
8.4 Validate outputs
8.5 Interpret and explain outputs
8.6 Assess the impact of integration on analytical outputs
8.7 Apply disclosure control
8.8 Finalise outputs
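To make the distinction between descriptive (8.3a) and predictive (8.3b) analytics concrete, the sketch below computes the mean, median and mode of a small series and then fits a straight line by ordinary least squares to project one period ahead. The data values and function names are invented for illustration, and a linear trend is only one of many possible predictive techniques.

```python
from statistics import mean, median, mode


def descriptive(values):
    """Descriptive analytics: summarize what has happened."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data, invented for this example
summary = descriptive([3, 5, 5, 8, 9])

periods = [1, 2, 3, 4, 5]
counts = [10, 12, 14, 16, 18]
intercept, slope = fit_line(periods, counts)

# Predictive analytics: project the fitted trend to the next period
forecast = intercept + slope * 6
```

Prescriptive analytics (8.3c) would go one step further and use such summaries and forecasts to choose between candidate actions, which is outside the scope of this sketch.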

Disseminate

9.1 Update output systems
9.2 Produce dissemination products
9.3 Manage release of dissemination products
9.4 Promote dissemination products; assess products & services
9.5 Manage user support; track and measure quality of interaction with users and link to prioritization

Evaluate

10.1 Gather evaluation inputs
10.2 Conduct evaluation
10.3 Agree to an action plan

GSBPM Proposal

Concluding remarks

• We feel the proposed changes to GSBPM support the current evolution in business process activities
  • Increases visibility into the data lifecycle
  • Supports multiple data types
  • Potentially improves information production timelines by accelerating data preparation
  • Promotes standardized delivery of outputs (data, metadata and code)
  • Supports activity-based costing by breaking down the process into appropriate pieces

Next steps

• Feedback within Statistics Canada has been quite positive
• We are seeking your input on the relevance of further development of the GSBPM in this way