Telling Canada’s story in numbers
GSBPM – a proposed evolution of the model
Paul Holness, Senior Analyst
Jackey Mayda, Director, International Cooperation and Corporate Statistical Methods
April 2018
Evolving data ecosystem
Data revolution, ingenuity and innovation
Proliferation of data and data providers
Increased expectations and demand for “real-time” and micro/detailed data
Rapidly changing and increasingly complex economy and society
Statistical Organizations: Trends
• Agency transformation
• Agility, Flexibility, Quality, Efficiency, Relevance
• New modes of engagement and delivery: products, services, partnerships, collaboration, leadership, education
• Provider Partnerships, Administrative and Big Data
• Digital platforms and shared services
• Cost optimization
• Advanced methods and tools
• Cross-agency, domain, levels of government, statistics to support new policy, service delivery initiatives
• Locally relevant statistics (small area) for local government service delivery
• Sustainable development goals and world collaboration
• Internal innovation programs
• Open data platforms
Statistics Canada Modernization Vision Statements
Pillar: User-centric Service Delivery
Vision: Users have the information and data they need, when they need it, in the ways they want to access it, with the tools and knowledge to make full use of it.
Pillar: Leading-edge Methods & Data Integration
Vision: Access to new or untapped data; modify the role of surveys; greater reliance on modelling and integration; capacity through R&D environment.
Pillar: Statistical Capacity Building & Leadership
Vision: To be leaders in identifying, building and fostering savvy information and critical analysis skills beyond our own perimeters.
Pillar: Sharing & Collaboration
Vision: Statistics Canada has developed and nurtured strategic, innovative partnerships that allow for the open sharing of data, expertise and best practices. We are proactive, flexible and responsive to partner needs.
Pillar: Modern Workforce and Flexible Workplace
Vision: Have the talent and environment required to fulfill our business needs at the time and be open and nimble to continue to position ourselves for the future.
Statistics Canada's Data Model Vision
Metadata-driven business process
GATHER
• Discovery • Data needs • Negotiation • Preliminary files • Ingestion
GUARD
• Temporary Repository • Pre-processing • De-identification • Statistical Identification • Corporate Repository • Registers
GROW
• Integration • Programs • Analysis • Update registers • Direct tabulation
GIVE
• Dissemination • Open Government
Across all stages: Information Management, IM Access and Security Rules, Data Governance
This is aligned with architectures such as those of the UK, the Netherlands and Australia
Why do we want to enhance the GSBPM?
Explore opportunities where changes in the data ecosystem have exposed gaps and challenges in the existing model
• Encompass all activities undertaken in the production of official statistics that result in data outputs
• Applicable to all types of data sources, not just survey data:
  • Administrative sources / register-based statistics
  • Non-survey sources (Big Data, earth observations, sensor data, scanner data)
  • Mixed sources
• Cover the comprehensive data lifecycle (including data preparation and integration)
• Support multiple input/output streams and data types
  • Structured, semi-structured and unstructured
• Built-in data science platform
  • Support profiling & discovery, visualisation, integration and data analytics and decision processing
• Support data management and data quality
• Provide built-in framework for performance measurement
• Increase collaboration and promote the use of common statistical production architecture
Applying data visualization to the GSBPM
A few conventions:
• New: new activity
• ME: Macro Economic Accounts
• STC Modernization Objective
• Text change
• Continuation
Transition from GSBPM 5.0
• Overview of the data lifecycle
• Collect becomes Acquisition
• Process becomes
  • Data Preparation (with sub-processes Profile & Discover and Clean & Transform)
  • Integration (Join, Link, Model)
GSBPM NL
Specify Needs
1.1 Identify needs
1.2 Consult and confirm needs
1.3 Establish output objectives
1.4 Identify concepts & sensitive data elements
1.5 Check data & intelligence availability (environmental scan) & initial input data quality assessment
1.6 Prepare business case & seek approval
Specify Needs: changes from GSBPM V 5.0 to the GSBPM proposal
• 1.4 Identify concepts becomes 1.4 Identify concepts & sensitive data elements. Why change? Identify & protect sensitive information throughout the lifecycle
• 1.5 Check data availability becomes 1.5 Check data & intelligence availability & initial data quality. Why change? Check availability, metadata, initial data quality
• 1.6 Prepare business case becomes 1.6 Prepare business case & seek approval
Design
2.1 Design outputs
2.2 Design variable descriptions
2.3 Design data input channels
2.4 Design sample & target data strategy
2.5 Design processing and analysis
2.6 Design production systems and work flow
Design: changes from GSBPM V 5.0 to the GSBPM proposal
• 2.3 Design data collection becomes 2.3 Design data input channels. Why change? Multiple input data sources: survey, admin, streaming, earth observation, sensors, etc.; greater use of unstructured data
• 2.4 Design frame & sample remains 2.4 Design frame & sample. Why change? Changed underlying text to support alternative data types including survey, admin, web-based, etc.
Build
3.1 Build or enhance acquisition instrument
3.2 Build or enhance process components
3.3 Build or enhance dissemination components
3.4 Configure workflows
3.5 Test production system
3.6 Test statistical business process
3.7 Finalise production systems
Build: changes from GSBPM V 5.0 to the GSBPM proposal
• 3.1 Build collection instrument becomes 3.1 Build or enhance acquisition instrument. Why change? Different sources & types require alternative instruments
Acquisition
4.1 Select sample & target data
4.2 Set up data acquisition (collection & ingestion)
4.3 Acquire (collect & ingest) data
4.4 Monitor acquisition, report, visualize & adjust to support data quality
4.5 Finalise acquisition (collection & ingestion)
Acquisition: changes from GSBPM V 5.0 to the GSBPM proposal
• 4.4 Monitor acquisition, report, visualize & adjust to support data quality. Why change? This sub-process refers to the monitoring and remediation of the acquisition process towards optimizing the quality of data collection
Data Preparation: Profile & Discover
Profiling and Discovery
• The analysis of information for use (in a data warehouse) in order to clarify the structure, content, relationships, and derivation rules of the data
• Profiling helps not only to understand anomalies and assess data quality, but also to discover, register, and assess enterprise metadata
• The result of the analysis is used to determine the suitability of the candidate source systems, usually giving the basis for an early go/no-go decision, and also to identify problems for later solution design
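The profiling idea described above can be illustrated with a minimal Python sketch. It is not part of the proposal: the function names, the sample records and the 20% missing-value threshold used for the go/no-go check are all hypothetical, chosen only to show how a profile can feed an early suitability decision.

```python
from collections import Counter


def is_missing(value):
    """Treat None and blank strings as missing (an assumption for this sketch)."""
    return value is None or str(value).strip() == ""


def profile_column(values):
    """Summarize one column: row count, missing share, distinct values, top value."""
    present = [v for v in values if not is_missing(v)]
    counts = Counter(present)
    top_value, top_count = counts.most_common(1)[0] if counts else (None, 0)
    return {
        "rows": len(values),
        "missing_rate": (len(values) - len(present)) / len(values) if values else 0.0,
        "distinct": len(counts),
        "top_value": top_value,
        "top_count": top_count,
    }


def profile_table(rows):
    """Profile every column found in a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


def go_no_go(profile, max_missing_rate=0.2):
    """Hypothetical rule: 'go' only if no column exceeds the missing-rate threshold."""
    return all(col["missing_rate"] <= max_missing_rate for col in profile.values())


# Hypothetical candidate source with a few records
rows = [
    {"id": 1, "postal": "K1A 0T6", "income": 52000},
    {"id": 2, "postal": "", "income": 61000},
    {"id": 3, "postal": "K1A 0T6", "income": None},
    {"id": 4, "postal": "H3Z 2Y7", "income": 48000},
]
report = profile_table(rows)
decision = go_no_go(report)
```

In practice a profile would also cover value formats, ranges and overlap with other sources; the sketch only covers completeness and cardinality.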
5.1 Procure access to raw data & intelligence
5.2 Perform profile & overlap analysis
5.3 Locate, classify and mask sensitive data
5.4 Explore matching variables, features & merge analysis
5.5 Discover & map transformations from source to target
5.6 Conflict analysis (concept, definition, convention)
5.7 Build & evaluate data / statistical models
5.8 Document profile & transformations & prepare treatment strategy
5.9 Export data objects
Supporting activities: Feedback to/from data provider • Create unified data models (MDM) • Share program code
Data Preparation: Clean & Transform
6.1 Standardize attribute formats
6.2 Parse, tokenize & map attributes to fields or concepts
6.3 Normalize abbreviations, honorifics & stopwords
6.4 Classify & code attributes
6.5 Review, validate attributes
6.6 Edit & impute attributes
6.7 Derive new variables & units
6.8 Finalize unified source data files
6.9 Measure & document the impact of cleansing & transformation & lineage
Examples of cleansing & transformation
• Convert all letters to lower case
• Remove all punctuation marks (avoid if seeking emojis)
• Remove all numerals (avoid when mining for quantities)
• Remove all extraneous white space
• Remove characters within brackets
• Replace all numerals with words
• Replace abbreviations
• Replace contractions
• Replace all symbols with words
• Remove stop words and uninformative words
• Stem words and complete stems to remove empty variation
• Phonetic accent representation
• Neologisms and portmanteaus
• Poor translations or foreign words
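Several of the cleansing examples above can be chained into a small text-normalization routine. The following Python sketch is illustrative only: the stop-word set and the abbreviation map are tiny hypothetical stand-ins for real reference lists, and the order of steps is one reasonable choice rather than a prescribed one.

```python
import re
import string

# Hypothetical reference lists; real ones would be far larger and domain-specific.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in"}
ABBREVIATIONS = {"st": "street", "ave": "avenue", "dr": "doctor"}


def clean_text(text, remove_numerals=True):
    """Apply a subset of the cleansing steps to one free-text value."""
    text = text.lower()                                   # convert to lower case
    text = re.sub(r"\([^)]*\)|\[[^\]]*\]", " ", text)     # remove characters within brackets
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    if remove_numerals:                                   # skip when mining for quantities
        text = re.sub(r"\d+", " ", text)
    tokens = text.split()                                 # also drops extraneous white space
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]    # replace abbreviations
    tokens = [t for t in tokens if t not in STOP_WORDS]   # remove stop words
    return " ".join(tokens)


raw = "The  Office of Dr. Smith (retired), 123 Main St."
cleaned = clean_text(raw)
cleaned_with_numbers = clean_text(raw, remove_numerals=False)
```

Bracketed text is removed before punctuation because stripping punctuation first would delete the brackets themselves and leave their contents behind.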
Data Integration
Integration (Join, Link, Model)
7.1 Identify potential record pairs
7.2 Reduce comparison space
7.3 Compare & classify candidate record pairs
7.4 Create new or update existing integrated datasets
7.5 Assess join & linkage quality & performance
7.6 Calculate weights for unit data
7.7 Calculate aggregates, seasonality, deflation, benchmarking
7.8 Assess data quality, balance, adjust & recalculate
7.9 Document & report methods & outcomes & metadata to Picasso
Type of integration, method and example identifier
• Transactional joins: Primary/Foreign Key; e.g. Record Number
• Record linkage: Imperfect identifiers; e.g. Name, Address, Postal
• Statistical linkages: Statistical & model-based matching; e.g. Statistical Attributes
[Figure: Generic Record Linkage Process. Data Source 1 and Data Source 2 each pass through Profile & Discover and Clean & Transform, followed by Data Reduction (Blocking/Index), Field Comparison and Classification into Match, Possible Match (Clerical Review) and Unmatched. Remaining labels: Staging data, Retrieve Analytical Variables, Analytical data, Assess data quality.]
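The first three integration sub-processes (identify potential record pairs, reduce the comparison space, compare and classify candidate pairs) can be sketched in Python as follows. This is an assumption-laden illustration, not the method used at Statistics Canada: blocking on the first three characters of the postal code, comparing names with the standard library's `difflib.SequenceMatcher`, and the 0.9 / 0.7 thresholds are all hypothetical choices.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def block_key(record):
    """Hypothetical blocking key: first three characters of the postal code."""
    return record["postal"].replace(" ", "").upper()[:3]


def candidate_pairs(source_1, source_2):
    """Reduce the comparison space: pair only records that share a block."""
    index = defaultdict(list)
    for rec_2 in source_2:
        index[block_key(rec_2)].append(rec_2)
    for rec_1 in source_1:
        for rec_2 in index.get(block_key(rec_1), []):
            yield rec_1, rec_2


def name_similarity(name_1, name_2):
    """Field comparison on an imperfect identifier (case-insensitive ratio in [0, 1])."""
    return SequenceMatcher(None, name_1.lower(), name_2.lower()).ratio()


def classify(score, upper=0.9, lower=0.7):
    """Classify a pair; 'possible match' would go to clerical review."""
    if score >= upper:
        return "match"
    if score >= lower:
        return "possible match"
    return "unmatched"


def link(source_1, source_2):
    """Return (id_1, id_2, status) for every candidate pair."""
    return [
        (r1["id"], r2["id"], classify(name_similarity(r1["name"], r2["name"])))
        for r1, r2 in candidate_pairs(source_1, source_2)
    ]


# Hypothetical records from two sources
source_1 = [
    {"id": "A1", "name": "Marie Roy", "postal": "K1A 0T6"},
    {"id": "A2", "name": "Alice Tremblay", "postal": "H3Z 2Y7"},
]
source_2 = [
    {"id": "B1", "name": "marie roy", "postal": "k1a0t6"},
    {"id": "B2", "name": "Bob Nguyen", "postal": "H3Z 1A1"},
    {"id": "B3", "name": "Marie Roy", "postal": "V5K 0A1"},
]
links = link(source_1, source_2)
```

Record B3 is never compared with A1 because it falls in a different block, which shows the trade-off of blocking: fewer comparisons at the risk of missing true matches whose blocking field differs.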
Data Analytics & Decision Process
This phase is broken down into eight sub-processes, which are generally sequential, from left to right, but can also occur in parallel and can be iterative. It includes three new sub-processes.
8.1 Procure access to analytical dataset
8.2 Exploratory data analysis
• Analyzing data sets to summarize their main characteristics, often with visual methods, e.g. self-service dashboards
8.3 Data analytics, which consists of three distinct types
• 8.3a Descriptive analytics, or observe
  • Uses data aggregation and data mining to provide insight into the past and answer “What has happened?”: mean, median, mode, etc.
• 8.3b Predictive analytics, or predict
  • Encompasses statistical techniques, ranging from predictive modeling and machine learning to data mining, that analyze current and historical facts to make predictions about future or otherwise unknown events
• 8.3c Prescriptive analytics, or influence
  • Prescriptive analytics is the area of business analytics (BA) dedicated to finding the best course of action for a given situation. It is related to both descriptive and predictive analytics.
8.6 Assess the impact of integration on analytical outputs
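As a small illustration of descriptive analytics (8.3a), the summary measures named above can be computed with Python's standard `statistics` module. The function name and the sample values are hypothetical; the sketch only shows the "What has happened?" style of summary.

```python
import statistics


def describe(values):
    """Return the basic descriptive measures for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


# Hypothetical analytical variable, e.g. household size in a small sample
household_sizes = [2, 3, 3, 5, 7]
summary = describe(household_sizes)
```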
Analytics & Decision Process: sub-processes
8.1 Procure access to analytical dataset
8.2 Exploratory data analysis, visualization, measure, diagnose
8.3 Data analytics
8.4 Validate outputs
8.5 Interpret and explain outputs
8.6 Assess the impact of integration on analytical outputs
8.7 Apply disclosure control
8.8 Finalise outputs
Disseminate
9.1 Update output systems
9.2 Produce dissemination products
9.3 Manage release of dissemination products
9.4 Promote dissemination products
• Assess products & services
9.5 Manage user support
• Track and measure quality of interaction with users and link to prioritization
Evaluate
10.1 Gather evaluation inputs
10.2 Conduct evaluation
10.3 Agree to an action plan
Concluding remarks
• We feel the proposed changes to GSBPM support the current evolution in business process activities
  • Increases visibility into the data lifecycle
  • Supports multiple data types
  • Potentially improves information production timelines by accelerating data preparation
  • Promotes standardized delivery of outputs (data, metadata and code)
  • Supports activity-based costing by breaking down the process into appropriate pieces
Next steps
• Feedback within Statistics Canada has been quite positive
• We are seeking your input on the relevance of further development of the GSBPM in this way
Comments and feedback are welcome