ISCTSC Workshop A7 Best Practices in Data Fusion.

Page 1: ISCTSC Workshop A7 Best Practices in Data Fusion.

ISCTSC Workshop A7

Best Practices in Data Fusion

Page 2: ISCTSC Workshop A7 Best Practices in Data Fusion.

Objectives

• Identify the state of the art and the state of practice

• Identify key research challenges and opportunities

• Identify tangible ways to accelerate methodological innovation and adoption in practice

Page 3: ISCTSC Workshop A7 Best Practices in Data Fusion.

What exactly is data fusion?

• Using more than one data source to estimate a parameter of interest

[Diagram: a real-world process gives rise to the world ‘today’. The quantity of interest is observed directly by Measurement 1, Measurement 2, ..., Measurement n, yielding X_1, X_2, ..., X_n (direct measurement).]
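
The direct-measurement case above can be illustrated with a minimal sketch. It assumes that each source measures the same parameter with independent, unbiased errors of known variance; the slide itself does not state an error model. Under those assumptions the measurements can be combined by inverse-variance weighting. The function name `fuse_direct` and all numbers are hypothetical.

```python
def fuse_direct(measurements, variances):
    """Inverse-variance weighted mean of direct measurements.

    Assumes independent, unbiased measurement errors with known variances
    (an assumption made for this sketch, not taken from the slides).
    Returns the fused estimate and its variance.
    """
    if not measurements or len(measurements) != len(variances):
        raise ValueError("need one positive variance per measurement")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    estimate = sum(w * x for w, x in zip(weights, measurements)) / total_weight
    return estimate, 1.0 / total_weight


# Three hypothetical measurements X_1, X_2, X_3 of the same quantity
x = [10.2, 9.6, 10.9]
error_variances = [0.5, 1.0, 2.0]
estimate, estimate_variance = fuse_direct(x, error_variances)
print(estimate, estimate_variance)
```

The fused variance is never larger than the smallest input variance, which is the basic precision argument for fusing sources rather than picking the best one.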

Page 4: ISCTSC Workshop A7 Best Practices in Data Fusion.

What exactly is data fusion?

• Using more than one data source to estimate a parameter of interest

[Diagram, two panels. Direct measurement: a real-world process gives rise to the world ‘today’, which is observed by Measurement 1, Measurement 2, ..., Measurement n, yielding X_1, X_2, ..., X_n. Indirect measurement: the same process is observed by Measurement 1, Measurement 2, ..., Measurement m, yielding Y_1, Y_2, ..., Y_m, which relate to the quantity of interest through complex interaction with other quantities as captured in existing domain models.]
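
To make the role of the domain model in indirect measurement concrete, here is a minimal sketch of a scalar Gaussian (Kalman-style) update. It assumes a linear link y = h * theta + noise, where the sensitivity `h` would come from a domain model; a direct measurement is the special case h = 1. The linear-Gaussian form, the function name `gaussian_update`, and all numbers are assumptions made for illustration only.

```python
def gaussian_update(mean, var, y, h, noise_var):
    """Update a Gaussian belief about theta given y = h * theta + noise.

    h is the (assumed linear) sensitivity supplied by a domain model;
    h = 1 corresponds to a direct measurement.
    """
    innovation = y - h * mean          # surprise relative to the prediction
    innovation_var = h * h * var + noise_var
    gain = var * h / innovation_var    # how strongly to react to the surprise
    new_mean = mean + gain * innovation
    new_var = (1.0 - gain * h) * var
    return new_mean, new_var


# Hypothetical prior belief about a demand parameter theta
prior_mean, prior_var = 100.0, 400.0

# Direct measurement of theta (h = 1)
m1, v1 = gaussian_update(prior_mean, prior_var, y=110.0, h=1.0, noise_var=100.0)

# Indirect measurement: e.g. a count on a link assumed to carry 30% of theta
m2, v2 = gaussian_update(m1, v1, y=33.0, h=0.3, noise_var=4.0)
print(m1, v1, m2, v2)
```

Both kinds of measurement reduce the variance of the estimate, but the indirect one is only as trustworthy as the assumed value of `h`.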

Page 5: ISCTSC Workshop A7 Best Practices in Data Fusion.

SOP & SOA (1)

• There is a long history of data fusion in transport, but it is very fragmented

• Examples

– Synthetic population generation

– OD matrix updating

– Data enrichment in discrete choice model estimation

– Network state estimation

– Activity pattern feature extraction from trace data

– Use of multiple survey modes

– Activity and time use survey consolidation

– Population exposure modelling

– Public transport (e.g. UK bus) OD matrix estimation
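
Several of the examples above (synthetic population generation, OD matrix updating) are commonly approached with iterative proportional fitting (IPF), which scales an outdated or sampled seed table until it agrees with newer marginal totals. The sketch below is one generic form of IPF, not a method attributed to the workshop. The function name `ipf` and the numbers are hypothetical, and it assumes a strictly positive seed and marginals with equal grand totals.

```python
def ipf(seed, row_targets, col_targets, max_iterations=1000, tol=1e-9):
    """Iterative proportional fitting of a 2-D table to row and column totals."""
    table = [row[:] for row in seed]  # work on a copy of the seed
    for _ in range(max_iterations):
        # Scale each row to its target total
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            if row_sum > 0:
                table[i] = [v * target / row_sum for v in table[i]]
        # Scale each column to its target total
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            if col_sum > 0:
                factor = target / col_sum
                for row in table:
                    row[j] *= factor
        # Columns now match exactly; stop once the rows match as well
        worst = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if worst < tol:
            break
    return table


# Hypothetical old OD matrix (seed) and newly observed trip ends
seed = [[20.0, 30.0], [40.0, 10.0]]
row_targets = [60.0, 40.0]   # trips produced by each origin
col_targets = [70.0, 30.0]   # trips attracted by each destination
updated = ipf(seed, row_targets, col_targets)
print(updated)
```

IPF preserves the interaction pattern (odds ratios) of the seed while matching the new margins, so the quality of the fused table depends heavily on how representative the seed is.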

Page 6: ISCTSC Workshop A7 Best Practices in Data Fusion.

Summary: SOP & SOA (2)

• Problem types:

– Direct observation by multiple methods

  • Requires an error model

  • Does not in general require a system process model

– Direct and indirect observation

  • Requires an error model

  • Additionally requires a system process model to link indirect observations to parameters of interest

• Methods:

– ‘Record linking’ methods (e.g., statistical matching, data mining, imputation, fuzzy logic)

– Model-based inference (e.g., full-information maximum likelihood (FIML), filtering, Bayesian inference)
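
As a minimal sketch of the ‘record linking’ family, the following shows nearest-neighbour statistical matching: each record in a recipient file receives a variable from the most similar record in a donor file, with similarity judged on the variables both files share. The file contents, variable names, and the function `statistical_match` are hypothetical. The approach also implicitly assumes that the donated variable and the recipient-only variables are independent given the common variables.

```python
def statistical_match(recipients, donors, common_vars, donated_var):
    """Copy donated_var from the nearest donor record to each recipient."""
    # Scale each common variable by its pooled range so no variable dominates
    pooled = recipients + donors
    scales = {}
    for var in common_vars:
        values = [record[var] for record in pooled]
        spread = max(values) - min(values)
        scales[var] = spread if spread > 0 else 1.0

    def distance(a, b):
        return sum(((a[v] - b[v]) / scales[v]) ** 2 for v in common_vars)

    fused = []
    for record in recipients:
        nearest = min(donors, key=lambda d: distance(record, d))
        fused.append({**record, donated_var: nearest[donated_var]})
    return fused


# Hypothetical donor file (travel survey) and recipient file (income survey)
travel_survey = [
    {"age": 25, "hh_size": 1, "trips": 4},
    {"age": 45, "hh_size": 4, "trips": 6},
    {"age": 70, "hh_size": 2, "trips": 2},
]
income_survey = [
    {"age": 27, "hh_size": 1, "income": 30},
    {"age": 68, "hh_size": 2, "income": 22},
    {"age": 43, "hh_size": 3, "income": 55},
]
fused = statistical_match(income_survey, travel_survey, ["age", "hh_size"], "trips")
print(fused)
```

The choice of common variables drives the result entirely, which is why the selection of connecting variables is itself a research question.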

Page 7: ISCTSC Workshop A7 Best Practices in Data Fusion.

Research needs (1)

• Enabling research

– Better metadata (survey/data collection process and context) to support informed fusion (especially important in the era of Web 2.0)

– More professional and disciplined protocols in reporting data treatments in published work

– Better techniques of disclosure management

– Understanding how to make the business case for data fusion

  • Benefits: sample size, precision

  • Barriers: perception of ‘made-up data’, threat to incumbent data providers

Page 8: ISCTSC Workshop A7 Best Practices in Data Fusion.

Research needs (2)

• Methodological research

– Detecting genuinely conflicting information (not fusable), a form of specification test

– Better means of validating fused data

– Better methods for modelling the propagation of data and model uncertainty during data fusion, to enhance confidence in fused data

– Are deterministic (‘mean imputation’) approaches adequate? How seriously do they distort the covariance structure?

– Better re-sampling/Bayesian methods in high dimensions

– Integrate methods from small area estimation (SAE)

– Opportunities to reduce respondent burden by split designs and ex-post fusion (à la stated preference (SP) surveys and analysis); question substitutability

– For record matching, what are the key connecting variables?
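
The question about deterministic (‘mean imputation’) approaches can be explored with a small simulation. The sketch below uses synthetic data and an assumed linear relationship; nothing here comes from the workshop. It imputes a variable missing for half of the records in two ways: deterministically, using the regression (conditional mean) prediction, and stochastically, adding a random residual to that prediction. It then compares the resulting variance and correlation with those of the complete data.

```python
import random
import statistics


def covariance(a, b):
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)


def correlation(a, b):
    return covariance(a, b) / (statistics.pstdev(a) * statistics.pstdev(b))


rng = random.Random(42)
n = 2000
x = [rng.gauss(0.0, 1.0) for _ in range(n)]   # observed for everyone
z = [xi + rng.gauss(0.0, 1.0) for xi in x]    # complete data (normally unknown)

observed = n // 2  # z is treated as missing for the second half of the records

# Fit z = intercept + slope * x on the records where z is observed
x_obs, z_obs = x[:observed], z[:observed]
slope = covariance(x_obs, z_obs) / statistics.pvariance(x_obs)
intercept = statistics.fmean(z_obs) - slope * statistics.fmean(x_obs)
residual_sd = statistics.pstdev(
    [zi - (intercept + slope * xi) for xi, zi in zip(x_obs, z_obs)]
)

# Deterministic imputation: predicted conditional mean only
z_det = z_obs + [intercept + slope * xi for xi in x[observed:]]
# Stochastic imputation: conditional mean plus a random residual draw
z_sto = z_obs + [
    intercept + slope * xi + rng.gauss(0.0, residual_sd) for xi in x[observed:]
]

for label, values in [("complete", z), ("deterministic", z_det), ("stochastic", z_sto)]:
    print(label, statistics.pvariance(values), correlation(x, values))
```

In this setup the deterministic variant understates the variance of the imputed variable and overstates its correlation with the predictor, while the stochastic variant stays closer to the complete data. How severe the distortion is in practice depends on the share of imputed values and on how well the predictors explain the variable.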

Page 9: ISCTSC Workshop A7 Best Practices in Data Fusion.

Research needs (3)

• Research infrastructure

– Establish a more consistent and complete taxonomy of data fusion problems, methods, outcomes

– Establish reference datasets and reference ‘cases’