ISCTSC Workshop A7 Best Practices in Data Fusion.

Objectives
• Identify the state of the art and the state of practice
• Identify key research challenges and opportunities
• Identify tangible ways to accelerate methodological innovation and adoption in practice

What exactly is data fusion?
• Using more than one data source to estimate a parameter of interest
[Figure: a real-world process produces the world ‘today’, which is observed by Measurement 1, Measurement 2, ..., Measurement n (X_1, X_2, ..., X_n). Caption: Direct measurement.]

What exactly is data fusion?
[Figure: the direct-measurement diagram is extended with a second branch. Measurement 1, Measurement 2, ..., Measurement m (Y_1, Y_2, ..., Y_m) are linked to the real-world process and the world ‘today’ only through a complex interaction with other quantities, as captured in existing domain models. Caption: Indirect measurement.]
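
The direct-measurement case can be illustrated with a minimal sketch. It assumes the n measurements are independent and unbiased with known error variances (an assumption that is not stated on the slide) and combines them by inverse-variance weighting. The function name `fuse_direct` and the numbers are hypothetical.

```python
def fuse_direct(measurements, variances):
    """Combine independent, unbiased measurements of one quantity.

    Each measurement is weighted by the inverse of its error variance,
    which gives the minimum-variance unbiased linear combination.
    Returns the fused estimate and its variance.
    """
    if not measurements or len(measurements) != len(variances):
        raise ValueError("need one variance per measurement")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * x for w, x in zip(weights, measurements)) / total
    return estimate, 1.0 / total


# Hypothetical example: three sources report the same mean trip length (km),
# each with its own assumed error variance.
estimate, variance = fuse_direct([10.0, 12.0, 11.0], [4.0, 1.0, 2.0])
print(estimate, variance)
```

Under these assumptions the fused variance is never larger than the smallest input variance, which is one way of stating the precision benefit of fusion.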

SOP & SOA (1)
• There is a long history of data fusion in transport, but it is very fragmented
• Examples
  – Synthetic population generation
  – OD matrix updating
  – Data enrichment in discrete choice model estimation
  – Network state estimation
  – Activity pattern feature extraction from trace data
  – Use of multiple survey modes
  – Activity and time use survey consolidation
  – Population exposure modelling
  – Public transport (e.g. UK bus) OD matrix estimation
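
To make the first two examples concrete, the sketch below uses iterative proportional fitting (IPF), one commonly used way of adjusting a seed table (a sample cross-tabulation or an old OD matrix) to new row and column totals. The slides do not name a technique, so the choice of IPF, the function name `ipf`, and the toy numbers are all assumptions.

```python
def ipf(seed, row_targets, col_targets, max_iter=100, tol=1e-9):
    """Scale a non-negative seed table so its margins match the targets.

    Assumes sum(row_targets) == sum(col_targets) and a strictly
    positive seed; rows and columns are rescaled in turn until the
    row totals are within tol of their targets.
    """
    table = [row[:] for row in seed]  # work on a copy
    for _ in range(max_iter):
        # Row step: rescale each row to its target total.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Column step: rescale each column to its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns now match exactly; stop once rows match too.
        row_error = max(abs(sum(table[i]) - t)
                        for i, t in enumerate(row_targets))
        if row_error < tol:
            break
    return table


# Hypothetical seed OD matrix with new origin and destination totals.
fitted = ipf([[10.0, 20.0], [30.0, 40.0]], [40.0, 60.0], [50.0, 50.0])
print(fitted)
```

The fitted table keeps the interaction pattern of the seed while reproducing the new totals, which is the sense in which two data sources are fused here.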

Summary: SOP & SOA (2)
• Problem types:
  – Direct observation by multiple methods
    • Requires error model
    • Does not in general require system process model
  – Direct and indirect observation
    • Requires error model
    • Requires additionally a system process model to link indirect observations to parameters of interest
• Methods:
  – ‘Record linking’ methods (e.g., statistical matching, data mining, imputation, fuzzy logic)
  – Model-based inference (e.g., FIML, filtering, Bayesian inference)
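
The difference between the two problem types can be sketched with a scalar linear-Gaussian Bayesian update (the measurement step of a Kalman filter). This is only one instance of model-based inference, chosen for brevity. The error model is the assumed observation variance; the system process model is reduced to a single hypothetical coefficient `h` linking the observation to the parameter. All names and numbers are illustrative assumptions.

```python
def gaussian_update(prior_mean, prior_var, obs, obs_var, h=1.0):
    """One linear-Gaussian Bayesian update for a scalar parameter theta.

    The observation is modelled as obs = h * theta + noise, where the
    noise has variance obs_var (the error model). h = 1 is a direct
    observation; any other h stands in for a system process model that
    links an indirect observation to theta.
    """
    if prior_var <= 0 or obs_var <= 0:
        raise ValueError("variances must be positive")
    gain = prior_var * h / (h * h * prior_var + obs_var)
    post_mean = prior_mean + gain * (obs - h * prior_mean)
    post_var = (1.0 - gain * h) * prior_var
    return post_mean, post_var


# Hypothetical example: theta is an OD flow with prior 100 trips (variance 400).
# Direct observation: a survey estimate of 120 trips, variance 100.
m1, v1 = gaussian_update(100.0, 400.0, obs=120.0, obs_var=100.0)
# Indirect observation: a link count of 33 vehicles, variance 9, where an
# assumed process model says 30% of the OD flow uses that link (h = 0.3).
m2, v2 = gaussian_update(m1, v1, obs=33.0, obs_var=9.0, h=0.3)
print(m1, v1)
print(m2, v2)
```

The direct update needs only the error variance, while the indirect update cannot be written down without `h`, which mirrors the distinction drawn on the slide.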

Research needs (1)
• Enabling research
  – Better metadata (survey/data collection process + context) to support informed fusion (especially important in the era of Web 2.0)
  – More professional and disciplined protocols for reporting data treatments in published work
  – Better techniques of disclosure management
  – Understanding how to make the business case for data fusion
    • Benefits: sample size, precision
    • Barriers: perception of ‘made up data’, threat to incumbent data providers

Research needs (2)
• Methodological research
  – Detecting genuinely conflicting information (not fusable): a form of specification test
  – Better means of validating fused data
  – Better methods for modelling the propagation of data and model uncertainty during data fusion, to enhance confidence in fused data
  – Are deterministic/‘mean imputation’ approaches adequate? How seriously do they distort the covariance structure?
  – Better re-sampling/Bayesian methods in high dimensions
  – Integrate methods from small area estimation (SAE)
  – Opportunities to reduce respondent burden by split designs and ex-post fusion (à la SP surveys and analysis): question substitutability
  – For record matching, what are the key connecting variables?
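
The question about mean imputation can be explored with a small simulation. The sketch below assumes values are missing completely at random and uses synthetic data; the variable names, sample size, and missingness rate are hypothetical. It compares the variance of y and the covariance of x and y before and after filling the gaps with the observed mean.

```python
import random
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def covariance(xs, ys):
    """Sample covariance with the usual n - 1 denominator."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.8 * xi + rng.gauss(0, 0.6) for xi in x]  # y is correlated with x
# Drop roughly half of the y values completely at random.
y_missing = [yi if rng.random() < 0.5 else None for yi in y]
y_imputed = mean_impute(y_missing)

print("var(y)    full:", covariance(y, y),
      "mean-imputed:", covariance(y_imputed, y_imputed))
print("cov(x, y) full:", covariance(x, y),
      "mean-imputed:", covariance(x, y_imputed))
```

Because every imputed record sits exactly at the mean, it contributes nothing to the sums of squares and cross-products, so both the variance and the covariance shrink roughly in proportion to the share of missing values.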

Research needs (3)
• Research infrastructure
  – Establish a more consistent and complete taxonomy of data fusion problems, methods, outcomes
  – Establish reference datasets and reference ‘cases’