Freek Bomhof, TNO
DATA PIPELINES
GARBAGE IN, GARBAGE OUT, OR A CAR WASH?
BIG DATA EXPO, 21 SEPTEMBER 2017
Freek Bomhof
CAN WE KNOW THE TRUTH?
Uncertainties in Big Data
TRAVEL TIME: WHAT IS THE BEST DECISION?
PREDICTION OF INCIDENT DURATION
Background: Travel time prediction is best done using classical (statistical) methods. Incidents have a significant influence on travel time but are hard to predict.
Aim: use deep learning to predict incident duration.
Approach: the developed ‘fingerprint’ method is combined with deep learning.
Input: loop data from the Dutch highway network (several years).
Result: incident durations can be predicted with remarkable accuracy.
Incident: starts when RWS (Rijkswaterstaat) closes a lane; ends when average speed is >70%
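The incident definition above can be turned into a labelling rule for loop data. The following is a minimal sketch, not the method used in the project. The slide does not say what the 70% is measured against, so the sketch assumes a hypothetical `reference_speed` (for example a free-flow speed). The function name, the one-minute sampling step and the example speeds are also illustrative assumptions.

```python
from typing import Optional, Sequence


def incident_duration(
    speeds: Sequence[float],
    closure_index: int,
    reference_speed: float,
    recovery_fraction: float = 0.7,
    step_minutes: float = 1.0,
) -> Optional[float]:
    """Minutes from lane closure until average speed recovers.

    ASSUMPTION: "recovered" means the measured average speed exceeds
    recovery_fraction * reference_speed; the slide does not name the
    reference, so reference_speed is a hypothetical parameter.
    Returns None if the speed never recovers within the series.
    """
    threshold = recovery_fraction * reference_speed
    for i in range(closure_index, len(speeds)):
        if speeds[i] > threshold:
            return (i - closure_index) * step_minutes
    return None


# Illustrative (made-up) average speeds in km/h, one value per minute.
speeds = [100, 98, 40, 35, 50, 65, 72, 95]
duration = incident_duration(speeds, closure_index=2, reference_speed=100.0)
print(duration)
```

Durations labelled this way could serve as the target variable for a prediction model; returning None for unresolved incidents keeps incomplete records out of the training set.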
EVIDENCE-BASED YOUTH POLICY
Multi-View Learning (MVL) is a Machine Learning framework that is
expected to be very well suited for creating predictive models.
Unlike standard statistical approaches, algorithms that are formulated within
this framework allow mining large amounts of heterogeneous data from
multiple sources, dealing with noisy and high dimensional tasks,
incorporating partially labelled data (e.g. semi-supervised or transductive
learning setting where annotated data is limited), and are based on
theoretically justified assumptions and error bounds.
Aim: to apply the Multi-View Learning approach to identify the main factors behind truancy at schools. This could be the basis for evidence-based policy.
Results
Tested on a database with 12 000 data subjects
Results provide some preliminary relations
Applicable to many other aspects (health: obesity; social: drug abuse)
Outcome
Feature selection to predict the effect of an intervention
Time-resolved MV clustering to track the effect of interventions over time
Source: Nature, Advances in nowcasting influenza-like illness rates using search query logs (2015)
Source: tylervigen.com
All models are wrong, but some of them are useful
WHY THESE TOPICS? WHY NOW?
Our algorithms become smarter every day
Cross-domain & multistakeholder data exchange
Complexity is growing
Yet we expect the user to trust the outcomes
XAI (explainable AI)
TNO RESEARCH AGENDA
DIVING INTO UNCERTAINTY: CASES
Adaptive Cruise Control
Factors that cause diabetes
Citizens applying for support
Find failing sensors in dikes
Evaluating cross-media effectiveness
ETA prediction container ships
Resource allocation in HD cameras
Hybrid Energy Grids
Intel from aerial observation
Machine-to-machine grids
Naval Mine Detection
Long-term effects of premature birth
Better decisions for risk-based infrastructure assets
Youth Policy: school truancy
Detecting events in video
Smart batteries in Smart energy grids
Analysing football games for training
Stability of underground pipelines
Scenario detection for self-driving cars
Assessing safety of proteins in drugs
Predict duration of traffic incidents
Assess the size of internet-related business
Predict the need for municipal social support
the analysis framework is correct and complete
Source: sdxcentral.com
a complete and clear picture of uncertainty is useful
Source: flickr, ELKayPics (CC)
a higher accuracy would be valuable
Source: freegreatpicture.com
tackling uncertainties is multidisciplinary
Source: Wikimedia commons
A RESEARCH AGENDA
Data quality and representativeness
Including semantic uncertainty
Quantifying objectives
Uncertainty propagation
Model choice
Including robustness
Communicating uncertainty
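As an illustration of the "uncertainty propagation" item, here is a minimal Monte Carlo sketch in the spirit of the travel-time question raised earlier. It is not taken from the talk. The model (travel time = distance / speed), the Gaussian speed distribution, and all names and numbers are assumptions chosen for illustration only.

```python
import random
import statistics
from typing import List


def simulate_travel_times(
    distance_km: float,
    mean_speed_kmh: float,
    speed_sd_kmh: float,
    n_samples: int = 10_000,
    seed: int = 0,
) -> List[float]:
    """Propagate uncertainty in speed to travel time (minutes) by sampling.

    ASSUMPTION: speed is normally distributed; samples are clamped to at
    least 1 km/h so the division stays well defined.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n_samples):
        speed = max(rng.gauss(mean_speed_kmh, speed_sd_kmh), 1.0)
        times.append(60.0 * distance_km / speed)
    return times


# Made-up example: 50 km trip, speed 100 km/h with a 10 km/h standard deviation.
times = simulate_travel_times(50.0, 100.0, 10.0)
mean_time = statistics.mean(times)
sd_time = statistics.stdev(times)
p90 = statistics.quantiles(times, n=10)[-1]  # 90th percentile cut point
print(f"mean={mean_time:.1f} min, sd={sd_time:.1f} min, p90={p90:.1f} min")
```

Reporting a spread and an upper percentile alongside the mean is one simple way to communicate the uncertainty to a decision maker, rather than a single point estimate.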
SOMETHING TO THINK ABOUT… FOR DATA AS WELL?
THANK YOU FOR YOUR ATTENTION
Take a look: TIME.TNO.NL