Freek bomhof tno

DATA PIPELINES: GARBAGE-IN-GARBAGE-OUT OR A CAR WASH? BIG DATA EXPO, 21 SEPTEMBER 2017, Freek Bomhof

Page 1: Freek bomhof tno

DATA PIPELINES

GARBAGE-IN-GARBAGE-OUT OR A CAR WASH?

BIG DATA EXPO, 21 SEPTEMBER 2017

Freek Bomhof

Page 2: Freek bomhof tno

CAN WE KNOW THE TRUTH?

Uncertainties in Big Data

Page 5: Freek bomhof tno

TRAVEL TIME: WHAT IS THE BEST DECISION?

Page 6: Freek bomhof tno

PREDICTION OF INCIDENT DURATION

Background: Travel time prediction is best done using classical (statistical) methods. Incidents have a significant influence on travel time but are hard to predict.

Aim: use deep learning technology to predict the incident duration

Approach: the developed ‘fingerprint’ method is combined with Deep Learning

Input: Loop data from the Dutch highway network (several years)

Result: Incident durations can be predicted with remarkable accuracy.

Incident: starts when RWS (Rijkswaterstaat) closes a lane; ends when average speed is >70%
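
The incident definition above can be turned into a label for training data. The following is a minimal sketch, not the method used in the project: the slide does not say what the 70% refers to, so the code assumes it means 70% of a reference (for example free-flow) speed. The names `LoopSample`, `incident_duration` and `reference_speed_kmh` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class LoopSample:
    minute: int            # minutes since the start of the observation window
    avg_speed_kmh: float   # average speed measured by the loop detector


def incident_duration(
    samples: Sequence[LoopSample],
    closure_minute: int,
    reference_speed_kmh: float,
    recovery_fraction: float = 0.70,
) -> Optional[int]:
    """Return the incident duration in minutes, or None if speed never recovers.

    Assumption: "average speed is >70%" is read as more than 70% of
    reference_speed_kmh. The incident starts at closure_minute (lane closed)
    and ends at the first later sample whose speed exceeds that threshold.
    """
    threshold = recovery_fraction * reference_speed_kmh
    for sample in samples:  # samples are expected in chronological order
        if sample.minute > closure_minute and sample.avg_speed_kmh > threshold:
            return sample.minute - closure_minute
    return None


# Usage with made-up numbers: lane closed at minute 5, reference speed 100 km/h.
samples = [
    LoopSample(0, 100.0),
    LoopSample(5, 40.0),
    LoopSample(10, 50.0),
    LoopSample(15, 75.0),
]
print(incident_duration(samples, closure_minute=5, reference_speed_kmh=100.0))  # 10
```

Durations computed this way could serve as the target value for a prediction model; the choice of reference speed strongly affects the labels and would need to be checked against the actual definition.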

Page 7: Freek bomhof tno

EVIDENCE-BASED YOUTH POLICY

Multi-View Learning (MVL) is a machine learning framework that is expected to be very well suited for creating predictive models.

Unlike standard statistical approaches, algorithms formulated within this framework can mine large amounts of heterogeneous data from multiple sources, deal with noisy and high-dimensional tasks, and incorporate partially labelled data (e.g. semi-supervised or transductive learning settings where annotated data is limited). They are based on theoretically justified assumptions and error bounds.

Aim: to apply the Multi-View Learning network approach to identify the main factors for truancy at schools. This could be the basis for evidence-based policy.

Results

Tested on a database with 12 000 data subjects

Results provide some preliminary relations

Applicable to many other aspects (health: obesity; social: drug abuse)

Outcome

Feature selection to predict the effect of an intervention

Time-resolved MV clustering to track the effect of interventions over time

Page 8: Freek bomhof tno

Source: Nature, Advances in nowcasting influenza-like illness rates using search query logs (2015)

Page 9: Freek bomhof tno

Source: tylervigen.com

Page 10: Freek bomhof tno

All models are wrong

But some of them are useful

Page 11: Freek bomhof tno

WHY THESE TOPICS? WHY NOW?

Our algorithms become smarter every day

Cross-domain & multistakeholder data exchange

Complexity is growing

Yet we expect the user to trust the outcomes

XAI

Page 12: Freek bomhof tno

TNO RESEARCH AGENDA

Page 13: Freek bomhof tno

DIVING INTO UNCERTAINTY: CASES

Adaptive Cruise Control

Factors that cause diabetes

Citizens applying for support

Find failing sensors in dikes

Evaluating cross-media effectiveness

ETA prediction container ships

Resource allocation in HD cameras

Hybrid Energy Grids

Intel from aerial observation

Machine-to-machine grids

Naval Mine Detection

Long-term effects in prematurely born children

Better decisions for risk-based infrastructure assets

Youth Policy: school truancy

Detecting events in video

Smart batteries in Smart energy grids

Analysing football games for training

Stability of underground pipelines

Scenario detection for self-driving cars

Assessing safety of proteins in drugs

Predict duration of traffic incidents

Assess the size of internet-related business

Predict the need for municipal social support

Page 14: Freek bomhof tno

the analysis framework is correct and complete

Source: sdxcentral.com

Page 15: Freek bomhof tno

a complete and clear picture of uncertainty is useful

Source: flickr, ELKayPics (CC)

Page 16: Freek bomhof tno

a higher accuracy would be valuable

Source: freegreatpicture.com

Page 17: Freek bomhof tno

tackling uncertainties is multidisciplinary

Source: Wikimedia commons

Page 19: Freek bomhof tno

A RESEARCH AGENDA

Data quality and representativeness

Including semantic uncertainty

Quantifying objectives

Uncertainty propagation

Model choice

Including robustness

Communicating uncertainty
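
To make the "uncertainty propagation" item above concrete, here is a minimal Monte Carlo sketch. It is an illustration only, not part of the TNO agenda: it assumes a hypothetical travel-time model (time = distance / speed) with a normally distributed speed, and the function name `propagate_travel_time` and all numbers are made up.

```python
import random
import statistics


def propagate_travel_time(
    distance_km: float,
    speed_mean_kmh: float,
    speed_sd_kmh: float,
    n_samples: int = 10_000,
    seed: int = 0,
) -> dict:
    """Propagate uncertainty in speed to uncertainty in travel time (minutes).

    Assumption: speed follows a normal distribution; draws at or below
    1 km/h are discarded as physically implausible.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    times = []
    for _ in range(n_samples):
        speed = rng.gauss(speed_mean_kmh, speed_sd_kmh)
        if speed <= 1.0:
            continue
        times.append(60.0 * distance_km / speed)
    times.sort()

    def percentile(p: float) -> float:
        # Simple nearest-rank style percentile on the sorted samples.
        return times[int(p * (len(times) - 1))]

    return {
        "mean": statistics.fmean(times),
        "p05": percentile(0.05),
        "p50": percentile(0.50),
        "p95": percentile(0.95),
    }


# Usage: a 50 km trip at roughly 100 km/h with a 15 km/h standard deviation.
result = propagate_travel_time(50.0, 100.0, 15.0)
print(result)
```

Reporting an interval (here the 5th to 95th percentile) rather than a single number also touches on the "communicating uncertainty" item: because travel time is a non-linear function of speed, the mean travel time ends up slightly above the value obtained from the mean speed alone.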

Page 20: Freek bomhof tno

SOMETHING TO THINK ABOUT… FOR DATA AS WELL?

Page 21: Freek bomhof tno

THANK YOU FOR YOUR ATTENTION

Take a look: TIME.TNO.NL