Data Science: Principles, Practice, Potential and Pitfalls

Data Science: Principles, Practice, Potential and Pitfalls Dionisio Acosta Institute of Health Informatics University College London June 2019 Dionisio Acosta (UCL) June 2019 1 / 20

Page 1:

Data Science: Principles, Practice, Potential and Pitfalls

Dionisio Acosta

Institute of Health Informatics, University College London

June 2019

Page 2:

Outline

Principles: what underpins Data Science

Practice: showcase of exemplar applications

Pitfalls: practices that could lead to over-optimism

Potential: current research that addresses fundamental problems

Page 3:

Principles: what underpins Data Science?

The aim is to address engineering practices that support data-driven healthcare organisations.

Data-driven healthcare and Evidence-based practice (Smith, A. 1996)

What are optimal (automated) decisions? What are the prediction tolerance levels? What are the value functions of actions?

We are concerned with supporting human decision making, but by and large we are concerned with how to support data-driven automated decision making.

There are degrees of support: visualisation, prediction, decision making, automated (AI) planning.
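As a rough illustration of how a value function over actions turns a prediction into a decision, the sketch below picks the action with the highest expected value given a predicted event probability. The action names and the numbers in `VALUES` are hypothetical and are not taken from the talk.

```python
# Hypothetical value table (an assumption for illustration only):
# VALUES[action] = (value if the event occurs, value if it does not).
VALUES = {
    "treat": (10.0, -1.0),
    "wait": (-20.0, 0.0),
}


def expected_values(p_event, values):
    """Expected value of each action for a predicted event probability."""
    return {
        action: p_event * v_yes + (1.0 - p_event) * v_no
        for action, (v_yes, v_no) in values.items()
    }


def best_action(p_event, values):
    """Return the action with the highest expected value."""
    ev = expected_values(p_event, values)
    return max(ev, key=ev.get)


if __name__ == "__main__":
    for p in (0.01, 0.10):
        print(p, best_action(p, VALUES), expected_values(p, VALUES))
```

With this table the preferred action switches from "wait" to "treat" at a predicted probability of 1/31, which is one way to read a "prediction tolerance level": the prediction only needs to be accurate enough to place a case on the correct side of that threshold.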

Page 4:

Principles: what underpins Data Science?

Data models and platforms (HW & SW): every research question induces a data model.

The fundamental problem of model selection: very well specified in Statistics but less understood elsewhere.

Statistics (Computational), Computer Science, Machine Learning, Database Systems, Distributed and High Performance Computing.

Page 5:

Practice: exemplar applications

Some example applications that provide an overview of the art of the possible:

Management of chest pain

Predicting emergency admissions

Pharmacotherapy management in Parkinson’s Disease

Automatic data quality control

Breast cancer treatment selector

Page 6:

Inspirational Example: Management of Chest Pain

Mean follow-up of 21 ± 5 months

Zacharias, K. et al. (2017) European Heart Journal-Cardiovascular Imaging, 18(2), 195-202. doi:10.1093/ehjci/jew049

Page 7:

Predicting emergency admissions

Error reduction from 20% to 2% using a TBATS model

Odera, I. and Acosta, D. (2014)

Page 8:

Pharmacotherapy Management in Parkinson’s Disease

Nguyen, V. et al. (2018) Studies in Health Technology and Informatics, pp. 156–160.

Page 9:

Automatic Data Quality Control

Saez, C. et al. (2016) Journal of the American Medical Informatics Association 23, pp. 1085–1095. doi:10.1093/jamia/ocw010

Page 10:

Breast Cancer Treatment Selector

Patkar, V. et al. (2012) BMJ Open 2(3): e000439. doi:10.1136/bmjopen-2011-000439.

Page 11:

Pitfalls: practices that could lead to over-optimism

Cross-validation practices

Model selection using AUC and model complexity

Model selection in penalised models and p-values

Model application context

Sample size in penalised regression models

Covariance matrix in the high-dimension, small-sample-size context

Page 12:

Cross Validation Practices

Page 13:

Cross Validation Practices

Page 14:

Cross Validation Practices
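The transcript does not record which cross-validation practices these slides showed. One well-known practice that leads to over-optimism is selecting features on the full dataset and only then cross-validating the classifier. The sketch below is an illustrative assumption, not the author's example: it builds pure-noise data, where no feature is truly predictive, and compares cross-validated accuracy when feature selection is done once on all samples ("leaky") against selection repeated inside each training fold. The data sizes, the mean-gap selection rule and the nearest-centroid classifier are all hypothetical choices.

```python
import random


def make_noise_data(n=40, p=500, seed=0):
    """Pure-noise features with balanced labels: nothing is truly predictive."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y


def class_means(X, y, rows, j):
    """Mean of feature j within each class, using only the given rows."""
    means = {}
    for c in (0, 1):
        vals = [X[i][j] for i in rows if y[i] == c]
        means[c] = sum(vals) / len(vals)
    return means


def select_features(X, y, rows, k=10):
    """Pick the k features with the largest gap between class means."""
    def gap(j):
        m = class_means(X, y, rows, j)
        return abs(m[1] - m[0])
    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]


def predict(X, y, train, feats, i):
    """Nearest-centroid prediction for sample i on the chosen features."""
    dist = {0: 0.0, 1: 0.0}
    for j in feats:
        m = class_means(X, y, train, j)
        for c in (0, 1):
            dist[c] += (X[i][j] - m[c]) ** 2
    return 0 if dist[0] <= dist[1] else 1


def cv_accuracy(X, y, folds=5, leaky=False):
    """k-fold accuracy; leaky=True selects features on ALL samples first."""
    rows = list(range(len(X)))
    global_feats = select_features(X, y, rows) if leaky else None
    correct = 0
    for f in range(folds):
        test = [i for i in rows if i % folds == f]
        train = [i for i in rows if i % folds != f]
        feats = global_feats if leaky else select_features(X, y, train)
        correct += sum(predict(X, y, train, feats, i) == y[i] for i in test)
    return correct / len(X)


if __name__ == "__main__":
    X, y = make_noise_data()
    print("selection outside CV:", cv_accuracy(X, y, leaky=True))
    print("selection inside CV: ", cv_accuracy(X, y, leaky=False))
```

Because the labels are unrelated to the features, an honest estimate should sit near chance level. Any accuracy well above that in the leaky variant comes from the held-out samples having already influenced which features were chosen.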

Page 15:

Sample Size and Penalised Regression

\[
\frac{k \log p}{n} \to 0 \tag{1}
\]

Meinshausen, N., Yu, B., 2009. Lasso-type recovery of sparse representations for high-dimensional data. The Annals of Statistics 37, 246–270.

Wainwright, M.J., 2009. Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using L1-Constrained Quadratic Programming (Lasso). IEEE Transactions on Information Theory 55, 2183–2202.
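To get a feel for condition (1), the sketch below evaluates the ratio k log p / n for a few study sizes. The slide does not define the symbols; the code assumes the usual reading in the cited lasso literature, with n the sample size, p the number of candidate predictors and k the number of truly non-zero coefficients. The example values are hypothetical, and since the condition is asymptotic the ratio is only a rough guide, not a pass/fail threshold.

```python
import math


def sparsity_ratio(k, p, n):
    """Return k * log(p) / n, the quantity that condition (1) sends to zero.

    Assumed meaning of the symbols (not stated on the slide):
    k = number of non-zero coefficients, p = number of predictors,
    n = sample size.
    """
    if k < 0 or p < 1 or n < 1:
        raise ValueError("need k >= 0, p >= 1 and n >= 1")
    return k * math.log(p) / n


if __name__ == "__main__":
    # Hypothetical scenario: 10 relevant predictors among 1000 candidates.
    for n in (50, 200, 1000, 5000):
        print(n, round(sparsity_ratio(k=10, p=1000, n=n), 3))
```

The ratio grows only logarithmically in p but linearly in k, so on this reading the number of relevant predictors, more than the number of candidates, drives how many samples a penalised regression needs.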

Page 16:

Covariance matrix in high dimension, small sample size

Consider the effect in graphical methods and PCA

Ledoit, O. et al., (2012) The Annals of Statistics 40(2):1024–60. doi:10.1214/12-AOS989.
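A small simulation can make the problem concrete. The sketch below draws independent standard normal variables, so the true covariance matrix is the identity, and reports the largest off-diagonal entry of the sample covariance matrix for a small and a large sample. It then applies linear shrinkage towards a scaled identity with a fixed intensity. Both the fixed intensity and the linear form are simplifying assumptions for illustration; this is not the nonlinear shrinkage estimator of the cited paper.

```python
import random


def gaussian_sample(n, p, seed=0):
    """n observations of p independent N(0, 1) variables (true covariance = I)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]


def sample_covariance(X):
    """Unbiased sample covariance matrix as a list of lists."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    S = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(a, p):
            s = sum((row[a] - means[a]) * (row[b] - means[b]) for row in X)
            S[a][b] = S[b][a] = s / (n - 1)
    return S


def max_abs_offdiag(S):
    """Largest absolute covariance between two different variables."""
    p = len(S)
    return max(abs(S[a][b]) for a in range(p) for b in range(p) if a != b)


def linear_shrinkage(S, intensity):
    """(1 - intensity) * S + intensity * mu * I, with mu the mean variance."""
    p = len(S)
    mu = sum(S[j][j] for j in range(p)) / p
    return [
        [(1.0 - intensity) * S[a][b] + (intensity * mu if a == b else 0.0)
         for b in range(p)]
        for a in range(p)
    ]


if __name__ == "__main__":
    p = 20
    S_small = sample_covariance(gaussian_sample(n=10, p=p))
    S_large = sample_covariance(gaussian_sample(n=2000, p=p))
    print("max spurious covariance, n=10:  ", max_abs_offdiag(S_small))
    print("max spurious covariance, n=2000:", max_abs_offdiag(S_large))
    print("after shrinkage (0.5), n=10:    ",
          max_abs_offdiag(linear_shrinkage(S_small, 0.5)))
```

Every off-diagonal entry here is spurious, since the variables are independent by construction. Graphical methods would read large entries as edges and PCA would read them as structure, which is the effect the slide asks the audience to consider.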

Page 17:

Potential: addressing fundamental problems

Differential Privacy and Distributed Learning (Balcan et al. 2012)

Data-driven phenotypes: Learning longitudinal phenotypes

Page 18:

Differential Privacy and Distributed Learning
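The slide content was not captured in the transcript. As a minimal sketch of the differential privacy half only, the code below applies the standard Laplace mechanism to a counting query: noise with scale sensitivity / epsilon is added to the true count before release. The function names and the example numbers are hypothetical, and nothing here is taken from the cited work on distributed learning.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism for a count.

    Adding or removing one person changes a count by at most 1, hence the
    default sensitivity of 1. Smaller epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical query: number of patients with a given diagnosis.
    for eps in (0.1, 1.0, 10.0):
        print(eps, private_count(100, eps, rng))
```

The trade-off is visible in the scale parameter: a site that releases counts with a small epsilon protects individuals more strongly but gives collaborators noisier statistics to learn from.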

Page 19:

Learning Longitudinal Phenotypes

Page 20:

Concluding Remarks

The data scientist reaches out to statisticians, computer scientists, informaticians, etc., to achieve the aim of creating data-driven organisations.

Data Science extends the language at my disposal to formulate models. The benefit is that I can find succinct ways to model reality.

The challenge is that, like learning a new language, one makes endless mistakes and those mistakes, however small, stick around.

The hope is that new generations are able to see those mistakes, sometimes kindly correct me, but most importantly go out there by themselves and be able to express reality faithfully.
