Knowledge Engineering from Big Data in Oncology

Andre Dekker, PhD, Medical Physicist, MAASTRO Clinic

Page 1: Knowledge Engineering from Big Data in Oncology

Andre Dekker, PhD, Medical Physicist, MAASTRO Clinic

Page 2: Knowledge Engineering from Big Data in Oncology

© MAASTRO 2015

Disclosures

Research collaborations incl. funding / honoraria etc.
– Varian (VATE, chinaCAT, euroCAT), Siemens (euroCAT), Sohard (SeDI, CloudAtlas), Mirada Medical (CloudAtlas), Philips (EURECA, TraIT, SWIFT-RT), Xerox (EURECA), De Praktijkindex (DLRA)

Public research funding
– Radiomics (USA-NIH/U01CA143062), euroCAT (EU-Interreg), duCAT (NL-STW), EURECA (EU-FP7), SeDI & CloudAtlas (EU-EUREKA), TraIT (NL-CTMM), DLRA (NL-NVRO)

Spin-offs and commercial ventures
– MAASTRO Innovations B.V. (CSO)
– Various patents on medical machine learning

Page 3: Knowledge Engineering from Big Data in Oncology

Do we know which tulip will be pink or yellow?

http://www.amystewart.com

Page 4: Knowledge Engineering from Big Data in Oncology

Do we know which tulip will be pink or yellow?

AUC: 1.00, 0.72, 0.50

Page 5: Knowledge Engineering from Big Data in Oncology

Do we know which patient is likely to survive?

AUC 1.00

AUC 0.50

AUC 0.72
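The AUC values on these slides can be read as the probability that a model ranks a randomly chosen patient who survived above a randomly chosen patient who did not: 1.00 is perfect ranking, 0.50 is no better than a coin flip. A minimal sketch of that pairwise definition follows. The function name `auc` and the example scores are illustrative assumptions, not taken from the talk.

```python
def auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    scores:   predicted probabilities (higher = more likely outcome 1)
    outcomes: observed 0/1 outcomes, same order as scores
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative outcome")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # correctly ranked pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for four patients (1 = survived, 0 = did not).
perfect = auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
coin_flip = auc([0.5, 0.5, 0.5, 0.5], [1, 1, 0, 0])
partial = auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

Here `perfect` evaluates to 1.0, `coin_flip` to 0.5 and `partial` to 0.75, because three of the four survivor/non-survivor pairs are ranked correctly.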

Page 6: Knowledge Engineering from Big Data in Oncology

Testing predictions by MDs

• Lung cancer
• 2-year survival
• 158 patients
• 5 MDs
• Prospective
• AUC: 0.56

Oberije et al.

Kruger et al. 1999. Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments. J Pers Soc Psych

Page 7: Knowledge Engineering from Big Data in Oncology

The doctor is drowning

• Explosion of data
• Explosion of decisions
• Explosion of ‘evidence’*
• 3% in trials, bias
• Sharp knife

*2010: 1574 & 1354 articles on lung cancer & radiotherapy = 7.5 per day
Half-life of knowledge estimated at 7 years (in young students)

Source: J Clin Oncol 2010;28:4268

Source: JMI 2012 Friedman, Rigby

Page 8: Knowledge Engineering from Big Data in Oncology

Main Opportunity of Big Data Driven Medicine: Rapid Learning Health Care / Precision Medicine / Predict outcome in an individual

In [..] rapid-learning [..] data routinely generated through patient care and clinical research feed into an ever-growing [..] set of coordinated databases. J Clin Oncol 2010;28:4268

[..] rapid learning [..] where we can learn from each patient to guide practice, is [..] crucial to guide rational health policy and to contain costs [..]. Lancet Oncol 2011;12:933

Examples: Radiotherapy CAT (www.eurocat.info), ASCO’s CancerLinQ

Source: J Clin Oncol 2010;28:4268

Page 9: Knowledge Engineering from Big Data in Oncology

Why would we want to predict outcome in an individual patient?

If you can’t predict outcomes

Doctor/Patient perspective
• you can’t inform and involve your patient properly
• you might not make the right decision of treatment A over treatment B

Quality perspective
• you can’t know if your treatments are giving the predicted outcome

Innovation perspective
• you can’t determine which patient (group) we need to innovate in

Source: www.predictcancer.org (MAASTRO)

Source: www.lifemath.net (MGH)

Page 10: Knowledge Engineering from Big Data in Oncology

Big data in Oncology

Oncology, 2005-2015
• 140M patients
• 100k hospitals
• 1-10 GB per patient
• 140-1400 PB
• 80% unstructured

Source: Cancer Research UK

Source: Institute for Health Technology Transformation
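The 140-1400 PB range on this slide follows directly from the patient count and the per-patient volume. The short calculation below reproduces it and also derives an average per-hospital figure. It assumes decimal units (1 PB = 1,000,000 GB) and an even spread of patients over hospitals, neither of which is stated in the talk; the function names are illustrative.

```python
PATIENTS = 140_000_000          # 140M patients, 2005-2015 (from the slide)
HOSPITALS = 100_000             # 100k hospitals (from the slide)
GB_PER_PATIENT = (1, 10)        # 1-10 GB per patient (from the slide)

GB_PER_PB = 1_000_000           # assumption: decimal units
GB_PER_TB = 1_000               # assumption: decimal units


def total_petabytes(patients, gb_per_patient):
    """Total data volume in PB for a given per-patient volume in GB."""
    return patients * gb_per_patient / GB_PER_PB


def terabytes_per_hospital(patients, hospitals, gb_per_patient):
    """Average volume per hospital in TB, assuming an even spread of patients."""
    return patients / hospitals * gb_per_patient / GB_PER_TB


low_pb, high_pb = (total_petabytes(PATIENTS, gb) for gb in GB_PER_PATIENT)
low_tb, high_tb = (
    terabytes_per_hospital(PATIENTS, HOSPITALS, gb) for gb in GB_PER_PATIENT
)

print(f"Total: {low_pb:.0f}-{high_pb:.0f} PB")
print(f"Per hospital (average): {low_tb:.1f}-{high_tb:.1f} TB")
```

Under these assumptions the total is 140-1400 PB, while the average hospital holds only about 1.4-14 TB for roughly 1400 patients. The volume at any single site is modest; the difficulty is that it is split over 100k sites.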

Page 11: Knowledge Engineering from Big Data in Oncology

Main challenge of using Big Data and Outcomes Research in Oncology

• You need to learn from other patients to predict the outcome of a new patient

• These data are spread out over 100k hospitals

• So we need to share. Challenges:
– Administrative (I don’t have the time)
– Political (I don’t want to)
– Ethical (I am not allowed)
– Technical (I can’t)

[..] the problem is not really technical […]. Rather, the problems are ethical, political, and administrative. Lancet Oncol 2011;12:933

Page 12: Knowledge Engineering from Big Data in Oncology

The ‘standard’ approach

• Sharing standardized, highly curated data from clinical research programs

• Very useful, but only 3% of patients (if that)
• Worries about privacy, loss of control, limited amount of features, limited reusability, a lot of work

Page 13: Knowledge Engineering from Big Data in Oncology

A different approach

If sharing is the problem: Don’t share the data

If you can’t bring the data to the learning application
You have to bring the learning application to the data

Consequences
• The learning application has to be distributed
• The data has to be readable by an application (i.e. not a human)

• Solution: Sharing standardized highly curated research data
• Solution: Not-sharing non-standardized non-curated clinical data
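One way to picture "bringing the learning application to the data" is a training loop in which each hospital runs the same small function on its own records and returns only aggregate numbers. The sketch below uses plain gradient descent on a logistic regression model for this; it is an illustration under stated assumptions, not the method used in the CAT projects. The site names, the synthetic rows and the functions `local_gradient` and `train` are all hypothetical.

```python
import math

# Synthetic rows: ((intercept, feature), outcome). Not real patient data.
SITES = {
    "hospital_a": [((1.0, 0.2), 0), ((1.0, 1.5), 1)],
    "hospital_b": [((1.0, 0.7), 1), ((1.0, 2.1), 1), ((1.0, 1.1), 0)],
}


def local_gradient(weights, rows):
    """Runs inside one hospital; returns a summed gradient and a row count only."""
    grad = [0.0] * len(weights)
    for features, outcome in rows:
        z = sum(w * x for w, x in zip(weights, features))
        p = 1.0 / (1.0 + math.exp(-z))          # predicted probability
        for k, x in enumerate(features):
            grad[k] += (p - outcome) * x        # logistic-loss gradient term
    return grad, len(rows)


def train(sites, steps=200, learning_rate=0.5):
    """Coordinator: sends weights out, receives aggregates, never sees rows."""
    weights = [0.0, 0.0]
    for _ in range(steps):
        total, count = [0.0] * len(weights), 0
        for rows in sites.values():             # in practice: one call per site
            grad, n = local_gradient(weights, rows)
            total = [t + g for t, g in zip(total, grad)]
            count += n
        weights = [w - learning_rate * t / count for w, t in zip(weights, total)]
    return weights


distributed = train(SITES)
pooled = train({"all": [row for rows in SITES.values() for row in rows]})
```

Because the per-site gradients simply add up to the gradient over the pooled rows, `distributed` and `pooled` agree up to floating-point rounding, even though the coordinator never receives an individual record.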

Page 14: Knowledge Engineering from Big Data in Oncology

Distributed Learning

See youtube: https://www.youtube.com/watch?v=ZDJFOxpwqEA

Page 15: Knowledge Engineering from Big Data in Oncology

euroCAT, duCAT, chinaCAT, ozCAT, VATE, ukCAT, dkCAT, worldCAT, BIONIC Network

Map legend: Active or funded CAT partners (19), Prospective centers, Industry Partners, Clinical / Academic Partners

Map from cgadvertising.com

Page 16: Knowledge Engineering from Big Data in Oncology

Does it work? euroCAT’s example

• Distributed = Centralized (ADMM method, Boyd-Stanford)
• Distributed learning better than learning on single center data
• 550 iterations, two hours (centralized < 1 min)

Learn in               | Validate in   | AUC
Aachen (n=7)           | Liège (n=186) | 0.61
Eindhoven (n=32)       | Liège (n=186) | 0.72
Hasselt (n=45)         | Liège (n=186) | 0.68
Maastricht (n=52)      | Liège (n=186) | 0.75
All 4 together (n=136) | Liège (n=186) | 0.77
All 5 together (n=322) | World (n=inf) | ?
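The "Distributed = Centralized" claim can be illustrated with the consensus form of ADMM on a deliberately tiny problem: estimating a single pooled mean (an intercept-only least-squares model) across sites. This is a toy sketch, not euroCAT's actual model or code. The site names, the synthetic values, the penalty `rho` and the function names are assumptions made for illustration.

```python
# Synthetic values per (hypothetical) site; each list stays at its own site.
SITE_DATA = {
    "site_a": [1.0, 2.0, 3.0],
    "site_b": [10.0, 12.0],
    "site_c": [5.0],
}


def local_update(values, z, u, rho):
    """Site-side step: argmin over x of 0.5*sum((x - v)^2) + (rho/2)*(x - z + u)^2."""
    return (sum(values) + rho * (z - u)) / (len(values) + rho)


def consensus_admm(site_data, rho=1.0, iterations=1000):
    """Coordinator loop: only x, z and u travel between sites and coordinator."""
    names = list(site_data)
    z = 0.0                                   # shared (consensus) estimate
    u = {name: 0.0 for name in names}         # scaled dual variable per site
    for _ in range(iterations):
        x = {n: local_update(site_data[n], z, u[n], rho) for n in names}
        z = sum(x[n] + u[n] for n in names) / len(names)
        for n in names:
            u[n] += x[n] - z                  # penalise disagreement with z
    return z


def centralized_mean(site_data):
    """Reference answer that requires pooling every value in one place."""
    pooled = [v for values in site_data.values() for v in values]
    return sum(pooled) / len(pooled)


distributed_estimate = consensus_admm(SITE_DATA)
centralized_estimate = centralized_mean(SITE_DATA)   # 5.5 for these values
```

The consensus estimate converges to the pooled mean, which for this toy problem is the centralized solution, but it needs many rounds of communication where the centralized calculation needs one pass. That trade-off matches the slide's 550 iterations and two hours against less than a minute centrally.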

Page 17: Knowledge Engineering from Big Data in Oncology

Summary

• The challenge of Big Data in oncology
– Is not the size but the distribution
– Is imaging and not genomics (for now)

• The aim of Knowledge Engineering is
– To predict outcomes better via prediction models
– To update these models continuously in rapid learning

Page 18: Knowledge Engineering from Big Data in Oncology

Acknowledgements

• Varian, Palo Alto, CA, USA
• Siemens, Malvern, PA, USA
• RTOG, Philadelphia, PA, USA
• MAASTRO, Maastricht, Netherlands
• Policlinico Gemelli, Roma, Italy
• UH Ghent, Belgium
• Catherina Zkh Eindhoven, Netherlands
• UZ Leuven, Belgium
• Radboud, Nijmegen, Netherlands
• University of Sydney, Australia
• Liverpool and Macarthur CC, Australia
• CHU Liege, Belgium
• Uniklinikum Aachen, Germany
• LOC Genk/Hasselt, Belgium
• Princess Margaret Hospital, Canada
• The Christie, Manchester, UK
• UH Leuven, Belgium
• State Hospital, Rovigo, Italy
• Illawarra Shoalhaven CC, Australia
• Fudan Cancer Center, Shanghai, China

More info on: www.predictcancer.org, www.cancerdata.org, www.eurocat.info, www.mistir.info

Page 19: Knowledge Engineering from Big Data in Oncology

Andre Dekker, PhD, Medical Physicist, MAASTRO Clinic

Thank you for your attention

More info on: www.eurocat.info, www.predictcancer.org, www.cancerdata.org, www.mistir.info, www.maastro.nl