Data mining for Dummies - DMSK

Melanie Ganz-Benjaminsen, Assistant Professor

Neurobiology Research Unit, Copenhagen University Hospital/Rigshospitalet

Department of Computer Science, University of Copenhagen

Who am I?

Melanie Ganz-Benjaminsen, PhD

NRU, Copenhagen University Hospital, Rigshospitalet

• MSc in Physics

• PhD in CS

• PostDoc in USA

• PostDoc at RH

• Asst. Prof. at DIKU

Data mining

• the process of extracting usable information from a larger set of “raw” data

• operates at a scale far beyond the analysis you can do manually
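
At toy scale, the "extract usable data from raw data" step can look like the minimal sketch below. The records, column names, and cleaning rule are all hypothetical; real data mining applies the same idea (discard unusable rows, then summarize) to far larger datasets.

```python
import csv
import io
from statistics import mean

# Hypothetical "raw" records: some rows are incomplete or malformed.
RAW = """patient_id,age,systolic_bp,stroke
1,67,150,yes
2,45,,no
3,72,165,yes
4,abc,120,no
5,51,118,no
"""

def clean_records(raw_text):
    """Keep only rows where age and blood pressure parse as integers."""
    usable = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        try:
            usable.append({
                "age": int(row["age"]),
                "systolic_bp": int(row["systolic_bp"]),
                "stroke": row["stroke"] == "yes",
            })
        except ValueError:
            continue  # skip rows with missing or malformed values
    return usable

records = clean_records(RAW)
# Summarize the usable rows: mean systolic blood pressure per group.
summary = {
    group: mean(r["systolic_bp"] for r in records if r["stroke"] == group)
    for group in (True, False)
}
print(len(records), summary)
```

Here two of the five rows are dropped, and the remaining three are reduced to one number per group.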

Data science

(Figure: data science Venn diagram, from http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram)

Machine learning

• machine learning can be thought of as a means of building models of data

• these are mathematical models that help us understand the data

• it is called “learning” because the models have parameters that are tuned (fitted) to the available data
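
To make "parameters that get tuned based on the data" concrete, here is a minimal sketch that fits a straight line y ≈ a·x + b by least squares. The data are synthetic (an assumption for illustration); the two numbers a and b are the model's parameters, and "learning" is simply choosing them to match the data.

```python
import numpy as np

# Synthetic data (assumption): y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Model: y ≈ a * x + b. "Learning" means choosing a and b from the data,
# here by least squares on the design matrix with columns [x, 1].
X = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"learned slope a={a:.2f}, intercept b={b:.2f}")
```

The learned values land close to the slope 2 and intercept 1 used to simulate the data.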

Machine learning

• Supervised learning:

– Classification

– Regression

• Unsupervised learning:

– Dimensionality reduction

– Clustering

• Semi-supervised learning
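
Classification, regression, and clustering reappear later in the talk, but dimensionality reduction does not. As one hedged illustration, the sketch below uses principal component analysis (PCA, computed via the SVD), a common dimensionality-reduction technique chosen here as an assumption, to compress synthetic 5-dimensional data that secretly varies along only two directions.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto their top principal components (via SVD)."""
    X_centered = X - X.mean(axis=0)  # PCA operates on centered data
    # Rows of Vt are the principal directions, ordered by variance explained.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained = (S**2) / np.sum(S**2)  # fraction of variance per component
    return X_centered @ components.T, explained[:n_components]

# Synthetic 5-D data that really varies along only 2 hidden directions.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 5))

Z, explained = pca(X, n_components=2)
print(Z.shape, explained.sum())
```

No labels are used anywhere, which is what makes this unsupervised: two coordinates per sample retain almost all of the variance of the original five.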

Classification: Predicting discrete labels
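
One of the simplest ways to predict a discrete label is k-nearest neighbours: a new point gets the majority label of the k closest training points. The sketch below is an assumed illustration, not necessarily the classifier pictured on the slides; the points and the labels "A"/"B" are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_new, k=3):
    """Assign each new point the majority label among its k nearest neighbours."""
    predictions = []
    for x in X_new:
        distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance
        nearest = np.argsort(distances)[:k]
        votes = Counter(y_train[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

# Two toy classes in 2-D (hypothetical points).
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
                    [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y_train = np.array(["A", "A", "A", "B", "B", "B"])

print(knn_predict(X_train, y_train, np.array([[1.2, 1.4], [6.8, 6.1]])))
# prints ['A' 'B']
```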

Contextualization

(Figure: people who suffered a stroke vs. healthy controls)

Models of existing data

Prediction on new data

Categorize new patients or build risk profiles for them
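
The two steps above, "models of existing data" and "prediction on new data", can be sketched with scikit-learn. The slides do not name a model; logistic regression is assumed here because it outputs a probability, which maps naturally onto a risk profile. All data are simulated from an assumed rule and do NOT describe real patients.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Simulated "existing data" (NOT real patients): age in years, systolic BP in mmHg.
rng = np.random.default_rng(42)
n = 400
age = rng.uniform(30, 90, n)
bp = rng.uniform(100, 190, n)
# Assumed ground truth for the simulation only: risk rises with age and BP.
logit = 0.08 * (age - 60) + 0.05 * (bp - 145)
stroke = rng.random(n) < 1 / (1 + np.exp(-logit))  # True = stroke, False = control

# "Model of existing data": fit on the labelled examples.
X = np.column_stack([age, bp])
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, stroke)

# "Prediction on new data": estimated stroke probability for two new patients.
new_patients = np.array([[45, 120], [80, 180]])
risk = model.predict_proba(new_patients)[:, 1]  # column 1 = class True
labels = model.predict(new_patients)
print(np.round(risk, 2), labels)
```

`predict` gives the category (stroke or control), while `predict_proba` gives a graded risk, which is usually more useful for profiling patients.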

Too easy?

• the benefit of the machine learning approach is that it can generalize to much larger datasets with many more dimensions!

• More dimensions? → more variables, e.g., gender, family history, etc.
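
Adding variables mostly means adding columns to the data matrix; the modelling code itself does not change. The sketch below assumes a logistic regression model and four hypothetical, simulated variables (age, systolic BP, gender coded 0/1, family history coded 0/1).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n = 1000
# Hypothetical simulated variables: age, systolic BP, gender (0/1), family history (0/1).
X = np.column_stack([
    rng.uniform(30, 90, n),
    rng.uniform(100, 190, n),
    rng.integers(0, 2, n),
    rng.integers(0, 2, n),
])
# Assumed simulation rule, for illustration only.
logit = (0.08 * (X[:, 0] - 60) + 0.05 * (X[:, 1] - 145)
         + 0.3 * X[:, 2] + 1.0 * X[:, 3])
y = rng.random(n) < 1 / (1 + np.exp(-logit))

# Same code as with two variables; only the number of columns changed.
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)
coefs = model[-1].coef_  # one learned weight per variable
print(X.shape, coefs.shape)
```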

Real example - clustering

Images taken from Beliveau et al., J. Neurosci. (2017)

Real example - clustering

(Figure panels: K = 7 and K = 18 clusters)

Images courtesy of Vincent Beliveau
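
The sketch below illustrates what choosing K means, using k-means from scikit-learn. K-means is an assumption made here because the slide only shows values of K; it is not a claim about the exact method of Beliveau et al. The data are random numbers shaped like the described input (ca. 10,000 vertices with 5 values each), not the serotonin atlas.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the cortical data: 10,000 "vertices" with 5 values each.
# (Random numbers only; this is NOT the serotonin atlas data.)
rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 5))

results = {}
for k in (7, 18):  # the two values of K shown on the slide
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    results[k] = km.labels_  # one region label per vertex
    print(k, np.bincount(km.labels_))  # number of vertices per region
```

K is chosen by the analyst; what the algorithm returns is the assignment of every vertex to one of the K regions.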

Recap: Machine learning

• is building mathematical models that describe the relation between “input” and “output” data

– e.g., input can be age and blood pressure, and output stroke status

– or input can be 5-dimensional serotonin data at ca. 10,000 cortical vertices, and output the assignment of each vertex to one of K regions, where K is the number of regions I want to cluster the cortical data into

• BUT mathematical models have limitations and need to be appropriate for your data
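
A quick way to see why the model must suit the data: fit a straight line and a parabola to data that are actually curved. The data below are synthetic (an assumption for illustration); the root-mean-square error (RMSE) shows how badly the too-simple model does.

```python
import numpy as np

# Synthetic data with a curved (quadratic) relation between input and output.
rng = np.random.default_rng(3)
x = np.linspace(-3, 3, 100)
y = x**2 + rng.normal(scale=0.3, size=x.size)

def rmse_of_polyfit(degree):
    """Fit a polynomial of the given degree and return its root-mean-square error."""
    coeffs = np.polyfit(x, y, deg=degree)
    residuals = y - np.polyval(coeffs, x)
    return np.sqrt(np.mean(residuals**2))

linear_err = rmse_of_polyfit(1)     # model too simple for this data
quadratic_err = rmse_of_polyfit(2)  # model matches the data's shape
print(f"linear RMSE={linear_err:.2f}, quadratic RMSE={quadratic_err:.2f}")
```

The linear model's error stays large no matter how much data it sees, because no choice of its two parameters can follow the curve.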

Bottom line?

High-dimensional clinical and epidemiological data & statistical models with computing power behind them (aka machine learning)

→ Personalized medicine?

KU Artificial Intelligence Centre

• The Data Science Laboratory (DSL) is the entry point to the AI Centre for researchers and students.

• Its overall aim is to improve the quality of data analyses in research carried out at SCIENCE (the Faculty of Science).

Thank you for your attention!

Questions?

References

• Brown, M.S., 2014. Data Mining for Dummies. John Wiley & Sons.

• VanderPlas, J., 2016. Python Data Science Handbook. O'Reilly Media. https://jakevdp.github.io/PythonDataScienceHandbook/

• Conway, D., 2013. The Data Science Venn Diagram. http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

• Beliveau, V., Ganz, M., Feng, L., Ozenne, B., Højgaard, L., Fisher, P.M., Svarer, C., Greve, D.N. and Knudsen, G.M., 2017. A high-resolution in vivo atlas of the human brain's serotonin system. Journal of Neuroscience, 37(1), pp.120-128.

• Data Science Laboratory, University of Copenhagen. https://datalab.science.ku.dk/english/