The Application of Data Mining in Health Research
Date posted: 19-Oct-2014
![Page 1: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/1.jpg)
The Application of Data Mining
in Health Research
Li Xiaosong, M.D., M.P.H., Ph.D.
Prof. of Biostatistics
School of Public Health
Sichuan University
![Page 2: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/2.jpg)
Knowledge discovery in databases (KDD)
With the rapid development of the information industry, great advances have been made in the capacity to produce and collect data. However, the conflict of being "rich in data but poor in knowledge" is becoming increasingly evident.
"Knowledge discovery in databases" is what meets this demand!
![Page 3: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/3.jpg)
Data tomb → (Data mining) → Knowledge
![Page 4: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/4.jpg)
Data Mining (DM)
Faced with vast databases, how to discover hidden but useful knowledge in the data, knowledge that can support decision-making by governments and enterprises and so yield greater benefit, has become an important problem to solve.
Data → (DM) → Knowledge
Data mining is the process of extracting previously unknown but potentially valuable information and knowledge from large volumes of data that are incomplete, fuzzy and random.
![Page 5: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/5.jpg)
Data Mining: A KDD Process
Data mining: the core of the knowledge discovery process
Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation
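The stages named on this slide can be sketched as a chain of small functions. This is only an illustrative sketch: the records, the field names, the function names and the 0.5 threshold are all hypothetical and do not come from the slides.

```python
# Hypothetical records from two sources (illustrative only, not from the slides).
clinic_a = [
    {"id": 1, "age": 52, "smoker": "Y"},
    {"id": 2, "age": None, "smoker": "N"},  # missing age
    {"id": 3, "age": 75, "smoker": "Y"},
]
clinic_b = [
    {"id": 4, "age": 73, "smoker": "Y"},
    {"id": 4, "age": 73, "smoker": "Y"},  # duplicate record
    {"id": 5, "age": 66, "smoker": "N"},
]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: merge sources, keeping one record per id."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["id"], record)
    return list(merged.values())

def select(records, fields):
    """Selection: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in records]

def mine(records):
    """Data mining: share of smokers in each age group."""
    groups = {}
    for r in records:
        key = "70+" if r["age"] >= 70 else "<70"
        groups.setdefault(key, []).append(r["smoker"] == "Y")
    return {k: sum(v) / len(v) for k, v in groups.items()}

def evaluate(patterns, threshold=0.5):
    """Pattern evaluation: keep patterns that reach an assumed threshold."""
    return {k: v for k, v in patterns.items() if v >= threshold}

warehouse = integrate(clean(clinic_a), clean(clinic_b))
task_data = select(warehouse, ["age", "smoker"])
patterns = evaluate(mine(task_data))
print(patterns)  # {'<70': 0.5, '70+': 1.0}
```

Each function stands in for one box of the KDD diagram; in practice every stage would be far more involved than these one-liners.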
![Page 6: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/6.jpg)
Classification of Data Mining Techniques
Association Rule Mining
Classification and Prediction
Clustering Analysis
Trend Analysis
Pattern Analysis
… …
![Page 7: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/7.jpg)
The Application of Data Mining
Banking
Telecom
Economy
Meteorology
Agriculture
Health care
Military
… …
![Page 8: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/8.jpg)
Data Mining applied in health care
The application of data mining in medical and health research has proved effective and shows great potential for development.
Data mining has now become a key method for obtaining information in clinical medicine, biomedicine, pharmacy and public health.
![Page 9: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/9.jpg)
Data Mining applied in Clinical Research
Finding relationships among diseases
Discovering patterns of disease development and prevalence
Disease diagnosis and treatment
Summarizing therapeutic effects
e.g. using Bayes classification and decision tree classification in disease diagnosis
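As a rough illustration of Bayes classification in diagnosis, the sketch below trains a naive Bayes classifier with Laplace smoothing on categorical findings. The patient records, feature values and labels are hypothetical and chosen only to make the arithmetic easy to follow; the slides do not say which variant of Bayes classification was used.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def predict(model, row):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for i, value in enumerate(row):
            count = feature_counts[(label, i)][value]
            score += math.log((count + 1) / (n + len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: (fever, cough) -> diagnosis.
rows = [
    ("high", "yes"), ("high", "yes"), ("high", "no"),
    ("normal", "yes"), ("normal", "no"), ("normal", "no"),
]
labels = ["flu", "flu", "flu", "cold", "cold", "cold"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("high", "yes")))   # flu
print(predict(model, ("normal", "no")))  # cold
```

The "naive" part is the assumption that findings are independent given the diagnosis, which rarely holds exactly for clinical data but often works well enough as a baseline.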
![Page 10: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/10.jpg)
Data Mining applied in Biomedicine
Sequence pattern analysis and similarity retrieval are used to find the gene sequence patterns associated with certain kinds of diseases
Data cleaning and data integration are valuable for integrating gene data and building gene databases
Association rule analysis can help to discover gene crossover and correlation in a genome
- A powerful tool in DNA analysis!
![Page 11: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/11.jpg)
Data Mining applied in Public Health
Spatio-temporal data mining is used in infectious disease surveillance to search for the epidemic patterns and distribution characteristics of diseases
Time series analysis and neural networks are used to predict the incidence of infectious diseases and the infant mortality rate
Association rules are used to explore the factors influencing diseases and health-seeking behavior
Clustering and classification are now widely used in decision support systems for health insurance
![Page 12: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/12.jpg)
Association rule mining aims to find relationships among the variables in a database, resulting in different types of rules.
LAD%: the percentage of heart disease caused by the left anterior descending coronary artery
RCA%: the percentage of heart disease caused by the right coronary artery
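Table 1: Original data from a study on heart disease

| Gender | Age | Smoker | LAD% | RCA% |
|---|---|---|---|---|
| F | 52 | Y | 85 | 100 |
| M | 62 | N | 80 | 0 |
| M | 75 | Y | 70 | 80 |
| M | 73 | Y | 40 | 99 |
| M | 66 | N | 50 | 45 |
| … | … | … | … | … |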
e.g. Application of Association Rules in Medical Data Analysis
![Page 13: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/13.jpg)
Results:
Table 2: Medical association rules, with (support, confidence) in parentheses

| No. | Rule |
|---|---|
| 1 | Gender=M ∩ Age≥70 ∩ Smoker=Y ⇒ RCA%≥50 (40%, 100%) |
| 2 | Gender=F ∩ Age<70 ∩ Smoker=Y ⇒ LAD%≥70 (20%, 100%) |

Rule 1 indicates: 40% of the cases are male smokers aged 70 or over, and among them the probability that RCA% ≥ 50 is 100%.
Rule 2 indicates: 20% of the cases are female smokers under 70 years old, and among them the probability that LAD% ≥ 70 is 100%.
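The support and confidence figures can be checked with a few lines of code. The sketch below uses only the five rows visible in Table 1 (the remaining rows are elided on the slide); the field names and the `rule_metrics` helper are hypothetical. Support is taken here as the share of all cases that satisfy both sides of the rule, and confidence as the share of antecedent cases that also satisfy the consequent.

```python
# The five rows visible in Table 1; the elided rows are not reconstructed.
records = [
    {"gender": "F", "age": 52, "smoker": "Y", "lad": 85, "rca": 100},
    {"gender": "M", "age": 62, "smoker": "N", "lad": 80, "rca": 0},
    {"gender": "M", "age": 75, "smoker": "Y", "lad": 70, "rca": 80},
    {"gender": "M", "age": 73, "smoker": "Y", "lad": 40, "rca": 99},
    {"gender": "M", "age": 66, "smoker": "N", "lad": 50, "rca": 45},
]

def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent => consequent."""
    matches_antecedent = [r for r in records if antecedent(r)]
    matches_both = [r for r in matches_antecedent if consequent(r)]
    support = len(matches_both) / len(records)
    if not matches_antecedent:
        return support, 0.0
    confidence = len(matches_both) / len(matches_antecedent)
    return support, confidence

# Rule 1: Gender=M, Age>=70, Smoker=Y  =>  RCA% >= 50
rule1 = rule_metrics(
    records,
    lambda r: r["gender"] == "M" and r["age"] >= 70 and r["smoker"] == "Y",
    lambda r: r["rca"] >= 50,
)
# Rule 2: Gender=F, Age<70, Smoker=Y  =>  LAD% >= 70
rule2 = rule_metrics(
    records,
    lambda r: r["gender"] == "F" and r["age"] < 70 and r["smoker"] == "Y",
    lambda r: r["lad"] >= 70,
)
print(rule1)  # (0.4, 1.0)
print(rule2)  # (0.2, 1.0)
```

On these five rows the values agree with the (40%, 100%) and (20%, 100%) reported in Table 2. A real association rule miner such as Apriori would search for such rules rather than test hand-written ones.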
![Page 14: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/14.jpg)
The future application of Data Mining in medical research
Data mining draws on a series of new data-processing technologies:
Wavelets Analysis
Neural Networks
Genetic Algorithm
Fuzzy Logic Reasoning
![Page 15: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/15.jpg)
Challenges facing Data Mining
Medical research data are always complicated and unique in type and structure
The specialized knowledge of both medical and data-processing staff needs to be integrated
Large amounts of data and repeated practice are needed
Targets:
To build a truly useful data mining system for health research
![Page 16: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/16.jpg)
Backpropagation Neural Networks (BPNNs)
A type of artificial neural network
A way to model highly complex, nonlinear solutions to
classification problems
Useful in classifying health-related phenomena
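To make the idea concrete, here is a minimal backpropagation network in plain Python: one hidden layer, sigmoid units, and batch gradient descent on squared error. It is a teaching sketch under assumptions of my own (three hidden units, learning rate 1.0, 2000 epochs, the XOR toy problem), not the network used in the study discussed on the following slides.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_weights(n_inputs, n_hidden, seed=0):
    """Random weights in [-1, 1]; the last weight of each unit is its bias."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return w_hidden, w_out

def forward(x, w_hidden, w_out):
    """Forward pass: return hidden activations and the network output."""
    inputs = list(x) + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(row, inputs))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output

def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - y) ** 2 for x, y in data) / len(data)

def train(data, w_hidden, w_out, epochs=2000, lr=1.0):
    """Batch gradient descent; gradients are backpropagated layer by layer."""
    n = len(data)
    for _ in range(epochs):
        grad_out = [0.0] * len(w_out)
        grad_hidden = [[0.0] * len(row) for row in w_hidden]
        for x, y in data:
            inputs = list(x) + [1.0]
            hidden, output = forward(x, w_hidden, w_out)
            # Error signal at the output unit (gradient of half squared error).
            delta_out = (output - y) * output * (1.0 - output)
            for j, h in enumerate(hidden + [1.0]):
                grad_out[j] += delta_out * h
            # Propagate the error signal back to each hidden unit.
            for j, h in enumerate(hidden):
                delta_h = delta_out * w_out[j] * h * (1.0 - h)
                for i, v in enumerate(inputs):
                    grad_hidden[j][i] += delta_h * v
        for j in range(len(w_out)):
            w_out[j] -= lr * grad_out[j] / n
        for j, row in enumerate(w_hidden):
            for i in range(len(row)):
                row[i] -= lr * grad_hidden[j][i] / n
    return w_hidden, w_out

# XOR: a toy problem that no linear classifier can solve.
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w_hidden, w_out = init_weights(n_inputs=2, n_hidden=3)
loss_before = mean_squared_error(xor_data, w_hidden, w_out)
train(xor_data, w_hidden, w_out)
loss_after = mean_squared_error(xor_data, w_hidden, w_out)
print(f"MSE before: {loss_before:.4f}, after: {loss_after:.4f}")
```

How well the network fits XOR depends on the random initialization and the number of epochs, so the printed errors are best read as a sanity check that training reduces the loss rather than as a benchmark.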
![Page 17: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/17.jpg)
e.g. Classification of smoking cessation status with BPNN
Classifier performance estimates:
The confidence intervals of Az for both the BPNN and logistic classifiers are narrow and exceed random chance (Az = 0.5) by at least 25 percentage points
This finding indicates that the performance of both classifiers exceeds that of random chance
* Az: area under receiver operating characteristic curve
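For readers unfamiliar with the area under the ROC curve, the sketch below computes a nonparametric estimate: the share of (positive, negative) pairs in which the positive case gets the higher score, with ties counted as one half. Note that this is the empirical (Mann-Whitney) estimate, not the binormal Az reported in the study, and the scores are hypothetical.

```python
def empirical_auc(pos_scores, neg_scores):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier scores (predicted probability of having quit).
quitters = [0.9, 0.8, 0.6, 0.55]
non_quitters = [0.7, 0.4, 0.3, 0.2]

print(empirical_auc(quitters, non_quitters))  # 0.875
```

A value of 0.5 corresponds to random chance and 1.0 to perfect separation, which is the scale on which the classifiers above are compared.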
![Page 18: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/18.jpg)
e.g. Classification of smoking cessation status with BPNN
The graph illustrates the
estimated true positive fraction
(TPF) at multiple values of the
false positive fraction (FPF)
The areas under the two ROC curves differ significantly at α = 0.05
Binormal conventional ROC curves for BPNN and logistic regression classifiers
* ROC: receiver operating characteristic
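For reference, the standard binormal ROC model behind such curves can be written as follows. The parameters $a$ and $b$ are the usual binormal intercept and slope and $\Phi$ is the standard normal cumulative distribution function; the slides do not report the fitted values, so none are given here.

```latex
\mathrm{TPF}(\mathrm{FPF}) = \Phi\!\left(a + b\,\Phi^{-1}(\mathrm{FPF})\right),
\qquad
A_z = \Phi\!\left(\frac{a}{\sqrt{1 + b^{2}}}\right)
```

With $a = 0$ the curve lies on the chance diagonal and $A_z = 0.5$, which is the baseline used in the comparison above.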
![Page 19: The Application of Data Mining in Health Research](https://reader036.fdocuments.us/reader036/viewer/2022082915/5444590db1af9f680a8b4880/html5/thumbnails/19.jpg)
Thank you!