Analysis and Classification of Respiratory Health Risks with Respect to Air Pollution Levels
Ruhul Amin Dicken, North South University, Bangladesh
SNPD 2015
2015/10/26 (Mon.), Chang Wei-Yuan @ MakeLab Lab Meeting
Keywords: data mining; health problem; decision tree; air pollution; respiratory diseases.
Introduction
§ Air pollution consists of harmful materials that cause adverse effects on human life.
– It becomes more serious as cities grow and develop.
§ Bangladesh faces this problem because of its continuously increasing population.
– This study focuses on Dhaka, the capital of Bangladesh, a developing country.
Goal
§ This paper studies the relationship between air pollutant levels and hospital admissions of patients.
– Focused on the case of Dhaka, Bangladesh
– K-means method: clustering different air pollutants in different seasons
– CART method: classifying patients according to different admission rates
Data Description: Air Pollution
§ Air quality data is collected monthly from Dhaka City
– by CASE (Clean Air and Sustainable Environment)
– collected variables: air pollutants and meteorological variables
Field: stations | time | SO2 | NO2 | … | solar | rainfall | …
Type: string | datetime | float | float | … | float | float | …
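The schema above can be mirrored as a typed record. This is a minimal sketch, assuming each raw row arrives as a dict of strings keyed by the column names shown (`stations`, `time`, then pollutant and weather columns) and that `time` is formatted as `YYYY-MM`. The class `AirQualityRecord`, the helper `parse_air_row` and the sample values (`"S1"`, `"2012-01"`) are hypothetical; the elided `…` columns are kept in an open-ended dict rather than guessed.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AirQualityRecord:
    """One monthly observation from a monitoring station (hypothetical layout)."""
    station: str
    time: datetime
    # Pollutant and meteorological readings keyed by column name, e.g. "SO2" or "rainfall".
    variables: dict[str, float]


def parse_air_row(row: dict[str, str]) -> AirQualityRecord:
    """Convert one raw row of strings into typed fields.

    Assumes the time column is formatted as YYYY-MM, since the data is monthly.
    Empty cells are skipped rather than converted.
    """
    variables = {
        name: float(value)
        for name, value in row.items()
        if name not in ("stations", "time") and value != ""
    }
    return AirQualityRecord(
        station=row["stations"],
        time=datetime.strptime(row["time"], "%Y-%m"),
        variables=variables,
    )


# Hypothetical sample row for illustration only.
record = parse_air_row(
    {"stations": "S1", "time": "2012-01", "SO2": "12.5", "NO2": "40.1", "rainfall": "0.0"}
)
```

Keeping the variable columns in a dict means new pollutant or weather columns need no change to the record type.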
Data Description: Diseases
§ Respiratory disease data is collected monthly from NIDCH
– for each disease
Field: location | time | Age group | COPD | ILD | BroCar
Type: string | datetime | string | integer | integer | integer
• COPD: chronic obstructive pulmonary disease
• ILD: interstitial lung disease
• BroCar: bronchogenic (bronchial) carcinoma
Clustering
§ Respiratory disease admissions data
– clustered using k-means with k = 3
– clusters labeled High (H), Medium (M), Low (L)
Before clustering: location | time | Age group | COPD | ILD | BroCar
Type: string | datetime | string | integer | integer | integer
After clustering: location | time | Age group | COPD | ILD | BroCar
Type: string | datetime | string | level | level | level
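The slides state only that admissions were clustered with k-means (k = 3) and labeled H, M and L. A minimal sketch, assuming each disease column is clustered separately on its monthly admission counts (one-dimensional data) and that clusters are named by the order of their centroids. The functions `kmeans_1d` and `admission_levels` and the quantile-based initialization are illustrative choices, not the authors' method.

```python
def kmeans_1d(values: list[float], k: int = 3, iters: int = 100) -> tuple[list[float], list[int]]:
    """Lloyd's k-means on scalar values with a deterministic, quantile-based start (k >= 2)."""
    ordered = sorted(values)
    n = len(ordered)
    # Start centroids at evenly spaced order statistics so results are reproducible.
    centroids = [ordered[(i * (n - 1)) // (k - 1)] for i in range(k)]
    assignment = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        assignment = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(values, assignment) if a == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return centroids, assignment


def admission_levels(counts: list[int]) -> list[str]:
    """Label each monthly admission count as L, M or H via 3-means clustering."""
    centroids, assignment = kmeans_1d([float(c) for c in counts], k=3)
    # Rank clusters by centroid so the lowest-mean cluster becomes "L".
    rank = {c: r for r, c in enumerate(sorted(range(3), key=lambda c: centroids[c]))}
    labels = "LMH"
    return [labels[rank[a]] for a in assignment]
```

For example, `admission_levels([10, 12, 11, 50, 52, 48, 100, 105, 98])` returns `['L', 'L', 'L', 'M', 'M', 'M', 'H', 'H', 'H']`. A library implementation such as scikit-learn's `KMeans` would instead default to k-means++ initialization with several restarts.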
Classification
§ The air pollution data is used as the features and the clustered medical data acts as the class label
– to generate a decision tree that predicts the level of hospital admissions
– for each age group and each disease
Field: stations | time | SO2 | … | solar | … | diseases
Type: string | datetime | float | … | float | … | level
Field: location | time | Age group | COPD | ILD | BroCar
Type: string | datetime | string | level | level | level
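The two tables above suggest that each air-quality row receives the clustered admission level of one disease as its class label (the `diseases` column). A minimal sketch, assuming the tables are joined on the `time` column (month) and filtered to a single age group. The function `build_training_rows` and the age-group string `"40-60"` are hypothetical, and the slides do not say how stations were matched to locations, so that step is left out.

```python
from datetime import datetime


def build_training_rows(
    air_rows: list[dict], level_rows: list[dict], disease: str, age_group: str
) -> list[dict]:
    """Attach the admission level for one disease and age group to each air-quality row by month."""
    # Index the clustered admission levels by month for the chosen age group.
    level_by_time = {r["time"]: r[disease] for r in level_rows if r["Age group"] == age_group}
    rows = []
    for air in air_rows:
        level = level_by_time.get(air["time"])
        if level is None:
            continue  # no admissions record for this month, so no class label
        rows.append({**air, "diseases": level})
    return rows


# Hypothetical sample data for illustration only.
air_rows = [
    {"stations": "S1", "time": datetime(2012, 1, 1), "SO2": 12.5},
    {"stations": "S1", "time": datetime(2012, 2, 1), "SO2": 9.0},
]
level_rows = [
    {"location": "Dhaka", "time": datetime(2012, 1, 1), "Age group": "40-60",
     "COPD": "H", "ILD": "M", "BroCar": "L"},
]
training = build_training_rows(air_rows, level_rows, disease="COPD", age_group="40-60")
```

Building one training set per (disease, age group) pair matches the slide's "for each age group and each disease".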
Classification
§ Decision trees were generated using three different splitting criteria
– (i) Information Gain
– (ii) Gini Index
– (iii) Gain Ratio
§ The two best trees were then selected for the results.
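The three criteria can be compared on a single candidate split. A minimal sketch using the standard textbook definitions: information gain is the drop in entropy, the Gini score is reported here as the drop in Gini index (impurity), and gain ratio divides information gain by the split information. The function names are hypothetical, and the sketch scores one split rather than growing a whole tree. Note that CART conventionally uses the Gini index, while information gain and gain ratio are the criteria of ID3 and C4.5.

```python
from collections import Counter
from math import log2


def entropy(labels: list[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels: list[str]) -> float:
    """Gini index (impurity) of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(parent: list[str], children: list[list[str]]) -> dict[str, float]:
    """Score one split of `parent` into `children` under the three criteria (higher is better)."""
    n = len(parent)
    weights = [len(ch) / n for ch in children]
    info_gain = entropy(parent) - sum(w * entropy(ch) for w, ch in zip(weights, children))
    # Reduction in Gini impurity from parent to the weighted children.
    gini_gain = gini(parent) - sum(w * gini(ch) for w, ch in zip(weights, children))
    # Split information penalizes splits into many small partitions.
    split_info = -sum(w * log2(w) for w in weights if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    return {"information_gain": info_gain, "gini_gain": gini_gain, "gain_ratio": gain_ratio}


# A perfect split of admission levels, e.g. on a pollutant threshold.
scores = split_scores(["H", "H", "L", "L"], [["H", "H"], ["L", "L"]])
```

Here `scores` is `{'information_gain': 1.0, 'gini_gain': 0.5, 'gain_ratio': 1.0}`; a split that leaves each child as mixed as the parent scores 0.0 on all three.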
Evaluation
• For a model to be considered applicable to real-world scenarios, it must achieve an accuracy higher than 50%.
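The applicability rule amounts to comparing accuracy, the fraction of correctly predicted admission levels, against 0.5. A minimal sketch; the slides do not describe the test set or validation scheme, so the helper names `accuracy` and `is_applicable` and the strict `>` comparison (read from "higher than 50%") are assumptions.

```python
def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true admission level."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def is_applicable(y_true: list[str], y_pred: list[str], threshold: float = 0.5) -> bool:
    """Apply the 'accuracy higher than 50%' rule; exactly 50% does not pass."""
    return accuracy(y_true, y_pred) > threshold
```

For example, predicting `["H", "M", "L", "H"]` against true levels `["H", "M", "L", "L"]` gives an accuracy of 0.75, so that model would count as applicable.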
Conclusion
§ The COPD and ILD models proved applicable, but the bronchogenic carcinoma model was not applicable in practice due to its low accuracy.
§ Other factors related to disease diagnosis play a more important role; air pollution levels alone are not enough to build a sufficient classification model.