PREDICTIVE DATA MINING OF CHRONIC DISEASES USING …



PREDICTIVE DATA MINING OF CHRONIC DISEASES

USING DECISION TREE:

A CASE STUDY OF HEALTH INSURANCE COMPANY

IN INDONESIA

BY

DINI HIDAYATUL QUDSI

A dissertation submitted in fulfilment of the requirement for

the degree of Master of Information Technology

Kulliyyah of Information and Communication Technology

International Islamic University Malaysia

JULY 2015


ABSTRACT

The development of information and communication technology has rapidly penetrated several sectors, including the health sector. Good data management has become a necessity for healthcare companies, since it provides better cost control and mitigates risk. However, developing good-quality data management is complex. Data mining, one of the advances in science and technology, offers techniques (such as decision trees) to extract hidden information from large amounts of medical data, which may improve decision making. This study aims to identify the potential benefits that data mining can bring to the health sector, using data from an Indonesian health insurance company as a case study. The decision tree, one of the most commonly used data mining techniques, was used to generate the prediction model, and the resulting tree was visualized to perform predictive analysis of chronic diseases. All steps of the data mining process, such as data collection, data preprocessing and data mining, were performed using a data mining tool named WEKA. WEKA was also used to evaluate prediction performance by measuring accuracy, specificity and sensitivity. Among other results, the study identifies several factors that a health insurance company can take into account when predicting a patient's treatment cost.


RESEARCH ABSTRACT

The development of information and communication technology has rapidly penetrated many sectors, including the health sector, and good data management has become a necessity for health companies because of its ability to control costs and reduce risks more effectively. However, it is difficult to develop high-quality data management. Data mining is therefore one of the advanced fields of science and technology development, as data mining techniques such as decision trees can extract hidden information from a mass of medical data and help improve decision making. The aim of this study is to identify the potential benefits that data mining can bring to the health sector, using data from an Indonesian health insurance company as a case study. The decision tree is one of the most common data mining techniques, so it was used to extract the predictive model by displaying and inspecting the tree in order to perform predictive analysis of chronic diseases. All steps of the data mining process, such as data collection, processing and extraction, were carried out using a data mining tool called WEKA. In addition, WEKA was used to evaluate prediction performance by measuring accuracy, specificity and sensitivity. The results of the study showed several factors that health insurance companies can take into account when predicting a patient's treatment costs.


APPROVAL PAGE

I certify that I have supervised and read this study and that in my opinion, it conforms

to acceptable standards of scholarly presentation and is fully adequate, in scope and

quality, as a dissertation for the degree of Master of Information Technology.

......................................................

Mira Kartiwi

Supervisor

I certify that I have read this study and that in my opinion it conforms to acceptable

standards of scholarly presentation and is fully adequate, in scope and quality, as a

dissertation for the degree of Master of Information Technology.

......................................................

Jamaludin Ibrahim

Examiner

This dissertation was submitted to the Department of Information Systems and is

accepted as a fulfilment of the requirement for the degree of Master of Information

Technology.

......................................................

Siti Rohimi Bt. Hamedon

Head, Department of Information

Systems

This dissertation was submitted to the Kulliyyah of Information and Communication

Technology and is accepted as a fulfilment of the requirement for the degree of Master

of Information Technology.

......................................................

Abdul Wahab Abdul Rahman

Dean, Kulliyyah of Information

and Communication Technology


DECLARATION

I hereby declare that this dissertation is the result of my own investigations, except

where otherwise stated. I also declare that it has not been previously or concurrently

submitted as a whole for any other degrees at IIUM or other institutions.

Dini Hidayatul Qudsi

Signature …………………… Date ……………………………


Copyright Page

INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA

DECLARATION OF COPYRIGHT AND AFFIRMATION

OF FAIR USE OF UNPUBLISHED RESEARCH

Copyright © 2015 by Dini Hidayatul Qudsi. All rights reserved.

PREDICTIVE DATA MINING OF CHRONIC DISEASES USING DECISION

TREE: A CASE STUDY OF HEALTH INSURANCE COMPANY IN

INDONESIA

No part of this unpublished research may be reproduced, stored in a retrieval system,

or transmitted, in any form or by any means, electronic, mechanical, photocopying,

recording, or otherwise without prior written permission of the copyright holder

except as provided below.

1. Any material contained in or derived from this unpublished research

may only be used by others in their writing with due

acknowledgement.

2. IIUM or its library will have the right to make and transmit copies

(print or electronic) for institutional and academic purposes.

3. The IIUM library will have the right to make, store in a retrieval

system and supply copies of this unpublished research if requested

by other universities and research libraries.

Affirmed by Dini Hidayatul Qudsi

Signature …………………… Date ……………………………


Dedication

Dedication to:

My beloved parents, brother, sisters and friends

Thank you for your prayers, your support, and your belief in me.


ACKNOWLEDGEMENTS

In the name of Allah, the Most Gracious and the Most Merciful, along with Salawat

and Salam to our role model, the Prophet Muhammad SAW.

Alhamdulillah, praise be to Allah for His blessings and the guidance of His grace, and for giving me the strength, ideas, ability and patience to complete this dissertation.

I would like to take this opportunity to express my gratitude to my supervisor, Assistant Professor Dr. Mira Kartiwi, who has given me guidance, advice, supervision and ideas throughout the study. There is nothing more pleasant than having a supervisor who is as kind and understanding as she is.

I would like to thank Mr. Jamaludin Ibrahim and Dr. Izzuddin Mohd Thamrin for their helpful feedback and suggestions.

Enormous thanks to my beloved parents, my dearest brother and my sisters. Thank you for always being by my side and for never ceasing to believe that I would be able to get here.

Finally, I would like to thank my lovely friends who have given me support,

strength and smiles during the completion of this dissertation.

Alhamdulillah


TABLE OF CONTENTS

Abstract
Abstract in Arabic
Approval Page
Declaration
Copyright Page
Dedication
Acknowledgements
List of Tables
List of Figures

CHAPTER 1: INTRODUCTION
1.1 Background of the Study
1.2 Problem Statement
1.3 Research Objectives
1.4 Research Questions
1.5 Research Scope
1.6 Significance of the Study
1.7 Organization of the Study
1.8 Chapter Summary

CHAPTER 2: LITERATURE REVIEW
2.1 Introduction
2.2 Chronic Disease
2.3 Use of IT in the Health Sector
2.3.1 Electronic Health (E-Health)
2.3.2 Telemedicine
2.3.3 IT Decision Support
2.4 Data Management Challenge in Health Sector
2.5 Data Mining Implementation in the Health Sector
2.5.1 An Overview of Data Mining
2.5.2 Knowledge Discovery in Databases (KDD)
2.5.3 Data Mining Benefit to Health Sector
2.5.4 Previous Research of Predictive Medical Data Mining
2.5.5 Data Mining Challenges in the Health Sector
2.6 Data Mining Techniques
2.6.1 Decision Tree
2.7 WEKA
2.8 Chapter Summary

CHAPTER 3: METHODOLOGY
3.1 Introduction
3.2 CRISP-DM Methodology
3.3 Research Design
3.4 Data Preprocessing
3.4.1 Raw Data
3.4.2 Data Integration
3.4.3 Data Cleaning
3.5 Prediction Model
3.5.1 WEKA
3.5.2 Decision Tree
3.5.3 J48 Algorithm
3.5.4 Visualize the Tree
3.5.4.1 How to Analyze the Tree Visualizer
3.6 Validate Prediction Model Performance
3.7 Instrument/Measures
3.8 Chapter Summary

CHAPTER 4: RESULT AND FINDINGS
4.1 Introduction
4.2 Data Collection and Data Understanding
4.3 Data Preprocessing
4.4 Modelling and Experiments
4.4.1 Data Preprocessing in WEKA
4.4.2 Data Mining Process
4.4.3 Decision Tree
4.4.4 Outpatients Analysis
4.4.5 Inpatients Analysis
4.5 The Summary Evaluation
4.5.1 The Decision Tree Summary Analysis
4.5.2 The Accuracy of Prediction Performance Analysis
4.5.3 The Confusion Matrix
4.5.3.1 Sensitivity and Specificity
4.6 Data Mining Performance
4.7 Chapter Summary

CHAPTER 5: CONCLUSION AND SUGGESTIONS
5.1 Introduction
5.2 Summary of Findings
5.2.1 Data Mining Benefit For Health Insurance Company
5.2.2 The Importance of Data Quality in Health Sector
5.3 Limitations and Recommendations
5.4 Chapter Summary

BIBLIOGRAPHY
APPENDIX I: PREPROCESS PANEL OF WEKA
APPENDIX II: CLASSIFY PANEL OF WEKA
APPENDIX III: CLASSIFIER OPTIONS IN WEKA
APPENDIX IV: THE CLASSIFICATION DATA IN WEKA


LIST OF TABLES

Table No.

2.1 The Medical Data Mining Previous Research For Predictive Modeling
3.1 WEKA Confusion Matrix Description
4.1 The Description of the Attributes in the Dataset
4.2 The Attributes of the Sample Set
4.3 The Classification Rules of Outpatients
4.4 The Prediction of Chronic Diseases For Outpatients Based On Age Group
4.5 The Classification Rules of Inpatients
4.6 The Prediction of Chronic Diseases For Inpatients
4.7 The Factors that Influence Chronic Diseases
4.8 The Reliability of Data Validation Using SPSS
4.9 The Confusion Matrix


LIST OF FIGURES

Figure No.

2.1 The Data Quality Improvement Activities
2.2 The Data Mining Process
2.3 KDD Process
2.4 The Classification of Data Mining Methods
2.5 A Simple Decision Tree For Mammalia Classification
2.6 WEKA GUI Chooser Interface
3.1 The Research Framework
3.2 The Visualization of Decision Tree with One Leaf
3.3 How to Analyze the Visualization Tree
4.1 Preprocess the Data
4.2 The Histograms Based On Gender, LOS, Chronic Disease Attribute
4.3 The Classify Tab of WEKA
4.4 The Decision Tree
4.5 The Decision Tree where LOS as the Most Critical Factor
4.6 The Classifier Output of Decision Tree
4.7 The Classifier Error Graph of Prediction Model
4.8 The Classifier Error Information
4.9 The Accuracy of Prediction Performance After Removing the Errors
4.10 The Decision Tree Based On Age Grouping
4.11 The Accuracy of Prediction Performance For Outpatient Only
4.12 The Accuracy of Prediction Performance For Inpatient Only
4.13 Sensitivity And Specificity Proportion


CHAPTER ONE

INTRODUCTION

1.1 BACKGROUND OF THE STUDY

In recent years, Information Technology (IT) has developed rapidly and has changed the way we live and work. Used wisely, IT can greatly assist work in many fields, and its development has also penetrated the health sector.

IT plays an important role in the health sector. Examples of IT implementation in the health sector, namely e-health, telemedicine and IT-based decision support, have successfully supported both business processes and decision-making processes. It is widely agreed that the use of IT in the health sector benefits not only users but also health organizations, such as hospitals, health centers, clinics and health insurance companies.

Badan Penyelenggara Jaminan Sosial (BPJS), the Healthcare and Social Security Agency, is one of the health insurance companies in Indonesia. BPJS Health is a state-owned enterprise specifically assigned by the government to provide health insurance for civil servants, pension recipients, army and police personnel, veterans, independence pioneers (and their families) and other business entities. The company covers all kinds of diseases, from mild to serious illnesses; chronic diseases are an example of the latter.

A chronic disease is a disease that lasts for a long time. The number of diseases that belong to the chronic disease category has been increasing. Chronic diseases that are common in the modern era include ALS (Lou Gehrig's Disease), Alzheimer's disease (and other dementias), arthritis, asthma, cancer, chronic obstructive pulmonary disease (COPD), cystic fibrosis, diabetes, heart disease, oral health conditions, osteoporosis and Reflex Sympathetic Dystrophy (RSD) Syndrome. In addition, most chronic diseases today are caused by an unhealthy lifestyle. Health counseling is therefore needed to give young people an early warning to start living healthily.

BPJS Health has been using IT systems for many years to perform several business tasks, such as managing the company's participant database of about 16.8 million people from across Indonesia, managing applications deployed in several hospitals, and managing medicines and medical services. It is therefore necessary for BPJS Health to have good data management, since good data management generates revenue, controls costs and mitigates risks (American Institute of CPAs, 2013).

Money and people have long been considered assets, but nowadays many organizations also rely on their data to make more informed and effective decisions that help them achieve their goals. Hence, data needs to be managed seriously (Searchdatamanagement.techtarget.com, 2013). However, developing good-quality data management is not easy. Organizations still meet challenges during the data management process, especially in the health sector, where huge amounts of data need to be organized and stored.

Price et al. (2013) stated that health care data management practitioners have made various efforts, such as case management, utilization review and disease management, to control healthcare costs and manage the utilization of services. However, none of these programs appears to be effective in controlling cost. According to Price et al. (2013), health insurers and health systems have considered different methods to control cost in a medically professional manner, such as identifying patients with chronic disease (who have a higher risk of readmission) or predicting disease progression and health status. Predictive models built by data mining could therefore be one of the solutions.

Data mining, one of the advances in science and technology, offers techniques (such as decision trees) to extract information from huge amounts of data, which may improve the quality of decision making (Milovic & Milovic, 2012). Data mining can therefore be greatly beneficial for the healthcare industry.

Data mining is becoming better known day by day, since it enables companies to discover profitable patterns and trends in their existing databases (Larose, 2005). A central objective of data mining is prediction. Predictive data mining is the most common type of data mining and the one with the most direct business applications (Statsoft.com, 2014). In predictive data mining, a technique is used to build a model, and the model's predictive performance is then validated. In the comparison by Delen, Walker and Kadam (2005), the decision tree proved to be the most accurate predictor, ahead of artificial neural networks and regression models.
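The validation step described above is where the accuracy, sensitivity and specificity named in the abstract come from. The following is a minimal sketch of how the three measures are derived from a two-class confusion matrix. It is not WEKA's implementation. The class names `chronic` and `other` and the toy labels are illustrative assumptions, not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a two-class problem; 'positive' is the class of interest."""
    tp: int  # actual positive, predicted positive
    fn: int  # actual positive, predicted negative
    fp: int  # actual negative, predicted positive
    tn: int  # actual negative, predicted negative

    def accuracy(self) -> float:
        # Share of all cases that were classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    def sensitivity(self) -> float:
        # True positive rate: share of actual positives that were detected.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # True negative rate: share of actual negatives correctly rejected.
        return self.tn / (self.tn + self.fp)


def from_labels(actual, predicted, positive) -> ConfusionMatrix:
    """Count the four outcomes from paired actual/predicted labels."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return ConfusionMatrix(tp, fn, fp, tn)


# Hypothetical toy labels for illustration only; not results of this study.
actual = ["chronic", "chronic", "chronic", "other", "other", "other", "other", "chronic"]
predicted = ["chronic", "other", "chronic", "other", "chronic", "other", "chronic", "chronic"]
cm = from_labels(actual, predicted, positive="chronic")
print(cm.accuracy(), cm.sensitivity(), cm.specificity())  # 0.625 0.75 0.5
```

The sketch assumes that both classes occur in the actual labels. If one of them is absent, sensitivity or specificity divides by zero.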

Given these benefits, this study aims to identify the potential benefits that data mining can bring to the health sector in Indonesia. The study uses health insurance data owned by BPJS Health to predict the factors that influence chronic diseases, with the decision tree as the data mining technique.


1.2 PROBLEM STATEMENT

Many previous studies have shown data mining to be a very useful tool for predicting disease from medical records, and it has been applied in many health organizations. Data mining processes large amounts of data that can be used to make better decisions in an organization (Taylor, 2012). However, implementing data mining in the medical sector is a challenging task that requires time and effort. As Cios and Moore (2002) stated, applying data mining, knowledge discovery and machine learning techniques to medical data is very challenging yet fascinating.

The quantity and quality of the data play an important role in generating an accurate predictive model (Cios & Moore, 2002). However, accurate and comprehensive medical data are not easy to obtain. Datasets used for data mining are generally very large, heterogeneous and complicated, and they differ in quality (Hosseinkhah, Ashktorab, Veen, & Owrang-Ojaboni, 2008). Data cleaning and data preprocessing are needed to eliminate redundant and incomplete data before data mining. Even after these two steps, however, the result is not useful if the data quality is poor. Poor data quality is thus one of the major obstacles to successful data mining (Thorat & Kute, 2014).
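The cleaning step described above can be sketched as follows. It assumes that each record is a Python dict with hypothetical fields `patient_id`, `age`, `gender`, `los` and `disease`. These fields echo the criteria listed in the research scope, but they are not the company's actual schema. The sketch drops records with a missing value and then drops redundant copies, keeping the first occurrence.

```python
from typing import Iterable

# Hypothetical required fields; the real BPJS Health schema is not given here.
REQUIRED = ("patient_id", "age", "gender", "los", "disease")


def is_complete(record: dict) -> bool:
    """True when every required field is present and not None or empty."""
    return all(record.get(field) not in (None, "") for field in REQUIRED)


def clean(records: Iterable[dict]) -> list[dict]:
    """Drop incomplete records, then duplicates on the required fields."""
    seen = set()
    cleaned = []
    for record in records:
        if not is_complete(record):
            continue  # incomplete: at least one required value is missing
        key = tuple(record[field] for field in REQUIRED)
        if key in seen:
            continue  # redundant: same required values as a kept record
        seen.add(key)
        cleaned.append(record)
    return cleaned


# Hypothetical example records, not data from this study.
raw = [
    {"patient_id": "P1", "age": 54, "gender": "F", "los": 3, "disease": "diabetes"},
    {"patient_id": "P1", "age": 54, "gender": "F", "los": 3, "disease": "diabetes"},
    {"patient_id": "P2", "age": None, "gender": "M", "los": 0, "disease": "asthma"},
    {"patient_id": "P3", "age": 61, "gender": "M", "los": 5, "disease": "heart disease"},
]
print([r["patient_id"] for r in clean(raw)])  # ['P1', 'P3']
```

A `los` of 0 is treated as a valid value rather than a missing one, so outpatient records with no length of stay are kept.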

In addition, a lack of integration can also cause data mining to fail. The data loaded into the data mining process come from heterogeneous systems (distributed among hospitals, health insurers and government departments), which increases the need for a uniform standard for integrating datasets (Thorat & Kute, 2014).
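One way to apply a uniform standard is to map each source's field names and codes onto a single schema before merging. The sketch below assumes two hypothetical sources, `hospital` and `insurer`. The field names and the gender codes (`L`/`P` for the Indonesian *laki-laki*/*perempuan*) are illustrative assumptions, not the systems actually used by BPJS Health.

```python
# Hypothetical per-source field mappings onto one uniform schema.
FIELD_MAPS = {
    "hospital": {"no_peserta": "patient_id", "jenis_kelamin": "gender", "lama_rawat": "los"},
    "insurer": {"member_id": "patient_id", "sex": "gender", "length_of_stay": "los"},
}

# Hypothetical code table so both sources use the same gender values.
GENDER_CODES = {"L": "M", "P": "F", "M": "M", "F": "F"}


def to_uniform(record: dict, source: str) -> dict:
    """Rename a source record's fields and normalise its codes and types."""
    mapping = FIELD_MAPS[source]
    # Fields without a mapping are dropped from the uniform record.
    out = {mapping[k]: v for k, v in record.items() if k in mapping}
    if "gender" in out:
        out["gender"] = GENDER_CODES[out["gender"]]
    if "los" in out:
        out["los"] = int(out["los"])  # one source may store LOS as text
    return out


def integrate(batches: dict[str, list[dict]]) -> dict[str, dict]:
    """Merge uniform records from all sources, keyed by patient_id."""
    merged: dict[str, dict] = {}
    for source, records in batches.items():
        for record in records:
            uniform = to_uniform(record, source)
            merged.setdefault(uniform["patient_id"], {}).update(uniform)
    return merged


batches = {
    "hospital": [{"no_peserta": "P1", "jenis_kelamin": "P", "lama_rawat": "4"}],
    "insurer": [{"member_id": "P1", "sex": "F", "length_of_stay": 4}],
}
print(integrate(batches))  # {'P1': {'patient_id': 'P1', 'gender': 'F', 'los': 4}}
```

When two sources disagree on a value, the source processed later simply overwrites the earlier one. A real integration would need an explicit conflict rule.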


1.3 RESEARCH OBJECTIVES

The objectives of this research are as follows:

1. To identify the capabilities of data mining in the health sector, especially using medical datasets.
2. To predict the factors that influence chronic disease and to identify the length of treatment.
3. To evaluate the knowledge derived from patterns generated by the data mining technique for BPJS Health insurance company.

1.4 RESEARCH QUESTIONS

The research questions are as follows:

1. What kind of knowledge emerges from the patterns generated by the classification process, and how may it benefit the health insurance company?
2. Based on the data mining process, what conditions does the company or organization have to handle to produce an optimum model?

1.5 RESEARCH SCOPE

The scope and limitations of this research are as follows:

1. This research focuses on how data mining can predict the factors that influence chronic diseases based on four criteria, namely age, gender, LOS (length of stay) and disease.
2. All data mining processes, such as data preprocessing and data classification, are executed in WEKA, a data mining tool.
3. The decision tree technique is used for the data mining process.
4. C4.5, known as J4.8 (J48) in WEKA, is used as the algorithm to generate the decision tree.
5. The medical dataset used in this research comes from BPJS Health insurance company in Indonesia.
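C4.5, the algorithm behind J48, chooses the attribute to split on by its gain ratio. The gain ratio is the information gain, Gain(A) = Info(D) − Σ_j |D_j|/|D| · Info(D_j), divided by the split information, SplitInfo(A) = −Σ_j |D_j|/|D| · log2(|D_j|/|D|), where Info(D) = −Σ_i p_i · log2 p_i is the class entropy. The sketch below computes the gain ratio for categorical attributes only. It leaves out C4.5's numeric thresholds, missing-value handling and pruning, so it illustrates the split criterion rather than WEKA's J48. The attribute names `gender` and `los_group` and the toy rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels: list[str]) -> float:
    """Info(D) = -sum(p * log2 p) over the class distribution of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows: list[dict], attribute: str, target: str) -> float:
    """Information gain of a categorical split, divided by its split information."""
    total = len(rows)
    # Group the class labels by the value the attribute takes.
    partitions: dict[object, list[str]] = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Expected information after the split, weighted by partition size.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = -sum(len(p) / total * math.log2(len(p) / total) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


def best_split(rows: list[dict], attributes: list[str], target: str) -> str:
    """Pick the attribute with the highest gain ratio."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))


# Hypothetical toy rows; not data from this study.
ROWS = [
    {"gender": "F", "los_group": "long", "chronic": "yes"},
    {"gender": "F", "los_group": "long", "chronic": "yes"},
    {"gender": "F", "los_group": "short", "chronic": "no"},
    {"gender": "M", "los_group": "long", "chronic": "yes"},
    {"gender": "M", "los_group": "short", "chronic": "no"},
    {"gender": "M", "los_group": "short", "chronic": "no"},
]
print(best_split(ROWS, ["gender", "los_group"], "chronic"))  # los_group
```

In this toy data, `los_group` separates the classes perfectly (gain ratio 1.0), while `gender` gives a gain ratio of only about 0.08. A tree builder would therefore place `los_group` at the root and repeat the procedure within each branch.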

1.6 SIGNIFICANCE OF THE STUDY

The findings of this study would show some potential benefits that data mining can bring to the health sector in Indonesia, especially to BPJS Health insurance company. They could also indicate how good data management can help the company produce an optimum predictive data mining model.

The findings of this study are developed through a data mining process that uses the health insurance data owned by BPJS Health insurance company. These findings can be used to predict factors that influence chronic diseases and can assist the company in implementing new policies. The new information is extracted from patterns generated by the data mining process, using the decision tree as the data mining technique. The decision tree was chosen because its predictive result is easier to read and interpret, especially for people who are not familiar with data mining. Such readers can draw information directly from the visualization of the decision tree.
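Part of what makes a decision tree readable is that every path from the root to a leaf can be written as an IF-THEN rule. The sketch below shows this on a small hand-written tree stored as nested dicts. The tree, its attributes (`age_group`, `los_group`) and the `chronic` class are hypothetical and are not the model built in this study. The sketch also does not use any WEKA API; it only illustrates how the rules are read off a tree.

```python
# A hypothetical, hand-written tree; NOT the model produced in this study.
# Internal node: {"attribute": name, "branches": {value: subtree}}; leaf: class label.
TREE = {
    "attribute": "age_group",
    "branches": {
        "<45": "no",
        ">=45": {
            "attribute": "los_group",
            "branches": {"short": "no", "long": "yes"},
        },
    },
}


def to_rules(node, conditions=()):
    """Yield (condition, label) pairs, one per root-to-leaf path."""
    if isinstance(node, str):  # a leaf holds the predicted class
        yield " AND ".join(conditions) or "TRUE", node
        return
    for value, child in node["branches"].items():
        yield from to_rules(child, conditions + (f"{node['attribute']} = {value}",))


for condition, label in to_rules(TREE):
    print(f"IF {condition} THEN chronic = {label}")
```

This prints three rules, one per leaf. For example, the last one is `IF age_group = >=45 AND los_group = long THEN chronic = yes`, which a non-specialist can check directly against the branches of the tree.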


1.7 ORGANIZATION OF THE STUDY

Chapter One: This chapter introduces the study and elaborates the background of the research, the problem statement, research objectives, research scope and significance of the study.

Chapter Two: This chapter presents the literature review. It starts with an introduction to chronic diseases and continues with the use of IT in the health sector. These are followed by the challenges of data management in the health sector and how data mining could offer its benefits. The chapter also gives a brief overview of data mining, previous research on predictive medical data mining, and the benefits and challenges of data mining for the health sector. The last section explains the decision tree as a data mining technique and WEKA as a data mining tool.

Chapter Three: This chapter describes the methodology implemented in the research. This is followed by the research framework, which explains the process taken in each stage of the study. The last section elaborates the research instruments and measures.

Chapter Four: This chapter presents the findings and results of the data analysis carried out after implementing the research methodology of this study.

Chapter Five: This chapter provides the conclusions, limitations and recommendations

for further study.

1.8 CHAPTER SUMMARY

This chapter introduced the background of this study and explained why it was conducted. It then discussed the problem statement and the scope, objectives and questions of the research. Finally, it described the significance of the study and the organization of this report.


CHAPTER TWO

LITERATURE REVIEW

2.1 INTRODUCTION

The first section of this chapter elaborates on chronic diseases, and the second section covers the use of IT in the health sector. Section three explains the challenges of data management in the health sector. The fourth section elaborates on data mining implementation in the health sector. It starts with a brief overview of data mining and Knowledge Discovery in Databases (KDD), is followed by previous research on predictive medical data mining, and ends with the benefits and challenges of data mining for the health sector. Data mining techniques are introduced in the fifth section. Finally, the last section introduces WEKA, the data mining software used in this research.

2.2 CHRONIC DISEASE

Chronic disease, also known as non-communicable disease (NCD), is a disease of long duration and generally slow progression that is not transmitted from person to person (Who.int, 2014). There are four main types of NCDs: cardiovascular diseases (such as heart attack and stroke), cancers, chronic respiratory diseases (such as chronic obstructive pulmonary disease and asthma) and diabetes. Health.ny.gov (2014) adds that chronic diseases also include ALS (Lou Gehrig's Disease), Alzheimer's Disease and other Dementias, Arthritis, Cystic Fibrosis, Oral Health conditions, Osteoporosis and Reflex Sympathetic Dystrophy (RSD) Syndrome.


Cmcd.sph.umich.edu (2014) stated that chronic diseases have a major impact on people around the world. Chronic diseases are not only the leading cause of death but also a major cause of disability and premature death worldwide. At the same time, chronic diseases are largely preventable.

The risk factors for chronic diseases are widely known and include an unhealthy diet, physical inactivity and tobacco use. These factors have been shown to lead to raised blood pressure, elevated blood glucose, abnormal blood lipids, overweight and obesity (World Health Organization, 2014). Furthermore, chronic diseases do not affect only older people; they can occur in all age groups. Type 1 diabetes and childhood asthma, for example, are chronic diseases that begin early in life (Aihw.gov.au, 2014).

2.3 USE OF IT IN THE HEALTH SECTOR

Using IT in the health sector brings several benefits, improving efficiency for patients, doctors and other practitioners. IT has come to play an important role in information management in several hospitals in Indonesia. Oberty (2012) stated that in Indonesia, information systems have assisted health practitioners in their decision-making duties through Decision Support Systems. Beyond decision making, the benefits of IT can also be seen in the implementation of e-health, tele-nursing and similar services, which can improve Indonesian public health services (Murdiyanti, 2012). Several benefits of IT implementation in the health sector are described below.


2.3.1 Electronic Health (E-Health)

Eysenbach (2001) defined e-health as an emerging field at the intersection of medical informatics, public health and business, referring to health services and information delivered or enhanced through the Internet and related technologies. In a broader sense, the term characterizes not only a technical development but also a state of mind, a way of thinking, an attitude and a commitment to networked, global thinking, in order to improve health care locally, regionally and worldwide by using information and communication technology.

E-health has been implemented in many healthcare enterprises to support the transformation of healthcare organizations. One example is the Tasmanian e-Health Collaborative Project, a cooperation between private insurers, private hospitals, the Department of Veterans’ Affairs, the Australian Centre for Health Research and the Tasmanian Department of Health and Human Services. The objective of this project was to develop an ICT infrastructure for sharing electronic discharge summaries across the public and private sectors, for example from private hospitals to the patient’s primary General Practitioner (GP) and other authorized care providers. The information was delivered securely over a broadband network, so that it could be shared with GPs, other authorized healthcare providers and consumers. A fast, reliable and appropriate e-health system for exchanging information between private hospitals and GPs was expected to improve the quality, safety, coordination and continuity of patient care. Moreover, the e-health system provided a combined business and technology solution that helped support the sustainability of the health sector and of the range of healthcare providers, including specialists, community services and federated health (Georgeff, 2007).

Furthermore, the Internet allows Health Information Technology (HIT) systems to use cloud computing, which can provide a robust infrastructure and better services. HIT enables health organizations to operate more efficiently and cost-effectively. This has encouraged the development of an "E-health Cloud" model that can help the healthcare industry meet current and future demands at minimum cost. Even though HIT delivered through a cloud computing model has great potential to improve the quality of the healthcare industry, little literature has yet discussed such integrated HIT (AbuKhousa, Mohamed, & Al-Jaroodi, 2012).

In the future, all relevant e-health stakeholders (healthcare professionals, hospitals, insurance companies and drug producers) are expected to focus on digitizing their own processes and on supporting the use of these processes by others. For instance, pharmaceutical providers will focus on e-prescriptions and e-refills and will help doctors and other practitioners obtain and place orders for medicines. Meanwhile, insurance companies will focus on e-billing and e-payments among hospitals, doctors and pharmacies, whereas patients will have access to information about their test results, diagnoses, prescriptions, appointments and insurance (Varshney, 2009a).

2.3.2 Telemedicine

Telemedicine can generally be defined as the use of telecommunication technologies to deliver medical healthcare services to patients and to share information over distances (Varshney, 2009b). Telemedicine technology is not difficult to find nowadays: robust telecommunications networks and video equipment are widely available in many variants. Although technologies in the health field are still underdeveloped, manufacturers already offer several products that meet industry standards and ensure interoperability with other devices. Data integration between different systems nevertheless continues to be pursued, as it is important in health care for patient data to be available whenever needed (Harnett, 2006).

Telemedicine technologies have been implemented in many areas of healthcare. For example, tele-psychiatry, videoconferencing and tele-psychology have been used in mental health care in the UK. Videoconferencing has been shown to improve psychiatric services, particularly for patients living in rural areas, making it an effective way to support these patients (Norman, 2006).

Another example of telemedicine implementation comes from South Africa. The South African National Department of Health (DoH) has recognized the potential of Information and Communication Technology (ICT) for delivering healthcare to rural areas. Since half of the country’s population lives in rural areas, telemedicine is a promising strategy for addressing the unequal distribution of healthcare resources. A telemedicine maturity model was proposed to measure, manage and improve all the components of a telemedicine system, producing an improvement process suited to a given enterprise (Van Dyk, Fortuin, & Schutte, 2012).

2.3.3 IT Decision Support

Berner & Lande (2007) described the use of IT to support decision making in the health sector, known as a Clinical Decision Support (CDS) system, as a computer system designed to influence physicians’ decisions about individual patients at the point in time when those decisions are made.

The benefits of clinical decision support systems include improving the quality of medical diagnosis and reducing diagnostic errors. Graber & Mathew (2008) developed a new web-based clinical decision support system that can