An Implementation of Data Mining Techniques in Health Care Industries


International Journal of Computational Intelligence and Information Security, May 2013, Vol. 4 No. 5,

ISSN: 1837-7823


R. Anand1, Dr. S.K. Srivatsa2

1Research Scholar, Sri Chandra Sekarendra Viswa Maha Vidyalaya, Enathur, Kanchipuram-531 602.

2Sr. Professor, St. Joseph College of Engineering, Chennai-600 119.

Abstract

Data mining is an emerging field. Every day the health care industry produces a huge volume of data, and finding the right data for the right place is a tedious task. In other words, the data is rich but poorly utilized. In this paper we discuss how data mining techniques can be used in the health care industry. The industry today generates large amounts of complex data about patients, hospital resources, disease diagnosis, electronic patient records, medical devices and so on. This data is a key resource to be processed and analyzed for knowledge extraction that supports cost savings and decision making. Data mining brings a set of tools and techniques that can be applied to this processed data to discover hidden patterns, giving healthcare professionals an additional source of knowledge for making decisions. With continuous advances in technology, an increasing number of clinicians are using electronic medical records to accumulate substantial amounts of data about their patients, together with the associated clinical conditions and treatment details. The hidden relationships and patterns within this information would further our medical knowledge, including its efficiencies and deficiencies. Methodologies that are being used with increasing effectiveness in parallel industries need to be modified and applied to discover this knowledge. In this article we discuss how data mining can be used in the medical field, how it solves business issues, the challenges in data mining, data mining techniques and the SEMMA methodology.

Keywords: Data Mining, Health Care, Neural Networks, Neuro Fuzzy, Decision Tree algorithm

1. Introduction

Data mining can be defined as the process of finding previously unknown patterns and trends in databases and using that information to build predictive models. Alternatively, it can be defined as the process of selecting and exploring data and building models using vast data stores to uncover previously unknown patterns. Data mining is not new; it has been used intensively and extensively by financial institutions, for credit scoring and fraud detection; marketers, for direct marketing and cross-selling or up-selling; retailers, for market segmentation and store layout; and manufacturers, for quality control and maintenance scheduling.

In healthcare, data mining is becoming increasingly popular, if not increasingly essential. Several factors have motivated the use of data mining applications in healthcare. The existence of medical insurance fraud and abuse, for example, has led many healthcare insurers to attempt to reduce their losses by using data mining tools to help them find and track offenders. Fraud detection using data mining applications is prevalent in the commercial world, for example in the detection of fraudulent credit card transactions. Recently, there have been reports of successful data mining applications in healthcare fraud and abuse detection. Another factor is that the huge amounts of data generated by healthcare transactions are too complex and voluminous to be processed and analyzed by traditional methods. Data mining can improve decision making by discovering patterns and trends in large amounts of complex data. Such analysis has become increasingly essential as financial pressures have heightened the need for healthcare organizations to make decisions based on the analysis of clinical and financial data. Insights gained from data mining can influence cost, revenue, and operating efficiency while maintaining a high level of care.



2. Data Mining in Medical Data Bases

Data mining is an essential step of knowledge discovery. In recent years it has attracted a great deal of interest in the information industry. The knowledge discovery process consists of an iterative sequence of data cleaning, data integration, data selection, data mining, pattern recognition and knowledge presentation. In particular, data mining may accomplish class description, association, classification, clustering, prediction and time series analysis. Data mining, in contrast to traditional data analysis, is discovery driven. It is a young interdisciplinary field closely connected to data warehousing, statistics, machine learning, neural networks and inductive logic programming.

Fig. 1 Data Base Architecture

Data mining provides automatic pattern recognition and attempts to uncover patterns in data that are difficult to detect with traditional statistical methods. Without data mining it is difficult to realize the full potential of the data collected within a healthcare organization, because the data under analysis is massive, highly dimensional, distributed and uncertain. Massive healthcare data needs to be converted into information and knowledge, which can help control cost and maintain a high quality of patient care. Healthcare data includes patient-centric data and aggregate data. For a health care organization to succeed, it must have the ability to capture, store and analyze data (Fig. 1). Online analytical processing (OLAP) provides one way for data to be analyzed in a multi-dimensional capacity. With the adoption of data warehousing and data analysis/OLAP tools, an organization can make strides in leveraging data for better decision making. Many healthcare organizations struggle to use the data collected through their online transaction processing (OLTP) systems, which are not integrated for decision making and pattern analysis. For a healthcare organization to be successful, it is important to empower management and staff with data warehousing based on critical thinking, and with knowledge management tools for strategic decision making. Data warehousing can be supported by decision support tools such as data marts, OLAP and data mining tools. A data mart is a subset of a data warehouse that focuses on selected subjects. An OLAP solution provides a multi-dimensional view of the data found in relational databases. Although the data is stored in a two-dimensional format, OLAP makes it possible to analyze potentially large amounts of data with very fast response times, and it lets users move through the data, drilling down or rolling up along the dimensions defined by the data structure.
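The roll-up and drill-down operations can be illustrated with a minimal sketch. The fact rows, the dimension names (department, year, quarter) and the cost measure are hypothetical and are not taken from any real hospital system; a production OLAP engine would precompute and index such aggregates rather than scan rows as done here.

```python
from collections import defaultdict

# Hypothetical fact rows: (department, year, quarter, treatment cost).
ADMISSIONS = [
    ("Cardiology", 2012, "Q1", 1200.0),
    ("Cardiology", 2012, "Q2", 800.0),
    ("Cardiology", 2013, "Q1", 1500.0),
    ("Oncology", 2012, "Q1", 2000.0),
    ("Oncology", 2013, "Q1", 2500.0),
]
DIMENSIONS = ("department", "year", "quarter")


def aggregate(rows, dims):
    """Sum the cost measure, grouping by the dimensions listed in dims."""
    index = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in index)
        totals[key] += row[-1]  # the last field is the cost measure
    return dict(totals)


# Roll up: one total per department.
by_department = aggregate(ADMISSIONS, ["department"])
# Drill down: split each department total by year.
by_department_year = aggregate(ADMISSIONS, ["department", "year"])
```

Rolling up drops dimensions from the grouping key, while drilling down adds them back, so both operations reduce to the same aggregation called with a different list of dimensions.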

3. Solving Business Issues

The SAS data mining solution has been used in health care to overcome a wide range of business issues and

problems. Some of these include:

Segmenting customers/patients accurately into groups with similar health patterns.
Rapidly identifying who are the most profitable customers and the underlying reasons.
Understanding why customers leave for competitors (attrition, churn analysis).
Planning for effective information systems management.
Preparing for demand of resources.
Anticipating customers'/patients' future actions, given their history and characteristics.
Predicting medical diagnosis.
Forecasting treatment costs.
Predicting length of stay in a hospital.
Identifying medical procedure expenditures and utilization by analyzing claims and point-of-care data.
Predicting total cost of patient care.

Once the business problems have been defined and agreed upon, the next logical step is to determine the type and amount of data that will be necessary for making business decisions. As a precursor to data mining, a data warehouse strategy and implementation is suggested. Integration with SAS software gives the SAS data mining solution several distinguishing characteristics, which allow faster, easier and more accurate conversion of data into knowledge useful to decision makers.

Data diversity

The SAS data mining solution is designed to accept a wider range of data formats than any other data mining product currently on the market. It will accept data from relational and hierarchical databases, flat files, and other data formats, and it will accept this data from all major hardware platforms.

Distributed client/server

The SAS data mining solution supports both the data server model of client/server computing, in which data located on a remote machine can be accessed, and the compute server model, which allows data to be processed on a remote server and then forwarded to a client. This is particularly well suited to analytical tasks involving large volumes of data that require superior processing capabilities.

Consistent implementation on multiple platforms

The SAS data mining solutions are fully integrated with SAS software and give users the flexibility to use their platforms of choice, ranging from desktop machines to powerful servers.

Integrated data management

SAS software's data management facilities guarantee data integrity without the need for re-keying or additional validation of data. No other data mining solution includes seamless integration with such a comprehensive range of data management functionality. Once the business objectives and data issues have been resolved, the methodology and approach to data mining can begin.



4. Data Mining Challenges

Efficiency and scalability of data mining algorithms
Parallel, distributed, stream, and incremental mining methods
Handling high dimensionality
Handling noise, uncertainty, and incompleteness of data
Incorporation of constraints, expert knowledge, and background knowledge in data mining
Pattern evaluation and knowledge integration
Mining diverse and heterogeneous kinds of data: e.g., bioinformatics, Web, software/system engineering, information networks
Application-oriented and domain-specific data mining
Invisible data mining (embedded in other functional modules)
Protection of security, integrity, and privacy in data mining

5. SEMMA Methodology

The methodology and approach that SAS Institute proposes is referred to as SEMMA, for Sample, Explore, Modify, Model, and Assess. Beginning with a statistically representative sample of data, users can apply exploratory statistical and visualization techniques, select and transform the most significant predictive variables, model the variables to predict outcomes, and affirm the model's accuracy.

Sample

The first step is to extract a portion of a large data set big enough to contain the significant information yet small enough to manipulate quickly.
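One way to keep a sample representative is to draw the same fraction from every class, so that rare outcomes are not lost. The sketch below shows this with stratified sampling; the patient records, the diagnosis labels and the 10% fraction are hypothetical, and SEMMA itself does not prescribe a particular sampling scheme.

```python
import random


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from each stratum so class proportions are kept."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    strata = {}
    for record in records:
        strata.setdefault(key(record), []).append(record)
    sample = []
    for group in strata.values():
        size = max(1, round(len(group) * fraction))  # at least one per stratum
        sample.extend(rng.sample(group, size))
    return sample


# Hypothetical patient records: (patient_id, diagnosis label).
patients = [(i, "diabetic" if i % 5 == 0 else "non-diabetic") for i in range(1000)]
subset = stratified_sample(patients, key=lambda p: p[1], fraction=0.1)
```

With 200 diabetic and 800 non-diabetic records, the subset keeps the same 1:4 ratio at one tenth of the size.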

Explore

This phase involves searching speculatively for unanticipated trends and anomalies so as to gain understanding and ideas. This can reveal which subset of attributes will be the most productive to work with in the modeling phase. Data visualization delivers intuitive tools for business professionals, while statistical techniques offer added detail for specialists.

Modify

The insights that are gained from the exploration phase enable knowledge workers to group the most productive subsets and clusters of data together for further analysis and exploration.

Model

This process involves searching automatically for a variable combination that reliably predicts a desired outcome. Data mining techniques such as neural networks, tree-based models, and traditional statistical techniques can help reveal patterns in the data and provide a best-fitting predictive model.

Assess

During this evaluation process, assessment of the results gained from modeling indicates which results should be conveyed to senior management and how to model new questions that have been raised by the previous results, at which point the process proceeds back to the exploration phase.

SAS Institute presents SEMMA as a process that distinguishes it as the only vendor that can offer all of these components, as well as the ability to seamlessly integrate them with a company's existing hardware and software strategy.



Fig. 2 SEMMA Methodology

6. Data Mining Techniques

Various data mining techniques are available, and their suitability depends on the application domain. Statistics provides a strong fundamental background for the quantification and evaluation of results. However, algorithms based on statistics need to be modified and scaled before they are applied to data mining. We now describe a few classification techniques, with illustrations of their applications to healthcare.

Rule set classifiers

Complex decision trees can be difficult to understand, for instance because information about one class is usually distributed throughout the tree. An alternative formalism consists of a list of rules of the form "if A and B and C and ... then class X", where the rules for each class are grouped together. A case is classified by finding the first rule whose conditions are satisfied by the case; if no rule is satisfied, the case is assigned to a default class.

IF conditions THEN conclusion

This kind of rule consists of two parts. The rule antecedent (the IF part) contains one or more conditions about the values of predictor attributes, whereas the rule consequent (the THEN part) contains a prediction about the value of a goal attribute. An accurate prediction of the value of a goal attribute will improve the decision-making process. IF-THEN prediction rules are very popular in data mining; they represent discovered knowledge at a high level of abstraction.

In the health care system it can be applied as follows:

(Symptoms) (Previous history) ----> (Cause of disease)

Example 1: IF-THEN rule induced in the diagnosis of the level of alcohol in blood

IF Sex = MALE AND Unit = 8.9 AND Meal = FULL
THEN Diagnosis = Blood_alcohol_content_HIGH.
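The first-match behaviour of a rule list, including the fall-back to a default class, can be sketched as follows. The rule is the one from Example 1; the default class name and the second test case are hypothetical, and the conditions are compared by simple equality exactly as the rule is written.

```python
def classify(case, rules, default):
    """Return the class of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if all(case.get(attr) == value for attr, value in conditions.items()):
            return label
    return default  # no rule fired, so fall back to the default class


# The single rule from Example 1, stored as (antecedent, consequent).
RULES = [
    ({"Sex": "MALE", "Unit": 8.9, "Meal": "FULL"}, "Blood_alcohol_content_HIGH"),
]
# Hypothetical default class; the paper does not name one.
DEFAULT = "Blood_alcohol_content_NOT_HIGH"

match = classify({"Sex": "MALE", "Unit": 8.9, "Meal": "FULL"}, RULES, DEFAULT)
no_match = classify({"Sex": "FEMALE", "Unit": 2.0, "Meal": "NONE"}, RULES, DEFAULT)
```

Because rules are tried in order, their position in the list matters: a more specific rule has to appear before a more general one that covers the same cases.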

Decision Tree algorithms

Decision tree algorithms include CART (Classification and Regression Trees) and ID3 (Iterative Dichotomiser 3). These algorithms differ in how they select splits, when they stop splitting a node, and how they assign a class to a non-split node. CART uses the Gini index to measure the impurity of a partition or set of training tuples. It can handle high-dimensional categorical data. Decision trees can also handle continuous data (as in regression), but the data must first be converted to categorical data.
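The Gini index of a set of tuples is one minus the sum of the squared class proportions, and a binary split is scored by the size-weighted impurity of its two partitions. The sketch below computes both; the class labels are hypothetical, and the code covers only the impurity measure, not tree construction.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def gini_of_split(left, right):
    """Size-weighted impurity of a binary split; lower is better."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)


# Hypothetical class labels at a node, and one candidate split of that node.
node = ["sick"] * 4 + ["healthy"] * 4
left, right = ["sick"] * 4, ["healthy"] * 4

impurity_before = gini(node)
impurity_after = gini_of_split(left, right)
```

An evenly mixed two-class node has the maximum impurity of 0.5, and a split that separates the classes perfectly brings the weighted impurity down to 0, which is why such a split would be preferred.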

Neural Network Architecture

The architecture of the neural network used in this study is a multilayered feed-forward network with 20 input nodes, 10 hidden nodes, and 10 output nodes. The number of input nodes is determined by the finalized data, the number of hidden nodes is determined through trial and error, and the output nodes represent a range showing the disease classification. The most widely used neural network learning method is the backpropagation (BP) algorithm. Learning in a neural network involves modifying the weights and biases of the network in order to minimize a cost function. The cost function always includes an error term, a measure of how close the network's predictions are to the class labels for the examples in the training set. Additionally, it may include a complexity term that reflects a prior distribution over the values that the parameters can take. Neural networks have been proposed as useful tools in decision making in a variety of medical applications. Neural networks will never replace human experts, but they can help in screening and can be used by experts to double-check their diagnosis. In general, the results of a disease classification or prediction task are true only with a certain probability.
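A forward pass through a 20-10-10 network and a cost made of an error term plus a complexity term can be sketched as below. The paper does not state the activation function, the error measure or the form of the complexity term, so the sigmoid units, the squared error and the L2 weight penalty (which corresponds to a Gaussian prior on the weights) are assumptions, as are the random input features and the target vector. Training by backpropagation is not shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights plus zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layer, inputs):
    """Sigmoid of the weighted input sum at each node (assumed activation)."""
    weights, biases = layer
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(weights, biases)
    ]


def cost(outputs, target, layers, decay=0.001):
    """Squared-error term plus an L2 weight penalty as the complexity term."""
    error = 0.5 * sum((o - t) ** 2 for o, t in zip(outputs, target))
    penalty = sum(w * w for weights, _ in layers for row in weights for w in row)
    return error + decay * penalty


rng = random.Random(0)
hidden = init_layer(20, 10, rng)  # 20 input nodes -> 10 hidden nodes
output = init_layer(10, 10, rng)  # 10 hidden nodes -> 10 output nodes

features = [rng.random() for _ in range(20)]  # hypothetical patient features
target = [1.0] + [0.0] * 9                    # hypothetical disease class
prediction = forward(output, forward(hidden, features))
loss = cost(prediction, target, [hidden, output])
```

Backpropagation would repeatedly adjust the weights and biases in the direction that lowers `loss`; the `decay` factor controls how strongly large weights are discouraged.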

Neuro-Fuzzy

A stochastic backpropagation algorithm is used to construct the fuzzy-based neural network. The steps involved in the algorithm are as follows. First, initialize the weights of the connections with random values. Second, for each unit compute the net input value, output value and error rate. Third, to handle uncertainty, calculate a certainty measure (c) for each node. The decision is made on the basis of the certainty measure, whose level is determined by the following conditions.

a. If 0.8 < c ≤ 1, then there exists very high certainty

b. If 0.6 < c ≤ 0.8, then there exists high certainty

c. If 0.4 < c ≤ 0.6, then there exists average certainty

d. If 0.1 < c ≤ 0.4, then there exists low certainty

e. If c ≤ 0.1, then there exists very low certainty
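The five certainty bands translate directly into a lookup function. The sketch assumes that each band includes its upper bound and that c lies between 0 and 1; the function name and the shortened label strings are illustrative only.

```python
def certainty_level(c):
    """Map a certainty measure c in [0, 1] to one of the five bands."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("certainty measure must lie in [0, 1]")
    if c > 0.8:
        return "very high"
    if c > 0.6:
        return "high"
    if c > 0.4:
        return "average"
    if c > 0.1:
        return "low"
    return "very low"


# Hypothetical certainty measures for three nodes.
levels = [certainty_level(c) for c in (0.95, 0.5, 0.05)]
```

Checking the thresholds from highest to lowest means each comparison only needs the lower bound of its band, since every higher band has already been ruled out.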

The constructed network consists of three layers: an input layer, a hidden layer and an output layer. A sample trained neural network consisting of 9 input nodes, 3 hidden nodes and 1 output node is shown in Figure 2. When a thrombus or blood clot occupies more than 75% of the surface area of the lumen of an artery, the expected result may be a prediction of cell death or heart disease according to medical guidelines; that is, the result R is generated with reference to the given set of input data.

7. Summary

The effective use of information and technology is crucial for health care organizations to stay competitive in today's complex, evolving environment. The challenges faced when trying to make sense of large, diverse, and often complex data sources are considerable. In an effort to turn information into knowledge, health care organizations are implementing data mining technologies to help control costs and improve the efficacy of patient care. Data mining can be used to help predict future patient behavior and to improve treatment programs. By identifying high-risk patients, clinicians can better manage the care of patients today so they do not become the problems of tomorrow.



We studied the problem of constraining and summarizing different data mining algorithms, and focused on using different algorithms to predict combinations of several target attributes. We conclude that using appropriate data mining algorithms in the health care industry produces better results and can help prevent several diseases.

