International Journal of Computational Intelligence and Information Security, May 2013, Vol. 4 No. 5,
ISSN: 1837-7823
An Implementation of Data Mining Techniques in Health Care Industries
R. Anand (1), Dr. S.K. Srivatsa (2)
(1) Research Scholar, Sri Chandra Sekarendra Viswa Maha Vidyalaya, Enathur, Kanchipuram-531 602.
(2) Sr. Professor, St. Joseph College of Engineering, Chennai-600 119.
Abstract
Data mining is an emerging field. Every day the health care industry produces a huge volume of data, and finding the
right data for the right place is a tedious task. In other words, the industry is rich in data but poor in its
utilization. In this paper we discuss how data mining techniques can be used in the health care industry. The health
care industry today generates large amounts of complex data about patients, hospital resources, disease diagnosis,
electronic patient records, medical devices, etc. This large amount of data is a key resource to be processed and
analyzed for knowledge extraction that supports cost savings and decision making. Data mining brings a set of tools
and techniques that can be applied to this processed data to discover hidden patterns that provide healthcare
professionals with an additional source of knowledge for making decisions. With continuous advances in technology, an
increasing number of clinicians are using electronic medical records to accumulate substantial amounts of data about
their patients, with the associated clinical conditions and treatment details. The hidden relationships and patterns
within this information would further our medical knowledge, including its efficiencies and deficiencies.
Methodologies that are being used with increasing effectiveness in parallel industries need to be modified and applied
to discover this knowledge. In this article we discuss how data mining can be used in the medical field, how it solves
business issues, the challenges in data mining, data mining techniques, and the SEMMA methodology.
Keywords: Data Mining, Health Care, Neural Networks, Neuro Fuzzy, Decision Tree algorithm
1. Introduction
Data mining can be defined as the process of finding previously unknown patterns and trends in databases and using
that information to build predictive models. Alternatively, it can be defined as the process of data selection and
exploration and building models using vast data stores to uncover previously unknown patterns. Data mining is not
new; it has been used intensively and extensively by financial institutions, for credit scoring and fraud detection;
marketers, for direct marketing and cross-selling or up-selling; retailers, for market segmentation and store layout;
and manufacturers, for quality control and maintenance scheduling.
In healthcare, data mining is becoming increasingly popular, if not increasingly essential. Several factors have
motivated the use of data mining applications in healthcare. The existence of medical insurance fraud and abuse, for
example, has led many healthcare insurers to attempt to reduce their losses by using data mining tools to help them
find and track offenders. Fraud detection using data mining applications is prevalent in the commercial world, for
example, in the detection of fraudulent credit card transactions. Recently, there have been reports of successful data
mining applications in healthcare fraud and abuse detection. Another factor is that the huge amounts of data
generated by healthcare transactions are too complex and voluminous to be processed and analyzed by traditional
methods. Data mining can improve decision-making by discovering patterns and trends in large amounts of complex
data. Such analysis has become increasingly essential as financial pressures have heightened the need for healthcare
organizations to make decisions based on the analysis of clinical and financial data. Insights gained from data
mining can influence cost, revenue, and operating efficiency while maintaining a high level of care.
2. Data Mining in Medical Databases
Data mining is an essential step of knowledge discovery. In recent years it has attracted a great deal of interest in
the information industry. The knowledge discovery process consists of an iterative sequence of data cleaning, data
integration, data selection, data mining, pattern recognition and knowledge presentation. In particular, data mining
may accomplish class description, association, classification, clustering, prediction and time series analysis. Data
mining, in contrast to traditional data analysis, is discovery driven. Data mining is a young interdisciplinary field
closely connected to data warehousing, statistics, machine learning, neural networks and inductive logic
programming.
Fig. 1 Database Architecture
Data mining provides automatic pattern recognition and attempts to uncover patterns in data that are difficult to
detect with traditional statistical methods. Without data mining it is difficult to realize the full potential of the
data collected within a healthcare organization, as the data under analysis is massive, highly dimensional,
distributed and uncertain. Massive healthcare data needs to be converted into information and knowledge, which can
help control cost and maintain a high quality of patient care. Healthcare data includes patient-centric data and
aggregate data. For healthcare organizations to succeed, they must have the ability to capture, store and analyze
data (Fig. 1). Online analytical processing (OLAP) provides one way for data to be analyzed in a multi-dimensional
capacity. With the adoption of data warehousing and data analysis/OLAP tools, an organization can make strides in
leveraging data for better decision making. Many healthcare organizations struggle with the utilization of data
collected through an organization's online transaction processing (OLTP) system that is not integrated for decision
making and pattern analysis. For a healthcare organization to be successful, it is important to empower the
management and staff with data warehousing based on critical thinking and knowledge management tools for strategic
decision making. Data
warehousing can be supported by decision support tools such as data marts, OLAP and data mining tools. A data mart
is a subset of a data warehouse that focuses on selected subjects. An online analytical processing (OLAP) solution
provides a multi-dimensional view of the data found in relational databases. Although the data is stored in a
two-dimensional format, OLAP makes it possible to analyze potentially large amounts of data with very fast response
times, and it lets users go through the data and drill down or roll up through various dimensions as defined by the
data structure.
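The roll-up and drill-down operations described above can be illustrated with a minimal Python sketch. The records, the dimension names (`department`, `year`, `quarter`) and the `cost` measure are hypothetical, and the `aggregate` function only mimics the kind of aggregation an OLAP tool performs; it is not any vendor's API.

```python
from collections import defaultdict

# Hypothetical fact table: one row per (department, year, quarter) with a cost measure.
FACTS = [
    {"department": "Cardiology", "year": 2012, "quarter": "Q1", "cost": 120.0},
    {"department": "Cardiology", "year": 2012, "quarter": "Q2", "cost": 80.0},
    {"department": "Cardiology", "year": 2013, "quarter": "Q1", "cost": 100.0},
    {"department": "Oncology", "year": 2012, "quarter": "Q1", "cost": 200.0},
    {"department": "Oncology", "year": 2013, "quarter": "Q1", "cost": 150.0},
]


def aggregate(facts, dimensions, measure="cost"):
    """Sum `measure` over every combination of the given dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Roll up: view cost by department only (coarser view).
by_department = aggregate(FACTS, ["department"])
# Drill down: add the year dimension (finer view of the same data).
by_department_year = aggregate(FACTS, ["department", "year"])
```

Each call regroups the same two-dimensional rows along a different set of dimensions, which is the sense in which OLAP offers a multi-dimensional view of relational data.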
3. Solving Business Issues
The SAS data mining solution has been used in health care to overcome a wide range of business issues and
problems. Some of these include:
- Segmenting customers/patients accurately into groups with similar health patterns.
- Rapidly identifying who the most profitable customers are and the underlying reasons.
- Understanding why customers leave for competitors (attrition, churn analysis).
- Planning for effective information systems management.
- Preparing for demand of resources.
- Anticipating customers'/patients' future actions, given their history and characteristics.
- Predicting medical diagnosis.
- Forecasting treatment costs.
- Predicting length of stay in a hospital.
- Identifying medical procedure expenditures and utilization by analyzing claims and point-of-care data.
- Predicting total cost of patient care.
Once the business problems have been defined and agreed upon, the next logical step is to determine the type and
amount of data that will be necessary for making business decisions. As a precursor to data mining, a data
warehouse strategy and implementation is suggested. Integration with SAS software gives the SAS data mining
solution several distinguishing characteristics which allow faster, easier and more accurate conversion of data into
knowledge useful to decision makers.
Data diversity
The SAS data mining solution is designed to accept a wider range of data formats than any other data mining
product currently on the market. It will accept data from relational and hierarchical databases, flat files, and other
data formats, and it will accept this data from all major hardware platforms.
Distributed client/server
The SAS data mining solution supports both the data server model of client/server computing, in which data
located on a remote machine can be accessed, and the compute server model, which allows data to be processed
on a remote server and then forwarded to a client. This is particularly well suited to analytical tasks involving large
volumes of data that require superior processing capabilities.
Consistent implementation on multiple platforms
The SAS data mining solutions are fully integrated with SAS software and give users the flexibility to use their
platforms of choice, ranging from desktop machines to powerful servers.
Integrated data management
SAS software's data management facilities guarantee data integrity without the need for re-keying or additional
validation of data. No other data mining solution includes seamless integration with such a comprehensive range of
data management functionality. Once the business objectives and data issues have been resolved, the methodology
and approach to data mining can begin.
4. Data Mining Challenges
- Efficiency and scalability of data mining algorithms
- Parallel, distributed, stream, and incremental mining methods
- Handling high-dimensionality
- Handling noise, uncertainty, and incompleteness of data
- Incorporation of constraints, expert knowledge, and background knowledge in data mining
- Pattern evaluation and knowledge integration
- Mining diverse and heterogeneous kinds of data: e.g., bioinformatics, Web, software/system engineering,
  information networks
- Application-oriented and domain-specific data mining
- Invisible data mining (embedded in other functional modules)
- Protection of security, integrity, and privacy in data mining
5. SEMMA Methodology
The methodology and approach that SAS Institute proposes is referred to as SEMMA, for Sample, Explore, Modify,
Model, and Assess. Beginning with a statistically representative sample of data, users can apply exploratory
statistical and visualization techniques, select and transform the most significant predictive variables, model the
variables to predict outcomes, and affirm the model's accuracy.
Sample
The first step is to extract a portion of a large data set big enough to contain the significant information yet small
enough to manipulate quickly.
Explore
This phase involves searching speculatively for unanticipated trends and anomalies so as to gain understanding and
ideas. This can reveal which subset of attributes will be the most productive to work with in the modeling phase. Data
visualization delivers intuitive tools for business professionals, while statistical techniques offer added detail for
specialists.
Modify
The insights that are gained from the exploration phase enable knowledge workers to group the most productive
subsets and clusters of data together for further analysis and exploration.
Model
This process involves searching automatically for a variable combination that reliably predicts a desired outcome.
Data mining techniques such as neural networks, tree-based models, and traditional statistical techniques can help
reveal patterns in the data and provide a best-fitting predictive model.
Assess
During this evaluation process, assessment of the results gained from modeling provides indications as to which
results should be conveyed to senior management, how to model new questions that have been raised by the
previous results and thus proceed back to the exploration phase.
SAS Institute presents SEMMA as a process that distinguishes it from other vendors, claiming to be the only vendor
that can offer all of these components, as well as the ability to seamlessly integrate them with a company's existing
hardware and software strategy.
Fig. 2 SEMMA Methodology
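The five SEMMA steps can be sketched end to end in a few lines of Python. This is only an illustration under stated assumptions, not the SAS implementation: the patient records, the `age` and `high_risk` fields, the rescaling step, and the one-variable threshold rule (a decision stump standing in for a real model) are all hypothetical.

```python
import random
import statistics


def sample(records, n, seed=0):
    """Sample: draw a random subset small enough to manipulate quickly."""
    rng = random.Random(seed)
    return rng.sample(records, n)


def explore(records, field):
    """Explore: summarize one variable to look for trends and anomalies."""
    values = [r[field] for r in records]
    return {"min": min(values), "max": max(values), "mean": statistics.mean(values)}


def modify(records, field, scale):
    """Modify: transform a variable (here, rescale it by a constant)."""
    return [{**r, field: r[field] / scale} for r in records]


def model(records, field, target):
    """Model: fit a one-variable threshold rule (a decision stump)."""
    best_threshold, best_correct = None, -1
    for t in sorted({r[field] for r in records}):
        correct = sum((r[field] >= t) == r[target] for r in records)
        if correct > best_correct:
            best_threshold, best_correct = t, correct
    return best_threshold


def assess(records, field, target, threshold):
    """Assess: accuracy of the fitted rule on the given records."""
    hits = sum((r[field] >= threshold) == r[target] for r in records)
    return hits / len(records)


# Hypothetical data: patients older than 60 are labelled high risk.
patients = [{"age": age, "high_risk": age > 60} for age in range(20, 100)]

subset = sample(patients, 40)
summary = explore(subset, "age")
scaled = modify(subset, "age", scale=100)
threshold = model(scaled, "age", "high_risk")
accuracy = assess(modify(patients, "age", scale=100), "age", "high_risk", threshold)
```

Assessing on the full data set rather than on the sample shows how well a rule learned from the sample generalizes; a poor score would send the analyst back to the Explore step.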
6. Data Mining Techniques
Various data mining techniques are available, and their suitability depends on the domain application.
Statistics provide a strong fundamental background for quantification and evaluation of results. However, algorithms
based on statistics need to be modified and scaled before they are applied to data mining. We now describe a few
classification techniques with illustrations of their applications to healthcare.
Rule set classifiers
Complex decision trees can be difficult to understand, for instance because information about one class is usually
distributed throughout the tree. An alternative formalism consists of a list of rules of the form "if A and B and C
and ... then class X", where rules for each class are grouped together. A case is classified by finding the first rule
whose conditions are satisfied by the case; if no rule is satisfied, the case is assigned to a default class.
IF conditions THEN conclusion
This kind of rule consists of two parts. The rule antecedent (the IF part) contains one or more conditions about the
values of predictor attributes, whereas the rule consequent (the THEN part) contains a prediction about the value of a
goal attribute. An accurate prediction of the value of a goal attribute will improve the decision-making process.
IF-THEN prediction rules are very popular in data mining; they represent discovered knowledge at a high level of
abstraction.
In the health care system it can be applied as follows:
(Symptoms) (Previous history) ---> (Cause of disease)
Example 1: IF-THEN rule induced in the diagnosis of the level of alcohol in blood
IF Sex = MALE AND Unit = 8.9 AND Meal = FULL
THEN
Diagnosis=Blood_alcohol_content_HIGH.
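A rule-set classifier of this kind can be sketched in Python as an ordered list of rules with a default class. The first rule below is Example 1; the second rule, the default class name and the `classify` function are hypothetical additions for illustration only.

```python
# Each rule pairs a dict of required attribute values with a predicted class.
# Rule 1 is Example 1 from the text; rule 2 and the default class are
# hypothetical and exist only to show first-match ordering.
RULES = [
    ({"Sex": "MALE", "Unit": 8.9, "Meal": "FULL"}, "Blood_alcohol_content_HIGH"),
    ({"Unit": 0.0}, "Blood_alcohol_content_LOW"),
]
DEFAULT_CLASS = "UNKNOWN"


def classify(case, rules=RULES, default=DEFAULT_CLASS):
    """Return the class of the first rule whose conditions all hold for the case."""
    for conditions, conclusion in rules:
        if all(case.get(attr) == value for attr, value in conditions.items()):
            return conclusion
    return default  # no rule fired, so fall back to the default class
```

For instance, `classify({"Sex": "MALE", "Unit": 8.9, "Meal": "FULL"})` fires the first rule, while a case that matches no rule is assigned the default class.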
Decision Tree algorithms
Decision tree algorithms include CART (Classification and Regression Trees) and ID3 (Iterative Dichotomiser 3). These
algorithms differ in the selection of splits, when to stop a node from splitting, and the assignment of a class to a
non-split node. CART uses the Gini index to measure the impurity of a partition or set of training tuples. It can
handle high-dimensional categorical data. Decision trees can also handle continuous data (as in regression), although
some algorithms, such as the original ID3, require continuous attributes to be converted to categorical ones first.
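The Gini index of a set of training tuples is 1 minus the sum of the squared class proportions, and a candidate binary split is scored by the size-weighted Gini index of its two partitions. A minimal Python sketch follows; the function names and the example labels are illustrative and not taken from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_i ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def gini_split(left, right):
    """Size-weighted Gini impurity of a binary split into two partitions."""
    total = len(left) + len(right)
    return (len(left) / total) * gini(left) + (len(right) / total) * gini(right)
```

A pure partition has impurity 0, and an evenly mixed two-class partition has impurity 0.5; a tree builder would prefer the split with the lowest weighted value.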
Neural Network Architecture
The architecture of the neural network used in this study is the multilayered feed-forward network architecture with
20 input nodes, 10 hidden nodes, and 10 output nodes. The number of input nodes is determined by the finalized
data; the number of hidden nodes is determined through trial and error; and the number of output nodes is
represented as a range showing the disease classification. The most widely used neural-network learning method is
the backpropagation (BP) algorithm. Learning in a neural network involves modifying the weights and biases of the
network in order to minimize a cost function. The cost function always includes an error term, a measure of how close
the network's predictions are to the class labels for the examples in the training set. Additionally, it may include a
complexity term that reflects a prior distribution over the values that the parameters can take. Neural networks have
been proposed as useful tools in decision making in a variety of medical applications. Neural networks will never
replace human experts, but they can help in screening and can be used by experts to double-check their diagnoses. In
general, the results of a disease classification or prediction task are true only with a certain probability.
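The 20-10-10 feed-forward architecture and the two-part cost function can be sketched in plain Python. This is an illustration under assumptions, not the network used in the study: the sigmoid activation, the squared-error term, the L2 weight-decay form of the complexity term, the `decay` value and the random input record are all hypothetical, and the backpropagation weight updates themselves are omitted.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (n_out x n_in) and zero biases for one layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, layers):
    """Propagate an input vector through every layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


def cost(outputs, targets, layers, decay=0.01):
    """Squared-error term plus an assumed L2 weight-decay complexity term."""
    error = 0.5 * sum((o - t) ** 2 for o, t in zip(outputs, targets))
    complexity = decay * sum(
        w * w for weights, _ in layers for row in weights for w in row
    )
    return error + complexity


rng = random.Random(0)
layers = [init_layer(20, 10, rng), init_layer(10, 10, rng)]  # 20-10-10 as in the text
x = [rng.random() for _ in range(20)]   # hypothetical input record
target = [1.0] + [0.0] * 9              # hypothetical one-hot disease class
outputs = forward(x, layers)
total_cost = cost(outputs, target, layers)
```

Training would repeatedly adjust the weights and biases in the direction that lowers `total_cost`, which is the role backpropagation plays.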
Neuro-Fuzzy
The stochastic backpropagation algorithm is used for the construction of the fuzzy-based neural network. The steps
involved in the algorithm are as follows. First, initialize the weights of the connections with random values. Second,
for each unit, compute the net input value, output value and error rate. Third, to handle uncertainty, calculate a
certainty measure (c) for each node. The decision is made based on this certainty measure. The level of certainty is
determined using the following conditions.
a. If 0.8 < c ≤ 1, then there exists very high certainty
b. If 0.6 < c ≤ 0.8, then there exists high certainty
c. If 0.4 < c ≤ 0.6, then there exists average certainty
d. If 0.1 < c ≤ 0.4, then there exists low certainty
e. If c ≤ 0.1, then there exists very low certainty
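The five conditions can be written as a small lookup function. The sketch below assumes that each interval is open at its lower bound and closed at its upper bound (the comparison symbols are partly missing in the source), that c lies in [0, 1], and it uses "low" and "very low" as labels for the two lowest levels; the function name is hypothetical.

```python
def certainty_level(c):
    """Map a certainty measure c in [0, 1] to one of five certainty levels."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("certainty measure must lie in [0, 1]")
    if c > 0.8:
        return "very high"
    if c > 0.6:
        return "high"
    if c > 0.4:
        return "average"
    if c > 0.1:
        return "low"
    return "very low"
```

Checking the thresholds from highest to lowest means that a boundary value such as 0.8 falls into the lower of its two neighbouring levels, consistent with the assumed closed upper bounds.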
The network constructed consists of 3 layers, namely an input layer, a hidden layer and an output layer. A sample
trained neural network consisting of 9 input nodes, 3 hidden nodes and 1 output node is shown in Figure 2. When a
thrombus or blood clot occupies more than 75% of the surface area of the lumen of an artery, the expected result
may be a prediction of cell death or heart disease according to medical guidelines; that is, the result R is generated
with reference to the given set of input data.
7. Summary
The effective use of information and technology is crucial for health care organizations to stay competitive in
today's complex, evolving environment. The challenges faced when trying to make sense of large, diverse, and often
complex data sources are considerable. In an effort to turn information into knowledge, health care organizations are
implementing data mining technologies to help control costs and improve the efficacy of patient care. Data mining
can be used to help predict future patient behavior and to improve treatment programs. By identifying high-risk
patients, clinicians can better manage the care of patients today so they do not become the problems of tomorrow.
We studied the problem of constraining and summarizing different data mining algorithms. We focused on using
different algorithms for predicting combinations of several target attributes. Finally, we conclude that using proper
data mining algorithms in the health care industry produces better results and can help to prevent several diseases.