An Implementation of Data Mining Techniques in Health Care Industries

    International Journal of Computational Intelligence and Information Security, May 2013, Vol. 4 No. 5,

    ISSN: 1837-7823

    R. Anand¹, Dr. S.K. Srivatsa²

    ¹ Research Scholar, Sri Chandra Sekarendra Viswa Maha Vidyalaya, Enathur, Kanchipuram-531 602.

    ² Sr. Professor, St. Joseph College of Engineering, Chennai-600 119.

    Abstract

    Data mining is an emerging field. Every day the health care industry produces huge volumes of complex data about
    patients, hospital resources, disease diagnosis, electronic patient records, medical devices and so on, yet finding
    the right data for the right purpose is tedious: the data are rich but poorly utilized. This large body of data is
    a key resource to be processed and analyzed for knowledge extraction that supports cost savings and decision
    making. Data mining brings a set of tools and techniques that can be applied to the processed data to discover
    hidden patterns, giving healthcare professionals an additional source of knowledge for making decisions. With
    continuous advances in technology, an increasing number of clinicians use electronic medical records to accumulate
    substantial amounts of data about their patients, together with the associated clinical conditions and treatment
    details. The hidden relationships and patterns within this information would further our medical knowledge,
    including its efficiencies and deficiencies. Methodologies that are being used with increasing effectiveness in
    parallel industries need to be modified and applied to discover this knowledge. In this article we discuss how data
    mining can be used in the medical field, how it solves business issues, the challenges in data mining, data mining
    techniques and the SEMMA methodology.

    Keywords: Data Mining, Health Care, Neural Networks, Neuro Fuzzy, Decision Tree algorithm

    1. Introduction

    Data mining can be defined as the process of finding previously unknown patterns and trends in databases and using

    that information to build predictive models. Alternatively, it can be defined as the process of data selection and

    exploration and building models using vast data stores to uncover previously unknown patterns. Data mining is not

    new: it has been used intensively and extensively by financial institutions, for credit scoring and fraud detection;

    marketers, for direct marketing and cross-selling or up-selling; retailers, for market segmentation and store layout;

    and manufacturers, for quality control and maintenance scheduling.

    In healthcare, data mining is becoming increasingly popular, if not increasingly essential. Several factors have

    motivated the use of data mining applications in healthcare. The existence of medical insurance fraud and abuse, for

    example, has led many healthcare insurers to attempt to reduce their losses by using data mining tools to help them

    find and track offenders. Fraud detection using data mining applications is prevalent in the commercial world, for

    example, in the detection of fraudulent credit card transactions. Recently, there have been reports of successful data

    mining applications in healthcare fraud and abuse detection. Another factor is that the huge amounts of data

    generated by healthcare transactions are too complex and voluminous to be processed and analyzed by traditional

    methods. Data mining can improve decision-making by discovering patterns and trends in large amounts of complex
    data. Such analysis has become increasingly essential as financial pressures have heightened the need for healthcare

    organizations to make decisions based on the analysis of clinical and financial data. Insights gained from data

    mining can influence cost, revenue, and operating efficiency while maintaining a high level of care.

    2. Data Mining in Medical Databases

    Data mining is an essential step of knowledge discovery, and in recent years it has attracted a great deal of
    interest in the information industry. The knowledge discovery process consists of an iterative sequence of data
    cleaning, data integration, data selection, data mining, pattern recognition and knowledge presentation. In
    particular, data mining may accomplish class description, association, classification, clustering, prediction and
    time series analysis. In contrast to traditional data analysis, data mining is discovery driven. It is a young
    interdisciplinary field closely connected to data warehousing, statistics, machine learning, neural networks and
    inductive logic programming.

    Fig. 1 Database Architecture

    Data mining provides automatic pattern recognition and attempts to uncover patterns in data that are difficult to
    detect with traditional statistical methods. Without data mining it is difficult to realize the full potential of
    the data collected within a healthcare organization, because the data under analysis are massive, highly
    dimensional, distributed and uncertain. Massive healthcare data need to be converted into information and
    knowledge, which can help control cost and maintain a high quality of patient care. Healthcare data include
    patient-centric data and aggregate data. For a health care organization to succeed, it must have the ability to
    capture, store and analyze data (Fig. 1). Online analytical processing (OLAP) provides one way for data to be
    analyzed in a multi-dimensional capacity. With the adoption of data warehousing and data analysis/OLAP tools, an
    organization can make strides in leveraging data for better decision making. Many healthcare organizations struggle
    to use the data collected through an organization's online transaction processing (OLTP) system, which is not
    integrated for decision making and pattern analysis. For a healthcare organization to be successful, it is
    important to empower the management and staff with data warehousing based on critical thinking and knowledge
    management tools for strategic decision making. Data

    warehousing can be supported by decision support tools such as data marts, OLAP and data mining tools. A data mart
    is a subset of a data warehouse that focuses on selected subjects. An online analytical processing (OLAP) solution
    provides a multi-dimensional view of the data found in relational databases. Although the data are stored in a
    two-dimensional format, OLAP makes it possible to analyze potentially large amounts of data with very fast response
    times and lets users move through the data, drilling down or rolling up through the various dimensions defined by
    the data structure.
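
    The roll-up and drill-down operations mentioned here can be sketched in a few lines of Python. This is only a
    minimal illustration, not an OLAP engine: the records, the dimension names (department, year, quarter) and the
    cost measure are hypothetical values invented for the example.

```python
from collections import defaultdict

# Hypothetical row-based records, as a transactional system might store them.
records = [
    {"department": "Cardiology", "year": 2012, "quarter": "Q1", "cost": 1200.0},
    {"department": "Cardiology", "year": 2012, "quarter": "Q2", "cost": 800.0},
    {"department": "Oncology", "year": 2012, "quarter": "Q1", "cost": 1500.0},
    {"department": "Oncology", "year": 2013, "quarter": "Q1", "cost": 700.0},
]


def aggregate(rows, dimensions, measure="cost"):
    """Sum `measure` over every combination of the chosen dimensions."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Drill down: cost by department, year and quarter (the finest level).
by_quarter = aggregate(records, ["department", "year", "quarter"])
# Roll up: drop the quarter dimension to get cost by department and year.
by_year = aggregate(records, ["department", "year"])
# Roll up further: cost by department only.
by_department = aggregate(records, ["department"])
```

    Each roll-up simply drops a dimension from the grouping key, so the same flat records answer questions at several
    levels of detail.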

    3. Solving Business Issues

    The SAS data mining solution has been used in health care to overcome a wide range of business issues and

    problems. Some of these include:

    - Segmenting customers/patients accurately into groups with similar health patterns.
    - Rapidly identifying who are the most profitable customers and the underlying reasons.
    - Understanding why customers leave for competitors (attrition, churn analysis).
    - Planning for effective information systems management.
    - Preparing for demand of resources.
    - Anticipating customers'/patients' future actions, given their history and characteristics.
    - Predicting medical diagnosis.
    - Forecasting treatment costs.
    - Predicting length of stay in a hospital.
    - Identifying medical procedure expenditures and utilization by analyzing claims and point-of-care data.
    - Predicting total cost of patient care.

    Once the business problems have been defined and agreed upon, the next logical step is to determine the type and

    amount of data that will be necessary for making business decisions. As a precursor to data mining, a data

    warehouse strategy and implementation is suggested. Integration with SAS software gives the SAS data mining

    solution several distinguishing characteristics which allow faster, easier and more accurate conversion of data into

    knowledge useful to decision makers.

    Data diversity

    The SAS data mining solution is designed to accept a wider range of data formats than any other data mining

    product currently on the market. It will accept data from relational and hierarchical databases, flat files, and other

    data formats, and it will accept this data from all major hardware platforms.

    Distributed client/server

    The SAS data mining solution supports both the data server model of client/server computing, in which data

    located on a remote machine can be accessed, and the compute server model, which allows data to be processed

    on a remote server and then forwarded to a client. This is particularly well suited to analytical tasks involving large

    volumes of data that require superior processing capabilities.

    Consistent implementation on multiple platforms

    The SAS data mining solutions are fully integrated with SAS software and give users the flexibility to use their

    platforms of choice, ranging from desktop machines to powerful servers.

    Integrated data management

    SAS software's data management facilities guarantee data integrity without the need for re-keying or additional

    validation of data. No other data mining solution includes seamless integration with such a comprehensive range of

    data management functionality. Once the business objectives and data issues have been resolved, the methodology

    and approach to data mining can begin.

    4. Data Mining Challenges

    - Efficiency and scalability of data mining algorithms
    - Parallel, distributed, stream, and incremental mining methods
    - Handling high-dimensionality
    - Handling noise, uncertainty, and incompleteness of data
    - Incorporation of constraints, expert knowledge, and background knowledge in data mining
    - Pattern evaluation and knowledge integration
    - Mining diverse and heterogeneous kinds of data: e.g., bioinformatics, Web, software/system engineering,
      information networks
    - Application-oriented and domain-specific data mining
    - Invisible data mining (embedded in other functional modules)
    - Protection of security, integrity, and privacy in data mining

    5. SEMMA Methodology

    The methodology and approach that SAS Institute proposes is referred to as SEMMA, for Sample, Explore, Modify,
    Model, and Assess. Beginning with a statistically representative sample of data, users can apply exploratory
    statistical and visualization techniques, select and transform the most significant predictive variables, model the
    variables to predict outcomes, and affirm the model's accuracy.

    Sample

    The first step is to extract a portion of a large data set big enough to contain the significant information yet small

    enough to manipulate quickly.
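
    One simple way to draw such a sample is sketched below, assuming the records fit in a Python list and that a
    seeded simple random sample is acceptable. The function name draw_sample and its fraction parameter are
    hypothetical; the sketch does not reproduce the sampling facilities of SAS software.

```python
import random


def draw_sample(rows, fraction, seed=0):
    """Return a simple random sample holding about `fraction` of `rows`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    size = max(1, round(len(rows) * fraction))
    rng = random.Random(seed)  # seeded so that the sample is reproducible
    return rng.sample(rows, size)


patients = list(range(10_000))        # stand-in for 10,000 patient records
sample = draw_sample(patients, 0.05)  # keep roughly 5 percent of them
```

    Fixing the seed keeps the later Explore, Modify and Model steps repeatable on the same subset.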

    Explore

    This phase involves searching speculatively for unanticipated trends and anomalies so as to gain understanding and
    ideas. This can reveal which subset of attributes will be the most productive to work with in the modeling phase.
    Data visualization delivers intuitive tools for business professionals, while statistical techniques offer added
    detail for specialists.

    Modify

    The insights that are gained from the exploration phase enable knowledge workers to group the most productive

    subsets and clusters of data together for further analysis and exploration.

    Model

    This process involves searching automatically for a variable combination that reliably predicts a desired outcome.

    Data mining techniques such as neural networks, tree-based models, and traditional statistical techniques can help

    reveal patterns in the data and provide a best-fitting predictive model.

    Assess

    During this evaluation process, assessment of the results gained from modeling indicates which results should be
    conveyed to senior management and how to model the new questions raised by the previous results, which leads back
    to the exploration phase.

    SAS Institute presents SEMMA as a distinguishing feature, claiming to be the only vendor that offers all of these
    components together with the ability to integrate them seamlessly with a company's existing hardware and software
    strategy.

    Fig. 2 SEMMA Methodology

    6. Data Mining Techniques

    There are various data mining techniques available with their suitability dependent on the domain application.

    Statistics provide a strong fundamental background for quantification and evaluation of results. However, algorithms

    based on statistics need to be modified and scaled before they are applied to data mining. We now describe a few

    classification techniques from data mining, with illustrations of their applications to healthcare.

    Rule set classifiers

    Complex decision trees can be difficult to understand, for instance because information about one class is usually
    distributed throughout the tree. An alternative formalism consists of a list of rules of the form "if A and B and C
    and ... then class X", where the rules for each class are grouped together. A case is classified by finding the
    first rule whose conditions are satisfied by the case; if no rule is satisfied, the case is assigned to a default
    class. Each rule has the general form:

    IF conditions THEN conclusion

    This kind of rule consists of two parts. The rule antecedent (the IF part) contains one or more conditions about
    the values of predictor attributes, whereas the rule consequent (the THEN part) contains a prediction about the
    value of a goal attribute. An accurate prediction of the value of a goal attribute will improve the decision-making
    process. IF-THEN prediction rules are very popular in data mining; they represent discovered knowledge at a high
    level of abstraction.

    In the health care system it can be applied as follows:

    (Symptoms) (Previous history) ----> (Cause of disease)

    Example 1: IF-THEN rule induced in the diagnosis of the level of alcohol in blood

    IF Sex = MALE AND Unit = 8.9 AND Meal = FULL

    THEN

    Diagnosis=Blood_alcohol_content_HIGH.
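
    The first-match rule list with a default class can be sketched as follows. The rules are hypothetical and only
    loosely modelled on Example 1: the condition on Unit is read here as a threshold (Unit >= 8.9), which is an
    assumption, and the second rule and the default class are invented so that the fall-through behaviour is visible.

```python
# Each rule pairs a condition with the class it predicts. All rules here are
# hypothetical; "Unit = 8.9" from Example 1 is assumed to mean a threshold.
RULES = [
    (lambda c: c["sex"] == "MALE" and c["units"] >= 8.9 and c["meal"] == "FULL",
     "Blood_alcohol_content_HIGH"),
    (lambda c: c["units"] >= 4.0, "Blood_alcohol_content_MEDIUM"),
]
DEFAULT_CLASS = "Blood_alcohol_content_LOW"


def classify(case, rules=RULES, default=DEFAULT_CLASS):
    """Return the class of the first rule whose conditions the case satisfies."""
    for condition, label in rules:
        if condition(case):
            return label
    return default  # no rule fired, so fall back to the default class


high = classify({"sex": "MALE", "units": 9.5, "meal": "FULL"})
medium = classify({"sex": "FEMALE", "units": 5.0, "meal": "NONE"})
low = classify({"sex": "MALE", "units": 1.0, "meal": "FULL"})
```

    Because the first satisfied rule wins, the order of the list matters: more specific rules are placed before more
    general ones.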

    Decision Tree algorithms

    Decision tree algorithms include CART (Classification and Regression Trees) and ID3 (Iterative Dichotomiser 3).
    These algorithms differ in how they select splits, when they stop splitting a node, and how they assign a class to
    a non-split node. CART uses the Gini index to measure the impurity of a partition or set of training tuples. It can
    handle high-dimensional categorical data. Decision trees can also handle continuous data (as in regression),
    although some algorithms, such as ID3, require continuous attributes to be converted to categorical data first.
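
    As an illustration of the impurity measure used by CART, the sketch below computes the Gini index of a set of
    class labels, 1 minus the sum of the squared class proportions, and the weighted Gini index of a binary split.
    The function names and the "sick"/"healthy" labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(left, right):
    """Size-weighted Gini impurity of a binary split; lower means a purer split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


parent = ["sick"] * 4 + ["healthy"] * 4
# A split that separates the classes perfectly.
pure_split = gini_split(["sick"] * 4, ["healthy"] * 4)
# A split that leaves both halves as mixed as the parent.
mixed_split = gini_split(["sick", "sick", "healthy", "healthy"],
                         ["sick", "sick", "healthy", "healthy"])
```

    Among candidate splits, a CART-style learner would prefer the one with the lowest weighted impurity, here the
    pure split.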

    Neural Network Architecture

    The architecture of the neural network used in this study is the multilayered feed-forward network architecture with

    20 input nodes, 10 hidden nodes, and 10 output nodes. The number of input nodes is determined by the finalized

    data; the number of hidden nodes is determined through trial and error and the number of output nodes is

    represented as a range showing the disease classification. The most widely used neural-network learning method is
    the backpropagation (BP) algorithm. Learning in a neural network involves modifying the weights and biases of the
    network in order to minimize a cost function. The cost function always includes an error term, a measure of how
    close the network's predictions are to the class labels for the examples in the training set. Additionally, it may
    include a complexity term that reflects a prior distribution over the values that the parameters can take. Neural
    networks have been proposed as useful tools in decision making in a variety of medical applications. Neural
    networks will never replace human

    experts but they can help in screening and can be used by experts to double-check their diagnosis. In general, results

    of disease classification or prediction task are true only with a certain probability.
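
    A minimal sketch of such a network and one backpropagation update is given below. Only the layer sizes (20, 10
    and 10) come from the text; the sigmoid activations, the squared-error cost without a complexity term, the
    learning rate, the weight initialization and the training example are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class FeedForwardNet:
    """One-hidden-layer feed-forward network trained by stochastic backpropagation."""

    def __init__(self, n_in=20, n_hidden=10, n_out=10, seed=0):
        rng = random.Random(seed)
        # Small random weights; the last entry of each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w1]
        output = [sigmoid(sum(w * v for w, v in zip(row, hidden + [1.0])))
                  for row in self.w2]
        return hidden, output

    def train_step(self, x, target, lr=0.5):
        """Apply one gradient step; return the squared error before the update."""
        hidden, output = self.forward(x)
        # Output deltas for E = 0.5 * sum((o - t)^2) with sigmoid units.
        d_out = [(o - t) * o * (1 - o) for o, t in zip(output, target)]
        # Hidden deltas: propagate the output deltas back through w2.
        d_hid = [h * (1 - h) * sum(d_out[k] * self.w2[k][j]
                                   for k in range(len(d_out)))
                 for j, h in enumerate(hidden)]
        for k, row in enumerate(self.w2):
            for j, v in enumerate(hidden + [1.0]):
                row[j] -= lr * d_out[k] * v
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * d_hid[j] * v
        return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))


net = FeedForwardNet()
rng = random.Random(1)
x = [rng.random() for _ in range(20)]  # hypothetical 20-feature patient vector
target = [0.0] * 10
target[3] = 1.0                        # hypothetical disease class 3
errors = [net.train_step(x, target) for _ in range(200)]
```

    Repeating the update on a single example only shows that the error term shrinks; a real study would cycle through
    a full training set and evaluate on held-out cases.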

    Neuro-Fuzzy

    The stochastic backpropagation algorithm is used to construct the fuzzy-based neural network. The steps involved in
    the algorithm are as follows. First, initialize the weights of the connections with random values. Second, for each
    unit, compute the net input value, the output value and the error rate. Third, to handle uncertainty, calculate a
    certainty measure (c) for each node. The decision is made on the basis of this certainty measure, whose level is
    determined by the following conditions.

    a. If 0.8 < c ≤ 1, then there exists very high certainty
    b. If 0.6 < c ≤ 0.8, then there exists high certainty
    c. If 0.4 < c ≤ 0.6, then there exists average certainty
    d. If 0.1 < c ≤ 0.4, then there exists less certainty
    e. If c ≤ 0.1, then there exists very less certainty
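
    The mapping from the certainty measure to a qualitative level can be sketched as a small function, assuming that
    c lies between 0 and 1 and that each interval is open on the left and closed on the right (for example
    0.6 < c ≤ 0.8). The function name certainty_level is hypothetical.

```python
def certainty_level(c):
    """Map a certainty measure c in [0, 1] to its qualitative level."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    if c > 0.8:
        return "very high"
    if c > 0.6:
        return "high"
    if c > 0.4:
        return "average"
    if c > 0.1:
        return "less"
    return "very less"


levels = [certainty_level(c) for c in (0.9, 0.8, 0.5, 0.25, 0.1)]
```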

    The network constructed consists of three layers, namely an input layer, a hidden layer and an output layer. A
    sample trained neural network consisting of 9 input nodes, 3 hidden nodes and 1 output node is shown in Figure 2.
    When a thrombus or blood clot occupies more than 75% of the surface area of the lumen of an artery, the expected
    result may be a prediction of cell death or heart disease according to medical guidelines, i.e. R is generated with
    reference to the given set of input data.

    7. Summary

    The effective use of information and technology is crucial for health care organizations to stay competitive in

    today's complex, evolving environment. The challenges faced when trying to make sense of large, diverse, and often

    complex data sources are considerable. In an effort to turn information into knowledge, health care organizations are

    implementing data mining technologies to help control costs and improve the efficacy of patient care. Data mining

    can be used to help predict future patient behavior and to improve treatment programs. By identifying high-risk

    patients, clinicians can better manage the care of patients today so they do not become the problems of tomorrow.

    We studied the problem of constraining and summarizing different data mining algorithms, focusing on the use of
    different algorithms for predicting combinations of several target attributes. We conclude that the use of
    appropriate data mining algorithms in the health care industry can produce better results and can help prevent
    several diseases.
