CONSUMER RELATION MANAGEMENT OF .NET AND J2EE
USING BUSINESS INTELLIGENCE
Thesis submitted in partial fulfillment for the award of
Degree of Doctor of Philosophy in INFORMATION TECHNOLOGY
By
VANGALA V NARENDRA KUMAR
Guide
Dr. R.S.D.WAHIDA BANU, Ph.D.
VINAYAKA MISSIONS UNIVERSITY
SALEM, TAMILNADU, INDIA
February 2014
VINAYAKA MISSIONS UNIVERSITY
DECLARATION
I, VANGALA V NARENDRA KUMAR, declare that the thesis entitled
CONSUMER RELATION MANAGEMENT OF .NET AND J2EE USING
BUSINESS INTELLIGENCE submitted by me for the Degree of Doctor of
Philosophy is the record of work carried out by me during the period from
JULY 2008 to FEBRUARY 2014 under the guidance of Dr.R.S.D.WAHIDA
BANU, PRINCIPAL GOVT. COLLEGE OF ENGINEERING, SALEM and that
not formed the basis for the award of any degree, diploma, associateship,
fellowship, titles in this or any other University or other similar institutions of
higher learning.
Place: Signature of the Candidate
Date:
VINAYAKA MISSIONS UNIVERSITY
CERTIFICATE BY THE GUIDE
I, Dr. R.S.D.WAHIDA BANU, PRINCIPAL, GOVT. COLLEGE OF
ENGINEERING, SALEM, TAMILNADU certify that the thesis entitled
CONSUMER RELATION MANAGEMENT OF .NET AND J2EE USING
BUSINESS INTELLIGENCE submitted for the Degree of Doctor of Philosophy
by Mr. VANGALA V NARENDRA KUMAR is the record of research work
carried out by him/her during the period from JULY 2008 to FEBRUARY
2013 under my guidance and supervision and that this work has not formed
the basis for the award of any degree, diploma, associateship, fellowship or
other titles in this University or any other University or Institution of higher
learning.
Place: Signature of the Supervisor with designation
Date:
ACKNOWLEDGEMENT
This thesis is a milestone in my career. I express my sincere gratitude to my
Supervisor Dr.R.S.D. Wahidabanu madam, Principal Govt. College of
Engineering, Salem, Tamilnadu, who guided me throughout this research.
This research would not have been possible without her valuable help and
support. I would also thank Prof.Dr.K.Rajendran, Dean, Research, and the
committee members of Vinayaka Missions University, Salem, Tamilnadu for
providing the opportunity to do this research.I express my profound
gratefulness to Dr.Banda Prakash, Secretary and Correspondent, Alluri
Institute of Management Sciences, Warangal, Andhra Pradesh, for
encouraging me to do this work and providing all the necessary facilities in the
institute. I am eternally grateful to the remarkable contributions of our fellow
faculty Dr.G.Ravi, Mr.Md.Nayeemuddin, Mr.K.Ravi, Mr.K.Anil Kumar and also
to all the software executives, students, faculty, reliance fresh consumers and
staff for extending their support in conducting various research studies. I am
grateful to each and everyone who supported me throughout this research.
Finally I am thankful to my entire family for their encouragement and
emotional support to complete my thesis. I bestow this entire effort to Lord Sri
Venkateswara of Seven Hills.
Vangala V Narendra Kumar
TABLE OF CONTENTS
TITLE PAGE NO.
LIST OF FIGURES viii
LIST OF TABLES x
LIST OF ABBREVIATIONS xii
LIST OF SYMBOLS USED xv
ABSTRACT 1
CHAPTER
1. INTRODUCTION 2-13
1.1. Overview 2
1.2. Consumer Relation Management 4
1.3. .NET Platform 5
1.4. J2EE Platform 6
1.5. Business Intelligence 8
1.6. Need for the Study 9
1.7. Objective of Research 10
1.8. Methodology 10
1.9. Organization of thesis 11
2. REVIEW OF LITERATURE 14-23
2.1. Related Research in CRM and Data Mining 14
2.2. CRM software in IT sector 19
2.3. Statistics supporting the use of .NET and J2EE in software industry 21
2.4. Business Intelligence and CRM 22
2.5. Summary 23
3. CONSUMER RELATION MANAGEMENT OF .NET AND J2EE USING BUSINESS INTELLIGENCE 24-81
3.1 Overview 24
3.2. Consumer Relation Management 24
3.2.1. Consumer Satisfaction 24
3.2.2. Consumer Loyalty 25
3.2.3. Mobile CRM 27
3.2.4. CRM in Cloud Computing 27
3.2.5. Big data and CRM – future CRM 29
3.3. The .NET Framework 30
3.3.1. The Common Language Runtime (CLR) 33
3.3.2. Microsoft Intermediate Language (MSIL) 34
3.3.3. Common Type System (CTS) 34
3.3.4. .NET Framework Class Library 35
3.3.5. ASP.NET Web Services 36
3.4. The J2EE Framework 37
3.4.1. The J2EE Platform 37
3.4.2. The J2EE Runtime 38
3.4.3. The J2EE APIs 38
3.4.4. J2EE Technologies 40
3.4.5. Java Server Pages 41
3.4.6. J2EE Service Technologies 42
3.4.7. Some popular Java editors and IDEs 44
3.5. A Comparative Study of .NET and J2EE 45
3.6. Business Intelligence and Data Mining 47
3.6.1. Association Rules 50
3.6.2. Association Rule Mining 51
3.6.3. Apriori Algorithm 51
3.6.4. Implementation of Apriori Algorithm 53
3.6.5. Generating Association Rules from Frequent Item sets 55
3.6.6. Correlation Analysis 57
3.6.7. k-means Clustering 57
3.6.8. Fuzzy Set Approach 60
3.7. Rough Set Theory 61
3.7.1. Approximations 62
3.7.2. Reduction and Significance of Attributes and Approximation Reducts 64
3.8. Scapegoat Trees and Max Heaps 66
3.8.1. Scapegoat trees 66
3.8.2. Max-Heaps 72
3.8.2.1. Building a Heap 73
3.8.2.2. Cost of Building a Heap 74
3.8.2.3. Heap Sort 75
3.8.2.4. Cost of Heap Sort 75
3.9. MUlticriteria Satisfaction Analysis (MUSA) 76
3.9.1. Satisfaction Indices 78
3.9.3. Demanding Indices 79
3.10. Summary 81
4. METHODOLOGY 82-106
4.1. Overview 82
4.2. Sampling procedures 82
4.3. Data Collection Techniques 83
4.4. Research Methodology 87
4.5. Tools Used 105
5. RESULTS AND DISCUSSION 107-159
6. CONCLUSIONS AND FUTURE WORK 160-165
6.1. Conclusions 160
6.2. Limitations of the Study 163
6.3. Future Work and Suggestions 164
7. BIBLIOGRAPHY / REFERENCES 166-179
List of Publications 177
LIST OF FIGURES
FIGURE TITLE PAGE NO
1.1 Java Language Environment 7
3.1 .NET framework architecture 31
3.2 Web services protocol stack 36
3.3 The J2EE framework 37
3.4 Architecture of a JSP page 42
3.5 Generating frequent item sets with min_sup 2 using Apriori Algorithm 54
3.6 A scapegoat tree with 10 nodes and height 5 69
3.7 Finding a scapegoat and inserting 7 at node 5 69
3.8 A complete binary tree depicting max-heap 72
3.9 Building a heap 73
3.10 Cost of building a heap 74
4.1 Consumer Profile form 83
4.2 Survey questionnaire for main satisfaction criteria 84
4.3 Survey Questionnaire form for sub criteria satisfaction 85
4.4 Consumer service survey Form 86
4.5 k-means clustering algorithm 90
4.6 KCUSTMH (k-means clustering using scapegoat trees and max heaps) 94
4.7 Apriori algorithm for finding frequent item sets 95
4.8 Pseudo code to implement Apriori algorithm 96
4.9 Flowchart of MUSA method integrating with rough set theory 98
4.10 Consumer satisfaction & consumer loyalty obtained from a scale 99
5.1 Preferred communication channels, “Media mix”, by consumers 107
5.2 Consumer transaction data in .csv file format 117
5.3 Run information of k-means clustering performed on gender attribute using Weka 119
5.4 Graph showing efficiency of KCUSTMH vs. traditional k-means algorithm 122
5.5 Clustering association rules using 2-D grid 124
5.6 Run information of Weka Apriori algorithm implementation 126
5.7 Dashboard projecting the behavior of the sales of different items 128
LIST OF TABLES
TABLE TITLE PAGE NO
2.1 List of top 10 open-source CRM software 20
3.1 Overview of .NET Framework release history 32
3.2 Comparative study on the various features of .NET and J2EE platforms 46
3.3 Limitations and missing capabilities of .NET vs. J2EE 47
3.4 Sample consumer Transactions 54
3.5 Consumer transactions in vertical data format 55
5.1 Responses summary on J2EE and .NET platforms 109
5.1.A Overall satisfaction analysis 109
5.1.B Group-wise % satisfaction 109
5.2 User satisfaction opinion of J2EE and .NET 110
5.3 Memory utilization and response time of ASP.NET and J2EE 116
5.4 Consumer data segmented into 6 clusters 121
5.5 Survey data of sample 20 consumers on global criteria 131
5.6 Overall satisfaction results for global criteria 132
5.7 Criteria satisfaction results 135
5.8 Sample consumer data with condition and decision attributes 137
5.9 Nominal values of the sample consumer data 138
5.10 Sample consumer data organized w.r.t. decision attribute 140
5.11 Reduct of Information of sample consumer data 145
5.12 Analysis of condition attributes with Personnel criteria 146
5.13 Analysis of condition attributes with Product criteria 147
5.14 Analysis of Physical Appearance attributes 148
5.15 Analysis of attributes Personnel and Product 149
5.16 Analysis of Personnel and Physical Appearance 150
5.17 Analysis of Attributes Product and Physical Appearance 151
5.18 Reduct of Personnel and Product 152
5.19 Reduct of Personnel and Physical Appearance 152
5.20 Reduct of Product and Physical Appearance 153
5.21 Final reduct information of the sample consumer data 153
5.22 Sample data set transaction format 155
5.23 Opinion of consumer service survey 158
LIST OF ABBREVIATIONS
.NET Network Enabled Technology
4P’S Personnel, Product, Place, Physical Appearance
ADI Average Demanding Index
ADO ActiveX Data Objects
AJAX Asynchronous JavaScript and XML
API Application Programming Interface
APT Automatically Programmed Tool
ARFF Attribute-Relation File Format
ASI Average Satisfaction Index
ASP Active Server Pages
AVL Adelson-Velskii and Landis
AWT Abstract Window Toolkit
BCL Base Class Library
BI Business Intelligence
BIRCH Balanced Iterative Reducing and Clustering using Hierarchies
BPM Business Process Management
BST Binary Search Tree
CAAS/CRAAS Customer Relationship As Service Software
CBT Complete Binary Tree
CLR Common Language Runtime
CLV Consumer Life Time Value
COBOL Common Business Oriented Language
CORBA Common Object Request Broker Architecture
CPU Central Processing Unit
CRM Consumer Relation Management
CSS Cascading Style Sheets
CSV Comma Separated Values
CTS Common Type System
CURE Clustering Using REpresentatives
DBMS Database Management Systems
DLL Dynamic Link Libraries
EJB Enterprise Java Beans
ELKI Environment for DeveLoping KDD-Applications Supported by Index-Structures
ERP Enterprise Resource Planning
ETL Extract Transform Load
FOSS Free and Open Source Software
GC Gross Contribution
GUI Graphical User Interface
HTML Hyper Text Markup Language
IDC International Data Corporation
IDE Integrated Development Environment
IDL Interface Definition Language
IL Intermediate Language
IMAP Internet Message Access Protocol
ISS Intelligent Software Solutions
IT Information Technology
J2EE Java 2 Platform, Enterprise Edition
J2ME Java 2 Platform, Micro Edition
J2SE Java 2 Platform, Standard Edition
JAAS Java Authentication and Authorization Service
JAF JavaBeans Activation Framework
JAXP Java API for XML Processing
JCA Java Connector Architecture
JDBC Java Database Connectivity
JDK Java Development Kit
JIT Just In Time
JMS Java Message Service
JMX Java Management Extensions
JNDI Java Naming and Directory Interface
JRE Java Runtime Environment
JSP Java Server Pages
JTA Java Transaction API
JVM Java Virtual Machine
KDD Knowledge Discovery from Data
KNIME Konstanz Information Miner
LDAP Lightweight Directory Access Protocol
MB Megabytes
MOM Message-Oriented Middleware
MS Microsoft
MSEC Milliseconds
MSIL Microsoft Intermediate Language
MSMQ Microsoft Message Queue
MUSA Multicriteria Satisfaction Analysis
MVC Model View Controller
MYSQL My Structured Query Language
NT New Technology
OLAP Online Analytical Processing
OSS Open Source Software
PC Personal Computer
PERL Practical Extraction and Report Language
PHP Hypertext Preprocessor
PL/SQL Procedural Language/Structured Query Language
POP Post Office Protocol
R2 Release 2
RFID Radio Frequency Identification
RMI Remote Method Invocation
SAAS Software as a Service
SCAVIS Scientific Computation and Visualization Environment
SCM Supply Chain Management
SGT Scapegoat Tree
SMS Short Message Service
SMTP Simple Mail Transfer Protocol
SOAP Simple Object Access Protocol
SPSS Statistical Package for Social Sciences
UDDI Universal Description, Discovery, and Integration
VB Visual Basic
VJ# Visual J#
W3 World Wide Web
WCF Windows Communication Foundation
WEKA Waikato Environment for Knowledge Analysis
WORA Write Once Run Anywhere
WPF Windows Presentation Foundation
WSDL Web Services Description Language
XHTML eXtensible Hyper Text Markup Language
XML eXtensible Markup Language
XSLT eXtensible Stylesheet Language Transformations
LIST OF SYMBOLS USED
r Correlation coefficient
χ² Chi-square
∈ Set membership
∧ Intersection or wedge product
ρ Spearman correlation coefficient
τ Kendall coefficient
⊆ Subset
⋂ Set-theoretic intersection
⇒ Implies
⊂ Proper subset
P Probability
Φ or Ø Null (empty) set
∑ Summation
∪ Set-theoretic union
γ Degree of dependency
σ Selection
ε Approximate reduct
≤ Less than or equal to
ABSTRACT
Consumer buying behaviours often change for various reasons. Advancements in technology, changing lifestyles and needs are a few factors that alter consumer buying and selling behaviours. Changing behaviours have a substantial impact on the profitability and survival of a business. Hence companies need to adopt appropriate decision-making techniques to withstand these changing situations. In this research, an attempt was made to expound the effectiveness of business intelligence techniques on consumer relation management. Consumer satisfaction and loyalty are two essential factors in CRM for improving consumer relations and the profitability of any company. To analyse these two factors, consumer data was segmented using the k-means algorithm. The efficiency of the k-means algorithm was improved with scapegoat trees and max heaps, which resulted in quality clusters. New opportunities in buying and selling were identified by analysing consumer buying behaviours with the Apriori algorithm. Consumer satisfaction was explored on the product, personnel, place and physical appearance (4Ps) of the enterprise by integrating MUlticriteria Satisfaction Analysis (MUSA) and rough set theory. Fuzzy set theory was deployed to analyse consumer loyalty. This exploratory study of consumer satisfaction and loyalty resulted in “if…then…” rules. A comparative study of .NET and J2EE identified J2EE as the preferred platform to build customised consumer relation management software. This research established the dominant role of business intelligence and J2EE in consumer relation management for effective decision making.
1. INTRODUCTION
1.1. Overview
Consumer relation management (CRM) identifies the need for maintaining good consumer relations to increase the profitability of an enterprise. Maintaining good consumer relations ensures consumer satisfaction and loyalty. Consumer satisfaction and better service ensure that consumers stay loyal to the enterprise. Such consumers are retained for a longer duration. Consumer retention is essential since acquiring new consumers is a costlier process than retaining old ones.
An extensive study of consumers' buying behaviours is important to make them satisfied with an enterprise (Hsieh Nan-Chen, Chu Kuo-Chung (2009)). Enterprises need to collect consumer personal profiles and transaction data to analyse these buying behaviours. Consumer segmentation is done to study buying behaviours. Segmentation is performed on demographic factors like age and gender, or on other factors such as geographical location, occasion, behaviour etc., using various techniques like the k-means algorithm, Customer Segmentation using High Utility Rare Itemset Mining (CSHURI) (Pillai Jyothi, Vyas O.P., 2012) and so on.
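The segmentation step described above can be sketched in Java. The sketch below is a minimal one-dimensional k-means over consumer ages; the data values, seed centroids and cluster count are illustrative assumptions for demonstration, not the thesis's survey data.

```java
import java.util.Arrays;

// Minimal 1-D k-means sketch for consumer segmentation by age.
// Data, seeds and iteration count below are illustrative assumptions.
public class KMeansSketch {
    public static int[] cluster(double[] data, double[] centroids, int iterations) {
        int[] assign = new int[data.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: attach each consumer to the nearest centroid.
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (Math.abs(data[i] - centroids[c]) < Math.abs(data[i] - centroids[best]))
                        best = c;
                }
                assign[i] = best;
            }
            // Update step: move each centroid to the mean of its members.
            for (int c = 0; c < centroids.length; c++) {
                double sum = 0;
                int n = 0;
                for (int i = 0; i < data.length; i++) {
                    if (assign[i] == c) { sum += data[i]; n++; }
                }
                if (n > 0) centroids[c] = sum / n;
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        double[] ages = {21, 23, 25, 44, 46, 48, 65, 67, 70}; // illustrative consumer ages
        double[] centroids = {20, 45, 70};                    // initial seeds (assumed)
        int[] assign = cluster(ages, centroids, 10);
        System.out.println(Arrays.toString(assign));          // [0, 0, 0, 1, 1, 1, 2, 2, 2]
        System.out.println(Arrays.toString(centroids));
    }
}
```

The two alternating steps (assign each record to its nearest centroid, then recompute each centroid as the mean of its members) are the essence of the algorithm; the repeated distance calculations in the assignment step are what the efficiency improvements discussed later in this thesis target.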
Consumer information is stored in databases or data warehouses. Commonly used databases are MS-Access, MySQL, MS-SQL Server etc. Based on factors such as quantity, ease of use and so on, other sources of data like MS-Excel spreadsheets, comma separated value (.csv) files, attribute-relation file format (.arff) files etc. are also preferred. Data from these sources is cleaned, integrated and mined to identify interesting patterns representing knowledge, based on interestingness measures. Software tools like Weka (Waikato Environment for Knowledge Analysis), SPSS (Statistical Package for Social Sciences) Clementine, Minitab, Intelligent Miner, Siebel etc. are utilised for data analysis.
Data is gathered using CRM software built with either J2EE or .NET or other similar software. This data is analysed using data mining techniques of business intelligence (Seddawy Bahgat El Ahmed, Moawad Ramadan, Hana Maha Attia (2010)) like classification (Zu Qiaohong, Wu Ting, Wang Hui (2010)), clustering (Jain A.K., Murty M.N., Flynn P.J. (1999)), fuzzy logic, decision trees, prediction, neural networks (Meng Qingliang, Kong Qinghua, Han Yuqi, Chen Jie (2004)) etc. Enterprises use business intelligence to gain in-depth knowledge of consumer data, which helps them take quick and well-informed decisions. Business intelligence is also used in decision making, querying, reporting, online analytical processing (OLAP), statistical analysis and forecasting.
Microsoft .NET and Sun J2EE offer exceptional features to build customised CRM software. Microsoft .NET supports various application types such as Windows applications, web services, mobile applications, cloud applications etc. Its framework includes Silverlight, Windows Communication Foundation (WCF), Windows Presentation Foundation (WPF), AJAX controls, Windows Azure etc. Similarly, the J2EE framework offers technologies like Swing, Struts, JavaServer Faces, AJAX controls, Android applications etc. Both platforms provide effective solutions for e-commerce, business process management (BPM), supply chain management (SCM) etc., and both are well supported on mobile devices and in cloud environments. These platforms are used in developing various CRM applications and also in various web services like websites, e-mails, short message services (SMS), live chat, alerts, dashboards etc.
1.2. Consumer Relation Management
Consumer or Customer Relationship Management (CRM) is a strategy adopted by companies to acquire and retain consumers, build partnerships, improve consumer loyalty and provide consumer satisfaction, creating value for both the consumer and the company. Pareto's Principle, or the 80/20 rule, can be applied to CRM: 20% of satisfied and loyal consumers can generate more than 80% of revenues. The main elements of CRM include consumer identification, attraction, retention and development. The important factors of CRM are consumer satisfaction and loyalty.
Advancements in information technology and the growing utility of the web have changed the CRM strategies of enterprises and the buying behaviours of consumers. Several companies started offering new electronic communication channels like e-mail, SMS, live chat, e-brochures, e-newsletters etc. for interacting with consumers. Many CRM systems rely largely on technology. A good CRM system collects consumer transaction data from different sources and processes it using data mining techniques of business intelligence. Using this information, companies analyse consumer buying behaviours for profitability. CRM dashboards are also utilised to analyse enterprise performance. Microsoft .NET and Sun J2EE are two major software platforms which offer technology support to CRM systems.
1.3. .NET Platform
.NET (also referred to as Network Enabled Technology) is a software framework developed by Microsoft that runs mainly on the Windows operating system. .NET includes a vast library of features and provides language interoperability, because of which every language can use code written in other languages. Programs written for the .NET framework execute in a software environment known as the Common Language Runtime (CLR). The CLR is an application virtual machine that provides services such as memory management, exception handling and security. The .NET framework essentially comprises the class library and the CLR together.
The .NET framework's Base Class Library (BCL) provides user interface, data access, database connectivity, cryptography, web application development and network communication facilities. Users develop software by combining their source code with the .NET framework and other libraries. The .NET framework is used by numerous applications created for the Windows platform. Microsoft Visual Studio is the integrated development environment (IDE) used for .NET software.
1.4. J2EE Platform
Java is one of the most mature and commonly used programming languages for building enterprise software. Java was originally meant for developing applets to run in browsers; it evolved into a programming model capable of driving enterprise applications. To address distinct sets of programming needs, Java has three different platform editions, namely Java 2 Platform Standard Edition (J2SE), Java 2 Platform Enterprise Edition (J2EE) and Java 2 Platform Micro Edition (J2ME). Of these, J2SE is the most commonly used form of Java technology. J2SE is usually referred to as the Java Development Kit (JDK). Figure 1.1 shows the architecture of the Java language environment.
Figure 1.1 Java Language Environment
J2EE replaced several proprietary and non-standard technologies to become a preferred choice for building e-commerce and other web-based enterprise applications like BPM, SCM etc., the main alternative being Microsoft's .NET-based technologies. Sun and its associates made Java a credible platform for distributed applications.
1.5. Business Intelligence
Companies implement business intelligence to analyse their performance and create effective strategies to withstand their competitors (Habul A. (2010)). Business intelligence techniques are implemented to improve business processes and to strengthen consumer relations and collaborations for the profitability of the enterprise. Business intelligence helps an enterprise not only in making rapid and improved decisions but also in identifying various business challenges and opportunities. With business intelligence, enterprises create, manage and deliver valuable reports on the internet.

In business intelligence, data is gathered from various sources and stored in a database or data warehouse. In a database, data is stored in the form of tables, whereas in a data warehouse data is stored in data cubes. Enterprises analyse this data to make better decisions. Business intelligence consists of various activities like decision making, querying, reporting, online analytical processing, statistical analysis, forecasting etc. Business intelligence applies data mining techniques like classification, clustering, decision trees, prediction, neural networks etc. to analyse data and provide visibility, clarity and better insight. One main objective of business intelligence is to improve the timeliness and quality of information so as to keep businesses on the right track at all times. Business intelligence is especially useful during economic downturns.
1.6. Need for the study
Buying behaviours of consumers often change depending on lifestyle. Varied buying behaviours have a significant impact on the profitability of an enterprise, which in turn can lead to its economic downturn. Market conditions are changing day by day and many competitors are entering the consumer market. These changing market conditions pose a challenge for enterprises to withstand stiff competition from other companies. Hence enterprises should implement suitable techniques to overcome the various challenges that arise in the consumer market. It is found that business intelligence provides enterprises with a variety of solutions in such environments.

The role of CRM software in maintaining consumer data is very important, for the reason that an enterprise should store precise data to take important decisions and generate accurate reports. Therefore there is a need for a suitable software platform to build customised CRM packages and analyse this data. Hence in this research, the .NET and J2EE software platforms are considered to build customised CRM packages, and business intelligence techniques are adopted to analyse consumer data.
1.7. Objective of this research
In this research an attempt was made to address the following objectives of consumer relation management. This research aimed to
1. Identify better communication channels, the “Media mix”, for new selling and buying opportunities, effective reach and quality networking.
2. Create techniques for maintaining a database of consumers with J2EE or .NET software.
3. Identify consumer requirements and opportunities that facilitate increases in profit margins, revenues, buying and selling.
4. Improve consumer service and online consumer support systems, and build loyalty to position the enterprise at a competitive advantage.
1.8. Methodology
After identifying these objectives, a consumer survey was conducted on the preferred “Media mix”. The opinion of consumers who use .NET and J2EE was explored on various factors, and J2EE was adjudged the users' choice to build the required CRM software. Then, business intelligence techniques were implemented to explore consumer data. Consumer segmentation was done using k-means clustering. Consumer buying behaviours were analysed using the Apriori algorithm. MUlticriteria Satisfaction Analysis (MUSA) along with rough set theory was implemented to analyse consumer satisfaction, and fuzzy classification was used for analysing consumer loyalty. The k-means and Apriori algorithms were implemented using the open source data mining tool Weka. Further, an attempt was made to improve the efficiency of some of these algorithms in order to implement them on large databases.
1.9. Organisation of the thesis
Chapter 1 - Introduction - In this chapter the importance of consumer relation management is discussed. An outline of the .NET and J2EE platforms is given. The role of business intelligence and various data mining techniques, along with the methodology implemented in this research, is briefed. It also cites the need for this research.

Chapter 2 - Review of Literature - This chapter is divided into three subsections. In the first subsection, the various data mining techniques of business intelligence used in CRM in various research studies are described. It also mentions the use of the MUSA method in analysing consumer satisfaction. The next subsection identifies the various CRM software used in the IT sector. In the third subsection, statistics supporting the use of .NET and J2EE in the software industry, and also the present and future prospects of BI in the CRM industry, are given.
Chapter 3 - Consumer relation management of .NET and J2EE using business intelligence - This chapter elaborately explains the various techniques used in this research. It explains the importance of consumer satisfaction and loyalty in CRM. Features of .NET and J2EE for building enterprise applications are also elaborated. Apart from that, this chapter clearly explains various data mining techniques of BI like the k-means algorithm, association rules and the Apriori algorithm. This chapter also explains the MUSA method, rough set theory, fuzzy set theory, max-heaps and scapegoat trees used in this research.
Chapter 4 - Methodology - In this chapter the research methodology implemented to study the identified objectives of CRM is described. This chapter also explains how and where the surveys were conducted to identify the “Media mix”, consumer opinion on .NET and J2EE for building customised CRM software, and consumer service. Apart from the survey process, this chapter also describes the methodology involved in improving the efficiency of the k-means algorithm and the technique of integrating the MUSA method with rough set theory to understand consumer satisfaction in an exploratory way.
Chapter 5 - Results and Discussion - This chapter elaborately explains the process involved in addressing the identified objectives of CRM using surveys; data mining techniques of business intelligence like consumer segmentation using the k-means algorithm and the Apriori algorithm for analysing consumer behaviours; the integration of MUSA and rough set theory for an extensive study of consumer satisfaction; the implementation of fuzzy techniques to analyse consumer loyalty; and finally the consumer service survey. The proposed KCUSTMH algorithm is also outlined for increasing the efficiency of the k-means algorithm on large databases.
Chapter 6 - Conclusions and Future Work - This chapter summarises the findings of this research in the order of the identified objectives, along with its limitations. It also mentions the future scope of the study and trends that are likely to dominate the CRM sector.

At the end, the references used in this research are listed in the Bibliography. The Appendix contains sample screenshots, dashboards utilised in the decision-making process and sample programming code used in this research.
2. REVIEW OF LITERATURE
2.1. Related Research in CRM and Data Mining
Consumer satisfaction was judged using preference disaggregation of ordinal values, popularly known as the MUlticriteria Satisfaction Analysis (MUSA) method. The MUSA method was adopted using a preference disaggregation model following the principles of ordinal regression analysis (Grigoroudis E., Siskos Y., Christina Diakaki (2001)). This integrated methodology evaluated the satisfaction level of a set of individuals like customers, employees, etc. based on their values and expressed preferences (Siskos Yannis, Grigoroudis Evangelos (2002)). Using this satisfaction survey data, the MUSA method aggregated different preferences into unique satisfaction functions. This aggregation and disaggregation process was achieved with the minimum possible errors. The main advantage of the MUSA method was that it fully considered the qualitative form of customers' judgments and preferences. The development of a set of quantitative indices and perceptual maps made it possible to evaluate consumer satisfaction. Finally, reliability analysis made the MUSA method indisputable. Consumer satisfaction studies were also conducted using several other techniques (Ling Amy Poh Ai, Saludin Mohamad Nasir, Mukaidono Masao (2012)).
Data mining techniques like k-means clustering, originally proposed by James MacQueen (1967); Hugo Steinhaus (1957); Forgy E.W. (1965); Hartigan and Wong (1975/79), were used for consumer segmentation (Seddawy Bahgat El Ahmed, Moawad Ramadan, Hana Maha Attia (2010)). Using association rules (Rakesh Agrawal et al. (1993, 1994, 1995)), consumer buying behaviours and their requirements have been analysed by various researchers to date. The Apriori algorithm finds frequent itemsets in market basket analysis (Teng Shaohua, Su Jiangyu, Zhang Wei, Fu Xiufen, Chen Shuqing, P.R. China (2009)). Several studies were also conducted to improve the efficiency of these algorithms. Consumer segmentation using k-means clustering (Russell K.H. Ching, Chen Ja-Shen, Lin Yi-Shen (2002)) was time consuming for large databases. Its efficiency was reduced by the frequent calculation of Euclidean distances to cluster the data. The efficiency of this method was enhanced by using Red-Black trees, originally proposed by Rudolf Bayer (1972); Leonidas J. Guibas & Robert Sedgewick (1978); and min heaps (Rajeev Kumar, Rajeswar Puran, & Joydip Dhar (2011)). The k-means clustering algorithm was implemented using Red-Black trees and min heaps in order to reduce the number of iterations of the k-means algorithm that occur because of the repeated calculation of distances to find cluster centroids (Yuan F., Meng Z.H., Zhang H.X. and Dong C.R. (2004)). Use of these data structures helped to reduce the running time of the k-means algorithm, and implementation of this new algorithm provided quality clusters for large databases. These data structures are readily available in programming languages like C++ and Java as tree maps. This improved version of the k-means algorithm was superior to the traditional one as it improved the running time of the algorithm for large databases.
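For one-dimensional data, the balanced-tree idea cited above can be sketched with Java's TreeMap, which is itself a Red-Black tree: the assignment step can locate the nearest centroid in O(log k) via floor/ceiling lookups instead of scanning all k centroids. This is an illustrative reading of the idea, with assumed centroid values, not the exact published algorithm.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: keeping 1-D centroids in a balanced search tree (Java's TreeMap,
// a Red-Black tree) lets the k-means assignment step find the nearest
// centroid in O(log k) rather than scanning all k centroids.
public class NearestCentroid {
    // Map from centroid value to cluster index; values below are assumed.
    static int nearest(TreeMap<Double, Integer> centroids, double x) {
        Map.Entry<Double, Integer> lo = centroids.floorEntry(x);   // largest centroid <= x
        Map.Entry<Double, Integer> hi = centroids.ceilingEntry(x); // smallest centroid >= x
        if (lo == null) return hi.getValue();
        if (hi == null) return lo.getValue();
        return (x - lo.getKey() <= hi.getKey() - x) ? lo.getValue() : hi.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Double, Integer> centroids = new TreeMap<>();
        centroids.put(23.0, 0);
        centroids.put(46.0, 1);
        centroids.put(67.0, 2);
        System.out.println(nearest(centroids, 30)); // prints 0 (23 is closer than 46)
        System.out.println(nearest(centroids, 60)); // prints 2 (67 is closer than 46)
    }
}
```

The same lookup structure must be rebuilt (or updated) after each centroid-update step, so the saving matters most when k is large relative to the cost of maintaining the tree.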
Only satisfied consumers remain loyal to an enterprise and thereby place it at a competitive advantage over other firms. Consumer loyalty was estimated using fuzzy set theory (Lotfi Zadeh (1975); Isakki. P., Rajagoplan. S.P. (2011)). Consumer behaviours were analysed to maintain good relationships with consumers (Raorane Abhijit, Kulkarni. R.V. (2011)).
Maximizing consumer satisfaction improves their loyalty and retention. Based on the previous transactions of consumers, their buying behaviours were predicted and the data was analysed using clustering and association rules. From customer profiles and transaction records, segmentation was performed using the k-means algorithm; the Apriori algorithm was then applied to identify consumer behaviour, followed by the identification of product associations within different consumer segments. Consumer transaction data was analysed to develop new trends and launch new series of products (Isakki. P., Rajagoplan. S.P. (2012)).
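As a brief illustration of how Apriori prunes the search space (any superset of an infrequent itemset is itself infrequent), the following simplified Java sketch mines frequent itemsets from a small transaction list; the item names and the support threshold are purely illustrative and do not come from the cited studies.

```java
import java.util.*;

public class Apriori {
    /** Frequent itemsets (as sorted item lists) with support >= minSupport. */
    public static Map<List<String>, Integer> mine(List<Set<String>> db, int minSupport) {
        Map<List<String>, Integer> frequent = new LinkedHashMap<>();
        SortedSet<String> items = new TreeSet<>();
        for (Set<String> t : db) items.addAll(t);
        // Candidate 1-itemsets: every distinct item, in sorted order.
        List<List<String>> level = new ArrayList<>();
        for (String i : items) level.add(List.of(i));
        while (!level.isEmpty()) {
            // Count the support of each candidate and keep the frequent ones.
            List<List<String>> survivors = new ArrayList<>();
            for (List<String> cand : level) {
                int support = 0;
                for (Set<String> t : db) if (t.containsAll(cand)) support++;
                if (support >= minSupport) {
                    frequent.put(cand, support);
                    survivors.add(cand);
                }
            }
            // Join step: only frequent k-itemsets are extended with a larger item,
            // because any superset of an infrequent itemset is itself infrequent.
            List<List<String>> next = new ArrayList<>();
            for (List<String> cand : survivors)
                for (String i : items.tailSet(cand.get(cand.size() - 1) + "\0")) {
                    List<String> bigger = new ArrayList<>(cand);
                    bigger.add(i);
                    next.add(bigger);
                }
            level = next;
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(Set.of("bread", "milk"), Set.of("bread", "butter"),
                Set.of("bread", "milk", "butter"), Set.of("milk"));
        System.out.println(mine(db, 2));
    }
}
```

In a CRM workflow of the kind described above, each transaction set would be the basket of one consumer visit within a segment produced by k-means.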
Several studies related to tree based approaches for mining frequent itemsets were conducted (Senthil Kumar. A.V., Wahidabanu. R.S.D. (2007, 2008)). Effective algorithms for mining association rules were also developed (Senthil Kumar. A.V., Wahidabanu. R.S.D. (2006, 2007)). An improved version of the Apriori algorithm using a hashing technique was used to reduce large itemsets into candidate 2-itemsets (Vanitha. K., Santhi. R. (2011)). A hybrid classification algorithm was proposed for mining customer data used for decision making (Aurangzeb Khan, Baharum Baharudin, Khairullah Khan (2011)). Fuzzy data mining was applied in order to estimate customer loyalty (Chien Hua Wang, Chin Tzong Pang (2011)). Several studies were also conducted to analyse customer buying behaviours, satisfaction and loyalty using data mining techniques. Hash based mining algorithms for maximal frequent itemsets using linear probing were also developed (Rahman Zubair. A.M.J. Md., Balasubramanie. P., Venkata Krishna. P. (2009); Gangadhara Rao. N.V.B., Sirisha Aguru (2012)). Efficient algorithms for mining frequent items were also developed by researchers (Vijayarani. S., Sathya. P. (2013)).
“A literature review and classification of application of data mining techniques
in CRM” had given a comprehensive picture of data mining techniques used in
about 87 research articles from 2004 to 2006 (Ngai.E.W.T, Li Xiu,
Chau. D.C.K. (2009)). Among the four CRM dimensions, namely customer retention, attraction, identification and development, customer retention was the most common dimension for which data mining was used to support decision making (54 out of 87 articles, 62.1%). There were 13 articles on customer identification and customer development, covering various aspects of CRM. Of the 54 customer retention articles, 51.9% (28 articles) and
44.4% (24 articles) related to one-to-one marketing and loyalty programs
respectively. One-to-one marketing and loyalty programs also ranked first (28
articles out of 87 articles, 32.2%) and second (24 articles out of 87 articles,
27.6%) in terms of subject matter. These articles dealt with both data mining and CRM. In one-to-one marketing, 46.4% (13 out of 28 articles) used
association models to analyse customer data, followed by 25.0% (7 out of 28
articles) which used classification models. With regard to loyalty programs
83.3% (20 out of 24 articles) used classification models to assist in decision
making. Among 34 data mining techniques which were applied in CRM, neural
networks were the most commonly used technique. Decision tree and
association rules were described in 21 (24.1%) and 20 (23.0%) articles
respectively. Several studies related to the use of data mining techniques in CRM, like artificial neural networks (Rada Rexhep, Ruseti Bashkim (2012)) and fuzzy evaluation models (Lu Dai, Arun Kumar. S. (2012); Wang Chien Hua, Pang Chin Tzong (2012)), were also conducted.
Studies were conducted on mining association rules using hash based algorithms (Jong Soo Park, Ming-Syan Chen, Philip S. Yu (1995)) and on large databases (Han Jiawei and Yongjian Fu (1995)). Studies related to fuzzy "if-then" rules were also done (Ishibuchi. H., Nozaki. K., Yamamoto. N., Tanaka. H. (1995)), and their performance in decision making was evaluated (Ishibuchi. H, Nakashima. T., Murata. T. (1999)). Several studies related to fuzzy data mining and fuzzy association rules were conducted to handle data discretisation and continuous attributes (Ishibuchi. H., Yamamoto. T., Nakashima. T. (2001)).
In our research, an attempt was made to study the effect of four Ps, namely product, personnel, physical appearance and place, on consumer satisfaction using rough set theory (Zdzislaw Pawlak (2002)). These were the major attributes on which consumers ponder when expressing their satisfaction with the enterprise.
2.2. CRM Software in IT Sector
Open source software (OSS) is widely used to build software applications
including CRM applications. As per a 2011 report by Gartner (an American information technology research and advisory firm), 46% of companies used OSS in specific departments and projects, 22% were adopting OSS consistently in all departments of a company and 21% were in the process of evaluating the advantages of OSS usage.
A few open source CRM websites are mentioned in this research work. These were developed using either Java or .NET as the front end. SourceForge Inc., a web-based source code repository, lists 369 active open-source CRM projects, of which the top 10 open source CRM software are shown in Table 2.1.
Table 2.1 List of top 10 open source CRM software (source: internet)
SNO | CRM Software Name | Founded Year | Software Used
1 | SugarCRM Inc. | 2004 | PHP & MySQL
2 | SplendidCRM | 2005 | .NET 2.0 with AJAX & SQL Server (Windows, IIS, SQL Server, C# and ASP)
3 | Centric CRM | 2007 | Java & MySQL
4 | Hipergate | 2009 | Java and JSP; compatible with Microsoft SQL Server, MySQL, Oracle and PostgreSQL
5 | Compiere Inc. | 2006 | Java, JavaScript and PL/SQL; compatible with JDBC and Oracle databases
6 | Vtiger CRM | 1996 | JavaScript, PHP and Visual Basic; compatible with ADOdb, MySQL and PostgreSQL databases; built upon the LAMP/WAMP (Linux/Windows, Apache, MySQL and PHP) architecture
7 | CentraView Inc. | 2004 | Java and JSP; compatible with MySQL databases
8 | XRMS CRM | 2006 (last updated) | Written in an interpreted language (PHP); compatible databases include ADOdb, SQL-based, Microsoft SQL Server, MySQL and other network-based DBMS
9 | Cream CRM | | Written in Java and JavaScript
10 | Tustena CRM | | Written in C#, ASP.NET and JavaScript; compatible with Microsoft SQL Server
Table 2.1 clearly shows that the top CRM websites prefer Java or .NET. Many CRM websites prefer Java because it is open source. Open source solutions are proving to be popular among businesses with limited costs and unique needs. Several open source Java based data mining tools (Kalra Shipra, Gupta Rachika (2011)) like ELKI, SCaViS, KNIME, Orange, RapidMiner, Scriptella ETL, Weka, JasperSoft etc. are available to analyse the data.
2.3. Statistics supporting the use of .NET and J2EE in software
industry
In a market survey conducted by W3Techs.com on 24th May 2012, Java was used by 3.9% of all websites. Other languages like PHP, ASP.NET and ColdFusion were also used to build websites. As per a Java market report in 2013, the growth rate of Java was high when compared to all other server-side programming languages. The 2012 W3Techs.com survey stated that ASP.NET was used by 21.4% of all websites; by April 2013, 18.9% of websites used ASP.NET and 89.5% used JavaScript. As per International Data Corporation (IDC), 78% of universities teach Java and 50% of universities require Java.
In 2008, Novell Connection magazine stated that the two stacks, ASP.NET and J2EE, had equal shares of 45% each of the world market, and that only about 10% of the world market was driven by other stacks, which are mostly open source application servers.
2.4. Business Intelligence and CRM
As per a Gartner report, the major advantage of business intelligence applications in consumer relationship management is to provide a better understanding of consumer needs.
In a Forbes report on 18/6/2013, Gartner predicted that by 2017 CRM revenues will cross $36 billion and BI revenues will cross $18 billion, which clearly indicates the growing contribution of CRM and BI to the world market. The 2013 Gartner report also predicts that mobile CRM applications that can be downloaded from app stores will grow from over 200 in 2012 to 1200 by 2014. It also predicts that CRM software applications will increasingly be delivered as SaaS (Software as a Service) through 2016 and that Salesforce.com will remain the largest vendor in terms of revenue in 2013.
Until 2012, several categories of business intelligence tools like spreadsheets were used for analysing data. Reporting and querying software tools that extract, sort, summarise and present selected data were also used for data analysis. Several other applications used OLAP (online analytical processing), digital dashboards, data mining, data warehousing, decision engineering, process mining, business performance management and local information systems. Except for spreadsheets, these tools are sold as standalone tools, suites of tools, components of ERP systems or as components of software targeted at specific industries. These tools are sometimes packaged into data warehouse appliances. Several free open-source data mining software applications based on these categories are available in the present day software industry.
2.5. Summary
In this chapter, various researches related to CRM which implemented the data mining techniques of business intelligence were discussed. The usage of IT in the CRM sector was also briefed, along with the growing importance of free and open source software. Statistics supporting the use of .NET and J2EE in the software industry and the increasing importance of CRM and BI in the world market were also discussed.
3. CONSUMER RELATION MANAGEMENT OF .NET AND J2EE
USING BUSINESS INTELLIGENCE
3.1. Overview
Consumer relation management (CRM) is a model for managing a company’s interactions with consumers. CRM involves the use of technology to organise and automate sales, marketing, consumer service and technical support. CRM includes relationship management, automation of consumer transaction processes using CRM software, and opportunity management.
3.2. Consumer Relation Management
The main elements of consumer relation management include consumer identification, consumer attraction, consumer retention and consumer development. Consumer satisfaction and loyalty have a considerable impact on these elements. Hence, the study of these two factors is essential to improve any business.
3.2.1. Consumer Satisfaction
Consumer satisfaction is the extent to which the products purchased and services offered meet consumer expectations. Consumer satisfaction changes the buying behaviours of consumers (Hsieh Nan-Chen, Chu Kuo-Chung (2009)).
Satisfied consumers have a greater desire to purchase more products. This strengthens the enterprise’s relations with existing consumers. Encouraging the repurchasing tendency of consumers is a cost-saving approach for enterprises. Through their word-of-mouth, enterprises gain new consumers who improve the profitability of the company. Satisfied consumers do not mind paying higher prices if necessary. A few factors that influence consumer satisfaction and retention are products, service and corporate image. Consumer retention enables the calculation of the lifetime value of a consumer, which in turn helps to improve the profitability of an enterprise.
Consumer life time value (CLV) is calculated as
CLV = GC * [(1+ d) / (1+d-r)] ………… (3.1)
In equation 3.1, GC is the yearly gross contribution, d is the yearly discount
rate and r is the yearly retention rate.
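As a small worked example of equation 3.1, with purely hypothetical figures, a consumer contributing GC = 1000 per year, with a yearly discount rate d = 10% and a yearly retention rate r = 80%, has CLV = 1000 * 1.1 / 0.3, i.e. about 3666.67. In Java this is a one-line helper (class and method names are illustrative):

```java
public class Clv {
    // Consumer lifetime value per equation (3.1): CLV = GC * (1 + d) / (1 + d - r),
    // where GC is the yearly gross contribution, d the yearly discount rate
    // and r the yearly retention rate.
    public static double clv(double gc, double d, double r) {
        return gc * (1 + d) / (1 + d - r);
    }

    public static void main(String[] args) {
        System.out.println(clv(1000, 0.10, 0.80)); // hypothetical figures
    }
}
```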
3.2.2. Consumer Loyalty
Consumer loyalty is the willingness of a consumer to keep buying specific products or services of a company. There are two types of consumer loyalty. Short-term loyalty wavers when there are better alternatives, whereas long-term loyalty makes the consumer stay with the company for a longer duration. Enterprises believe that long-term consumer loyalty is obtained through better service and novelty (Jones. T.O., Sasser. W.E. Jr. (1995)).
Consumer segmentation strengthens relations with the consumers and makes
them stay loyal towards the enterprise (Pillai Jyothi, Vyas .O.P. (2012)). Good
consumers create an intense feeling of loyalty towards the enterprise, its
products and services. Consumer potential is calculated from the number of
visits made by a consumer and the value of items purchased on each visit,
using Spearman’s rho (ρ) or Kendall’s tau.
The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranked variables. For a sample of size n, the n raw scores Xi, Yi are converted to ranks xi, yi, and ρ is computed from the rank differences di = xi - yi:

ρ = 1 - [6 Σ di^2] / [n(n^2 - 1)] …………(3.2)

The Kendall τ coefficient for n values is defined from the numbers of concordant and discordant pairs of observations:

τ = (number of concordant pairs - number of discordant pairs) / [n(n - 1)/2] ………….(3.3)
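Both coefficients can be computed directly from their definitions. The following simplified Java sketch assumes no tied ranks, a condition required by the shortcut form of equation 3.2; the class and method names are illustrative.

```java
public class RankCorrelation {
    // Spearman's rho via equation (3.2): rho = 1 - 6*sum(di^2) / (n(n^2 - 1)),
    // valid when there are no ties in either variable.
    public static double spearman(double[] x, double[] y) {
        int n = x.length;
        double[] rx = ranks(x), ry = ranks(y);
        double sumD2 = 0;
        for (int i = 0; i < n; i++) {
            double d = rx[i] - ry[i];
            sumD2 += d * d;
        }
        return 1.0 - 6.0 * sumD2 / (n * (double) (n * n - 1));
    }

    // Kendall's tau via equation (3.3): (concordant - discordant) / (n(n-1)/2).
    public static double kendall(double[] x, double[] y) {
        int n = x.length, concordant = 0, discordant = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                double s = (x[i] - x[j]) * (y[i] - y[j]);
                if (s > 0) concordant++;
                else if (s < 0) discordant++;
            }
        return (concordant - discordant) / (n * (n - 1) / 2.0);
    }

    // Rank of each value = number of smaller values (0-based; the base cancels
    // out in the rank differences used by Spearman's formula).
    private static double[] ranks(double[] v) {
        double[] r = new double[v.length];
        for (int i = 0; i < v.length; i++)
            for (double w : v) if (w < v[i]) r[i]++;
        return r;
    }
}
```

For consumer potential, x would be the number of visits and y the value purchased per visit, as described above.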
Advancements in mobile technology, the vast availability of mobile devices, growth in internet speeds and their declining costs make enterprises focus on mobile consumer relation management (mobile CRM) for their profitability.
3.2.3. Mobile CRM
Mobile devices such as tablet-PC, smart-phone, i-phone etc. are utilised in
consumer relation management for better profitability. Companies use these devices to interact with consumers and also to access consumer data from wherever they are located. Since mobile apps define security and access preferences for individuals and groups of consumers, any executive can interact with a consumer or with the data without any difficulty.
Mobile CRM helps enterprises to improve consumer conversion rates. This in turn helps in gaining a competitive advantage and in exploring new buying and selling business opportunities.
Consumer relation management utilises the cloud computing platform to enhance the profitability of an enterprise. The latest technological advancements in cloud computing are driving enterprises to shift towards the cloud environment.
3.2.4. CRM in Cloud Computing
There are multiple benefits of CRM under cloud computing. CRM under cloud
creates new buying and selling opportunities. Cloud platforms enhance repeat
purchases and thus increase the profitability of a company. Because of virtualisation technology, various benefits like rapid deployment, easy upgradation, reduced cost and better security are achieved.
CRM under cloud helps companies improve relations with existing consumers and thus leads to better marketing of products and services. It enhances consumer satisfaction and retention, and places the company at a competitive advantage.
The best known example of the Software as a Service (SaaS) cloud model is the consumer relationship management offered by Salesforce.com, whose solution offers sales, service, support, marketing, content, analytics and collaboration through a platform called “Chatter”.
This type of software is referred to as CaaS or CRaaS, for Consumer Relationship as a Service. The Salesforce website extended its SaaS offering to allow developers to create add-on applications, essentially turning the SaaS service into a Platform as a Service (PaaS) offering called the “Force.com” platform. Applications built on “Force.com” are written in a Java variant called “Apex” and use an XML syntax for creating user interfaces in HTML, AJAX and Flex. Nearly a thousand applications exist for this platform from hundreds of vendors.
The latest technological advancements in IT brought a number of benefits to CRM technology. Various aspects of cloud computing like pay-per-use, virtualisation and easy changeover made CRM less expensive. Social media such as Twitter, Orkut, Facebook etc. radically changed vendor marketing and consumer services. Mobile devices have opened up new sales and marketing channels.
3.2.5. Big Data and CRM – future CRM
CRM uses technology for automating, organising and synchronising consumer
related activities like buying and selling, consumer service and support. These
activities involve collecting huge consumer data in the form of text, pictures,
audio, video etc. This data is called the “Big Data”.
Big data involves collections of data sets so large that they are difficult to process using conventional data processing techniques. Big data enhances business opportunities and improves CRM. Big data analytics is transforming vendor-consumer relations and interactions. With the impingement of big data, even business models are getting transformed.
Future “Big data” analytics will offer businesses powerful tools capable of
identifying sales opportunities. Using these tools, responses or comments on products on social media can be combined with internal data to understand consumers’ preferences.
Gartner states that big data will renovate the way companies manage their
relationships with consumers. Hence companies must be prepared to face
such an environment. Gartner also states that big data shall be the next major
emerging technology of CRM to enhance consumer relations with enterprises.
3.3. .NET framework
The architecture of the .NET framework is shown in Figure 3.1. It has a Base Class Library (BCL) which provides
User interface
Data access
Database connectivity
Cryptography
Web application development and
Network communications etc.
Figure 3.1 .NET framework architecture
Developers construct software by combining source code with the .NET framework and other libraries. The .NET framework is used by several new applications created for the Windows platform. Microsoft .NET has an integrated development environment (IDE) called Visual Studio. Microsoft (MS) Visual Studio supports programming languages such as C++, C#, J#, Visual Basic and Visual C++. It supports these languages by means of language services, which allow the code editor and debugger to support nearly any programming language. Other languages such as M, Python and Ruby are also supported through support software. Web services of the .NET platform support XML/XSLT, HTML/XHTML, JavaScript, CSS and the SOAP protocol.
[Figure 3.1 depicts the .NET stack: VB.NET, C++, C#, J# and other languages over the Common Language Specification (CLS); ASP.NET Web Forms and Services and Windows Forms; ADO.NET and XML; the Base Class Library (BCL); the Common Language Runtime (CLR); and the Operating System (OS); Visual Studio .NET spans all layers.]
Since the release of .NET in 2002, Microsoft has released several versions of the .NET platform. Table 3.1 presents an overview of the .NET release history. Visual Studio 2013 was released on October 17, 2013 along with .NET 4.5.1.
Table 3.1 Overview of .NET framework release history (source: internet)
Version | Release date | Development tool | Supported OS
1.0 | 2002-02-13 | Visual Studio .NET |
1.1 | 2003-04-24 | Visual Studio .NET 2003 | Windows Server 2003
2.0 | 2005-11-07 | Visual Studio 2005 | Windows Server 2003 R2
3.0 | 2006-11-06 | Expression Blend | Windows Vista, Windows Server 2008
3.5 | 2007-11-19 | Visual Studio 2008 | Windows 7, Windows Server 2008 R2
4.0 | 2010-04-12 | Visual Studio 2010 |
4.5 | 2012-08-15 | Visual Studio 2012 | Windows 8, Windows Server 2012
4.5.1 | 2013-10-17 | Visual Studio 2013 | Windows 8.1, Windows Server 2012 R2
A typical .NET platform has various design features such as
Interoperability
Common Language Runtime (CLR) engine
Language Independence
Base Class Library (BCL)
Simplified deployment
Security and
Portability.
3.3.1. Common Language Runtime
Common Language Runtime, CLR, is a major component of .NET framework.
CLR provides benefits such as exception handling, security, debugging,
versioning etc. These benefits are available to any language built for CLR.
CLR hosts many languages like VB.NET, VC++.NET, C#.NET, VJ#, Perl, Python and even COBOL. Code compiled for the CLR is called “managed code”, and it takes advantage of the services offered by the CLR. Metadata created during compilation is used to locate and load classes, generate native code and provide security. The CLR defines a standard type system which provides language interoperability at design time.
3.3.2. Microsoft Intermediate Language
When .NET compiles source code, it does not compile it to native code; instead, the compilation process translates the code into Microsoft Intermediate Language, MSIL. The compiler also creates the necessary metadata and compiles it into a component. The resulting Intermediate Language (IL) is CPU independent. Compilation to native code occurs via the Just in Time (JIT) compiler.
3.3.3. Common Type System
Common Type System, CTS, specifies the types supported by CLR which
include
Classes, definition of what will become an object; includes properties,
methods, and events.
Interfaces, definition of functionality a class can implement, but does
not contain any implementation code.
Value Types, user defined types that are passed by value and
Delegates, similar to functions in C++; delegates are often used for
event handling and callbacks.
3.3.4. .NET framework Class Library
.NET framework Class Library types include items such as primitive data types, I/O functions, data access and security. The .NET framework provides a host of utility classes and members organised within a hierarchy called a namespace. At the root of the hierarchy is the “System” namespace. A namespace groups classes and members into logical nodes, so that the same method name can be used in more than one namespace.
3.3.5. ASP.NET Web Services
ASP.NET web services are implemented using the SOAP protocol and allow developers to easily develop SOAP based applications. ASP.NET web services are simple to build, test and deploy. The web services protocol stack shown in Figure 3.2 contains SOAP, which uses HTTP and XML to make remote procedure calls across the network. To create ASP.NET pages, languages supported by the .NET framework like VB.NET, C#, “Managed” C++ or JScript.NET should be used, because the web page is compiled into a DLL. To run ASP.NET pages, the IIS (Internet Information Services) web server is required. ASP.NET uses HTML, CSS, JavaScript and server scripting for building websites.
Figure 3.2 Web services protocol stack
ASP.NET supports three different development models for building websites: Web Pages, Model View Controller (MVC) and Web Forms. MVC is a framework for building websites in which
the Model represents the application core, i.e., a list of database records,
the View displays the data, i.e., the database records, and
the Controller handles the input to the database records.
The MVC model manages HTML, CSS and JavaScript.
[Figure 3.2 layers: Layer 4 UDDI (Service Discovery); Layer 3 WSDL (Service Description); Layer 2 XML Messaging (XML, SOAP); Layer 1 Transport (HTTP, SMTP, FTP).]
3.4. J2EE framework
Figure 3.3 J2EE framework (source: internet)
The J2EE framework shown in Figure 3.3 contains a container in which the Java Virtual Machine (JVM) exists. The container aids developers in building enterprise applications effectively by providing a set of Java technologies like JNDI, JDBC, JAAS, JTA, JMX, Java Mail, EJB, JMS, CORBA, SOAP, RMI, Servlets, JSP, XML etc.
3.4.1. J2EE Platform
J2EE platform is a Java distributed application server environment which
provides the following:
A set of Java extension APIs to build applications. These APIs define the programming model for J2EE applications.
A run-time infrastructure for hosting and managing applications. This is the server runtime in which the application resides.
3.4.2. J2EE Runtime
Server side resources are scarce and require special attention. Some of these resources include threads, database connections, security, transactions etc. Building custom infrastructure that deals with these resources is always a challenge. Since these server side requirements are common across a wide variety of applications, it is more appropriate to use a platform that has built-in solutions. This separates infrastructure-level concerns from the more direct concern of translating application requirements into working software. The J2EE runtime addresses such concerns. J2EE does not specify the nature and structure of the runtime; instead it introduces the container and, via the J2EE APIs, specifies a contract between containers and applications.
3.4.3. J2EE APIs (Application Program Interfaces)
Distributed applications require access to a set of enterprise services. Typical services include transaction processing, database access, messaging, multithreading, etc. The J2EE architecture provides access to such services through its enterprise service Application Program Interfaces, APIs. Instead of having to access these services through proprietary or non-standard interfaces, application programs in J2EE can access these APIs via a container.
A typical commercial J2EE platform (or J2EE application server) includes one or more containers and access to the enterprise APIs specified by J2EE. The Java standard extensions that the J2EE 1.3 platform supports are:
JDBC 2.0
Enterprise Java Beans (EJB) 2.0
Java Servlets 2.3
Java Server Pages (JSP) 1.2
Java Message Service (JMS) 1.0
Java Transaction API (JTA) 1.0
Java Mail 1.2
JavaBeans Activation Framework (JAF) 1.0
Java API for XML Parsing (JAXP) 1.1
Java Connector architecture (JCA) 1.0 and
Java Authentication and Authorization Service (JAAS) 1.0.
J2SE APIs that J2EE 1.3 supports are:
Java Interface Definition Language (IDL) API
JDBC Core API
RMI-IIOP API and
JNDI API.
3.4.4. J2EE Technologies
A collection of technologies provides the mechanics needed to build large distributed enterprise applications:
Component technologies
These technologies hold the most important part of the application – the business logic. There are three types of components: JSP pages, servlets and Enterprise Java Beans.
Service technologies
These technologies provide the application’s components with supporting services so that they function efficiently.
Communication technologies
These technologies, which are mostly transparent to the application programmer, provide mechanisms for communication among different parts of the application, whether local or remote.
3.4.5. Java Server Pages
Java Server Pages (JSP) embed components in a web page that is sent to the client. A JSP page contains Hypertext Markup Language (HTML), Java code and Java Bean components. JSP pages are an extension of the servlet programming model. When a user requests a JSP page, the web container compiles the JSP page into a servlet. The web container then invokes the servlet and returns the resulting content to the web browser. Once the servlet has been compiled from the JSP page, the web container can simply return the servlet without having to recompile it each time. Thus JSP pages provide a powerful and dynamic page assembly mechanism that benefits from the many advantages of the Java platform.
Compared to servlets, which are pure Java code, JSP pages are text-based documents until the web container compiles them into corresponding servlets. This allows a clearer separation of the application logic from the presentation logic, which in turn allows application developers to concentrate on business matters and web designers to concentrate on the presentation logic. A typical architecture of a JSP page is shown in Figure 3.4.
Figure 3.4 Architecture of a JSP page (source: internet)
3.4.6. J2EE Service Technologies
J2EE framework includes the following service technologies:
Java Database Connectivity, JDBC, provides the developer with the ability to connect to relational database systems. The JDBC API has features such as connection pooling and distributed transactions.
The Java Transaction API (JTA) and the Java Transaction Service (JTS) provide a means for working with transactions, especially distributed transactions, independently of the transaction manager’s implementation.
The Java Naming and Directory Interface (JNDI) API in the J2EE platform has a twofold role:
Firstly, it provides the means to perform standard operations on directory service resources such as LDAP, Novell Directory Services or Netscape Directory Services.
Secondly, a J2EE application utilises JNDI to look up interfaces used to create, among other things, EJBs and JDBC connections.
Java Message Service (JMS) is a mechanism for sending data asynchronously. It provides the functionality to send and receive messages through the use of Message-Oriented Middleware (MOM).
Java Mail is an API that abstracts the facilities for sending and receiving e-mail. Java Mail supports the most widely used Internet mail protocols such as IMAP4, POP3 and SMTP, but compared to JMS it is slower and less reliable.
Java Connector Architecture (JCA) is a standardised means of accessing a variety of legacy applications, typically ERP systems such as SAP R/3 and PeopleSoft, and provides “plug-and-play” components to access legacy systems.
Java Authentication and Authorization Service (JAAS) provides a means to grant permissions based on who is executing the code. JAAS utilises a pluggable architecture of authentication modules, so that one can drop in modules based on different authentication implementations such as Kerberos or PKI.
3.4.7. Some popular Java editors and IDEs
Some well known Java editors used for writing Java source code are Emacs and JEdit. Popular integrated development environments (IDEs) are Eclipse, Borland JBuilder and JCreator.
To summarise:
J2EE is a container-centric architecture which provides a simple runtime and several levels of abstraction.
J2EE recognises the need for composing components into modules and modules into applications. This is an attempt to standardise the reuse of application components and modules.
J2EE represents a very intuitive approach to building applications. While the design process is top-down, the deployment process is bottom-up: a composition process that composes modules from components and applications from modules.
3.5. A Comparative Study of .NET and J2EE
J2EE and Microsoft .NET are directly competing software platforms designed to build and run complex enterprise applications. To obtain a better understanding of these platforms, comparative studies were done by several organisations like IBM (2004). Today both dominate the IT industry, and the superiority of one over the other varies from time to time and place to place. Analysts do not believe that there will be one winner: organisations will deploy both, depending on the type of applications they deliver. Presently J2EE and .NET are taking 40% of the world market share.
Comparative studies on various features of these two platforms (Iqbal Asad, Ullah Naeem (2010)), including web services (Miller Gerry (2003)), are summarised in Table 3.2 below (Samtani Gunjan, Sadhwani Dimple (2004)):
Table 3.2 Comparative study of various features of the .NET and J2EE platforms
Feature | .NET | J2EE
Web presentation | ASP.NET, IIS server | JSP/Servlets, Tomcat server, WebLogic server etc.
Business services | .NET components | EJBs
Web services | XML, WSDL, SOAP, UDDI, WS-I compatibility | Full support, WS-I compatibility in release 1.4
Mobile applications | .NET Compact Framework | J2ME
DB integration | ADO.NET | EJB QL/JDBC
Messaging integration | MSMQ | Message EJBs/JMS
Legacy integration | COM TI | JCA
Programming language | C#.NET, VB.NET, C++.NET, others for CLS | Java
Interpreted language | MSIL | Java bytecode
Runtime environment | CLR | JVM/JRE
Class libraries | .NET framework | Java Class Libraries
Rich client | Windows Forms | AWT/Swing (J2SE)
A few limitations and missing capabilities are given in Table 3.3.
Table 3.3 Limitations and missing capabilities of J2EE vs. .NET
J2EE | .NET
Java Transaction Service (JTS) | Interoperate with COM+ services
Procedural transactions via JTA | Limited declarative-only capabilities
Container-Managed Persistence | Program it
Message-Driven Beans | Build with queued components
Java Database Connectivity (JDBC) | Different APIs for each ADO.NET provider
Java Naming & Directory Interface (JNDI) | Build it
JCA standard adapters and services | Build it
JMS to other, non-native platforms | Get a bridge
3.6. Business intelligence and Data mining
Business Intelligence (BI) adopts data mining techniques (Chris Rygielski,
Jyun-Cheng Wang, David Yen.C. (2002)) such as classification, clustering,
decision trees, prediction, neural networks etc. BI provides visibility, clarity and
insight into the data. Business intelligence consists of tools like data mining,
data marts and decision support systems. BI provides enterprise integration
and web services. BI supports powerful enterprise and web based reporting
features.
Data mining refers to extracting or “mining” knowledge from large amounts of
data (Han.J, Kamber.M. (2006)). Data mining is also treated as a synonym for
Knowledge Discovery in Databases, KDD (Imielinski.T., Mannila.H. (1996)).
Knowledge discovery (Hong Tzung Pei, Huang Tzu Jung, Chang Chao Sheng (2009))
is a process consisting of an iterative sequence of the following steps:
1. Data cleaning – removing noise and inconsistent data.
2. Data Integration – combining multiple data sources.
3. Data Transformation - consolidation of data into various forms
appropriate for mining by performing operations like summary or
aggregation operations.
4. Data mining – applying intelligent methods in order to extract data
patterns
5. Pattern Evaluation – identifying the truly interesting patterns representing
knowledge based on some interesting measures.
6. Knowledge Presentation – visualisation and knowledge representation of
mined knowledge to the user.
Data mining involves integration of techniques from multiple disciplines such
as:
Database and data warehouse technology
Statistics
Machine learning
High-performance computing
Pattern recognition
Neural networks
Data visualization
Information retrieval
Image and signal processing
Spatial or temporal data analysis.
The data mining techniques used in this research are:
Association rules
Apriori algorithm
k-means algorithm
Fuzzy and rough set approaches.
These data mining techniques are used in CRM for consumer segmentation
etc. (Tsiptsis Konstantinos, Chorianopoulos Antonios (2009), Zhang
Limei (2010)). They form part of business intelligence and are applied in
CRM (Habul.A. (2010), Wu Kun, Liu Feng ying (2010)).
3.6.1. Association Rules
Let I = {I1, I2, …, Im} be a set of items and let D be a set of database
transactions, where each transaction T is a set of items such that T ⊆ I.
Let A be a set of items. A transaction T is said to contain A if and only
if A ⊆ T.

An association rule is an implication of the form A => B, where A ⊂ I, B ⊂ I,
and A ∩ B = Ø.

The support s is the percentage of transactions in D that contain A ∪ B, and
the confidence c is the percentage of transactions in D containing A that also
contain B. In terms of conditional probability,

support (A=>B) = P (A ∪ B) and confidence (A=>B) = P (B|A) ………... (3.4)

This establishes the relation between confidence and support:

confidence (A=>B) = P (B|A) = support (A ∪ B) / support (A)
= support_count (A ∪ B) / support_count (A) ……….. (3.5)
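As a small illustration of equations (3.4) and (3.5), the sketch below computes support and confidence directly from a transaction list. The class and method names (AssocMetrics, supportCount, etc.) are illustrative, not part of the thesis implementation.

```java
import java.util.*;

// Illustrative sketch: support and confidence of a rule A => B over a
// transaction database, following equations (3.4) and (3.5).
public class AssocMetrics {

    // support_count(X): number of transactions containing every item of X
    static int supportCount(List<Set<String>> db, Set<String> x) {
        int count = 0;
        for (Set<String> t : db)
            if (t.containsAll(x)) count++;
        return count;
    }

    // support(A => B) = P(A u B): fraction of transactions containing A u B
    static double support(List<Set<String>> db, Set<String> a, Set<String> b) {
        Set<String> ab = new HashSet<>(a);
        ab.addAll(b);
        return (double) supportCount(db, ab) / db.size();
    }

    // confidence(A => B) = support_count(A u B) / support_count(A)
    static double confidence(List<Set<String>> db, Set<String> a, Set<String> b) {
        Set<String> ab = new HashSet<>(a);
        ab.addAll(b);
        return (double) supportCount(db, ab) / supportCount(db, a);
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(
            Set.of("I1", "I2", "I5"),
            Set.of("I2", "I4"),
            Set.of("I2", "I3"),
            Set.of("I1", "I2", "I4"));
        // {I1} => {I2}: I1 u I2 occurs in 2 of 4 transactions; I1 occurs in 2
        System.out.println(support(db, Set.of("I1"), Set.of("I2")));    // 0.5
        System.out.println(confidence(db, Set.of("I1"), Set.of("I2"))); // 1.0
    }
}
```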
3.6.2. Association Rule Mining
Association rule mining is a two-step process:
1. Find all frequent itemsets, i.e., itemsets meeting a predetermined
minimum support, min_sup.
2. Generate strong association rules from the frequent itemsets, i.e., rules
that satisfy both min_sup and a minimum confidence, min_conf.
3.6.3. Apriori Algorithm
Apriori property: all non-empty subsets of a frequent itemset must also be
frequent. This property is used in the Apriori algorithm. The steps involved
in the algorithm are:

1. The join step: To find Lk, a set of candidate k-itemsets, denoted Ck, is
generated by joining Lk-1 with itself. Let l1 and l2 be itemsets in Lk-1. The
notation li[j] refers to the jth item in li. By convention, Apriori assumes
that items within a transaction or itemset are sorted in lexicographic order.
For a (k-1)-itemset li, this means that the items are sorted such that
li[1] < li[2] < … < li[k-1]. The join of Lk-1 with itself is performed, where
members of Lk-1 are joinable if their first (k-2) items are in common. That
is, members l1 and l2 of Lk-1 are joined if (l1[1] = l2[1]) and (l1[2] = l2[2])
and … and (l1[k-2] = l2[k-2]) and (l1[k-1] < l2[k-1]). The condition
l1[k-1] < l2[k-1] simply ensures that no duplicates are generated. The
resulting itemset formed by joining l1 and l2 is {l1[1], l1[2], …, l1[k-2],
l1[k-1], l2[k-1]}.
2. The prune step: Ck is a superset of Lk; that is, its members may or may
not be frequent, but all frequent k-itemsets are included in Ck. A scan of
the database to determine the count of each candidate in Ck would result in
the determination of Lk (i.e., all candidates having a count no less than the
min_sup count are frequent by definition and therefore belong to Lk). Ck,
however, can be huge, and this could involve heavy computation. To reduce the
size of Ck, the Apriori property is used: any (k-1)-itemset that is not
frequent cannot be a subset of a frequent k-itemset. Hence, if any
(k-1)-subset of a candidate k-itemset is not in Lk-1, then the candidate
cannot be frequent either, and so can be removed from Ck. This subset testing
is done quickly by maintaining a hash tree of all frequent itemsets.
Mining of frequent patterns can also be done without candidate generation
(Han.J., Pei.J, Yin.Y. (2000)) and by other techniques such as the “Pincer
Search” algorithm (Lin D-I., Kedem Z.M. (2002)).
3.6.4. Implementation of Apriori Algorithm
The pseudo-code for the Apriori algorithm is given below. The process of
applying it to the sample consumer transactions given in Table 3.4 is shown
in Figure 3.6. The Apriori algorithm has also been improved for mining
association rules (Liu.Y., Yang.B. (2007)).

Join step: Ck is generated by joining Lk-1 with itself.
Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a
frequent k-itemset.

Pseudo-code:
1. Ck: candidate itemset of size k
2. Lk: frequent itemset of size k
3. L1 = {frequent 1-itemsets};
4. for (k = 1; Lk != Ø; k++) do begin
5.     Ck+1 = candidates generated from Lk;
6.     for each transaction t in the database do
7.         increment the count of all candidates in Ck+1 that are contained in t;
8.     Lk+1 = candidates in Ck+1 with min_sup;
9. end
10. return ∪k Lk;
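The pseudo-code above can be sketched as a compact, runnable Java version. This is an illustrative implementation (the class Apriori and its helper names are assumptions, not the thesis code); the transactions in the usage example follow the nine transactions T100-T900 implied by the TID-sets of Table 3.5.

```java
import java.util.*;

// Illustrative Apriori sketch: items are strings, candidate itemsets are
// built by the join step and filtered by the prune step described above.
public class Apriori {

    // all frequent itemsets with support count >= minSup
    static Map<Set<String>, Integer> run(List<Set<String>> db, int minSup) {
        Map<Set<String>, Integer> result = new HashMap<>();
        Map<Set<String>, Integer> current = count(db, candidates1(db), minSup);
        while (!current.isEmpty()) {
            result.putAll(current);
            Set<Set<String>> next = join(current.keySet()); // join + prune
            current = count(db, next, minSup);
        }
        return result;
    }

    // candidate 1-itemsets: every item that occurs in some transaction
    static Set<Set<String>> candidates1(List<Set<String>> db) {
        Set<Set<String>> c = new HashSet<>();
        for (Set<String> t : db)
            for (String item : t) c.add(new TreeSet<>(Set.of(item)));
        return c;
    }

    // join step: merge pairs whose union has exactly one extra item;
    // prune step: drop candidates with an infrequent (k-1)-subset
    static Set<Set<String>> join(Set<Set<String>> lk) {
        Set<Set<String>> ck = new HashSet<>();
        for (Set<String> a : lk)
            for (Set<String> b : lk) {
                TreeSet<String> u = new TreeSet<>(a);
                u.addAll(b);
                if (u.size() == a.size() + 1 && allSubsetsFrequent(u, lk))
                    ck.add(u);
            }
        return ck;
    }

    static boolean allSubsetsFrequent(Set<String> cand, Set<Set<String>> lk) {
        for (String item : cand) {
            TreeSet<String> sub = new TreeSet<>(cand);
            sub.remove(item);
            if (!lk.contains(sub)) return false;
        }
        return true;
    }

    // scan the database and keep candidates whose count reaches minSup
    static Map<Set<String>, Integer> count(List<Set<String>> db,
                                           Set<Set<String>> cands, int minSup) {
        Map<Set<String>, Integer> freq = new HashMap<>();
        for (Set<String> c : cands) {
            int n = 0;
            for (Set<String> t : db) if (t.containsAll(c)) n++;
            if (n >= minSup) freq.put(c, n);
        }
        return freq;
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(
            Set.of("I1", "I2", "I5"), Set.of("I2", "I4"), Set.of("I2", "I3"),
            Set.of("I1", "I2", "I4"), Set.of("I1", "I3"), Set.of("I2", "I3"),
            Set.of("I1", "I3"), Set.of("I1", "I2", "I3", "I5"),
            Set.of("I1", "I2", "I3"));
        System.out.println(run(db, 2).get(Set.of("I1", "I2", "I3"))); // 2
    }
}
```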
Table 3.4 Sample consumer transactions
Figure 3.6 Generating frequent itemsets with min_sup 2 using Apriori Algorithm (source: Han.J. , Kamber.M.)
Mining of frequent itemsets can also be done using a vertical data format and
a projection-based approach (Lan Guo-cheng, Hong Tzung-Pei, Tseng Vincent S.
(2012)). The sample consumer transactions given in Table 3.4 are rearranged
in vertical format in Table 3.5 (Seno.M. and Karypis.G. (2001)). In the
vertical data format, consumer transaction data is analysed with respect to
the nature of the transactions, which helps the enterprise to increase its
profitability.
Table 3.5 Consumer transactions in vertical data format (source: Han.J., Kamber.M.)

Itemset | TID_set
I1      | {T100, T400, T500, T700, T800, T900}
I2      | {T100, T200, T300, T400, T600, T800, T900}
I3      | {T300, T500, T600, T700, T800, T900}
I4      | {T200, T400}
I5      | {T100, T800}

3.6.5. Generating Association Rules from Frequent Itemsets

Once the frequent itemsets from the transactions in the database D are found,
the next step is to generate strong association rules from them (where strong
association rules satisfy both minimum support and minimum confidence). This
is done using the confidence formula given in equation (3.6):

confidence (A=>B) = P (B|A) = support (A ∪ B) / support (A)
= support_count (A ∪ B) / support_count (A) ………. (3.6)
The conditional probability is expressed in terms of itemset support counts,
where support_count (A ∪ B) is the number of transactions containing the
itemset A ∪ B, and support_count (A) is the number of transactions containing
the itemset A. Based on this equation, association rules are generated as
follows:

For each frequent itemset l, generate all non-empty proper subsets of l.
For every non-empty proper subset s of l, output the rule s => (l − s) if
[support_count (l) / support_count (s)] ≥ min_conf, where min_conf is the
minimum confidence threshold.

Because these rules are generated from frequent itemsets, each one
automatically satisfies the minimum support. Frequent itemsets can be stored
ahead of time in hash tables along with their counts so that they can be
accessed quickly.
Association rule mining has also been implemented using several other
techniques, such as the FP-tree (Saravanabhavan.C., Parvathi.R.M.S. (2011)),
and for the mining of rare rules (Selvi Kanimozhi.C.S., Tamilarasi.A. (2011)).
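The rule-generation procedure above can be sketched as follows. The class and method names are illustrative; the counts in the usage example follow the TID-sets of Table 3.5 (e.g., support_count({I1, I2, I5}) = 2).

```java
import java.util.*;

// Illustrative sketch: generate strong rules s => (l - s) from one
// frequent itemset l, keeping rules whose confidence
// support_count(l) / support_count(s) reaches min_conf.
public class RuleGen {

    static List<String> rules(Map<Set<String>, Integer> counts,
                              Set<String> l, double minConf) {
        List<String> out = new ArrayList<>();
        for (Set<String> s : properNonEmptySubsets(l)) {
            double conf = (double) counts.get(l) / counts.get(s);
            if (conf >= minConf) {
                Set<String> rhs = new TreeSet<>(l);
                rhs.removeAll(s);
                out.add(s + " => " + rhs + " (conf " + conf + ")");
            }
        }
        return out;
    }

    // all proper, non-empty subsets of l (exponential: fine for small l)
    static List<Set<String>> properNonEmptySubsets(Set<String> l) {
        List<String> items = new ArrayList<>(l);
        List<Set<String>> subs = new ArrayList<>();
        for (int mask = 1; mask < (1 << items.size()) - 1; mask++) {
            Set<String> s = new TreeSet<>();
            for (int i = 0; i < items.size(); i++)
                if ((mask & (1 << i)) != 0) s.add(items.get(i));
            subs.add(s);
        }
        return subs;
    }

    public static void main(String[] args) {
        Map<Set<String>, Integer> counts = new HashMap<>();
        counts.put(Set.of("I1"), 6);
        counts.put(Set.of("I2"), 7);
        counts.put(Set.of("I5"), 2);
        counts.put(Set.of("I1", "I2"), 4);
        counts.put(Set.of("I1", "I5"), 2);
        counts.put(Set.of("I2", "I5"), 2);
        counts.put(Set.of("I1", "I2", "I5"), 2);
        // rules from l = {I1, I2, I5} with min_conf = 50%
        rules(counts, Set.of("I1", "I2", "I5"), 0.5).forEach(System.out::println);
    }
}
```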
3.6.6. Correlation Analysis
A correlation measure can be used to augment the support-confidence framework
of association rules. This leads to correlation rules, which are measured by
support, confidence and correlation, as shown in equation (3.7):

A => B [support, confidence, correlation] ……………….. (3.7)

If P (A ∪ B) = P (A) P (B), the occurrence of itemset A is independent of the
occurrence of itemset B; otherwise the two are dependent and correlated. The
lift is then defined as

lift (A, B) = P (A ∪ B) / P (A) P (B) ………………………… (3.8)

If lift < 1, A and B are negatively correlated; if lift > 1, they are
positively correlated; and if lift = 1, A and B are independent and there is
no correlation between them. Hence correlation analysis can be used to filter
out uninteresting association rules.

Correlation analysis is also performed using the chi-square (χ²) measure,
from which it can be determined whether given itemsets are negatively
correlated or not:

χ² = ∑ (observed value − expected value)² / expected value ……… (3.9)
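The two measures can be sketched directly from equations (3.8) and (3.9). The probabilities and counts below are illustrative values, not data from the thesis.

```java
// Illustrative sketch: lift per equation (3.8) and the chi-square
// statistic per equation (3.9).
public class Correlation {

    // lift(A,B) = P(A u B) / (P(A) P(B)): 1 independent, >1 positive, <1 negative
    static double lift(double pAB, double pA, double pB) {
        return pAB / (pA * pB);
    }

    // chi-square = sum over cells of (observed - expected)^2 / expected
    static double chiSquare(double[] observed, double[] expected) {
        double sum = 0;
        for (int i = 0; i < observed.length; i++) {
            double d = observed[i] - expected[i];
            sum += d * d / expected[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(lift(0.25, 0.5, 0.5)); // 1.0: independent
        System.out.println(lift(0.40, 0.5, 0.5)); // > 1: positively correlated
        System.out.println(chiSquare(new double[]{10, 20}, new double[]{15, 15}));
    }
}
```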
3.6.7. k-means Clustering
k-means clustering is an iterative algorithm in which items are moved among
sets of clusters until a desired clustering is reached. As such, it can be
viewed as a type of squared-error algorithm, although the convergence
criterion need not be defined in terms of squared error. A high degree of
similarity among elements within a cluster is obtained, while a high degree
of dissimilarity among elements in different clusters is achieved
simultaneously.
The cluster mean of ki = {ti1, ti2, …, tim} is defined as

mi = (1/m) ∑j=1..m tij ………………………………….(3.10)

This definition assumes that each tuple has only one numeric value, as
opposed to a tuple with many attribute values. The k-means algorithm assumes
that some definition of cluster mean exists, but not a particular one.
The algorithm assumes that the desired number of clusters, k, is an input
parameter. The k-means algorithm is shown below. Initial values for the means
are assigned arbitrarily, either at random or by using the values of the
first k input items. The algorithm stops, for example, when no (or only a
very small number of) tuples are reassigned to different clusters. Other
termination criteria, such as a fixed number of iterations, are also used; a
maximum number of iterations is included to ensure stopping even without
convergence.
The complexity of k-means is O(tkn), where t is the number of iterations.
k-means finds a local optimum and may miss the global optimum. It does not
work on categorical data, because the mean must be defined on the attribute
type, and only convex-shaped clusters are found. It also does not handle
outliers well. One variation of k-means, k-modes, does handle categorical
data: instead of means, it uses modes. A typical value for k varies from 2 to
10. Although k-means often produces good results, it is not time efficient
and does not scale well. By saving distance information from one iteration to
the next, the actual number of distance calculations that must be made can be
reduced.
Input:
D = {t1, t2, …, tn} // set of elements
k                   // number of desired clusters
Output:
K                   // set of k clusters
k-means algorithm:
assign initial values for the means m1, m2, …, mk;
repeat
    assign each item ti to the cluster with the closest mean;
    calculate a new mean for each cluster;
until the convergence criteria are met;
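The pseudo-code above can be sketched as a small one-dimensional k-means in Java (a sketch with illustrative data; real consumer segmentation would use multi-dimensional attribute vectors).

```java
import java.util.*;

// Illustrative one-dimensional k-means following the pseudo-code above:
// assign each item to the closest mean, recompute means, repeat until
// the means stop moving or the iteration cap is reached.
public class KMeans {

    static double[] cluster(double[] data, double[] initialMeans, int maxIter) {
        double[] means = initialMeans.clone();
        for (int iter = 0; iter < maxIter; iter++) {
            double[] sum = new double[means.length];
            int[] count = new int[means.length];
            // assignment step: each item goes to the cluster with the closest mean
            for (double x : data) {
                int best = 0;
                for (int j = 1; j < means.length; j++)
                    if (Math.abs(x - means[j]) < Math.abs(x - means[best]))
                        best = j;
                sum[best] += x;
                count[best]++;
            }
            // update step: recompute each cluster mean; stop when nothing moved
            boolean changed = false;
            for (int j = 0; j < means.length; j++) {
                double m = count[j] == 0 ? means[j] : sum[j] / count[j];
                if (m != means[j]) { means[j] = m; changed = true; }
            }
            if (!changed) break;
        }
        return means;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 10, 11, 12};
        double[] means = cluster(data, new double[]{1, 12}, 100);
        System.out.println(Arrays.toString(means)); // [2.0, 11.0]
    }
}
```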
Since the original k-means implementation does not scale well on large
databases, its efficiency has been improved (Kanungo.T., Mount.D.M.,
Netanyahu.N.S., Piatko.C., Silverman.R., Wu A.Y. (2002), Mack Joun (2002)),
and the algorithm has since been modified several times for different
applications (Bagirov.A.M., Mardaneh.K. (2006)). For large databases,
clustering may also be performed using other techniques such as BIRCH
(Zhang.T., Ramakrishnan.R., Livny.M. (1996)) and CURE.
3.6.8. Fuzzy Set Approach
Fuzzy set theory (Zadeh (1975, 1976)), also known as possibility theory, was
proposed as an alternative to traditional two-valued logic and probability
theory. It lets one work at a high level of abstraction and offers a means of
dealing with imprecise measurement of data. Most importantly, fuzzy set
theory allows one to deal with vague or inexact data. Unlike traditional
“crisp” sets, where an element either belongs to a set or to its complement,
in fuzzy set theory elements can belong to more than one fuzzy set. Instead
of having a precise cutoff between categories, fuzzy logic uses truth values
between 0.0 and 1.0 to represent the degree of membership that a certain
value has in a given category. Each category then represents a fuzzy set.
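A fuzzy membership function can be sketched as follows. The "loyal consumer" category and its boundaries are illustrative assumptions, used only to show how a degree of membership replaces a crisp cutoff.

```java
// Illustrative sketch of a fuzzy membership function: the degree of
// membership in a hypothetical "loyal consumer" category rises gradually
// between two illustrative boundaries instead of jumping at a crisp cutoff.
public class Fuzzy {

    // linear ramp: 0 at or below lo, 1 at or above hi, interpolated between
    static double membership(double x, double lo, double hi) {
        if (x <= lo) return 0.0;
        if (x >= hi) return 1.0;
        return (x - lo) / (hi - lo);
    }

    public static void main(String[] args) {
        // e.g. purchases per year mapped to a degree of "loyal"
        System.out.println(membership(2, 5, 20));    // 0.0
        System.out.println(membership(12.5, 5, 20)); // 0.5
        System.out.println(membership(25, 5, 20));   // 1.0
    }
}
```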
3.7. Rough Set Theory
The rough set philosophy rests on the assumption that some information (data,
knowledge) is associated with every object of the universe of discourse.
Objects characterised by the same information are indiscernible (similar) in
view of the available information about them. The indiscernibility relation
generated in this way is the mathematical basis of rough set theory
(Pawlak.Z (2002)). Any set of all indiscernible (similar) objects is called
an elementary set and forms a basic granule (atom) of knowledge about the
universe. Any union of elementary sets is referred to as a crisp (precise)
set; otherwise the set is rough (imprecise, vague).

Consequently, each rough set has boundary-line cases, i.e., objects which
cannot be classified with certainty either as members of the set or of its
complement. Obviously, crisp sets have no boundary-line elements at all:
boundary-line cases cannot be properly classified by employing the available
knowledge.
Thus, the assumption that objects can be seen only through the information
available about them leads to the view that knowledge has a granular
structure. Due to this granularity, some objects of interest cannot be
discerned and appear the same (or similar). As a consequence, vague concepts,
in contrast to precise concepts, cannot be characterised in terms of
information about their elements. Therefore, in this approach, one assumes
that any vague concept is replaced by a pair of precise concepts, called the
lower and upper approximations of the vague concept. The difference between
the upper and lower approximations constitutes the boundary region of the
vague concept. Approximations are the basic operations in rough set theory
(Silvia Rissino, Germano Lambert Torres (2009)).
3.7.1. Approximations
The starting point of rough set theory is the indiscernibility relation,
generated by information about the objects of interest. The indiscernibility
relation expresses the fact that, due to lack of knowledge, it is not
possible to discern some objects by employing the available information: in
general, one cannot deal with single objects but must consider clusters of
indiscernible objects, the fundamental concepts of rough set theory.

The indiscernibility relation is used to define the basic concepts of rough
set theory as follows. To every subset X of the universe U two sets, the
B-lower approximation B_*(X) and the B-upper approximation B^*(X), are
assigned:

B_*(X) = {x ∈ U: B(x) ⊆ X} ………………….(3.11)
B^*(X) = {x ∈ U: B(x) ∩ X ≠ Ø} ……………...(3.12)

Their difference,

BN_B(X) = B^*(X) − B_*(X) ………………………(3.13)

is referred to as the B-boundary region of X. If BN_B(X) = Ø, the boundary
region of X is empty and X is crisp (exact) w.r.t. B; in the opposite case,
i.e., if BN_B(X) ≠ Ø, X is referred to as rough (inexact) w.r.t. B.
Four basic classes of rough sets, i.e., four categories of vagueness, are
defined as follows:

a) If B_*(X) ≠ Ø and B^*(X) ≠ U, then X is roughly B-definable, i.e., it is
possible to decide for some elements of the universal set U whether they
belong to X or to −X, using B.

b) If B_*(X) = Ø and B^*(X) ≠ U, then X is internally B-indefinable, i.e.,
it is possible to decide whether some elements of U belong to −X, but it is
not possible to decide for any element of U, using B, whether it belongs to
X or not.

c) If B_*(X) ≠ Ø and B^*(X) = U, then X is externally B-indefinable, i.e.,
it is possible to decide for some elements of U whether they belong to X,
but it is not possible to decide for any element of U whether it belongs to
−X or not, using B.

d) If B_*(X) = Ø and B^*(X) = U, then X is totally B-indefinable, i.e., it
is not possible to decide for any element of U whether it belongs to X or to
−X, using B.
A rough set can also be characterised numerically by the coefficient

αB (X) = |B_*(X)| / |B^*(X)| ………………. (3.14)

called the accuracy of approximation, where |X| denotes the cardinality of
X ≠ Ø. Obviously 0 ≤ αB (X) ≤ 1. If αB (X) = 1, X is crisp (precise) w.r.t.
B; otherwise, if αB (X) < 1, X is rough (vague) w.r.t. B.
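Equations (3.11)-(3.14) can be sketched on a toy universe. Here the elementary sets (indiscernibility classes) are given directly as illustrative input rather than derived from attributes.

```java
import java.util.*;

// Illustrative sketch of B-lower/B-upper approximations and the accuracy
// coefficient, equations (3.11)-(3.14), over a toy universe whose
// elementary sets are supplied directly.
public class RoughSet {

    // B-lower: objects whose whole elementary set lies inside X
    static Set<Integer> lower(List<Set<Integer>> classes, Set<Integer> x) {
        Set<Integer> out = new TreeSet<>();
        for (Set<Integer> c : classes)
            if (x.containsAll(c)) out.addAll(c);
        return out;
    }

    // B-upper: objects whose elementary set intersects X
    static Set<Integer> upper(List<Set<Integer>> classes, Set<Integer> x) {
        Set<Integer> out = new TreeSet<>();
        for (Set<Integer> c : classes)
            if (!Collections.disjoint(c, x)) out.addAll(c);
        return out;
    }

    // accuracy alpha_B(X) = |lower| / |upper|, equation (3.14)
    static double accuracy(List<Set<Integer>> classes, Set<Integer> x) {
        return (double) lower(classes, x).size() / upper(classes, x).size();
    }

    public static void main(String[] args) {
        // U = {1..6}, elementary sets {1,2}, {3,4}, {5,6}; X = {1,2,3}
        List<Set<Integer>> classes =
            List.of(Set.of(1, 2), Set.of(3, 4), Set.of(5, 6));
        Set<Integer> x = Set.of(1, 2, 3);
        System.out.println(lower(classes, x));    // [1, 2]
        System.out.println(upper(classes, x));    // [1, 2, 3, 4]
        System.out.println(accuracy(classes, x)); // 0.5  (X is rough)
    }
}
```

Here the boundary region {3, 4} is non-empty, so X is rough w.r.t. the given classes, matching the definition above.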
3.7.2. Reduction and Significance of Attributes and Approximation Reducts

Rough set theory can be used to check the influence of a certain condition
attribute on a decision attribute.

Let C, D ⊆ A be sets of condition and decision attributes, respectively.
C′ ⊆ C is a D-reduct (reduct w.r.t. D) of C if C′ is a minimal subset such
that

γ (C, D) = γ (C′, D) ……………………(3.15)

This concept of reduct is applied to determine the influence of each
condition attribute on the decision attribute. A set of decision rules of the
form “if... then... else...” can also be determined based on reducts.
Significance of attributes and approximation reducts is determined as follows:
Let C and D be sets of condition and decision attributes, respectively, and
let α be a condition attribute, i.e., α ∈ C. The number γ (C, D) expresses
the degree of dependency between the attributes C and D, or the accuracy of
the approximation of U/D by C. To determine how the coefficient γ (C, D)
changes when the condition attribute α is removed, the difference between
γ (C, D) and γ (C − {α}, D) is computed. Normalising this difference defines
the significance of the attribute α by the following equation:

σ (C,D) (α) = (γ (C,D) − γ (C − {α}, D)) / γ (C,D)
= 1 − γ (C − {α}, D) / γ (C, D) ………… (3.16)

denoted σ (α) when C and D are given. Thus σ (α) can be understood as the
classification error which occurs when α is dropped. The significance
coefficient is extended to a subset B of C as

σ (C, D) (B) = (γ (C,D) − γ (C − B, D)) / γ (C,D)
= 1 − γ (C − B, D) / γ (C, D) ………. (3.17)

denoted σ (B) when C and D are given. If B is a reduct of C, then σ (B) = 1,
i.e., removing a reduct from the set of decision rules makes it impossible
to make decisions with certainty.
Any subset B of C is called an approximate reduct of C, and the number

ε (C, D) (B) = (γ (C, D) − γ (B, D)) / γ (C, D) = 1 − γ (B, D) / γ (C, D)
………………………. (3.18)
denoted simply ε (B), is called the error of reduct approximation. It
expresses how exactly the set of attributes B approximates the set of
condition attributes C. Obviously, ε (B) = 1 − σ (B) and ε (B) = 1 − ε (C − B).
For any subset B of C, ε (B) ≤ ε (C). If B is a reduct of C, then ε (B) = 0.

The concept of approximation reduct is a generalisation of the concept of
reduct: a minimal subset B of the condition attributes C such that
γ (C, D) = γ (B, D), or equivalently ε (C, D) (B) = 0, is a reduct. The idea
of an approximation reduct is useful in cases where a smaller number of
condition attributes is preferred over accuracy of classification.
Decision rules of the form “if condition then decision” are generated with
the reduct attributes, and variations of these “if… then…” rules are
determined based on the reducts. In this way rough set theory is quite useful
when the values in a relation are vague.
3.8. Scapegoat Trees and Max-Heaps
3.8.1. Scapegoat Trees
A scapegoat tree is a self-balancing binary search tree (BST). It was
originally discovered by Arne Andersson and rediscovered by Igal Galperin and
Ronald L. Rivest. It provides worst-case O(log n) lookup time and O(log n)
amortised insertion and deletion time.

A self-balancing binary search tree provides worst-case O(log n) lookup time,
but a scapegoat tree has no additional per-node memory overhead compared to a
regular binary search tree (Bentley.J.L. (1975)). In a scapegoat tree, a node
stores only a key and two pointers to its children, which makes scapegoat
trees easier to implement and, due to data structure alignment, reduces node
overhead by up to one-third.

When something goes wrong, the first thing people tend to do is find someone
to blame (the “scapegoat”). Once blame is assigned, the scapegoat is left to
fix the problem. The structure of the scapegoat tree is based on this common
wisdom.
A BST is weight-balanced if half its nodes are to the left of the root and
half to the right. A node is α-weight-balanced if it meets the following
conditions:

size (left) <= α * size (node) ……….. (3.19)
size (right) <= α * size (node) ……… (3.20)

where size is defined recursively as:

function size (node)
    if node = nil then return 0;
    else return size (node->left) + size (node->right) + 1;
end
If α = 1, this describes a linked list as balanced, and if α = 0.5 it matches
only almost complete binary trees (CBTs). An α-weight-balanced search tree
must also be α-height-balanced, that is

height (tree) <= log1/α (node count) ………………………..(3.21)

Scapegoat trees do not keep α-weight-balance at all times, but are loosely
α-height-balanced, such that

height (scapegoat tree) <= log1/α (node count) + 1 …………. (3.22)
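The height bounds can be checked numerically for the example of Figures 3.6 and 3.7 (with α = 2/3, so the bound is a base-3/2 logarithm). This is a sketch; log32 is a hypothetical helper, not part of the thesis code.

```java
// Numeric check of the scapegoat height bounds for the 10-node example:
// with alpha = 2/3 the height bound is log base 3/2 of the node count.
public class SgtBound {

    // log base 3/2 of x (hypothetical helper)
    static double log32(double x) {
        return Math.log(x) / Math.log(1.5);
    }

    public static void main(String[] args) {
        // q = n = 10: height 5 respects the bound log_{3/2} 10 ~= 5.679
        System.out.println(5 <= log32(10));   // true
        // after inserting 7 (n = 11), height 6 exceeds log_{3/2} 11 ~= 5.914,
        // so a scapegoat must be found and its subtree rebuilt
        System.out.println(6 > log32(11));    // true
    }
}
```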
For this reason scapegoat trees are similar to red-black trees, since both
restrict their height. They differ greatly, though, in how they determine
where rotations (or, in the case of scapegoat trees, rebalances) take place.
Red-black trees store additional 'colour' information in each node to
determine the location, whereas scapegoat trees find a “scapegoat” which
isn't α-weight-balanced and perform a rebalance operation there. This is
similar to AVL trees, where the actual rotations depend on the 'balances' of
nodes, but the means of determining balance differs greatly. Since AVL trees
check the balance value on every insertion/deletion, it is typically stored
in each node; scapegoat trees calculate it only as needed, i.e., only when a
scapegoat must be found.

In contrast to most other self-balancing search trees, scapegoat trees are
entirely flexible in their balancing. They support any α such that 0.5 < α < 1.
A high α value results in fewer rebalances, making insertion quicker but
lookups and deletions slower, and vice versa for a low α. In practical
applications, therefore, α is chosen depending on how frequently these
operations are performed. An illustration of a scapegoat tree is provided in
Figures 3.6 and 3.7.
Figure 3.6 A scapegoat tree with 10 nodes and height 5

Inserting 7 into the scapegoat tree increases its height to 6, which violates
the height condition, since 6 > log3/2 11 ≈ 5.914. A scapegoat is found at
the node containing 10.

Figure 3.7 Finding a scapegoat and inserting 7 at node 10
A scapegoat tree (SGT) is a binary search tree (Bentley.J.L. (1975)) that, in
addition to keeping track of the number of nodes n in the tree, also keeps a
counter, q, which maintains an upper bound on the number of nodes. At all
times, n and q obey the inequalities q/2 ≤ n ≤ q. In addition, an SGT has
logarithmic height at all times: the height of a scapegoat tree does not
exceed log3/2 q ≤ log3/2 2n < log3/2 n + 2. Even with this constraint, an SGT
can look unbalanced; the tree in Figure 3.6 has q = n = 10 and height
5 < log3/2 10 ≈ 5.679.

Finding a node (the find(x) operation) in a scapegoat tree is done using the
standard algorithm for searching in a BST. This takes time proportional to
the height of the tree, which is O(log n).
To implement the add(x) operation (adding a node), first increment n and q
and then use the standard algorithm for adding x to a binary search tree:
search for x and then add a new leaf u with u.x = x. At this point, the depth
of u must not exceed log3/2 q.
If depth(u) > log3/2 q, the tree height is reduced as follows. There is only
one node, namely u, whose depth exceeds log3/2 q. To fix u, walk from u back
up to the root looking for a scapegoat, w. The scapegoat w is an unbalanced
node with the property that size(w.child) / size(w) > 2/3, where w.child is
the child of w on the path from the root to u. It can be proved that such a
scapegoat exists; here this is taken for granted. Once the scapegoat w is
found, the subtree rooted at w is completely destroyed and rebuilt into a
perfectly balanced binary search tree. Even before the addition of u, w's
subtree was not a complete binary tree; therefore, when w is rebuilt, its
height decreases by at least 1, so that the height of the SGT is once again
at most log3/2 q.

If the cost of finding the scapegoat w and rebuilding the subtree rooted at
w is ignored, then the running time of add(x) is dominated by the initial
search, which takes O(log q) = O(log n) time. The true cost is established
using amortised analysis.
The implementation of remove(x) in an SGT is as follows: search for x and
remove it using the standard algorithm for removing a node from a BST. (This
can never increase the height of the tree.) Next, decrement n but leave q
unchanged. Finally, check if q > 2n and, if so, rebuild the entire tree into
a perfectly balanced BST and set q = n. Again, if the cost of rebuilding is
ignored, the running time of the remove(x) operation is proportional to the
height of the tree and is therefore O(log n). Subroutines for adding a node,
add(x), and removing a node, remove(x), are given below:
boolean add(T x) {
    // first do a basic BST insertion, keeping track of the depth
    Node<T> u = new Node<>(x);
    int d = addWithDepth(u);
    if (d > log32(q)) {                 // log32(q): log base 3/2 of q
        // depth exceeded: walk up to find a scapegoat
        Node<T> w = u.parent;
        while (3 * size(w) <= 2 * size(w.parent))
            w = w.parent;
        rebuild(w.parent);
    }
    return d >= 0;
}

boolean remove(T x) {
    if (super.remove(x)) {
        if (2 * n < q) {
            rebuild(r);                 // rebuild the whole tree at the root r
            q = n;
        }
        return true;
    }
    return false;
}

3.8.2. Max-Heaps

A max-heap is shown in Figure 3.8. It is a complete binary tree in which the
value in each internal node is greater than or equal to the values in the
children of that node. A min-heap is defined similarly. When a heap is stored
in an array, if a node is stored at index k, then its left child is stored at
index 2k+1 and its right child at index 2k+2.

Figure 3.8 A complete binary tree depicting a max-heap
Mapping the elements of a heap into an array is trivial: the root is stored
at index 0, and the children of the node at index k are stored at indices
2k+1 and 2k+2.
3.8.2.1. Building a Heap
A heap is a complete binary tree (CBT) and is efficiently represented using a
simple array. Given an array of N values, a heap is built by “sifting” each
internal node down to its proper location, as shown in Figure 3.10. The steps
to build a heap are as follows:

Start with the last internal node.
Swap the current internal node with its larger child, if necessary.
Follow the swapped node down.
Continue until all internal nodes are done.

Figure 3.10 Building a heap
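The steps above can be sketched as a bottom-up heap construction in Java (an illustrative sketch; method names such as siftDown are assumptions, not the thesis code).

```java
import java.util.*;

// Bottom-up max-heap construction following the steps above: sift each
// internal node down, starting from the last internal node.
public class BuildHeap {

    static void buildHeap(int[] a) {
        // the last internal node is the parent of the last element
        for (int i = a.length / 2 - 1; i >= 0; i--) siftDown(a, i, a.length);
    }

    // swap node i with its larger child until the heap property holds
    static void siftDown(int[] a, int i, int n) {
        while (2 * i + 1 < n) {
            int child = 2 * i + 1;                                 // left child
            if (child + 1 < n && a[child + 1] > a[child]) child++; // right is larger
            if (a[i] >= a[child]) break;
            int tmp = a[i]; a[i] = a[child]; a[child] = tmp;
            i = child;                                             // follow the swap down
        }
    }

    // verify the max-heap property: each parent >= both children
    static boolean isMaxHeap(int[] a) {
        for (int i = 0; 2 * i + 1 < a.length; i++) {
            if (a[i] < a[2 * i + 1]) return false;
            if (2 * i + 2 < a.length && a[i] < a[2 * i + 2]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 9, 7, 5, 8};
        buildHeap(a);
        System.out.println(Arrays.toString(a) + " heap? " + isMaxHeap(a));
        // [9, 7, 8, 1, 5, 3] heap? true
    }
}
```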
3.8.2.2. Cost of Building a Heap
Start with a CBT having N nodes; number of steps required for shifting values
down will be maximised if tree is full, in which case N = 2d-1 for some integer d
= [log N]. Cost of building a heap is illustrated in the Figure 3.11.
Figure 3.11 Cost of building a heap
It is proved that in general, level k of full and complete binary tree will contain
2k nodes, and that those nodes are d – k – 1 levels above the leaves. Thus in
worst case, the number of comparisons BuildHeap() will require in building
heap of N nodes is given by
………….(3.23)
Since, at the worst, there is one swap for each two comparisons, maximum
number of swaps is N – [log N]. Hence, building heap of N nodes is O (N) in
both comparisons and swaps.
3.8.2.3. Heap Sort

A list is sorted by first building it into a heap and then repeatedly
deleting the root node until the heap is empty. If the deleted roots are
stored in reverse order in an array, they end up sorted in ascending order
when a max-heap is used. The subroutine for heap sort is given below:

void HeapSort (int* List, int Size) {
    HeapT<int> toSort (List, Size);
    toSort.BuildHeap ();
    int Idx = Size - 1;
    while (!toSort.isEmpty ()) {
        List [Idx] = toSort.RemoveRoot ();
        Idx--;
    }
}
3.8.2.4. Cost of Heap Sort

Adding in the cost of building the heap, the total comparisons are

Total comparisons = (2N − 2⌈log N⌉) + (2N⌈log N⌉ + 2⌈log N⌉ − 4N)
= 2N⌈log N⌉ − 2N …………….. (3.24)

Total swaps = N⌈log N⌉ − N ………………(3.25)

So, in the worst case, heap sort is Θ(N log N) in both swaps and comparisons.
3.9. MUlticriteria Satisfaction Analysis (MUSA)
MUlticriteria Satisfaction Analysis (MUSA) is an ordinal regression method
for evaluating consumer satisfaction (Grigoroudis.E., Siskos.Y., Christina
Diakaki (2001)). The approach has its basis in the field of multicriteria
analysis (Nikolaos.F, Matsatsinis.E., Ioannidou.E., Grigoroudis (1999)). The
method assesses a set of partial satisfaction functions in such a way that
the overall satisfaction criterion becomes as consistent as possible with the
consumers' judgments. Thus, the main objective of the MUSA method is the
aggregation of individual judgments into a collective value function.

The MUSA method evaluates the global and partial satisfaction functions
(Joao Isabel M, Costa Carlos A Bana e, Figueria Jose Rui (2007)) Y* and X*i
respectively, given the consumers' judgments Y and Xi (for the i-th
criterion). The ordinal regression analysis equation has the following form:
Y* = ∑ (i = 1 to n) bi X*i ……..(3.26)
∑ (i = 1 to n) bi = 1 ……..(3.27)
where the value functions Y* and X*i are normalised in the interval [0, 100],
n is the number of criteria, and bi is the positive weight of the i-th
criterion. It is useful to assume a tree-like structure of the criteria, also
called a “value tree” or “value hierarchy”.
The MUSA method assumes an additive collective value function Y* and a set of
partial satisfaction functions X*i. The main objective of the method is to
achieve maximum consistency between the value function Y* and the consumers'
judgments Y. In order to reduce the size of the mathematical program, the
monotonicity constraints for Y* and X*i are removed using the following
transformation equations:
zm = y*(m+1) − y*m, m = 1, 2, …, α − 1 …………(3.28)
wik = bi x*i(k+1) − bi x*ik, k = 1, 2, …, αi − 1; i = 1, 2, …, n …………(3.29)
This preference disaggregation methodology also includes a post-optimality
analysis stage in order to overcome the problem of model stability
(Corazza.M., Funari.S., Gusso.R. (2012)). The final solution is obtained by
exploring the polyhedron of multiple or near-optimal solutions generated by
the constraints of the previous linear program. This solution is calculated
using n linear programs, equal in number to the criteria.
3.9.1. Satisfaction Indices
Estimation of a performance norm is very useful in consumer satisfaction
analysis. Average global and partial satisfaction indices are used for this
purpose and are evaluated through the equations:

S = (1/100) ∑ (m = 1 to α) pm y*m ……………………………….(3.30)
Si = (1/100) ∑ (k = 1 to αi) pik x*ik …….. (3.31)

where S and Si are the average global and partial satisfaction indices, and
pm and pik are the frequencies of consumers belonging to the ym and xik
satisfaction levels, respectively.
From equations (3.30) and (3.31) it follows that the average satisfaction
indices are basically the mean values of the global and partial satisfaction
functions. Hence, these indices give the average level of satisfaction
globally and per criterion.
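The mean-value reading of the satisfaction indices can be sketched as follows. The frequencies and value-function levels below are illustrative, not survey data from the thesis.

```java
// Average satisfaction index as the frequency-weighted mean of the
// (normalised, 0-100) satisfaction value function, i.e. the mean-value
// reading of the satisfaction indices above. Values are illustrative.
public class Musa {

    // p[m]: fraction of consumers at level m, y[m]: value of level m (0-100)
    static double satisfactionIndex(double[] p, double[] y) {
        double s = 0;
        for (int m = 0; m < p.length; m++) s += p[m] * y[m];
        return s / 100.0; // normalised to [0, 1]
    }

    public static void main(String[] args) {
        double[] p = {0.1, 0.2, 0.3, 0.4}; // level frequencies (sum to 1)
        double[] y = {0, 30, 70, 100};     // estimated value function
        System.out.println(satisfactionIndex(p, y)); // ~0.67
    }
}
```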
3.9.2. Demanding Indices
The global and partial satisfaction functions also indicate the consumers'
demanding level. Average global and partial demanding indices, D and Di
respectively, are estimated through the equations:
D = ∑ (m = 1 to α−1) (100(m−1)/(α−1) − y*m) / ∑ (m = 1 to α−1) 100(m−1)/(α−1) ………………………..(3.32)
Di = ∑ (k = 1 to αi−1) (100(k−1)/(αi−1) − x*ik) / ∑ (k = 1 to αi−1) 100(k−1)/(αi−1) .….... (3.33)
where α and αi are the number of satisfaction levels in the global and
partial satisfaction functions, respectively.
When these indices are normalized in the interval [-1, 1], the following
possible cases hold:
If D = 1 or Di = 1, then consumers have highest demanding index.
If D = 0 or Di = 0, then this case refers to “neutral” consumers.
If D = −1 or Di = −1, then consumers have lowest demanding index.
Demanding indices correspond to the average deviation of the estimated value functions from the "normal", i.e., linear, function. Average demanding indices are used for enhancing the consumer behaviour analysis. They can also indicate the extent of a company's improvement efforts: the higher the value of the demanding index, the more the satisfaction level should be improved in order to fulfill consumers' expectations.
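Equations 3.30 and 3.32 can be sketched as follows (a minimal illustration assuming a 5-level satisfaction scale; the frequencies and the value function below are made up for illustration, not taken from the survey data):

```java
public class MusaIndicesSketch {
    // S = (1/100) * sum_m p^m * y*^m  (eq. 3.30); p^m in %, y*^m in [0, 100]
    public static double satisfactionIndex(double[] freqPct, double[] value) {
        double s = 0;
        for (int m = 0; m < freqPct.length; m++) s += freqPct[m] * value[m];
        return s / 100.0;
    }

    // D = sum_m [100(m-1)/(a-1) - y*^m] / sum_m [100(m-1)/(a-1)]  (eq. 3.32)
    public static double demandingIndex(double[] value) {
        int a = value.length;                     // number of satisfaction levels
        double num = 0, den = 0;
        for (int m = 0; m < a - 1; m++) {         // levels 1 .. a-1 (0-based here)
            double linear = 100.0 * m / (a - 1);  // the "normal" (linear) function
            num += linear - value[m];
            den += linear;
        }
        return num / den;
    }

    public static void main(String[] args) {
        double[] freqPct = {5, 10, 20, 40, 25};   // % of consumers per level (made up)
        double[] value   = {0, 10, 30, 60, 100};  // estimated value function y* (made up)
        System.out.println(satisfactionIndex(freqPct, value));  // prints 56.0
        System.out.println(demandingIndex(value));              // ~0.33: demanding consumers
    }
}
```

A convex value function such as the one above yields a positive demanding index, matching the interpretation given for D above.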
Normalized variables b′_i and S′_i are evaluated as follows:
……….……..(3.34)
where b̄ and S̄ are the mean values of the criteria weights and the average satisfaction indices, respectively.
The Average Satisfaction Index (ASI) indicates the extent to which a consumer is satisfied with the enterprise, and the Average Demanding Index (ADI) indicates the extent to which an enterprise needs to improve to satisfy the consumer's demands. By improving satisfaction on the global and sub-criteria, the profitability of an enterprise is improved. These two indices also reveal new buying and selling opportunities, thereby increasing the loyalty of consumers and extending better services to them.
Apart from the above concepts, Bayesian classification and accuracy measures, along with the bagging and boosting algorithms (Han, J., Kamber, M. (2006)), were implemented in our research.
3.10. Summary:
Consumer satisfaction and loyalty are important factors of CRM for improving the profitability of a company. For building effective CRM software, .NET and J2EE are considered, as these two platforms are widely used in building enterprise software. A comparative study was taken up to analyse the features of these software platforms, and based on users' preference the appropriate platform (J2EE) was chosen to build the CRM software. To analyse consumer buying behaviour, the k-means and Apriori algorithms play an important role in business intelligence: the k-means algorithm is used for consumer segmentation, and the Apriori algorithm is used to study consumer buying behaviours. By integrating the MUSA method and rough set theory, consumer satisfaction is analysed in a detailed manner. Fuzzy set theory is implemented to analyse consumer loyalty.
4. METHODOLOGY
4.1. Overview
To study the identified objectives, Reliance fresh markets located in three places of Hanamkonda city of Andhra Pradesh state in India were taken into consideration. The research was divided into a survey process and the application of data mining algorithms to the sample data. Data was collected through various forms designed for each purpose. Depending on applicability, data was stored in databases, Excel spreadsheets or different file formats such as .csv (comma separated values), .arff (attribute-relation file format), etc. To analyse this data, various data mining techniques of business intelligence or visualisation techniques such as line graphs, pie charts, etc. were implemented. A few algorithms were modified to improve their efficiency.
4.2. Sampling procedures
A random sample of 100 or more consumers was taken in each case, depending on applicability and willingness to respond to the survey. The MUlticriteria Satisfaction Analysis (MUSA) method, used to analyse consumer satisfaction, was applied to more than 200 consumers in 3 places of Hanamkonda city. The survey was done on randomly selected consumers in the 3 Reliance fresh super markets located in 3 places of the city. To analyse consumer loyalty and service, 120 randomly selected consumers were considered.
4.3. Data Collection Techniques
Consumer profiles were collected by designing a consumer profile form shown
in Figure 4.1. A consumer website was designed to collect the consumer data
online. Consumer transaction data was collected from Reliance fresh staff and
was stored in an Excel spreadsheet.
Figure 4.1 Consumer profile form
MS Excel spreadsheet 2007, MS Access 2007 and Oracle 10g databases
were used to store the data.
To analyse user satisfaction with the .NET and J2EE platforms, a comparative study was done on various parameters common to both platforms.
Separate forms were designed to gather data on consumer satisfaction and
consumer service.
Which is your satisfaction level about the company? (Tick appropriate)
CS-Completely Satisfied, VS-Very Satisfied, S-Satisfied, D-Dissatisfied,
CD-Completely Dissatisfied
Figure 4.2 shows the survey questionnaire designed to collect consumer satisfaction levels for the global (main) criteria of the enterprise, namely the 4Ps: personnel, product, physical appearance and place. Figure 4.3 shows the survey questionnaire form designed to collect consumer satisfaction levels for the sub-criteria of each global criterion.
Figure 4.2 Survey questionnaire for main satisfaction criteria
Satisfaction Criteria        CS   VS   S   D   CD
1. Personnel
2. Product
3. Physical Appearance
4. Place
Which is your satisfaction level about the following of the company? (Tick appropriate)

                                 CS   VS   S   D   CD
Skills/Knowledge
Responsiveness
Friendliness
Quality
Quantity
Variety
Prices
Appearance of stores
Waiting time (busy hours)
Waiting time (non-busy hours)
Service time
Cleanliness
Location of stores
Number of stores
Parking

CS-completely satisfied, VS-very satisfied, S-satisfied, D-dissatisfied,
CD-completely dissatisfied
Figure 4.3 Survey questionnaire form for sub criteria satisfaction
Opinion on consumer service data was collected using the consumer service
form shown in Figure 4.4.
Figure 4.4 Consumer Service Survey Form
Consumer Service Survey
Customer Name:    Address:    Mobile Number:    Email:

Rate the following (1. Excellent 2. Good 3. Average 4. Fair 5. Poor):    1  2  3  4  5
Staff was available in a timely manner
Staff greeted you and offered to help you
Staff was friendly and cheerful throughout
Staff answered your questions
Staff showed knowledge of the products/services
Staff offered relevant advice
Staff was well-mannered throughout
Overall, how would you rate our customer service?

What did you like best about our customer service?
How could we improve our customer service?
Is there a staff person you would like to commend? Name:    Reason:
Do you prefer online consumer support? YES/NO
Thank you for completing our customer service survey.
4.4. Research Methodology
To identify “Media mix” for effective reach and quality networking, opinion of
around 200 consumers visiting Reliance fresh super market was taken. The
availability of better internet connectivity and wide use of mobile phones made
Reliance fresh opt for e-mail, SMS, live chat, e-newsletter and face to face
communication.
There are various communication channels used by businesses, and these channels are of vital importance in creating and sustaining the business. For a business, a physical presence is essential to present a friendly, contactable, open interaction so that the consumer feels comfortable. To gain the trust of a consumer, various online channels of communication that replace face-to-face interaction were considered. As a substitute for face-to-face interaction with consumers, opinion was taken on some of the most common channels, which are listed below:
e-mail is the most common and easiest way to communicate with consumers. When a potential consumer is interested in the products, he will e-mail a query, and the response is given in the form of text, images or file attachments. Orders can also be booked through e-mail, which also serves as evidence.
Short Message Service (SMS) is also an important channel for businesses to communicate and interact with consumers. Mobile phones revolutionised the communication habits of individuals, and organisations transformed SMS into a fundamental part of their daily activity. SMS messaging has become a communication channel that is fast, reliable, highly effective and instant. Its key benefit is establishing a one-to-one communication channel with the consumer that is fast, reliable and personal. A message is sent and received instantaneously, and about 70% of SMS messages are read instantly. Messages can be sent to one or thousands of recipients at the same time, in minutes, via bulk or group SMS. It has a lower cost than any comparable communication medium, saving money and time while improving the consumer experience.
Newsletters provide free information and encourage consumers to buy more products. A newsletter works as a great customer service and retention tool, giving customers satisfaction. Enterprises include new improvements to products and intelligent content to pull in customers.
Live chat is a novel and effective way to sell and buy online through the company's website. It gives consumers the sense of being able to communicate immediately and get responses to their queries, and it presents the business as proactive and technology-savvy. Additionally, it converts a casual web window shopper into a serious buyer more quickly due to the time he or she spends on the site. However, with this channel someone has to be constantly available at the other end. If support is available only at certain times, those times must be stated on the website so that people know when to come back and are not frustrated if they try to chat and find no one there.
In order to build these channels of communication and a consumer website for storing the data online, the user's (here the developer's) choice of software platform was identified through the comparative study. Satisfaction opinion on various parameters of the J2EE and .NET platforms was collected from various user segments, namely software executives, students and faculty who use these two platforms. To identify the preferred platform, the performance of J2EE and .NET was compared in our lab using the Load Runner tool by creating 100 virtual users on a sample application.
For identifying consumer requirements and opportunities that facilitate increases in profit margins, revenues, buying and selling, consumer segmentation was performed with the Weka tool.
k-means algorithm
Input:
D = {t1, t2, ..., tn} // set of elements
k // number of desired clusters
Output:
K // set of k clusters
k-means algorithm:
Assign initial values for means m1, m2, ..., mk;
Repeat
Assign each item ti to the cluster which has the closest mean;
Calculate the new mean for each cluster;
Until the convergence criterion is met;
The k-means algorithm shown in Figure 4.5 was implemented for segmentation. Consumer segmentation was done using the Weka tool's Explorer and Knowledge Flow features. For the Explorer option the .csv file format was used, and for Knowledge Flow the .arff file format was used.
Figure 4.5 k-means clustering algorithm
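The algorithm in Figure 4.5 can be sketched in Java for one-dimensional data as follows (an illustrative sketch only; the data points and initial means below are made up, not drawn from the transaction data):

```java
import java.util.*;

// Minimal k-means sketch following Figure 4.5.
public class KMeansSketch {
    public static int[] cluster(double[] data, double[] means, int maxIter) {
        int[] assign = new int[data.length];
        for (int iter = 0; iter < maxIter; iter++) {
            boolean moved = false;
            // Assign each item to the cluster with the closest mean
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int c = 1; c < means.length; c++)
                    if (Math.abs(data[i] - means[c]) < Math.abs(data[i] - means[best])) best = c;
                if (assign[i] != best) { assign[i] = best; moved = true; }
            }
            if (!moved && iter > 0) break;  // convergence: no object changed cluster
            // Recalculate the mean of each cluster
            for (int c = 0; c < means.length; c++) {
                double sum = 0; int n = 0;
                for (int i = 0; i < data.length; i++)
                    if (assign[i] == c) { sum += data[i]; n++; }
                if (n > 0) means[c] = sum / n;
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        double[] spend = {12, 15, 14, 80, 85, 90};  // e.g. monthly spend of 6 consumers (made up)
        double[] means = {12, 80};                  // k = 2 initial means
        System.out.println(Arrays.toString(cluster(spend, means, 100)));
        // prints [0, 0, 0, 1, 1, 1]: low spenders vs. high spenders
    }
}
```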
For large databases, the efficiency of the k-means clustering algorithm was improved by using scapegoat trees and max-heaps. This was essential since consumer transaction data is increasing day by day. For large databases, as the clusters grew, the running time of the traditional k-means algorithm increased, and providing fast, better-quality clusters became difficult. This is because of the frequent calculation of the Euclidean distance, which is necessary whenever a new cluster is formed or an object is classified into a cluster. Hence there was a need to improve the efficiency of the k-means algorithm for large databases, and this was achieved using scapegoat trees and max-heaps.
The main idea behind introducing max-heaps was to sort the products in
descending order of sales so as to find the most profitable items. Then to
introduce a new product or to replace a product which was not performing
well, scapegoat tree concept was introduced.
The proposed algorithm "KCUSTMH" (k-means clustering using scapegoat tree and max-heap), shown in Figure 4.6, aimed at reducing the computational overhead arising from unnecessary recalculation of distances between data objects and clusters in each iteration. The following procedure was adopted to implement the modified algorithm:
Proposed Algorithm
Initially, k data objects are chosen to serve as the centroids of k initial clusters, and the Euclidean distance of each data object from these centroids is calculated. In the next step, each data object is assigned to its nearest cluster based on the calculated Euclidean distance. Then an empty scapegoat tree is initialised, and into this tree the labels of the objects are inserted as keys, with a max-heap corresponding to each key as its value. Each max-heap in turn contains pairs of cluster labels and the distances of their centroids from the data object (the key). If, in an iteration, an object moves from one cluster to another, the centroids of these two clusters are recalculated and the new distances between these two clusters and the data objects are calculated. The old distances saved in the max-heaps are then replaced with these new distances, and the process continues: only the new distances corresponding to those clusters altered by the movement of data objects are calculated. In the next iteration, the maximum element of each max-heap, corresponding to each object put in the scapegoat tree as a key, is popped out. This popped-out element is a pair of a cluster label and the distance of its centroid from the object, and the cluster corresponding to this label acts as the new cluster for the object. Thus no recalculation of distances between objects and all clusters is required. Assume that a run of the k-means algorithm consists of only one iteration, and this iteration in turn consists of only one movement of a single object. The proposed algorithm in this case calculates the new distances corresponding to only the two clusters altered by the movement of the data object; the traditional version of the k-means algorithm, however, calculates the distance of each object from each cluster. As a result, this new version of the k-means algorithm provides a huge advantage in terms of time over the traditional k-means algorithm. The final step of the new algorithm ends in the same way as the traditional k-means algorithm, i.e., when no object moves from one cluster to another in an iteration.
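The per-object max-heap bookkeeping described above can be sketched with Java's PriorityQueue (an illustrative fragment only; the cluster labels and distances below are made up, and the scapegoat tree that holds one such heap per object is omitted):

```java
import java.util.*;

// Sketch of the per-object max-heap used in KCUSTMH: each entry pairs a
// cluster label with the distance of that cluster's centroid from the object.
public class ClusterHeapSketch {
    public static int[] topEntry(int[] labels, double[] dists) {
        // PriorityQueue is a min-heap by default; reverse the comparator for a max-heap
        PriorityQueue<double[]> heap =
            new PriorityQueue<>((a, b) -> Double.compare(b[1], a[1]));
        for (int i = 0; i < labels.length; i++)
            heap.add(new double[]{labels[i], dists[i]});
        double[] top = heap.poll();           // pop the maximum element
        return new int[]{(int) top[0], (int) top[1]};
    }

    public static void main(String[] args) {
        int[] clusterLabels = {0, 1, 2};          // made-up cluster labels
        double[] centroidDists = {4.0, 9.0, 2.5}; // made-up centroid distances
        System.out.println(Arrays.toString(topEntry(clusterLabels, centroidDists)));
        // pops cluster 1 (distance 9.0), the heap's maximum element
    }
}
```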
KCUSTMH algorithm
Input:
D = {t1, t2, ..., tn} // set of elements
k // number of desired clusters
Output:
K // set of clusters
KCUSTMH algorithm:
Assign initial values for means m1, m2, ..., mk;
Repeat
Initialise an empty scapegoat tree;
Fill the tree with object labels as keys and max-heaps as values;
Fill the max-heaps with pairs of cluster labels and the distance between the cluster and the key;
For each object
Repeat
Pop the topmost element, i.e., the maximum element of its corresponding max-heap;
If the cluster label contained in this element = the present cluster label of the object
then
do nothing;
else
Move the object into the cluster corresponding to the cluster label obtained;
Calculate the new centroids of the two clusters which have suffered alteration, i.e., the original and the new cluster of the object just moved;
Calculate the distances of each object from these two clusters' centroids;
Replace the old distances with these just-calculated distances;
Until no more objects;
Pop out the maximum element of each max-heap corresponding to the object put in the scapegoat tree as key;
This popped-out element is a pair of a cluster label and the distance of its centroid from the object;
Check the cluster corresponding to this cluster label;
If this cluster is the same as the original cluster of the object
then
do nothing;
else
move the object to the new cluster;
Until no object moves between clusters, i.e., convergence is met;

Figure 4.6 KCUSTMH algorithm (k-means clustering using scapegoat tree and max-heap)

A machine with 1 GB main memory and a 1.83 GHz dual-core processor running Windows XP Service Pack 2 was used to test the efficiency of KCUSTMH over the traditional k-means algorithm.
Once consumer segmentation was done, it was treated as a collection of clustered association rules ("if…then…" rules), which are used in decision making.
Consumer buying behaviours were studied using the Apriori algorithm given in Figure 4.7, implemented using the Weka data mining tool. Pseudo code for the Apriori algorithm is given in Figure 4.8.
Figure 4.7 Apriori algorithm for finding frequent itemsets
Pseudo code for implementing the Apriori algorithm is as follows:
Apriori algorithm
The steps involved in the Apriori algorithm are as follows:
• Join step: Ck is generated by joining Lk−1 with itself.
• Prune step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
• Pseudo-code:
Ck: candidate itemset of size k
Lk: frequent itemset of size k
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
Ck+1 = candidates generated from Lk;
For each transaction T in the database do
increment the count of all candidates in Ck+1 that are contained in T;
Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;

Figure 4.8 Pseudo code to implement Apriori algorithm

By identifying frequent itemsets, consumer requirements and opportunities that facilitate increases in profit margins, revenues, buying and selling are identified.
To improve consumer service and online consumer support systems, and to build loyalty as a competitive advantage, MUlticriteria Satisfaction Analysis (MUSA) and rough set theory were implemented for an extensive study of consumer satisfaction on the product, personnel, place and physical appearance (4Ps) attributes of the enterprise. Consumer loyalty was adjudged using the grid partition method of fuzzy set theory, normalising the data values. By analysing consumer satisfaction and loyalty, opportunities that facilitate increases in profit margins, revenues, buying and selling were identified.
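The join and support-counting steps of the Apriori pseudo code can be sketched in Java as follows (a minimal illustration; the transactions, item names and min_support value below are made up, and the explicit prune step is folded into direct support counting):

```java
import java.util.*;

// Minimal Apriori sketch: level-wise generation of frequent itemsets.
public class AprioriSketch {
    public static Set<Set<String>> frequentItemsets(List<Set<String>> txns, int minSup) {
        Set<Set<String>> all = new HashSet<>();
        // L1: frequent 1-itemsets
        Set<Set<String>> lk = new HashSet<>();
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> t : txns) for (String i : t) counts.merge(i, 1, Integer::sum);
        for (Map.Entry<String, Integer> e : counts.entrySet())
            if (e.getValue() >= minSup) lk.add(new HashSet<>(Set.of(e.getKey())));
        while (!lk.isEmpty()) {
            all.addAll(lk);
            // Join step: candidates Ck+1 generated from Lk joined with itself
            Set<Set<String>> candidates = new HashSet<>();
            for (Set<String> a : lk) for (Set<String> b : lk) {
                Set<String> u = new HashSet<>(a); u.addAll(b);
                if (u.size() == a.size() + 1) candidates.add(u);
            }
            // Count support and keep candidates meeting min_support
            Set<Set<String>> next = new HashSet<>();
            for (Set<String> c : candidates) {
                int sup = 0;
                for (Set<String> t : txns) if (t.containsAll(c)) sup++;
                if (sup >= minSup) next.add(c);
            }
            lk = next;
        }
        return all;
    }

    public static void main(String[] args) {
        List<Set<String>> txns = List.of(
            Set.of("milk", "bread"), Set.of("milk", "bread", "butter"),
            Set.of("bread", "butter"), Set.of("milk", "bread"));
        // With min_support = 3: {milk}, {bread} and {milk, bread} are frequent
        System.out.println(frequentItemsets(txns, 3));
    }
}
```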
Initially, consumer satisfaction with the enterprise was studied using the MUSA method. Then rough set theory was implemented for analysing consumer satisfaction on the reduct attributes. The flowchart in Figure 4.9 shows the implementation of the MUSA method and rough set theory to study consumer satisfaction. Once consumer satisfaction with the enterprise was explored, consumer loyalty was analysed next.
[Flowchart: Start MUSA; identify global and sub-criteria for consumer satisfaction; develop a questionnaire for global and sub-criteria satisfaction; obtain global satisfaction; if satisfied, obtain sub-criteria satisfaction; if satisfied on both, the consumer is satisfied; if improvement is needed, apply rough set theory, otherwise stop MUSA]
Figure 4.9 - Flowchart of MUSA method integrating with rough set theory
Analysing consumer loyalty
To analyse consumer loyalty, age and income data were discretised and then normalised through data transformation. Consumer loyalty was obtained from the number of visits made, and satisfaction was obtained from a scale. Consumer satisfaction and consumer loyalty were fuzzified, and the linguistic values are shown in Figure 4.10.
Based on the grid partition method, k linguistic values with triangle-shaped membership functions were interpreted as low (L), medium (M) and high (H), and their fuzzy values were interpreted as very low (VL), low (L), medium (M), high (H) and very high (VH).
Figure 4.10 Consumer satisfaction and consumer loyalty obtained from a scale
Data preprocessing was done in order to remove noisy and inconsistent data. Based on minimum confidence (min_conf) and minimum support (min_sup), "if…then…" rules were derived to determine consumer loyalty. To identify potential consumers, linear regression or Spearman's rank correlation (rho, ρ) was applied to find the correlation between the number of visits made by a consumer and the value purchased by him.
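Spearman's rank correlation between visits and purchase value can be sketched as follows (a minimal version assuming no tied values, so ρ = 1 − 6Σd²/(n(n²−1)); the sample figures below are made up, not survey results):

```java
import java.util.*;

// Sketch of Spearman's rank correlation (rho) between visits and purchase value.
public class SpearmanSketch {
    static double[] ranks(double[] x) {
        double[] r = new double[x.length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x.length; j++)
                if (x[j] < x[i]) r[i]++;    // rank = number of smaller values (0-based)
        return r;
    }

    public static double rho(double[] x, double[] y) {
        double[] rx = ranks(x), ry = ranks(y);
        double sumD2 = 0;
        for (int i = 0; i < x.length; i++) {
            double d = rx[i] - ry[i];       // rank difference per consumer
            sumD2 += d * d;
        }
        int n = x.length;
        return 1.0 - 6.0 * sumD2 / (n * (double) (n * n - 1));
    }

    public static void main(String[] args) {
        double[] visits = {2, 5, 8, 11, 14};           // visits per consumer (made up)
        double[] value  = {300, 650, 900, 1500, 2100}; // purchase value (made up)
        System.out.println(rho(visits, value));        // prints 1.0: perfectly monotonic
    }
}
```

A ρ close to 1 would support treating frequent visitors as potential high-value consumers.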
To extend better consumer service, the opinion of 120 consumers was taken using the consumer service survey form shown in Figure 4.4. Using Bayesian classification (Bayes' theorem), it was determined that the "Media mix" was also useful for extending online consumer support.
Bayes’ theorem
Let X be a data tuple. In Bayesian terms, X is considered "evidence"; it is described by a set of n attributes. Let H be some hypothesis, such as that the data tuple X belongs to a specified class C. For classification problems, P(H|X) is determined: the probability that the hypothesis H holds given the "evidence", or observed data tuple, X. P(H|X) is the posterior probability of H conditioned on X, and P(H) is the prior probability of H. The posterior probability P(H|X) is based on more information (e.g., customer information) than the prior probability P(H), which is independent of X. Similarly, P(X|H) is the posterior probability of X conditioned on H, and P(X) is the prior probability of X. Bayes' theorem is useful in that it provides a way of calculating the posterior probability P(H|X) from P(H), P(X|H) and P(X). Bayes' theorem is

P(H|X) = P(X|H) P(H) / P(X) ………………………(4.1)
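A worked instance of equation 4.1 (the hypothesis, evidence and all probabilities below are made-up figures for illustration, not survey results):

```java
// Worked instance of Bayes' theorem (4.1):
// H = "consumer will respond to an SMS offer", X = "consumer visits weekly".
public class BayesSketch {
    public static double posterior(double pXgivenH, double pH, double pX) {
        return pXgivenH * pH / pX;   // P(H|X) = P(X|H) P(H) / P(X)
    }

    public static void main(String[] args) {
        double pH = 0.3;        // prior: 30% of consumers respond to offers
        double pXgivenH = 0.8;  // 80% of responders visit weekly
        double pX = 0.4;        // 40% of all consumers visit weekly
        System.out.println(posterior(pXgivenH, pH, pX));  // ~0.6: posterior doubles the prior
    }
}
```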
Accuracy Measures:
The accuracy of the classifications in the above results is determined by drawing a confusion matrix for positive and negative tuples, as shown below:

                        Predicted class
                        C1                 C2
Actual     C1     true positives     false negatives
class      C2     false positives    true negatives

To know how well the classification performs, sensitivity and specificity measures are used.
Sensitivity is the true positive (recognition) rate. It is measured as the proportion of positive tuples that are correctly identified:

sensitivity = t_pos / pos …………………….(4.2)

where t_pos = number of true positives that are correctly classified and pos = number of positive tuples.
Specificity is the true negative rate. It is measured as the proportion of negative tuples that are correctly identified:

specificity = t_neg / neg ……………………..(4.4)

where t_neg = number of true negatives that are correctly classified and neg = number of negative tuples.
Precision is calculated as

precision = t_pos / (t_pos + f_pos) …………………………………………….(4.6)

where f_pos = number of false positives.
Hence accuracy is defined as

accuracy = sensitivity × (pos / (pos + neg)) + specificity × (neg / (pos + neg)) ……….(4.7)

True positives, true negatives, false positives and false negatives are also useful in assessing the costs and benefits (or risks and gains) associated with a classification model.
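The measures in (4.2), (4.4) and (4.7) can be computed from a confusion matrix as follows (the counts below are made up for illustration, not taken from the survey classifications):

```java
// Sketch computing sensitivity, specificity and accuracy (eqs. 4.2, 4.4, 4.7).
public class AccuracySketch {
    public static double accuracy(int tPos, int fNeg, int fPos, int tNeg) {
        int pos = tPos + fNeg, neg = fPos + tNeg;
        double sensitivity = (double) tPos / pos;   // eq. 4.2: true positive rate
        double specificity = (double) tNeg / neg;   // eq. 4.4: true negative rate
        // eq. 4.7: accuracy as a weighted mix of sensitivity and specificity
        return sensitivity * pos / (pos + neg) + specificity * neg / (pos + neg);
    }

    public static void main(String[] args) {
        // 90 of 100 loyal and 40 of 50 non-loyal consumers classified correctly (made up)
        System.out.println(accuracy(90, 10, 10, 40));
        // equals (90 + 40) / 150, i.e. about 0.8667
    }
}
```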
Increasing the accuracy of classification
The accuracy of classifiers and predictors is further improved using the ensemble methods bagging and boosting.
Bagging Algorithm:
The bagging algorithm creates an ensemble of models (classifiers or predictors) for a learning scheme, where each model gives an equally weighted prediction.
Input:
D, a set of d training tuples;
k, the number of models in the ensemble;
a learning scheme (e.g., decision tree algorithm, back propagation, etc.)
Output: a composite model, M*.
Method:
1) For i = 1 to k do // create k models;
2) Create bootstrap sample Di by sampling D with replacement;
3) Use Di to derive model Mi;
4) End for
To use the composite model on a tuple X:
1) If classification then
2) let each of the k models classify X and return the majority vote;
3) If prediction then
4) let each of the k models predict a value for X and return the average predicted value;
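The equally weighted voting step of bagging can be sketched as follows (illustrative only; the class labels and model outputs below are made up):

```java
import java.util.*;

// Sketch of the bagging prediction step: each of the k models classifies X
// and the majority vote is returned.
public class BaggingVoteSketch {
    public static String majorityVote(List<String> votes) {
        Map<String, Integer> tally = new HashMap<>();
        for (String v : votes) tally.merge(v, 1, Integer::sum);
        // Return the class with the most votes
        return Collections.max(tally.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        // Class labels predicted for tuple X by k = 5 bootstrap models (made up)
        List<String> votes = List.of("loyal", "loyal", "not-loyal", "loyal", "not-loyal");
        System.out.println(majorityVote(votes));  // prints "loyal" (3 votes to 2)
    }
}
```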
Boosting Algorithm: Adaboost
The boosting algorithm creates an ensemble of classifiers, each of which gives a weighted vote.
Input:
D, set of d class-labeled training tuples;
K, number of rounds (one classifier is generated per round);
Classification learning scheme.
Output: Composite model.
Method:
1) Initialise weight of each tuple in D to 1/d;
2) For i= 1 to k do// for each round;
3) Sample D with replacement according to the tuple weights to obtain
Di.
4) Use training set Di to derive a model, Mi
5) Compute error(Mi), the error rate of Mi
6) if error(Mi) > 0.5 then
7) reinitialise the weights to 1/d
8) go back to step 3 and try again;
9) end if
10) for each tuple in Di that was correctly classified do
11) multiply the weight of the tuple by error(Mi) /(1- error(Mi));//
update weights
12) normalise the weight of each tuple;
13) end for
Use composite model to classify tuple X:
1) initialise weight of each class to 0;
2) for i= 1 to k do// for each classifier;
3) wi = log[(1 - error(Mi)) / error(Mi)];//weight of the classifier’s vote
4) c=Mi(X);// get class prediction for X from Mi;
5) add wi to weight for class c
6) end for
7) return class with largest weight;
Class with highest sum is the “winner” and is returned as class prediction for
tuple X.
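The weighted voting step above can be sketched as follows (illustrative only; the error rates and per-round predictions below are made up):

```java
// Sketch of the AdaBoost voting step: each classifier's vote is weighted by
// w_i = log((1 - error(M_i)) / error(M_i)) and the class with the largest
// total weight wins.
public class AdaBoostVoteSketch {
    public static int weightedVote(double[] errors, int[] predictions, int numClasses) {
        double[] classWeight = new double[numClasses];
        for (int i = 0; i < errors.length; i++) {
            double w = Math.log((1 - errors[i]) / errors[i]);  // classifier's vote weight
            classWeight[predictions[i]] += w;
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++)
            if (classWeight[c] > classWeight[best]) best = c;
        return best;  // class with the largest accumulated weight
    }

    public static void main(String[] args) {
        double[] errors = {0.10, 0.30, 0.45};  // error rates of 3 rounds (made up)
        int[] preds = {0, 1, 1};               // class predicted for X by each round
        System.out.println(weightedVote(errors, preds, 2));
        // round 1 (error 0.10) outweighs rounds 2 and 3 together, so class 0 wins
    }
}
```

Note how one accurate round can outvote several weak ones, which is exactly the point of the weighted vote.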
4.5. Tools Used:
a) Automation Tools:
i. MS Excel Spreadsheet 2007 - for storing data
ii. Weka - for implementing data mining techniques
iii. JCreator Pro V4 (Screen Shot 1) - for writing and executing Java programs, JSP and HTML pages
iv. Apache Tomcat V5.0 - web server
v. MS Access 2007, Oracle 10g - databases for storing data
vi. MS Visual Studio 2005 - ASP.NET web pages
b) Graphical Tools: pie chart, bar chart, line graph
c) Questionnaire: multiple choice, open ended
5. RESULTS AND DISCUSSION
To identify better communication channels i.e. “Media mix”, for new buying
and selling opportunities, effective reach and quality networking, e-mail, SMS,
newsletters and live chat were considered. These were developed in a user
friendly and customized manner using .NET/ J2EE.
Establishing good communications with consumers needs a mix of better communication channels. To identify the "Media mix" for effective reach and quality networking, the opinion of around 200 consumers visiting the Reliance fresh super market was taken. The consumers' opinion on the "Media mix" is shown in Figure 5.1. The availability of better internet connectivity and the wide use of mobile phones made Reliance fresh opt for e-mail, SMS, live chat, e-newsletter and face-to-face communication.

Figure 5.1 Preferred communication channels ("Media mix") by consumers
Due to the growth in the use of the internet and mobile phones by consumers, 62% of them opted for e-mail, 43% for SMS, 25% for live chat and 16% for newsletters. Consumers were willing to share their e-mail IDs and mobile numbers without any fear. Still, 49% of them felt that face-to-face interaction was the best. Consumers opined that these channels could also be used to extend online consumer support.
To create techniques for maintaining a database of consumers with J2EE/.NET software, and to build customised communication channels, a comparative study was done on the J2EE and .NET platforms. 250 students, 50 faculty members and 50 software executives (Tables 5.1 A and B) participated in the study. User opinion on various parameters of the J2EE and .NET platforms was collected and is shown in Table 5.2. All of them expressed nearly the same opinion, that both are equally good, but a slight majority went in favour of J2EE; the information in Tables 5.1 A and B indicates this fact. The growing popularity of open source software suggests that the results are justifiable. Hence, in this research work, J2EE was preferred for building web applications.
Table 5.1 Responses summary on J2EE and .NET platforms

A. Overall satisfaction analysis

Satisfaction   .NET   J2EE   Total
Complete        103    110     213
Partial          66     57     123
Not at all        2      8      10
Total           171    175     346

B. Group-wise % satisfaction

Platform   Students   Software Executives   Faculty
J2EE           90              92              97
.NET           91              89              94
Table 5.2 User satisfaction opinion of J2EE and .NET platforms

                                               Average Satisfaction    Average Demanding
                                                   Index (%)               Index (%)
S No   Parameter                       Wt (%)    J2EE      .NET         J2EE      .NET
1      Simplicity of the language      53.33     90.6      93.1        -53.2     -51.3
2      Architecture                    43.33     92.5      83.6        -64.2     -62.1
3      Object oriented concepts        86.67     95.2      94.8        -21.3     -22
4      Support technologies            73.33     91.8      90.4        -69.2     -64.4
5      Presentation tier technologies  50.6      86.2      88.3        -92       -90.9
6      Middle tier technologies        46.67     91.6      90.9        -92       -88.5
7      Data tier technologies          63.33     71.8      71.7        -20       -33.7
8      Framework technologies          40.7      88.9      90.8        -13.3     -11
9      Maturity                        80.23     71.5      68.6        -30.19    -29.8
10     Interoperability and
       web services                    46.67     88.5      90.9        -70.2     -69.4
11     Scalability of applications     65.3      96.9      94.1        -89.9     -28.8
12     Portability                     68.2      95.5      93.3        -94.8     -62.7
13     Client device independence      57.3      90.4      90.7        -71.5     -93.4
14     Cost of developing
       applications                    63.08     91.9      80.1        -54.4     -42.3
15     Performance level of
       applications                    66.67     74.6      73.9        -19.1     -12.9
       Overall                                   87.86     86.35       -57.66    -50.88
The user satisfaction opinion of J2EE and .NET on various parameters is
shown in Table 5.2. Analysis of the information in Table 5.2 revealed the
following:
The average satisfaction index is calculated as the mean value of the various parameters. It indicates the extent of satisfaction with each of the platforms; the higher the value, the higher the satisfaction level.
The higher the value of the demanding index, the more the satisfaction level should be improved to fulfill the expectations of the users. That is, users demanded more improvement on these platforms.
Based on the users' opinion, a consumer website was built using Java Server Pages (JSP) to store consumer data. Provision to send e-mail and SMS to the consumers was also provided on the website.
With JSP, as a developer, it was easy to develop web pages without having to know much of the Java programming language or anything about writing servlet code. Hence it was possible to concentrate on writing HTML code rather than on creating objects and application logic.
The following observations were made while using JSP to build the consumer
website:
Using HTML and XML with JSP code was easy.
Compiling JSP code and making updates to the presentation code
was easy.
JavaBean components could be invoked while completely shielding the complexity of the application logic.
Changing and editing of fixed template portions of web pages was
possible without affecting the application logic.
Similarly, changing the logic without editing JSP code was possible
at the component level.
One major advantage of JSP was its platform independence, whereas ASP.NET was tied to Microsoft technology. ASP.NET pages run only on IIS, but it was possible to host JSP pages on different web servers: Apache Tomcat, WebLogic, GlassFish (NetBeans), etc. JSP response time was significantly faster than ASP.NET, especially as the number of user requests increased. IIS was not compatible with some browsers, such as Mozilla, whereas JSP was compatible with nearly all browsers. Drivers needed to be installed and connectivity established for building database-backed applications. Data type errors were difficult to identify while building .NET or JSP applications.
After a thorough study of the J2EE and .NET platforms, the following advantages were identified in Java Mail over .NET for building a consumer website with communication channels such as e-mail, SMS and live chat:
Receiving and sending e-mails through the website using Java Mail was easy.
Writing e-mail programs using the SMTP, POP and IMAP protocols was easy.
Creating the framework and sending and receiving messages was done without much difficulty using the set of abstract classes in the API.
Accessing mail folders, downloading messages, sending messages with attachments and filtering mail was done without much difficulty, as there were corresponding Java Mail methods and classes.
Without an in-depth knowledge of e-mail, it was easy to create a cross-platform mail application using the framework.
The following Java Mail API Packages were used to develop customised
mails:
javax.mail - classes that model a mail system
javax.mail.event - listeners and events for the Java Mail API
javax.mail.internet - classes specific to Internet mail systems
javax.mail.search - message search terms for the Java Mail API
javax.mail.util - Java Mail API utility classes
Using JSP had a number of advantages over many of its alternatives like
PHP, Cold Fusion, Flex etc. These advantages are as follows:
JSP code was best suitable to implement the presentation page layer
components.
Business logic and presentation logic was separated without much
difficulty.
Presentation skills were sufficient and in-depth java knowledge was not
required.
If any changes were made to a JSP, there was no need to recompile and reload.
Development time was also reduced.
For web developers and designers it was easy to maintain information-rich,
dynamic web pages with JSP technology. Platform-independent web
applications were developed rapidly using JSP technology. JSP technology
enabled changes to the overall page layout without altering the underlying
dynamic content, by separating the user interface from content generation.
JSP uses XML-like tags that encapsulate the logic. Application logic resides in
server-based resources, which the JSP page accesses through these tags;
HTML or XML markup is passed directly back to the response page. By
separating page logic from its design and display, and by supporting reusable
component-based design, JSP technology makes it faster and easier to build
web-based applications.
It was observed that JSP was well suited for building enterprise applications.
Being open source to the developer community, the JSP interface was
supported by many web and application servers. JSP pages had the "Write
Once, Run Anywhere" (WORA) property.
The advantages of using JSP over competing technologies like PHP,
ASP.NET, Flex, Cold Fusion, etc., are summarised as follows:
Business logic and presentation logic were separated from one another.
JSP was not limited to a specific platform.
It had full access to the server-side resources that are an integral part of
the J2EE architecture.
The performance of J2EE and .NET was compared in our lab using the
LoadRunner tool by creating 100 virtual users on a sample application. The
front end for J2EE was a JSP form and for .NET it was an ASP.NET form. As
the back end, an MS Access database (and Oracle 10g) was used. Table 5.3
shows the performance results of ASP.NET and J2EE.
Table 5.3 Memory utilization and response time of ASP.NET and J2EE
Performance results of J2EE and ASP.NET in Table 5.3 proved that J2EE
was a better choice when compared to ASP.NET for building web
applications.
Virtual users   Memory utilisation (MB)    Response time (msec)
                ASP.NET    J2EE            ASP.NET    J2EE
1               1012       649             3.018      1.074
20              1105       685             3.341      3.322
40              1181       738             49.121     3.282
60              1202       833             56.975     3.161
80              1295       937             71.741     3.052
100             1314       1065            88.415     5.368
As an initial step towards identifying consumer requirements and
opportunities that increase profit margins, revenues, buying and selling,
consumer segmentation was performed on the gender attribute of the
consumer data with the Weka tool. The k-means algorithm was applied to the
sample consumer transaction data given in Figure 5.2 as a .csv file.
Figure.5.2 Consumer transaction data in .csv file format
Prod_no,Prod_name,Quantity,Gender,Date_of_ purchase,Brand_name
1,Shampoo,7,F,28/12/2012,Clinic Plus
2,Shampoo,5,M,28/12/2012,Garnier
3,Hair Conditioner,6,F,28/12/2012,Dove
4,Sugar,7,M,28/12/2012,Reliance
5,Flour,10,F,28/12/2012,Ashirwad
6,Toothpaste,5,M,29/12/2012,Meswak
7,Toothpaste,6,M,29/12/2012,Colgate
8,Toothbrush,5,F,30/12/2012,Colgate
9,Shampoo,7,M,30/12/2012,Meera
10,Hair Conditioner,6,F,30/12/2012,Sunsilk
11,Toothpaste,5,F,30/12/2012,CloseUp
12,Toothbrush,5,M,30/12/2012,OralB
13,Biscuits,10,M,31/12/2012,Good-day
14,Chocolates,25,M,31/12/2012,Amul
15,Chocolates,12,F,31/12/2012,5 star
16,Shampoo,5,F,31/12/2012,Garnier
17,Hair Conditioner,5,F,31/12/2012,Dove
18,Chewing gum,5,M,1/1/2013,Boomerag
19,Oil,5,F,1/1/2013,Saffola
20,Salt,4,M,1/1/2013,Ashirwad
Consumer details were stored in comma-separated value (.csv) file format.
Using the Weka tool, the consumer data containing prod_no, prod_name,
quantity, gender and date_of_purchase was analysed. The k-means clustering
and Apriori algorithms were implemented using the Weka data mining tool.
The run information of the k-means algorithm for the gender attribute is shown
in Figure 5.3.
=== Run information ===
Scheme: weka.clusterers.SimpleKMeans -N 2 -A
"weka.core.EuclideanDistance -R first-last" -I 500 -S 10
Relation: purchases-
weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-last
Instances: 20
Attributes: 6
Prod_no
Prod_name
Quantity
Date_of_ purchase
Brand_name
Ignored:
Gender
Test mode: Classes to clusters evaluation on training data
=== Model and evaluation on training set ===
kMeans
======
Number of iterations: 2
Within cluster sum of squared errors: 68.0
Missing values globally replaced with mean/mode
Figure 5.3 Run information of k-means clustering performed on gender
attribute using Weka
Cluster centroids:
Cluster#
Attribute Full Data 0 1
(20) (14) (6)
=======================================================
Prod_no 1 6 1
Prod_name Shampoo Toothpaste Shampoo
Quantity 5 5 7
Date_of_ purchase 8/12/2012 11/12/2012 8/12/2012
Brand_name Garnier Colgate Clinic Plus
Time taken to build model (full training data) : 0.17 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 14 ( 70%)
1 6 ( 30%)
Class attribute: Gender
Classes to Clusters:
0 1 <-- assigned to cluster
7 3 | F
7 3 | M
Cluster 0 <-- F
Cluster 1 <-- M
Incorrectly clustered instances: 10.0 50 %
In Figure 5.3 the cluster centroids had Prod_no values 6 and 1. Two clusters
were generated: cluster 0 with 14 instances and cluster 1 with 6 instances. In
cluster 0 there were 7 males and 7 females, and in cluster 1 there were 3
males and 3 females. In cluster 0 Colgate toothpaste was the primary product,
whereas in cluster 1 it was Clinic Plus shampoo.
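The clustering above was produced with Weka's SimpleKMeans. As an illustration of the underlying procedure only (not the Weka implementation), the following sketch runs a plain k-means with k = 2 on the Quantity column of the sample data in Figure 5.2; the one-dimensional attribute choice and the initial centroids are illustrative assumptions.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Simple 1-D k-means: assign each value to the nearest centroid,
    then recompute each centroid as the mean of its assigned values."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            idx = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Quantity column of the 20 sample transactions in Figure 5.2
quantities = [7, 5, 6, 7, 10, 5, 6, 5, 7, 6, 5, 5, 10, 25, 12, 5, 5, 5, 5, 4]
centroids, clusters = kmeans_1d(quantities, centroids=[5.0, 20.0])
```

With these inputs the quantities split into a large low-quantity cluster and a singleton containing the bulk purchase of 25 chocolates.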
k-means clustering was also implemented using the Knowledge Flow feature
of Weka, and the resulting clusters are given in Table 5.4. The number of
clusters was predefined as 6 and clustering was performed on date of
purchase using the sample data of Figure 5.2. The input data file was in .arff
format:
@relation custtran_clustered
@attribute Instance_number numeric
@attribute Prod_no numeric
@attribute Prod_name {Shampoo,'Hair Conditioner',Sugar,Flour,Toothpaste,Toothbrush,Biscuits,Chocolates,'Chewing gum',Oil,Salt}
@attribute Quantity numeric
@attribute Gender {F,M}
@attribute 'Date_of_ purchase' {28/12/2012,29/12/2012,30/12/2012,31/12/2012,1/1/2013}
@attribute Brand_name {'Clinic Plus',Garnier,Dove,Reliance,Ashirwad,Meswak,Colgate,Meera,Sunsilk,CloseUp,OralB,Good-day,Amul,'5 star',Boomerag,Saffola}
@attribute Cluster {cluster0,cluster1,cluster2,cluster3,cluster4,cluster5}
@data
Table 5.4 Consumer data segmented into 6 clusters
S NO  PROD NO  PRODUCT NAME        QTY  GENDER  PURCHASE DATE  BRAND NAME     CLUSTER
4     5        Flour               10   F       28/12/2012     Ashirwad       cluster0
12    13       Biscuits            10   M       31/12/2012     Good-day       cluster1
13    14       Chocolates          25   M       31/12/2012     Amul           cluster1
14    15       Chocolates          12   F       31/12/2012     '5 star'       cluster1
7     8        Toothbrush          5    F       30/12/2012     Colgate        cluster2
10    11       Toothpaste          5    F       30/12/2012     CloseUp        cluster2
15    16       Shampoo             5    F       31/12/2012     Garnier        cluster2
18    19       Oil                 5    F       1/1/2013       Saffola        cluster2
2     3        'Hair Conditioner'  6    F       28/12/2012     Dove           cluster3
9     10       'Hair Conditioner'  6    F       30/12/2012     Sunsilk        cluster3
16    17       'Hair Conditioner'  5    F       31/12/2012     Dove           cluster3
0     1        Shampoo             7    F       28/12/2012     'Clinic Plus'  cluster4
1     2        Shampoo             5    M       28/12/2012     Garnier        cluster4
3     4        Sugar               7    M       28/12/2012     Reliance       cluster4
5     6        Toothpaste          5    M       29/12/2012     Meswak         cluster4
6     7        Toothpaste          6    M       29/12/2012     Colgate        cluster4
8     9        Shampoo             7    M       30/12/2012     Meera          cluster5
11    12       Toothbrush          5    M       30/12/2012     OralB          cluster5
17    18       'Chewing gum'       5    M       1/1/2013       Boomerag       cluster5
19    20       Salt                4    M       1/1/2013       Ashirwad       cluster5
The efficiency of the k-means algorithm was increased to reduce the running
time on large databases and also to obtain quality clusters. The KCUSTMH
algorithm reduced the running time from 25 seconds to approximately 17
seconds on a database containing around 3500 transactions with 6 attributes.
A machine with 1 GB of main memory and a 1.83 GHz dual-core processor
running Windows XP Service Pack 2 was used. Figure 5.4 shows the
efficiency of KCUSTMH over the traditional k-means algorithm.
Figure 5.4 Graph showing efficiency of KCUSTMH and
traditional k-means algorithm
Once consumer segmentation was done, the result was treated as a collection
of clustered association rules to be used in decision making. A two-
dimensional grid was formed over a set of two attributes (in this case age and
salary) and the corresponding association rules were determined as shown in
Figure 5.5. The goal was to find clusters that cover the association rules within
this grid. These clusters represented the association rules and also defined
the segmentation. Once association rules were discovered for a particular
level of support and confidence, a grid of only those rules that give information
about this group was formed.
The following four association rules were considered, where the RHS attribute
"Group label" was given the value "1".
R1. (age = 27) ^ (salary = 41450) => (Group label = 1)
R2. (age = 28) ^ (salary = 55865) => (Group label = 1)
R3. (age = 28) ^ (salary = 47553) => (Group label = 1)
R4. (age = 27) ^ (salary = 51378) => (Group label = 1)
With age bins labelled a1, a2, …, an and salary bins labelled s1, s2, …, sn,
these rules were binned to form the corresponding binned association rules:
R1. (age = a3) ^ (salary = s5) => (Group label = 1)
R2. (age = a4) ^ (salary = s6) => (Group label = 1)
R3. (age = a4) ^ (salary = s5) => (Group label = 1)
R4. (age = a3) ^ (salary = s6) => (Group label = 1)
Rules R1 through R4 were represented with a grid as shown in Figure 5.5.
Linear, adjacent cells were combined to form a line segment, and this idea
was extended to rectangular regions. Representing the rules in the
rectangular grid transformed all four association rules into a single clustered
rule:
(a3 ≤ age ≤ a4) ^ (s5 ≤ salary ≤ s6) => (Group label = 1)
Figure 5.5 Clustering association rules using 2-D grid
Assuming the bin mappings shown in Figure 5.5, the final clustered rule output
was:
(27 ≤ age < 29) ^ (40000 ≤ salary < 60000) => (Group label = 1)
These association rules were then converted into a decision tree for making
effective decisions.
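The merge of R1 through R4 into a single rectangular rule can be sketched as follows (an illustrative reimplementation, not a tool used in the thesis). Bin labels are taken as integer indices, and a merge is emitted only when the binned rules exactly fill their bounding rectangle:

```python
def merge_binned_rules(rules):
    """rules: set of (age_bin, salary_bin) index pairs sharing the same RHS.
    If the pairs exactly fill their bounding rectangle, return the rectangle
    as ((a_min, a_max), (s_min, s_max)); otherwise return None."""
    ages = {a for a, _ in rules}
    salaries = {s for _, s in rules}
    a_lo, a_hi = min(ages), max(ages)
    s_lo, s_hi = min(salaries), max(salaries)
    rectangle = {(a, s) for a in range(a_lo, a_hi + 1)
                        for s in range(s_lo, s_hi + 1)}
    return ((a_lo, a_hi), (s_lo, s_hi)) if rules == rectangle else None

# R1..R4 as (age bin, salary bin) pairs with Group label = 1; a3 -> 3, s5 -> 5, etc.
rules = {(3, 5), (4, 6), (4, 5), (3, 6)}
merged = merge_binned_rules(rules)
```

Applied to R1 through R4, the four cells fill the 2 x 2 rectangle, so they collapse into one clustered rule covering a3..a4 by s5..s6.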
Once the consumer data was segmented, the next step was to study
consumer buying behaviour using the Apriori algorithm.
[Figure 5.5 bin mapping: salary bins s1: <10K, s2: 10-20K, s3: 20-30K,
s4: 30-40K, s5: 40-50K, s6: 50-60K, s7: 70-80K; age bins a1 through a6
correspond to ages 25 through 30.]
All data recorded in the transaction database was fed as input to the Apriori
algorithm, which was implemented using Weka. Association rules were
generated for given support and confidence measures. Association rules are
adopted to discover interesting relationships among purchased products and
to gain knowledge of the transactions in a large dataset. Apriori is designed to
operate on databases containing transactions. Analysis of the Weka run
information given in Figure 5.6 gave knowledge of the frequent itemsets
purchased by consumers for a given support and confidence.
=== Run information ===
Scheme: weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation: purchases-weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-last
Instances: 20
Attributes: 6
Prod_no
Prod_name
Quantity
Gender
Date_of_ purchase
Brand_name
=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.1 (2 instances)
Minimum metric <confidence>: 0.9
Figure 5.6 Run information of Apriori algorithm implementation using Weka
Number of cycles performed: 18
Generated sets of large itemsets:
Size of set of large itemsets L (1): 20
Size of set of large itemsets L (2): 33
Size of set of large itemsets L (3): 7
Best rules found:
1. Prod_name=Hair Conditioner 3 ==> Gender=F 3 conf :(1)
2. Brand_name=Garnier 2 ==> Prod_name=Shampoo 2 conf :(1)
3. Brand_name=Dove 2 ==> Prod_name=Hair Conditioner 2 conf :(1)
4. Date_of_ purchase=29/12/2012 2 ==> Prod_name=Toothpaste 2 conf :(1)
5. Prod_name=Toothbrush 2 ==> Quantity=5 2 conf :(1)
6. Prod_name=Toothbrush 2 ==> Date_of_ purchase=30/12/2012 2 conf :(1)
7. Prod_name=Chocolates 2 ==> Date_of_ purchase=31/12/2012 2 conf :(1)
8. Brand_name=Garnier 2 ==> Quantity=5 2 conf :(1)
9. Brand_name=Dove 2 ==> Gender=F 2 conf :(1)
10. Date_of_ purchase=29/12/2012 2 ==> Gender=M 2 conf :(1)
Analysis of consumer buying behaviours enabled the company to improve
support for its consumer-oriented business processes, aimed at improving the
overall performance of the enterprise.
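The rules in Figure 5.6 came out of Weka's Apriori implementation. The confidence behind a rule such as rule 1 (Prod_name=Hair Conditioner ==> Gender=F) can be re-checked directly against the 20 transactions of Figure 5.2; the sketch below is an illustrative re-computation in plain Python, with attribute-value pairs encoded as strings:

```python
def confidence(transactions, lhs, rhs):
    """Confidence of the rule lhs => rhs over a list of transaction item sets:
    count(lhs and rhs together) / count(lhs)."""
    lhs_count = sum(1 for t in transactions if lhs <= t)
    both_count = sum(1 for t in transactions if (lhs | rhs) <= t)
    return both_count / lhs_count if lhs_count else 0.0

# Product name and gender of the 20 transactions in Figure 5.2
rows = [("Shampoo", "F"), ("Shampoo", "M"), ("Hair Conditioner", "F"),
        ("Sugar", "M"), ("Flour", "F"), ("Toothpaste", "M"), ("Toothpaste", "M"),
        ("Toothbrush", "F"), ("Shampoo", "M"), ("Hair Conditioner", "F"),
        ("Toothpaste", "F"), ("Toothbrush", "M"), ("Biscuits", "M"),
        ("Chocolates", "M"), ("Chocolates", "F"), ("Shampoo", "F"),
        ("Hair Conditioner", "F"), ("Chewing gum", "M"), ("Oil", "F"), ("Salt", "M")]
transactions = [{f"Prod_name={p}", f"Gender={g}"} for p, g in rows]
conf = confidence(transactions, {"Prod_name=Hair Conditioner"}, {"Gender=F"})
```

All 3 hair-conditioner purchases were made by female consumers, so the confidence is 1, matching Weka's output.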
Data mining methodology is very useful for extracting hidden knowledge and
information. Significant product association rules were also identified within
each segment by applying correlation analysis. Correlation analysis of the
product associations determines whether the associations are positively or
negatively correlated with one another. Product association rules are used to
motivate consumers to increase their purchases and to keep them loyal to the
company; the behaviour of consumers is easily identified. Association mining
of frequent itemsets was also performed using the vertical data format, as
shown in Table 3.5. Mining was performed on the consumer data to find which
transactions involve 2-itemsets, 3-itemsets, etc., and the largest itemsets for a
given support and confidence. The transactions were then analysed for
effective decision making.
The Apriori algorithm was helpful in finding the best association rules. The
most frequent itemsets were easily found from the consumer database, which
helps in product and price bundling. A dashboard, given in Figure 5.7, was
built to identify the items generating the best sales; it also identified which
items had better buying and selling opportunities. The dashboard was built in
an MS Excel 2007 spreadsheet using the traffic-light signals option: green
lights indicated the best-selling items, red lights indicated items that were not
selling well, and orange lights indicated average-selling items.
Figure 5.7 Dashboard projecting the behavior of sales of different items
To improve consumer service and online consumer support systems, and to
build loyalty as a competitive advantage, consumer satisfaction with the
enterprise was initially studied using the MUSA (MUlticriteria Satisfaction
Analysis) method. The MUSA method was integrated with rough set theory for
analysing consumer satisfaction with the enterprise on reduct attributes.
The MUSA method was applied to more than 200 randomly selected
consumers surveyed at 3 Reliance Fresh supermarkets located in 3 places of
Hanamkonda city.
Initially a set of global satisfaction criteria was identified, on which the
consumers expressed their satisfaction with the enterprise. The global criteria
identified here were Product, Personnel, Physical appearance of the malls,
and Place, termed the 4Ps.
Each of the 4Ps had the following characteristics or sub-criteria:
1. Personnel: This criterion included all characteristics relating to
personnel, (their skills and knowledge, responsiveness, friendliness,
communication and collaboration with consumers, etc).
2. Product: This criterion refers mainly to offered products (quality and
quantity, variety of products, and prices).
3. Physical Appearance: This criterion refers to service offered to
consumers. It includes appearance and cleanliness of the stores,
waiting time during busy and non-busy hours, and service time.
4. Place: Location and number of stores and parking availability are
included in this criterion.
A survey questionnaire form, shown in Figure 4.2, with 5 levels of
satisfaction was designed to identify the global satisfaction levels on the 4Ps.
The 5 levels of satisfaction were CS (Completely Satisfied), VS (Very much
Satisfied), S (Satisfied), D (Dissatisfied) and CD (Completely Dissatisfied).
Table 5.5 shows the satisfaction opinion of 20 sample consumers obtained on
1.Personnel, 2.Product, 3.Physical Appearance and 4.Place criteria.
Table 5.5 Survey data of sample 20 consumers on global criteria
Consumer 1 2 3 4
1 CD S S S
2 VS D S S
3 D VS VS VS
4 S S S S
5 D S S S
6 S VS VS VS
7 CD CS D VS
8 S CS VS CS
9 S VS S VS
10 D VS S S
11 CS CD S VS
12 VS VS D VS
13 D VS S S
14 S CS D VS
15 D VS S S
16 S S S S
17 VS S S VS
18 S VS S VS
19 S S S S
20 S VS S S
Table 5.6 Overall satisfaction results on global criteria
Using the simple additive formulas of the MUSA method, the average
satisfaction index (ASI) and average demanding index (ADI) were calculated
as shown in Table 5.6. From these results it was concluded that the average
global satisfaction index was approximately 91%, while the company's
performance on the whole set of criteria varied between 86% and 92%. Even
though the satisfaction values were encouraging, considering the highly
competitive conditions of the market this performance could not be considered
relatively high.
Sno  Criteria  Weight (%)  ASI (%)  ADI (%)
1 Product 45.20 86.56 -68.61
2 Physical Appearance 25.00 88.08 -66.21
3 Personnel 22.00 92.44 -75.20
4 Place 10.70 87.19 -85.01
Global satisfaction 90.81 -73.20
The following findings were drawn from analysing the above results:
The average global satisfaction index was approximately 91%, while the
company's performance on the whole set of criteria varied between 86%
and 92%.
Given the highly competitive conditions of the market, this performance
could not be considered relatively high.
Consumers considered "Product" the most important criterion, with a
significant importance level of 45.2%; they did not give much importance
to the remaining criteria.
The low weight of the "Place" criterion indicated that consumers were
least bothered about parking facilities and location, and also that the
main competitors performed no better on this particular criterion.
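The full MUSA method derives the satisfaction indices through a linear-programming procedure, and the reported global index of 90.81% comes from that procedure. As a rough plausibility check only, a plain weighted average of the per-criterion ASIs in Table 5.6 can be computed as below (a simplified additive sketch, not the MUSA aggregation itself):

```python
# Criterion weights and average satisfaction indices from Table 5.6
criteria = {
    "Product":             (45.20, 86.56),
    "Physical Appearance": (25.00, 88.08),
    "Personnel":           (22.00, 92.44),
    "Place":               (10.70, 87.19),
}
total_weight = sum(w for w, _ in criteria.values())
# Weighted additive aggregate of the per-criterion ASIs
weighted_asi = sum(w * asi for w, asi in criteria.values()) / total_weight
```

This yields roughly 88%, which lies in the same range as, but is not equal to, the MUSA global index; the difference comes from the value functions estimated by the full method.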
Consumer satisfaction opinion on each of the sub-criteria of the global criteria
is given in Table 5.7. Along with the global criteria, consumers were also
asked about their satisfaction on the sub-criteria, for which a separate
questionnaire was given to them. From the results on sub-criteria satisfaction,
the following conclusions were drawn:
"Personnel's friendliness" added to the competitive advantage of the
company.
The personnel sub-criterion most appealing to consumers was the "dress
code", through which sales executives were easily distinguishable.
"Quality" of product was one of the strongest points of the company,
although consumers did not seem to be satisfied with the quantity of
product. This result was related to the low satisfaction index for the
"Price" criterion.
Most consumers opined that attention should be paid to the waiting time
during busy hours (between 7:30 AM and 10:30 AM) and to the service
time as well.
On the other hand, the appearance of the malls (infrastructure and
arrangement of products) seemed to be a competitive advantage for the
company.
The satisfaction level on the "Place" criterion could have been higher if
consumers had been provided proper parking facilities (there was no
proper parking facility).
Table 5.7 Sub criteria satisfaction results
Sno Sub criteria Weight (%)
ASI (%)
ADI (%)
1 Skills/Knowledge 34.00 93.10 -71.30
2 Responsiveness 15.90 83.60 -62.10
3 Friendliness 50.10 94.80 -82.00
4 Quality 49.80 90.40 -84.40
5 Quantity 48.30 88.30 -40.90
6 Variety 25.00 90.90 -68.50
7 Prices 11.90 71.70 -33.70
8 Appearance of stores 42.80 90.80 -81.00
9 Waiting time (busy hours) 8.50 68.60 -29.80
10 Waiting time (non-busy
hours)
19.20 90.90 -69.40
11 Service time 8.30 74.10 -28.80
12 Cleanliness 21.20 93.30 -62.70
13 Location of stores 87.10 90.70 -93.40
14 Number of stores 6.80 81.10 -42.30
15 Parking 6.10 43.90 -12.90
Overall analysis of the results on the main and sub-criteria proved that
consumers were very much satisfied with Reliance Fresh, but owing to the
highly competitive conditions of the market, especially from malls like
Spencer's and Aditya Birla's More, this performance could not be considered
relatively high. The average demanding index of the results revealed this fact.
Rough set theory was implemented on the consumer data shown in Table 5.8
and initialised as follows. The given data was considered as the set B of all
consumer objects:
B = {C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15,
C16, C17, C18, C19, C20}
The set of condition attributes was represented by
C = {Personnel, Product, Physical Appearance}
and the set D represented the decision attribute, where D = {Satisfaction}.
Table 5.8 Sample consumer data with condition and decision attributes
Conditional Attributes Decision Attribute
Consumer Personnel Product Physical Appearance
Satisfaction
C1 D D G No
C2 D D VG No
C3 D D E Yes
C4 D S VG Yes
C5 D S E Yes
C6 S S VG Yes
C7 S S E Yes
C8 D D VG No
C9 S D E Yes
C10 S D VG No
C11 S D E No
C12 D S G No
C13 D S VG Yes
C14 D S G No
C15 S S G No
C16 S D G No
C17 S D VG No
C18 S S E Yes
C19 S D G No
C20 D S G No
Nominal values of given attributes are specified in Table 5.9:
Table 5.9 Nominal values of the consumer sample data
The indiscernibility relation is the relation between two or more objects whose
values are identical with respect to a subset of the considered attributes; it is
denoted INDA(C). In Table 5.8, the set C was composed of the attributes
directly related to consumers' preferences, namely C = {Personnel, Product,
Physical Appearance}. The data in Table 5.8 was then broken down by the 3
condition attributes; because of its low weight, the 4th attribute (Place) was
not considered.
Conditional attributes
Personnel: Dissatisfied (D), Satisfied (S)
Product: Dissatisfied (D), Satisfied (S)
Physical Appearance: Good (G), Very Good (VG), Excellent (E)
Decision attribute
Satisfaction: Yes, No
The Personnel attribute generated two indiscernibility elementary sets:
INDA ({Personnel}) = {{C1, C2, C3, C4, C5, C8, C12, C13, C14, C20},
{C6, C7, C9, C10, C11, C15, C16, C17, C18, C19}}.
The Product attribute generated two indiscernibility elementary sets:
INDA ({Product}) = {{C1, C2, C3, C8, C9, C10, C11, C16, C17, C19},
{C4, C5, C6, C7, C12, C13, C14, C15, C18, C20}}
The Physical Appearance attribute generated three indiscernibility elementary sets:
INDA ({Physical Appearance}) = { {C2, C4, C6, C8, C10, C13, C17},
{C3, C5, C7, C9, C11, C18},
{C1, C12, C14, C15, C16, C19, C20}}
Data in the Table 5.8 was rearranged based on decision attribute as shown in
Table 5.10.
Table 5.10 Sample consumer data organized w.r.t. decision attribute
Customer Personnel Product Physical Appearance
Satisfaction
C1 D D G No
C2 D D VG No
C8 D D VG No
C10 S D VG No
C11 S D E No
C12 D S G No
C14 D S G No
C15 S S G No
C16 S D G No
C17 S D VG No
C19 S D G No
C20 D S G No
C3 D D E Yes
C4 D S VG Yes
C5 D S E Yes
C6 S S VG Yes
C7 S S E Yes
C9 S D E Yes
C13 D S VG Yes
C18 S S E Yes
The lower and upper approximations of a set are the interior and closure
operations in the topology generated by the indiscernibility relation. The
approximations were applied to Table 5.8, with the following observations:
Lower Approximation set B*
- Lower Approximation set (B*) of consumers who were definitely satisfied
were identified as B* = {C3, C4, C5, C6, C7, C13, C18}
- Lower Approximation set (B*) of consumers who certainly had no satisfaction
were identified as B* = {C1, C2, C8, C10, C12, C14, C15, C16, C17, C19,
C20}
Upper Approximation set B*
- Upper Approximation set (B*) of consumers who had satisfaction were
identified as B* = {C3, C4, C5, C6, C7, C9, C13, C18}
- Upper Approximation set (B*) of consumers who had no satisfaction were
identified as B*= {C1, C2, C8, C10, C11, C12, C14, C15, C16, C17, C19, C20}
Boundary region BNB(X)
- The boundary region of consumers who had no satisfaction was identified
as:
BNB(X) = {C1, C2, C8, C10, C11, C12, C14, C15, C16, C17, C19, C20} –
{C1, C2, C8, C10, C12, C14, C15, C16, C17, C19, C20} = {C11}
- The boundary region of consumers who had satisfaction was identified as:
BNB(X) = {C3, C4, C5, C6, C7, C9, C13, C18} – {C3, C4, C5, C6, C7, C13, C18}
= {C9}
The boundary region BNB(X), the set constituted by the elements C9 and
C11, could not be classified, since these objects possessed the same
characteristics but differed in the decision attribute.
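The approximations above follow mechanically from the equivalence classes of the condition attributes. The sketch below (illustrative Python, using the standard Pawlak definitions) reproduces the lower approximation of the satisfied consumers and the combined boundary region {C9, C11}:

```python
from collections import defaultdict

# Table 5.8: (Personnel, Product, Physical Appearance, Satisfaction)
table = {
    "C1":  ("D", "D", "G",  "No"),  "C2":  ("D", "D", "VG", "No"),
    "C3":  ("D", "D", "E",  "Yes"), "C4":  ("D", "S", "VG", "Yes"),
    "C5":  ("D", "S", "E",  "Yes"), "C6":  ("S", "S", "VG", "Yes"),
    "C7":  ("S", "S", "E",  "Yes"), "C8":  ("D", "D", "VG", "No"),
    "C9":  ("S", "D", "E",  "Yes"), "C10": ("S", "D", "VG", "No"),
    "C11": ("S", "D", "E",  "No"),  "C12": ("D", "S", "G",  "No"),
    "C13": ("D", "S", "VG", "Yes"), "C14": ("D", "S", "G",  "No"),
    "C15": ("S", "S", "G",  "No"),  "C16": ("S", "D", "G",  "No"),
    "C17": ("S", "D", "VG", "No"),  "C18": ("S", "S", "E",  "Yes"),
    "C19": ("S", "D", "G",  "No"),  "C20": ("D", "S", "G",  "No"),
}

# Equivalence classes of the indiscernibility relation IND(C)
classes = defaultdict(set)
for cust, row in table.items():
    classes[row[:3]].add(cust)

X = {c for c, row in table.items() if row[3] == "Yes"}  # satisfied consumers
lower = set().union(*(g for g in classes.values() if g <= X))   # classes inside X
upper = set().union(*(g for g in classes.values() if g & X))    # classes touching X
boundary = upper - lower  # objects that cannot be classified
```

The lower approximation matches the seven definitely satisfied consumers identified above, and the boundary region is {C9, C11}, the pair that cannot be classified.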
The following coefficients of approximation quality were calculated:
Imprecision coefficient
• with likelihood of satisfaction, αB(X) = 7/8;
• with likelihood of no satisfaction, αB(X) = 11/12.
Quality coefficient of the upper approximation
• αB (B*(X)) = 8/20, for consumers who had likelihood of satisfaction;
• αB (B*(X)) = 12/20, for consumers who had likelihood of no satisfaction.
Quality coefficient of the lower approximation
• αB (B*(X)) = 7/20, for consumers who had likelihood of satisfaction;
• αB (B*(X)) = 11/20, for consumers who had likelihood of no satisfaction.
Observations:
Consumers with satisfaction: αB (B*(X)) = 7/20, that is, 35% of
consumers certainly had satisfaction.
Consumers without satisfaction: αB (B*(X)) = 11/20, that is, 55% of
consumers certainly had no satisfaction.
10% of consumers (C9 and C11) could be classified neither as satisfied
nor as unsatisfied, since their condition attribute values were identical
while only the decision attribute (satisfaction) differed. This made the
analysis inconclusive for these consumers.
Data reduction
Redundancy in the data of Table 5.8 must be avoided, as doing so minimises
the complex computations involved in creating rules for knowledge extraction.
Redundancies in Table 5.8 were treated using the reduct concept, without
altering the indiscernibility relations. A reduct is the minimum necessary set of
data that maintains the original properties of the system or information table.
Hence, a reduct has the capacity to classify objects without altering the
representation of knowledge.
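The defining property of a reduct, that dropping an attribute must not merge objects with different decision values, can be checked mechanically. A sketch, assuming the Table 5.8 data with the inconclusive pair C9/C11 already removed:

```python
# Condition rows (Personnel, Product, Physical Appearance) plus decision,
# i.e. Table 5.8 with the inconclusive consumers C9 and C11 removed
rows = [
    ("D", "D", "G", "No"),  ("D", "D", "VG", "No"), ("D", "D", "E", "Yes"),
    ("D", "S", "VG", "Yes"), ("D", "S", "E", "Yes"), ("S", "S", "VG", "Yes"),
    ("S", "S", "E", "Yes"), ("D", "D", "VG", "No"), ("S", "D", "VG", "No"),
    ("D", "S", "G", "No"),  ("D", "S", "VG", "Yes"), ("D", "S", "G", "No"),
    ("S", "S", "G", "No"),  ("S", "D", "G", "No"),  ("S", "D", "VG", "No"),
    ("S", "S", "E", "Yes"), ("S", "D", "G", "No"),  ("D", "S", "G", "No"),
]

def consistent(rows, keep):
    """True if objects equal on the kept attribute indices never differ
    in the decision value (last element of each row)."""
    seen = {}
    for r in rows:
        key = tuple(r[i] for i in keep)
        if seen.setdefault(key, r[-1]) != r[-1]:
            return False
    return True

full_ok = consistent(rows, [0, 1, 2])            # all three condition attributes
drops = {i: consistent(rows, [j for j in (0, 1, 2) if j != i])
         for i in (0, 1, 2)}                     # try dropping each attribute
```

On this reduced data the full attribute set is consistent, and so is the pair {Product, Physical Appearance} alone, which matches the pairwise reduct of Table 5.20; dropping Product or Physical Appearance instead breaks consistency.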
Verifying inconclusive data
Analysis of the data contained in Table 5.8 showed that the preferences of
consumers C9 and C11 were both inconclusive, since they possessed equal
values of the condition attributes but different values of the decision attribute.
Therefore, the data of consumers C9 and C11 was excluded from Table 5.8.
Verifying equivalent information
Analysis of the data contained in Table 5.8 showed that it possessed
equivalent information, as given below:
C2 D D VG No
C8 D D VG No
C4 D S VG Yes
C13 D S VG Yes
C7 S S E Yes
C18 S S E Yes
C10 S D VG No
C17 S D VG No
C12 D S G No
C14 D S G No
C20 D S G No
C16 S D G No
C19 S D G No
Hence, by analysing the above data, Table 5.8 was reduced as shown in
Tables 5.11 through 5.14. The reduction process is presented below; the data
was observed to be of discrete type.
Table 5.11 Reduct Information of sample consumer data
Customer Personnel Product Physical Appearance
Satisfaction
C1 D D G No
C2 D D VG No
C3 D D E Yes
C4 D S VG Yes
C5 D S E Yes
C6 S S VG Yes
C7 S S E Yes
C8 D D VG No
C10 S D VG No
C12 D S G No
C15 S S G No
C16 S D G No
C19 S D G No
Table 5.12 Analysis of condition attributes with Personnel criteria
Customer Personnel Satisfaction
C1 D No
C2 D No
C3 D Yes
C4 D Yes
C5 D Yes
C6 S Yes
C7 S Yes
C8 D No
C10 S No
C12 D No
C15 S No
C16 S No
C19 S No
Table 5.13 Analysis of condition attributes with Product criteria
Customer Product Satisfaction
C1 D No
C2 D No
C3 D Yes
C4 S Yes
C5 S Yes
C6 S Yes
C7 S Yes
C8 D No
C10 D No
C12 S No
C15 S No
C16 D No
C19 D No
Table 5.14 Analysis of Physical Appearance attribute
Customer Physical Appearance
Satisfaction
C1 G No
C2 VG No
C3 E Yes
C4 VG Yes
C5 E Yes
C6 VG Yes
C7 E Yes
C8 VG No
C10 VG No
C12 G No
C15 G No
C16 G No
C19 G No
From this analysis of the reduct attributes tabulated in Tables 5.12 through
5.14, it was concluded that no data could be excluded.
Analysis of the condition attributes in Table 5.8 revealed that the same data
existed, as shown in Tables 5.15 through 5.21.
Table 5.15 Analysis of attributes Personnel and Product
Customer Personnel Product Satisfaction
C1 D D No
C2 D D No
C3 D D Yes
C4 D S Yes
C5 D S Yes
C12 D S No
C6 S S Yes
C7 S S Yes
C8 D D No
C16 S D No
C19 S D No
C10 S D No
C15 S S No
Table 5.16 Analysis of Personnel and Physical Appearance
Customer Personnel Physical Appearance
Satisfaction
C1 D G No
C12 D G No
C2 D VG No
C8 D VG No
C3 D E Yes
C5 D E Yes
C7 S E Yes
C4 D VG Yes
C6 S VG Yes
C10 S VG No
C15 S G No
C16 S G No
C19 S G No
Table 5.17 Analysis of Attributes Product and Physical Appearance
Customer Product Physical Appearance
Satisfaction
C1 D G No
C16 D G No
C19 D G No
C2 D VG No
C8 D VG No
C10 D VG No
C3 D E Yes
C4 S VG Yes
C6 S VG Yes
C5 S E Yes
C7 S E Yes
C12 S G No
C15 S G No
The reduct information of Tables 5.15 through 5.17 is generated as given in
Tables 5.18 through 5.20, and the entire reduct of all the data is given in Table
5.21.
Table 5.18 Reduct of Personnel and Product
Customer Personnel Product Satisfaction
C1 D D No
C3 D D Yes
C4 D S Yes
C6 S S Yes
C10 S D No
C15 S S No
Table 5.19 Reduct of Personnel and Physical Appearance
Customer Personnel Physical Appearance
Satisfaction
C1 D G No
C2 D VG No
C3 D E Yes
C4 D VG Yes
C6 S VG Yes
C10 S VG No
C15 S G No
Table 5.20 Reduct of Product and Physical Appearance
Customer Product Physical Appearance Satisfaction
C1 D G No
C2 D VG No
C3 D E Yes
C4 S VG Yes
C5 S E Yes
C12 S G No
Table 5.21 Final reduct information of sample consumer transactions
Decision rules
The reduct information in Table 5.21 generated the decision rules R1, R2 and
R3 needed to aid the satisfaction analysis of consumers.
Customer Personnel Product Physical Appearance
Satisfaction
C1 D D G No
C3 D D E Yes
C4 D S VG Yes
Rule-1
R1: If (for) Consumer, Personnel = “D” and Product = “D” and Physical
Appearance = “G” then Satisfaction= “No”.
Rule-2
R2: If (for) Consumer, Personnel = “D” and Product = “D” and Physical
Appearance = “E” then Satisfaction = “Yes”.
Rule-3
R3: If (for) Consumer, Personnel = “D” and Product = “S” and Physical
Appearance = “VG” then Satisfaction = “Yes”.
Rules R1 and R2 are nearly the same except for the variation in satisfaction
with the physical appearance of the malls. Rules R1 and R3 state the
importance of the product, as identified in the MUSA method.
In this way rough set theory was helpful in identifying the influence of the
reduct attributes (the 3 Ps, namely Personnel, Product and Physical
Appearance) on consumer satisfaction, and the effect of the Product attribute
proved to be significant. The 4th P (Place) was omitted because of its low
weight.
Analysing consumer loyalty
Transaction data of about 120 consumers was considered, and the following
attributes of the consumer data set were taken into account:
1. Name 2. Gender 3. Age 4. Mobile number 5. Address 6. Average
yearly income 7. Category.
The normalised value for consumer loyalty and satisfaction was calculated as
X′i = (Xi – Xmin) / (Xmax – Xmin), where X′i is the normalised value of
consumer loyalty/satisfaction, Xi the observed number of visits, Xmin the
minimum number of visits and Xmax the maximum number of visits. The
sample consumer data taken for this study is shown in Table 5.22.
Table 5.22 - Sample data set transaction format
Gender  Age       Income (Rs.)    Customer satisfaction  Customer loyalty  No. of consumers
M       41 to 50  30,000~35,000   0.75                   0.75              42
M       21 to 25  25,000~30,000   0.65                   0.84              45
F       25 to 42  21,500~25,000   0.77                   0.69              18
M       26 to 32  20,000~26,500   0.51                   0.84              15
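The min-max normalisation used for the loyalty and satisfaction scores can be sketched directly; the visit counts below are made-up values for illustration, not the survey data:

```python
def min_max_normalise(visits):
    """Map each visit count into [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(visits), max(visits)
    return [(v - lo) / (hi - lo) for v in visits]

# Hypothetical numbers of store visits for five consumers
scores = min_max_normalise([2, 5, 8, 11, 14])
```

The least frequent visitor maps to 0, the most frequent to 1, and the rest fall proportionally in between.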
Data preprocessing was done in order to clear noisy and inconsistent data.
Based on the grid-partition method, fuzzy values for consumer satisfaction
and consumer loyalty were obtained.
For a minimum confidence (min_conf, equation 5) of 30% and a minimum
support (min_sup, equation 4) of 40%, the following "if ... then ..." rules were
derived, where CF(R) is the certainty grade of rule R and R is called a fuzzy
classification rule.
1. IF (Age >= 41 AND Gender = “M” AND Average yearly income = 30K ~ 35K
   AND Customer satisfaction = “VS” AND Customer loyalty = “H”)
   THEN CF = 0.77.
2. IF (Age = 26 ~ 30 AND Gender = “F” AND Average yearly income = 20K ~ 25K
   AND Customer satisfaction = “VS” AND Customer loyalty = “M”)
   THEN CF = 0.66.
3. IF (Age = 21 ~ 25 AND Gender = “M” AND Average yearly income = 26K ~ 30K
   AND Customer satisfaction = “VS” AND Customer loyalty = “M”)
   THEN CF = 0.62.
4. IF (Age = 21 ~ 25 AND Gender = “M” AND Average yearly income = 20K ~ 25K
   AND Customer satisfaction = “S” AND Customer loyalty = “H”)
   THEN CF = 0.58.
5. IF (Age = 36 ~ 40 AND Gender = “F” AND Average yearly income = 20K ~ 25K
   AND Customer satisfaction = “S” AND Customer loyalty = “M”)
   THEN CF = 0.49.
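Rules of this form can be applied with a simple matcher. The sketch below encodes two of the derived rules; the conditions are paraphrased as predicates, and the exact income boundaries are assumptions where the text gives ranges such as "26K ~ 30K". It returns the certainty grade of the best matching rule.

```python
# Each rule pairs a set of attribute conditions with its certainty grade CF.
# Conditions are predicates so ranges like "21 ~ 25" can be expressed directly.
RULES = [
    ({"gender": lambda g: g == "M", "age": lambda a: a >= 41,
      "income": lambda i: 30000 <= i <= 35000}, 0.77),
    ({"gender": lambda g: g == "M", "age": lambda a: 21 <= a <= 25,
      "income": lambda i: 26000 <= i <= 30000}, 0.62),
]

def classify(consumer):
    """Return the highest CF among rules whose every condition the consumer
    satisfies, or None when no rule fires."""
    best = None
    for conditions, cf in RULES:
        if all(pred(consumer[attr]) for attr, pred in conditions.items()):
            if best is None or cf > best:
                best = cf
    return best

print(classify({"gender": "M", "age": 45, "income": 32000}))  # 0.77
```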
From the above rules it was identified that consumer loyalty varied between
“High” and “Medium” across the different income groups of males and females.
Since overall satisfaction was good, there was no low-level consumer loyalty
in either case. This consumer loyalty gave Reliance Fresh a competitive
advantage over other private supermarkets, such as Spencer’s and More, in
Hanamkonda city.
Analysing Consumer Service
In order to extend better consumer service, the opinions of 120 consumers
were collected using the Consumer Service Survey form shown in Section 4.4.
The opinions of these consumers are given in Table 5.23.
Table 5.23 - Opinion of consumer service survey

S.No.   Customer service   No. of respondents   %
1       Excellent          42                   35
2       Good               45                   38
3       Average            18                   15
4       Fair               10                    8
5       Poor                5                    4
Study of the consumer service survey in Table 5.23 revealed the following
facts:
1. 35% of consumers rated the service as Excellent, 38% as Good, 15% as
Average, 8% as Fair and 4% as Poor.
2. Since a majority of consumers (73%) rated the service as Excellent or
Good, consumers were very much contented with the existing service.
Using Bayesian classification (Bayes’ theorem) it was determined that the
“Media mix” was also useful for extending online consumer support. This is
attributed to the fact that more than 75% of consumers had e-mail IDs and
more than 90% had mobile numbers.
From the initial studies (objective 1) it was observed that 62% of consumers
preferred e-mail as the better communication channel. Using Bayes’ theorem it
was estimated that 75% of the consumers who preferred e-mail as a
communication channel also opted for it for online consumer support:
0.75 * 0.62 / (0.75 * 0.62 + 0.25 * 0.62) = 0.465 / (0.465 + 0.155)
= 0.465 / 0.62 = 75%. The case is similar for mobile phones. Hence it was
concluded that online support can also be extended to consumers using the
“Media mix”.
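The arithmetic above can be reproduced directly; the probabilities below are the ones stated in the text.

```python
# Probabilities as stated in the text: 62% preferred e-mail (from objective 1),
# and 75% of those also opted for e-mail for online consumer support.
p_email = 0.62
p_support_given_email = 0.75

numerator = p_support_given_email * p_email                        # 0.75 * 0.62 = 0.465
denominator = numerator + (1 - p_support_given_email) * p_email    # + 0.25 * 0.62 = 0.155
estimate = numerator / denominator                                 # 0.465 / 0.62 = 0.75
```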
By employing survey methods and applying various data mining techniques of
business intelligence, such as k-means clustering, the Apriori algorithm,
association rules, rough set theory, fuzzy logic and Bayes’ theorem, the
objectives of this research were analysed.
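Of the techniques listed, Apriori lends itself to a compact illustration. The following is a minimal pure-Python sketch of frequent-itemset mining; the basket data is invented for illustration and is not drawn from the study.

```python
def apriori(transactions, min_sup):
    """Frequent-itemset mining: keep itemsets whose support >= min_sup,
    building k-itemset candidates only from frequent (k-1)-itemsets."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    k_sets = [frozenset([i]) for i in items]
    while k_sets:
        counts = {c: sum(1 for t in transactions if c <= t) for c in k_sets}
        survivors = {c: v / n for c, v in counts.items() if v / n >= min_sup}
        frequent.update(survivors)
        prev = list(survivors)
        # Candidate generation: unions of frequent sets one item larger.
        k_sets = list({a | b for a in prev for b in prev
                       if len(a | b) == len(a) + 1})
    return frequent  # maps each frequent itemset to its support

baskets = [frozenset(t) for t in
           [{"milk", "bread"}, {"milk", "bread", "butter"},
            {"bread"}, {"milk", "butter"}]]
freq = apriori(baskets, min_sup=0.5)
# At 50% support, {milk}, {bread}, {butter}, {milk, bread} and
# {milk, butter} survive; {bread, butter} does not.
```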
6. CONCLUSIONS AND FUTURE WORK
6.1. Conclusions
In this research an attempt was made to study the influence of business
intelligence (BI) techniques on Consumer Relation Management (CRM). Data
mining techniques of business intelligence were productive in understanding
consumer buying behaviour and in determining consumer satisfaction and
loyalty towards the enterprise. By applying data mining techniques of
business intelligence, better decisions can be made to improve the business.
This research also established that the two major platforms, .NET and J2EE,
are immensely useful in building customised CRM software. Users’ opinions of
these platforms ascertained that they are satisfied with the existing
features of these platforms but demand improvements; their average
satisfaction and demanding indices revealed this fact.
The findings of this research are summarized as follows:
1. The preferred communication “Media mix” of the consumers was, in order:
   e-mails (63%), face to face (49%), SMS (43%), live chat (25%) and
   newsletters (16%). The inclination towards e-mails and SMS is attributed
   to the growth in the use of the internet and mobile phones.
2. A comparative study of the .NET and J2EE platforms revealed that 59% of
   the users of these platforms favoured .NET and 63% favoured J2EE for
   building CRM software. The greater use of J2EE is attributed to the
   growing popularity of free and open source software (FOSS), of which J2EE
   is an example. The user-friendliness of .NET over J2EE leads users to
   prefer .NET almost equally with J2EE.
3. Extensive study of buying behaviours is essential to find consumer
   requirements and opportunities that facilitate increases in profit
   margins, revenues, buying and selling. For this, consumer segmentation,
   association rules and the identification of frequently bought itemsets
   are important. The Weka data mining tool was very useful for consumer
   segmentation using the k-means algorithm and for finding frequently
   bought itemsets using the Apriori algorithm.
   To identify opportunities that facilitate increases in profit margins,
   revenues, buying and selling, an extensive study of consumer satisfaction
   with the enterprise was essential. The MUSA method and rough set theory
   were found valuable in exploring consumer satisfaction.
   The average global consumer satisfaction index was approximately 90%,
   while the company’s performance according to the whole set of satisfaction
   criteria (Product, Personnel, Physical Appearance and Place - the 4 Ps)
   varied between 86% and 92%. Because of the highly competitive conditions
   of the market, this performance could not be considered relatively high.
   The most important criterion, with a significant importance level of
   45.2%, was Product; consumers did not consider the remaining criteria
   important.
   The average demanding index of 73% indicated that consumers demand more
   improvement in the business process of the company. A higher demanding
   index indicates that consumers expect a better business process than the
   existing one. Hence the business process must be improved further to make
   consumers more satisfied.
4. Fuzzy set theory and grid partition methods were useful in determining
   consumer loyalty.
   High consumer loyalty gives an enterprise a competitive advantage over
   similar enterprises in the city. Consumer loyalty, good consumer service
   and support help in consumer retention. This retention helps the
   enterprise to calculate consumer lifetime value, with which better
   decisions can be made to improve the profitability of the company.
   Consumers were satisfied with the present services and support extended
   to them. Advancements in information technology are driving enterprises
   and consumers to explore new opportunities in online selling and buying
   (trends are moving towards e-business and e-CRM).
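Consumer lifetime value, mentioned in finding 4, is commonly computed as a discounted sum of expected future margins. The sketch below uses a standard textbook formulation with illustrative parameters; it is not the thesis's own model.

```python
def customer_lifetime_value(margin, retention, discount, years):
    """Discounted CLV: sum over t of margin * retention**t / (1 + discount)**t.
    margin: expected yearly margin per consumer; retention: probability the
    consumer stays each year; discount: yearly discount rate (all illustrative)."""
    return sum(margin * retention ** t / (1 + discount) ** t
               for t in range(1, years + 1))

# Hypothetical parameters: Rs 1000 yearly margin, 80% retention,
# 10% discount rate, five-year horizon.
clv = customer_lifetime_value(margin=1000.0, retention=0.8, discount=0.1, years=5)
```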
The above conclusions are liable to change from time to time and place to
place, since the business process is highly agile in nature.
Finally, the attempt made in this research to implement business intelligence
techniques in CRM, so as to attain profitability for the enterprise and
position it at a competitive advantage, proved worthwhile and productive.
6.2. Limitations of the study
The main limitation of this study was consumer segmentation using the
k-means algorithm. The ideal value of k ranges between 2 and 10, but for
large databases the efficiency of the k-means algorithm is reduced by the
frequent calculation of Euclidean distances to form new clusters. This was
overcome by the use of a scapegoat tree and a max-heap, which resulted in a
new algorithm, the KCUSTMH algorithm.
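The cost that KCUSTMH targets comes from the assignment step of k-means, which computes n × k Euclidean distances on every iteration. A minimal sketch of that step, with invented sample points, looks like this:

```python
import math

def assign_clusters(points, centroids):
    """The step that dominates k-means cost: each iteration computes a
    Euclidean distance from every one of the n points to each of the k
    centroids, i.e. n * k distance evaluations per iteration."""
    labels = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]  # k distances per point
        labels.append(dists.index(min(dists)))
    return labels

# Illustrative 2-D points and two centroids.
points = [(1.0, 2.0), (1.5, 1.8), (8.0, 8.0), (9.0, 9.0)]
centroids = [(1.0, 2.0), (8.5, 8.5)]
print(assign_clusters(points, centroids))  # [0, 0, 1, 1]
```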
The use of a scapegoat tree also has its limitations. In contrast to other
self-balancing search trees, scapegoat trees are entirely flexible in their
balancing: they support any value of α such that 0.5 < α < 1. A high α value
results in fewer rebalances, making insertion quicker but lookups and
deletions slower, and vice versa for low values of α. Therefore, in practical
applications α is chosen depending on how frequently these operations are
performed.
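The α-weight-balance condition that governs scapegoat-tree rebuilds can be sketched as follows; this is the standard formulation, and the subtree sizes used are illustrative.

```python
def is_alpha_weight_balanced(left_size, right_size, alpha):
    """A node is alpha-weight-balanced when neither subtree holds more than
    alpha * size(node) of its nodes; a violation on the insertion path is
    what triggers a scapegoat rebuild."""
    size = left_size + right_size + 1
    return left_size <= alpha * size and right_size <= alpha * size

# With alpha = 0.75, a node with subtrees of 9 and 2 nodes is still balanced;
# tightening alpha to 0.55 flags the same node for rebuilding.
print(is_alpha_weight_balanced(9, 2, 0.75))  # True
print(is_alpha_weight_balanced(9, 2, 0.55))  # False
```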
6.3. Future Work and Suggestions
1. Consumer satisfaction analysis is more effective if the study is based on
   the marketing mix and its individual factors.
2. An attempt may also be made to study the effect of physical attributes of
   consumers, such as height, weight, body colour, blood group, eye colour
   and hair style, on their buying behaviours and on the profitability of
   the company. Biometric and RFID devices are useful for collecting such
   data.
3. Future information technology relies on advanced technologies such as
   analytics, big data and the Android platform. Future studies aim to
   implement these concepts in CRM and business intelligence.
4. Consumer data is growing day by day. For databases involving big data,
   mining algorithms such as Balanced Iterative Reducing and Clustering
   using Hierarchies (BIRCH) and Clustering Using Representatives (CURE)
   will be effective; they surpass the benefits of k-means clustering. Hence
   future studies may be proposed to implement them and improve their
   efficiencies.
7. BIBLIOGRAPHY / REFERENCES
[1] Agrawal. R., Srikant. R., “Mining sequential patterns”, Proceedings of
the International Conference on Data Engineering, Taipei, Taiwan, 1995.
[2] Agrawal. R., Imielinski. T., Swami. A., “Mining association rules
between sets of items in large databases”, Proc. ACM-SIGMOD International
Conference on Management of Data, pp. 207-216, Washington, D.C., May 1993.
[3] Agrawal. R. , Srikant. R., “Fast Algorithms for Mining Association Rules
in Large Databases”, Proceedings of the 20th International Conference
on Very Large Data Bases, (VLDB'94), pp. 478-499, 1994.
[4] Bagirov.A.M., Mardaneh.K., “Modified global k-means algorithm for
clustering in gene expression datasets”, WISB’06, Australian Computer
Society, Inc., Darlinghurst, Australia, pp.23–28, 2006.
[5] Bentley.J.L., “Multidimensional Binary Search Trees Used for
Associative Searching”, Comm. ACM, Vol. 18, pp. 509-517, 1975.
[6] Chris Rygielski, Jyun-Cheng Wang, David C. Yen, “Data Mining
Techniques For Customer Relationship Management”, Vol. 24, Issue 2,
Elsevier Science Ltd., ISSN: 0160-791X, pp 483-502, 2002.
[7] Christian Borgelt, “Efficient Implementations of Apriori and Eclat”,
Proceedings of the 1st IEEE ICDM Workshop on Frequent Item Set Mining
Implementations (FIMI), Melbourne, pp 1-9, 2003.
[8] Corazza .M., Funari. S., Gusso. R., “An evolutionary approach to
preference disaggregation in a MURAME-based credit scoring
problem”, ISSN: 2239-2734, 2012.
[9] E.W.T. Ngai, Li Xiu, D.C.K. Chau, “Application of data mining
techniques in customer relationship management: A literature review
and classification”, Expert Systems with Applications, Issue 36,
Elsevier Ltd., ISSN: 0957-4174, 2009.
[10] Gangadhara Rao. N.V.B., Sirisha Aguru ,“A Hash based Mining
Algorithm for Maximal Frequent Item Sets using Double Hashing”,
Journal of Advances in Computational Research: An International
Journal, Vol. 1 No. 1-2, pp1-6, 2012.
[11] Grigoroudis.E. ,. Siskos.Y. Christina Diakaki , “Preference
Disaggregation For Measuring And Analysing Customer Satisfaction:
The MUSA Method”, European Journal of Operational Research, pp 1-
41, 2001.
[12] Habul. A., “Business intelligence and customer relationship
management”, IEEE Conference Publications, ISSN: 1330-1012, pp
169 – 174, 2010.
[13] Han Jiawei , Yongjian Fu, “Discovery of Multiple-Level Association
Rules from Large Databases”, Proceedings of the 21st VLDB
Conference, Zurich, Switzerland, pp 420-431, 1995.
[14] Han. J., Pei. J., Yin. Y., “Mining Frequent Patterns without Candidate
Generation”, ACM SIGMOD, pp. 1-12, 2000.
[15] Han.J., Kamber.M., ”Data Mining Concepts and Techniques”, Morgan
Kaufmann Publishers, San Francisco, 2006.
[16] Hong Tzung Pei, Huang Tzu Jung ,Chang Chao Sheng , “Mining
Multiple-level Association Rules Based on Pre-large Concepts, Data
Mining and Knowledge Discovery in Real Life Applications”, ISBN 978-
3-902613-53-0, pp. 438, 2009.
[17] Hsieh Nan-Chen , Chu Kuo-Chung ,” Enhancing Consumer Behavior
Analysis by Data Mining Techniques”, International Journal of
Information and Management Sciences, Vol.17, No. 2, pp 39-53, 2009.
[18] Imielinski. T. , Mannila. H,” A database perspective on knowledge
discovery”, Communications of ACM., 39:58-64, 1996.
[19] Iqbal Asad, Ullah Naeem, “J2EE vs. Microsoft.NET- A Comparison of
two platforms for component-based development of web applications”,
pp 1-60, 2010.
[20] Isakki P. , Rajagopalan .S.P., “Mining Unstructured Data using
Artificial Neural Network and Fuzzy Inference Systems Model for
Customer Relationship Management”, IJCSI International Journal of
Computer Science, Vol. 8, Issue 4, No. 1, ISSN: 1694-0814, pp 630-
634, 2011.
[21] Isakki.P. , Rajagopalan. S.P., “Analysis of Customer Behavior using
Clustering and Association Rules”, International Journal of Computer
Applications, Vol. 43, No. 23, , ISSN: 0975 – 8887, pp 19-26, 2012.
[22] Ishibuchi. H., Nakashima.T., Yamamoto. T., “Fuzzy association rules
for handling continuous attributes,” proceedings of IEEE International
Symposium on Industrial Electronics, Pusan, Korea, pp.118-121, 2001.
[23] Ishibuchi. H., Yamamoto. T., Nakashima,.T.,“Fuzzy data mining: effect
of fuzzy discretization”, proceedings of the 1st IEEE International
Conference on Data Mining, San Jose, USA, pp.241-248, 2001.
[24] Ishibuchi.H, Nakashima.T., Murata.T., “Performance evaluation of
fuzzy classifier systems for multidimensional pattern classification
problems”, IEEE Transactions on Systems, Man, and Cybernetics, Vol.
29, no. 5, pp.601-618, 1999.
[25] Ishibuchi.H., Nozaki. K., Yamamoto. N., Tanaka.H, “Selecting fuzzy if-
then rules for classification problems using genetic algorithm”, IEEE
Transactions on Fuzzy Systems, Vol. 3, No. 3, pp.260-270, 1995.
[26] Jain. A. K, Murty. M. N., and Flynn. P. J., “Data Clustering: A Review,”
ACM Computing Survey, Vol. 31, No. 3, pp. 264-323, 1999.
[27] Joao Isabel M, Costa Carlos A Bana e, Figueria Jose Rui, “An
alternative to MUSA method for customer satisfaction analysis”, Vol.20
ISSN-1646-2955, pp 1-28, 2007.
[28] Jones. T. O., Jr. Sasser. W. E., “Why Satisfied Customer Defect,”
Harvard Business Review, Vol. 73, No. 6, pp. 88-99, 1995.
[29] Jong Soo Park, Ming-Syan Chen, Philip S. Yu, “An Effective Hash
Based Algorithm for Mining Association Rules”, ACM SIGMOD Record,
Vol. 24, Issue 2, pp 175-186, ISSN: 0163-5808, 1995.
[30] Kalra Shipra, Gupta Rachika, “Data Mining:A Tool for the
Enhancement of Banking Sector”, IFRSA, International Journal of Data
Warehousing & Mining, IIJDWM, Vol.1, pp.204-208, 2011.
[31] Kanungo. T., Mount. D.M., Netanyahu. N.S., Piatko. C., Silverman. R.,
Wu A.Y., ,” An efficient k-means clustering algorithm: Analysis and
implementation”, IEEE Transaction on Pattern Analysis and Machine
Intelligence,Vol.24, 2002.
[32] Khan Aurangzeb, Baharudin Baharum, Khan Khairullah,” Mining
Customer Data For Decision Making Using New Hybrid Classification
Algorithm”, Journal of Theoretical and Applied Information Technology
,Vol. 27, No. 1, ISSN: 1817-3195, pp 54-61, 2011.
[33] Krishnamurthy M., Kannan. A. , Baskaran .R., Deepalakshmi .R.,
“Frequent Item set Generation Using Hashing-Quadratic Probing
Technique” , “European Journal of Scientific Research ISSN 1450-216X
Vol.50 No.4 pp. 523-532, 2011.
[34] Kumar Rajeev, Puran Rajeshwar , Dhar Joydip, “Enhanced K-Means
Clustering Algorithm Using Red Black Tree and Min-Heap”,
International Journal of Innovation, Management and Technology, Vol.
2, No. 1, ISSN: 2010-0248, pp 49-54, 2011.
[35] Lan Guo-cheng, Hong Tzung-Pei , Tseng Vincent S. , “A Projection-
Based Approach For Discovering High Average-Utility Item sets”,
Journal of Information Science And Engineering, Vol.28, pp193-209,
2012.
[36] Lin D-I. , Kedem Z.M., “Pincer-Search: A New Algorithm For
Discovering The Maximum Frequent Set,” IEEE Transactions on
Knowledge and Data Engineering, , Vol. 14, No. 3,pp. 553-566, 2002.
[37] Ling Amy Poh Ai, Saludin Mohamad Nasir, Mukaidono Masao,
“Deriving Consensus Rankings via Multicriteria Decision Making
Methodology”, Emerald Journals Business Strategy Series, Volume 13,
Issue 1, ISSN: 1751-5637, pp 3-12, 2012.
[38] Liu Ying, Liao Wei-keng, Choudhary Alok , “A Two-Phase Algorithm
for Fast Discovery of High Utility Item sets”, LNAI 3518, Springer-Verlag
Berlin Heidelberg, pp. 689 – 695, PAKDD 2005.
[39] Liu.Y., Yang.B., “Research of an Improved Apriori Algorithm in Mining
Association Rules”,Journal of Computer Applications, vol. 27, pp. 418-
420, 2007.
[40] Lu Dai, Arun Kumar. S.,”Fuzzy Evaluation Model for Customer
Relationship Management”, International Journal of Emerging Trends in
Engineering and Development, Vol.7, Issue 2, ISSN: 2249-6149, pp
266-279, 2012.
[41] Mack Joun.,” An Efficient k-Means Clustering Algorithm, Analysis and
Implementation”, IEEE Transactions on Pattern Analysis And Machine
Intelligence, Vol. 24, No. 7., 2002.
[42] Meng Qingliang, Kong Qinghua, Han Yuqi, Chen Jie, “Neural
Networks Based Integrated Evaluation Method for the Effectiveness of
CRM”, Proceedings of the Fourth International Conference on
Electronic Business (ICEB), Beijing, pp 320-321, 2004.
[43] Miller Gerry, “The Web Services Debate, .NET vs. J2EE”,
Communications of the ACM, June,Vol. 46, No. 6 ,pp 64-67, 2003.
[44] Matsatsinis.N.F., Ioannidou.E., Grigoroudis.E., “Customer
satisfaction using data mining techniques”, European Journal of
Operational Research, pp 1-4, 1999.
[45] Pawlak.Z,“Rough Set Theory and Its Applications,” Journal of
Telecommunications and Information Technology, Vol. 3, , pp. 7-10,
2002.
[46] Pillai Jyothi , Vyas .O.P. , “CSHURI – Modified HURI algorithm for
Customer Segmentation and Transaction Profitability”, International
Journal of Computer Science, Engineering and Information Technology
(IJCSEIT), Vol.2, No.2, pp 79-89, 2012.
[47] Pillai Jyothi, Vyas .O.P., “High Utility Rare Itemset Mining (HURI): An
approach for extracting highutility rare item sets”, i-manager’s Journal
on Future Engineering and Technology (JFET), ISSN Online: 2230-
7184, ISSN Print: 0973 – 2632, 2011.
[48] Rada Rexhep , Ruseti Bashkim, “Artificial Neural Networks in CRM”,
ICT Innovations, Web Proceedings, ISSN 1857-7288, pp 595-598,
2012.
[49] Rahman Zubair.A.M.J. Md., Balasubramanie.P., Venkata Krishna.P.,
“A Hash based Mining Algorithm for Maximal Frequent Itemsets using
Linear Probing”, InfoComp Journal of Computer Science,
Vol.8, No.1, pp.14-19, 2009.
[50] Raorane Abhijit, Kulkarni .R.V., “Data Mining Techniques: A Source
For Consumer Behavior Analysis”, International Journal of Database
Management Systems, Vol.3,No.3, pp.45-56, 2011.
[51] Russell K.H., Ching, Chen Ja-Shen, Lin Yi-Shen, “A Proposed
Clustering Method for Customer Segmentation in CRM Practices”,
Journal of Business Research, Vol.44, No.2, pp.75-92., 2002.
[52] Samtani Gunjan, Sadhwani Dimple, “Web Services and Application
Frameworks (.NET and J2EE)” ,pp 1-4, 2004
http://www.nws.noaa.gov/oh/hrl/hseb/docs/ApplicationFrameworks.pdf.
[53] Saravanabhavan .C., Parvathi .R. M. S., “Utility FP-Tree: An Efficient
Approach to Mine Weighted Utility Itemsets”, European Journal of
Scientific Research,Vol.50 No.4 pp.466-480, 2011.
[54] Seddawy Bahgat El Ahmed, Moawad Ramadan, Dr. Hana Maha
Attia, “Applying Data Mining Techniques in CRM”, online publication of
Research article from AASTMT, pp 1-11, 2010.
[55] Selvi Kanimozhi.C.S., Tamilarasi.A., “Mining of High Confidence Rare
Association Rules”, European Journal of Scientific Research ISSN
1450-216X Vol.52 No.2 pp.188-194, 2011.
[56] Seno.M. ,. Karypis.G., “LPMiner: An Algorithm For Finding Frequent
Itemsets Using Length- Decreasing Support Constraint” ,IEEE ICDM, ,
pp. 505-512. 2001.
[57] Senthil Kumar.A.V. , Wahidabanu.R.S.D., “DHFI-tree mining: A new
approach for frequent itemset mining”, Advances in Computer Science
and Engineering (ACSE), Vol.2, No.2, pp 115-132, 2008.
[58] Senthil Kumar.A.V. , Dr.Wahidabanu.R.S.D., “Mining Frequent
Itemsets: Efficient Hashing and Tree-Based Approach”, International
Journal of Computer Science and Software Technology (IJCSST),
Vol.1, No.1, pp.1-7, January-June 2008.
[59] Senthil Kumar.A.V. , R.S.D. Wahidabanu, “An Effective Algorithm for
Mining Association Rules”, Journal of Computer Science, pp: 174-183,
Nov- Dec 2006.
[60] Senthil Kumar .A.V. , R.S.D. Wahidabanu, “Discovery of Frequent
Itemsets: Frequent Item Tree-Based Approach”, ITB Journal, ICT Vol. 1
C, No. 1, pp: 42-55, May 2007.
[61] Silvia Rissino , Germano Lambert Torres, “Rough Set Theory –
Fundamental Concepts, Principals, Data Extraction, and Applications,
Data Mining and Knowledge Discovery in Real Life Applications”, ISBN
978-3-902613-53-0, pp. 35-58, 2009.
[62] Siskos Yannis ,Grigoroudis Evangelos, “Measuring Customer
Satisfaction for Various Services Using Multicriteria Analysis”, Springer
US, International Series in Operations Research & Management
Science Volume 44, , ISSN:0884-8289, pp 457-482, 2002.
[63] Teng Shaohua, Su Jiangyu, Zhang Wei, Fu Xiufen, Chen Shuqing,
“An Algorithm of Mining Frequent Itemsets in Pervasive Computing”,
Proceedings of IEEE ICDM, pp.559-563, 2009.
[64] Tsiptsis Konstantinos, Chorianopoulos Antonios, “Data Mining
Techniques in CRM: Inside Customer Segmentation”, John Wiley &
Sons, Ltd, ISBN: 978-0-470-74397-3. pp.373, 2009.
[65] Vanitha.K. , Santhi.R., “Using Hash Based Apriori Algorithm To
Reduce The Candidate 2- Item sets For Mining”, Journal of Global
Research in Computer Science, Vol. 2, No. 5, ISSN-2229-371x, pp 79-
80, 2011.
[66] Wang Chien Hua ,Pang Chin Tzong , “Applying Fuzzy Data Mining for
an Application CRM”, Bulletin of Networking, Computing, Systems, and
Software, Vol. 1, No. 1, ISSN 2186–5140, pp 46–51, 2012.
[67] Wu Kun & Liu Feng ying, “Application of Data Mining in Customer
Relationship Management”, IEEE Conference Publications, ISBN: 978-
1-4244-5325-2, pp 1– 4, 2010.
[68] Yuan.F, Meng.Z.H, Zhang.H. X .and Dong .C. R. , “A New Algorithm to
Get the Initial Centroids,” Proc. of the 3rd International Conference on
Machine Learning and Cybernetics, pp. 26–29, 2004.
[69] Zadeh. L.A., “The concept of a linguistic variable and its application to
approximate reasoning”, Information Science (part 1), Vol.8, No.3,
pp.199-249, 1975.
[70] Zadeh. L.A., “The concept of a linguistic variable and its application to
approximate reasoning”, Information Science (part 2), Vol.8, No.4,
pp.301-357, 1975.
[71] Zadeh. L.A., “The concept of a linguistic variable and its application to
approximate reasoning”, Information Science (part 3), Vol.9, No.1,
pp.43-80, 1976.
[72] Zhang Limei, “Data mining application in customer relationship
management”, IEEE Conference Publications, ISBN: 978-1-4244-7235-2,
pp 171-174, 2010.
[73] Zhang.T., Ramakrishnan.R., Livny.M., “BIRCH: an efficient data
clustering method for very large databases”, ACM SIGMOD, 1996.
[74] Zu Qiaohong, Wu Ting, Wang Hui, “A Multi-Factor Customer
Classification Evaluation Model”, Journal of Computing and
Informatics, Vol. 29, No.24, pp 509-520, 2010.
[75] Vijayarani.S., Sathya.P., “An Efficient Algorithm for Mining
Frequent Items in Data Streams”, International Journal of Innovative
Research in Computer and Communication Engineering, Vol. 1, Issue 3,
ISSN: 2320-9798, pp 742-747, 2013.
[76] “A comparison of J2EE and .NET as platforms for teaching Web
services”, Proceedings of the IEEE Conference on Frontiers in Education
(FIE), 34th Annual, Vol. 3, pp 1-17, ISSN: 0190-5848, 2004.

List of Publications
International Journals
[1] Narendra Kumar .V. V., Dr. RSD Wahidabanu., “Customer Relationship
Management on J2EE and .NET using Business Intelligence (A
comparative Study on J2EE and .NET Platforms on various parameters
and features)”, International Journal of Datamining Emerging
Technologies, Vol.2, No.1, ISSN: 2249-3212 pp 41-48, 2012.
[2] Narendra Kumar .V. V., Dr. RSD Wahidabanu.,” Customer Relationship
Management with J2EE and .NET using Business Intelligence”,
International Journal of Advanced Research in Computer Science and
Applications, Vol.1,Issue 3, September 2013, ISSN 2321-872X, 2013.
[3] Narendra Kumar .V. V., Dr. RSD Wahidabanu.,”The role of Business
Intelligence with support of J2EE or .NET in consumer relation
management”, International Journal of Datamining Emerging
Technologies,Vol.3, Issue 1,pp 1-15, Print ISSN: 2249-3212, Online ISSN :
2249-3220,2013.
National Journals
[1] Narendra Kumar .V. V. ,“Datamining in various disciplines of
Management”, The Osmania Journal of Management, Vol.V, No.12, ISSN
No. 0976-4208, pp 77-89, 2009.
[2] Narendra Kumar .V. V., “Business Intelligence- A new vision for HR”,
NSHM Journal of Management Research & Application, NJRMA, Vol. 1,
ISSN 0975-2510, pp 72-76, 2009.
[3] Narendra Kumar .V. V., “Year wise Dasavatars of Business Intelligence
through BI2.0”, PRATIBHIMBA, the journal of IMIS Vol.8, No.1, ISSN-
0079-2541, pp 49-60, 2008.
Conference papers
[1] Narendra Kumar.V.V., “Customer Satisfaction Using Data mining”
Proceedings of International Conference held at Sai Ram Institutions (Sri
Sai Ram Engineering College), 2007.
[2] Narendra Kumar .V. V., “Data Mining – Visualization Tools (Clementine
work Bench)” Proceedings of the National Conference ETA-2005 , CS-
Dept-Saurastra University, Rajkot, jointly with Amoghasidhi Educational
Society, Sangli, 2005.
[3] Narendra Kumar .V. V., “Clustering (k-means clustering) using Clementine
workbench”, Proceedings of the National Conference on “Recent Trends in
Data Mining and its Applications (NCDMA-2006)” ,Department of Computer
science and Engineering, Faculty of Engineering and Technology,
Annamalai University, 2006.
[4] Narendra Kumar .V. V., “Data cleansing using Oracle ware house &
Managing the knowledge workers: SAPTHASUTHRAS”, National
conference on Business Intelligence ,Dept of Computer Science and
Commerce, P.B. Siddartha college of Arts & Science, Vijayawada and
sponsored by UGC, New Delhi, 2007.
[5] Narendra Kumar .V.V., “Banking upon Business Intelligence in Banks”,
National Conference on “Organization and working of Financial Sector in
India”, Alluri Institute of Management Sciences, Warangal.A.P, 2010.