CONSUMER RELATION MANAGEMENT OF .NET AND J2EE
USING BUSINESS INTELLIGENCE
Thesis submitted in partial fulfillment for the award of
Degree of Doctor of Philosophy in INFORMATION TECHNOLOGY
By
VANGALA V NARENDRA KUMAR
Guide
Dr. R.S.D.WAHIDA BANU, Ph.D.
VINAYAKA MISSIONS UNIVERSITY
SALEM, TAMILNADU, INDIA
February 2014
VINAYAKA MISSIONS UNIVERSITY
DECLARATION
I, VANGALA V NARENDRA KUMAR, declare that the thesis entitled
CONSUMER RELATION MANAGEMENT OF .NET AND J2EE USING
BUSINESS INTELLIGENCE submitted by me for the Degree of Doctor of
Philosophy is the record of work carried out by me during the period from
JULY 2008 to FEBRUARY 2014 under the guidance of Dr.R.S.D.WAHIDA
BANU, PRINCIPAL GOVT. COLLEGE OF ENGINEERING, SALEM and that
not formed the basis for the award of any degree, diploma, associateship,
fellowship, titles in this or any other University or other similar institutions of
higher learning.
Place: Signature of the Candidate
Date:
VINAYAKA MISSIONS UNIVERSITY
CERTIFICATE BY THE GUIDE
I, Dr. R.S.D.WAHIDA BANU, PRINCIPAL, GOVT. COLLEGE OF
ENGINEERING, SALEM, TAMILNADU certify that the thesis entitled
CONSUMER RELATION MANAGEMENT OF .NET AND J2EE USING
BUSINESS INTELLIGENCE submitted for the Degree of Doctor of Philosophy
by Mr. VANGALA V NARENDRA KUMAR is the record of research work
carried out by him/her during the period from JULY 2008 to FEBRUARY
2013 under my guidance and supervision and that this work has not formed
the basis for the award of any degree, diploma, associateship, fellowship or
other titles in this University or any other University or Institution of higher
learning.
Place: Signature of the Supervisor with designation
Date:
ACKNOWLEDGEMENT
This thesis is a milestone in my career. I express my sincere gratitude to my
Supervisor Dr.R.S.D. Wahidabanu madam, Principal Govt. College of
Engineering, Salem, Tamilnadu, who guided me throughout this research.
This research would not have been possible without her valuable help and
support. I would also thank Prof.Dr.K.Rajendran, Dean, Research, and the
committee members of Vinayaka Missions University, Salem, Tamilnadu for
providing the opportunity to do this research.I express my profound
gratefulness to Dr.Banda Prakash, Secretary and Correspondent, Alluri
Institute of Management Sciences, Warangal, Andhra Pradesh, for
encouraging me to do this work and providing all the necessary facilities in the
institute. I am eternally grateful to the remarkable contributions of our fellow
faculty Dr.G.Ravi, Mr.Md.Nayeemuddin, Mr.K.Ravi, Mr.K.Anil Kumar and also
to all the software executives, students, faculty, reliance fresh consumers and
staff for extending their support in conducting various research studies. I am
grateful to each and everyone who supported me throughout this research.
Finally I am thankful to my entire family for their encouragement and
emotional support to complete my thesis. I bestow this entire effort to Lord Sri
Venkateswara of Seven Hills.
Vangala V Narendra Kumar
TABLE OF CONTENTS
TITLE PAGE NO.
LIST OF FIGURES viii
LIST OF TABLES x
LIST OF ABBREVIATIONS xii
LIST OF SYMBOLS USED xv
ABSTRACT 1
CHAPTER
1. INTRODUCTION 2-13
1.1. Overview 2
1.2. Consumer Relation Management 4
1.3. .NET Platform 5
1.4. J2EE Platform 6
1.5. Business Intelligence 8
1.6. Need for the Study 9
1.7. Objective of Research 10
1.8. Methodology 10
1.9. Organization of thesis 11
2. REVIEW OF LITERATURE 14-23
2.1. Related Research in CRM and Data Mining 14
2.2. CRM software in IT sector 19
2.3. Statistics supporting the use of .NET and J2EE in software industry 21
2.4. Business Intelligence and CRM 22
2.5. Summary 23
3. CONSUMER RELATION MANAGEMENT OF .NET AND J2EE USING BUSINESS INTELLIGENCE 24-81
3.1 Overview 24
3.2. Consumer Relation Management 24
3.2.1. Consumer Satisfaction 24
3.2.2. Consumer Loyalty 25
3.2.3. Mobile CRM 27
3.2.4. CRM in Cloud Computing 27
3.2.5. Big data and CRM – future CRM 29
3.3. The .NET Framework 30
3.3.1. The Common Language Runtime (CLR) 33
3.3.2. Microsoft Intermediate Language (MSIL) 34
3.3.3. Common Type System (CTS) 34
3.3.4. .NET Framework Class Library 35
3.3.5. ASP.NET Web Services 36
3.4. The J2EE Framework 37
3.4.1. The J2EE Platform 37
3.4.2. The J2EE Runtime 38
3.4.3. The J2EE APIs 38
3.4.4. J2EE Technologies 40
3.4.5. Java Server Pages 41
3.4.6. J2EE Service Technologies 42
3.4.7. Some popular Java editors and IDEs 44
3.5. A Comparative Study of .NET and J2EE 45
3.6. Business Intelligence and Data Mining 47
3.6.1. Association Rules 50
3.6.2. Association Rule Mining 51
3.6.3. Apriori Algorithm 51
3.6.4. Implementation of Apriori Algorithm 53
3.6.5. Generating Association Rules from Frequent Item sets 55
3.6.6. Correlation Analysis 57
3.6.7. k-means Clustering 57
3.6.8. Fuzzy Set Approach 60
3.7. Rough Set Theory 61
3.7.1. Approximations 62
3.7.2. Reduction and Significance of Attributes and Approximation Reducts 64
3.8. Scapegoat Trees and Max Heaps 66
3.8.1. Scapegoat trees 66
3.8.2. Max-Heaps 72
3.8.2.1. Building a Heap 73
3.8.2.2. Cost of Building a Heap 74
3.8.2.3. Heap Sort 75
3.8.2.4. Cost of Heap Sort 75
3.9. MUlticriteria Satisfaction Analysis (MUSA) 76
3.9.1. Satisfaction Indices 78
3.9.3. Demanding Indices 79
3.10. Summary 81
4. METHODOLOGY 82-106
4.1. Overview 82
4.2. Sampling procedures 82
4.3. Data Collection Techniques 83
4.4. Research Methodology 87
4.5. Tools Used 105
5. RESULTS AND DISCUSSION 107-159
6. CONCLUSIONS AND FUTURE WORK 160-165
6.1. Conclusions 160
6.2. Limitations of the Study 163
6.3. Future Work and Suggestions 164
7. BIBLIOGRAPHY / REFERENCES 166-179
List of Publications 177
LIST OF FIGURES
FIGURE TITLE PAGE NO
1.1 Java Language Environment 7
3.1 .NET framework architecture 31
3.2 Web services protocol stack 36
3.3 The J2EE framework 37
3.4 Architecture of a JSP page 42
3.5 Generating frequent item sets with min_sup 2 using Apriori Algorithm 54
3.6 A scapegoat tree with 10 nodes and height 5 69
3.7 Finding a scapegoat and inserting 7 at node 5 69
3.8 A complete binary tree depicting max-heap 72
3.9 Building a heap 73
3.10 Cost of building a heap 74
4.1 Consumer Profile form 83
4.2 Survey questionnaire for main satisfaction criteria 84
4.3 Survey Questionnaire form for sub criteria satisfaction 85
4.4 Consumer service survey Form 86
4.5 k-means clustering algorithm 90
4.6 KCUSTMH (k-means clustering using scapegoat trees and max heaps) 94
4.7 Apriori algorithm for finding frequent item sets 95
4.8 Pseudo code to implement Apriori algorithm 96
4.9 Flowchart of MUSA method integrating with rough set theory 98
4.10 Consumer satisfaction & consumer loyalty obtained from a scale 99
5.1 Preferred communication channels, “Media mix”, by consumers 107
5.2 Consumer transaction data in .csv file format 117
5.3 Run information of k-means clustering performed on gender attribute using Weka 119
5.4 Graph showing efficiency of KCUSTMH vs. traditional k-means algorithm 122
5.5 Clustering association rules using 2-D grid 124
5.6 Run information of Weka Apriori algorithm implementation 126
5.7 Dashboard projecting the behavior of the sales of different items 128
LIST OF TABLES
TABLE TITLE PAGE NO
2.1 List of top 10 open-source CRM software 20
3.1 Overview of .NET Framework release history 32
3.2 Comparative study on the various features of .NET and J2EE platforms 46
3.3 Limitations and missing capabilities of .NET vs. J2EE 47
3.4 Sample consumer Transactions 54
3.5 Consumer transactions in vertical data format 55
5.1 Responses summary on J2EE and .NET platforms 109
5.1.A Overall satisfaction analysis 109
5.1.B Group-wise % satisfaction 109
5.2 User satisfaction opinion of J2EE and .NET 110
5.3 Memory utilization and response time of ASP.NET and J2EE 116
5.4 Consumer data segmented into 6 clusters 121
5.5 Survey data of sample 20 consumers on global criteria 131
5.6 Overall satisfaction results for global criteria 132
5.7 Criteria satisfaction results 135
5.8 Sample consumer data with condition and decision attributes 137
5.9 Nominal values of the sample consumer data 138
5.10 Sample consumer data organized w.r.t. decision attribute 140
5.11 Reduct of Information of sample consumer data 145
5.12 Analysis of condition attributes with Personnel criteria 146
5.13 Analysis of condition attributes with Product criteria 147
5.14 Analysis of Physical Appearance attributes 148
5.15 Analysis of attributes Personnel and Product 149
5.16 Analysis of Personnel and Physical Appearance 150
5.17 Analysis of Attributes Product and Physical Appearance 151
5.18 Reduct of Personnel and Product 152
5.19 Reduct of Personnel and Physical Appearance 152
5.20 Reduct of Product and Physical Appearance 153
5.21 Final reduct information of the sample consumer data 153
5.22 Sample data set transaction format 155
5.23 Opinion of consumer service survey 158
LIST OF ABBREVIATIONS
.NET Network Enabled Technology
4P’S Personnel, Product, Place, Physical Appearance
ADI Average Demanding Index
ADO ActiveX Data Objects
AJAX Asynchronous JavaScript and XML
API Application Programming Interface
APT Automatically Programmed Tool
ARFF Attribute-Relation File Format
ASI Average Satisfaction Index
ASP Active Server Pages
AVL Adelson-Velskii and Landis
AWT Abstract Window Toolkit
BCL Base Class Library
BI Business Intelligence
BIRCH Balanced Iterative Reducing and Clustering using Hierarchies
BPM Business Process Management
BST Binary Search Tree
CAAS/CRAAS Customer Relationship As Service Software
CBT Complete Binary Tree
CLR Common Language Runtime
CLV Consumer Life Time Value
COBOL Common Business Oriented Language
CORBA Common Object Request Broker Architecture
CPU Central Processing Unit
CRM Consumer Relation Management
CSS Cascading Style Sheets
CSV Comma Separated Values
CTS Common Type System
CURE Clustering Using REpresentatives
DBMS Database Management Systems
DLL Dynamic Link Libraries
EJB Enterprise Java Beans
ELKI Environment for DeveLoping KDD-Applications Supported by Index-Structures
ERP Enterprise Resource Planning
ETL Extract Transform Load
FOSS Free and Open Source Software
GC Gross Contribution
GUI Graphical User Interface
HTML Hyper Text Markup Language
IDC International Data Corporation
IDE Integrated Development Environment
IDL Interface Definition Language
IL Intermediate Language
IMAP Internet Message Access Protocol
ISS Intelligent Software Solutions
IT Information Technology
J2EE Java 2 Platform, Enterprise Edition
J2ME Java 2 Platform, Micro Edition
J2SE Java 2 Platform, Standard Edition
JAAS Java Authentication and Authorization Service
JAF JavaBeans Activation Framework
JAXP Java API for XML Processing
JCA Java Connector Architecture
JDBC Java Database Connectivity
JDK Java Development Kit
JIT Just In Time
JMS Java Message Service
JMX Java Management Extensions
JNDI Java Naming and Directory Interface
JRE Java Runtime Environment
JSP Java Server Pages
JTA Java Transaction API
JVM Java Virtual Machine
KDD Knowledge Discovery from Data
KNIME Konstanz Information Miner
LDAP Lightweight Directory Access Protocol
MB Megabytes
MOM Message-Oriented Middleware
MS Microsoft
MSEC Milliseconds
MSIL Microsoft Intermediate Language
MSMQ Microsoft Message Queue
MUSA Multicriteria Satisfaction Analysis
MVC Model View Controller
MYSQL My Structured Query Language
NT New Technology
OLAP Online Analytical Processing
OSS Open Source Software
PC Personal Computer
PERL Practical Extraction and Report Language
PHP Hypertext Preprocessor
PL/SQL Procedural Language/Structured Query Language
POP Post Office Protocol
R2 Release 2
RFID Radio Frequency Identification
RMI Remote Method Invocation
SAAS Software as a Service
SCAVIS Scientific Computation and Visualization Environment
SCM Supply Chain Management
SGT Scapegoat Tree
SMS Short Message Service
SMTP Simple Mail Transfer Protocol
SOAP Simple Object Access Protocol
SPSS Statistical Package for Social Sciences
UDDI Universal Description, Discovery, and Integration
VB Visual Basic
VJ# Visual J#
W3 World Wide Web
WCF Windows Communication Foundation
WEKA Waikato Environment for Knowledge Analysis
WORA Write Once Run Anywhere
WPF Windows Presentation Foundation
WSDL Web Services Description Language
XHTML eXtensible Hyper Text Markup Language
XML eXtensible Markup Language
XSLT eXtensible Stylesheet Language Transformations
LIST OF SYMBOLS USED
r Correlation coefficient
χ² Chi-square
∈ Set membership
∧ Intersection or wedge product
ρ Spearman correlation coefficient
τ Kendall coefficient
⊆ Subset
⋂ Set-theoretic intersection
⇒ Implies
⊂ Proper subset
P Probability
Φ or Ø Null (empty) set
∑ Summation
∪ Set-theoretic union
γ Degree of dependency
σ Selection
ε Approximate reduct
≤ Less than or equal to
ABSTRACT
Consumer buying behaviours often change for various reasons. Advancements in technology, changing lifestyles and needs are a few factors that alter consumer buying and selling behaviours. Changing behaviours have a substantial impact on the profitability and survival of a business. Hence companies need to adopt appropriate decision-making techniques to withstand these changing situations. In this research, an attempt was made to expound the effectiveness of business intelligence techniques on consumer relation management. Consumer satisfaction and loyalty are two essential factors in CRM for improving consumer relations and the profitability of any company. To analyse these two factors, consumer data was segmented using the k-means algorithm. The efficiency of the k-means algorithm was improved with scapegoat trees and max heaps, which resulted in quality clusters. New opportunities in buying and selling were identified by analysing consumer buying behaviours with the Apriori algorithm. Consumer satisfaction was explored on the product, personnel, place and physical appearance (4Ps) of the enterprise by integrating MUlticriteria Satisfaction Analysis (MUSA) and rough set theory. Fuzzy set theory was deployed to analyse consumer loyalty. This exploratory study of consumer satisfaction and loyalty resulted in “if…then…” rules. A comparative study of .NET and J2EE identified J2EE as the preferred platform to build customised consumer relation management software. This research established the dominant role of business intelligence and J2EE in consumer relation management for effective decision making.
1. INTRODUCTION
1.1. Overview
Consumer relation management (CRM) identifies the need for maintaining good consumer relations to increase the profitability of an enterprise. Maintaining good consumer relations ensures consumer satisfaction and loyalty. Consumer satisfaction and better service ensure that consumers stay loyal to the enterprise. Such consumers are retained for a longer duration. Consumer retention is essential since acquiring new consumers is a costlier process than retaining old ones.
An extensive study of consumers' buying behaviours is important to make them satisfied with an enterprise (Hsieh Nan-Chen, Chu Kuo-Chung (2009)). Enterprises need to collect consumer personal profiles and transaction data to analyse these buying behaviours. Consumer segmentation is done to study buying behaviours. Segmentation is performed on demographic factors like age and gender, or on other factors such as geographical location, occasion, behaviour etc., using various techniques like the k-means algorithm, Customer Segmentation using High Utility Rare Itemset Mining (CSHURI) (Pillai Jyothi, Vyas O.P., 2012) and so on.
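The segmentation step described above can be sketched in Java. The sketch below is a minimal one-dimensional k-means over consumer ages; the data values, seed centroids and cluster count are illustrative assumptions for demonstration, not the thesis's survey data.

```java
import java.util.Arrays;

// Minimal 1-D k-means sketch for consumer segmentation by age.
// Data, seeds and iteration count below are illustrative assumptions.
public class KMeansSketch {
    public static int[] cluster(double[] data, double[] centroids, int iterations) {
        int[] assign = new int[data.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: attach each consumer to the nearest centroid.
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (Math.abs(data[i] - centroids[c]) < Math.abs(data[i] - centroids[best]))
                        best = c;
                }
                assign[i] = best;
            }
            // Update step: move each centroid to the mean of its members.
            for (int c = 0; c < centroids.length; c++) {
                double sum = 0;
                int n = 0;
                for (int i = 0; i < data.length; i++) {
                    if (assign[i] == c) { sum += data[i]; n++; }
                }
                if (n > 0) centroids[c] = sum / n;
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        double[] ages = {21, 23, 25, 44, 46, 48, 65, 67, 70}; // illustrative consumer ages
        double[] centroids = {20, 45, 70};                    // initial seeds (assumed)
        int[] assign = cluster(ages, centroids, 10);
        System.out.println(Arrays.toString(assign));          // [0, 0, 0, 1, 1, 1, 2, 2, 2]
        System.out.println(Arrays.toString(centroids));
    }
}
```

The two alternating steps (assign each record to its nearest centroid, then recompute each centroid as the mean of its members) are the essence of the algorithm; the repeated distance calculations in the assignment step are what the efficiency improvements discussed later in this thesis target.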
Consumer information is stored in databases or data warehouses. Commonly used databases are MS-Access, MySQL, MS-SQL Server etc. Based on factors such as quantity, ease of use and so on, other sources of data like MS-Excel spreadsheets, comma separated value (.csv) files, attribute-relation file format (.arff) files etc. are also preferred. Data from these sources is cleaned, integrated and mined to identify interesting patterns representing knowledge, based on interestingness measures. Software tools like Weka (Waikato Environment for Knowledge Analysis), SPSS (Statistical Package for Social Sciences) Clementine, Minitab, Intelligent Miner, Siebel etc. are utilised for data analysis.
Data is gathered using CRM software built with either J2EE or .NET or other similar software. This data is analysed using data mining techniques of business intelligence (Seddawy Bahgat El Ahmed, Moawad Ramadan, Hana Maha Attia (2010)) like classification (Zu Qiaohong, Wu Ting, Wang Hui (2010)), clustering (Jain A.K., Murty M.N., Flynn P.J. (1999)), fuzzy logic, decision trees, prediction, neural networks (Meng Qingliang, Kong Qinghua, Han Yuqi, Chen Jie (2004)) etc. Enterprises use business intelligence to gain in-depth knowledge of consumer data, which helps them take quick and well-informed decisions. Business intelligence is also used in decision making, querying, reporting, online analytical processing (OLAP), statistical analysis and forecasting.
Microsoft .NET and Sun J2EE offer exceptional features to build customised CRM software. Microsoft .NET supports various application types such as Windows applications, web services, mobile applications, cloud applications etc. Its framework includes Silverlight, Windows Communication Foundation (WCF), Windows Presentation Foundation (WPF), AJAX controls, Windows Azure etc. Similarly, the J2EE framework offers technologies like Swing, Struts, JavaServer Faces, AJAX controls, Android applications etc. Both platforms provide effective solutions for e-commerce, business process management (BPM), supply chain management (SCM) etc., and both are well supported on mobile devices and in cloud environments. These platforms are used in developing various CRM applications and also in various web services like websites, e-mails, short message services (SMS), live chat, alerts, dashboards etc.
1.2. Consumer Relation Management
Consumer or Customer Relationship Management (CRM) is a strategy adopted by companies to acquire and retain consumers, build partnerships, improve consumer loyalty and provide consumer satisfaction, creating value for both the consumer and the company. Pareto's Principle, or the 80/20 rule, can be applied to CRM: 20% of satisfied and loyal consumers can generate more than 80% of revenues. The main elements of CRM include consumer identification, attraction, retention and development. The important factors of CRM are consumer satisfaction and loyalty.
Advancements in information technology and the growing utility of the web have changed the CRM strategies of enterprises and the buying behaviours of consumers. Several companies started offering new electronic communication channels like e-mail, SMS, live chat, e-brochures, e-newsletters etc. for interacting with consumers. Many CRM systems rely largely on technology. A good CRM system collects consumer transaction data from different sources and processes it using data mining techniques of business intelligence. Using this information, companies analyse consumer buying behaviours for profitability. CRM dashboards are also utilised to analyse enterprise performance. Microsoft .NET and Sun J2EE are two major software platforms which offer technology support to CRM systems.
1.3. .NET Platform
.NET (also referred to as Network Enabled Technology) is a software framework developed by Microsoft that runs mainly on the Windows operating system. .NET includes a vast library of features and provides language interoperability, because of which every language can use code written in other languages. Programs written for the .NET framework execute in a software environment known as the Common Language Runtime (CLR). The CLR is an application virtual machine that provides services such as memory management, exception handling and security. The .NET framework essentially comprises the class library and the CLR together.
The .NET framework's Base Class Library (BCL) provides user interface, data access, database connectivity, cryptography, web application development and network communication facilities. Users develop software by combining their source code with the .NET framework and other libraries. The .NET framework is used by numerous applications created for the Windows platform. Microsoft Visual Studio is the integrated development environment (IDE) used for .NET software.
1.4. J2EE Platform
Java is one of the most mature and commonly used programming languages for building enterprise software. Java was originally meant for developing applets to run in browsers; it evolved into a programming model capable of driving enterprise applications. To address distinct sets of programming needs, Java has three different platform editions, namely Java 2 Platform Standard Edition (J2SE), Java 2 Platform Enterprise Edition (J2EE) and Java 2 Platform Micro Edition (J2ME). Of these, J2SE is the most commonly used form of Java technology. J2SE is usually referred to as the Java Development Kit (JDK). Figure 1.1 shows the architecture of the Java language environment.
Figure 1.1 Java Language Environment
J2EE replaced several proprietary and non-standard technologies to become a preferred choice for building e-commerce and other web-based enterprise applications like BPM, SCM etc., the main alternative being Microsoft's .NET-based technologies. Sun and its associates made Java a credible platform for distributed applications.
1.5. Business Intelligence
Companies implement business intelligence to analyse their performance and create effective strategies to withstand their competitors (Habul A. (2010)). Business intelligence techniques are implemented to improve business processes and to strengthen consumer relations and collaborations for the profitability of the enterprise. Business intelligence helps an enterprise not only in making rapid and improved decisions but also in identifying various business challenges and opportunities. With business intelligence, enterprises create, manage and deliver valuable reports on the internet.

In business intelligence, data is gathered from various sources and stored in a database or data warehouse. In a database, data is stored in the form of tables, whereas in a data warehouse data is stored in data cubes. Enterprises analyse this data to make better decisions. Business intelligence consists of various activities like decision making, querying, reporting, online analytical processing, statistical analysis, forecasting etc. Business intelligence applies data mining techniques like classification, clustering, decision trees, prediction, neural networks etc. to analyse data and provide visibility, clarity and better insight. One main objective of business intelligence is to improve the timeliness and quality of information so as to keep businesses on the right track at all times. Business intelligence is especially useful during economic downturns.
1.6. Need for the study
Buying behaviours of consumers often change depending on lifestyle. Varied buying behaviours have a significant impact on the profitability of an enterprise, which in turn can lead to its economic downturn. Market conditions are changing day by day and many competitors are entering the consumer market. These changing market conditions pose a challenge for enterprises to withstand stiff competition from other companies. Hence enterprises should implement suitable techniques to overcome the various challenges that arise in the consumer market. It is found that business intelligence provides enterprises with a variety of solutions in such environments.

The role of CRM software in maintaining consumer data is very important, for the reason that an enterprise should store precise data to take important decisions and generate accurate reports. Therefore there is a need for a suitable software platform to build customised CRM packages and analyse this data. Hence in this research, the .NET and J2EE software platforms are considered to build customised CRM packages, and business intelligence techniques are adopted to analyse consumer data.
1.7. Objective of this research
In this research an attempt was made to address the following objectives of consumer relation management. This research aimed to
1. Identify better communication channels, the “Media mix”, for new selling and buying opportunities, effective reach and quality networking.
2. Create techniques for maintaining a database of consumers with J2EE or .NET software.
3. Identify consumer requirements and opportunities that facilitate increases in profit margins, revenues, buying and selling.
4. Improve consumer service and online consumer support systems, and build loyalty to position the enterprise at a competitive advantage.
1.8. Methodology
After identifying these objectives, a consumer survey was conducted on the preferred “Media mix”. The opinion of consumers who use .NET and J2EE was explored on various factors, and J2EE was adjudged the users' choice to build the required CRM software. Then, business intelligence techniques were implemented to explore consumer data. Consumer segmentation was done using k-means clustering. Consumer buying behaviours were analysed using the Apriori algorithm. MUlticriteria Satisfaction Analysis (MUSA) along with rough set theory was implemented to analyse consumer satisfaction, and fuzzy classification was used for analysing consumer loyalty. The k-means and Apriori algorithms were implemented using the open source data mining tool Weka. Further, an attempt was made to improve the efficiency of some of these algorithms in order to implement them on large databases.
1.9. Organisation of the thesis
Chapter 1 - Introduction - In this chapter the importance of consumer relation management is discussed. An outline of the .NET and J2EE platforms is given. The role of business intelligence and various data mining techniques, along with the methodology implemented in this research, is briefed. It also cites the need for this research.

Chapter 2 - Review of Literature - This chapter is divided into three subsections. In the first subsection, the various data mining techniques of business intelligence used in CRM in various research studies are described. It also mentions the use of the MUSA method in analysing consumer satisfaction. The next subsection identifies the various CRM software used in the IT sector. In the third subsection, statistics supporting the use of .NET and J2EE in the software industry, and also the present and future prospects of BI in the CRM industry, are given.
Chapter 3 - Consumer relation management of .NET and J2EE using business intelligence - This chapter elaborately explains the various techniques used in this research. It explains the importance of consumer satisfaction and loyalty in CRM. Features of .NET and J2EE for building enterprise applications are also elaborated. Apart from that, this chapter clearly explains various data mining techniques of BI like the k-means algorithm, association rules and the Apriori algorithm. This chapter also explains the MUSA method, rough set theory, fuzzy set theory, max-heaps and scapegoat trees used in this research.
Chapter 4 - Methodology - In this chapter the research methodology implemented to study the identified objectives of CRM is described. This chapter also explains how and where the surveys were conducted to identify the “Media mix”, consumer opinion on .NET and J2EE for building customised CRM software, and consumer service. Apart from the survey process, this chapter also describes the methodology involved in improving the efficiency of the k-means algorithm and the technique of integrating the MUSA method with rough set theory to understand consumer satisfaction in an exploratory way.
Chapter 5 - Results and Discussion - This chapter elaborately explains the process involved in addressing the identified objectives of CRM using surveys; data mining techniques of business intelligence like consumer segmentation using the k-means algorithm and the Apriori algorithm for analysing consumer behaviours; the integration of MUSA and rough set theory for an extensive study of consumer satisfaction; the implementation of fuzzy techniques to analyse consumer loyalty; and finally the consumer service survey. The proposed KCUSTMH algorithm is also outlined for increasing the efficiency of the k-means algorithm on large databases.
Chapter 6 - Conclusions and Future Work - This chapter summarises the findings of this research in the order of the identified objectives, along with its limitations. It also mentions the future scope of the study and trends that are likely to dominate the CRM sector.

At the end, the references used in this research are listed in the Bibliography. The Appendix contains sample screenshots, dashboards utilised in the decision-making process and sample programming code used in this research.
2. REVIEW OF LITERATURE
2.1. Related Research in CRM and Data Mining
Consumer satisfaction was judged using preference disaggregation of ordinal values, popularly known as the MUlticriteria Satisfaction Analysis (MUSA) method. The MUSA method was adopted using a preference disaggregation model following the principles of ordinal regression analysis (Grigoroudis E., Siskos Y., Christina Diakaki (2001)). This integrated methodology evaluated the satisfaction level of a set of individuals like customers, employees, etc. based on their values and expressed preferences (Siskos Yannis, Grigoroudis Evangelos (2002)). Using this satisfaction survey data, the MUSA method aggregated different preferences into unique satisfaction functions. This aggregation and disaggregation process was achieved with the minimum possible errors. The main advantage of the MUSA method was that it fully considered the qualitative form of customers' judgments and preferences. The development of a set of quantitative indices and perceptual maps made it possible to evaluate consumer satisfaction. Finally, reliability analysis made the MUSA method indisputable. Consumer satisfaction studies were also conducted using several other techniques (Ling Amy Poh Ai, Saludin Mohamad Nasir, Mukaidono Masao (2012)).
Data mining techniques like k-means clustering, originally proposed by James MacQueen (1967); Hugo Steinhaus (1957); Forgy E.W. (1965); Hartigan and Wong (1975/79), were used for consumer segmentation (Seddawy Bahgat El Ahmed, Moawad Ramadan, Hana Maha Attia (2010)). Using association rules (Rakesh Agrawal et al. (1993, 1994, 1995)), consumer buying behaviours and their requirements have been analysed by various researchers to date. The Apriori algorithm finds frequent itemsets in market basket analysis (Teng Shaohua, Su Jiangyu, Zhang Wei, Fu Xiufen, Chen Shuqing, P.R. China (2009)). Several studies were also conducted to improve the efficiency of these algorithms. Consumer segmentation using k-means clustering (Russell K.H. Ching, Chen Ja-Shen, Lin Yi-Shen (2002)) was time consuming for large databases. Its efficiency was reduced by the frequent calculation of Euclidean distances to cluster the data. The efficiency of this method was enhanced by using Red-Black trees, originally proposed by Rudolf Bayer (1972); Leonidas J. Guibas & Robert Sedgewick (1978); and min heaps (Rajeev Kumar, Rajeswar Puran, & Joydip Dhar (2011)). The k-means clustering algorithm was implemented using Red-Black trees and min heaps in order to reduce the number of iterations of the k-means algorithm that occur because of the repeated calculation of distances to find cluster centroids (Yuan F., Meng Z.H., Zhang H.X. and Dong C.R. (2004)). Use of these data structures helped to reduce the running time of the k-means algorithm, and implementation of this new algorithm provided quality clusters for large databases. These data structures are readily available in programming languages like C++ and Java as tree maps. This improved version of the k-means algorithm was superior to the traditional one as it improved the running time of the algorithm for large databases.
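For one-dimensional data, the balanced-tree idea cited above can be sketched with Java's TreeMap, which is itself a Red-Black tree: the assignment step can locate the nearest centroid in O(log k) via floor/ceiling lookups instead of scanning all k centroids. This is an illustrative reading of the idea, with assumed centroid values, not the exact published algorithm.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: keeping 1-D centroids in a balanced search tree (Java's TreeMap,
// a Red-Black tree) lets the k-means assignment step find the nearest
// centroid in O(log k) rather than scanning all k centroids.
public class NearestCentroid {
    // Map from centroid value to cluster index; values below are assumed.
    static int nearest(TreeMap<Double, Integer> centroids, double x) {
        Map.Entry<Double, Integer> lo = centroids.floorEntry(x);   // largest centroid <= x
        Map.Entry<Double, Integer> hi = centroids.ceilingEntry(x); // smallest centroid >= x
        if (lo == null) return hi.getValue();
        if (hi == null) return lo.getValue();
        return (x - lo.getKey() <= hi.getKey() - x) ? lo.getValue() : hi.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Double, Integer> centroids = new TreeMap<>();
        centroids.put(23.0, 0);
        centroids.put(46.0, 1);
        centroids.put(67.0, 2);
        System.out.println(nearest(centroids, 30)); // prints 0 (23 is closer than 46)
        System.out.println(nearest(centroids, 60)); // prints 2 (67 is closer than 46)
    }
}
```

The same lookup structure must be rebuilt (or updated) after each centroid-update step, so the saving matters most when k is large relative to the cost of maintaining the tree.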
Only satisfied consumers remain loyal to an enterprise and thereby place it at a competitive advantage over other firms. Consumer loyalty was estimated using fuzzy set theory (Lotfi Zadeh (1975); Isakki. P., Rajagoplan. S.P. (2011)). Consumer behaviours were analysed to maintain good relationships with consumers (Raorane Abhijit, Kulkarni. R.V. (2011)).
Maximizing consumer satisfaction improves their loyalty and retention. Based on the previous transactions of consumers, their buying behaviours were predicted and the data was analysed using clustering and association rules. From customer profiles and transaction records, segmentation was performed using the k-means algorithm; the Apriori algorithm was then applied to identify consumer behaviour, followed by the identification of product associations within different consumer segments. Consumer transaction data was analysed to develop new trends and launch new series of products (Isakki. P., Rajagoplan. S.P. (2012)).
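As a brief illustration of how Apriori prunes the search space (any superset of an infrequent itemset is itself infrequent), the following simplified Java sketch mines frequent itemsets from a small transaction list; the item names and the support threshold are purely illustrative and do not come from the cited studies.

```java
import java.util.*;

public class Apriori {
    /** Frequent itemsets (as sorted item lists) with support >= minSupport. */
    public static Map<List<String>, Integer> mine(List<Set<String>> db, int minSupport) {
        Map<List<String>, Integer> frequent = new LinkedHashMap<>();
        SortedSet<String> items = new TreeSet<>();
        for (Set<String> t : db) items.addAll(t);
        // Candidate 1-itemsets: every distinct item, in sorted order.
        List<List<String>> level = new ArrayList<>();
        for (String i : items) level.add(List.of(i));
        while (!level.isEmpty()) {
            // Count the support of each candidate and keep the frequent ones.
            List<List<String>> survivors = new ArrayList<>();
            for (List<String> cand : level) {
                int support = 0;
                for (Set<String> t : db) if (t.containsAll(cand)) support++;
                if (support >= minSupport) {
                    frequent.put(cand, support);
                    survivors.add(cand);
                }
            }
            // Join step: only frequent k-itemsets are extended with a larger item,
            // because any superset of an infrequent itemset is itself infrequent.
            List<List<String>> next = new ArrayList<>();
            for (List<String> cand : survivors)
                for (String i : items.tailSet(cand.get(cand.size() - 1) + "\0")) {
                    List<String> bigger = new ArrayList<>(cand);
                    bigger.add(i);
                    next.add(bigger);
                }
            level = next;
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(Set.of("bread", "milk"), Set.of("bread", "butter"),
                Set.of("bread", "milk", "butter"), Set.of("milk"));
        System.out.println(mine(db, 2));
    }
}
```

In a CRM workflow of the kind described above, each transaction set would be the basket of one consumer visit within a segment produced by k-means.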
Several studies related to tree based approaches for mining frequent itemsets were conducted (Senthil Kumar. A.V., Wahidabanu. R.S.D. (2007, 2008)). Effective algorithms for mining association rules were also developed (Senthil Kumar. A.V., Wahidabanu. R.S.D. (2006, 2007)). An improved version of the Apriori algorithm using a hashing technique was used to reduce large itemsets into candidate 2-itemsets (Vanitha. K., Santhi. R. (2011)). A hybrid classification algorithm was proposed for mining customer data used for decision making (Aurangzeb Khan, Baharum Baharudin, Khairullah Khan (2011)). Fuzzy data mining was applied in order to estimate customer loyalty (Chien Hua Wang, Chin Tzong Pang (2011)). Several studies were also conducted to analyse customer buying behaviours, satisfaction and loyalty using data mining techniques. Hash based mining algorithms for maximal frequent itemsets using linear probing were also developed (Rahman Zubair. A.M.J. Md., Balasubramanie. P., Venkata Krishna. P. (2009); Gangadhara Rao. N.V.B., Sirisha Aguru (2012)). Efficient algorithms for mining frequent items were also developed by researchers (Vijayarani. S., Sathya. P. (2013)).
“A literature review and classification of application of data mining techniques
in CRM” had given a comprehensive picture of data mining techniques used in
about 87 research articles from 2004 to 2006 (Ngai.E.W.T, Li Xiu,
Chau. D.C.K. (2009)). Among the four CRM dimensions, namely customer retention, attraction, identification and development, customer retention was the most common dimension for which data mining was used to support decision making (54 out of 87 articles, 62.1%). There were 13 articles on customer identification and customer development, covering various aspects of CRM. Of the 54 customer retention articles, 51.9% (28 articles) and
44.4% (24 articles) related to one-to-one marketing and loyalty programs
respectively. One-to-one marketing and loyalty programs also ranked first (28
articles out of 87 articles, 32.2%) and second (24 articles out of 87 articles,
27.6%) in terms of subject matter. These articles dealt with both data mining and CRM. In one-to-one marketing, 46.4% (13 out of 28 articles) used
association models to analyse customer data, followed by 25.0% (7 out of 28
articles) which used classification models. With regard to loyalty programs
83.3% (20 out of 24 articles) used classification models to assist in decision
making. Among 34 data mining techniques which were applied in CRM, neural
networks were the most commonly used technique. Decision tree and
association rules were described in 21 (24.1%) and 20 (23.0%) articles
respectively. Several studies related to the use of data mining techniques in CRM, like artificial neural networks (Rada Rexhep, Ruseti Bashkim (2012)) and fuzzy evaluation models (Lu Dai, Arun Kumar. S. (2012); Wang Chien Hua, Pang Chin Tzong (2012)), were also conducted.
Studies were conducted on mining association rules using hash based algorithms (Jong Soo Park, Ming-Syan Chen, Philip S. Yu (1995)) and on large databases (Han Jiawei and Yongjian Fu (1995)). Studies related to fuzzy "if-then" rules were also done (Ishibuchi. H., Nozaki. K., Yamamoto. N., Tanaka. H. (1995)), and their performance in decision making was evaluated (Ishibuchi. H, Nakashima. T., Murata. T. (1999)). Several studies related to fuzzy data mining and fuzzy association rules were conducted to handle data discretisation and continuous attributes (Ishibuchi. H., Yamamoto. T., Nakashima. T. (2001)).
In our research, an attempt was made to study the effect of four Ps, namely product, personnel, physical appearance and place, on consumer satisfaction using rough set theory (Zdzislaw Pawlak (2002)). These were the major attributes on which consumers ponder when expressing their satisfaction with the enterprise.
2.2. CRM Software in IT Sector
Open source software (OSS) is widely used to build software applications
including CRM applications. As per a 2011 report by Gartner (an American information technology research and advisory firm), 46% of companies used OSS in specific departments and projects, 22% were adopting OSS consistently in all departments of a company and 21% were in the process of evaluating the advantages of OSS usage.
A few open source CRM websites are mentioned in this research work. These were developed using either Java or .NET as the front end. SourceForge Inc., a web-based source code repository, lists 369 active open-source CRM projects, of which the top 10 open source CRM software are shown in Table 2.1.
Table 2.1 List of top 10 open source CRM software (source: internet)
SNO | CRM Software Name | Founded Year | Software Used
1 | SugarCRM Inc. | 2004 | PHP & MySQL
2 | SplendidCRM | 2005 | .NET 2.0 with AJAX & SQL Server (Windows, IIS, SQL Server, C# and ASP)
3 | Centric CRM | 2007 | Java & MySQL
4 | Hipergate | 2009 | Java and JSP; compatible with Microsoft SQL Server, MySQL, Oracle and PostgreSQL
5 | Compiere Inc. | 2006 | Java, JavaScript and PL/SQL; compatible with JDBC and Oracle databases
6 | Vtiger CRM | 1996 | JavaScript, PHP and Visual Basic; compatible with ADOdb, MySQL and PostgreSQL databases; built upon the LAMP/WAMP (Linux/Windows, Apache, MySQL and PHP) architecture
7 | CentraView Inc. | 2004 | Java and JSP; compatible with MySQL databases
8 | XRMS CRM | 2006 (last updated) | Written in an interpreted language (PHP); compatible databases include ADOdb, SQL-based, Microsoft SQL Server, MySQL and other network-based DBMS
9 | Cream CRM | | Written in Java and JavaScript
10 | Tustena CRM | | Written in C#, ASP.NET and JavaScript; compatible with Microsoft SQL Server
Table 2.1 clearly shows that the top CRM websites prefer Java or .NET. Many CRM websites prefer Java because it is open source. Open source solutions are proving to be popular among businesses with limited costs and unique needs. Several open source Java based data mining tools (Kalra Shipra, Gupta Rachika (2011)) like ELKI, SCaViS, KNIME, Orange, RapidMiner, Scriptella ETL, Weka, JasperSoft etc. are available to analyse the data.
2.3. Statistics supporting the use of .NET and J2EE in software
industry
In a market survey conducted by W3Techs.com on 24th May 2012, Java was used by 3.9% of all websites. Other languages like PHP, ASP.NET and ColdFusion were also used to build websites. As per a Java market report in 2013, the growth rate of Java was high when compared to all other server-side programming languages. The 2012 W3Techs.com survey stated that ASP.NET was used by 21.4% of all websites; by April 2013, 18.9% of websites used ASP.NET and 89.5% used JavaScript. As per International Data Corporation (IDC), 78% of universities teach Java and 50% of universities require Java.
In 2008, Novell Connection magazine stated that the two stacks, ASP.NET and J2EE, had equal shares of 45% each of the world market, and that only about 10% of the world market was driven by other stacks, which are mostly open source application servers.
2.4. Business Intelligence and CRM
As per a Gartner report, the major advantage of business intelligence applications in consumer relationship management is to provide a better understanding of consumer needs.
In a Forbes report on 18/6/2013, Gartner predicted that by 2017 CRM revenues will cross $36 billion and BI revenues will cross $18 billion, which clearly indicates the growing contribution of CRM and BI to the world market. The 2013 Gartner report also predicts that mobile CRM applications that can be downloaded from app stores will grow from over 200 in 2012 to 1200 by 2014. It also predicts that CRM software applications will increasingly be delivered as SaaS (Software as a Service) through 2016 and that Salesforce.com will remain the largest vendor in terms of revenue in 2013.
Until 2012, several categories of business intelligence tools like spreadsheets were used for analysing data. Reporting and querying software tools that extract, sort, summarise and present selected data were also used for data analysis. Several other applications used OLAP (online analytical processing), digital dashboards, data mining, data warehousing, decision engineering, process mining, business performance management and local information systems. Except for spreadsheets, these tools are sold as standalone tools, suites of tools, components of ERP systems or as components of software targeted at specific industries. These tools are sometimes packaged into data warehouse appliances. Several free open-source data mining software applications based on these categories are available in the present day software industry.
2.5. Summary
In this chapter, various researches related to CRM which implemented the data mining techniques of business intelligence were discussed. The usage of IT in the CRM sector was also briefed, along with the growing importance of free and open source software. Statistics supporting the use of .NET and J2EE in the software industry and the increasing importance of CRM and BI in the world market were also discussed.
3. CONSUMER RELATION MANAGEMENT OF .NET AND J2EE
USING BUSINESS INTELLIGENCE
3.1. Overview
Consumer relation management (CRM) is a model for managing a company’s interactions with consumers. CRM involves the use of technology to organise and automate sales, marketing, consumer service and technical support. CRM includes relationship management, automation of consumer transaction processes using CRM software, and opportunity management.
3.2. Consumer Relation Management
The main elements of consumer relation management include consumer identification, consumer attraction, consumer retention and consumer development. Consumer satisfaction and loyalty have a considerable impact on these elements. Hence, the study of these two factors is essential to improve any business.
3.2.1. Consumer Satisfaction
Consumer satisfaction is the extent to which the products purchased and services offered meet consumer expectations. Consumer satisfaction changes the buying behaviours of consumers (Hsieh Nan-Chen, Chu Kuo-Chung (2009)).
Satisfied consumers have a greater desire to purchase more products. This strengthens the enterprise’s relations with existing consumers. Encouraging the repurchasing tendency of consumers is a cost-saving approach for enterprises. Through their word-of-mouth, enterprises gain new consumers who improve the profitability of the company. Satisfied consumers do not mind paying higher prices if necessary. A few factors that influence consumer satisfaction and retention are products, service and corporate image. Consumer retention enables the calculation of the lifetime value of a consumer, which in turn helps to improve the profitability of an enterprise.
Consumer life time value (CLV) is calculated as
CLV = GC * [(1+ d) / (1+d-r)] ………… (3.1)
In equation 3.1, GC is the yearly gross contribution, d is the yearly discount
rate and r is the yearly retention rate.
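As a small worked example of equation 3.1, with purely hypothetical figures, a consumer contributing GC = 1000 per year, with a yearly discount rate d = 10% and a yearly retention rate r = 80%, has CLV = 1000 * 1.1 / 0.3, i.e. about 3666.67. In Java this is a one-line helper (class and method names are illustrative):

```java
public class Clv {
    // Consumer lifetime value per equation (3.1): CLV = GC * (1 + d) / (1 + d - r),
    // where GC is the yearly gross contribution, d the yearly discount rate
    // and r the yearly retention rate.
    public static double clv(double gc, double d, double r) {
        return gc * (1 + d) / (1 + d - r);
    }

    public static void main(String[] args) {
        System.out.println(clv(1000, 0.10, 0.80)); // hypothetical figures
    }
}
```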
3.2.2. Consumer Loyalty
Consumer loyalty is the willingness of a consumer to keep buying specific products or services of a company. There are two types of consumer loyalty. Short-term loyalty wavers when there are better alternatives, whereas long-term loyalty makes the consumer stay with the company for a longer duration. Enterprises believe that long-term consumer loyalty is obtained through better service and novelty (Jones. T.O., Sasser. W.E. Jr. (1995)).
Consumer segmentation strengthens relations with the consumers and makes
them stay loyal towards the enterprise (Pillai Jyothi, Vyas .O.P. (2012)). Good
consumers create an intense feeling of loyalty towards the enterprise, its
products and services. Consumer potential is calculated from the number of
visits made by a consumer and the value of items purchased on each visit,
using Spearman’s rho (ρ) or Kendall’s tau.
The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranked variables. For a sample of size n, the n raw scores Xi, Yi are converted to ranks xi, yi, and ρ is computed from the rank differences di = xi - yi:

ρ = 1 - [6 Σ di^2] / [n(n^2 - 1)] …………(3.2)

The Kendall τ coefficient for n values is defined from the numbers of concordant and discordant pairs of observations:

τ = (number of concordant pairs - number of discordant pairs) / [n(n - 1)/2] ………….(3.3)
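Both coefficients can be computed directly from their definitions. The following simplified Java sketch assumes no tied ranks, a condition required by the shortcut form of equation 3.2; the class and method names are illustrative.

```java
public class RankCorrelation {
    // Spearman's rho via equation (3.2): rho = 1 - 6*sum(di^2) / (n(n^2 - 1)),
    // valid when there are no ties in either variable.
    public static double spearman(double[] x, double[] y) {
        int n = x.length;
        double[] rx = ranks(x), ry = ranks(y);
        double sumD2 = 0;
        for (int i = 0; i < n; i++) {
            double d = rx[i] - ry[i];
            sumD2 += d * d;
        }
        return 1.0 - 6.0 * sumD2 / (n * (double) (n * n - 1));
    }

    // Kendall's tau via equation (3.3): (concordant - discordant) / (n(n-1)/2).
    public static double kendall(double[] x, double[] y) {
        int n = x.length, concordant = 0, discordant = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                double s = (x[i] - x[j]) * (y[i] - y[j]);
                if (s > 0) concordant++;
                else if (s < 0) discordant++;
            }
        return (concordant - discordant) / (n * (n - 1) / 2.0);
    }

    // Rank of each value = number of smaller values (0-based; the base cancels
    // out in the rank differences used by Spearman's formula).
    private static double[] ranks(double[] v) {
        double[] r = new double[v.length];
        for (int i = 0; i < v.length; i++)
            for (double w : v) if (w < v[i]) r[i]++;
        return r;
    }
}
```

For consumer potential, x would be the number of visits and y the value purchased per visit, as described above.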
Advancements in mobile technology, the vast availability of mobile devices, growth in internet speeds and their declining costs make enterprises focus on mobile consumer relation management (mobile CRM) for their profitability.
3.2.3. Mobile CRM
Mobile devices such as tablet-PC, smart-phone, i-phone etc. are utilised in
consumer relation management for better profitability. Companies use these devices to interact with consumers and also to access consumer data from wherever they are located. Since mobile apps define security and access preferences for individuals and groups of consumers, any executive can interact with a consumer or with the data without any difficulty.
Mobile CRM helps enterprises to improve consumer conversion rates. This in turn helps in gaining a competitive advantage and in exploring new buying and selling business opportunities.
Consumer relation management utilises the cloud computing platform to enhance the profitability of an enterprise. The latest technological advancements in cloud computing are driving enterprises to shift towards the cloud environment.
3.2.4. CRM in Cloud Computing
There are multiple benefits of CRM under cloud computing. CRM under cloud
creates new buying and selling opportunities. Cloud platforms enhance repeat
purchases and thus increase the profitability of a company. Because of virtualisation technology, various benefits like rapid deployment, easy upgradation, reduced cost and better security are achieved.
CRM under cloud helps companies improve relations with existing consumers and thus leads to better marketing of products and services. It enhances consumer satisfaction and retention, and places the company at a competitive advantage.
The best known example of the Software as a Service (SaaS) cloud model is the consumer relationship management offered by Salesforce.com, whose solution offers sales, service, support, marketing, content, analytics and collaboration through a platform called “Chatter”.
This type of software is referred to as CaaS or CRaaS, for Consumer Relationship as a Service. The Salesforce website extended its SaaS offering to allow developers to create add-on applications, essentially turning the SaaS service into a Platform as a Service (PaaS) offering called the “Force.com” platform. Applications built on “Force.com” are written in a Java variant called “Apex” and use an XML syntax for creating user interfaces in HTML, AJAX and Flex. Nearly a thousand applications exist for this platform from hundreds of vendors.
The latest technological advancements in IT brought a number of benefits to CRM technology. Various aspects of cloud computing like pay-per-use, virtualisation and easy changeover made CRM less expensive. Social media such as Twitter, Orkut, Facebook etc. radically changed vendor marketing and consumer services. Mobile devices have opened up new sales and marketing channels.
3.2.5. Big Data and CRM – future CRM
CRM uses technology for automating, organising and synchronising consumer
related activities like buying and selling, consumer service and support. These
activities involve collecting huge consumer data in the form of text, pictures,
audio, video etc. This data is called the “Big Data”.
Big data involves collections of data sets so large that they are difficult to process using conventional data processing techniques. Big data enhances business opportunities and improves CRM. Big data analytics is transforming vendor-consumer relations and interactions. With the impingement of big data, even business models are getting transformed.
Future “Big data” analytics will offer businesses powerful tools capable of
identifying sales opportunities. Using these tools, responses or comments on products on social media can be combined with internal data to understand consumers’ preferences.
Gartner states that big data will renovate the way companies manage their
relationships with consumers. Hence companies must be prepared to face
such an environment. Gartner also states that big data shall be the next major
emerging technology of CRM to enhance consumer relations with enterprises.
3.3. .NET framework
The architecture of the .NET framework is shown in Figure 3.1. It has a Base Class Library (BCL) which provides
User interface
Data access
Database connectivity
Cryptography
Web application development and
Network communications etc.
Figure 3.1 .NET framework architecture
Developers construct software by combining source code with the .NET framework and other libraries. The .NET framework is used by several new applications created for the Windows platform. Microsoft .NET has an integrated development environment (IDE) called Visual Studio. Microsoft (MS) Visual Studio supports programming languages such as C++, C#, J#, Visual Basic and Visual C++. It supports these languages by means of language services, which allow the code editor and debugger to support nearly any programming language. Other languages such as M, Python and Ruby are also supported through support software. Web services of the .NET platform support XML/XSLT, HTML/XHTML, JavaScript, CSS and the SOAP protocol.
[Figure 3.1 depicts the .NET stack: VB.NET, C++, C#, J# and other languages over the Common Language Specification (CLS); ASP.NET Web Forms and Services and Windows Forms; ADO.NET and XML; the Base Class Library (BCL); the Common Language Runtime (CLR); and the Operating System (OS); Visual Studio .NET spans all layers.]
Since the release of .NET in 2002, Microsoft has released several versions of the .NET platform. Table 3.1 presents an overview of the .NET release history. Visual Studio 2013 was released on October 17, 2013 along with .NET 4.5.1.
Table 3.1 Overview of .NET framework release history (source: internet)
Version | Release date | Development tool | Supported OS
1.0 | 2002-02-13 | Visual Studio .NET |
1.1 | 2003-04-24 | Visual Studio .NET 2003 | Windows Server 2003
2.0 | 2005-11-07 | Visual Studio 2005 | Windows Server 2003 R2
3.0 | 2006-11-06 | Expression Blend | Windows Vista, Windows Server 2008
3.5 | 2007-11-19 | Visual Studio 2008 | Windows 7, Windows Server 2008 R2
4.0 | 2010-04-12 | Visual Studio 2010 |
4.5 | 2012-08-15 | Visual Studio 2012 | Windows 8, Windows Server 2012
4.5.1 | 2013-10-17 | Visual Studio 2013 | Windows 8.1, Windows Server 2012 R2
A typical .NET platform has various design features such as
Interoperability
Common Language Runtime (CLR) engine
Language Independence
Base Class Library (BCL)
Simplified deployment
Security and
Portability.
3.3.1. Common Language Runtime
Common Language Runtime, CLR, is a major component of .NET framework.
CLR provides benefits such as exception handling, security, debugging,
versioning etc. These benefits are available to any language built for CLR.
CLR hosts many languages like VB.NET, VC++.NET, C#.NET, VJ#, Perl, Python and even COBOL. Code compiled for the CLR is called “managed code”, and it takes advantage of the services offered by the CLR. Metadata created during compilation is used to locate and load classes, generate native code and provide security. The CLR defines a standard type system which provides language interoperability at design time.
3.3.2. Microsoft Intermediate Language
When .NET compiles source code, it does not compile it to native code; instead, the compilation process translates the code into Microsoft Intermediate Language, MSIL. The compiler also creates the necessary metadata and compiles it into a component. The resulting Intermediate Language (IL) is CPU independent. Compilation to native code occurs via the Just in Time (JIT) compiler.
3.3.3. Common Type System
Common Type System, CTS, specifies the types supported by CLR which
include
Classes, definition of what will become an object; includes properties,
methods, and events.
Interfaces, definition of functionality a class can implement, but does
not contain any implementation code.
Value Types, user defined types that are passed by value and
Delegates, similar to functions in C++; delegates are often used for
event handling and callbacks.
3.3.4. .NET framework Class Library
.NET framework Class Library types include items such as primitive data types, I/O functions, data access and security. The .NET framework provides a host of utility classes and members organised within a hierarchy called a namespace. At the root of the hierarchy is the “System” namespace. A namespace groups classes and members into logical nodes, so that the same method name can be used in more than one namespace.
3.3.5. ASP.NET Web Services
ASP.NET web services are implemented using the SOAP protocol and allow developers to easily develop SOAP based applications. ASP.NET web services are simple to build, test and deploy. The web services protocol stack shown in Figure 3.2 contains SOAP, which uses HTTP and XML to make remote procedure calls across the network. To create ASP.NET pages, languages supported by the .NET framework like VB.NET, C#, “Managed” C++ or JScript.NET should be used, because the web page is compiled into a DLL. To run ASP.NET pages, the IIS (Internet Information Services) web server is required. ASP.NET uses HTML, CSS, JavaScript and server scripting for building websites.
Figure 3.2 Web services protocol stack
ASP.NET supports three different development models for building websites: Web Pages, Model View Controller (MVC) and Web Forms. MVC is a framework for building websites in which
the Model represents the application core, i.e., a list of database records,
the View displays the data, i.e., the database records, and
the Controller handles the input to the database records.
The MVC model manages HTML, CSS and JavaScript.
[Figure 3.2 layers: Layer 4 UDDI (Service Discovery); Layer 3 WSDL (Service Description); Layer 2 XML Messaging (XML, SOAP); Layer 1 Transport (HTTP, SMTP, FTP).]
3.4. J2EE framework
Figure 3.3 J2EE framework (source: internet)
The J2EE framework shown in Figure 3.3 contains a container in which the Java Virtual Machine (JVM) exists. The container aids developers in building enterprise applications effectively by providing a set of Java technologies like JNDI, JDBC, JAAS, JTA, JMX, Java Mail, EJB, JMS, CORBA, SOAP, RMI, Servlets, JSP, XML etc.
3.4.1. J2EE Platform
J2EE platform is a Java distributed application server environment which
provides the following:
A set of Java extension APIs to build applications. These APIs define the programming model for J2EE applications.
A run-time infrastructure for hosting and managing applications. This is the server runtime in which the application resides.
3.4.2. J2EE Runtime
Server side resources are scarce and require special attention. Some of these resources include threads, database connections, security, transactions etc. Building custom infrastructure that deals with these resources is always a challenge. Since these server side requirements are common across a wide variety of applications, it is more appropriate to use a platform that has built-in solutions. This separates infrastructure-level concerns from the more direct concern of translating application requirements into working software. The J2EE runtime addresses such concerns. J2EE does not specify the nature and structure of the runtime; instead it introduces the container and, via the J2EE APIs, specifies a contract between containers and applications.
3.4.3. J2EE APIs (Application Program Interfaces)
Distributed applications require access to a set of enterprise services. Typical services include transaction processing, database access, messaging, multithreading, etc. The J2EE architecture provides access to such services through its enterprise service Application Program Interfaces, APIs. Instead of having to access these services through proprietary or non-standard interfaces, application programs in J2EE can access these APIs via a container.
A typical commercial J2EE platform (or J2EE application server) includes one or more containers and access to the enterprise APIs specified by J2EE. The Java standard extensions that the J2EE 1.3 platform supports are:
JDBC 2.0
Enterprise Java Beans (EJB) 2.0
Java Servlets 2.3
Java Server Pages (JSP) 1.2
Java Message Service (JMS) 1.0
Java Transaction API (JTA) 1.0
Java Mail 1.2
JavaBeans Activation Framework (JAF) 1.0
Java API for XML Parsing (JAXP) 1.1
Java Connector architecture (JCA) 1.0 and
Java Authentication and Authorization Service (JAAS) 1.0.
J2SE APIs that J2EE 1.3 supports are:
Java Interface Definition Language (IDL) API
JDBC Core API
RMI-IIOP API and
JNDI API.
3.4.4. J2EE Technologies
A collection of technologies provides the mechanics needed to build large distributed enterprise applications:
Component technologies
These technologies hold the most important part of the application – the business logic. There are three types of components: JSP pages, servlets and Enterprise Java Beans.
Service technologies
These technologies provide the application’s components with supporting services so that they function efficiently.
Communication technologies
These technologies, which are mostly transparent to the application programmer, provide mechanisms for communication among different parts of the application, whether local or remote.
3.4.5. Java Server Pages
Java Server Pages (JSP) embed components in a web page that is sent to the client. A JSP page contains Hypertext Markup Language (HTML), Java code and Java Bean components. JSP pages are an extension of the servlet programming model. When a user requests a JSP page, the web container compiles the JSP page into a servlet. The web container then invokes the servlet and returns the resulting content to the web browser. Once the servlet has been compiled from the JSP page, the web container can simply return the servlet without having to recompile it each time. Thus JSP pages provide a powerful and dynamic page assembly mechanism that benefits from the many advantages of the Java platform.
Compared to servlets, which are pure Java code, JSP pages are text-based documents until the web container compiles them into corresponding servlets. This allows a clearer separation of the application logic from the presentation logic, which in turn allows application developers to concentrate on business matters and web designers to concentrate on the presentation logic. A typical architecture of a JSP page is shown in Figure 3.4.
Figure 3.4 Architecture of a JSP page (source: internet)
3.4.6. J2EE Service Technologies
J2EE framework includes the following service technologies:
Java Database Connectivity, JDBC, provides the developer with the ability to connect to relational database systems. The JDBC API has features such as connection pooling and distributed transactions.
The Java Transaction API (JTA) and the Java Transaction Service (JTS) provide a means for working with transactions, especially distributed transactions, independently of the transaction manager’s implementation.
The Java Naming and Directory Interface (JNDI) API in the J2EE platform has a twofold role:
Firstly, it provides the means to perform standard operations on directory service resources such as LDAP, Novell Directory Services or Netscape Directory Services.
Secondly, a J2EE application utilises JNDI to look up interfaces used to create, among other things, EJBs and JDBC connections.
Java Message Service (JMS) is a mechanism for sending data asynchronously. It provides the functionality to send and receive messages through the use of Message-Oriented Middleware (MOM).
Java Mail is an API that abstracts the facilities for sending and receiving e-mail. Java Mail supports the most widely used Internet mail protocols such as IMAP4, POP3 and SMTP, but compared to JMS it is slower and less reliable.
Java Connector Architecture (JCA) is a standardised means of accessing a variety of legacy applications, typically ERP systems such as SAP R/3 and PeopleSoft, and provides “plug-and-play” components to access legacy systems.
Java Authentication and Authorization Service (JAAS) provides a means to grant permissions based on who is executing the code. JAAS utilises a pluggable architecture of authentication modules, so that one can drop in modules based on different authentication implementations such as Kerberos or PKI.
3.4.7. Some popular Java editors and IDEs
Some well known Java editors used for writing Java source code are Emacs and JEdit. Popular integrated development environments (IDEs) are Eclipse, Borland JBuilder and JCreator.
To summarise:
J2EE is a container-centric architecture which provides a simple runtime and several levels of abstraction.
J2EE recognises the need for composing components into modules and modules into applications. This is an attempt to standardise the reuse of application components and modules.
J2EE represents a very intuitive approach to building applications. While the design process is top-down, the deployment process is bottom-up: a composition process that composes modules from components and applications from modules.
3.5. A Comparative Study of .NET and J2EE
J2EE and Microsoft .NET are directly competing software platforms designed to build and run complex enterprise applications. To obtain a better understanding of these platforms, comparative studies were done by several organisations like IBM (2004). Today both dominate the IT industry, and the superiority of one over the other varies from time to time and place to place. Analysts do not believe that there will be one winner: organisations will deploy both, depending on the type of applications they deliver. Presently J2EE and .NET are taking 40% of the world market share.
Comparative studies on various features of these two platforms (Iqbal Asad, Ullah Naeem (2010)), including web services (Miller Gerry (2003)), are summarised in Table 3.2 below (Samtani Gunjan, Sadhwani Dimple (2004)):
Table 3.2 Comparative study of various features of the .NET and J2EE platforms
Feature | .NET | J2EE
Web presentation | ASP.NET, IIS server | JSP/Servlets, Tomcat server, WebLogic server etc.
Business services | .NET components | EJBs
Web services | XML, WSDL, SOAP, UDDI, WS-I compatibility | Full support, WS-I compatibility in release 1.4
Mobile applications | .NET Compact Framework | J2ME
DB integration | ADO.NET | EJB QL/JDBC
Messaging integration | MSMQ | Message EJBs/JMS
Legacy integration | COM TI | JCA
Programming language | C#.NET, VB.NET, C++.NET, others for CLS | Java
Interpreted language | MSIL | Java bytecode
Runtime environment | CLR | JVM/JRE
Class libraries | .NET framework | Java Class Libraries
Rich client | Windows Forms | AWT/Swing (J2SE)
A few limitations and missing capabilities are given in Table 3.3.
Table 3.3 Limitations and missing capabilities of J2EE vs. .NET
J2EE | .NET
Java Transaction Service (JTS) | Interoperate with COM+ services
Procedural transactions via JTA | Limited declarative-only capabilities
Container-Managed Persistence | Program it
Message-Driven Beans | Build with queued components
Java Database Connectivity (JDBC) | Different APIs for each ADO.NET provider
Java Naming & Directory Interface (JNDI) | Build it
JCA standard adapters and services | Build it
JMS to other, non-native platforms | Get a bridge
3.6. Business intelligence and Data mining
Business Intelligence (BI) adopts data mining techniques (Chris Rygielski,
Jyun-Cheng Wang, David Yen.C. (2002)) such as classification, clustering,
decision trees, prediction, neural networks etc. BI provides visibility, clarity and
insight into the data. Business intelligence consists of tools like data mining,
data marts and decision support systems. BI provides enterprise integration
and web services. BI supports powerful enterprise and web based reporting
features.
Data mining refers to extracting or “mining” knowledge from large amounts of
data (Han.J, Kamber.M. (2006)). Data mining is also treated as a synonym for
Knowledge Discovery in Databases, KDD (Imielinski.T., Mannila.H. (1996)).
Knowledge discovery (Hong Tzung Pei, Huang Tzu Jung, Chang Chao Sheng (2009))
is a process consisting of an iterative sequence of the following steps:
1. Data cleaning – removing noise and inconsistent data.
2. Data Integration – combining multiple data sources.
3. Data Transformation - consolidation of data into various forms
appropriate for mining by performing operations like summary or
aggregation operations.
4. Data mining – applying intelligent methods in order to extract data
patterns
5. Pattern Evaluation – identifying the truly interesting patterns representing
knowledge based on some interesting measures.
6. Knowledge Presentation – visualisation and knowledge representation of
mined knowledge to the user.
Data mining involves integration of techniques from multiple disciplines such
as:
Database and data warehouse technology
Statistics
Machine learning
High-performance computing
Pattern recognition
Neural networks
Data visualization
Information retrieval
Image and signal processing
Spatial or temporal data analysis.
The data mining techniques used in this research are:
Association rules
Apriori algorithm
k-means algorithm
Fuzzy and rough set approaches.
These data mining techniques are used in CRM for consumer segmentation
etc. (Tsiptsis Konstantinos, Chorianopoulos Antonios (2009), Zhang
Limei (2010)). They form part of business intelligence and are applied in
CRM (Habul.A. (2010), Wu Kun, Liu Feng ying (2010)).
3.6.1. Association Rules
Let I = {I1, I2, …, Im} be a set of items and let D be a set of database
transactions, where each transaction T is a set of items such that T ⊆ I.
Let A be a set of items. A transaction T is said to contain A if and only
if A ⊆ T.

An association rule is an implication of the form A => B, where A ⊂ I, B ⊂ I,
and A ∩ B = Ø.

The support s is the percentage of transactions in D that contain A ∪ B, and
the confidence c is the percentage of transactions in D containing A that also
contain B. In terms of conditional probability,

support (A=>B) = P (A ∪ B) and confidence (A=>B) = P (B|A) ………... (3.4)

This establishes the relation between confidence and support:

confidence (A=>B) = P (B|A) = support (A ∪ B) / support (A)
= support_count (A ∪ B) / support_count (A) ……….. (3.5)
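As a small illustration of equations (3.4) and (3.5), the sketch below computes support and confidence directly from a transaction list. The class and method names (AssocMetrics, supportCount, etc.) are illustrative, not part of the thesis implementation.

```java
import java.util.*;

// Illustrative sketch: support and confidence of a rule A => B over a
// transaction database, following equations (3.4) and (3.5).
public class AssocMetrics {

    // support_count(X): number of transactions containing every item of X
    static int supportCount(List<Set<String>> db, Set<String> x) {
        int count = 0;
        for (Set<String> t : db)
            if (t.containsAll(x)) count++;
        return count;
    }

    // support(A => B) = P(A u B): fraction of transactions containing A u B
    static double support(List<Set<String>> db, Set<String> a, Set<String> b) {
        Set<String> ab = new HashSet<>(a);
        ab.addAll(b);
        return (double) supportCount(db, ab) / db.size();
    }

    // confidence(A => B) = support_count(A u B) / support_count(A)
    static double confidence(List<Set<String>> db, Set<String> a, Set<String> b) {
        Set<String> ab = new HashSet<>(a);
        ab.addAll(b);
        return (double) supportCount(db, ab) / supportCount(db, a);
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(
            Set.of("I1", "I2", "I5"),
            Set.of("I2", "I4"),
            Set.of("I2", "I3"),
            Set.of("I1", "I2", "I4"));
        // {I1} => {I2}: I1 u I2 occurs in 2 of 4 transactions; I1 occurs in 2
        System.out.println(support(db, Set.of("I1"), Set.of("I2")));    // 0.5
        System.out.println(confidence(db, Set.of("I1"), Set.of("I2"))); // 1.0
    }
}
```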
3.6.2. Association Rule Mining
Association rule mining is a two-step process:
1. Find all frequent itemsets, i.e., itemsets meeting a predetermined
minimum support, min_sup.
2. Generate strong association rules from the frequent itemsets, i.e., rules
that satisfy both min_sup and a minimum confidence, min_conf.
3.6.3. Apriori Algorithm
Apriori property: all non-empty subsets of a frequent itemset must also be
frequent. This property is used in the Apriori algorithm. The steps involved
in the algorithm are:

1. The join step: To find Lk, a set of candidate k-itemsets, denoted Ck, is
generated by joining Lk-1 with itself. Let l1 and l2 be itemsets in Lk-1. The
notation li[j] refers to the jth item in li. By convention, Apriori assumes
that items within a transaction or itemset are sorted in lexicographic order.
For a (k-1)-itemset li, this means that the items are sorted such that
li[1] < li[2] < … < li[k-1]. The join of Lk-1 with itself is performed, where
members of Lk-1 are joinable if their first (k-2) items are in common. That
is, members l1 and l2 of Lk-1 are joined if (l1[1] = l2[1]) and (l1[2] = l2[2])
and … and (l1[k-2] = l2[k-2]) and (l1[k-1] < l2[k-1]). The condition
l1[k-1] < l2[k-1] simply ensures that no duplicates are generated. The
resulting itemset formed by joining l1 and l2 is {l1[1], l1[2], …, l1[k-2],
l1[k-1], l2[k-1]}.
2. The prune step: Ck is a superset of Lk; that is, its members may or may
not be frequent, but all frequent k-itemsets are included in Ck. A scan of
the database to determine the count of each candidate in Ck would result in
the determination of Lk (i.e., all candidates having a count no less than the
min_sup count are frequent by definition and therefore belong to Lk). Ck,
however, can be huge, and this could involve heavy computation. To reduce the
size of Ck, the Apriori property is used: any (k-1)-itemset that is not
frequent cannot be a subset of a frequent k-itemset. Hence, if any
(k-1)-subset of a candidate k-itemset is not in Lk-1, then the candidate
cannot be frequent either, and so can be removed from Ck. This subset testing
is done quickly by maintaining a hash tree of all frequent itemsets.
Mining of frequent patterns can also be done without candidate generation
(Han.J., Pei.J, Yin.Y. (2000)) and by other techniques such as the “Pincer
Search” algorithm (Lin D-I., Kedem Z.M. (2002)).
3.6.4. Implementation of Apriori Algorithm
The pseudo-code for the Apriori algorithm is given below. The process of
applying it to the sample consumer transactions given in Table 3.4 is shown
in Figure 3.6. The Apriori algorithm has also been improved for mining
association rules (Liu.Y., Yang.B. (2007)).

Join step: Ck is generated by joining Lk-1 with itself.
Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a
frequent k-itemset.

Pseudo-code:
1. Ck: candidate itemset of size k
2. Lk: frequent itemset of size k
3. L1 = {frequent 1-itemsets};
4. for (k = 1; Lk != Ø; k++) do begin
5.     Ck+1 = candidates generated from Lk;
6.     for each transaction t in the database do
7.         increment the count of all candidates in Ck+1 that are contained in t;
8.     Lk+1 = candidates in Ck+1 with min_sup;
9. end
10. return ∪k Lk;
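The pseudo-code above can be sketched as a compact, runnable Java version. This is an illustrative implementation (the class Apriori and its helper names are assumptions, not the thesis code); the transactions in the usage example follow the nine transactions T100-T900 implied by the TID-sets of Table 3.5.

```java
import java.util.*;

// Illustrative Apriori sketch: items are strings, candidate itemsets are
// built by the join step and filtered by the prune step described above.
public class Apriori {

    // all frequent itemsets with support count >= minSup
    static Map<Set<String>, Integer> run(List<Set<String>> db, int minSup) {
        Map<Set<String>, Integer> result = new HashMap<>();
        Map<Set<String>, Integer> current = count(db, candidates1(db), minSup);
        while (!current.isEmpty()) {
            result.putAll(current);
            Set<Set<String>> next = join(current.keySet()); // join + prune
            current = count(db, next, minSup);
        }
        return result;
    }

    // candidate 1-itemsets: every item that occurs in some transaction
    static Set<Set<String>> candidates1(List<Set<String>> db) {
        Set<Set<String>> c = new HashSet<>();
        for (Set<String> t : db)
            for (String item : t) c.add(new TreeSet<>(Set.of(item)));
        return c;
    }

    // join step: merge pairs whose union has exactly one extra item;
    // prune step: drop candidates with an infrequent (k-1)-subset
    static Set<Set<String>> join(Set<Set<String>> lk) {
        Set<Set<String>> ck = new HashSet<>();
        for (Set<String> a : lk)
            for (Set<String> b : lk) {
                TreeSet<String> u = new TreeSet<>(a);
                u.addAll(b);
                if (u.size() == a.size() + 1 && allSubsetsFrequent(u, lk))
                    ck.add(u);
            }
        return ck;
    }

    static boolean allSubsetsFrequent(Set<String> cand, Set<Set<String>> lk) {
        for (String item : cand) {
            TreeSet<String> sub = new TreeSet<>(cand);
            sub.remove(item);
            if (!lk.contains(sub)) return false;
        }
        return true;
    }

    // scan the database and keep candidates whose count reaches minSup
    static Map<Set<String>, Integer> count(List<Set<String>> db,
                                           Set<Set<String>> cands, int minSup) {
        Map<Set<String>, Integer> freq = new HashMap<>();
        for (Set<String> c : cands) {
            int n = 0;
            for (Set<String> t : db) if (t.containsAll(c)) n++;
            if (n >= minSup) freq.put(c, n);
        }
        return freq;
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(
            Set.of("I1", "I2", "I5"), Set.of("I2", "I4"), Set.of("I2", "I3"),
            Set.of("I1", "I2", "I4"), Set.of("I1", "I3"), Set.of("I2", "I3"),
            Set.of("I1", "I3"), Set.of("I1", "I2", "I3", "I5"),
            Set.of("I1", "I2", "I3"));
        System.out.println(run(db, 2).get(Set.of("I1", "I2", "I3"))); // 2
    }
}
```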
Table 3.4 Sample consumer transactions
Figure 3.6 Generating frequent itemsets with min_sup 2 using Apriori Algorithm (source: Han.J. , Kamber.M.)
Mining of frequent itemsets can also be done using a vertical data format and
a projection-based approach (Lan Guo-cheng, Hong Tzung-Pei, Tseng Vincent S.
(2012)). The sample consumer transactions given in Table 3.4 are rearranged
in vertical format in Table 3.5 (Seno.M. and Karypis.G. (2001)). In the
vertical data format, consumer transaction data is analysed with respect to
the nature of the transactions, which helps the enterprise to increase its
profitability.
Table 3.5 Consumer transactions in vertical data format (source: Han.J., Kamber.M.)

Itemset | TID_set
I1      | {T100, T400, T500, T700, T800, T900}
I2      | {T100, T200, T300, T400, T600, T800, T900}
I3      | {T300, T500, T600, T700, T800, T900}
I4      | {T200, T400}
I5      | {T100, T800}

3.6.5. Generating Association Rules from Frequent Itemsets

Once the frequent itemsets from the transactions in the database D are found,
the next step is to generate strong association rules from them (where strong
association rules satisfy both minimum support and minimum confidence). This
is done using the confidence formula given in equation (3.6):

confidence (A=>B) = P (B|A) = support (A ∪ B) / support (A)
= support_count (A ∪ B) / support_count (A) ………. (3.6)
The conditional probability is expressed in terms of itemset support counts,
where support_count (A ∪ B) is the number of transactions containing the
itemset A ∪ B, and support_count (A) is the number of transactions containing
the itemset A. Based on this equation, association rules are generated as
follows:

For each frequent itemset l, generate all non-empty proper subsets of l.
For every non-empty proper subset s of l, output the rule s => (l − s) if
[support_count (l) / support_count (s)] ≥ min_conf, where min_conf is the
minimum confidence threshold.

Because these rules are generated from frequent itemsets, each one
automatically satisfies the minimum support. Frequent itemsets can be stored
ahead of time in hash tables along with their counts so that they can be
accessed quickly.
Association rule mining has also been implemented using several other
techniques, such as the FP-tree (Saravanabhavan.C., Parvathi.R.M.S. (2011)),
and for the mining of rare rules (Selvi Kanimozhi.C.S., Tamilarasi.A. (2011)).
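The rule-generation procedure above can be sketched as follows. The class and method names are illustrative; the counts in the usage example follow the TID-sets of Table 3.5 (e.g., support_count({I1, I2, I5}) = 2).

```java
import java.util.*;

// Illustrative sketch: generate strong rules s => (l - s) from one
// frequent itemset l, keeping rules whose confidence
// support_count(l) / support_count(s) reaches min_conf.
public class RuleGen {

    static List<String> rules(Map<Set<String>, Integer> counts,
                              Set<String> l, double minConf) {
        List<String> out = new ArrayList<>();
        for (Set<String> s : properNonEmptySubsets(l)) {
            double conf = (double) counts.get(l) / counts.get(s);
            if (conf >= minConf) {
                Set<String> rhs = new TreeSet<>(l);
                rhs.removeAll(s);
                out.add(s + " => " + rhs + " (conf " + conf + ")");
            }
        }
        return out;
    }

    // all proper, non-empty subsets of l (exponential: fine for small l)
    static List<Set<String>> properNonEmptySubsets(Set<String> l) {
        List<String> items = new ArrayList<>(l);
        List<Set<String>> subs = new ArrayList<>();
        for (int mask = 1; mask < (1 << items.size()) - 1; mask++) {
            Set<String> s = new TreeSet<>();
            for (int i = 0; i < items.size(); i++)
                if ((mask & (1 << i)) != 0) s.add(items.get(i));
            subs.add(s);
        }
        return subs;
    }

    public static void main(String[] args) {
        Map<Set<String>, Integer> counts = new HashMap<>();
        counts.put(Set.of("I1"), 6);
        counts.put(Set.of("I2"), 7);
        counts.put(Set.of("I5"), 2);
        counts.put(Set.of("I1", "I2"), 4);
        counts.put(Set.of("I1", "I5"), 2);
        counts.put(Set.of("I2", "I5"), 2);
        counts.put(Set.of("I1", "I2", "I5"), 2);
        // rules from l = {I1, I2, I5} with min_conf = 50%
        rules(counts, Set.of("I1", "I2", "I5"), 0.5).forEach(System.out::println);
    }
}
```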
3.6.6. Correlation Analysis
A correlation measure can be used to augment the support-confidence framework
of association rules. This leads to correlation rules, which are measured by
support, confidence and correlation, as shown in equation (3.7):

A => B [support, confidence, correlation] ……………….. (3.7)

If P (A ∪ B) = P (A) P (B), the occurrence of itemset A is independent of the
occurrence of itemset B; otherwise the two are dependent and correlated. The
lift is then defined as

lift (A, B) = P (A ∪ B) / P (A) P (B) ………………………… (3.8)

If lift < 1, A and B are negatively correlated; if lift > 1, they are
positively correlated; and if lift = 1, A and B are independent and there is
no correlation between them. Hence correlation analysis can be used to filter
out uninteresting association rules.

Correlation analysis is also performed using the chi-square (χ²) measure,
from which it can be determined whether given itemsets are negatively
correlated or not:

χ² = ∑ (observed value − expected value)² / expected value ……… (3.9)
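The two measures can be sketched directly from equations (3.8) and (3.9). The probabilities and counts below are illustrative values, not data from the thesis.

```java
// Illustrative sketch: lift per equation (3.8) and the chi-square
// statistic per equation (3.9).
public class Correlation {

    // lift(A,B) = P(A u B) / (P(A) P(B)): 1 independent, >1 positive, <1 negative
    static double lift(double pAB, double pA, double pB) {
        return pAB / (pA * pB);
    }

    // chi-square = sum over cells of (observed - expected)^2 / expected
    static double chiSquare(double[] observed, double[] expected) {
        double sum = 0;
        for (int i = 0; i < observed.length; i++) {
            double d = observed[i] - expected[i];
            sum += d * d / expected[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(lift(0.25, 0.5, 0.5)); // 1.0: independent
        System.out.println(lift(0.40, 0.5, 0.5)); // > 1: positively correlated
        System.out.println(chiSquare(new double[]{10, 20}, new double[]{15, 15}));
    }
}
```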
3.6.7. k-means Clustering
k-means clustering is an iterative algorithm in which items are moved among
sets of clusters until a desired clustering is reached. As such, it can be
viewed as a type of squared-error algorithm, although the convergence
criterion need not be defined in terms of squared error. A high degree of
similarity among elements within a cluster is obtained, while a high degree
of dissimilarity among elements in different clusters is achieved
simultaneously.
The cluster mean of ki = {ti1, ti2, …, tim} is defined as

mi = (1/m) ∑j=1..m tij ………………………………….(3.10)

This definition assumes that each tuple has only one numeric value, as
opposed to a tuple with many attribute values. The k-means algorithm assumes
that some definition of cluster mean exists, but not a particular one.
The algorithm assumes that the desired number of clusters, k, is an input
parameter. The k-means algorithm is shown below. Initial values for the means
are assigned arbitrarily, either at random or by using the values of the
first k input items. The algorithm stops, for example, when no (or only a
very small number of) tuples are reassigned to different clusters. Other
termination criteria, such as a fixed number of iterations, are also used; a
maximum number of iterations is included to ensure stopping even without
convergence.
The complexity of k-means is O(tkn), where t is the number of iterations.
k-means finds a local optimum and may miss the global optimum. It does not
work on categorical data, because the mean must be defined on the attribute
type, and only convex-shaped clusters are found. It also does not handle
outliers well. One variation of k-means, k-modes, does handle categorical
data: instead of means, it uses modes. A typical value for k varies from 2 to
10. Although k-means often produces good results, it is not time efficient
and does not scale well. By saving distance information from one iteration to
the next, the actual number of distance calculations that must be made can be
reduced.
Input:
D = {t1, t2, …, tn} // set of elements
k                   // number of desired clusters
Output:
K                   // set of k clusters
k-means algorithm:
assign initial values for the means m1, m2, …, mk;
repeat
    assign each item ti to the cluster with the closest mean;
    calculate a new mean for each cluster;
until the convergence criteria are met;
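The pseudo-code above can be sketched as a small one-dimensional k-means in Java (a sketch with illustrative data; real consumer segmentation would use multi-dimensional attribute vectors).

```java
import java.util.*;

// Illustrative one-dimensional k-means following the pseudo-code above:
// assign each item to the closest mean, recompute means, repeat until
// the means stop moving or the iteration cap is reached.
public class KMeans {

    static double[] cluster(double[] data, double[] initialMeans, int maxIter) {
        double[] means = initialMeans.clone();
        for (int iter = 0; iter < maxIter; iter++) {
            double[] sum = new double[means.length];
            int[] count = new int[means.length];
            // assignment step: each item goes to the cluster with the closest mean
            for (double x : data) {
                int best = 0;
                for (int j = 1; j < means.length; j++)
                    if (Math.abs(x - means[j]) < Math.abs(x - means[best]))
                        best = j;
                sum[best] += x;
                count[best]++;
            }
            // update step: recompute each cluster mean; stop when nothing moved
            boolean changed = false;
            for (int j = 0; j < means.length; j++) {
                double m = count[j] == 0 ? means[j] : sum[j] / count[j];
                if (m != means[j]) { means[j] = m; changed = true; }
            }
            if (!changed) break;
        }
        return means;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 10, 11, 12};
        double[] means = cluster(data, new double[]{1, 12}, 100);
        System.out.println(Arrays.toString(means)); // [2.0, 11.0]
    }
}
```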
Since the original k-means implementation does not scale well on large
databases, its efficiency has been improved (Kanungo.T., Mount.D.M.,
Netanyahu.N.S., Piatko.C., Silverman.R., Wu A.Y. (2002), Mack Joun (2002)),
and the algorithm has since been modified several times for different
applications (Bagirov.A.M., Mardaneh.K. (2006)). For large databases,
clustering may also be performed using other techniques such as BIRCH
(Zhang.T., Ramakrishnan.R., Livny.M. (1996)) and CURE.
3.6.8. Fuzzy Set Approach
Fuzzy set theory (Zadeh (1975, 1976)), also known as possibility theory, was
proposed as an alternative to traditional two-valued logic and probability
theory. It lets one work at a high level of abstraction and offers a means of
dealing with imprecise measurement of data. Most importantly, fuzzy set
theory allows one to deal with vague or inexact data. Unlike traditional
“crisp” sets, where an element either belongs to a set or to its complement,
in fuzzy set theory elements can belong to more than one fuzzy set. Instead
of having a precise cutoff between categories, fuzzy logic uses truth values
between 0.0 and 1.0 to represent the degree of membership that a certain
value has in a given category. Each category then represents a fuzzy set.
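A fuzzy membership function can be sketched as follows. The "loyal consumer" category and its boundaries are illustrative assumptions, used only to show how a degree of membership replaces a crisp cutoff.

```java
// Illustrative sketch of a fuzzy membership function: the degree of
// membership in a hypothetical "loyal consumer" category rises gradually
// between two illustrative boundaries instead of jumping at a crisp cutoff.
public class Fuzzy {

    // linear ramp: 0 at or below lo, 1 at or above hi, interpolated between
    static double membership(double x, double lo, double hi) {
        if (x <= lo) return 0.0;
        if (x >= hi) return 1.0;
        return (x - lo) / (hi - lo);
    }

    public static void main(String[] args) {
        // e.g. purchases per year mapped to a degree of "loyal"
        System.out.println(membership(2, 5, 20));    // 0.0
        System.out.println(membership(12.5, 5, 20)); // 0.5
        System.out.println(membership(25, 5, 20));   // 1.0
    }
}
```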
3.7. Rough Set Theory
The rough set philosophy rests on the assumption that some information (data,
knowledge) is associated with every object of the universe of discourse.
Objects characterised by the same information are indiscernible (similar) in
view of the available information about them. The indiscernibility relation
generated in this way is the mathematical basis of rough set theory
(Pawlak.Z (2002)). Any set of all indiscernible (similar) objects is called
an elementary set and forms a basic granule (atom) of knowledge about the
universe. Any union of elementary sets is referred to as a crisp (precise)
set; otherwise the set is rough (imprecise, vague).

Consequently, each rough set has boundary-line cases, i.e., objects which
cannot be classified with certainty either as members of the set or of its
complement. Obviously, crisp sets have no boundary-line elements at all:
boundary-line cases cannot be properly classified by employing the available
knowledge.
Thus, the assumption that objects can be seen only through the information
available about them leads to the view that knowledge has a granular
structure. Due to this granularity, some objects of interest cannot be
discerned and appear the same (or similar). As a consequence, vague concepts,
in contrast to precise concepts, cannot be characterised in terms of
information about their elements. Therefore, in this approach, one assumes
that any vague concept is replaced by a pair of precise concepts, called the
lower and upper approximations of the vague concept. The difference between
the upper and lower approximations constitutes the boundary region of the
vague concept. Approximations are the basic operations in rough set theory
(Silvia Rissino, Germano Lambert Torres (2009)).
3.7.1. Approximations
The starting point of rough set theory is the indiscernibility relation,
generated by information about the objects of interest. The indiscernibility
relation expresses the fact that, due to lack of knowledge, it is not
possible to discern some objects by employing the available information: in
general, one cannot deal with single objects but must consider clusters of
indiscernible objects, the fundamental concepts of rough set theory.

The indiscernibility relation is used to define the basic concepts of rough
set theory as follows. To every subset X of the universe U two sets, the
B-lower approximation B_*(X) and the B-upper approximation B^*(X), are
assigned:

B_*(X) = {x ∈ U: B(x) ⊆ X} ………………….(3.11)
B^*(X) = {x ∈ U: B(x) ∩ X ≠ Ø} ……………...(3.12)

Their difference,

BN_B(X) = B^*(X) − B_*(X) ………………………(3.13)

is referred to as the B-boundary region of X. If BN_B(X) = Ø, the boundary
region of X is empty and X is crisp (exact) w.r.t. B; in the opposite case,
i.e., if BN_B(X) ≠ Ø, X is referred to as rough (inexact) w.r.t. B.
Four basic classes of rough sets, i.e., four categories of vagueness, are
defined as follows:

a) If B_*(X) ≠ Ø and B^*(X) ≠ U, then X is roughly B-definable, i.e., it is
possible to decide for some elements of the universal set U whether they
belong to X or to −X, using B.

b) If B_*(X) = Ø and B^*(X) ≠ U, then X is internally B-indefinable, i.e.,
it is possible to decide whether some elements of U belong to −X, but it is
not possible to decide for any element of U, using B, whether it belongs to
X or not.

c) If B_*(X) ≠ Ø and B^*(X) = U, then X is externally B-indefinable, i.e.,
it is possible to decide for some elements of U whether they belong to X,
but it is not possible to decide for any element of U whether it belongs to
−X or not, using B.

d) If B_*(X) = Ø and B^*(X) = U, then X is totally B-indefinable, i.e., it
is not possible to decide for any element of U whether it belongs to X or to
−X, using B.
A rough set can also be characterised numerically by the coefficient

αB (X) = |B_*(X)| / |B^*(X)| ………………. (3.14)

called the accuracy of approximation, where |X| denotes the cardinality of
X ≠ Ø. Obviously 0 ≤ αB (X) ≤ 1. If αB (X) = 1, X is crisp (precise) w.r.t.
B; otherwise, if αB (X) < 1, X is rough (vague) w.r.t. B.
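Equations (3.11)-(3.14) can be sketched on a toy universe. Here the elementary sets (indiscernibility classes) are given directly as illustrative input rather than derived from attributes.

```java
import java.util.*;

// Illustrative sketch of B-lower/B-upper approximations and the accuracy
// coefficient, equations (3.11)-(3.14), over a toy universe whose
// elementary sets are supplied directly.
public class RoughSet {

    // B-lower: objects whose whole elementary set lies inside X
    static Set<Integer> lower(List<Set<Integer>> classes, Set<Integer> x) {
        Set<Integer> out = new TreeSet<>();
        for (Set<Integer> c : classes)
            if (x.containsAll(c)) out.addAll(c);
        return out;
    }

    // B-upper: objects whose elementary set intersects X
    static Set<Integer> upper(List<Set<Integer>> classes, Set<Integer> x) {
        Set<Integer> out = new TreeSet<>();
        for (Set<Integer> c : classes)
            if (!Collections.disjoint(c, x)) out.addAll(c);
        return out;
    }

    // accuracy alpha_B(X) = |lower| / |upper|, equation (3.14)
    static double accuracy(List<Set<Integer>> classes, Set<Integer> x) {
        return (double) lower(classes, x).size() / upper(classes, x).size();
    }

    public static void main(String[] args) {
        // U = {1..6}, elementary sets {1,2}, {3,4}, {5,6}; X = {1,2,3}
        List<Set<Integer>> classes =
            List.of(Set.of(1, 2), Set.of(3, 4), Set.of(5, 6));
        Set<Integer> x = Set.of(1, 2, 3);
        System.out.println(lower(classes, x));    // [1, 2]
        System.out.println(upper(classes, x));    // [1, 2, 3, 4]
        System.out.println(accuracy(classes, x)); // 0.5  (X is rough)
    }
}
```

Here the boundary region {3, 4} is non-empty, so X is rough w.r.t. the given classes, matching the definition above.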
3.7.2. Reduction and Significance of Attributes and Approximation Reducts

Rough set theory can be used to check the influence of a certain condition
attribute on a decision attribute.

Let C, D ⊆ A be sets of condition and decision attributes, respectively.
C′ ⊆ C is a D-reduct (reduct w.r.t. D) of C if C′ is a minimal subset such
that

γ (C, D) = γ (C′, D) ……………………(3.15)

This concept of reduct is applied to determine the influence of each
condition attribute on the decision attribute. A set of decision rules of the
form “if... then... else...” can also be determined based on reducts.
Significance of attributes and approximation reducts is determined as follows:
Let C and D be sets of condition and decision attributes, respectively, and
let α be a condition attribute, i.e., α ∈ C. The number γ (C, D) expresses
the degree of dependency between the attributes C and D, or the accuracy of
the approximation of U/D by C. To determine how the coefficient γ (C, D)
changes when the condition attribute α is removed, the difference between
γ (C, D) and γ (C − {α}, D) is computed. Normalising this difference defines
the significance of the attribute α by the following equation:

σ (C,D) (α) = (γ (C,D) − γ (C − {α}, D)) / γ (C,D)
= 1 − γ (C − {α}, D) / γ (C, D) ………… (3.16)

denoted σ (α) when C and D are given. Thus σ (α) can be understood as the
classification error which occurs when α is dropped. The significance
coefficient is extended to a subset B of C as

σ (C, D) (B) = (γ (C,D) − γ (C − B, D)) / γ (C,D)
= 1 − γ (C − B, D) / γ (C, D) ………. (3.17)

denoted σ (B) when C and D are given. If B is a reduct of C, then σ (B) = 1,
i.e., removing a reduct from the set of decision rules makes it impossible
to make decisions with certainty.
Any subset B of C is called an approximate reduct of C, and the number

ε (C, D) (B) = (γ (C, D) − γ (B, D)) / γ (C, D) = 1 − γ (B, D) / γ (C, D)
………………………. (3.18)
denoted simply ε (B), is called the error of reduct approximation. It
expresses how exactly the set of attributes B approximates the set of
condition attributes C. Obviously, ε (B) = 1 − σ (B) and ε (B) = 1 − ε (C − B).
For any subset B of C, ε (B) ≤ ε (C). If B is a reduct of C, then ε (B) = 0.

The concept of approximation reduct is a generalisation of the concept of
reduct: a minimal subset B of the condition attributes C such that
γ (C, D) = γ (B, D), or equivalently ε (C, D) (B) = 0, is a reduct. The idea
of an approximation reduct is useful in cases where a smaller number of
condition attributes is preferred over accuracy of classification.
Decision rules of the form “if condition then decision” are generated with
the reduct attributes, and variations of these “if… then…” rules are
determined based on the reducts. In this way rough set theory is quite useful
when the values in a relation are vague.
3.8. Scapegoat Trees and Max-Heaps
3.8.1. Scapegoat Trees
A scapegoat tree is a self-balancing binary search tree (BST). It was
originally discovered by Arne Andersson and rediscovered by Igal Galperin and
Ronald L. Rivest. It provides worst-case O(log n) lookup time and O(log n)
amortised insertion and deletion time.

A self-balancing binary search tree provides worst-case O(log n) lookup time,
but a scapegoat tree has no additional per-node memory overhead compared to a
regular binary search tree (Bentley.J.L. (1975)). In a scapegoat tree, a node
stores only a key and two pointers to its children, which makes scapegoat
trees easier to implement and, due to data structure alignment, reduces node
overhead by up to one-third.

When something goes wrong, the first thing people tend to do is find someone
to blame (the “scapegoat”). Once blame is assigned, the scapegoat is left to
fix the problem. The structure of the scapegoat tree is based on this common
wisdom.
A BST is weight-balanced if half its nodes are to the left of the root and
half to the right. A node is α-weight-balanced if it meets the following
conditions:

size (left) <= α * size (node) ……….. (3.19)
size (right) <= α * size (node) ……… (3.20)

where size is defined recursively as:

function size (node)
    if node = nil then return 0;
    else return size (node->left) + size (node->right) + 1;
end
If α = 1, this describes a linked list as balanced, and if α = 0.5 it matches
only almost complete binary trees (CBTs). An α-weight-balanced search tree
must also be α-height-balanced, that is

height (tree) <= log1/α (node count) ………………………..(3.21)

Scapegoat trees do not keep α-weight-balance at all times, but are loosely
α-height-balanced, such that

height (scapegoat tree) <= log1/α (node count) + 1 …………. (3.22)
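The height bounds can be checked numerically for the example of Figures 3.6 and 3.7 (with α = 2/3, so the bound is a base-3/2 logarithm). This is a sketch; log32 is a hypothetical helper, not part of the thesis code.

```java
// Numeric check of the scapegoat height bounds for the 10-node example:
// with alpha = 2/3 the height bound is log base 3/2 of the node count.
public class SgtBound {

    // log base 3/2 of x (hypothetical helper)
    static double log32(double x) {
        return Math.log(x) / Math.log(1.5);
    }

    public static void main(String[] args) {
        // q = n = 10: height 5 respects the bound log_{3/2} 10 ~= 5.679
        System.out.println(5 <= log32(10));   // true
        // after inserting 7 (n = 11), height 6 exceeds log_{3/2} 11 ~= 5.914,
        // so a scapegoat must be found and its subtree rebuilt
        System.out.println(6 > log32(11));    // true
    }
}
```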
For this reason scapegoat trees are similar to red-black trees, since both
restrict their height. They differ greatly, though, in how they determine
where rotations (or, in the case of scapegoat trees, rebalances) take place.
Red-black trees store additional 'colour' information in each node to
determine the location, whereas scapegoat trees find a “scapegoat” which
isn't α-weight-balanced and perform a rebalance operation there. This is
similar to AVL trees, where the actual rotations depend on the 'balances' of
nodes, but the means of determining balance differs greatly. Since AVL trees
check the balance value on every insertion/deletion, it is typically stored
in each node; scapegoat trees calculate it only as needed, i.e., only when a
scapegoat must be found.

In contrast to most other self-balancing search trees, scapegoat trees are
entirely flexible in their balancing. They support any α such that 0.5 < α < 1.
A high α value results in fewer rebalances, making insertion quicker but
lookups and deletions slower, and vice versa for a low α. In practical
applications, therefore, α is chosen depending on how frequently these
operations are performed. An illustration of a scapegoat tree is provided in
Figures 3.6 and 3.7.
Figure 3.6 A scapegoat tree with 10 nodes and height 5

Inserting 7 into the scapegoat tree increases its height to 6, which violates
the height condition, since 6 > log3/2 11 ≈ 5.914. A scapegoat is found at
the node containing 10.

Figure 3.7 Finding a scapegoat and inserting 7 at node 10
A scapegoat tree (SGT) is a binary search tree (Bentley.J.L. (1975)) that, in
addition to keeping track of the number of nodes n in the tree, also keeps a
counter, q, which maintains an upper bound on the number of nodes. At all
times, n and q obey the inequalities q/2 ≤ n ≤ q. In addition, an SGT has
logarithmic height at all times: the height of a scapegoat tree does not
exceed log3/2 q ≤ log3/2 2n < log3/2 n + 2. Even with this constraint, an SGT
can look unbalanced; the tree in Figure 3.6 has q = n = 10 and height
5 < log3/2 10 ≈ 5.679.

Finding a node (the find(x) operation) in a scapegoat tree is done using the
standard algorithm for searching in a BST. This takes time proportional to
the height of the tree, which is O(log n).
To implement the add(x) operation (adding a node), first increment n and q
and then use the standard algorithm for adding x to a binary search tree:
search for x and then add a new leaf u with u.x = x. At this point, the depth
of u must not exceed log3/2 q.
If depth(u) > log3/2 q, the tree height is reduced as follows. There is only
one node, namely u, whose depth exceeds log3/2 q. To fix u, walk from u back
up to the root looking for a scapegoat, w. The scapegoat w is an unbalanced
node with the property that size(w.child) / size(w) > 2/3, where w.child is
the child of w on the path from the root to u. It can be proved that such a
scapegoat exists; here this is taken for granted. Once the scapegoat w is
found, the subtree rooted at w is completely destroyed and rebuilt into a
perfectly balanced binary search tree. Even before the addition of u, w's
subtree was not a complete binary tree; therefore, when w is rebuilt, its
height decreases by at least 1, so that the height of the SGT is once again
at most log3/2 q.

If the cost of finding the scapegoat w and rebuilding the subtree rooted at
w is ignored, then the running time of add(x) is dominated by the initial
search, which takes O(log q) = O(log n) time. The true cost is established
using amortised analysis.
The implementation of remove(x) in an SGT is as follows: search for x and
remove it using the standard algorithm for removing a node from a BST. (This
can never increase the height of the tree.) Next, decrement n but leave q
unchanged. Finally, check if q > 2n and, if so, rebuild the entire tree into
a perfectly balanced BST and set q = n. Again, if the cost of rebuilding is
ignored, the running time of the remove(x) operation is proportional to the
height of the tree and is therefore O(log n). Subroutines for adding a node,
add(x), and removing a node, remove(x), are given below:
boolean add(T x) {
    // first do a basic BST insertion, keeping track of the depth
    Node<T> u = new Node<>(x);
    int d = addWithDepth(u);
    if (d > log32(q)) {                 // log32(q): log base 3/2 of q
        // depth exceeded: walk up to find a scapegoat
        Node<T> w = u.parent;
        while (3 * size(w) <= 2 * size(w.parent))
            w = w.parent;
        rebuild(w.parent);
    }
    return d >= 0;
}

boolean remove(T x) {
    if (super.remove(x)) {
        if (2 * n < q) {
            rebuild(r);                 // rebuild the whole tree at the root r
            q = n;
        }
        return true;
    }
    return false;
}

3.8.2. Max-Heaps

A max-heap is shown in Figure 3.8. It is a complete binary tree in which the
value in each internal node is greater than or equal to the values in the
children of that node. A min-heap is defined similarly. When a heap is stored
in an array, if a node is stored at index k, then its left child is stored at
index 2k+1 and its right child at index 2k+2.

Figure 3.8 A complete binary tree depicting a max-heap
Mapping the elements of a heap into an array is trivial: the root is stored
at index 0, and the children of the node at index k are stored at indices
2k+1 and 2k+2.
3.8.2.1. Building a Heap
A heap is a complete binary tree (CBT) and is efficiently represented using a
simple array. Given an array of N values, a heap is built by “sifting” each
internal node down to its proper location, as shown in Figure 3.10. The steps
to build a heap are as follows:

Start with the last internal node.
Swap the current internal node with its larger child, if necessary.
Follow the swapped node down.
Continue until all internal nodes are done.

Figure 3.10 Building a heap
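The steps above can be sketched as a bottom-up heap construction in Java (an illustrative sketch; method names such as siftDown are assumptions, not the thesis code).

```java
import java.util.*;

// Bottom-up max-heap construction following the steps above: sift each
// internal node down, starting from the last internal node.
public class BuildHeap {

    static void buildHeap(int[] a) {
        // the last internal node is the parent of the last element
        for (int i = a.length / 2 - 1; i >= 0; i--) siftDown(a, i, a.length);
    }

    // swap node i with its larger child until the heap property holds
    static void siftDown(int[] a, int i, int n) {
        while (2 * i + 1 < n) {
            int child = 2 * i + 1;                                 // left child
            if (child + 1 < n && a[child + 1] > a[child]) child++; // right is larger
            if (a[i] >= a[child]) break;
            int tmp = a[i]; a[i] = a[child]; a[child] = tmp;
            i = child;                                             // follow the swap down
        }
    }

    // verify the max-heap property: each parent >= both children
    static boolean isMaxHeap(int[] a) {
        for (int i = 0; 2 * i + 1 < a.length; i++) {
            if (a[i] < a[2 * i + 1]) return false;
            if (2 * i + 2 < a.length && a[i] < a[2 * i + 2]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 9, 7, 5, 8};
        buildHeap(a);
        System.out.println(Arrays.toString(a) + " heap? " + isMaxHeap(a));
        // [9, 7, 8, 1, 5, 3] heap? true
    }
}
```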
3.8.2.2. Cost of Building a Heap
Start with a CBT having N nodes; number of steps required for shifting values
down will be maximised if tree is full, in which case N = 2d-1 for some integer d
= [log N]. Cost of building a heap is illustrated in the Figure 3.11.
Figure 3.11 Cost of building a heap
It is proved that in general, level k of full and complete binary tree will contain
2k nodes, and that those nodes are d – k – 1 levels above the leaves. Thus in
worst case, the number of comparisons BuildHeap() will require in building
heap of N nodes is given by
………….(3.23)
Since, at the worst, there is one swap for each two comparisons, maximum
number of swaps is N – [log N]. Hence, building heap of N nodes is O (N) in
both comparisons and swaps.
3.8.2.3. Heap Sort

A list is sorted by first building it into a heap and then repeatedly
deleting the root node until the heap is empty. If the deleted roots are
stored in reverse order in an array, they end up sorted in ascending order
when a max-heap is used. The subroutine for heap sort is given below:

void HeapSort (int* List, int Size) {
    HeapT<int> toSort (List, Size);
    toSort.BuildHeap ();
    int Idx = Size - 1;
    while (!toSort.isEmpty ()) {
        List [Idx] = toSort.RemoveRoot ();
        Idx--;
    }
}
3.8.2.4. Cost of Heap Sort

Adding in the cost of building the heap, the total comparisons are

Total comparisons = (2N − 2⌈log N⌉) + (2N⌈log N⌉ + 2⌈log N⌉ − 4N)
= 2N⌈log N⌉ − 2N …………….. (3.24)

Total swaps = N⌈log N⌉ − N ………………(3.25)

So, in the worst case, heap sort is Θ(N log N) in both swaps and comparisons.
3.9. MUlticriteria Satisfaction Analysis (MUSA)
MUlticriteria Satisfaction Analysis (MUSA) is an ordinal regression method
for evaluating consumer satisfaction (Grigoroudis.E., Siskos.Y., Christina
Diakaki (2001)). The approach has its basis in the field of multicriteria
analysis (Nikolaos.F, Matsatsinis.E., Ioannidou.E., Grigoroudis (1999)). The
method assesses a set of partial satisfaction functions in such a way that
the overall satisfaction criterion becomes as consistent as possible with the
consumers' judgments. Thus, the main objective of the MUSA method is the
aggregation of individual judgments into a collective value function.

The MUSA method evaluates the global and partial satisfaction functions
(Joao Isabel M, Costa Carlos A Bana e, Figueria Jose Rui (2007)) Y* and X*i
respectively, given the consumers' judgments Y and Xi (for the i-th
criterion). The ordinal regression analysis equation has the following form:
Y* = ∑ (i = 1 to n) bi X*i ……..(3.26)
∑ (i = 1 to n) bi = 1 ……..(3.27)
where the value functions Y* and X*i are normalised in the interval [0, 100],
n is the number of criteria, and bi is the positive weight of the i-th
criterion. It is useful to assume a tree-like structure of the criteria, also
called a “value tree” or “value hierarchy”.
The MUSA method assumes an additive collective value function Y* and a set of
partial satisfaction functions X*i. The main objective of the method is to
achieve maximum consistency between the value function Y* and the consumers'
judgments Y. In order to reduce the size of the mathematical program, the
monotonicity constraints for Y* and X*i are removed using the following
transformation equations:
zm = y*(m+1) − y*m, m = 1, 2, …, α − 1 …………(3.28)
wik = bi x*i(k+1) − bi x*ik, k = 1, 2, …, αi − 1; i = 1, 2, …, n …………(3.29)
This preference disaggregation methodology also includes a post-optimality
analysis stage in order to overcome the problem of model stability
(Corazza.M., Funari.S., Gusso.R. (2012)). The final solution is obtained by
exploring the polyhedron of multiple or near-optimal solutions generated by
the constraints of the previous linear program. This solution is calculated
using n linear programs, equal in number to the criteria.
3.9.1. Satisfaction Indices
Estimation of a performance norm is very useful in consumer satisfaction
analysis. Average global and partial satisfaction indices are used for this
purpose and are evaluated through the equations:

S = (1/100) ∑ (m = 1 to α) pm y*m ……………………………….(3.30)
Si = (1/100) ∑ (k = 1 to αi) pik x*ik …….. (3.31)

where S and Si are the average global and partial satisfaction indices, and
pm and pik are the frequencies of consumers belonging to the ym and xik
satisfaction levels, respectively.
From equations (3.30) and (3.31) it follows that the average satisfaction
indices are basically the mean values of the global and partial satisfaction
functions. Hence, these indices give the average level of satisfaction
globally and per criterion.
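The mean-value reading of the satisfaction indices can be sketched as follows. The frequencies and value-function levels below are illustrative, not survey data from the thesis.

```java
// Average satisfaction index as the frequency-weighted mean of the
// (normalised, 0-100) satisfaction value function, i.e. the mean-value
// reading of the satisfaction indices above. Values are illustrative.
public class Musa {

    // p[m]: fraction of consumers at level m, y[m]: value of level m (0-100)
    static double satisfactionIndex(double[] p, double[] y) {
        double s = 0;
        for (int m = 0; m < p.length; m++) s += p[m] * y[m];
        return s / 100.0; // normalised to [0, 1]
    }

    public static void main(String[] args) {
        double[] p = {0.1, 0.2, 0.3, 0.4}; // level frequencies (sum to 1)
        double[] y = {0, 30, 70, 100};     // estimated value function
        System.out.println(satisfactionIndex(p, y)); // ~0.67
    }
}
```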
3.9.2. Demanding Indices
The global and partial satisfaction functions also indicate the consumers'
demanding level. Average global and partial demanding indices, D and Di
respectively, are estimated through the equations:
D = ∑ (m = 1 to α−1) (100(m−1)/(α−1) − y*m) / ∑ (m = 1 to α−1) 100(m−1)/(α−1) ………………………..(3.32)
Di = ∑ (k = 1 to αi−1) (100(k−1)/(αi−1) − x*ik) / ∑ (k = 1 to αi−1) 100(k−1)/(αi−1) .….... (3.33)
where α and αi are the number of satisfaction levels in the global and
partial satisfaction functions, respectively.
When these indices are normalized in the interval [-1, 1], the following
possible cases hold:
If D = 1 or Di = 1, then consumers have highest demanding index.
If D = 0 or Di = 0, then this case refers to “neutral” consumers.
If D = −1 or Di = −1, then consumers have lowest demanding index.
Demanding indices correspond to the average deviation of the estimated value functions from the "normal", i.e., linear, function. Average demanding indices are used for enhancing the consumer behaviour analysis. They can also indicate the extent of a company's improvement efforts: the higher the value of the demanding index, the more the satisfaction level should be improved in order to fulfill consumers' expectations.
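Equations 3.30 and 3.32 can be sketched as follows (a minimal illustration assuming a 5-level satisfaction scale; the frequencies and the value function below are made up for illustration, not taken from the survey data):

```java
public class MusaIndicesSketch {
    // S = (1/100) * sum_m p^m * y*^m  (eq. 3.30); p^m in %, y*^m in [0, 100]
    public static double satisfactionIndex(double[] freqPct, double[] value) {
        double s = 0;
        for (int m = 0; m < freqPct.length; m++) s += freqPct[m] * value[m];
        return s / 100.0;
    }

    // D = sum_m [100(m-1)/(a-1) - y*^m] / sum_m [100(m-1)/(a-1)]  (eq. 3.32)
    public static double demandingIndex(double[] value) {
        int a = value.length;                     // number of satisfaction levels
        double num = 0, den = 0;
        for (int m = 0; m < a - 1; m++) {         // levels 1 .. a-1 (0-based here)
            double linear = 100.0 * m / (a - 1);  // the "normal" (linear) function
            num += linear - value[m];
            den += linear;
        }
        return num / den;
    }

    public static void main(String[] args) {
        double[] freqPct = {5, 10, 20, 40, 25};   // % of consumers per level (made up)
        double[] value   = {0, 10, 30, 60, 100};  // estimated value function y* (made up)
        System.out.println(satisfactionIndex(freqPct, value));  // prints 56.0
        System.out.println(demandingIndex(value));              // ~0.33: demanding consumers
    }
}
```

A convex value function such as the one above yields a positive demanding index, matching the interpretation given for D above.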
Normalized variables b′_i and S′_i are evaluated as follows:
……….……..(3.34)
where b̄ and S̄ are the mean values of the criteria weights and the average satisfaction indices, respectively.
The Average Satisfaction Index (ASI) indicates the extent to which a consumer is satisfied with the enterprise, and the Average Demanding Index (ADI) indicates the extent to which an enterprise needs to improve to satisfy the consumer's demands. By improving satisfaction on the global and sub-criteria, the profitability of an enterprise is improved. These two indices also reveal new buying and selling opportunities, thereby increasing the loyalty of consumers and extending better services to them.
Apart from the above concepts, Bayesian classification and accuracy measures, along with the bagging and boosting algorithms (Han, J., Kamber, M. (2006)), were implemented in our research.
3.10. Summary:
Consumer satisfaction and loyalty are important factors of CRM for improving the profitability of a company. For building effective CRM software, .NET and J2EE are considered, as these two platforms are widely used in building enterprise software. A comparative study was taken up to analyse the features of these software platforms, and based on users' preference the appropriate platform (J2EE) was chosen to build the CRM software. To analyse consumer buying behaviour, the k-means and Apriori algorithms play an important role in business intelligence: the k-means algorithm is used for consumer segmentation, and the Apriori algorithm is used to study consumer buying behaviours. By integrating the MUSA method and rough set theory, consumer satisfaction is analysed in a detailed manner. Fuzzy set theory is implemented to analyse consumer loyalty.
4. METHODOLOGY
4.1. Overview
To study the identified objectives, Reliance fresh markets located in three places of Hanamkonda city of Andhra Pradesh state in India were taken into consideration. The research was divided into a survey process and the application of data mining algorithms to the sample data. Data was collected through various forms designed for each purpose. Depending on applicability, data was stored in databases, Excel spreadsheets or different file formats such as .csv (comma separated values), .arff (attribute-relation file format), etc. To analyse this data, various data mining techniques of business intelligence or visualisation techniques such as line graphs, pie charts, etc. were implemented. A few algorithms were modified to improve their efficiency.
4.2. Sampling procedures
A random sample of 100 or more consumers was taken in each case, depending on applicability and willingness to respond to the survey. The MUlticriteria Satisfaction Analysis (MUSA) method, used to analyse consumer satisfaction, was applied to more than 200 consumers in 3 places of Hanamkonda city. The survey was done on randomly selected consumers in the 3 Reliance fresh super markets located in 3 places of the city. To analyse consumer loyalty and service, 120 randomly selected consumers were considered.
4.3. Data Collection Techniques
Consumer profiles were collected by designing a consumer profile form shown
in Figure 4.1. A consumer website was designed to collect the consumer data
online. Consumer transaction data was collected from Reliance fresh staff and
was stored in an Excel spreadsheet.
Figure 4.1 Consumer profile form
MS Excel spreadsheet 2007, MS Access 2007 and Oracle 10g databases
were used to store the data.
To analyse user satisfaction with the .NET and J2EE platforms, a comparative study was done on various parameters common to both platforms.
Separate forms were designed to gather data on consumer satisfaction and
consumer service.
Which is your satisfaction level about the company? (Tick appropriate)
CS-Completely Satisfied, VS-Very Satisfied, S-Satisfied, D-Dissatisfied,
CD-Completely Dissatisfied
Figure 4.2 shows the survey questionnaire designed to collect consumer satisfaction levels for the global (main) criteria of the enterprise, namely the 4Ps: personnel, product, physical appearance and place. Figure 4.3 shows the survey questionnaire form designed to collect consumer satisfaction levels for the sub-criteria of each global criterion.
Figure 4.2 Survey questionnaire for main satisfaction criteria
Satisfaction Criteria        CS   VS   S   D   CD
1. Personnel
2. Product
3. Physical Appearance
4. Place
Which is your satisfaction level about the following of the company? (Tick appropriate)

                                 CS   VS   S   D   CD
Skills/Knowledge
Responsiveness
Friendliness
Quality
Quantity
Variety
Prices
Appearance of stores
Waiting time (busy hours)
Waiting time (non-busy hours)
Service time
Cleanliness
Location of stores
Number of stores
Parking

CS-completely satisfied, VS-very satisfied, S-satisfied, D-dissatisfied,
CD-completely dissatisfied
Figure 4.3 Survey questionnaire form for sub criteria satisfaction
Opinion on consumer service data was collected using the consumer service
form shown in Figure 4.4.
Figure 4.4 Consumer Service Survey Form
Consumer Service Survey
Customer Name:    Address:    Mobile Number:    Email:

Rate the following (1. Excellent 2. Good 3. Average 4. Fair 5. Poor):    1  2  3  4  5
Staff was available in a timely manner
Staff greeted you and offered to help you
Staff was friendly and cheerful throughout
Staff answered your questions
Staff showed knowledge of the products/services
Staff offered relevant advice
Staff was well-mannered throughout
Overall, how would you rate our customer service?

What did you like best about our customer service?
How could we improve our customer service?
Is there a staff person you would like to commend? Name:    Reason:
Do you prefer online consumer support? YES/NO
Thank you for completing our customer service survey.
4.4. Research Methodology
To identify “Media mix” for effective reach and quality networking, opinion of
around 200 consumers visiting Reliance fresh super market was taken. The
availability of better internet connectivity and wide use of mobile phones made
Reliance fresh opt for e-mail, SMS, live chat, e-newsletter and face to face
communication.
There are various communication channels used by businesses, and these channels are of vital importance in creating and sustaining the business. For a business, a physical presence is essential to present a friendly, contactable, open interaction so that the consumer feels comfortable. To gain the trust of a consumer, various online channels of communication that replace face-to-face interaction were considered. As a substitute for face-to-face interaction with consumers, opinion was taken on some of the most common channels, which are listed below:
e-mail is the most common and easiest way to communicate with consumers. When a potential consumer is interested in the products, he will e-mail a query, and the response is given in the form of text, images or file attachments. Orders can also be booked through e-mail, which also serves as evidence.
Short Message Service (SMS) is also an important channel for businesses to communicate and interact with consumers. Mobile phones revolutionised the communication habits of individuals, and organisations transformed SMS into a fundamental part of their daily activity. SMS messaging has become a communication channel that is fast, reliable, highly effective and instant. Its key benefit is establishing a one-to-one communication channel with the consumer that is fast, reliable and personal. A message is sent and received instantaneously, and about 70% of SMS messages are read instantly. Messages can be sent to one or thousands of recipients at the same time, in minutes, via bulk or group SMS. It has a lower cost than any comparable communication medium, saving money and time while improving the consumer experience.
Newsletters provide free information and encourage consumers to buy more products. A newsletter works as a great customer service and retention tool, giving customers satisfaction. Enterprises include new improvements to products and intelligent content to pull in customers.
Live chat is a novel and effective way to sell and buy online through the company's website. It gives consumers the sense of being able to communicate immediately and get responses to their queries, and it presents the business as proactive and technology-savvy. Additionally, it converts a casual web window shopper into a serious buyer more quickly due to the time he or she spends on the site. However, with this channel someone has to be constantly available at the other end. If support is available only at certain times, those times must be stated on the website so that people know when to come back and are not frustrated if they try to chat and find no one there.
In order to build these channels of communication and a consumer website for storing the data online, the user's (here the developer's) choice of software platform was identified through the comparative study. Satisfaction opinion on various parameters of the J2EE and .NET platforms was collected from various user segments, namely software executives, students and faculty who use these two platforms. To identify the preferred platform, the performance of J2EE and .NET was compared in our lab using the Load Runner tool by creating 100 virtual users on a sample application.
For identifying consumer requirements and opportunities that facilitate increases in profit margins, revenues, buying and selling, consumer segmentation was performed with the Weka tool.
k-means algorithm
Input:
D = {t1, t2, ..., tn} // set of elements
k // number of desired clusters
Output:
K // set of k clusters
k-means algorithm:
Assign initial values for means m1, m2, ..., mk;
Repeat
Assign each item ti to the cluster which has the closest mean;
Calculate the new mean for each cluster;
Until the convergence criterion is met;
The k-means algorithm shown in Figure 4.5 was implemented for segmentation. Consumer segmentation was done using the Weka tool's Explorer and Knowledge Flow features. For the Explorer option the .csv file format was used, and for Knowledge Flow the .arff file format was used.
Figure 4.5 k-means clustering algorithm
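The algorithm in Figure 4.5 can be sketched in Java for one-dimensional data as follows (an illustrative sketch only; the data points and initial means below are made up, not drawn from the transaction data):

```java
import java.util.*;

// Minimal k-means sketch following Figure 4.5.
public class KMeansSketch {
    public static int[] cluster(double[] data, double[] means, int maxIter) {
        int[] assign = new int[data.length];
        for (int iter = 0; iter < maxIter; iter++) {
            boolean moved = false;
            // Assign each item to the cluster with the closest mean
            for (int i = 0; i < data.length; i++) {
                int best = 0;
                for (int c = 1; c < means.length; c++)
                    if (Math.abs(data[i] - means[c]) < Math.abs(data[i] - means[best])) best = c;
                if (assign[i] != best) { assign[i] = best; moved = true; }
            }
            if (!moved && iter > 0) break;  // convergence: no object changed cluster
            // Recalculate the mean of each cluster
            for (int c = 0; c < means.length; c++) {
                double sum = 0; int n = 0;
                for (int i = 0; i < data.length; i++)
                    if (assign[i] == c) { sum += data[i]; n++; }
                if (n > 0) means[c] = sum / n;
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        double[] spend = {12, 15, 14, 80, 85, 90};  // e.g. monthly spend of 6 consumers (made up)
        double[] means = {12, 80};                  // k = 2 initial means
        System.out.println(Arrays.toString(cluster(spend, means, 100)));
        // prints [0, 0, 0, 1, 1, 1]: low spenders vs. high spenders
    }
}
```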
For large databases, the efficiency of the k-means clustering algorithm was improved by using scapegoat trees and max-heaps. This was essential since consumer transaction data is increasing day by day. For large databases, as the clusters grew, the running time of the traditional k-means algorithm increased, and providing fast, better-quality clusters became difficult. This is because of the frequent calculation of the Euclidean distance, which is necessary whenever a new cluster is formed or an object is classified into a cluster. Hence there was a need to improve the efficiency of the k-means algorithm for large databases, and this was achieved using scapegoat trees and max-heaps.
The main idea behind introducing max-heaps was to sort the products in
descending order of sales so as to find the most profitable items. Then to
introduce a new product or to replace a product which was not performing
well, scapegoat tree concept was introduced.
The proposed algorithm "KCUSTMH" (k-means clustering using scapegoat tree and max-heap), shown in Figure 4.6, aimed at reducing the computational overhead arising from unnecessary recalculation of distances between data objects and clusters in each iteration. The following procedure was adopted to implement the modified algorithm:
Proposed Algorithm
Initially, k data objects are chosen to serve as the centroids of k initial clusters, and the Euclidean distance of each data object from these centroids is calculated. In the next step, each data object is assigned to its nearest cluster based on the calculated Euclidean distance. Then an empty scapegoat tree is initialised, and into this tree the labels of the objects are inserted as keys, with a max-heap corresponding to each key as its value. Each max-heap in turn contains pairs of cluster labels and the distances of their centroids from the data object (the key). If, in an iteration, an object moves from one cluster to another, the centroids of these two clusters are recalculated and the new distances between these two clusters and the data objects are calculated. The old distances saved in the max-heaps are then replaced with these new distances, and the process continues: only the new distances corresponding to those clusters altered by the movement of data objects are calculated. In the next iteration, the maximum element of each max-heap, corresponding to each object put in the scapegoat tree as a key, is popped out. This popped-out element is a pair of a cluster label and the distance of its centroid from the object, and the cluster corresponding to this label acts as the new cluster for the object. Thus no recalculation of distances between objects and all clusters is required. Assume that a run of the k-means algorithm consists of only one iteration, and this iteration in turn consists of only one movement of a single object. The proposed algorithm in this case calculates the new distances corresponding to only the two clusters altered by the movement of the data object; the traditional version of the k-means algorithm, however, calculates the distance of each object from each cluster. As a result, this new version of the k-means algorithm provides a huge advantage in terms of time over the traditional k-means algorithm. The final step of the new algorithm ends in the same way as the traditional k-means algorithm, i.e., when no object moves from one cluster to another in an iteration.
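The per-object max-heap bookkeeping described above can be sketched with Java's PriorityQueue (an illustrative fragment only; the cluster labels and distances below are made up, and the scapegoat tree that holds one such heap per object is omitted):

```java
import java.util.*;

// Sketch of the per-object max-heap used in KCUSTMH: each entry pairs a
// cluster label with the distance of that cluster's centroid from the object.
public class ClusterHeapSketch {
    public static int[] topEntry(int[] labels, double[] dists) {
        // PriorityQueue is a min-heap by default; reverse the comparator for a max-heap
        PriorityQueue<double[]> heap =
            new PriorityQueue<>((a, b) -> Double.compare(b[1], a[1]));
        for (int i = 0; i < labels.length; i++)
            heap.add(new double[]{labels[i], dists[i]});
        double[] top = heap.poll();           // pop the maximum element
        return new int[]{(int) top[0], (int) top[1]};
    }

    public static void main(String[] args) {
        int[] clusterLabels = {0, 1, 2};          // made-up cluster labels
        double[] centroidDists = {4.0, 9.0, 2.5}; // made-up centroid distances
        System.out.println(Arrays.toString(topEntry(clusterLabels, centroidDists)));
        // pops cluster 1 (distance 9.0), the heap's maximum element
    }
}
```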
KCUSTMH algorithm
Input:
D = {t1, t2, ..., tn} // set of elements
k // number of desired clusters
Output:
K // set of clusters
KCUSTMH algorithm:
Assign initial values for means m1, m2, ..., mk;
Repeat
Initialise an empty scapegoat tree;
Fill the tree with object labels as keys and max-heaps as values;
Fill the max-heaps with pairs of cluster labels and the distance between the cluster and the key;
For each object
Repeat
Pop the topmost element, i.e., the maximum element of its corresponding max-heap;
If the cluster label contained in this element = the present cluster label of the object
then
do nothing;
else
Move the object into the cluster corresponding to the cluster label obtained;
Calculate the new centroids of the two clusters which have suffered alteration, i.e., the original and the new cluster of the object just moved;
Calculate the distances of each object from these two clusters' centroids;
Replace the old distances with these just-calculated distances;
Until no more objects;
Pop out the maximum element of each max-heap corresponding to the object put in the scapegoat tree as key;
This popped-out element is a pair of a cluster label and the distance of its centroid from the object;
Check the cluster corresponding to this cluster label;
If this cluster is the same as the original cluster of the object
then
do nothing;
else
move the object to the new cluster;
Until no object moves between clusters, i.e., convergence is met;

Figure 4.6 KCUSTMH algorithm (k-means clustering using scapegoat tree and max-heap)

A machine with 1 GB main memory and a 1.83 GHz dual-core processor running Windows XP Service Pack 2 was used to test the efficiency of KCUSTMH over the traditional k-means algorithm.
Once consumer segmentation was done, it was treated as a collection of clustered association rules ("if…then…" rules), which are used in decision making.
Consumer buying behaviours were studied using the Apriori algorithm given in Figure 4.7, implemented using the Weka data mining tool. Pseudo code for the Apriori algorithm is given in Figure 4.8.
Figure 4.7 Apriori algorithm for finding frequent itemsets
Pseudo code for implementing the Apriori algorithm is as follows:
Apriori algorithm
The steps involved in the Apriori algorithm are as follows:
• Join step: Ck is generated by joining Lk−1 with itself.
• Prune step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset.
• Pseudo-code:
Ck: candidate itemset of size k
Lk: frequent itemset of size k
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
Ck+1 = candidates generated from Lk;
For each transaction T in the database do
increment the count of all candidates in Ck+1 that are contained in T;
Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;

Figure 4.8 Pseudo code to implement Apriori algorithm

By identifying frequent itemsets, consumer requirements and opportunities that facilitate increases in profit margins, revenues, buying and selling are identified.
To improve consumer service and online consumer support systems, and to build loyalty as a competitive advantage, MUlticriteria Satisfaction Analysis (MUSA) and rough set theory were implemented for an extensive study of consumer satisfaction on the product, personnel, place and physical appearance (4Ps) attributes of the enterprise. Consumer loyalty was adjudged using the grid partition method of fuzzy set theory, normalising the data values. By analysing consumer satisfaction and loyalty, opportunities that facilitate increases in profit margins, revenues, buying and selling were identified.
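The join and support-counting steps of the Apriori pseudo code can be sketched in Java as follows (a minimal illustration; the transactions, item names and min_support value below are made up, and the explicit prune step is folded into direct support counting):

```java
import java.util.*;

// Minimal Apriori sketch: level-wise generation of frequent itemsets.
public class AprioriSketch {
    public static Set<Set<String>> frequentItemsets(List<Set<String>> txns, int minSup) {
        Set<Set<String>> all = new HashSet<>();
        // L1: frequent 1-itemsets
        Set<Set<String>> lk = new HashSet<>();
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> t : txns) for (String i : t) counts.merge(i, 1, Integer::sum);
        for (Map.Entry<String, Integer> e : counts.entrySet())
            if (e.getValue() >= minSup) lk.add(new HashSet<>(Set.of(e.getKey())));
        while (!lk.isEmpty()) {
            all.addAll(lk);
            // Join step: candidates Ck+1 generated from Lk joined with itself
            Set<Set<String>> candidates = new HashSet<>();
            for (Set<String> a : lk) for (Set<String> b : lk) {
                Set<String> u = new HashSet<>(a); u.addAll(b);
                if (u.size() == a.size() + 1) candidates.add(u);
            }
            // Count support and keep candidates meeting min_support
            Set<Set<String>> next = new HashSet<>();
            for (Set<String> c : candidates) {
                int sup = 0;
                for (Set<String> t : txns) if (t.containsAll(c)) sup++;
                if (sup >= minSup) next.add(c);
            }
            lk = next;
        }
        return all;
    }

    public static void main(String[] args) {
        List<Set<String>> txns = List.of(
            Set.of("milk", "bread"), Set.of("milk", "bread", "butter"),
            Set.of("bread", "butter"), Set.of("milk", "bread"));
        // With min_support = 3: {milk}, {bread} and {milk, bread} are frequent
        System.out.println(frequentItemsets(txns, 3));
    }
}
```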
Initially, consumer satisfaction with the enterprise was studied using the MUSA method. Then rough set theory was implemented for analysing consumer satisfaction on the reduct attributes. The flowchart in Figure 4.9 shows the implementation of the MUSA method and rough set theory to study consumer satisfaction. Once consumer satisfaction with the enterprise was explored, consumer loyalty was analysed next.
[Flowchart: Start MUSA; identify global and sub-criteria for consumer satisfaction; develop a questionnaire for global and sub-criteria satisfaction; obtain global satisfaction; if satisfied, obtain sub-criteria satisfaction; if satisfied on both, the consumer is satisfied; if improvement is needed, apply rough set theory, otherwise stop MUSA]
Figure 4.9 - Flowchart of MUSA method integrating with rough set theory
Analysing consumer loyalty
To analyse consumer loyalty, age and income data were discretised and then normalised through data transformation. Consumer loyalty was obtained from the number of visits made, and satisfaction was obtained from a scale. Consumer satisfaction and consumer loyalty were fuzzified, and the linguistic values are shown in Figure 4.10.
Based on the grid partition method, k linguistic values with triangle-shaped membership functions were interpreted as low (L), medium (M) and high (H), and their fuzzy values were interpreted as very low (VL), low (L), medium (M), high (H) and very high (VH).
Figure 4.10 Consumer satisfaction and consumer loyalty obtained from a scale
Data preprocessing was done in order to remove noisy and inconsistent data. Based on minimum confidence (min_conf) and minimum support (min_sup), "if…then…" rules were derived to determine consumer loyalty. To identify potential consumers, linear regression or Spearman's rank correlation (rho, ρ) was applied to find the correlation between the number of visits made by a consumer and the value purchased by him.
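Spearman's rank correlation between visits and purchase value can be sketched as follows (a minimal version assuming no tied values, so ρ = 1 − 6Σd²/(n(n²−1)); the sample figures below are made up, not survey results):

```java
import java.util.*;

// Sketch of Spearman's rank correlation (rho) between visits and purchase value.
public class SpearmanSketch {
    static double[] ranks(double[] x) {
        double[] r = new double[x.length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < x.length; j++)
                if (x[j] < x[i]) r[i]++;    // rank = number of smaller values (0-based)
        return r;
    }

    public static double rho(double[] x, double[] y) {
        double[] rx = ranks(x), ry = ranks(y);
        double sumD2 = 0;
        for (int i = 0; i < x.length; i++) {
            double d = rx[i] - ry[i];       // rank difference per consumer
            sumD2 += d * d;
        }
        int n = x.length;
        return 1.0 - 6.0 * sumD2 / (n * (double) (n * n - 1));
    }

    public static void main(String[] args) {
        double[] visits = {2, 5, 8, 11, 14};           // visits per consumer (made up)
        double[] value  = {300, 650, 900, 1500, 2100}; // purchase value (made up)
        System.out.println(rho(visits, value));        // prints 1.0: perfectly monotonic
    }
}
```

A ρ close to 1 would support treating frequent visitors as potential high-value consumers.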
To extend better consumer service, the opinion of 120 consumers was taken using the consumer service survey form shown in Figure 4.4. Using Bayesian classification (Bayes' theorem), it was determined that the "Media mix" was also useful for extending online consumer support.
Bayes’ theorem
Let X be a data tuple. In Bayesian terms, X is considered "evidence"; it is described by a set of n attributes. Let H be some hypothesis, such as that the data tuple X belongs to a specified class C. For classification problems, P(H|X) is determined: the probability that the hypothesis H holds given the "evidence", or observed data tuple, X. P(H|X) is the posterior probability of H conditioned on X, and P(H) is the prior probability of H. The posterior probability P(H|X) is based on more information (e.g., customer information) than the prior probability P(H), which is independent of X. Similarly, P(X|H) is the posterior probability of X conditioned on H, and P(X) is the prior probability of X. Bayes' theorem is useful in that it provides a way of calculating the posterior probability P(H|X) from P(H), P(X|H) and P(X). Bayes' theorem is

P(H|X) = P(X|H) P(H) / P(X) ………………………(4.1)
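A worked instance of equation 4.1 (the hypothesis, evidence and all probabilities below are made-up figures for illustration, not survey results):

```java
// Worked instance of Bayes' theorem (4.1):
// H = "consumer will respond to an SMS offer", X = "consumer visits weekly".
public class BayesSketch {
    public static double posterior(double pXgivenH, double pH, double pX) {
        return pXgivenH * pH / pX;   // P(H|X) = P(X|H) P(H) / P(X)
    }

    public static void main(String[] args) {
        double pH = 0.3;        // prior: 30% of consumers respond to offers
        double pXgivenH = 0.8;  // 80% of responders visit weekly
        double pX = 0.4;        // 40% of all consumers visit weekly
        System.out.println(posterior(pXgivenH, pH, pX));  // ~0.6: posterior doubles the prior
    }
}
```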
Accuracy Measures:
The accuracy of the classifications in the above results is determined by drawing a confusion matrix for positive and negative tuples, as shown below:

                        Predicted class
                        C1                 C2
Actual     C1     true positives     false negatives
class      C2     false positives    true negatives

To know how well the classification performs, sensitivity and specificity measures are used.
Sensitivity is the true positive (recognition) rate. It is measured as the proportion of positive tuples that are correctly identified:

sensitivity = t_pos / pos …………………….(4.2)

where t_pos = number of true positives that are correctly classified and pos = number of positive tuples.
Specificity is the true negative rate. It is measured as the proportion of negative tuples that are correctly identified:

specificity = t_neg / neg ……………………..(4.4)

where t_neg = number of true negatives that are correctly classified and neg = number of negative tuples.
Precision is calculated as

precision = t_pos / (t_pos + f_pos) …………………………………………….(4.6)

where f_pos = number of false positives.
Hence accuracy is defined as

accuracy = sensitivity × (pos / (pos + neg)) + specificity × (neg / (pos + neg)) ……….(4.7)

True positives, true negatives, false positives and false negatives are also useful in assessing the costs and benefits (or risks and gains) associated with a classification model.
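The measures in (4.2), (4.4) and (4.7) can be computed from a confusion matrix as follows (the counts below are made up for illustration, not taken from the survey classifications):

```java
// Sketch computing sensitivity, specificity and accuracy (eqs. 4.2, 4.4, 4.7).
public class AccuracySketch {
    public static double accuracy(int tPos, int fNeg, int fPos, int tNeg) {
        int pos = tPos + fNeg, neg = fPos + tNeg;
        double sensitivity = (double) tPos / pos;   // eq. 4.2: true positive rate
        double specificity = (double) tNeg / neg;   // eq. 4.4: true negative rate
        // eq. 4.7: accuracy as a weighted mix of sensitivity and specificity
        return sensitivity * pos / (pos + neg) + specificity * neg / (pos + neg);
    }

    public static void main(String[] args) {
        // 90 of 100 loyal and 40 of 50 non-loyal consumers classified correctly (made up)
        System.out.println(accuracy(90, 10, 10, 40));
        // equals (90 + 40) / 150, i.e. about 0.8667
    }
}
```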
Increasing the accuracy of classification
The accuracy of classifiers and predictors is further improved using the ensemble methods bagging and boosting.
Bagging Algorithm:
The bagging algorithm creates an ensemble of models (classifiers or predictors) for a learning scheme, where each model gives an equally weighted prediction.
Input:
D, a set of d training tuples;
k, the number of models in the ensemble;
a learning scheme (e.g., decision tree algorithm, back propagation, etc.)
Output: a composite model, M*.
Method:
1) For i = 1 to k do // create k models;
2) Create bootstrap sample Di by sampling D with replacement;
3) Use Di to derive model Mi;
4) End for
To use the composite model on a tuple X:
1) If classification then
2) let each of the k models classify X and return the majority vote;
3) If prediction then
4) let each of the k models predict a value for X and return the average predicted value;
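The equally weighted voting step of bagging can be sketched as follows (illustrative only; the class labels and model outputs below are made up):

```java
import java.util.*;

// Sketch of the bagging prediction step: each of the k models classifies X
// and the majority vote is returned.
public class BaggingVoteSketch {
    public static String majorityVote(List<String> votes) {
        Map<String, Integer> tally = new HashMap<>();
        for (String v : votes) tally.merge(v, 1, Integer::sum);
        // Return the class with the most votes
        return Collections.max(tally.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        // Class labels predicted for tuple X by k = 5 bootstrap models (made up)
        List<String> votes = List.of("loyal", "loyal", "not-loyal", "loyal", "not-loyal");
        System.out.println(majorityVote(votes));  // prints "loyal" (3 votes to 2)
    }
}
```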
Boosting Algorithm: Adaboost
The boosting algorithm creates an ensemble of classifiers, each of which gives a weighted vote.
Input:
D, set of d class-labeled training tuples;
K, number of rounds (one classifier is generated per round);
Classification learning scheme.
Output: Composite model.
Method:
1) Initialise weight of each tuple in D to 1/d;
2) For i= 1 to k do// for each round;
3) Sample D with replacement according to the tuple weights to obtain
Di.
4) Use training set Di to derive a model, Mi
5) Compute error(Mi), the error rate of Mi
6) if error(Mi) > 0.5 then
7) reinitialise the weights to 1/d
8) go back to step 3 and try again;
9) end if
10) for each tuple in Di that was correctly classified do
11) multiply the weight of the tuple by error(Mi) /(1- error(Mi));//
update weights
12) normalise the weight of each tuple;
13) end for
Use composite model to classify tuple X:
1) initialise weight of each class to 0;
2) for i= 1 to k do// for each classifier;
3) wi = log[(1 - error(Mi)) / error(Mi)];//weight of the classifier’s vote
4) c=Mi(X);// get class prediction for X from Mi;
5) add wi to weight for class c
6) end for
7) return class with largest weight;
Class with highest sum is the “winner” and is returned as class prediction for
tuple X.
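The weighted voting step above can be sketched as follows (illustrative only; the error rates and per-round predictions below are made up):

```java
// Sketch of the AdaBoost voting step: each classifier's vote is weighted by
// w_i = log((1 - error(M_i)) / error(M_i)) and the class with the largest
// total weight wins.
public class AdaBoostVoteSketch {
    public static int weightedVote(double[] errors, int[] predictions, int numClasses) {
        double[] classWeight = new double[numClasses];
        for (int i = 0; i < errors.length; i++) {
            double w = Math.log((1 - errors[i]) / errors[i]);  // classifier's vote weight
            classWeight[predictions[i]] += w;
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++)
            if (classWeight[c] > classWeight[best]) best = c;
        return best;  // class with the largest accumulated weight
    }

    public static void main(String[] args) {
        double[] errors = {0.10, 0.30, 0.45};  // error rates of 3 rounds (made up)
        int[] preds = {0, 1, 1};               // class predicted for X by each round
        System.out.println(weightedVote(errors, preds, 2));
        // round 1 (error 0.10) outweighs rounds 2 and 3 together, so class 0 wins
    }
}
```

Note how one accurate round can outvote several weak ones, which is exactly the point of the weighted vote.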
4.5. Tools Used:
a) Automation Tools:
i. MS Excel Spreadsheet 2007 - for storing data
ii. Weka - for implementing data mining techniques
iii. JCreator Pro V4 (Screen Shot 1) - for writing and executing Java programs, JSP and HTML pages
iv. Apache Tomcat V5.0 - web server
v. MS Access 2007, Oracle 10g - databases for storing data
vi. MS Visual Studio 2005 - ASP.NET web pages
b) Graphical Tools: pie chart, bar chart, line graph
c) Questionnaire: multiple choice, open ended
5. RESULTS AND DISCUSSION
To identify better communication channels i.e. “Media mix”, for new buying
and selling opportunities, effective reach and quality networking, e-mail, SMS,
newsletters and live chat were considered. These were developed in a user
friendly and customized manner using .NET/ J2EE.
Establishing good communications with consumers needs a mix of better communication channels. To identify the "Media mix" for effective reach and quality networking, the opinion of around 200 consumers visiting the Reliance fresh super market was taken. The consumers' opinion on the "Media mix" is shown in Figure 5.1. The availability of better internet connectivity and the wide use of mobile phones made Reliance fresh opt for e-mail, SMS, live chat, e-newsletter and face-to-face communication.

Figure 5.1 Preferred communication channels ("Media mix") by consumers
Due to the growth in the use of the internet and mobile phones by consumers, 62% of them opted for e-mail, 43% for SMS, 25% for live chat and 16% for newsletters. Consumers were willing to share their e-mail IDs and mobile numbers without any fear. Still, 49% of them felt that face-to-face interaction was the best. Consumers opined that these channels could also be used to extend online consumer support.
To create techniques for maintaining a database of consumers with J2EE/.NET software, and to build customised communication channels, a comparative study was done on the J2EE and .NET platforms. 250 students, 50 faculty members and 50 software executives (Tables 5.1 A and B) participated in the study. User opinion on various parameters of the J2EE and .NET platforms was collected and is shown in Table 5.2. All of them expressed nearly the same opinion, that both are equally good, but a slight majority went in favour of J2EE; the information in Tables 5.1 A and B indicates this fact. The growing popularity of open source software suggests that the results are justifiable. Hence, in this research work, J2EE was preferred for building web applications.
Table 5.1 Responses summary on J2EE and .NET platforms

A. Overall satisfaction analysis

Satisfaction   .NET   J2EE   Total
Complete        103    110     213
Partial          66     57     123
Not at all        2      8      10
Total           171    175     346

B. Group-wise % satisfaction

Platform   Students   Software Executives   Faculty
J2EE           90              92              97
.NET           91              89              94
Table 5.2 User satisfaction opinion of J2EE and .NET platforms

                                               Average Satisfaction    Average Demanding
                                                   Index (%)               Index (%)
S No   Parameter                       Wt (%)    J2EE      .NET         J2EE      .NET
1      Simplicity of the language      53.33     90.6      93.1        -53.2     -51.3
2      Architecture                    43.33     92.5      83.6        -64.2     -62.1
3      Object oriented concepts        86.67     95.2      94.8        -21.3     -22
4      Support technologies            73.33     91.8      90.4        -69.2     -64.4
5      Presentation tier technologies  50.6      86.2      88.3        -92       -90.9
6      Middle tier technologies        46.67     91.6      90.9        -92       -88.5
7      Data tier technologies          63.33     71.8      71.7        -20       -33.7
8      Framework technologies          40.7      88.9      90.8        -13.3     -11
9      Maturity                        80.23     71.5      68.6        -30.19    -29.8
10     Interoperability and
       web services                    46.67     88.5      90.9        -70.2     -69.4
11     Scalability of applications     65.3      96.9      94.1        -89.9     -28.8
12     Portability                     68.2      95.5      93.3        -94.8     -62.7
13     Client device independence      57.3      90.4      90.7        -71.5     -93.4
14     Cost of developing
       applications                    63.08     91.9      80.1        -54.4     -42.3
15     Performance level of
       applications                    66.67     74.6      73.9        -19.1     -12.9
       Overall                                   87.86     86.35       -57.66    -50.88
The user satisfaction opinion of J2EE and .NET on various parameters is
shown in Table 5.2. Analysis of the information in Table 5.2 revealed the
following:
The average satisfaction index is calculated as the mean value of the various parameters. It indicates the extent of satisfaction with each of the platforms; the higher the value, the higher the satisfaction level.
The higher the value of the demanding index, the more the satisfaction level should be improved to fulfill the expectations of the users. That is, users demanded more improvement on these platforms.
Based on the users' opinion, a consumer website was built using Java Server Pages (JSP) to store consumer data. Provision to send e-mail and SMS to the consumers was also provided on the website.
With JSP, as a developer, it was easy to develop web pages without having to know much of the Java programming language or anything about writing servlet code. Hence it was possible to concentrate on writing HTML code rather than on creating objects and application logic.
The following observations were made while using JSP to build the consumer
website:
Using HTML and XML with JSP code was easy.
Compiling JSP code and making updates to the presentation code
was easy.
JavaBean components could be invoked while completely shielding the complexity of the application logic.
Changing and editing of fixed template portions of web pages was
possible without affecting the application logic.
Similarly, changing the logic without editing JSP code was possible
at the component level.
One major advantage of JSP was its platform independence, whereas ASP.NET was tied to Microsoft technology. ASP.NET pages run only on IIS, but it was possible to host JSP pages on different web servers: Apache Tomcat, WebLogic, GlassFish (NetBeans), etc. JSP response time was significantly faster than ASP.NET, especially as the number of user requests increased. IIS was not compatible with some browsers, such as Mozilla, whereas JSP was compatible with nearly all browsers. Drivers needed to be installed and connectivity established for building database-backed applications. Data type errors were difficult to identify while building .NET or JSP applications.
After a thorough study of the J2EE and .NET platforms, the following advantages were identified in Java Mail over .NET for building a consumer website with communication channels such as e-mail, SMS and live chat:
Receiving and sending e-mails through the website using Java Mail was easy.
Writing e-mail programs using the SMTP, POP and IMAP protocols was easy.
Creating the framework and sending and receiving messages was done without much difficulty using the set of abstract classes in the API.
Accessing mail folders, downloading messages, sending messages with attachments and filtering mail was done without much difficulty, as there were corresponding Java Mail methods and classes.
Without an in-depth knowledge of e-mail, it was easy to create a cross-platform mail application using the framework.
The following Java Mail API Packages were used to develop customised
mails:
javax.mail - classes that model a mail system
javax.mail.event - listeners and events for the Java Mail API
javax.mail.internet - classes specific to Internet mail systems
javax.mail.search - message search terms for the Java Mail API
javax.mail.util - Java Mail API utility classes
Using JSP had a number of advantages over many of its alternatives like
PHP, Cold Fusion, Flex etc. These advantages are as follows:
JSP code was best suitable to implement the presentation page layer
components.
Business logic and presentation logic was separated without much
difficulty.
Presentation skills were sufficient and in-depth java knowledge was not
required.
If any changes were made to a JSP, there was no need to recompile and reload.
Development time was also reduced.
For web developers and designers it was easy to maintain information-rich,
dynamic web pages with JSP technology. Platform-independent web
applications were developed rapidly using JSP technology. JSP technology
enabled changes to the overall page layout without altering the underlying
dynamic content, by separating the user interface from content generation.
JSP uses XML-like tags that encapsulate the logic. Application logic resides in
server-based resources, which the JSP page accesses through these tags;
HTML or XML markup is passed directly back to the response page. By
separating page logic from its design and display, and by supporting reusable
component-based design, JSP technology makes it faster and easier to build
web-based applications.
It was observed that JSP was well suited for building enterprise applications.
Being open source to the developer community, the JSP interface was
supported by many web and application servers. JSP pages had the "Write
Once, Run Anywhere" (WORA) property.
The advantages of using JSP over competing technologies like PHP,
ASP.NET, Flex, Cold Fusion, etc., are summarised as follows:
Business logic and presentation logic were separated from one another.
JSP was not limited to a specific platform.
It had full access to the server-side resources that are an integral part of
the J2EE architecture.
The performance of J2EE and .NET was compared in our lab using the
LoadRunner tool by creating 100 virtual users on a sample application. The
front end for J2EE was a JSP form and for .NET it was an ASP.NET form. As
the back end, an MS Access database (and Oracle 10g) was used. Table 5.3
shows the performance results of ASP.NET and J2EE.
Table 5.3 Memory utilization and response time of ASP.NET and J2EE
Performance results of J2EE and ASP.NET in Table 5.3 proved that J2EE
was a better choice when compared to ASP.NET for building web
applications.
Virtual users   Memory utilisation (MB)    Response time (msec)
                ASP.NET    J2EE            ASP.NET    J2EE
1               1012       649             3.018      1.074
20              1105       685             3.341      3.322
40              1181       738             49.121     3.282
60              1202       833             56.975     3.161
80              1295       937             71.741     3.052
100             1314       1065            88.415     5.368
As an initial step towards identifying consumer requirements and
opportunities that increase profit margins, revenues, buying and selling,
consumer segmentation was performed on the gender attribute of the
consumer data with the Weka tool. The k-means algorithm was applied to the
sample consumer transaction data given in Figure 5.2 as a .csv file.
Figure.5.2 Consumer transaction data in .csv file format
Prod_no,Prod_name,Quantity,Gender,Date_of_ purchase,Brand_name
1,Shampoo,7,F,28/12/2012,Clinic Plus
2,Shampoo,5,M,28/12/2012,Garnier
3,Hair Conditioner,6,F,28/12/2012,Dove
4,Sugar,7,M,28/12/2012,Reliance
5,Flour,10,F,28/12/2012,Ashirwad
6,Toothpaste,5,M,29/12/2012,Meswak
7,Toothpaste,6,M,29/12/2012,Colgate
8,Toothbrush,5,F,30/12/2012,Colgate
9,Shampoo,7,M,30/12/2012,Meera
10,Hair Conditioner,6,F,30/12/2012,Sunsilk
11,Toothpaste,5,F,30/12/2012,CloseUp
12,Toothbrush,5,M,30/12/2012,OralB
13,Biscuits,10,M,31/12/2012,Good-day
14,Chocolates,25,M,31/12/2012,Amul
15,Chocolates,12,F,31/12/2012,5 star
16,Shampoo,5,F,31/12/2012,Garnier
17,Hair Conditioner,5,F,31/12/2012,Dove
18,Chewing gum,5,M,1/1/2013,Boomerag
19,Oil,5,F,1/1/2013,Saffola
20,Salt,4,M,1/1/2013,Ashirwad
Consumer details were stored in comma-separated value (.csv) file format.
Using the Weka tool, the consumer data containing prod_no, prod_name,
quantity, gender and date_of_purchase was analysed. The k-means clustering
and Apriori algorithms were implemented using the Weka data mining tool.
The run information of the k-means algorithm for the gender attribute is shown
in Figure 5.3.
=== Run information ===
Scheme: weka.clusterers.SimpleKMeans -N 2 -A
"weka.core.EuclideanDistance -R first-last" -I 500 -S 10
Relation: purchases-
weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-last
Instances: 20
Attributes: 6
Prod_no
Prod_name
Quantity
Date_of_ purchase
Brand_name
Ignored:
Gender
Test mode: Classes to clusters evaluation on training data
=== Model and evaluation on training set ===
kMeans
======
Number of iterations: 2
Within cluster sum of squared errors: 68.0
Missing values globally replaced with mean/mode
Figure 5.3 Run information of k-means clustering performed on gender
attribute using Weka
Cluster centroids:
Cluster#
Attribute Full Data 0 1
(20) (14) (6)
=======================================================
Prod_no 1 6 1
Prod_name Shampoo Toothpaste Shampoo
Quantity 5 5 7
Date_of_ purchase 8/12/2012 11/12/2012 8/12/2012
Brand_name Garnier Colgate Clinic Plus
Time taken to build model (full training data) : 0.17 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 14 ( 70%)
1 6 ( 30%)
Class attribute: Gender
Classes to Clusters:
0 1 <-- assigned to cluster
7 3 | F
7 3 | M
Cluster 0 <-- F
Cluster 1 <-- M
Incorrectly clustered instances: 10.0 50 %
In Figure 5.3 the cluster centroids had Prod_no values 6 and 1. Two clusters
were generated: cluster 0 with 14 instances and cluster 1 with 6 instances. In
cluster 0 there were 7 males and 7 females, and in cluster 1 there were 3
males and 3 females. In cluster 0 Colgate toothpaste was the primary product,
whereas in cluster 1 it was Clinic Plus shampoo.
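The clustering above was produced with Weka's SimpleKMeans. As an illustration of the underlying procedure only (not the Weka implementation), the following sketch runs a plain k-means with k = 2 on the Quantity column of the sample data in Figure 5.2; the one-dimensional attribute choice and the initial centroids are illustrative assumptions.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Simple 1-D k-means: assign each value to the nearest centroid,
    then recompute each centroid as the mean of its assigned values."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            idx = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[idx].append(v)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Quantity column of the 20 sample transactions in Figure 5.2
quantities = [7, 5, 6, 7, 10, 5, 6, 5, 7, 6, 5, 5, 10, 25, 12, 5, 5, 5, 5, 4]
centroids, clusters = kmeans_1d(quantities, centroids=[5.0, 20.0])
```

With these inputs the quantities split into a large low-quantity cluster and a singleton containing the bulk purchase of 25 chocolates.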
k-means clustering was also implemented using the Knowledge Flow feature
of Weka, and the resulting clusters are given in Table 5.4. The number of
clusters was predefined as 6 and clustering was performed on date of
purchase using the sample data of Figure 5.2. The input data file was in .arff
format:
@relation custtran_clustered
@attribute Instance_number numeric
@attribute Prod_no numeric
@attribute Prod_name {Shampoo,'Hair Conditioner',Sugar,Flour,Toothpaste,Toothbrush,Biscuits,Chocolates,'Chewing gum',Oil,Salt}
@attribute Quantity numeric
@attribute Gender {F,M}
@attribute 'Date_of_ purchase' {28/12/2012,29/12/2012,30/12/2012,31/12/2012,1/1/2013}
@attribute Brand_name {'Clinic Plus',Garnier,Dove,Reliance,Ashirwad,Meswak,Colgate,Meera,Sunsilk,CloseUp,OralB,Good-day,Amul,'5 star',Boomerag,Saffola}
@attribute Cluster {cluster0,cluster1,cluster2,cluster3,cluster4,cluster5}
@data
Table 5.4 Consumer data segmented into 6 clusters
S NO  PROD NO  PRODUCT NAME        QTY  GENDER  PURCHASE DATE  BRAND NAME     CLUSTER
4     5        Flour               10   F       28/12/2012     Ashirwad       cluster0
12    13       Biscuits            10   M       31/12/2012     Good-day       cluster1
13    14       Chocolates          25   M       31/12/2012     Amul           cluster1
14    15       Chocolates          12   F       31/12/2012     '5 star'       cluster1
7     8        Toothbrush          5    F       30/12/2012     Colgate        cluster2
10    11       Toothpaste          5    F       30/12/2012     CloseUp        cluster2
15    16       Shampoo             5    F       31/12/2012     Garnier        cluster2
18    19       Oil                 5    F       1/1/2013       Saffola        cluster2
2     3        'Hair Conditioner'  6    F       28/12/2012     Dove           cluster3
9     10       'Hair Conditioner'  6    F       30/12/2012     Sunsilk        cluster3
16    17       'Hair Conditioner'  5    F       31/12/2012     Dove           cluster3
0     1        Shampoo             7    F       28/12/2012     'Clinic Plus'  cluster4
1     2        Shampoo             5    M       28/12/2012     Garnier        cluster4
3     4        Sugar               7    M       28/12/2012     Reliance       cluster4
5     6        Toothpaste          5    M       29/12/2012     Meswak         cluster4
6     7        Toothpaste          6    M       29/12/2012     Colgate        cluster4
8     9        Shampoo             7    M       30/12/2012     Meera          cluster5
11    12       Toothbrush          5    M       30/12/2012     OralB          cluster5
17    18       'Chewing gum'       5    M       1/1/2013       Boomerag       cluster5
19    20       Salt                4    M       1/1/2013       Ashirwad       cluster5
The efficiency of the k-means algorithm was increased to reduce the running
time on large databases and also to obtain quality clusters. The KCUSTMH
algorithm reduced the running time from 25 seconds to approximately 17
seconds on a database containing around 3500 transactions with 6 attributes.
A machine with 1 GB of main memory and a 1.83 GHz dual-core processor
running Windows XP Service Pack 2 was used. Figure 5.4 shows the
efficiency of KCUSTMH over the traditional k-means algorithm.
Figure 5.4 Graph showing efficiency of KCUSTMH and
traditional k-means algorithm
Once consumer segmentation was done, the result was treated as a collection
of clustered association rules to be used in decision making. A two-
dimensional grid was formed over a set of two attributes (in this case age and
salary) and the corresponding association rules were determined as shown in
Figure 5.5. The goal was to find clusters that cover the association rules within
this grid. These clusters represented the association rules and also defined
the segmentation. Once association rules were discovered for a particular
level of support and confidence, a grid of only those rules that give information
about this group was formed.
The following four association rules were considered, where the RHS attribute
"Group label" was given the value "1".
R1. (age = 27) ^ (salary = 41450) => (Group label = 1)
R2. (age = 28) ^ (salary = 55865) => (Group label = 1)
R3. (age = 28) ^ (salary = 47553) => (Group label = 1)
R4. (age = 27) ^ (salary = 51378) => (Group label = 1)
With age bins labelled a1, a2, …, an and salary bins labelled s1, s2, …, sn,
these rules were binned to form the corresponding binned association rules:
R1. (age = a3) ^ (salary = s5) => (Group label = 1)
R2. (age = a4) ^ (salary = s6) => (Group label = 1)
R3. (age = a4) ^ (salary = s5) => (Group label = 1)
R4. (age = a3) ^ (salary = s6) => (Group label = 1)
Rules R1 through R4 were represented with a grid as shown in Figure 5.5.
Linear, adjacent cells were combined to form a line segment, and this idea
was extended to rectangular regions. Representing the rules in the
rectangular grid transformed all four association rules into a single clustered
rule:
(a3 ≤ age ≤ a4) ^ (s5 ≤ salary ≤ s6) => (Group label = 1)
Figure 5.5 Clustering association rules using 2-D grid
Assuming the bin mappings shown in Figure 5.5, the final clustered rule output
was:
(27 ≤ age < 29) ^ (40000 ≤ salary < 60000) => (Group label = 1)
These association rules were then converted into a decision tree for making
effective decisions.
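The merge of R1 through R4 into a single rectangular rule can be sketched as follows (an illustrative reimplementation, not a tool used in the thesis). Bin labels are taken as integer indices, and a merge is emitted only when the binned rules exactly fill their bounding rectangle:

```python
def merge_binned_rules(rules):
    """rules: set of (age_bin, salary_bin) index pairs sharing the same RHS.
    If the pairs exactly fill their bounding rectangle, return the rectangle
    as ((a_min, a_max), (s_min, s_max)); otherwise return None."""
    ages = {a for a, _ in rules}
    salaries = {s for _, s in rules}
    a_lo, a_hi = min(ages), max(ages)
    s_lo, s_hi = min(salaries), max(salaries)
    rectangle = {(a, s) for a in range(a_lo, a_hi + 1)
                        for s in range(s_lo, s_hi + 1)}
    return ((a_lo, a_hi), (s_lo, s_hi)) if rules == rectangle else None

# R1..R4 as (age bin, salary bin) pairs with Group label = 1; a3 -> 3, s5 -> 5, etc.
rules = {(3, 5), (4, 6), (4, 5), (3, 6)}
merged = merge_binned_rules(rules)
```

Applied to R1 through R4, the four cells fill the 2 x 2 rectangle, so they collapse into one clustered rule covering a3..a4 by s5..s6.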
Once the consumer data was segmented, the next step was to study
consumer buying behaviour using the Apriori algorithm.
[Figure 5.5 bin mapping: salary bins s1: <10K, s2: 10-20K, s3: 20-30K,
s4: 30-40K, s5: 40-50K, s6: 50-60K, s7: 70-80K; age bins a1 through a6
correspond to ages 25 through 30.]
All data recorded in the transaction database was fed as input to the Apriori
algorithm, which was implemented using Weka. Association rules were
generated for given support and confidence measures. Association rules are
adopted to discover interesting relationships among purchased products and
to gain knowledge of the transactions in a large dataset. Apriori is designed to
operate on databases containing transactions. Analysis of the Weka run
information given in Figure 5.6 gave knowledge of the frequent itemsets
purchased by consumers for a given support and confidence.
=== Run information ===
Scheme: weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation: purchases-weka.filters.unsupervised.attribute.NumericToNominal-Rfirst-last
Instances: 20
Attributes: 6
Prod_no
Prod_name
Quantity
Gender
Date_of_ purchase
Brand_name
=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.1 (2 instances)
Minimum metric <confidence>: 0.9
Figure 5.6 Run information of Apriori algorithm implementation using Weka
Number of cycles performed: 18
Generated sets of large itemsets:
Size of set of large itemsets L (1): 20
Size of set of large itemsets L (2): 33
Size of set of large itemsets L (3): 7
Best rules found:
1. Prod_name=Hair Conditioner 3 ==> Gender=F 3 conf :(1)
2. Brand_name=Garnier 2 ==> Prod_name=Shampoo 2 conf :(1)
3. Brand_name=Dove 2 ==> Prod_name=Hair Conditioner 2 conf :(1)
4. Date_of_ purchase=29/12/2012 2 ==> Prod_name=Toothpaste 2 conf :(1)
5. Prod_name=Toothbrush 2 ==> Quantity=5 2 conf :(1)
6. Prod_name=Toothbrush 2 ==> Date_of_ purchase=30/12/2012 2 conf :(1)
7. Prod_name=Chocolates 2 ==> Date_of_ purchase=31/12/2012 2 conf :(1)
8. Brand_name=Garnier 2 ==> Quantity=5 2 conf :(1)
9. Brand_name=Dove 2 ==> Gender=F 2 conf :(1)
10. Date_of_ purchase=29/12/2012 2 ==> Gender=M 2 conf :(1)
Analysis of consumer buying behaviours enabled the company to improve
support for its consumer-oriented business processes, aimed at improving the
overall performance of the enterprise.
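The rules in Figure 5.6 came out of Weka's Apriori implementation. The confidence behind a rule such as rule 1 (Prod_name=Hair Conditioner ==> Gender=F) can be re-checked directly against the 20 transactions of Figure 5.2; the sketch below is an illustrative re-computation in plain Python, with attribute-value pairs encoded as strings:

```python
def confidence(transactions, lhs, rhs):
    """Confidence of the rule lhs => rhs over a list of transaction item sets:
    count(lhs and rhs together) / count(lhs)."""
    lhs_count = sum(1 for t in transactions if lhs <= t)
    both_count = sum(1 for t in transactions if (lhs | rhs) <= t)
    return both_count / lhs_count if lhs_count else 0.0

# Product name and gender of the 20 transactions in Figure 5.2
rows = [("Shampoo", "F"), ("Shampoo", "M"), ("Hair Conditioner", "F"),
        ("Sugar", "M"), ("Flour", "F"), ("Toothpaste", "M"), ("Toothpaste", "M"),
        ("Toothbrush", "F"), ("Shampoo", "M"), ("Hair Conditioner", "F"),
        ("Toothpaste", "F"), ("Toothbrush", "M"), ("Biscuits", "M"),
        ("Chocolates", "M"), ("Chocolates", "F"), ("Shampoo", "F"),
        ("Hair Conditioner", "F"), ("Chewing gum", "M"), ("Oil", "F"), ("Salt", "M")]
transactions = [{f"Prod_name={p}", f"Gender={g}"} for p, g in rows]
conf = confidence(transactions, {"Prod_name=Hair Conditioner"}, {"Gender=F"})
```

All 3 hair-conditioner purchases were made by female consumers, so the confidence is 1, matching Weka's output.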
Data mining methodology is very useful for extracting hidden knowledge and
information. Significant product association rules were also identified within
each segment by applying correlation analysis. Correlation analysis of the
product associations determines whether the associations are positively or
negatively correlated with one another. Product association rules are used to
motivate consumers to increase their purchases and to keep them loyal to the
company; the behaviour of consumers is easily identified. Association mining
of frequent itemsets was also performed using the vertical data format, as
shown in Table 3.5. Mining was performed on the consumer data to find which
transactions involve 2-itemsets, 3-itemsets, etc., and the largest itemsets for a
given support and confidence. The transactions were then analysed for
effective decision making.
The Apriori algorithm was helpful in finding the best association rules. The
most frequent itemsets were easily found from the consumer database, which
helps in product and price bundling. A dashboard, given in Figure 5.7, was
built to identify the items generating the best sales; it also identified which
items had better buying and selling opportunities. The dashboard was built in
an MS Excel 2007 spreadsheet using the traffic-light signals option: green
lights indicated the best-selling items, red lights indicated items that were not
selling well, and orange lights indicated average-selling items.
Figure 5.7 Dashboard projecting the behavior of sales of different items
To improve consumer service and online consumer support systems, and to
build loyalty as a competitive advantage, consumer satisfaction with the
enterprise was initially studied using the MUSA (MUlticriteria Satisfaction
Analysis) method. The MUSA method was integrated with rough set theory for
analysing consumer satisfaction with the enterprise on reduct attributes.
The MUSA method was applied to more than 200 randomly selected
consumers surveyed at 3 Reliance Fresh supermarkets located in 3 places of
Hanamkonda city.
Initially a set of global satisfaction criteria was identified, on which the
consumers expressed their satisfaction with the enterprise. The global criteria
identified here were Product, Personnel, Physical appearance of the malls,
and Place, termed the 4Ps.
Each of the 4Ps had the following characteristics or sub-criteria:
1. Personnel: This criterion included all characteristics relating to
personnel, (their skills and knowledge, responsiveness, friendliness,
communication and collaboration with consumers, etc).
2. Product: This criterion refers mainly to offered products (quality and
quantity, variety of products, and prices).
3. Physical Appearance: This criterion refers to service offered to
consumers. It includes appearance and cleanliness of the stores,
waiting time during busy and non-busy hours, and service time.
4. Place: Location and number of stores and parking availability are
included in this criterion.
A survey questionnaire form, shown in Figure 4.2, with 5 levels of
satisfaction was designed to identify the global satisfaction levels on the 4Ps.
The 5 levels of satisfaction were CS (Completely Satisfied), VS (Very much
Satisfied), S (Satisfied), D (Dissatisfied) and CD (Completely Dissatisfied).
Table 5.5 shows the satisfaction opinion of 20 sample consumers obtained on
1.Personnel, 2.Product, 3.Physical Appearance and 4.Place criteria.
Table 5.5 Survey data of sample 20 consumers on global criteria
Consumer 1 2 3 4
1 CD S S S
2 VS D S S
3 D VS VS VS
4 S S S S
5 D S S S
6 S VS VS VS
7 CD CS D VS
8 S CS VS CS
9 S VS S VS
10 D VS S S
11 CS CD S VS
12 VS VS D VS
13 D VS S S
14 S CS D VS
15 D VS S S
16 S S S S
17 VS S S VS
18 S VS S VS
19 S S S S
20 S VS S S
Table 5.6 Overall satisfaction results on global criteria
Using the simple additive formulas of the MUSA method, the average
satisfaction index (ASI) and average demanding index (ADI) were calculated
as shown in Table 5.6. From these results it was concluded that the average
global satisfaction index was approximately 91%, while the company's
performance on the whole set of criteria varied between 86% and 92%. Even
though the satisfaction values were encouraging, considering the highly
competitive conditions of the market this performance could not be considered
relatively high.
Sno  Criteria  Weight (%)  ASI (%)  ADI (%)
1 Product 45.20 86.56 -68.61
2 Physical Appearance 25.00 88.08 -66.21
3 Personnel 22.00 92.44 -75.20
4 Place 10.70 87.19 -85.01
Global satisfaction 90.81 -73.20
The following findings were drawn from analysing the above results:
The average global satisfaction index was approximately 91%, while the
company's performance on the whole set of criteria varied between 86%
and 92%.
Given the highly competitive conditions of the market, this performance
could not be considered relatively high.
Consumers considered "Product" the most important criterion, with a
significant importance level of 45.2%; they did not give much importance
to the remaining criteria.
The low weight of the "Place" criterion indicated that consumers were
least bothered about parking facilities and location, and also that the
main competitors performed no better on this particular criterion.
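The full MUSA method derives the satisfaction indices through a linear-programming procedure, and the reported global index of 90.81% comes from that procedure. As a rough plausibility check only, a plain weighted average of the per-criterion ASIs in Table 5.6 can be computed as below (a simplified additive sketch, not the MUSA aggregation itself):

```python
# Criterion weights and average satisfaction indices from Table 5.6
criteria = {
    "Product":             (45.20, 86.56),
    "Physical Appearance": (25.00, 88.08),
    "Personnel":           (22.00, 92.44),
    "Place":               (10.70, 87.19),
}
total_weight = sum(w for w, _ in criteria.values())
# Weighted additive aggregate of the per-criterion ASIs
weighted_asi = sum(w * asi for w, asi in criteria.values()) / total_weight
```

This yields roughly 88%, which lies in the same range as, but is not equal to, the MUSA global index; the difference comes from the value functions estimated by the full method.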
Consumer satisfaction opinion on each of the sub-criteria of the global criteria
is given in Table 5.7. Along with the global criteria, consumers were also
asked about their satisfaction on the sub-criteria, for which a separate
questionnaire was given to them. From the results on sub-criteria satisfaction,
the following conclusions were drawn:
"Personnel's friendliness" added to the competitive advantage of the
company.
The personnel sub-criterion most appealing to consumers was the "dress
code", through which sales executives were easily distinguishable.
"Quality" of product was one of the strongest points of the company,
although consumers did not seem to be satisfied with the quantity of
product. This result was related to the low satisfaction index for the
"Price" criterion.
Most consumers opined that attention should be paid to the waiting time
during busy hours (between 7:30 AM and 10:30 AM) and to the service
time as well.
On the other hand, the appearance of the malls (infrastructure and
arrangement of products) seemed to be a competitive advantage for the
company.
The satisfaction level on the "Place" criterion could have been higher if
consumers had been provided proper parking facilities (there was no
proper parking facility).
Table 5.7 Sub criteria satisfaction results
Sno Sub criteria Weight (%)
ASI (%)
ADI (%)
1 Skills/Knowledge 34.00 93.10 -71.30
2 Responsiveness 15.90 83.60 -62.10
3 Friendliness 50.10 94.80 -82.00
4 Quality 49.80 90.40 -84.40
5 Quantity 48.30 88.30 -40.90
6 Variety 25.00 90.90 -68.50
7 Prices 11.90 71.70 -33.70
8 Appearance of stores 42.80 90.80 -81.00
9 Waiting time (busy hours) 8.50 68.60 -29.80
10 Waiting time (non-busy
hours)
19.20 90.90 -69.40
11 Service time 8.30 74.10 -28.80
12 Cleanliness 21.20 93.30 -62.70
13 Location of stores 87.10 90.70 -93.40
14 Number of stores 6.80 81.10 -42.30
15 Parking 6.10 43.90 -12.90
Overall analysis of the results on the main and sub-criteria proved that
consumers were very much satisfied with Reliance Fresh, but owing to the
highly competitive conditions of the market, especially from malls like
Spencer's and Aditya Birla's More, this performance could not be considered
relatively high. The average demanding index of the results revealed this fact.
Rough set theory was implemented on the consumer data shown in Table 5.8
and initialised as follows. The given data was considered as the set B of all
consumer objects:
B = {C1, C2, C3, C4, C5, C6, C7, C8, C9, C10, C11, C12, C13, C14, C15,
C16, C17, C18, C19, C20}
The set of condition attributes was represented by
C = {Personnel, Product, Physical Appearance}
and the set D represented the decision attribute, where D = {Satisfaction}.
Table 5.8 Sample consumer data with condition and decision attributes
Conditional Attributes Decision Attribute
Consumer Personnel Product Physical Appearance
Satisfaction
C1 D D G No
C2 D D VG No
C3 D D E Yes
C4 D S VG Yes
C5 D S E Yes
C6 S S VG Yes
C7 S S E Yes
C8 D D VG No
C9 S D E Yes
C10 S D VG No
C11 S D E No
C12 D S G No
C13 D S VG Yes
C14 D S G No
C15 S S G No
C16 S D G No
C17 S D VG No
C18 S S E Yes
C19 S D G No
C20 D S G No
Nominal values of given attributes are specified in Table 5.9:
Table 5.9 Nominal values of the consumer sample data
The indiscernibility relation is the relation between two or more objects whose
values are identical with respect to a subset of the considered attributes; it is
denoted INDA(C). In Table 5.8, the set C was composed of the attributes
directly related to consumers' preferences, namely C = {Personnel, Product,
Physical Appearance}. The data in Table 5.8 was then broken down by the 3
condition attributes; because of its low weight, the 4th attribute (Place) was
not considered.
Conditional attributes
Personnel: Dissatisfied (D), Satisfied (S)
Product: Dissatisfied (D), Satisfied (S)
Physical Appearance: Good (G), Very Good (VG), Excellent (E)
Decision attribute
Satisfaction: Yes, No
The Personnel attribute generated two indiscernibility elementary sets:
INDA ({Personnel}) = {{C1, C2, C3, C4, C5, C8, C12, C13, C14, C20},
{C6, C7, C9, C10, C11, C15, C16, C17, C18, C19}}.
The Product attribute generated two indiscernibility elementary sets:
INDA ({Product}) = {{C1, C2, C3, C8, C9, C10, C11, C16, C17, C19},
{C4, C5, C6, C7, C12, C13, C14, C15, C18, C20}}
The Physical Appearance attribute generated three indiscernibility elementary sets:
INDA ({Physical Appearance}) = { {C2, C4, C6, C8, C10, C13, C17},
{C3, C5, C7, C9, C11, C18},
{C1, C12, C14, C15, C16, C19, C20}}
Data in the Table 5.8 was rearranged based on decision attribute as shown in
Table 5.10.
Table 5.10 Sample consumer data organized w.r.t. decision attribute
Customer Personnel Product Physical Appearance
Satisfaction
C1 D D G No
C2 D D VG No
C8 D D VG No
C10 S D VG No
C11 S D E No
C12 D S G No
C14 D S G No
C15 S S G No
C16 S D G No
C17 S D VG No
C19 S D G No
C20 D S G No
C3 D D E Yes
C4 D S VG Yes
C5 D S E Yes
C6 S S VG Yes
C7 S S E Yes
C9 S D E Yes
C13 D S VG Yes
C18 S S E Yes
The lower and upper approximations of a set are the interior and closure
operations in the topology generated by the indiscernibility relation. The
approximations were applied to Table 5.8, with the following observations:
Lower Approximation set B*
- Lower Approximation set (B*) of consumers who were definitely satisfied
were identified as B* = {C3, C4, C5, C6, C7, C13, C18}
- Lower Approximation set (B*) of consumers who certainly had no satisfaction
were identified as B* = {C1, C2, C8, C10, C12, C14, C15, C16, C17, C19,
C20}
Upper Approximation set B*
- Upper Approximation set (B*) of consumers who had satisfaction were
identified as B* = {C3, C4, C5, C6, C7, C9, C13, C18}
- Upper Approximation set (B*) of consumers who had no satisfaction were
identified as B*= {C1, C2, C8, C10, C11, C12, C14, C15, C16, C17, C19, C20}
Boundary region BNB(X)
- The boundary region of consumers who had no satisfaction was identified
as:
BNB(X) = {C1, C2, C8, C10, C11, C12, C14, C15, C16, C17, C19, C20} –
{C1, C2, C8, C10, C12, C14, C15, C16, C17, C19, C20} = {C11}
- The boundary region of consumers who had satisfaction was identified as:
BNB(X) = {C3, C4, C5, C6, C7, C9, C13, C18} – {C3, C4, C5, C6, C7, C13, C18}
= {C9}
The boundary region BNB(X), the set constituted by the elements C9 and
C11, could not be classified, since these objects possessed the same
characteristics but differed in the decision attribute.
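The approximations above follow mechanically from the equivalence classes of the condition attributes. The sketch below (illustrative Python, using the standard Pawlak definitions) reproduces the lower approximation of the satisfied consumers and the combined boundary region {C9, C11}:

```python
from collections import defaultdict

# Table 5.8: (Personnel, Product, Physical Appearance, Satisfaction)
table = {
    "C1":  ("D", "D", "G",  "No"),  "C2":  ("D", "D", "VG", "No"),
    "C3":  ("D", "D", "E",  "Yes"), "C4":  ("D", "S", "VG", "Yes"),
    "C5":  ("D", "S", "E",  "Yes"), "C6":  ("S", "S", "VG", "Yes"),
    "C7":  ("S", "S", "E",  "Yes"), "C8":  ("D", "D", "VG", "No"),
    "C9":  ("S", "D", "E",  "Yes"), "C10": ("S", "D", "VG", "No"),
    "C11": ("S", "D", "E",  "No"),  "C12": ("D", "S", "G",  "No"),
    "C13": ("D", "S", "VG", "Yes"), "C14": ("D", "S", "G",  "No"),
    "C15": ("S", "S", "G",  "No"),  "C16": ("S", "D", "G",  "No"),
    "C17": ("S", "D", "VG", "No"),  "C18": ("S", "S", "E",  "Yes"),
    "C19": ("S", "D", "G",  "No"),  "C20": ("D", "S", "G",  "No"),
}

# Equivalence classes of the indiscernibility relation IND(C)
classes = defaultdict(set)
for cust, row in table.items():
    classes[row[:3]].add(cust)

X = {c for c, row in table.items() if row[3] == "Yes"}  # satisfied consumers
lower = set().union(*(g for g in classes.values() if g <= X))   # classes inside X
upper = set().union(*(g for g in classes.values() if g & X))    # classes touching X
boundary = upper - lower  # objects that cannot be classified
```

The lower approximation matches the seven definitely satisfied consumers identified above, and the boundary region is {C9, C11}, the pair that cannot be classified.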
The following coefficients of approximation quality were calculated:
Imprecision coefficient
• with likelihood of satisfaction, αB(X) = 7/8;
• with likelihood of no satisfaction, αB(X) = 11/12.
Quality coefficient of the upper approximation
• αB (B*(X)) = 8/20, for consumers who had likelihood of satisfaction;
• αB (B*(X)) = 12/20, for consumers who had likelihood of no satisfaction.
Quality coefficient of the lower approximation
• αB (B*(X)) = 7/20, for consumers who had likelihood of satisfaction;
• αB (B*(X)) = 11/20, for consumers who had likelihood of no satisfaction.
Observations:
Consumers with satisfaction: αB (B*(X)) = 7/20, that is, 35% of
consumers certainly had satisfaction.
Consumers without satisfaction: αB (B*(X)) = 11/20, that is, 55% of
consumers certainly had no satisfaction.
10% of consumers (C9 and C11) could be classified neither as satisfied
nor as unsatisfied, since their condition attribute values were identical
while only the decision attribute (satisfaction) differed. This made the
analysis inconclusive for these consumers.
Data reduction
Redundancy in the data of Table 5.8 must be avoided, as doing so minimises
the complex computations involved in creating rules for knowledge extraction.
Redundancies in Table 5.8 were treated using the reduct concept, without
altering the indiscernibility relations. A reduct is the minimum necessary set of
data that maintains the original properties of the system or information table.
Hence, a reduct has the capacity to classify objects without altering the
representation of knowledge.
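The defining property of a reduct, that dropping an attribute must not merge objects with different decision values, can be checked mechanically. A sketch, assuming the Table 5.8 data with the inconclusive pair C9/C11 already removed:

```python
# Condition rows (Personnel, Product, Physical Appearance) plus decision,
# i.e. Table 5.8 with the inconclusive consumers C9 and C11 removed
rows = [
    ("D", "D", "G", "No"),  ("D", "D", "VG", "No"), ("D", "D", "E", "Yes"),
    ("D", "S", "VG", "Yes"), ("D", "S", "E", "Yes"), ("S", "S", "VG", "Yes"),
    ("S", "S", "E", "Yes"), ("D", "D", "VG", "No"), ("S", "D", "VG", "No"),
    ("D", "S", "G", "No"),  ("D", "S", "VG", "Yes"), ("D", "S", "G", "No"),
    ("S", "S", "G", "No"),  ("S", "D", "G", "No"),  ("S", "D", "VG", "No"),
    ("S", "S", "E", "Yes"), ("S", "D", "G", "No"),  ("D", "S", "G", "No"),
]

def consistent(rows, keep):
    """True if objects equal on the kept attribute indices never differ
    in the decision value (last element of each row)."""
    seen = {}
    for r in rows:
        key = tuple(r[i] for i in keep)
        if seen.setdefault(key, r[-1]) != r[-1]:
            return False
    return True

full_ok = consistent(rows, [0, 1, 2])            # all three condition attributes
drops = {i: consistent(rows, [j for j in (0, 1, 2) if j != i])
         for i in (0, 1, 2)}                     # try dropping each attribute
```

On this reduced data the full attribute set is consistent, and so is the pair {Product, Physical Appearance} alone, which matches the pairwise reduct of Table 5.20; dropping Product or Physical Appearance instead breaks consistency.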
Verifying inconclusive data
Analysis of the data contained in Table 5.8 showed that the preferences of
consumers C9 and C11 were both inconclusive, since they possessed equal
values of the condition attributes but different values of the decision attribute.
Therefore, the data of consumers C9 and C11 was excluded from Table 5.8.
Verifying equivalent information
Analysis of the data contained in Table 5.8 showed that it possessed
equivalent information, as given below:
C2 D D VG No
C8 D D VG No
C4 D S VG Yes
C13 D S VG Yes
C7 S S E Yes
C18 S S E Yes
C10 S D VG No
C17 S D VG No
C12 D S G No
C14 D S G No
C20 D S G No
C16 S D G No
C19 S D G No
Hence, by analysing the above data, Table 5.8 was reduced as shown in
Tables 5.11 through 5.14. The reduction process is presented below; the data
was observed to be of discrete type.
Table 5.11 Reduct Information of sample consumer data
Customer Personnel Product Physical Appearance
Satisfaction
C1 D D G No
C2 D D VG No
C3 D D E Yes
C4 D S VG Yes
C5 D S E Yes
C6 S S VG Yes
C7 S S E Yes
C8 D D VG No
C10 S D VG No
C12 D S G No
C15 S S G No
C16 S D G No
C19 S D G No
Table 5.12 Analysis of condition attributes with Personnel criteria
Customer Personnel Satisfaction
C1 D No
C2 D No
C3 D Yes
C4 D Yes
C5 D Yes
C6 S Yes
C7 S Yes
C8 D No
C10 S No
C12 D No
C15 S No
C16 S No
C19 S No
Table 5.13 Analysis of condition attributes with Product criteria
Customer Product Satisfaction
C1 D No
C2 D No
C3 D Yes
C4 S Yes
C5 S Yes
C6 S Yes
C7 S Yes
C8 D No
C10 D No
C12 S No
C15 S No
C16 D No
C19 D No
Table 5.14 Analysis of Physical Appearance attribute
Customer Physical Appearance
Satisfaction
C1 G No
C2 VG No
C3 E Yes
C4 VG Yes
C5 E Yes
C6 VG Yes
C7 E Yes
C8 VG No
C10 VG No
C12 G No
C15 G No
C16 G No
C19 G No
From this analysis of the reduct attributes tabulated in Tables 5.12 through
5.14, it was concluded that no data could be excluded.
Analysis of the condition attributes in Table 5.8 revealed that the same data
existed, as shown in Tables 5.15 through 5.21.
Table 5.15 Analysis of attributes Personnel and Product
Customer Personnel Product Satisfaction
C1 D D No
C2 D D No
C3 D D Yes
C4 D S Yes
C5 D S Yes
C12 D S No
C6 S S Yes
C7 S S Yes
C8 D D No
C16 S D No
C19 S D No
C10 S D No
C15 S S No
Table 5.16 Analysis of Personnel and Physical Appearance
Customer Personnel Physical Appearance
Satisfaction
C1 D G No
C12 D G No
C2 D VG No
C8 D VG No
C3 D E Yes
C5 D E Yes
C7 S E Yes
C4 D VG Yes
C6 S VG Yes
C10 S VG No
C15 S G No
C16 S G No
C19 S G No
Table 5.17 Analysis of Attributes Product and Physical Appearance
Customer Product Physical Appearance
Satisfaction
C1 D G No
C16 D G No
C19 D G No
C2 D VG No
C8 D VG No
C10 D VG No
C3 D E Yes
C4 S VG Yes
C6 S VG Yes
C5 S E Yes
C7 S E Yes
C12 S G No
C15 S G No
The reduct information of Tables 5.15 through 5.17 is generated as given in
Tables 5.18 through 5.20, and the entire reduct of all the data is given in Table
5.21.
Table 5.18 Reduct of Personnel and Product
Customer Personnel Product Satisfaction
C1 D D No
C3 D D Yes
C4 D S Yes
C6 S S Yes
C10 S D No
C15 S S No
Table 5.19 Reduct of Personnel and Physical Appearance
Customer Personnel Physical Appearance
Satisfaction
C1 D G No
C2 D VG No
C3 D E Yes
C4 D VG Yes
C6 S VG Yes
C10 S VG No
C15 S G No
Table 5.20 Reduct of Product and Physical Appearance
Customer Product Physical Appearance Satisfaction
C1 D G No
C2 D VG No
C3 D E Yes
C4 S VG Yes
C5 S E Yes
C12 S G No
Table 5.21 Final reduct information of sample consumer transactions
Decision rules
The reduct information in Table 5.21 generated the decision rules R1, R2 and
R3 needed to aid the satisfaction analysis of consumers.
Customer Personnel Product Physical Appearance
Satisfaction
C1 D D G No
C3 D D E Yes
C4 D S VG Yes
Rule-1
R1: If (for) Consumer, Personnel = “D” and Product = “D” and Physical
Appearance = “G” then Satisfaction= “No”.
Rule-2
R2: If (for) Consumer, Personnel = “D” and Product = “D” and Physical
Appearance = “E” then Satisfaction = “Yes”.
Rule-3
R3: If (for) Consumer, Personnel = “D” and Product = “S” and Physical
Appearance = “VG” then Satisfaction = “Yes”.
Rules R1 and R2 are nearly the same except for the variation in satisfaction
with the physical appearance of the malls. Rules R1 and R3 state the
importance of the product, as identified in the MUSA method.
In this way rough set theory was helpful in identifying the influence of the
reduct attributes (the 3 Ps, namely Personnel, Product and Physical
Appearance) on consumer satisfaction, and the effect of the Product attribute
proved to be significant. The 4th P (Place) was omitted because of its low
weight.
Analysing consumer loyalty
Transaction data of about 120 consumers was considered, and the following
attributes of the consumer data set were taken into account:
1. Name 2. Gender 3. Age 4. Mobile number 5. Address 6. Average
yearly income 7. Category.
The normalised value for consumer loyalty and satisfaction was calculated as
X′i = (Xi – Xmin) / (Xmax – Xmin), where X′i is the normalised value of
consumer loyalty/satisfaction, Xi the observed number of visits, Xmin the
minimum number of visits and Xmax the maximum number of visits. The
sample consumer data taken for this study is shown in Table 5.22.
Table 5.22 - Sample data set transaction format
Gender  Age       Income (Rs.)    Customer satisfaction  Customer loyalty  No. of consumers
M       41 to 50  30,000~35,000   0.75                   0.75              42
M       21 to 25  25,000~30,000   0.65                   0.84              45
F       25 to 42  21,500~25,000   0.77                   0.69              18
M       26 to 32  20,000~26,500   0.51                   0.84              15
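The min-max normalisation used for the loyalty and satisfaction scores can be sketched directly; the visit counts below are made-up values for illustration, not the survey data:

```python
def min_max_normalise(visits):
    """Map each visit count into [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(visits), max(visits)
    return [(v - lo) / (hi - lo) for v in visits]

# Hypothetical numbers of store visits for five consumers
scores = min_max_normalise([2, 5, 8, 11, 14])
```

The least frequent visitor maps to 0, the most frequent to 1, and the rest fall proportionally in between.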
Data preprocessing was done in order to clear noisy and inconsistent data.
Based on the grid-partition method, fuzzy values for consumer satisfaction
and consumer loyalty were obtained.
For a minimum confidence (min_conf, equation 5) of 30% and a minimum
support (min_sup, equation 4) of 40%, the following "if ... then ..." rules were
derived, where CF(R) is the certainty grade of rule R and R is called a fuzzy
classification rule.
1. IF (Age >= 41 AND Gender = “M” AND Average yearly income = 30K ~ 35K
   AND Customer satisfaction = “VS” AND Customer loyalty = “H”)
   THEN CF = 0.77.
2. IF (Age = 26 ~ 30 AND Gender = “F” AND Average yearly income = 20K ~ 25K
   AND Customer satisfaction = “VS” AND Customer loyalty = “M”)
   THEN CF = 0.66.
3. IF (Age = 21 ~ 25 AND Gender = “M” AND Average yearly income = 26K ~ 30K
   AND Customer satisfaction = “VS” AND Customer loyalty = “M”)
   THEN CF = 0.62.
4. IF (Age = 21 ~ 25 AND Gender = “M” AND Average yearly income = 20K ~ 25K
   AND Customer satisfaction = “S” AND Customer loyalty = “H”)
   THEN CF = 0.58.
5. IF (Age = 36 ~ 40 AND Gender = “F” AND Average yearly income = 20K ~ 25K
   AND Customer satisfaction = “S” AND Customer loyalty = “M”)
   THEN CF = 0.49.
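Rules of this form can be applied with a simple matcher. The sketch below encodes two of the derived rules; the conditions are paraphrased as predicates, and the exact income boundaries are assumptions where the text gives ranges such as "26K ~ 30K". It returns the certainty grade of the best matching rule.

```python
# Each rule pairs a set of attribute conditions with its certainty grade CF.
# Conditions are predicates so ranges like "21 ~ 25" can be expressed directly.
RULES = [
    ({"gender": lambda g: g == "M", "age": lambda a: a >= 41,
      "income": lambda i: 30000 <= i <= 35000}, 0.77),
    ({"gender": lambda g: g == "M", "age": lambda a: 21 <= a <= 25,
      "income": lambda i: 26000 <= i <= 30000}, 0.62),
]

def classify(consumer):
    """Return the highest CF among rules whose every condition the consumer
    satisfies, or None when no rule fires."""
    best = None
    for conditions, cf in RULES:
        if all(pred(consumer[attr]) for attr, pred in conditions.items()):
            if best is None or cf > best:
                best = cf
    return best

print(classify({"gender": "M", "age": 45, "income": 32000}))  # 0.77
```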
From the above rules it was identified that consumer loyalty varied between
“High” and “Medium” across the different income groups of males and females.
Since overall satisfaction was good, there was no low-level consumer loyalty
in either case. This consumer loyalty gave Reliance Fresh a competitive
advantage over other private supermarkets, such as Spencer’s and More, in
Hanamkonda city.
Analysing Consumer Service
In order to extend better consumer service, the opinions of 120 consumers
were collected using the Consumer Service Survey form shown in Section 4.4.
The opinions of these consumers are given in Table 5.23.
Table 5.23 - Opinion of consumer service survey

S.No.   Customer service   No. of respondents   %
1       Excellent          42                   35
2       Good               45                   38
3       Average            18                   15
4       Fair               10                    8
5       Poor                5                    4
Study of the consumer service survey in Table 5.23 revealed the following
facts:
1. 35% of consumers rated the service as Excellent, 38% as Good, 15% as
Average, 8% as Fair and 4% as Poor.
2. Since a majority of consumers (73%) rated the service as Excellent or
Good, consumers were very much contented with the existing service.
Using Bayesian classification (Bayes’ theorem) it was determined that the
“Media mix” was also useful for extending online consumer support. This is
attributed to the fact that more than 75% of consumers had e-mail IDs and
more than 90% had mobile numbers.
From the initial studies (objective 1) it was observed that 62% of consumers
preferred e-mail as the better communication channel. Using Bayes’ theorem it
was estimated that 75% of the consumers who preferred e-mail as a
communication channel also opted for it for online consumer support:
0.75 * 0.62 / (0.75 * 0.62 + 0.25 * 0.62) = 0.465 / (0.465 + 0.155)
= 0.465 / 0.62 = 75%. The case is similar for mobile phones. Hence it was
concluded that online support can also be extended to consumers using the
“Media mix”.
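The arithmetic above can be reproduced directly; the probabilities below are the ones stated in the text.

```python
# Probabilities as stated in the text: 62% preferred e-mail (from objective 1),
# and 75% of those also opted for e-mail for online consumer support.
p_email = 0.62
p_support_given_email = 0.75

numerator = p_support_given_email * p_email                        # 0.75 * 0.62 = 0.465
denominator = numerator + (1 - p_support_given_email) * p_email    # + 0.25 * 0.62 = 0.155
estimate = numerator / denominator                                 # 0.465 / 0.62 = 0.75
```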
By employing survey methods and applying various data mining techniques of
business intelligence, such as k-means clustering, the Apriori algorithm,
association rules, rough set theory, fuzzy logic and Bayes’ theorem, the
objectives of this research were analysed.
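Of the techniques listed, Apriori lends itself to a compact illustration. The following is a minimal pure-Python sketch of frequent-itemset mining; the basket data is invented for illustration and is not drawn from the study.

```python
def apriori(transactions, min_sup):
    """Frequent-itemset mining: keep itemsets whose support >= min_sup,
    building k-itemset candidates only from frequent (k-1)-itemsets."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    k_sets = [frozenset([i]) for i in items]
    while k_sets:
        counts = {c: sum(1 for t in transactions if c <= t) for c in k_sets}
        survivors = {c: v / n for c, v in counts.items() if v / n >= min_sup}
        frequent.update(survivors)
        prev = list(survivors)
        # Candidate generation: unions of frequent sets one item larger.
        k_sets = list({a | b for a in prev for b in prev
                       if len(a | b) == len(a) + 1})
    return frequent  # maps each frequent itemset to its support

baskets = [frozenset(t) for t in
           [{"milk", "bread"}, {"milk", "bread", "butter"},
            {"bread"}, {"milk", "butter"}]]
freq = apriori(baskets, min_sup=0.5)
# At 50% support, {milk}, {bread}, {butter}, {milk, bread} and
# {milk, butter} survive; {bread, butter} does not.
```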
6. CONCLUSIONS AND FUTURE WORK
6.1. Conclusions
In this research an attempt was made to study the influence of business
intelligence (BI) techniques on Consumer Relation Management (CRM). Data
mining techniques of business intelligence were productive in understanding
consumer buying behaviour and in determining consumer satisfaction and
loyalty towards the enterprise. By applying data mining techniques of
business intelligence, better decisions can be made to improve the business.
This research also established that the two major platforms, .NET and J2EE,
are immensely useful in building customised CRM software. Users’ opinions of
these platforms ascertained that they are satisfied with the existing
features of these platforms but demand improvements; their average
satisfaction and demanding indices revealed this fact.
The findings of this research are summarized as follows:
1. The preferred communication “Media mix” of the consumers was, in order:
   e-mails (63%), face to face (49%), SMS (43%), live chat (25%) and
   newsletters (16%). The inclination towards e-mails and SMS is attributed
   to the growth in the use of the internet and mobile phones.
2. A comparative study of the .NET and J2EE platforms revealed that 59% of
   the users of these platforms favoured .NET and 63% favoured J2EE for
   building CRM software. The greater use of J2EE is attributed to the
   growing popularity of free and open source software (FOSS), of which J2EE
   is an example. The user-friendliness of .NET over J2EE leads users to
   prefer .NET almost equally with J2EE.
3. Extensive study of buying behaviours is essential to find consumer
   requirements and opportunities that facilitate increases in profit
   margins, revenues, buying and selling. For this, consumer segmentation,
   association rules and the identification of frequently bought itemsets
   are important. The Weka data mining tool was very useful for consumer
   segmentation using the k-means algorithm and for finding frequently
   bought itemsets using the Apriori algorithm.
   To identify opportunities that facilitate increases in profit margins,
   revenues, buying and selling, an extensive study of consumer satisfaction
   with the enterprise was essential. The MUSA method and rough set theory
   were found valuable in exploring consumer satisfaction.
   The average global consumer satisfaction index was approximately 90%,
   while the company’s performance according to the whole set of satisfaction
   criteria (Product, Personnel, Physical Appearance and Place - the 4 Ps)
   varied between 86% and 92%. Because of the highly competitive conditions
   of the market, this performance could not be considered relatively high.
   The most important criterion, with a significant importance level of
   45.2%, was Product; consumers did not consider the remaining criteria
   important.
   The average demanding index of 73% indicated that consumers demand more
   improvement in the business process of the company. A higher demanding
   index indicates that consumers expect a better business process than the
   existing one. Hence the business process must be improved further to make
   consumers more satisfied.
4. Fuzzy set theory and grid partition methods were useful in determining
   consumer loyalty.
   High consumer loyalty gives an enterprise a competitive advantage over
   similar enterprises in the city. Consumer loyalty, good consumer service
   and support help in consumer retention. This retention helps the
   enterprise to calculate consumer lifetime value, with which better
   decisions can be made to improve the profitability of the company.
   Consumers were satisfied with the present services and support extended
   to them. Advancements in information technology are driving enterprises
   and consumers to explore new opportunities in online selling and buying
   (trends are moving towards e-business and e-CRM).
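Consumer lifetime value, mentioned in finding 4, is commonly computed as a discounted sum of expected future margins. The sketch below uses a standard textbook formulation with illustrative parameters; it is not the thesis's own model.

```python
def customer_lifetime_value(margin, retention, discount, years):
    """Discounted CLV: sum over t of margin * retention**t / (1 + discount)**t.
    margin: expected yearly margin per consumer; retention: probability the
    consumer stays each year; discount: yearly discount rate (all illustrative)."""
    return sum(margin * retention ** t / (1 + discount) ** t
               for t in range(1, years + 1))

# Hypothetical parameters: Rs 1000 yearly margin, 80% retention,
# 10% discount rate, five-year horizon.
clv = customer_lifetime_value(margin=1000.0, retention=0.8, discount=0.1, years=5)
```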
The above conclusions are liable to change from time to time and place to
place, since the business process is highly agile in nature.
Finally, the attempt made in this research to implement business intelligence
techniques in CRM, so as to attain profitability for the enterprise and
position it at a competitive advantage, proved worthwhile and productive.
6.2. Limitations of the study
The main limitation of this study was consumer segmentation using the
k-means algorithm. The ideal value of k ranges between 2 and 10, but for
large databases the efficiency of the k-means algorithm is reduced by the
frequent calculation of Euclidean distances to form new clusters. This was
overcome by the use of a scapegoat tree and a max-heap, which resulted in a
new algorithm, the KCUSTMH algorithm.
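The cost that KCUSTMH targets comes from the assignment step of k-means, which computes n × k Euclidean distances on every iteration. A minimal sketch of that step, with invented sample points, looks like this:

```python
import math

def assign_clusters(points, centroids):
    """The step that dominates k-means cost: each iteration computes a
    Euclidean distance from every one of the n points to each of the k
    centroids, i.e. n * k distance evaluations per iteration."""
    labels = []
    for p in points:
        dists = [math.dist(p, c) for c in centroids]  # k distances per point
        labels.append(dists.index(min(dists)))
    return labels

# Illustrative 2-D points and two centroids.
points = [(1.0, 2.0), (1.5, 1.8), (8.0, 8.0), (9.0, 9.0)]
centroids = [(1.0, 2.0), (8.5, 8.5)]
print(assign_clusters(points, centroids))  # [0, 0, 1, 1]
```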
The use of a scapegoat tree also has its limitations. In contrast to other
self-balancing search trees, scapegoat trees are entirely flexible in their
balancing: they support any value of α such that 0.5 < α < 1. A high α value
results in fewer rebalances, making insertion quicker but lookups and
deletions slower, and vice versa for low values of α. Therefore, in practical
applications α is chosen depending on how frequently these operations are
performed.
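The α-weight-balance condition that governs scapegoat-tree rebuilds can be sketched as follows; this is the standard formulation, and the subtree sizes used are illustrative.

```python
def is_alpha_weight_balanced(left_size, right_size, alpha):
    """A node is alpha-weight-balanced when neither subtree holds more than
    alpha * size(node) of its nodes; a violation on the insertion path is
    what triggers a scapegoat rebuild."""
    size = left_size + right_size + 1
    return left_size <= alpha * size and right_size <= alpha * size

# With alpha = 0.75, a node with subtrees of 9 and 2 nodes is still balanced;
# tightening alpha to 0.55 flags the same node for rebuilding.
print(is_alpha_weight_balanced(9, 2, 0.75))  # True
print(is_alpha_weight_balanced(9, 2, 0.55))  # False
```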
6.3. Future Work and Suggestions
1. Consumer satisfaction analysis is more effective if the study is based on
   the marketing mix and its individual factors.
2. An attempt may also be made to study the effect of physical attributes of
   consumers, such as height, weight, body colour, blood group, eye colour
   and hair style, on their buying behaviours and on the profitability of
   the company. Biometric and RFID devices are useful for collecting such
   data.
3. Future information technology relies on advanced technologies such as
   analytics, big data and the Android platform. Future studies aim to
   implement these concepts in CRM and business intelligence.
4. Consumer data is growing day by day. For databases involving big data,
   mining algorithms such as Balanced Iterative Reducing and Clustering
   using Hierarchies (BIRCH) and Clustering Using Representatives (CURE)
   will be effective; they surpass the benefits of k-means clustering. Hence
   future studies may be proposed to implement them and improve their
   efficiencies.
7. BIBLIOGRAPHY / REFERENCES
[1] Agrawal. R., Srikant. R., “Mining sequential patterns”, Proceedings of
the International Conference on Data Engineering, Taipei, Taiwan, 1995.
[2] Agrawal. R., Imielinski. T., Swami. A., “Mining association rules
between sets of items in large databases”, Proc. ACM-SIGMOD International
Conference on Management of Data, pp. 207-216, Washington, D.C., May 1993.
[3] Agrawal. R. , Srikant. R., “Fast Algorithms for Mining Association Rules
in Large Databases”, Proceedings of the 20th International Conference
on Very Large Data Bases, (VLDB'94), pp. 478-499, 1994.
[4] Bagirov.A.M., Mardaneh.K., “Modified global k-means algorithm for
clustering in gene expression datasets”, WISB’06, Australian Computer
Society, Inc., Darlinghurst, Australia, pp.23–28, 2006.
[5] Bentley.J.L., “Multidimensional Binary Search Trees Used for
Associative Searching”, Comm. ACM, Vol. 18, pp. 509-517, 1975.
[6] Chris Rygielski, Jyun-Cheng Wang, David C. Yen, “Data Mining
Techniques For Customer Relationship Management”, Vol. 24, Issue 2,
Elsevier Science Ltd., ISSN: 0160-791X, pp 483-502, 2002.
[7] Christian Borgelt, “Efficient Implementations of Apriori and Eclat”,
Proceedings of the 1st IEEE ICDM Workshop on Frequent Item Set Mining
Implementations (FIMI), Melbourne, pp 1-9, 2003.
[8] Corazza .M., Funari. S., Gusso. R., “An evolutionary approach to
preference disaggregation in a MURAME-based credit scoring
problem”, ISSN: 2239-2734, 2012.
[9] E.W.T. Ngai, Li Xiu, D.C.K. Chau, “Application of data mining
techniques in customer relationship management: A literature review
and classification”, Expert Systems with Applications, Issue 36,
Elsevier Ltd., ISSN: 0957-4174, 2009.
[10] Gangadhara Rao. N.V.B., Sirisha Aguru ,“A Hash based Mining
Algorithm for Maximal Frequent Item Sets using Double Hashing”,
Journal of Advances in Computational Research: An International
Journal, Vol. 1 No. 1-2, pp1-6, 2012.
[11] Grigoroudis.E. ,. Siskos.Y. Christina Diakaki , “Preference
Disaggregation For Measuring And Analysing Customer Satisfaction:
The MUSA Method”, European Journal of Operational Research, pp 1-
41, 2001.
[12] Habul. A., “Business intelligence and customer relationship
management”, IEEE Conference Publications, ISSN: 1330-1012, pp
169 – 174, 2010.
[13] Han Jiawei , Yongjian Fu, “Discovery of Multiple-Level Association
Rules from Large Databases”, Proceedings of the 21st VLDB
Conference, Zurich, Switzerland, pp 420-431, 1995.
[14] Han. J., Pei. J., Yin. Y., “Mining Frequent Patterns without Candidate
Generation”, ACM SIGMOD, pp. 1-12, 2000.
[15] Han.J., Kamber.M., ”Data Mining Concepts and Techniques”, Morgan
Kaufmann Publishers, San Francisco, 2006.
[16] Hong Tzung Pei, Huang Tzu Jung ,Chang Chao Sheng , “Mining
Multiple-level Association Rules Based on Pre-large Concepts, Data
Mining and Knowledge Discovery in Real Life Applications”, ISBN 978-
3-902613-53-0, pp. 438, 2009.
[17] Hsieh Nan-Chen , Chu Kuo-Chung ,” Enhancing Consumer Behavior
Analysis by Data Mining Techniques”, International Journal of
Information and Management Sciences, Vol.17, No. 2, pp 39-53, 2009.
[18] Imielinski. T. , Mannila. H,” A database perspective on knowledge
discovery”, Communications of ACM., 39:58-64, 1996.
[19] Iqbal Asad, Ullah Naeem, “J2EE vs. Microsoft.NET- A Comparison of
two platforms for component-based development of web applications”,
pp 1-60, 2010.
[20] Isakki P. , Rajagopalan .S.P., “Mining Unstructured Data using
Artificial Neural Network and Fuzzy Inference Systems Model for
Customer Relationship Management”, IJCSI International Journal of
Computer Science, Vol. 8, Issue 4, No. 1, ISSN: 1694-0814, pp 630-
634, 2011.
[21] Isakki.P. , Rajagopalan. S.P., “Analysis of Customer Behavior using
Clustering and Association Rules”, International Journal of Computer
Applications, Vol. 43, No. 23, , ISSN: 0975 – 8887, pp 19-26, 2012.
[22] Ishibuchi. H., Nakashima.T., Yamamoto. T., “Fuzzy association rules
for handling continuous attributes,” proceedings of IEEE International
Symposium on Industrial Electronics, Pusan, Korea, pp.118-121, 2001.
[23] Ishibuchi. H., Yamamoto. T., Nakashima,.T.,“Fuzzy data mining: effect
of fuzzy discretization”, proceedings of the 1st IEEE International
Conference on Data Mining, San Jose, USA, pp.241-248, 2001.
[24] Ishibuchi.H, Nakashima.T., Murata.T., “Performance evaluation of
fuzzy classifier systems for multidimensional pattern classification
problems”, IEEE Transactions on Systems, Man, and Cybernetics, Vol.
29, no. 5, pp.601-618, 1999.
[25] Ishibuchi.H., Nozaki. K., Yamamoto. N., Tanaka.H, “Selecting fuzzy if-
then rules for classification problems using genetic algorithm”, IEEE
Transactions on Fuzzy Systems, Vol. 3, No. 3, pp.260-270, 1995.
[26] Jain. A. K, Murty. M. N., and Flynn. P. J., “Data Clustering: A Review,”
ACM Computing Survey, Vol. 31, No. 3, pp. 264-323, 1999.
[27] Joao Isabel M, Costa Carlos A Bana e, Figueria Jose Rui, “An
alternative to MUSA method for customer satisfaction analysis”, Vol.20
ISSN-1646-2955, pp 1-28, 2007.
[28] Jones. T. O., Jr. Sasser. W. E., “Why Satisfied Customer Defect,”
Harvard Business Review, Vol. 73, No. 6, pp. 88-99, 1995.
[29] Jong Soo Park, Ming-Syan Chen, Philip S. Yu, “An Effective Hash
Based Algorithm for Mining Association Rules”, ACM SIGMOD Record,
Vol. 24, Issue 2, pp 175-186, ISSN: 0163-5808, 1995.
[30] Kalra Shipra, Gupta Rachika, “Data Mining:A Tool for the
Enhancement of Banking Sector”, IFRSA, International Journal of Data
Warehousing & Mining, IIJDWM, Vol.1, pp.204-208, 2011.
[31] Kanungo. T., Mount. D.M., Netanyahu. N.S., Piatko. C., Silverman. R.,
Wu A.Y., ,” An efficient k-means clustering algorithm: Analysis and
implementation”, IEEE Transaction on Pattern Analysis and Machine
Intelligence,Vol.24, 2002.
[32] Khan Aurangzeb, Baharudin Baharum, Khan Khairullah,” Mining
Customer Data For Decision Making Using New Hybrid Classification
Algorithm”, Journal of Theoretical and Applied Information Technology
,Vol. 27, No. 1, ISSN: 1817-3195, pp 54-61, 2011.
[33] Krishnamurthy M., Kannan. A. , Baskaran .R., Deepalakshmi .R.,
“Frequent Item set Generation Using Hashing-Quadratic Probing
Technique” , “European Journal of Scientific Research ISSN 1450-216X
Vol.50 No.4 pp. 523-532, 2011.
[34] Kumar Rajeev, Puran Rajeshwar , Dhar Joydip, “Enhanced K-Means
Clustering Algorithm Using Red Black Tree and Min-Heap”,
International Journal of Innovation, Management and Technology, Vol.
2, No. 1, ISSN: 2010-0248, pp 49-54, 2011.
[35] Lan Guo-cheng, Hong Tzung-Pei , Tseng Vincent S. , “A Projection-
Based Approach For Discovering High Average-Utility Item sets”,
Journal of Information Science And Engineering, Vol.28, pp193-209,
2012.
[36] Lin D-I. , Kedem Z.M., “Pincer-Search: A New Algorithm For
Discovering The Maximum Frequent Set,” IEEE Transactions on
Knowledge and Data Engineering, , Vol. 14, No. 3,pp. 553-566, 2002.
[37] Ling Amy Poh Ai, Saludin Mohamad Nasir, Mukaidono Masao,
“Deriving Consensus Rankings via Multicriteria Decision Making
Methodology”, Emerald Journals Business Strategy Series, Volume 13,
Issue 1, ISSN: 1751-5637, pp 3-12, 2012.
[38] Liu Ying, Liao Wei-keng, Choudhary Alok , “A Two-Phase Algorithm
for Fast Discovery of High Utility Item sets”, LNAI 3518, Springer-Verlag
Berlin Heidelberg, pp. 689 – 695, PAKDD 2005.
[39] Liu.Y., Yang.B., “Research of an Improved Apriori Algorithm in Mining
Association Rules”,Journal of Computer Applications, vol. 27, pp. 418-
420, 2007.
[40] Lu Dai, Arun Kumar. S.,”Fuzzy Evaluation Model for Customer
Relationship Management”, International Journal of Emerging Trends in
Engineering and Development, Vol.7, Issue 2, ISSN: 2249-6149, pp
266-279, 2012.
[41] Mack Joun.,” An Efficient k-Means Clustering Algorithm, Analysis and
Implementation”, IEEE Transactions on Pattern Analysis And Machine
Intelligence, Vol. 24, No. 7., 2002.
[42] Meng Qingliang, Kong Qinghua, Han Yuqi, Chen Jie, “Neural
Networks Based Integrated Evaluation Method for the Effectiveness of
CRM”, Proceedings of the Fourth International Conference on
Electronic Business (ICEB), Beijing, pp 320-321, 2004.
[43] Miller Gerry, “The Web Services Debate, .NET vs. J2EE”,
Communications of the ACM, June,Vol. 46, No. 6 ,pp 64-67, 2003.
[44] Matsatsinis.N.F., Ioannidou.E., Grigoroudis.E., “Customer
satisfaction using data mining techniques”, European Journal of
Operational Research, pp 1-4, 1999.
[45] Pawlak.Z,“Rough Set Theory and Its Applications,” Journal of
Telecommunications and Information Technology, Vol. 3, , pp. 7-10,
2002.
[46] Pillai Jyothi , Vyas .O.P. , “CSHURI – Modified HURI algorithm for
Customer Segmentation and Transaction Profitability”, International
Journal of Computer Science, Engineering and Information Technology
(IJCSEIT), Vol.2, No.2, pp 79-89, 2012.
[47] Pillai Jyothi, Vyas .O.P., “High Utility Rare Itemset Mining (HURI): An
approach for extracting highutility rare item sets”, i-manager’s Journal
on Future Engineering and Technology (JFET), ISSN Online: 2230-
7184, ISSN Print: 0973 – 2632, 2011.
[48] Rada Rexhep , Ruseti Bashkim, “Artificial Neural Networks in CRM”,
ICT Innovations, Web Proceedings, ISSN 1857-7288, pp 595-598,
2012.
[49] Rahman Zubair.A.M.J. Md., Balasubramanie.P., Venkata Krishna.P.,
“A Hash based Mining Algorithm for Maximal Frequent Itemsets using
Linear Probing”, InfoComp Journal of Computer Science,
Vol.8, No.1, pp.14-19, 2009.
[50] Raorane Abhijit, Kulkarni .R.V., “Data Mining Techniques: A Source
For Consumer Behavior Analysis”, International Journal of Database
Management Systems, Vol.3,No.3, pp.45-56, 2011.
[51] Russell K.H., Ching, Chen Ja-Shen, Lin Yi-Shen, “A Proposed
Clustering Method for Customer Segmentation in CRM Practices”,
Journal of Business Research, Vol.44, No.2, pp.75-92., 2002.
[52] Samtani Gunjan, Sadhwani Dimple, “Web Services and Application
Frameworks (.NET and J2EE)” ,pp 1-4, 2004
http://www.nws.noaa.gov/oh/hrl/hseb/docs/ApplicationFrameworks.pdf.
[53] Saravanabhavan .C., Parvathi .R. M. S., “Utility FP-Tree: An Efficient
Approach to Mine Weighted Utility Itemsets”, European Journal of
Scientific Research,Vol.50 No.4 pp.466-480, 2011.
[54] Seddawy Bahgat El Ahmed, Moawad Ramadan, Dr. Hana Maha
Attia, “Applying Data Mining Techniques in CRM”, online publication of
Research article from AASTMT, pp 1-11, 2010.
[55] Selvi Kanimozhi.C.S., Tamilarasi.A., “Mining of High Confidence Rare
Association Rules”, European Journal of Scientific Research ISSN
1450-216X Vol.52 No.2 pp.188-194, 2011.
[56] Seno.M. ,. Karypis.G., “LPMiner: An Algorithm For Finding Frequent
Itemsets Using Length- Decreasing Support Constraint” ,IEEE ICDM, ,
pp. 505-512. 2001.
[57] Senthil Kumar.A.V. , Wahidabanu.R.S.D., “DHFI-tree mining: A new
approach for frequent itemset mining”, Advances in Computer Science
and Engineering (ACSE), Vol.2, No.2, pp 115-132, 2008.
[58] Senthil Kumar.A.V. , Dr.Wahidabanu.R.S.D., “Mining Frequent
Itemsets: Efficient Hashing and Tree-Based Approach”, International
Journal of Computer Science and Software Technology (IJCSST),
Vol.1, No.1, pp.1-7, January-June 2008.
[59] Senthil Kumar.A.V. , R.S.D. Wahidabanu, “An Effective Algorithm for
Mining Association Rules”, Journal of Computer Science, pp: 174-183,
Nov- Dec 2006.
[60] Senthil Kumar .A.V. , R.S.D. Wahidabanu, “Discovery of Frequent
Itemsets: Frequent Item Tree-Based Approach”, ITB Journal, ICT Vol. 1
C, No. 1, pp: 42-55, May 2007.
[61] Silvia Rissino , Germano Lambert Torres, “Rough Set Theory –
Fundamental Concepts, Principals, Data Extraction, and Applications,
Data Mining and Knowledge Discovery in Real Life Applications”, ISBN
978-3-902613-53-0, pp. 35-58, 2009.
[62] Siskos Yannis ,Grigoroudis Evangelos, “Measuring Customer
Satisfaction for Various Services Using Multicriteria Analysis”, Springer
US, International Series in Operations Research & Management
Science Volume 44, , ISSN:0884-8289, pp 457-482, 2002.
[63] Teng Shaohua, Su Jiangyu, Zhang Wei, Fu Xiufen, Chen Shuqing,
“An Algorithm of Mining Frequent Itemsets in Pervasive Computing”,
Proceedings of IEEE ICDM, pp.559-563, 2009.
[64] Tsiptsis Konstantinos, Chorianopoulos Antonios, “Data Mining
Techniques in CRM: Inside Customer Segmentation”, John Wiley &
Sons, Ltd, ISBN: 978-0-470-74397-3. pp.373, 2009.
[65] Vanitha.K. , Santhi.R., “Using Hash Based Apriori Algorithm To
Reduce The Candidate 2- Item sets For Mining”, Journal of Global
Research in Computer Science, Vol. 2, No. 5, ISSN-2229-371x, pp 79-
80, 2011.
[66] Wang Chien Hua ,Pang Chin Tzong , “Applying Fuzzy Data Mining for
an Application CRM”, Bulletin of Networking, Computing, Systems, and
Software, Vol. 1, No. 1, ISSN 2186–5140, pp 46–51, 2012.
[67] Wu Kun & Liu Feng ying, “Application of Data Mining in Customer
Relationship Management”, IEEE Conference Publications, ISBN: 978-
1-4244-5325-2, pp 1– 4, 2010.
[68] Yuan.F, Meng.Z.H, Zhang.H. X .and Dong .C. R. , “A New Algorithm to
Get the Initial Centroids,” Proc. of the 3rd International Conference on
Machine Learning and Cybernetics, pp. 26–29, 2004.
[69] Zadeh. L.A., “The concept of a linguistic variable and its application to
approximate reasoning”, Information Science (part 1), Vol.8, No.3,
pp.199-249, 1975.
[70] Zadeh. L.A., “The concept of a linguistic variable and its application to
approximate reasoning”, Information Science (part 2), Vol.8, No.4,
pp.301-357, 1975.
[71] Zadeh. L.A., “The concept of a linguistic variable and its application to
approximate reasoning”, Information Science (part 3), Vol.9, No.1,
pp.43-80, 1976.
[72] Zhang Limei, “Data mining application in customer relationship
management”, IEEE Conference Publications, ISBN: 978-1-4244-7235-2,
pp 171-174, 2010.
[73] Zhang.T., Ramakrishnan.R., Livny.M., “BIRCH: an efficient data
clustering method for very large databases”, ACM SIGMOD, 1996.
[74] Zu Qiaohong, Wu Ting, Wang Hui, “A Multi-Factor Customer
Classification Evaluation Model”, Journal of Computing and
Informatics, Vol. 29, No.24, pp 509-520, 2010.
[75] Vijayarani.S., Sathya.P., “An Efficient Algorithm for Mining
Frequent Items in Data Streams”, International Journal of Innovative
Research in Computer and Communication Engineering, Vol. 1, Issue 3,
ISSN: 2320-9798, pp 742-747, 2013.
[76] “A comparison of J2EE and .NET as platforms for teaching Web
services”, Proceedings of the IEEE Conference on Frontiers in Education
(FIE), 34th Annual, Vol. 3, pp 1-17, ISSN: 0190-5848, 2004.

List of Publications
International Journals
[1] Narendra Kumar .V. V., Dr. RSD Wahidabanu., “Customer Relationship
Management on J2EE and .NET using Business Intelligence (A
comparative Study on J2EE and .NET Platforms on various parameters
and features)”, International Journal of Datamining Emerging
Technologies, Vol.2, No.1, ISSN: 2249-3212 pp 41-48, 2012.
[2] Narendra Kumar .V. V., Dr. RSD Wahidabanu.,” Customer Relationship
Management with J2EE and .NET using Business Intelligence”,
International Journal of Advanced Research in Computer Science and
Applications, Vol.1,Issue 3, September 2013, ISSN 2321-872X, 2013.
[3] Narendra Kumar .V. V., Dr. RSD Wahidabanu.,”The role of Business
Intelligence with support of J2EE or .NET in consumer relation
management”, International Journal of Datamining Emerging
Technologies,Vol.3, Issue 1,pp 1-15, Print ISSN: 2249-3212, Online ISSN :
2249-3220,2013.
National Journals
[1] Narendra Kumar .V. V. ,“Datamining in various disciplines of
Management”, The Osmania Journal of Management, Vol.V, No.12, ISSN
No. 0976-4208, pp 77-89, 2009.
[2] Narendra Kumar .V. V., “Business Intelligence- A new vision for HR”,
NSHM Journal of Management Research & Application, NJRMA, Vol. 1,
ISSN 0975-2510, pp 72-76, 2009.
[3] Narendra Kumar .V. V., “Year wise Dasavatars of Business Intelligence
through BI2.0”, PRATIBHIMBA, the journal of IMIS Vol.8, No.1, ISSN-
0079-2541, pp 49-60, 2008.
Conference papers
[1] Narendra Kumar.V.V., “Customer Satisfaction Using Data mining”
Proceedings of International Conference held at Sai Ram Institutions (Sri
Sai Ram Engineering College), 2007.
[2] Narendra Kumar .V. V., “Data Mining – Visualization Tools (Clementine
work Bench)” Proceedings of the National Conference ETA-2005 , CS-
Dept-Saurastra University, Rajkot, jointly with Amoghasidhi Educational
Society, Sangli, 2005.
[3] Narendra Kumar .V. V., “Clustering (k-means clustering) using Clementine
workbench”, Proceedings of the National Conference on “Recent Trends in
Data Mining and its Applications (NCDMA-2006)” ,Department of Computer
science and Engineering, Faculty of Engineering and Technology,
Annamalai University, 2006.
[4] Narendra Kumar .V. V., “Data cleansing using Oracle ware house &
Managing the knowledge workers: SAPTHASUTHRAS”, National
conference on Business Intelligence ,Dept of Computer Science and
Commerce, P.B. Siddartha college of Arts & Science, Vijayawada and
sponsored by UGC, New Delhi, 2007.
[5] Narendra Kumar .V.V., “Banking upon Business Intelligence in Banks”,
National Conference on “Organization and working of Financial Sector in
India”, Alluri Institute of Management Sciences, Warangal.A.P, 2010.