Rajia cluster analysis
REHANA RAJ
DFK1307
DEPT OF FISH PROCESSING TECHNOLOGY
COLLEGE OF FISHERIES
MANGALORE
CLUSTER ANALYSIS
Cluster analysis is a multivariate statistical technique
in which a large data set is segregated into several
groups based on homogeneity or similarity measures.
Cluster analysis makes a sensible and informative
classification of an initially unclassified set of data
with the desired accuracy, using the variable values
observed on each individual.
It saves resources such as time and money.
[Figure: the same data set before clustering and after clustering]
To assign observations to groups (‘clusters’)
To divide the observations into homogenous and
distinct groups
To reduce the complexity of data
Generates several groups within the data set whose members are similar:
homogeneous within each group and as heterogeneous as
possible with respect to other groups.
Normally, the data consist of objects or persons.
Segregation is done based on more than two
variables.
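The idea of "homogeneous within the group, heterogeneous to other groups" can be checked numerically. The following is a minimal sketch, assuming Euclidean distance as the similarity measure and a small made-up data set; the group names and values are hypothetical, not taken from the slides.

```python
from itertools import combinations
from math import dist

# Hypothetical data: two variables observed on six objects,
# already assigned to two assumed groups "A" and "B".
groups = {
    "A": [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1)],
    "B": [(5.0, 5.0), (5.1, 4.8), (4.9, 5.2)],
}

def mean_within_distance(groups):
    """Average Euclidean distance between pairs of objects in the same group."""
    d = [dist(p, q)
         for members in groups.values()
         for p, q in combinations(members, 2)]
    return sum(d) / len(d)

def mean_between_distance(groups):
    """Average Euclidean distance between pairs of objects in different groups."""
    d = [dist(p, q)
         for a, b in combinations(groups.values(), 2)
         for p in a for q in b]
    return sum(d) / len(d)

within = mean_within_distance(groups)
between = mean_between_distance(groups)
print(f"within-group: {within:.2f}, between-group: {between:.2f}")
```

A good grouping gives a small within-group distance and a large between-group distance; other similarity measures could be substituted for `dist`.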
Hierarchical Clustering
Centroid-based clustering
Distribution-based clustering
Density-based clustering
Hierarchical clustering is a method of cluster analysis which
seeks to build a hierarchy of clusters.
Two types:
Agglomerative (bottom-up):
◦ Start with each document as a single cluster.
◦ Eventually all documents belong to the same cluster.
Divisive (top-down):
◦ Start with all documents belonging to the same cluster.
◦ Eventually each document forms a cluster on its own.
The number of clusters k need not be fixed in advance.
Constructs a tree-based hierarchical diagram,
usually called a dendrogram. E.g., in the case of taxonomic
classification:
animal
  vertebrate: fish, reptile, amphib., mammal
  invertebrate: worm, insect, crustacean
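The agglomerative (bottom-up) procedure can be sketched in a few lines. This is an illustrative sketch only, assuming Euclidean distance and single linkage (the slides do not name a linkage rule); the function name `agglomerative` and the sample points are hypothetical. The merge history it returns is the information a dendrogram displays.

```python
from math import dist

def agglomerative(points, linkage=min):
    """Bottom-up clustering; returns the merge history.

    Each entry is (cluster_a, cluster_b, distance), where a cluster is a
    tuple of point indices. linkage=min gives single linkage (assumed here),
    linkage=max would give complete linkage.
    """
    clusters = [(i,) for i in range(len(points))]  # every point starts alone
    history = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(dist(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return history

# Hypothetical sample data.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
merges = agglomerative(points)
for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d:.2f}")
```

Stopping the merges at a chosen distance (cutting the dendrogram at that height) yields a flat set of clusters without fixing their number beforehand.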
In centroid-based clustering, clusters are represented by a central
vector, which may not necessarily be a member of the data set.
Aims to partition n observations into k clusters.
Each observation belongs to the cluster with the nearest mean.
Here, the number of clusters is fixed to k (k-means clustering).
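A minimal sketch of k-means (Lloyd's algorithm) follows, assuming Euclidean distance and, for reproducibility, the first k points as starting centroids; real implementations usually use random or smarter initialisation. The function name `k_means` and the data are hypothetical.

```python
from math import dist

def k_means(points, k, max_iter=100):
    """Lloyd's algorithm with a deterministic start (first k points)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins the nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # no change, so the algorithm has converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return labels, centroids

# Hypothetical sample data: two well-separated groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0),
          (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
print(labels, centroids)
```

Note that the final centroids are means of the members and need not coincide with any observation in the data set.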
In distribution-based clustering, clusters are defined as objects most likely
belonging to the same distribution.
It can capture correlation and dependence between attributes.
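One common distribution-based method is a Gaussian mixture model fitted with the EM algorithm; the slides do not name a specific method, so this is an assumed example. The sketch below is deliberately reduced to one variable and two components (so it does not show correlation between attributes), and the function names and data are hypothetical.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

def fit_two_gaussians(data, n_iter=50):
    """EM for a mixture of two 1-D Gaussians: (weights, means, variances)."""
    means = [min(data), max(data)]  # simple deterministic start
    centre = sum(data) / len(data)
    spread = sum((x - centre) ** 2 for x in data) / len(data)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate the parameters from the weighted observations.
        for c in range(2):
            n_c = sum(r[c] for r in resp)
            weights[c] = n_c / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var = sum(r[c] * (x - means[c]) ** 2
                      for r, x in zip(resp, data)) / n_c
            variances[c] = max(var, 1e-6)  # floor against collapsing variance
    return weights, means, variances

# Hypothetical sample data drawn around two different centres.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.3, 4.7, 5.1, 4.9]
weights, means, variances = fit_two_gaussians(data)
# Hard assignment: each observation goes to its most likely component.
labels = [max(range(2),
              key=lambda c: weights[c] * normal_pdf(x, means[c], variances[c]))
          for x in data]
print(means, labels)
```

Unlike k-means, each observation first receives a probability of belonging to every component; the hard labels are derived only at the end.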
In density-based clustering, clusters are defined as areas of higher density
than the remainder of the data set.
Objects in the sparse areas that separate clusters are usually
considered to be noise and border points.
The most popular density-based clustering method is
DBSCAN (density-based spatial clustering of applications
with noise).
OPTICS (Ordering Points To Identify the Clustering
Structure) is a generalization of DBSCAN that handles
different densities much better.
[Figure: Density-based clustering with DBSCAN. DBSCAN assumes clusters of
similar density, and may have problems separating nearby clusters.]
[Figure: OPTICS is a DBSCAN variant that handles different densities
much better.]
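The core idea of DBSCAN can be sketched compactly. This is a simplified illustration rather than a reference implementation: it assumes Euclidean distance, counts a point as its own neighbour, and uses a brute-force neighbour search; the names `dbscan`, `eps` and `min_pts` follow common usage, and the data are hypothetical.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    n = len(points)
    # Neighbourhood of i: all points within eps, including i itself.
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is core, keep expanding
    return labels

# Hypothetical data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Because a single `eps` applies to the whole data set, this approach presumes clusters of similar density, which is the limitation OPTICS is designed to relax.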
1. Forming the clusters from the given data set, resulting
in a new variable that identifies cluster membership among
the cases (one-phase cluster)
2. Describing the clusters by crossing them again with the data
(two-phase cluster)
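The two steps can be illustrated as follows: step 1 yields a cluster-membership variable, and step 2 crosses that variable with the original data to describe each cluster. This is a minimal sketch; the variable names (`moisture`, `fat`), the values and the membership labels are all made up for illustration, and the membership variable is simply assumed to come from some clustering method.

```python
from collections import defaultdict

# Hypothetical cases with two observed variables each.
cases = [
    {"moisture": 18.0, "fat": 4.0},
    {"moisture": 20.0, "fat": 6.0},
    {"moisture": 10.0, "fat": 12.0},
    {"moisture": 12.0, "fat": 14.0},
    {"moisture": 14.0, "fat": 16.0},
]

# Step 1 (assumed already done): the new variable identifying cluster members.
cluster_of_case = [0, 0, 1, 1, 1]

def describe_clusters(cases, cluster_of_case):
    """Step 2: cluster size and per-variable means for each cluster."""
    grouped = defaultdict(list)
    for case, label in zip(cases, cluster_of_case):
        grouped[label].append(case)
    summary = {}
    for label, members in grouped.items():
        summary[label] = {"size": len(members)}
        for var in members[0]:
            mean = sum(m[var] for m in members) / len(members)
            summary[label][f"mean_{var}"] = mean
    return summary

summary = describe_clusters(cases, cluster_of_case)
for label, stats in sorted(summary.items()):
    print(label, stats)
```

Comparing the per-cluster means shows which variables distinguish the clusters from one another.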
One-phase cluster (forming clusters from the chosen data set):
VALUE ADDED PRODUCTS: FISH CUTLET, FISH FINGER, FISH BURGER
Two-phase cluster:
FISH CUTLET: Seer fish, Mackerel
Third-phase cluster:
Baked, Fried
Advantages:
Cuts down the cost of preparing a sampling frame and other administrative costs.
No special scales of measurement are necessary.
The visual graphic provides a clear understanding of the clusters.
Disadvantages:
The choice of cluster-forming variables is often not based on
theory but made arbitrarily.
In some cases, the clusters are difficult to determine.
Applications:
Marketing: Helps marketers discover distinct groups in their
customer bases, and then use this knowledge to develop targeted
marketing programs
Land use: Identification of areas of similar land use in an earth
observation database
Insurance: Identifying groups of motor insurance policy holders
with a high average claim cost
City planning: Identifying groups of houses according to their
house type, value, and geographical location
Earthquake studies: Observed earthquake epicenters should be
clustered along continental faults
Thank you for your kind attention!