Clustering under Constraints with Genetic Algorithms by Albert Ali Salah Stanislav Redman Gabriella...
Clustering under Constraints with Genetic Algorithms
by
Albert Ali Salah
Stanislav Redman
Gabriella Kovacs

Outline
• Definition of the problem
• Background on genetic algorithms
• Case study: Workgroup assignment
• Results

Clustering under Constraints
• N multi-dimensional data items
• A bunch of soft constraints
• (A bunch of hard constraints)
• The problem: cluster the data points so that the hard constraints are satisfied and the soft constraints are optimized.

Constrained Clustering
• Constrained clustering is an unsupervised learning technique, where some data items are known to be in the same cluster, and some are known to be in different clusters.
• Clustering under constraints is an optimization problem (I saw Karp in the elevator, and he said it’s probably NP-complete)

Genetic Algorithms
• A GA is essentially a heuristic random search tool
• Has no rigorous mathematical principle; no one knows why it works
• Used frequently in soft constraint optimization, rarely in clustering

Details You All Know
• Solutions are ‘coded’ into simple, DNA-like structures called chromosomes
• A fitness function is supplied to evaluate the quality of solutions
• The algorithm works on a population of individuals
• There is a Genetic Algorithm package written for the object-oriented Dolphin Smalltalk environment

Genetic Algorithm Flowchart
Initial Population → End Criteria Reached?
– No: Selection → Cross-over → Mutation → New Population → back to End Criteria Reached?
– Yes: Output Best Individual
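The loop in the flowchart can be sketched in a few lines of Python. This is a minimal illustration, not the Dolphin Smalltalk package the slides mention: the function name `evolve`, the roulette-wheel selection, and the fixed generation count as the end criterion are all assumptions. The default values are the ones listed on the GA parameters slide.

```python
import random

def evolve(fitness, n_bits, pop_size=100, generations=30,
           p_cross=0.4, p_mut=0.001, seed=0):
    """Bit-string GA: selection, single-point crossover, mutation.

    `fitness` maps a list of 0/1 ints to a value in [0, 1].
    Assumes pop_size is even and n_bits >= 2.
    """
    rng = random.Random(seed)
    # Initial population: random bit strings.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):  # end criterion (assumed): generation budget
        # Selection (assumed): fitness-proportional sampling with replacement.
        weights = [fitness(ind) + 1e-12 for ind in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        children = []
        for a, b in zip(parents[0::2], parents[1::2]):
            a, b = a[:], b[:]
            if rng.random() < p_cross:
                # Single-point crossover: swap the tails after a random cut.
                cut = rng.randrange(1, n_bits)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            children += [a, b]
        # Mutation: flip each bit independently with probability p_mut.
        pop = [[bit ^ (rng.random() < p_mut) for bit in ind] for ind in children]
        # Remember the best individual seen so far for the final output.
        best = max(pop + [best], key=fitness)
    return best
```

Keeping the best individual outside the population is a design choice of this sketch; it guarantees the returned solution is never worse than one seen earlier, without changing the selection pressure.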

Case Study: Santa Fe
• Aim: Cluster people such that:
– Groups are balanced in number of students
– Each group consists of people with similar interests
– Each group has some people with basic skills
– Each group possesses enough knowledge in its areas of interest

Problem 1: Representation
• A good GA representation is:
– unambiguous
– short (k bits means a search space of size 2^k)
– smooth with respect to the fitness landscape
– robust to mutations
– free of preferential bias
– simple to decode

Representation
• 01101001010010101001010…
Three bits code the group number
The position indicates the student number (1, 2, 3, 4, …)
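Decoding this representation is a matter of reading the chromosome three bits at a time. The sketch below assumes groups are numbered from 0 (so three bits give groups 0 to 7); the slides do not say whether the original numbering started at 0 or 1, and the function name `decode` is hypothetical.

```python
def decode(chromosome, bits_per_student=3):
    """Map a bit string to a list of group indices, one per student."""
    groups = []
    for start in range(0, len(chromosome), bits_per_student):
        gene = chromosome[start:start + bits_per_student]
        groups.append(int(gene, 2))  # binary gene -> group index, e.g. "011" -> 3
    return groups

# First four students of the example chromosome on the slide.
print(decode("011010010100"))  # [3, 2, 2, 4]
```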

Problem 2: Fitness
• A good fitness function is:
– between 0 (awful) and 1 (optimal)
– a correct ordering of individuals with respect to their closeness to the optimal solution
– informative, and indicative of relative fitness
– pragmatic about the boundary conditions
– simple and fast to calculate

Composite Fitness
• Assume there are n different, possibly independent fitness criteria. Let $f_1, f_2, \dots, f_n$ be the individual fitness functions that order the solutions according to individual criteria. The total fitness function is

$$f = \sum_{i=1}^{n} \alpha_i f_i$$

where $\alpha_i$ are coefficients to be determined
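As a small illustration, the sketch below combines individual fitness terms under the assumption that the total fitness is a weighted sum of the terms. The name `composite_fitness` is hypothetical, and the default of equal coefficients that sum to 1 (which keeps the total between 0 and 1) is a choice made for this example.

```python
def composite_fitness(terms, coefficients=None):
    """Weighted sum of individual fitness terms (assumed form)."""
    if coefficients is None:
        # Equal coefficients that sum to 1, so the total stays in [0, 1].
        coefficients = [1.0 / len(terms)] * len(terms)
    if len(coefficients) != len(terms):
        raise ValueError("need one coefficient per fitness term")
    return sum(a * f for a, f in zip(coefficients, terms))
```

Changing the coefficients shifts the trade-off between criteria, for example weighting balance more heavily than interest similarity.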

$f_1$: Interest Term
N : number of students
M : number of groups
S : number of interests
$p_i$ : interest vector of student i
$g_j$ : mean interest vector of group j
$\delta_{ij}$ : Kronecker delta (1 if student i is in group j, 0 otherwise)

$$f_1 = \frac{9SN - \sum_{i=1}^{N}\sum_{j=1}^{M} \delta_{ij}\,(p_i - g_j)^2}{9SN}$$
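A sketch of the interest term in Python follows. It reads $(p_i - g_j)^2$ as the squared Euclidean distance between a student's interest vector and the mean vector of that student's group, and it assumes ratings on a 1 to 4 scale, which is where the factor 9 (the largest squared difference per component) would come from. The function and parameter names are hypothetical.

```python
def interest_term(interests, assignment, n_groups):
    """f1 = (9SN - sum_i |p_i - g_group(i)|^2) / (9SN).

    interests:  one interest vector per student
    assignment: group index (0-based) of each student
    """
    n, s = len(interests), len(interests[0])
    # Accumulate per-group sums to obtain the mean interest vectors g_j.
    sums = [[0.0] * s for _ in range(n_groups)]
    counts = [0] * n_groups
    for p, j in zip(interests, assignment):
        counts[j] += 1
        for k, value in enumerate(p):
            sums[j][k] += value
    means = [[v / c for v in row] if c else row
             for row, c in zip(sums, counts)]
    # Total squared distance of every student to the mean of their own group.
    squared = sum((value - means[j][k]) ** 2
                  for p, j in zip(interests, assignment)
                  for k, value in enumerate(p))
    return (9 * s * n - squared) / (9 * s * n)
```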

Problem with $f_1$
• $9SN$ is too large a normalization factor: all decent individuals (with small distances from the mean) will have $f_1$ very close to 1.
• General solution: replace the normalization by the maximum distance, $(dist_{max} - dist)/dist_{max}$, with one based on the average distance ($z$, average_dist).
[Revised $f_1$ formula garbled in extraction. The surviving fragments are the double sum $\sum_{i=1}^{N}\sum_{j=1}^{M}\delta_{ij}\,(p_i - g_j)^2$, the factor $SN$, and a constant printed as "8.0".]

$f_2$: Balance Term
N : number of students
M : number of groups
$n_j$ : number of students in group j

$$f_2 = \frac{N^2 - \sum_{j=1}^{M}\left(n_j - \frac{N}{M}\right)^2}{N^2}$$
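The balance term is simple enough to check by hand. The sketch below (function name hypothetical) penalizes the squared deviation of each group size from the ideal size N/M and normalizes by N². Applied to the group sizes of the nearest-neighbor result shown later in the slides (one group of 45 and seven singletons), it gives about 0.3735, which agrees with the first fitness term reported for that clustering.

```python
from collections import Counter

def balance_term(assignment, n_groups):
    """f2 = (N^2 - sum_j (n_j - N/M)^2) / N^2."""
    n = len(assignment)
    sizes = Counter(assignment)          # n_j for every non-empty group
    ideal = n / n_groups                 # perfectly balanced group size N/M
    spread = sum((sizes.get(j, 0) - ideal) ** 2 for j in range(n_groups))
    return (n * n - spread) / (n * n)

# Group sizes of the nearest-neighbor clustering: 45, then seven groups of 1.
sizes = [45, 1, 1, 1, 1, 1, 1, 1]
assignment = [j for j, size in enumerate(sizes) for _ in range(size)]
print(round(balance_term(assignment, 8), 6))  # 0.373521
```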

$f_3$: Basic Skills Term
M : number of groups
B : number of basic skills
$b_{ik}$ : kth skill of student i
$\delta_{ij}$ : Kronecker delta

$$f_3 = \frac{9MB - \sum_{j=1}^{M}\sum_{k=1}^{B}\left(4 - \max_i(\delta_{ij}\,b_{ik})\right)^2}{9MB}$$
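A possible Python reading of the basic skills term: for each group and each skill, take the best skill level among the group's members and penalize its squared shortfall from the top level 4. The 1 to 4 skill scale and all names below are assumptions of this sketch.

```python
def basic_skills_term(skills, assignment, n_groups):
    """f3 = (9MB - sum_j sum_k (4 - best level of skill k in group j)^2) / (9MB).

    skills:     one vector of B skill levels per student
    assignment: group index (0-based) of each student
    """
    b = len(skills[0])
    # best[j][k] is the highest level of skill k in group j; an empty group
    # keeps 0, which is what max_i(delta_ij * b_ik) would give.
    best = [[0] * b for _ in range(n_groups)]
    for levels, j in zip(skills, assignment):
        for k, level in enumerate(levels):
            best[j][k] = max(best[j][k], level)
    penalty = sum((4 - level) ** 2 for group in best for level in group)
    return (9 * n_groups * b - penalty) / (9 * n_groups * b)
```

Because only the best member counts, a single expert per skill is enough for a group to score perfectly on that skill.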

$f_4$: Knowledge Term
M : number of groups
S : number of interests
$h_{ik}$ : kth knowledge term of student i
$\delta_{ij}$ : Kronecker delta
$\gamma_{jk}$ : 1 if the kth interest term is among the first three interests of group j, 0 otherwise.

$$f_4 = \frac{27M - \sum_{j=1}^{M}\sum_{k=1}^{S}\gamma_{jk}\left(4 - \max_i(\delta_{ij}\,h_{ik})\right)^2}{27M}$$
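The knowledge term can be sketched the same way as the basic skills term, restricted to each group's three leading interests. The sketch assumes the indicator for "first three interests" acts as a selector on the squared shortfall, and that knowledge is rated on a 1 to 4 scale. The parameter `top_interests` (the indices of the three leading interests per group) and the function name are hypothetical.

```python
def knowledge_term(knowledge, top_interests, assignment, n_groups):
    """f4 = (27M - sum_j sum_{k in top3(j)} (4 - best knowledge k in j)^2) / (27M).

    knowledge:     one vector of S knowledge levels per student
    top_interests: for each group, the indices of its three leading interests
    assignment:    group index (0-based) of each student
    """
    penalty = 0.0
    for j in range(n_groups):
        members = [h for h, g in zip(knowledge, assignment) if g == j]
        for k in top_interests[j]:
            # Best knowledge of topic k inside group j (0 for an empty group).
            best = max((h[k] for h in members), default=0)
            penalty += (4 - best) ** 2
    return (27 * n_groups - penalty) / (27 * n_groups)
```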

GA parameters
• Population size: 100
• Generations: 30
• Crossover probability: 0.4 (single point)
• Mutation probability: 0.001
• Equal coefficients

Some entertaining facts about the dataset

Basic skills

| Skill | Average | Experts | Beginners |
|---|---|---|---|
| Mathematics | 2.83 | 9 | 4 |
| Programming | 2.75 | 14 | 11 |
| English | 3.10 | 19 | 1 |
| Statistics | 2.87 | 8 | 1 |

Interests
[Bar chart: interest level per topic, axis from 0.0 to 3.5. Topics in chart order: Self-organization, Computer Science, Multi-Agent Systems, Evolution, Biology, Neural Nets & Simulation, Information Theory, Economics, Optimization, Cognitive Science, Physics, Social Networks, Psychology, Neuroscience, Philosophy, Anthropology, Quantum Consciousness.]

Knowledge
[Bar chart: knowledge level per topic, axis from 0.0 to 3.0. Topics in chart order: Computer Science, Evolution, Physics, Optimization, Multi-Agent Systems, Neural Nets & Simulation, Biology, Self-organization, Information Theory, Economics, Philosophy, Cognitive Science, Psychology, Social Networks, Neuroscience, Anthropology, Quantum Consciousness.]

TOP 10 knowledge-seeking people
Irina
Anton
Mourad
Zoltan
Anukool
Angel
Lyudmila
Mianlai
Aaron
Arthur

TOP 10 knowledgeable people
Anton
Louise
Arndt
Angel
Suzanne
Mark
Nilanjana
Wojciech
Albert
Aaron

Some serious results

Clustering of interest vectors with
• Nearest neighbor
• Furthest neighbor
• Average linkage
• Ward linkage
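The four linkage criteria differ only in how they measure the distance d(r, s) between two clusters r and s. The sketch below writes each one out for clusters given as lists of points, assuming Euclidean distance between interest vectors (the slides do not name the metric). The function names are hypothetical; a full hierarchical clustering would repeatedly merge the pair of clusters with the smallest d(r, s).

```python
import math

def nearest_neighbor(r, s):
    """Single linkage: distance between the closest pair of points."""
    return min(math.dist(a, b) for a in r for b in s)

def furthest_neighbor(r, s):
    """Complete linkage: distance between the most distant pair of points."""
    return max(math.dist(a, b) for a in r for b in s)

def average_linkage(r, s):
    """Mean distance over all pairs with one point in each cluster."""
    return sum(math.dist(a, b) for a in r for b in s) / (len(r) * len(s))

def centroid(cluster):
    """Component-wise mean of the points in a cluster."""
    return [sum(column) / len(cluster) for column in zip(*cluster)]

def ward_linkage(r, s):
    """Ward: squared centroid distance scaled by n_r * n_s / (n_r + n_s)."""
    n_r, n_s = len(r), len(s)
    return n_r * n_s * math.dist(centroid(r), centroid(s)) ** 2 / (n_r + n_s)
```

Nearest neighbor tends to chain points into one large cluster, which is consistent with the single group of 45 students in the nearest-neighbor result; Ward linkage favors compact clusters of similar size.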

Nearest neighbor

$$d(r,s) = \min\big(\mathrm{dist}(x_{ri}, x_{sj})\big), \quad i \in (1,\dots,n_r),\ j \in (1,\dots,n_s)$$

FITNESS TERMS: 0,37352071 0,847012823 0,722222222 0,916006652
GROUP 1: Natalia, Nilanjana, Angel, Arndt, Alexander, Wojciech, Frederic, Jason, Gerard, Ferenc, Sergey, Milica, Zoltan, Bartlomiej, Aaron, Pau, Sergey, Jasper, Matthew, Mark, Eva, Volodymyr, Victor, Oleksiy, Anukool, Hilary, Lyudmila, Alex, Vaclav, Anton, Mourad, Nicholas, Arthur, Carolyn, Stanislav, Denis, Suzanne, Albert, Lisa, Vadim, Pavel, Sergiy, Valentin, Mianlai, Gordan
Interests: Self-organization (2,98) Evolution (2,8) Computer Science (2,78)
GROUP 2: Louise
Interests: Anthropology (4) Biology (4) Cognitive Science (4)
GROUP 3: Tatyana
Interests: Cognitive Science (4) Computer Science (4) Information Theory (4)
GROUP 4: Gabriella
Interests: Computer Science (4) Information Theory (4) Optimization (4)
GROUP 5: Ana-Maria
Interests: Social Networks (4) Cognitive Science (3) Multi-Agent Systems (3)
GROUP 6: Angelica
Interests: Cognitive Science (4) Computer Science (4) Multi-Agent Systems (4)
GROUP 7: Christophe
Interests: Cognitive Science (4) Neural Nets & Simulation (4) Psychology (4)
GROUP 8: Irina
Interests: Cognitive Science (4) Computer Science (4) Information Theory (4)

Furthest neighbor

$$d(r,s) = \max\big(\mathrm{dist}(x_{ri}, x_{sj})\big), \quad i \in (1,\dots,n_r),\ j \in (1,\dots,n_s)$$

FITNESS TERMS: 0,926035503 0,887127441 0,958333333 0,964728892
GROUP 1: Hilary, Angel, Mark, Mourad, Jason
Interests: Psychology (3,8) Evolution (3,6) Anthropology (3,2)
GROUP 2: Bartlomiej, Louise, Alexander, Matthew, Valentin, Angelica, Victor
Interests: Evolution (3,43) Multi-Agent Systems (3,29) Social Networks (3,29)
GROUP 3: Suzanne, Aaron, Alex, Arndt, Wojciech
Interests: Evolution (3,57) Biology (3,2929) Self-organization (3,14285714)
GROUP 4: Lisa, Gerard
Interests: Social Networks (4) Cognitive Science (3) Multi-Agent Systems (3)
GROUP 5: Sergiy, Albert, Christophe
Interests: Information Theory (2,625) Physics (2,625) Self-organization (2,625)
GROUP 6: Natalia, Nilanjana, Lyudmila, Vaclav, Anton, Frederic, Arthur, Ferenc, Stanislav, Milica, Denis, Sergey, Jasper, Pavel, Mianlai, Volodymyr, Gabriella, Oleksiy, Anukool
Interests: Cognitive Science (4) Computer Science (4) Multi-Agent Systems (4)
GROUP 7: Pau, Vadim, Ana-Maria, Eva, Nicholas, Sergey, Gordan
Interests: Cognitive Science (3,33) Neural Nets & Simulation (3,33) Biology (3)
GROUP 8: Irina, Zoltan, Tatyana, Carolyn
Interests: Quantum Consciousness (3,75) Cognitive Science (3,5) Computer Science (3,5)

Average linkage

$$d(r,s) = \frac{1}{n_r n_s}\sum_{i=1}^{n_r}\sum_{j=1}^{n_s} \mathrm{dist}(x_{ri}, x_{sj})$$

FITNESS TERMS: 0,821745562 0,879219281 0,902777778 0,951247491
GROUP 1: Natalia, Nilanjana, Angel, Wojciech, Frederic, Jason, Ferenc, Milica, Aaron, Sergey, Jasper, Mark, Volodymyr, Gabriella, Oleksiy, Hilary, Lyudmila, Vaclav, Anton, Mourad, Arthur, Stanislav, Denis, Suzanne, Pavel, Mianlai
Interests: Self-organization (3,15) Multi-Agent Systems (3,04) Computer Science (3)
GROUP 2: Anukool
Interests: Computer Science (4) Neuroscience (4) Optimization (4)
GROUP 3: Bartlomiej, Lisa, Alexander, Matthew, Valentin, Gerard, Victor
Interests: Evolution (3,57) Biology (3,29) Self-organization (3,14)
GROUP 4: Ana-Maria
Interests: Social Networks (4) Cognitive Science (3) Multi-Agent Systems (3)
GROUP 5: Pau, Alex, Arndt, Vadim, Eva, Nicholas, Sergey, Gordan
Interests: Information Theory (2,625) Physics (2,625) Self-organization (2,625)
GROUP 6: Angelica, Louise
Interests: Cognitive Science (4) Computer Science (4) Multi-Agent Systems (4)
GROUP 7: Sergiy, Albert, Christophe
Interests: Cognitive Science (3,33) Neural Nets & Simulation (3,333) Biology (3)
GROUP 8: Irina, Zoltan, Tatyana, Carolyn
Interests: Quantum Consciousness (3,75) Cognitive Science (3,5) Computer Science (3,5)

Ward linkage

$$d(r,s) = \frac{n_r\, n_s\, \mathrm{dist}^2(\bar{x}_r, \bar{x}_s)}{n_r + n_s}$$

where $\bar{x}_r$ and $\bar{x}_s$ are the centroids of clusters r and s.

FITNESS TERMS: 0,968195266 0,891630074 0,972222222 0,965034915
GROUP 1: Lisa, Alex, Arndt, Frederic, Gerard
Interests: Self-organization (3,6) Biology (3,4) Evolution (3,4)
GROUP 2: Pau, Vadim, Ana-Maria, Eva, Nicholas, Sergey, Gabriella, Gordan
Interests: Physics (2,625) Self-organization (2,625) Computer Science (2,5)
GROUP 3: Bartlomiej, Matthew, Valentin, Alexander
Interests: Economics (3,25) Evolution (3,25) Biology (3)
GROUP 4: Louise, Mianlai, Volodymyr, Victor, Angelica
Interests: Computer Science (4) Multi-Agent Systems (4) Self-organization (3,8)
GROUP 5: Sergiy, Albert, Christophe
Interests: Cognitive Science (3,33) Neural Nets & Simulation (3,33) Biology (3)
GROUP 6: Stanislav, Natalia, Denis, Sergey, Vaclav, Anton, Pavel, Ferenc, Milica, Oleksiy
Interests: Computer Science (3,4) Neural Nets & Simulation (3,4) Economics (3,3)
GROUP 7: Irina, Zoltan, Tatyana, Carolyn
Interests: Quantum Consciousness (3,75) Cognitive Science (3,5) Computer Science (3,5)
GROUP 8: Hilary, Lyudmila, Nilanjana, Angel, Wojciech, Mourad, Jason, Arthur, Suzanne, Aaron, Jasper, Mark, Anukool
Interests: Biology (3,38) Evolution (3,38) Self-organization (3,23)

FITNESS TERMS (GA): 0,988905325 0,845403674 0,989583333 0,981469795
GROUP 1: Arndt, Tatyana, Mianlai, Sergey, Zoltan
Interests: Self-organization (4) Neural Nets & Simulation (3,6) Physics (3,4)
GROUP 2: Denis, Pau, Alex, Ana-Maria, Lisa, Vadim, Sergiy, Eva, Milica
Interests: Computer Science (2,56) Neural Nets & Simulation (2,56) Evolution (2,44)
GROUP 3: Stanislav, Natalia, Nilanjana, Gordan, Mourad, Gerard, Ferenc, Victor, Valentin, Oleksiy
Interests: Computer Science (3,1) Multi-Agent Systems (3,1) Self-organization (2,9)
GROUP 4: Suzanne, Lyudmila, Angel, Wojciech, Mark, Anton, Nicholas
Interests: Self-organization (3,43) Evolution (3,14) Psychology (3)
GROUP 5: Christophe, Aaron, Hilary, Albert, Alexander, Frederic
Interests: Cognitive Science (3) Biology (2,83) Evolution (2,67)
GROUP 6: Bartlomiej, Sergey, Jasper, Vaclav, Pavel, Gabriella
Interests: Economics (3,33) Self-organization (3) Computer Science (2,67)
GROUP 7: Matthew, Angelica, Louise, Arthur
Interests: Biology (3,75) Evolution (3,5) Self-organization (3,5)
GROUP 8: Anukool, Irina, Jason, Volodymyr, Carolyn
Interests: Computer Science (3,2) Information Theory (3,2) Philosophy (3,2)

Comparison of results

| Term | Nearest Neighbour | Furthest Neighbour | Average Linkage | Ward Linkage | GA |
|---|---|---|---|---|---|
| Balance | 0,37 | 0,93 | 0,82 | 0,97 | 0,99 |
| Interests | 0,85 | 0,89 | 0,88 | 0,89 | 0,85 |
| Basic Skills | 0,72 | 0,96 | 0,90 | 0,97 | 0,99 |
| Knowledge | 0,92 | 0,96 | 0,95 | 0,97 | 0,98 |

GOOD BYE, CSSS 2002