Revisiting evolutionary information filtering
Transcript of Revisiting evolutionary information filtering
[Page 1]
Revisiting Evolutionary Information Filtering
Nikolaos Nanas, Centre for Research and Technology Thessaly, GREECE
Stefanos Kodovas, University of Thessaly, GREECE
Manolis Vavalis, University of Thessaly, GREECE
[Page 2]
Outline
Adaptive Information Filtering – brief introduction
Evolutionary Information Filtering – review
Diversity & dimensionality – theoretical issues
Experimental evaluation
• Methodology – a test-bed
• Results – not a success
• Discussion – interesting observations
Conclusions and future work
[Page 3]
Information Overload is still around
[Page 4]
Adaptive Information Filtering in the case of textual information
[Page 5]
Adaptive Information Filtering (AIF)
challenging problem with no established solution
complex and dynamic
• multiple and changing user interests
• changing information environment
crucial issues for successful AIF
• profile representation
• profile adaptation
[Page 6]
Evolutionary Information Filtering with the Vector Space Model
Profile adaptation through the evolution of user profiles.
[Page 7]
Evolutionary Information Filtering
• “A Review of Evolutionary and Immune-Inspired Information Filtering”, Natural Computing, 2009
• A common vector space with as many dimensions as the number of unique keywords
• A population of profiles that collectively represent the user’s interests
• Both profiles and documents are represented as (weighted) vectors in this space
• Trigonometric measures of similarity for comparing profile vectors to document vectors
• Fitness function based on (explicit or implicit) user feedback
• reward profiles that assigned a high relevance score to relevant documents and vice versa
• fitness is updated in proportion to user feedback
• average score of relevant documents
• ratio of successful evaluations
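The trigonometric similarity measure mentioned above is typically cosine similarity. A minimal sketch over sparse keyword-weight dictionaries; the function name and example weights are illustrative, not taken from the paper:

```python
import math

def cosine_similarity(profile: dict, document: dict) -> float:
    """Cosine of the angle between two sparse keyword-weight vectors."""
    dot = sum(w * document.get(k, 0.0) for k, w in profile.items())
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    norm_d = math.sqrt(sum(w * w for w in document.values()))
    if norm_p == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_p * norm_d)

# Made-up profile and document vectors over a shared keyword space
profile = {"oil": 0.8, "crude": 0.6, "barrel": 0.2}
doc = {"oil": 0.5, "price": 0.7, "crude": 0.4}
score = cosine_similarity(profile, doc)
```

Only keywords present in both vectors contribute to the dot product, which is what makes the sparse-dictionary representation practical in a space with tens of thousands of dimensions.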
[Page 8]
Evolutionary Information Filtering
profile initialisation is not random
selection
• fixed percentage of best individuals
• variable percentage
• roulette wheel
crossover
• single-point, two-point, three-point
• variable percentage
• roulette wheel
mutation
• keyword replacement
• random weight modification
steady-state replacement
• offspring typically replace less fit individuals
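The operator pipeline above (selection, single-point crossover, weight mutation, steady-state replacement) can be sketched as one generation step. Population size, elite fraction, and mutation parameters here are illustrative, not the paper's settings:

```python
import random

def evolve_one_generation(population, fitness, elite_frac=0.25, mutation_rate=0.1):
    """One steady-state step: select the fittest, crossover, mutate,
    and have offspring replace the least fit individuals.

    `population` is a list of equal-length weight vectors (lists of floats);
    `fitness` is a parallel list of scores.
    """
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    n_elite = max(2, int(elite_frac * len(population)))
    parents = [population[i] for i in ranked[:n_elite]]

    offspring = []
    while len(offspring) < n_elite:
        p1, p2 = random.sample(parents, 2)
        cut = random.randrange(1, len(p1))            # single-point crossover
        child = p1[:cut] + p2[cut:]
        for j in range(len(child)):                   # random weight modification
            if random.random() < mutation_rate:
                child[j] += random.gauss(0.0, 0.1)
        offspring.append(child)

    for worst_i, child in zip(reversed(ranked), offspring):  # replace the worst
        population[worst_i] = child
    return population
```

Roulette-wheel selection or multi-point crossover, as listed above, would slot into the same loop in place of the fixed-percentage variants sketched here.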
[Page 9]
Diversity Issues
AIF is not a classic optimisation problem
• online learning problem
• reminiscent of Multimodal Dynamic Optimisation (MDO)
Traditional GAs suffer in the case of MDO due to diversity loss.
Four types of remedies:
1. adjust mutation rate when changes are observed
2. spread the population
3. memory of previous generations
4. multiple subpopulations
• in “Multimodal Dynamic Optimisation: from Evolutionary Algorithms to Artificial Immune Systems”, 2007
• intrinsic diversity problems due to:
• selection based on relative fitness
• no developmental process
• fixed population size
[Page 10]
Dimensionality Issues
• A vector space with a large number of dimensions (keywords) is required for successful AIF
• In a multi-dimensional space:
• the volume increases exponentially with the number of dimensions
• distance based measures become meaningless as points become equidistant
• the discriminatory power of pair-wise distances is significantly affected
• scalar metrics cannot differentiate between vectors with distributed and concentrated differences
• in a multi-dimensional keyword space the ability of GAs to achieve profile adaptation is affected because:
• the number of possible weighted keyword combinations increases exponentially with the number of dimensions
• crossover and mutation cannot randomly produce the right combination of weighted keywords
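The distance-concentration effect above can be checked empirically. This small simulation (number of points, dimensions, and seed chosen arbitrarily) compares the relative contrast of pairwise Euclidean distances in 2 versus 1000 dimensions:

```python
import math
import random

def relative_contrast(dim, n_points=50, seed=0):
    """(d_max - d_min) / d_min over all pairwise distances of random
    points drawn uniformly from the unit hypercube in `dim` dimensions."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.dist(pts[i], pts[j])
        for i in range(n_points) for j in range(i + 1, n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)

low = relative_contrast(2)      # large: distances vary widely in 2-D
high = relative_contrast(1000)  # small: points become near-equidistant
```

As the dimensionality grows, the contrast shrinks toward zero, which is exactly why distance-based similarity loses discriminatory power in a keyword space with tens of thousands of dimensions.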
[Page 11]
Experimental Evaluation: Dataset
Reuters-21578
• 21,578 news stories that appeared on the Reuters newswire in 1987
• documents are ordered according to publication date
• 135 topic categories
• experiments concentrate on the 23 topics with at least 100 relevant documents
document pre-processing
• stop word removal
• stemming with Porter’s algorithm
• weighting with Term Frequency Inverse Document Frequency (TFIDF)
words with large average TFIDF are selected to build the keyword space
| topic code | size | topic code | size |
|---|---|---|---|
| earn | 3987 | money-supply | 190 |
| acq | 2448 | sugar | 184 |
| money-fx | 801 | gnp | 163 |
| crude | 634 | coffee | 145 |
| grain | 628 | veg-oil | 137 |
| trade | 552 | gold | 135 |
| interest | 513 | nat-gas | 130 |
| wheat | 306 | soybean | 120 |
| ship | 305 | bop | 116 |
| corn | 254 | livestock | 114 |
| dlr | 217 | cpi | 112 |
| oilseed | 192 | | |
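The TF-IDF weighting step in the pre-processing pipeline might look as follows; this is a common TF*IDF variant, not necessarily the exact formula used in the experiments:

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Weight each document's terms by term frequency times inverse
    document frequency (idf = log(N / df))."""
    n_docs = len(documents)
    df = Counter()                     # document frequency per term
    for doc in documents:
        df.update(set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)              # term frequency within this document
        vectors.append({
            term: count * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

# Tiny made-up corpus of tokenised (stopped, stemmed) documents
docs = [["oil", "price", "oil"], ["grain", "price"], ["oil", "grain", "export"]]
vecs = tfidf_vectors(docs)
```

Terms that occur in every document get weight zero, so averaging TFIDF over the corpus, as in the keyword-selection step above, favours terms that are both frequent and discriminative.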
[Page 12]
Experimental Evaluation: Baseline Experiment
[Page 13]
Baseline Results
• as the number of extracted words increases, the AUP (Average Uninterpolated Precision) values increase
• for a small number of extracted keywords the results are biased towards topics with a large number of relevant documents
• the best results are achieved when all extracted keywords are used
• if we wish to represent a range of topics then a multidimensional space is required
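AUP averages the precision at the rank of each relevant document in the filtered ordering. A minimal sketch, assuming the usual definition of average uninterpolated precision:

```python
def average_uninterpolated_precision(ranking, relevant):
    """AUP: mean of the precision values at the rank of each relevant
    document; relevant documents never retrieved contribute zero.

    `ranking` is a list of document ids ordered by profile score;
    `relevant` is the set of ids judged relevant for the topic.
    """
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

aup = average_uninterpolated_precision(["d3", "d1", "d7", "d2"], {"d3", "d2"})
# relevant docs sit at ranks 1 and 4 -> (1/1 + 2/4) / 2 = 0.75
```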
[Page 14]
Experimental Evaluation: Evolutionary Experiments
a vector space comprising 31298 keywords
The basic Genetic Algorithm:
• a population of 100 profiles
• each profile is a weighted keyword vector (randomly initialised)
• the same random initial population is used in all experiments
• documents are evaluated in order using the inner product
• new fitness = old fitness + relevance score
• the 25% fittest profiles are selected for reproduction
• single-point crossover
• mutation through random weight modification
• the offspring replace the 25% worst profiles
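The scoring and fitness-update rule above ("new fitness = old fitness + relevance score") can be sketched as follows; treating user feedback as a +1/-1 sign on the score is one plausible reading, not something the slides state:

```python
def inner_product(profile, document):
    """Relevance score of a document under a sparse profile vector."""
    return sum(w * document.get(term, 0.0) for term, w in profile.items())

def update_fitness(fitness, profiles, document, feedback):
    """Accumulate each profile's fitness by the relevance score it
    assigned, signed by user feedback (+1 relevant, -1 non-relevant).
    The signing convention is an assumption for illustration."""
    for i, profile in enumerate(profiles):
        fitness[i] += feedback * inner_product(profile, document)
    return fitness
```

Under this rule, profiles that score relevant documents highly accumulate fitness fastest and are the ones selected for reproduction.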
two further variations of the basic GA
• GA_init: initialisation using the first 100 relevant documents per topic.
• GA_init + learning: a memetic algorithm (MA) that uses Rocchio’s learning algorithm
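Rocchio's learning algorithm moves a profile toward the centroid of relevant documents and away from non-relevant ones. A sketch over sparse vectors; the coefficients are the textbook defaults, not necessarily those used in the paper:

```python
def rocchio_update(profile, relevant_docs, nonrelevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio's rule: new = alpha*profile + beta*centroid(relevant)
    - gamma*centroid(non-relevant), computed per keyword."""
    terms = set(profile)
    for d in relevant_docs + nonrelevant_docs:
        terms |= set(d)

    def centroid(docs, term):
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    return {
        t: alpha * profile.get(t, 0.0)
           + beta * centroid(relevant_docs, t)
           - gamma * centroid(nonrelevant_docs, t)
        for t in terms
    }
```

Embedding such a local-learning step in each generation is what turns the GA into a memetic algorithm.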
[Page 15]
Comparative Results: accuracy
• y-axis: best AUP achieved in 50 generations (a bias in favour of the GA)
• baseline results are included
• additional results for ranking by date
Findings:
• the GA performs worse than the baseline
• marginal improvements for non-random initialisation
• significant improvement when learning is introduced
• the MA is better only for some topics of small size
[Page 16]
Comparative Results: learning
• y-axis: average AUP over all topics after each generation
• x-axis: number of generations
• embedded figure focuses on GA and GA_init
Findings:
• the GA shows essentially no improvement
• better initial performance and learning rate for non-random initialisation (GA_init)
• much steeper learning curve when learning is introduced (GA_init + learning).
[Page 17]
Conclusions
The basic GA fails to learn the topic of interest.
• the right combination of keyword weights cannot be produced at random.
• the GA lacks a mechanism for appropriately updating keyword weights.
• performance depends on the weighted keywords that initialisation produced.
When the GA is initialised based on relevant documents
• then the initial set of weighted keywords produces better filtering results
The introduction of learning allows for further improvements in the initial keyword weights.
• still worse than the baseline experiment despite the 50 generations
• this is possibly due to the negative effect of the genetic operations
[Page 18]
Discussion
Our experimental results do not agree with the promising results reported in the literature
• we did not re-implement an existing approach, but adopted existing techniques
• AIF is a complex problem that cannot be easily tackled with weighted keywords in a multi-dimensional space
• comparative experiments between GAs and other machine learning algorithms have been missing from the AIF literature
large differences observed between the GA and the baseline algorithm
• despite the biased comparison in favour of the GA
• more fundamental alternatives which are not based on vector representations
• the choice of representation should facilitate the learning task
• external remedies like those adopted for MDO are not practical
we wish to revive the research community’s interest in AIF
• biologically inspired solutions are well suited to the problem
• appropriate experimental methodologies that reflect the complexity and dynamics of AIF are required