Using a Background Neural Model in a Digital Library
Jean-Charles LAMIREL, Jieh HSIANG, Liu WJ
LORIA, Nancy, France
Jean-Charles Lamirel [email protected] INRIA-NSC 3rd SFW

The CORTEX team
Research areas: biological-like models for intelligent information management
Applications:
  Autonomous robotics and on-board intelligence
  Numerical classification (vs. symbolic)
  Information retrieval and discovery

The CORTEX information retrieval and discovery activity
Main themes of research:
  Interface for personalized access to information
  Intelligent multimedia data mining
  Web - documentary database interaction
Collaborations:
  ORPAILLEUR INRIA team, INIST, La Villette, NSC Taiwan, industry...
  European projects: SCHOLNET, EISCTES

Some examples of application
Adaptive environment for assistance to investigation on the Web
Multi-topographic navigation (MultiSOM):
  For multimedia data mining
  For data mining on full text (patents)
Numerical-symbolic collaboration

Presentation summary
Introduction:
  Basic set of functionalities for information discovery
  Limitations of the classical methods for information discovery
The MultiSOM model + Butterfly application:
  Basic behaviour
  Extensions
Management of textual information

Basic set of functionalities for information discovery
Synthetic view of the studied domain =
  Distribution of the thematic indicators of the domain
  Highlighting of regularities / weak signals
  Management of several types of synthesis
Interactivity =
  Dynamic data mixture / type of need
  Choice of the meta-orientation of the investigation
  Setting of the granularity level of the analysis
Multimedia

Managing different kinds of queries for discovery
Exploratory (no goal): « What are the contents of the database? »
Thematic (general orientation): « Images of space conquest »
Connotative (hidden goal, indirect search): « Impressive images of human technology »
Precise: « Images of the Armstrong moonwalk, July 69 »

Limitations of the classical methods for information discovery
Overall view of the studied domain =
  Noise
  Complex interpretation (hidden information)
  Local views necessarily independent
  Weak signals difficult to highlight
No interactivity =
  Passive classification
  Predefined ways to access information

Neural methods for information cartography
Topographic learning (SOM) = classification + projection
Multi-viewpoint modelling capabilities (MultiSOM)
Intuitive self-organization of information
Active maps (IR + navigation)
Low human intervention during construction
Multimedia capabilities

Butterfly museum application
Different kinds of query:
  Query by keywords
  Query by example
Different kinds of criteria:
  Colour (automatic)
  Shape (manual)
  Texture (manual)
Problems:
  Hand-made classifications
  Combination of results coming from different criteria
[Figure: example of description: Yellow = very strong, Red = not, Edge = strong, Spot = middle, …]

Butterfly application
[Diagram labels: viewpoint classifications; global and/or cross-viewpoint classifications; query by keywords; query by example; adding new individuals; user interface]
Butterfly application automation
[Diagram labels: user interface; combination of results; validation of insertion or classification recalculation]

Basic topographic map building
Data description:
  Document (image) = index vector, e.g. a vector of characteristics
  Weighting of the modalities of the characteristics (very strong = 1, …)
  Optional IDF weighting (weak signal detection)
[Figure labels: weighted description, texture, IDF]
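
The data description step can be sketched as follows. This is only an illustration under stated assumptions: the three butterfly descriptions are hypothetical, only "very strong = 1" comes from the slides (the other modality weights are invented), and the IDF form log(N / df) is one common choice rather than the one confirmed for this system.

```python
import math

# Only "very strong = 1" comes from the slides; the other weights are assumed.
MODALITY_WEIGHT = {"very strong": 1.0, "strong": 0.75, "middle": 0.5, "not": 0.0}

def index_vector(description, features):
    """Turn a {feature: modality} description into a weighted vector over `features`."""
    return [MODALITY_WEIGHT[description.get(f, "not")] for f in features]

def idf_weights(vectors):
    """Assumed IDF form: log(N / df), df = number of documents with a non-zero weight."""
    n = len(vectors)
    weights = []
    for j in range(len(vectors[0])):
        df = sum(1 for v in vectors if v[j] > 0)
        weights.append(math.log(n / df) if df else 0.0)
    return weights

def apply_idf(vectors):
    """Multiply every component by the IDF weight of its feature."""
    idf = idf_weights(vectors)
    return [[x * w for x, w in zip(v, idf)] for v in vectors]

# Hypothetical descriptions, in the style "Yellow = very strong, Edge = strong, ...".
features = ["yellow", "red", "edge", "spot"]
butterflies = [
    {"yellow": "very strong", "edge": "strong", "spot": "middle"},
    {"yellow": "strong", "red": "middle"},
    {"yellow": "middle", "edge": "very strong"},
]
vectors = [index_vector(b, features) for b in butterflies]
weighted = apply_idf(vectors)
```

With this IDF form, a characteristic present in every image (here "yellow") gets weight 0, while a rare one (here "red") is boosted, which is the intended effect for weak signal detection.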

Basic topographic map building
Predefined map parameter settings:
  Number of neurons
  Structure, e.g. 2D grid with square neighbourhood
Competitive learning:
  Selection of the winning neuron
  Influence on the neighbourhood
[Figure: current data (image) presented to the map at time T]
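
A minimal sketch of one competitive learning pass on a 2D grid with square neighbourhood is given below. It follows the generic SOM scheme (winner selection, then update of the winner and its neighbours); the learning rate, radius and decay schedule are assumed values, not parameters taken from the slides.

```python
import random

def winner(weights, x):
    """Grid position (row, col) of the neuron whose profile is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))  # squared Euclidean distance
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_step(weights, x, rate, radius):
    """Move the winning neuron and its square neighbourhood towards the data x."""
    wr, wc = winner(weights, x)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            if max(abs(r - wr), abs(c - wc)) <= radius:  # square neighbourhood
                for i in range(len(w)):
                    w[i] += rate * (x[i] - w[i])
    return wr, wc

def train(data, rows, cols, epochs=20, rate=0.5, radius=1, seed=0):
    """Train a rows x cols map; the decay schedule below is an assumption."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    for epoch in range(epochs):
        frac = 1 - epoch / epochs  # linearly decaying learning rate
        for x in data:
            # The neighbourhood shrinks to the winner alone in the second half.
            train_step(weights, x, rate * frac, radius if frac > 0.5 else 0)
    return weights
```

After training, each image is assigned to the class of its winning neuron, for example with `winner(weights, vector)`.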

Map labelization and zoning
Map labelization:
  Based on the best components of the profiles
  Class or member-oriented
  One single method is not sufficient
  => Gives an overview of the detected themes
Map zoning:
  Based on the SOM topographic properties
  Based on the best components of the class profiles
  => Gives an overview of the weights of the themes
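
The two labelization orientations can be illustrated with a short sketch. It assumes that "best components" means the highest-weighted characteristics; the function names, the feature list and the choice of k are hypothetical.

```python
features = ["yellow", "red", "edge", "spot"]  # hypothetical characteristics

def label_by_profile(profile, features, k=2):
    """Class-oriented: the k characteristics with the highest weight in the neuron profile."""
    ranked = sorted(zip(profile, features), key=lambda p: -p[0])
    return [f for w, f in ranked[:k] if w > 0]

def label_by_members(members, features, k=2):
    """Member-oriented: the k characteristics with the highest summed weight over the members."""
    totals = [sum(m[j] for m in members) for j in range(len(features))]
    return label_by_profile(totals, features, k)

print(label_by_profile([0.9, 0.0, 0.6, 0.2], features))
print(label_by_members([[1.0, 0.0, 0.0, 0.5], [0.0, 0.0, 0.0, 1.0]], features, k=1))
```

The two functions can disagree on the same class, which is one way to read the remark that a single method is not sufficient.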
[Figure: multimedia thematic cartography of « Butterfly », color viewpoint. Labels: theme « Yellow », theme « Green », central sub., list of theme members, image description]

The MultiSOM model
[Figure labels: viewpoint 1; viewpoint 2; basic map (core classification); on-line generalizations]

Map on-line generalization
Goal: synthesize the map contents by decreasing the number of neurons (classes)
Constraints:
  Preserve the map topographic properties
  No classification re-computation
Method: exploitation of the neighbourhood relations on the map
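
The slides do not give the exact generalization rule, so the following is only one possible reading: each 2x2 block of neighbouring neurons is merged into a single neuron whose profile is the average of the block, and whose members are the union of the block members. No retraining takes place, and neighbouring blocks stay neighbours on the coarser grid.

```python
def generalize(weights, members):
    """Halve each grid dimension by merging 2x2 blocks of neighbouring neurons.

    weights[r][c] is a neuron profile, members[r][c] the data assigned to it.
    Averaging the profiles is an assumption, not the documented MultiSOM rule.
    """
    rows, cols = len(weights), len(weights[0])
    new_w, new_m = [], []
    for r in range(0, rows, 2):
        w_row, m_row = [], []
        for c in range(0, cols, 2):
            block = [(i, j) for i in (r, r + 1) for j in (c, c + 1)
                     if i < rows and j < cols]
            dim = len(weights[r][c])
            w_row.append([sum(weights[i][j][d] for i, j in block) / len(block)
                          for d in range(dim)])
            m_row.append([x for i, j in block for x in members[i][j]])
        new_w.append(w_row)
        new_m.append(m_row)
    return new_w, new_m
```

Applying `generalize` repeatedly yields a stack of coarser and coarser maps derived from the same core classification.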

Semantic viewpoints
Subspace of the description space
Can be a field, a subset of keywords, ...
Possible overlapping sets
Concurrent or complementary viewpoints
=> Examples: indexer keywords, title keywords, authors, …, visual characteristics, sounds
=> Butterflies: color, shape, texture, …

Inter-map communication
Goal:
  Cope with the limitations of a global map
  Allow communication between viewpoints
Constraints: interpretable behaviour
Method: re-projected data = transmitter neurons
Two steps:
  1) Activation of a source map (directly or through a query)
  2) Transmission to target maps

Inter-map communication
A function with two modes:
  Possibilistic (weak thematic relations over viewpoints)
  Probabilistic (measure of the similarities between themes)
=> g = class belonging degree
[Figure: inter-map communication from the color map to the texture map for the butterflies]
Question: Are there regularities in the textures of yellow butterflies?
Response: YES, spots and edges
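
The transmission function itself did not survive in this transcript, so the sketch below is an assumed form that only respects what the slides state: activity goes from a source map to a target map through the data shared by both, weighted by a belonging degree g, with a possibilistic mode (here a maximum) and a probabilistic mode (here a weighted mean). The data and all names are hypothetical.

```python
def transmit(source_activity, source_class, target_class, g, mode="possibilistic"):
    """Propagate class activity from a source map to a target map through shared data.

    source_activity: {source class: activity in [0, 1]}
    source_class, target_class: {data id: class of that data on each map}
    g: {data id: degree of belonging of the data to its target class}
    """
    per_class = {}
    for d, t in target_class.items():
        a = source_activity.get(source_class[d], 0.0)
        per_class.setdefault(t, []).append((a * g[d], g[d]))
    result = {}
    for t, pairs in per_class.items():
        if mode == "possibilistic":
            # One activated member is enough: keeps weak thematic relations.
            result[t] = max(v for v, _ in pairs)
        else:
            # Weighted mean: share of the target class covered by the source theme.
            total = sum(w for _, w in pairs)
            result[t] = sum(v for v, _ in pairs) / total if total else 0.0
    return result

# Hypothetical butterflies classified on a color map and on a texture map.
color_class = {"b1": "yellow", "b2": "yellow", "b3": "green", "b4": "green"}
texture_class = {"b1": "spots", "b2": "edges", "b3": "edges", "b4": "veins"}
g = {"b1": 1.0, "b2": 1.0, "b3": 1.0, "b4": 1.0}
activity = {"yellow": 1.0}  # the theme "yellow" is activated on the source map

poss = transmit(activity, color_class, texture_class, g, "possibilistic")
prob = transmit(activity, color_class, texture_class, g, "probabilistic")
```

On this toy data both modes activate "spots" and "edges" and leave "veins" inactive; the probabilistic mode additionally ranks "spots" above "edges", because only half of the "edges" members are yellow.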

Compliance with IR operations
Question: Are there butterflies with spots AND veins?
[Figure: two map states, one labelled "Response = NO", the other "Response = YES"]

Remaining problems (to be solved)
Validation of the automatic classification results by the experts
Testing of different result merging methods
Test the use of prototype features in classification*
Realization of a Web interface for the maps
Compare the maps' built-in result combination mechanism with an external combination mechanism
Test the map capabilities for help in adding new individuals
Introduce textual data and combine it with images

Experimentation on patents (texts)
Goal: intelligent technological survey =
  Full text analysis of the patents
  Domain of oil engineering
  Provide answers to questions like:
    1. “What are the relationships between patentees?”
    2. “On which specific technology does a patentee work? What are the advantages of this specific technology? For which use?”
Basic experimental protocol
[Diagram labels: patents database; DILIB reformatting; patents in XML format; viewpoints definition; structured by viewpoints; nominal groups extraction; validated multi-indexes; MicroNOMAD MultiSOM; interactive maps for analysis]

Nominal groups extraction
1) Lexicographic analysis (compound terms)
2) Normalization:
Ex: “oil fabrication” and “oil engineering” => “oil engineering”
Results:
                                           Patentees  Title  Use  Advantages
Number of indexed documents                     1000   1000  745         624
Number of raw indexes generated                   73    605  252         231
Final number of indexes (after filtering)         32    589  234         207
Number of non-empty classes per map *             28     55   57          61

Patents reindexing
Selected viewpoints: title, use, advantages and patentees
[Figure: one map per viewpoint: Use, Advantages, Title (Components), Patentees]
Example of dynamic analysis
DYNAMIC DEDUCTION: Patentee « TONEN CORP. » is a specialist in the lubrication of the « automatic transmission ». It mainly produces oils based on « organo-molybdenum compound », which have the specific property of having a « friction coefficient stable on a wide range of temperature ».

Conclusion
Different viewpoints yield complementary results:
  Ex: indexer keywords = closed themes, title keywords = open themes, ...
Detection of indexation inconsistencies
Projection of the thematic pertinence of a query
Bilateral synergy: images <=> textual information
Very rich and flexible inter-map communication mechanism:
  Cross analysis between viewpoints, dynamics
  No limitation regarding viewpoint type and number

Perspectives
Sophisticated 2D mapping, 3D mapping
Pure image mosaic navigation
Automation of communication between viewpoints
Interaction with Galois lattice: map zoning and generalization, rule mapping, lattice entry points selection
Applications:
1) La Villette: interactive browsing through museum collections, setting up of exhibitions
2) INIST: cartography of the Web (EISCTES EEC project)
3) Combining Symbolic and Numeric Techniques for DL Contents Classification and Analysis
Jean-Charles LAMIREL, Yannick TOUSSAINT (Orpailleur)

Introduction
Combining numerical and symbolic methods:
MicroNOMAD Self-Organizing Maps (SOM)
  • Basic SOM topographic properties
  • MicroNOMAD multi-map communication process
Lattice
  • Formal properties and symbolic deduction
  • Hierarchical structure and inheritance of properties
Study of projection of SOM over lattice
  • Making explicit formal properties on the map
  • Intelligent map zoning and labelization

Galois lattice
Symbolic hierarchical method
Formal concept = (set of individuals, set of shared properties), e.g. ({i1, i2}, {p1, p2, p3})
Partial order defined by the subsumption relation over the set of formal concepts:
  $(I_1, P_1) \le (I_2, P_2) \iff I_1 \subseteq I_2$
  $(I_1, P_1) \le (I_2, P_2) \iff P_1 \supseteq P_2$
  $\forall I_1, I_2$ there is a unique meet and join.
Inheritance of properties
Extraction of association rules: R1 = Search Engine → {Web, IR}
Example context:
  I = {i1, i2, i3, i4}, P = {AI, Robots, Search Engine, Web, IR}
  i1 = {Web, IR}
  i2 = {Web, IR}
  i3 = {Web, IR, Search Engine}
  i4 = {AI, Robots}
Formal concepts of the example (lattice nodes):
  ({i1, i2, i3, i4}, ∅)
  ({i1, i2, i3}, {Web, IR})
  ({i3}, {Search Engine, Web, IR})
  ({i4}, {AI, Robots})
  (∅, {AI, Robots, Search Engine, Web, IR})
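
The formal concepts and the rule R1 can be checked mechanically from the example context (i1 to i4). The sketch below uses a brute-force closure over all subsets of individuals, which is only practical for a context this small; the function names are illustrative and not taken from any lattice tool mentioned in the slides.

```python
from itertools import combinations

# Example context from the slide: individual -> set of properties.
CONTEXT = {
    "i1": {"Web", "IR"},
    "i2": {"Web", "IR"},
    "i3": {"Web", "IR", "Search Engine"},
    "i4": {"AI", "Robots"},
}
ALL_PROPS = set().union(*CONTEXT.values())

def intent(items):
    """Properties shared by all the given individuals (all properties for the empty set)."""
    props = set(ALL_PROPS)
    for i in items:
        props &= CONTEXT[i]
    return props

def extent(props):
    """Individuals that have all the given properties."""
    return {i for i, p in CONTEXT.items() if props <= p}

def concepts():
    """Enumerate formal concepts by closing every subset of individuals (brute force)."""
    found = set()
    for k in range(len(CONTEXT) + 1):
        for subset in combinations(sorted(CONTEXT), k):
            p = intent(subset)
            found.add((frozenset(extent(p)), frozenset(p)))
    return found

def rule_holds(premise, conclusion):
    """Exact rule premise -> conclusion: every individual with the premise has the conclusion."""
    return extent(premise) <= extent(conclusion)

for ext, itt in sorted(concepts(), key=lambda c: -len(c[0])):
    print(sorted(ext), sorted(itt))
print(rule_holds({"Search Engine"}, {"Web", "IR"}))
```

The subsumption order corresponds to plain set inclusion on the extents, so the hierarchy and the inheritance of properties can be read directly from the enumerated pairs.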

Complementarity of approaches
Kohonen SOM:
  Complex weighting scheme
  Difficulty for precise interpretation
  Good illustrative power (topographic structure)
  Good synthesis capabilities
  Non-linearity
Lattice:
  High number of classes
  Memory and time consuming
  Hierarchical structure
  Rule extraction
  Incrementality

Conclusion
Cosine method seems to be the best of those tested:
  Good accuracy
  Well-balanced agglomeration
  Agglomeration preserves closed areas on SOM
Other projection and agglomeration methods have to be tested:
  Preservation of partial order and inheritance
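
The slides do not detail the cosine method. One plausible reading, shown here purely as an assumption, is that a lattice concept is projected onto the map by comparing the binary vector of its properties with each neuron profile and keeping the most similar neuron. The profiles and names below are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def project_concept(intent, features, profiles):
    """Map position of the neuron whose profile is most similar to the concept intent."""
    target = [1.0 if f in intent else 0.0 for f in features]
    return max(profiles, key=lambda pos: cosine(profiles[pos], target))

features = ["AI", "Robots", "Search Engine", "Web", "IR"]
profiles = {  # hypothetical neuron profiles, keyed by grid position
    (0, 0): [0.9, 0.8, 0.0, 0.0, 0.0],
    (0, 1): [0.0, 0.0, 0.1, 0.9, 0.9],
}
print(project_concept({"Web", "IR"}, features, profiles))
```

Concepts that land on the same or adjacent neurons could then be agglomerated, which is where the preservation of closed areas on the SOM would need to be checked.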

Perspectives
Evaluation on large corpus + expert
Rule management:
  Class quality evaluation
  Class labelisation
Deduction validation on communicating maps (lattice extensions)
Implementation of an operational prototype

Other approaches
Multi-classifier cooperation (PhD):
  SVM
  Stigmergy
  Genetic
  Neural maps
On-line learning of the user's behaviour, intelligent relevance feedback

Annexes
Topographic inconsistencies
Area computation
Inter-map communication
Activity coherency
Inter-map communication
  HYPERAWAKENESS: CONDITIONAL POSSIBILITY
  WEIGHTED SUM: CONDITIONAL PROBABILITY
Viewpoint-oriented patents analysis
Selected viewpoints: title, use, advantages and patentees