Edinburgh March 2001 CROSSMARC Kick-off meeting ICDC
ICDC background and know-how and expectations from CROSSMARC
CROSSMARC Project IST-2000-25366
Kick-off meeting
Edinburgh March 2001
NLP-based applications at ICDC
• Document filtering
– Syntactic analysis + NERC + inference engine
– Intranet and commercial internet
• Document clustering
– Statistical analysis
• Real-time document indexing
– Search engine techniques
NLP-based prototypes at ICDC
• Shareholding event detection
– Information extraction
• Document filtering
– transducers (CORAIL)
– neural networks (TREC)
• Control techniques using machine learning
– controlling filters with neural networks (RIAO)
– controlling NERC with C4.5 (ADIET with NCSR)
NLP-based applications
• Complex applications
– development
– exploitation
– maintenance
• Heterogeneous modules
– implementation: OS, language, communication, format
– processing: data, resources, algorithms
TalLab
• ICDC architecture for NLP-based applications
– Operational since 1997
– Used in several applications and prototypes
• Publications
– [Wolinski et al. 98] NLP+IA, Moncton
– [Wolinski and Vichot 01] TSI, Paris
• Reference
– [Cunningham et al. 00] LREC, Athens
Guidelines for the design of TalLab
• Relying on a multi-agent model
• Reusing the OS wherever possible
• Refusing to impose a single standard
Agents and circuits in TalLab
[Figure: an agent, labelled with messages, knowledge, acquaintances, activity, behavior, persistence and message box; and a circuit of agents]
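The agent and circuit notions named on this slide can be illustrated with a minimal Python sketch. It is not TalLab code: the `Agent` class, its methods and `run_circuit` are hypothetical names chosen for the illustration, persistence is left out, and the two behaviors (a tokenizer and a token counter) are toy examples.

```python
from collections import deque


class Agent:
    """Hypothetical agent: message box, knowledge, acquaintances, behavior."""

    def __init__(self, name, behavior):
        self.name = name
        self.behavior = behavior      # callable(agent, message) -> message or None
        self.knowledge = {}           # state kept between messages
        self.acquaintances = []       # agents this one sends its results to
        self.message_box = deque()    # pending incoming messages

    def connect(self, other):
        """Add `other` as an acquaintance and return it, so calls can be chained."""
        self.acquaintances.append(other)
        return other

    def receive(self, message):
        self.message_box.append(message)

    def step(self):
        """Process one pending message; return False if the box was empty."""
        if not self.message_box:
            return False
        result = self.behavior(self, self.message_box.popleft())
        if result is not None:
            for other in self.acquaintances:
                other.receive(result)
        return True


def run_circuit(agents):
    """Step every agent in turn until no message is pending anywhere."""
    while any([agent.step() for agent in agents]):
        pass


def tokenize(agent, text):
    return text.split()


def count_tokens(agent, tokens):
    agent.knowledge["seen"] = agent.knowledge.get("seen", 0) + len(tokens)
    return None  # end of the circuit: nothing to forward


# A two-agent circuit: tokenizer -> counter
tokenizer = Agent("tokenizer", tokenize)
counter = Agent("counter", count_tokens)
tokenizer.connect(counter)

tokenizer.receive("CROSSMARC kick-off meeting")
run_circuit([tokenizer, counter])
print(counter.knowledge["seen"])  # 3
```

In this sketch, plugging a new module into the circuit amounts to one more `connect` call, which is the property the later slides call malleability.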
NLP techniques used in TalLab
• Tokenisation
• POS tagging
• Syntactic analysis
• Named Entity Recognition and Classification
• Semantic analysis
• Search engines
• Neural networks
• Finite state transducers
• Vector space model
• Statistical clustering
Transistor-like agents
• Cardinality 1-N: Multiplier, Dispatcher, Switcher
• Cardinality 1-1: Filter, Translator, Networker
• Cardinality N-1: Concentrator, Synchronizer
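The slide only names these routing agents and their cardinalities, so the semantics below are assumptions made for illustration: a multiplier copies a message to every output, a dispatcher hands each message to one output in turn, a switcher picks one output from the message content, a filter keeps or drops a message, a concentrator merges several inputs, and a synchronizer waits for one message on each input. Inputs and outputs are modelled as plain Python lists, and all function names are hypothetical.

```python
from itertools import cycle


def multiplier(message, outputs):
    """1-N (assumed): copy the message to every output."""
    for out in outputs:
        out.append(message)


def make_dispatcher(outputs):
    """1-N (assumed): hand each message to a single output, round-robin."""
    targets = cycle(outputs)

    def dispatch(message):
        next(targets).append(message)

    return dispatch


def switcher(message, outputs, choose):
    """1-N (assumed): route to the one output whose index choose(message) returns."""
    outputs[choose(message)].append(message)


def filter_agent(message, keep):
    """1-1 (assumed): pass the message on if keep(message) is true, else drop it."""
    return message if keep(message) else None


def concentrator(inputs):
    """N-1 (assumed): merge the messages of all inputs into one stream."""
    merged = []
    for inp in inputs:
        merged.extend(inp)
    return merged


def synchronizer(inputs):
    """N-1 (assumed): emit one tuple per round in which every input has a message."""
    return list(zip(*inputs))


fr, en = [], []
multiplier("doc-1", [fr, en])        # both outputs receive a copy
dispatch = make_dispatcher([fr, en])
dispatch("doc-2")                    # first output
dispatch("doc-3")                    # second output
switcher({"lang": "en"}, [fr, en], lambda m: 0 if m["lang"] == "fr" else 1)
```

Translator and Networker are omitted here: under the same assumptions, a translator would be a 1-1 format conversion and a networker a 1-1 relay across a network boundary.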
TalLab main features
• Malleability: plug & play architecture, easy prototyping
• Openness: reuse market components, low integration cost
• Efficiency: distributed applications, real-time, batch processing
• Exploitability:
– Deployability: full integration in the MIS
– Reliability: quality of service, robustness
– Controllability: monitoring facilities, surveillance tools
Malleability
• Units of production = circuits of agents
• Linking modules = plugging agents
Openness
• Integrating a component = building a transducer
• Managing heterogeneity = programming a translator
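As a hedged illustration of what "programming a translator" could look like, the sketch below converts the output of one module into the input format of another. Both formats are assumptions: the upstream module is imagined to emit `word/TAG` pairs separated by spaces, and the downstream module to expect JSON records. The function name `translate_tagged_text` is hypothetical.

```python
import json


def translate_tagged_text(tagged):
    """Convert 'word/TAG word/TAG' text into a JSON list of token records."""
    records = []
    for item in tagged.split():
        # Split on the last slash so that tokens containing '/' stay intact.
        token, sep, pos = item.rpartition("/")
        if not sep:               # no tag attached to this token
            token, pos = item, None
        records.append({"token": token, "pos": pos})
    return json.dumps(records)


print(translate_tagged_text("CDC/NNP filters/VBZ news/NNS"))
```

Keeping the conversion in its own small unit means neither module has to adopt the other's format, which fits the guideline of not imposing a single standard.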
Efficiency
• Pipeline architecture
• Concurrent architecture = using a multiplier
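The contrast between the two architectures can be sketched as follows. This is an illustration under assumptions, not TalLab's mechanism: the pipeline applies stages one after another, while the concurrent variant assumes a multiplier that hands the same document to several independent branches, run here in threads. The names `pipeline` and `fan_out` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def pipeline(document, stages):
    """Pipeline: each stage consumes the output of the previous one."""
    for stage in stages:
        document = stage(document)
    return document


def fan_out(document, branches):
    """Concurrent (assumed multiplier semantics): every branch gets the same
    document and the branches run in parallel threads."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, document) for branch in branches]
        # Results are collected in branch order, not completion order.
        return [future.result() for future in futures]


tokens = pipeline("Real-Time News", [str.lower, str.split])
results = fan_out("news", [len, str.upper])
print(tokens)   # ['real-time', 'news']
print(results)  # [4, 'NEWS']
```

Threads only pay off when the branches spend time waiting (I/O, external components); CPU-bound branches in Python would need processes instead.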
Exploitability
• Deployability
– distribution: sub-networks architecture
– networkers: intranet proxies and internet firewalls
• Reliability
– modularity: independence of agents
– persistence: knowledge / message box / failures
• Controllability
– uniformity: general controlling procedures
– OS integration: connection to monitoring software
ICDC technical expectations
• Adaptive techniques for information extraction from web pages
• Techniques for managing multilingual NLP-based applications
• Processing typical web texts (vs news items)
ICDC applicative expectations
• Evaluation of the added value of CROSSMARC in the context of CDC
• Exploitation of CROSSMARC by-products for competitive intelligence applications
Intranet application at CDC
• Real-time news filtering and clustering
– 100 users, 100 topics
• Information retrieval
– 2 years of AFP economic news
Internet application at CDC-Mercure
• Real-time news filtering
– 8,000 users, 80 topics
IE prototype at CDC
• IE dedicated to shareholding events