AI Methods in Data Warehousing A System Architectural View Walter Kriha.
-
date post
19-Dec-2015 -
Category
Documents
-
view
216 -
download
2
Transcript of AI Methods in Data Warehousing A System Architectural View Walter Kriha.
![Page 1: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/1.jpg)
AI Methods in Data Warehousing
A System Architectural View
Walter Kriha
![Page 2: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/2.jpg)
Business Driver: Customer Relationship Management (CRM)
• learn more about your Customer
• Provide personalized offerings (cheaper, targeted)
• Make better use of in-house information (e.g. financial research)
• Somehow use all the data collected
The web is accelerating the problems (terabytes of clickstream data) and provides new solutions: Web-mining, the Web-House)
![Page 3: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/3.jpg)
CRM: Simulate Advisor Functions
• Know interests and hobbies
• Know personal situation
• Know situation in life
• Know plans and hopes
Client oriented: Bank oriented:
• Know where to find information and what applications to use
• Know how to translate, summarize and prepare for customer
• Know who to ask if in trouble
Plus: new ideas from automatic knowledge discovery etc. that even a real advisor can’t do!
![Page 4: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/4.jpg)
Overview
• Requirements coming from a dynamic, personalized Portal Page
• Data Collection and DW Import• AI Methods used to solve requirements• How to flow the results back into the portal
![Page 5: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/5.jpg)
A Portal: A self-adapting System
• Collect information for and about customers• Learn from it• Adapt to the individual customer by using the
“lessons learned”
The problem: a portal does not have the time to learn. This needs to happen off-line in a warehouse!
![Page 6: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/6.jpg)
DW Integration: Sources
ClosedClosedLoopLoop
SAPSAP
IBMIBM
PeopleSoftPeopleSoft
Data Integration PlatformData Integration PlatformDataDataMartsMarts
DataDataWarehouseWarehouse
Web ServersWeb Servers
Application ServersApplication Servers
WebWebLogsLogs
TransactionTransactionServerServer
Supplier Supplier ExtranetExtranet
ContentContentServerServer
AdAdServerServer
RDBMSRDBMS
Demographics/Demographics/External SourcesExternal Sources
e-Business Analyticse-Business Analytics
![Page 7: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/7.jpg)
DW Integration: Structure
LogFramewk
Operational DB
Warehouse
Web stats
Miningtools
Navigation, Transactions, Messages
Personalized information and offerings
RuleEngine
External data
And Applications
Integration
Off-line
On-line
![Page 8: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/8.jpg)
What information do we have?
• The pages the customer selected (order, topics etc.)
• Customer interests from homepage self-configuration
• Customer transactions
• Customer messages (forum, advisor)
• Internal financial information
The data collection and import process needs to preserve the links between different information channels (e.g. order of customer activity)
![Page 9: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/9.jpg)
Welcome Mrs. Rich,We would like to point you to ourNew Instrument X that fits nicely
To your current investment strategy.
News: IBM invests in company Y
Research: asian equity update
Charts: Sony
Quotes: UBS 500, ARBA 200
Links: myweather.com,UBS glossary etc.
Common: customize, filter, contact etc.
Messages: 3 newFrom foo: hi Mrs. Rich
Portfolio: Siemens, Swisskom, Esso,
Common: Banner
Forum: art banking, 12 new
E-Banking: balance =
Interest in our services
(homepage config)
Interest in our services
(homepage config)
forum activityforum activity
transactionstransactions
Interest in shares etc.
Interest in shares etc.
Message activity
Message activity
Special interest (filters
selected)
Special interest (filters
selected)
![Page 10: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/10.jpg)
What do we want to know?
• Does a customer know how to work the system (site usability)?
• Does a customer voice dissatisfaction with company (customer retention)
• If new financial information enters the system – which customers might be interested in it (content extraction, customer notification)?
Which AI techniques might answer those questions?
![Page 11: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/11.jpg)
What do we want to provide?
• A personalized homepage that adapts itself to the customers interests (from self-customization to automatic integration)
• An early warning system for disgruntled customers or customers that have difficulties working the site
• An ontology for financial information• An integrated view of the company and its services and
information (“electronic advisor”)
See: “Finance with a personal touch”, Communications of the ACM Aug.2000/Vol.43 No.8
![Page 12: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/12.jpg)
Welcome Mrs. Rich,We would like to point you to ourNew Instrument X that fits nicely
To your current investment strategy.
News: IBM invests in company X,X now listed on NASDAQResearch: X future prospects asian equity update
Charts: X
Quotes: UBS 500, X 100
Links: X homepagemyweather.com,.
Common: customize, filter, contact etc.
Messages: 3 newFrom advisor: about X inv.
Portfolio: Siemens, add X?
Common: Banner about X
Dynamic, personalized and INTEGRATED
homepage
Dynamic, personalized and INTEGRATED
homepage
Forum: X is discussed here
Connect communities
and site content
Connect communities
and site content
Personal “touch”
Personal “touch”
![Page 13: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/13.jpg)
Data Mining
• The automatic extraction of hidden predictive information from large databases
• An AI-technique: automated knowledge discovery, prediction and forensic analysis through machine learning
Web Mining• Adds text-mining, ontologies and things like xml
to the above
![Page 14: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/14.jpg)
Data Mining Methods
Data mining
Equational
Data DistilledData retained
Decision Trees
Cross Tab
Belief Nets
K-nearest n.CBR.
Rules
Logical
Agents
Neural Nets Statistics
Non-numeric data
Non-symbolic results
Induct.GACART etc.
Kohonen etc.Smooth surfaces
Ext.training
![Page 15: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/15.jpg)
Data Preparation
• Catch complete session data for a specific user• Store meta-information from content with
behavioral data• Create different data structures for different
analytics (e.g. Polygenesis)
Use a special log framework! Make sure there are meta-data for the content available (e.g. dynamically generated page content)
![Page 16: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/16.jpg)
Data Analysis
• Cluster Analysis
• Classification
• Pattern detection
• Association rules
Content Mining (e.g Segmentation of Topics)
Usage Mining (e.g. Segmentation of Customers)
Problem: How to express similarity and distance
•Linguistic analysis, statistics (k-nearest-neighbours)
•Machine learning (Neuronal nets, decision trees)
Problem: How to create a user profile e.g from navigation data
collaborative filtering: derive content similarities from behavioral similarities
![Page 17: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/17.jpg)
(Combined content and behavioral analysis)
• Use statistical cluster mining to extract page-views that co-occur during sessions (visit coherence assumption)
• Use a concept learning algorithm that matches the clusters (of page-views) with the meta-information of the pages to extract common attributes
• Those common attributes form a “concept”
Example: Find Session Topics automatically
![Page 18: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/18.jpg)
Learning Concepts
Session flow
User A
User B
Meta-Information
Conceptual Learning Algorithm
Concept User Profile
![Page 19: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/19.jpg)
The Text-Warehouse: Information Extraction
Serving personalized information requires fine-grained extraction of interesting facts from text bodies in various formats
User profileWith interests
Financial ResearchDocuments
(pdf, html, doc,xml)
Facts not Stories!
Autom.Database
IETool
![Page 20: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/20.jpg)
Methods for Information Extraction
• Analyze Syntax to derive Semantics
• Context changes break algorithm
• Use contextual features to infer semantics (e.g. html tags)
• Very brittle in case of source changes
Natural Language Processing
Wrapper Induction
Both methods use extraction patterns that were acquired through machine learning based on training documents.
![Page 21: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/21.jpg)
More textual methods
• Thematic Index: Generate the reference taxonomy from training documents (linguistic and statistic analysis)
• Clustering: group similar documents with respect to a feature vector and similarity measure (SOM and other clustering technologies)
![Page 22: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/22.jpg)
Automatic Text Classification
Rule based: Experts formulate rules and vertical vocabularies (Verity, Intelligent Classifier)
Example-Based: A machine learning approach based on training documents and iterative improvement (e.g Autonomy, using Bayesian Networks)
Fully automated text classification is not feasible today. Cyborg classification needed. More tagged data needed.
Case: Building a directory for an enterprise portal
![Page 23: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/23.jpg)
The Meta-data/Ontology Problem
“The key limiting factor at present is the difficulty of building and maintaining ontologies for web use”
J.Hendler, Is there an Intelligent Agent in your future?
This is also true for all kinds of information integration e.g. financial research
![Page 24: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/24.jpg)
The Solution: Semantic Web?
XML Syntax
Logic, Rules etc.
Ontologies/Vocabularies
XML Schemas/RDF
Humans define meta-data and use them
Software build, extracts new Ontologies (e.g. Ontobroker)
Agents and tools use meta-data to construct new information
![Page 25: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/25.jpg)
AI on Topic Maps?
Occurrences
Topics
Associations
See: James D.Mason, Ferrets and Topic Maps, Knowledge Engineering for an Analytical Engine
![Page 26: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/26.jpg)
Financial Research Integration
XML Editor
Dep. BDep. B Dep. ADep. A
Warehouse
Distribution
Result DBs
Meta-DataTopicMaps
Wrapper Induction discovers facts
Schema translation, semantic consistency checks e.g. recommendations
Internal Information Model users
![Page 27: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/27.jpg)
Deployment
Operational DB
(Profiles, Meta-Data)
Warehouse
Miningtools
Personalized information and offerings
RuleEngine
Off-line
On-line
Rules
![Page 28: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/28.jpg)
The Main Problems for the “Web-house”
Portal architecture must be designed to collect the proper information and to use the results from the web-house easily
Portal content is at the same time customer offer as well as customer measuring tool
Few people understand both the portal system aspect and the warehouse analytical aspect.
![Page 29: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/29.jpg)
Resources
• Katherine C.Adams, Extracting Knowledge (www.intelligentkm.com/feature/010507/feat.shmtl)
• Dan Sullyvan, Beyond The Numbers (www.intelligententerprise.com/000410/feat2.shtml)
• Communications of the ACM, August 2000/Vol.43 Nr. 8
• Information Discovery, A Characterization of Data Mining Technologies and Process (www.datamining.com/dm-tech.htm)
• Dan R.Greening, Data Mining on the Web (www.webtechniques.com/archives/2000/01/greening.html)
![Page 30: AI Methods in Data Warehousing A System Architectural View Walter Kriha.](https://reader035.fdocuments.us/reader035/viewer/2022062515/56649d265503460f949fd30c/html5/thumbnails/30.jpg)
Data Mining Tools (examples)
• IBM Intelligent Miner• SPSS, Clementine• SAS• Netica (Belief Nets)