Ernestina Menasalvas [email protected]
Facultad de Informatica, Universidad Politecnica de Madrid
May 2004
Introduction and motivation
• The Internet as a communication channel.
• Technology is needed to develop new services, security, infrastructure, and analysis.
• Web Mining analyzes usage patterns so that services respond to user needs.
• Most web mining projects developed so far have not taken into account the context in which they were developed:
– Competitive society
– Success criteria depend on both:
   • User satisfaction
   • Increase in sponsors' benefit
• The gap between technology development on the web and business factors is increasing, and as a side effect it generates a separation between what technologists develop and what companies need.
• Knowing that the problem exists is just the beginning…
• Technological projects have to be integrated into the global strategy of the company.
The problem
• Innovative ideas in e-commerce are vaguely defined, so they lose focus and precision.
• New technologies are being applied, consuming resources but without appropriate financial or economic benefits.
• The growth of web activity and its participation in every daily activity (commercial, educational, news, ...) is not being matched by a corresponding number of services.
• Services are considered insufficient.
• Thus, site sponsors have to improve the services offered to satisfy the growing demand.
• On the other hand, the growth in offers will bring a growth in demand, which will lead consumers to ask for better services.
• Web Mining projects have to be planned as one more project in the global strategy of the company.
Web Site personalization
• Optimization and personalization of the user web experience is crucial for attracting and retaining electronic, web-based commerce customers.
• Try to maintain the one-to-one relationship.
• Identifying future behaviour is crucial for the site to act proactively.
• Information about user experience is captured in clickstream logs: pages viewed, timing, and sequence.
• Solutions given:
– Clustering of users
– Clustering of pages
– Most visited path
– Recommender systems
– …
• The questions:
– How to deploy?
– How has the method been evaluated?
– How does it help the company?
– How does it evolve over time?
Web Mining project evaluation
• The criteria used to evaluate the success of a site do not take external (commercial) aspects into account.
• Aspects such as increased sales volume, fraud reduction, customer retention, and competitive prices are not explicitly tackled.
• Success of a web site is measured in terms of efficiency and quality:
– Efficiency: number of pages accessed in one session, length of the session, and actions performed
– Quality: response time of the site to user requests, page accessibility, visitors per page, …
• Company success is evaluated in terms of:
– Income, outcomes, expenses
– ROI, market presence
• The difference between the criteria used to evaluate any project in the enterprise and those used for a web project is at the root of the incomplete success of web mining.
• Site sponsors do not evaluate commercial and financial aspects; their evaluation is based only on vague commercial notions.
• Success in terms of use, structure, and content has to be linked to the achievement of the company's business goals.
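The efficiency criteria listed on this slide can be computed directly from a clickstream. The following is a minimal sketch, assuming a session is a list of (url, timestamp in seconds) pairs; the function name and the data layout are hypothetical and do not come from the slides.

```python
from typing import Dict, List, Tuple

# Assumed layout: one click is (url, timestamp in seconds).
Click = Tuple[str, float]


def session_efficiency(clicks: List[Click]) -> Dict[str, float]:
    """Return pages accessed and session length for a single session."""
    if not clicks:
        return {"pages": 0, "distinct_pages": 0, "length_secs": 0.0}
    times = [t for _, t in clicks]
    return {
        "pages": len(clicks),
        "distinct_pages": len({url for url, _ in clicks}),
        "length_secs": max(times) - min(times),
    }


example = [("/home", 0.0), ("/catalog", 12.0), ("/home", 30.0)]
metrics = session_efficiency(example)
```

Note that none of these numbers says anything about income or ROI, which is the mismatch the slide points out.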
Web Mining project management
• An enterprise is a system designed to fulfil certain goals by integrating different resources.
• Subsystems are at the same time interrelated and interdependent.
• When the company uses the Web as a channel, all the services, infrastructure, …, have to be seen as one of the subsystems.
• The success of a solution in the web subsystem has to be related to the behaviour of the rest of the subsystems.
• Web Mining projects are concerned with the Web subsystem.
• So a web mining project is not only an IT problem.
• Apply a project management methodology to control the process: a project manager is needed, a different role from the data miner.
• Identify the data mining problems.
• For each of them, apply CRISP-DM.
Web Mining Project management (cont)
• To properly deal with a data mining project we need explicit information about the company:
– Structure of the company (departments, sections, channels, …)
– Goals of the company and success criteria (both at the top level and at the department level)
• Company environment; identify:
– Resources, constraints, and any factor that can determine the goal analysis and the development of a web project
– Web project goals and their relationship with the goals of the company
• To evaluate whether the web mining project results contribute to fulfilling the company goals:
– The web site is usually not the end but the means.
– It is one of the channels that the company uses to achieve its goals.
– So for a site to be considered successful, the activities carried out through the site must generate value for the company.
• Traditional approaches only analyze the site from the user perspective, but the actions of the users have to generate value for the company.
• It is a CRM project.
• Web project plan generation
CRM project – the three legs
[Figure: CRM architecture in three parts: Operational CRM, Analytical CRM, and Collaborative CRM. Labelled areas: Back Office, Front Office, Mobile Office, Customer Interaction. Labelled components: ERP/ERM, Order Manag., Supply Chain Mgmt., Order Prom., Legacy Systems, Sales Automation, Service Automation, Marketing Automation, Field Service, Mobile Sales, Vertical Apps., Category Mgmt., Campaign Mgmt., Customer Activity, Customers, Products, Data Warehouse, Voice (IVR, ACD), Conferencing, Web Conferencing, Response Management, Fax, Letter, Direct Interaction. Closed-Loop Processing (EAI Toolkits, Embedded/Mobile Agents).]
Data Mining
[Figure: layered diagram with increasing potential to support business decisions. Labels, in the order listed: Making Decisions; Data Presentation; Visualization Techniques; Data Mining, Information Discovery; Data Exploration; OLAP, MDA; Statistical Analysis, Querying and Reporting; Data Warehouses / Data Marts; Data Sources (Paper, Files, Information Providers, Database Systems, OLTP). Relationship with roles: End User, Business Analyst, Data Analyst, DBA.]
Fact Gap
“Fact Gap”: the difference between the available information and the ability to make decisions based on this information. (Gartner Group)
Data Mining gives the intelligence
• Databases give the data.
• But intelligence is needed to explore the data and find the patterns, rules, and ideas that explain what is going on and predict what will happen.
• Techniques and tools are needed to add this intelligence to the data in order to extract the maximum benefit from it.
• But tools alone (nowadays) do not provide the intelligence; it has to be provided by EXPERTS and translated into the data for better understanding.
Data warehouses and databases are the support.
Data Mining Standard process model: CRISP-DM
Problem Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment
Building the bridge
• In order to provide users with the most appropriate solution, the data to be analyzed have to be enriched with business information.
• Business problems have to be translated into data mining problems.
• Results have to be understandable not only by data mining experts but also by end users.
• The semantics underlying the data mining solution have to be settled.
Deeper analysis of Personalization
• What is personalization?
– Observe user-web page interactions to identify patterns that indicate high-level user activity, anticipate future user activity, and make it possible to act proactively.
• What is going to be personalized?
– The site: this means pages adapted to the user's behaviour or pattern.
• Why is personalization needed?
– To improve the site performance
– The web is just another channel
– Site performance has to do with advancing the goals of the company
• Who is the user?
– Navigator
– Customer
Web Data to be analyzed
• In any web mining problem we have data related to:
– Pages
– Navigators and navigation
– Customers and their transactions
• Web logs are just the beginning.
• Not only the data has to be taken into account but also all the circumstances under which the data were collected:
• Environment
– General
– Organization-related
– Customer-related
Environment
• Affects, both directly and indirectly, the way activities occur. Among the factors to take into account:
– Legal conditions
– Technological conditions
– Demography
– Ecological conditions (weather, transport, communications)
– Cultural and social conditions
– Geographical situation
• Take into account the location of the site, of the navigator, …
Information to be added
• Departments:
– The same concept can have a different meaning depending on the department
– A product for marketing is not the same as for production
• Products, services:
– Data about the object itself: size, color, …
– Data relevant for the company: profit margin, top ten, …
– How it is presented on the web
• People, consumers in general:
– Static data: gender, demographic information (it varies over time but at a particular moment it is static)
– Roles: …
– Behavior with the company being analyzed: number and kind of transactions he/she performs
– Behavioural data related to the environment (economy, legal constraints, climate, …)
• Navigators:
– Web log: location (IP), time, browser, …
– Behaviour: comparison with the "normal" behaviour, if any, to discover mood, a different location, …
• Dates:
– A date by itself has no meaning
– Legal and fiscal periods, holidays, weekends
– Opening, closure, …
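As an illustration of giving a bare date business meaning, here is a minimal sketch that attaches weekend and holiday flags to a date. The holiday calendar, the function name, and the output fields are hypothetical; a real project would take them from the company's own calendar and metadata.

```python
from datetime import date
from typing import Dict, Set

# Hypothetical holiday calendar supplied by the business; not from the slides.
HOLIDAYS: Set[date] = {date(2004, 5, 1)}


def enrich_date(d: date, holidays: Set[date] = HOLIDAYS) -> Dict[str, object]:
    """Attach simple business semantics to a date taken from a log record."""
    return {
        "date": d.isoformat(),
        "is_weekend": d.weekday() >= 5,  # Monday is 0, so 5 and 6 are Sat/Sun
        "is_holiday": d in holidays,
    }


saturday = enrich_date(date(2004, 5, 1))
monday = enrich_date(date(2004, 5, 3))
```

Fiscal periods or opening and closure times could be added as further fields in the same way.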
Data enrichment
• There is no method or model to follow. It is more of an art.
• It comes only with experience.
• Projects for the same domain share the enrichment:
– A model could be established
– Evaluate whether the data are appropriate to mine
– Evaluate the kind of patterns that can be obtained
– Evaluate whether a certain pattern cannot be obtained
• Metadata about the data is needed:
– Meaning for the business of each value, attribute, page, action, …
• Metadata for each attribute has to include semantics:
– Meaning: group according to it: demographic, behavioural, environmental, social, cultural
– Business value
– Circumstances
– Constraints
– Relationship with other concepts
• Ontology of concepts???
• Integrate metadata so that the mining activity deals with them.
Data Modelling and deployment
• Once the data are enriched, the extracted patterns can be interpreted according to:
– User profiles
– Session value (according to certain goals)
– Period of the day
• The solution has to be deployed and integrated into the site structure.
• Patterns evolve over time as new data arrive.
• Models have to be refined.
• Establish the basis for the model to be refined without a decrease in performance.
Web Mining infrastructure
[Figure: architecture diagram. Layers: DECISION LAYER, SEMANTIC LAYER, CRM SERVICES PROVIDER LAYER. Components: USERS, User HTTP Client, Original WEBSITE, User Agent, Interface Agent, User Model, Planning Agents (including Planning Agent VWi), Operational PLANS, Action Plan, Web Logs, Models, Services, Information Agents, Agents. Flows: HTTP Request, HTTP Response.]
Case-study: act according to the value of the current session
• Patterns to help:
– Predict user behavior based on current behavior, not identity.
– Abstract user behavior with varying degrees of granularity => subsessions.
– Estimate the value of the session to act accordingly.
• Subsessions capture/approximate user state information.
• Key concept: frequent behavior paths.
• Markov model to predict the next set of pages and behaviour
• Webhouse to store information about users
• Modify APACHE: pop-ups and precaching
Case-study
1. Find behavior rules
• Partial tree: define break points as decision points in the path. Use them to create rules.
• Knowing PIND allows us to predict a set of pages to follow...
[Figure: partial tree with nodes labelled PIND and PDEP and two marked break points.]
Behaviour rules
– Main page, Exams board
– Main page, Assignments board, Support material for Assignment 1
– Main page, Assignments board, Support material for Assignment 2
[Figure: tree of pages. Node labels: Main page, Board, Assignments, Exams, Support material for Assignment 1, Support material for Assignment 2. Legend: Decision page, Target page. Numeric labels: 5, 2, 4, 3, -3.]
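One way to read "break points as decision points" is as the places where frequent paths diverge. The sketch below takes that reading as an assumption: it builds the prefixes of a set of frequent paths and reports every prefix that is followed by more than one distinct page. The page names and the function name are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Path = Tuple[str, ...]


def find_break_points(paths: List[Path]) -> Set[Path]:
    """Return the path prefixes after which the frequent paths diverge."""
    children: Dict[Path, Set[str]] = defaultdict(set)
    for path in paths:
        for i in range(1, len(path)):
            # path[:i] is the prefix seen so far, path[i] the page that follows
            children[path[:i]].add(path[i])
    return {prefix for prefix, nxt in children.items() if len(nxt) > 1}


# Illustrative frequent paths, loosely modelled on the behaviour rules above.
frequent_paths = [
    ("main", "exams_board"),
    ("main", "assignments_board", "material_1"),
    ("main", "assignments_board", "material_2"),
]
break_points = find_break_points(frequent_paths)
```

Each break point, together with one of its continuations, can then be written as a rule of the form "after this prefix, expect this set of pages".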
2. Find Subsessions
• Sessions may be described in terms of subsessions. E.g., browse catalog, browse shipping information, browse privacy notices, perform purchase.
• Subsessions may be defined in a number of ways, according to the desired semantics. E.g., use breakpoints.
[Figure: click-path subsession figure with PDEP and PIND nodes. Panels: real-time user web page access path, with identified frequent paths; web page access path expressed as a sequence of subsessions.]
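A minimal sketch of breakpoint-based subsessions, assuming that a break-point page closes the current subsession. This cutting rule, the page names, and the function name are assumptions for illustration; the slides leave the exact definition open.

```python
from typing import List, Set


def split_subsessions(pages: List[str], break_pages: Set[str]) -> List[List[str]]:
    """Cut a click path right after every break-point page."""
    subsessions: List[List[str]] = []
    current: List[str] = []
    for page in pages:
        current.append(page)
        if page in break_pages:
            subsessions.append(current)
            current = []
    if current:  # keep the trailing pages after the last break point
        subsessions.append(current)
    return subsessions


clicks = ["home", "catalog", "item", "shipping", "privacy", "checkout"]
parts = split_subsessions(clicks, {"catalog", "shipping"})
```

The session is then handled as a short sequence of subsessions rather than as a long sequence of individual pages.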
3. Markov models to predict behavior and paths
[Figure: sessions (session1 to session6, ...) related to behaviors (Behavior X, Behavior Y), with labels BK N, BK M, BK P and Dep1, Dep2, Dep3.]
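The prediction step can be sketched with a first-order Markov model estimated from transition counts. The model order, the state names, and the function names are assumptions; states could equally be pages or subsession labels.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def fit_transitions(sessions: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each state is followed by each other state."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts: Dict[str, Counter], state: str) -> Optional[Tuple[str, float]]:
    """Return the most frequent successor of `state` and its estimated probability."""
    if state not in counts:
        return None
    following, n = counts[state].most_common(1)[0]
    return following, n / sum(counts[state].values())


sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"], ["B", "C"]]
model = fit_transitions(sessions)
prediction = predict_next(model, "A")
```

Here "A" is followed by "B" in two of its three observed transitions, so "B" is predicted with an estimated probability of 2/3.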
4. Per user analysis: average time spent in page
[Figure: chart of time in seconds (axis from 0 to 60) per URL (URLs numbered 1 to 23).]
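A minimal sketch of how such per-user averages could be derived, assuming the time spent on a page is the gap until the user's next click. That estimate is a common log-based approximation, not something stated in the slides; the last click of the sequence has no following click and is skipped.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def average_time_per_url(clicks: List[Tuple[str, float]]) -> Dict[str, float]:
    """Average seconds per URL for one user's time-ordered (url, timestamp) clicks."""
    totals: Dict[str, float] = defaultdict(float)
    visits: Dict[str, int] = defaultdict(int)
    for (url, t), (_, t_next) in zip(clicks, clicks[1:]):
        totals[url] += t_next - t
        visits[url] += 1
    return {url: totals[url] / visits[url] for url in totals}


user_clicks = [("/a", 0.0), ("/b", 10.0), ("/a", 40.0), ("/c", 60.0)]
averages = average_time_per_url(user_clicks)
```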
5. Online Value evolution
[Figure: chart of session value (axis from -5 to 35) against the number of traversed links (1 to 15), with one series each for session 1, session 2, and session 3.]
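The slides do not say how the session value is computed. As a sketch under the assumption that each page carries a fixed value (possibly negative) and that the session value is the running sum over the links traversed, the evolution could be tracked as follows; the page values and names are hypothetical.

```python
from itertools import accumulate
from typing import Dict, List

# Hypothetical per-page values set by the site or department goals.
PAGE_VALUE: Dict[str, float] = {"home": 1.0, "exams": 5.0, "help": -3.0}


def value_evolution(pages: List[str],
                    page_value: Dict[str, float] = PAGE_VALUE,
                    default: float = 0.0) -> List[float]:
    """Running session value after each traversed link (additive assumption)."""
    return list(accumulate(page_value.get(p, default) for p in pages))


trace = value_evolution(["home", "exams", "help", "unknown", "exams"])
```

Since the value is available after every click, the site can react while the session is still in progress.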
Benefits of the algorithm
• Makes it possible to know at any point whether the ongoing navigation would be beneficial for the site, so that the site can be dynamically adjusted accordingly.
• Quantifies the value of a user session while he or she is navigating.
• Brings the user-site relationship closer to real-life relationships.
• The algorithm integrates the site/department goals:
– Sends pop-ups to students according to the exercises they have already done
– Professors can establish preferences and the rules are changed accordingly
– …
Conclusion
• Without proper project management:
– It is difficult to obtain significant patterns
– Interpretation of the results is difficult
– The potential of the process is minimized
• Site goals have to be integrated.
• Algorithms alone are of no use: the best algorithm does not always mean the best result.
• The patterns have to be deployed in a proper architecture.
THANKS!
QUESTIONS???