S-CUBE LP: Mining Lifecycle Event Logs for Enhancing SBAs
Exploiting Knowledge on Past Process Execution to Improve SBA Analysis
Mining Lifecycle Event Logs for Enhancing SBAs
ISTI-CNR (CNR), TU Wien (TUW)
Franco Maria Nardini, Gabriele Tolomei, CNR
Learning Package Categorization
S-Cube
Monitoring and Analysis of SBA
Process Mining
Exploiting Knowledge on Past Process Execution to Improve SBA Analysis
Connections to the S-Cube IRF
Conceptual Research Framework:
– Service Composition and Coordination
– Service Infrastructure
– Adaptation and Monitoring
Logical Run-Time Architecture:
– Monitoring Engine
– Adaptation Engine
– Negotiation Engine
– Runtime QA Engine
– Resource Broker
Overview
Introduction
Goal
Methodology
Experiments
Conclusions
SBA Event Logs
Most complex software systems collect their lifecycle usage data in event log files
SBA event logs contain detailed information about service components exchanging messages
– e.g., service invocation, service failure, registry querying, etc.
Event logs represent a huge source of “hidden” information (i.e., knowledge)
Mining SBA Event Logs
Data Mining algorithms and techniques allow extracting valuable knowledge from event logs
Extracted knowledge may refer to several aspects:
– e.g., service usage patterns, service failure patterns, etc.
If properly exploited, such knowledge might help improve the overall quality of the system:
– recommending frequently invoked services;
– avoiding/handling anomalous situations, etc.
Process Mining (PM)
Process Mining (PM) is an application of data mining techniques to SBA event logs
PM aims at discovering structured process models derived from patterns that are present in actual traces of service executions
Each process is usually represented by a directed graph (digraph), and the PM problem has been modeled as:
– finite state machine [CW96]
– sequential pattern mining (SPM) [AGL98]
– Petri net [vdAWM04]
Another Example: Web Search Engines
Web Search Engines (WSEs) are another example of systems that benefit from mining their event log data (i.e., Query Logs)
Query Log Mining (QLM) has proven to be effective for enhancing the overall performance of WSEs
We propose a QLM technique for identifying search patterns (tasks) from the stream of queries recorded in query logs [LOPST11]
Goal
Treat PM as an instance of the SPM problem
Detect frequent sequential patterns of service invocation, i.e., services that are frequently co-invoked within the same sequence
– e.g., service Y is usually invoked after service X
Find which services are actually used, and how:
– service recommendation
– avoiding/handling anomalous situations
Methodology
Sequential Pattern Mining
Event logs can be viewed as sequences of events that change over time (time series)
We are interested in finding sequences of services that are frequently invoked in a specific order, i.e., sequential patterns
Sequential Pattern Mining (SPM) is the process of extracting sequential patterns whose support exceeds a predefined minimal support threshold min_supp
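The notion of support can be illustrated with a minimal sketch. It assumes each log sequence is a plain list of service names and that the support of a pattern is the fraction of sequences containing it as an ordered, not necessarily contiguous, subsequence. The function names and the toy sequences are illustrative and do not come from the slides.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` appear in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so the order is enforced.
    return all(item in remaining for item in pattern)


def support(pattern, sequences):
    """Fraction of sequences that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


# Toy invocation sequences, made up for illustration.
toy_sequences = [["X", "Y", "Z"], ["X", "Z"], ["Y", "X"]]
min_supp = 0.5  # assumption: a pattern is frequent if support >= min_supp
candidates = [("X", "Z"), ("X", "Y")]
frequent = [p for p in candidates if support(p, toy_sequences) >= min_supp]
```

Here only the pattern (X, Z) passes the threshold, since it occurs in two of the three toy sequences while (X, Y) occurs in one.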
PrefixSpan
One of the most efficient algorithms for finding sequential patterns [PHMP01]
Mines the complete set of patterns but greatly reduces the effort of candidate subsequence generation
Takes into account only the chronological order between events
– i.e., it only cares whether X comes before Y, regardless of the actual time interval
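The prefix-projection idea can be sketched in a few lines. This is a simplified illustration, not the optimized algorithm of [PHMP01]: it assumes every event is a single service name (no itemsets), uses an absolute count `min_count` in place of a percentage, and all names and data are hypothetical.

```python
from collections import Counter


def prefixspan(sequences, min_count):
    """Return {pattern: support_count} for all frequent sequential patterns."""
    patterns = {}

    def mine(prefix, projected):
        # Count in how many projected suffixes each item occurs at least once.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            pattern = prefix + (item,)
            patterns[pattern] = count
            # Project: keep what follows the first occurrence of `item`.
            next_db = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, next_db)

    mine((), [list(s) for s in sequences])
    return patterns


# Toy invocation sequences, made up for illustration.
toy_log = [["A", "B", "C"], ["A", "C"], ["B", "A", "C"]]
found = prefixspan(toy_log, min_count=2)
```

Each recursive call only looks at the suffixes that follow the current prefix, so no candidate pattern is generated unless it can still occur in the data.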
MiSTA
Hint: observing that two services are invoked close together in time, rather than far apart, within a sequence could lead to different conclusions
MiSTA [GNPP06] is able to deal with the actual time interval between any two consecutive service invocations
It requires a time threshold tau that specifies the maximum time interval between events in a frequent sequence
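The effect of a time threshold can be illustrated with a small sketch. This is not the MiSTA algorithm of [GNPP06]; it only checks whether a given pattern occurs with every consecutive gap no larger than tau, assuming each sequence is a time-sorted list of (timestamp in seconds, service name) pairs. All names and data are hypothetical.

```python
def occurs_within(sequence, pattern, tau):
    """True if `pattern` occurs in `sequence` with every consecutive gap <= tau."""
    if not pattern:
        return True

    def extend(pos, k):
        # `pos` is the index of the event matched to pattern[k - 1].
        if k == len(pattern):
            return True
        last_time = sequence[pos][0]
        for j in range(pos + 1, len(sequence)):
            time, service = sequence[j]
            if time - last_time > tau:
                break  # later events are even further away (time-sorted input)
            if service == pattern[k] and extend(j, k + 1):
                return True
        return False

    return any(
        service == pattern[0] and extend(i, 1)
        for i, (_, service) in enumerate(sequence)
    )


def support_within(sequences, pattern, tau):
    """Fraction of sequences in which the pattern occurs within the gap limit."""
    hits = sum(occurs_within(s, pattern, tau) for s in sequences)
    return hits / len(sequences)


# Toy timestamped sequences, made up for illustration.
timed_log = [
    [(0, "X"), (3, "Y"), (100, "X"), (400, "Y")],
    [(0, "X"), (50, "Y")],
    [(0, "Y"), (2, "X")],
]
```

With these toy data, the pattern (X, Y) is found in one sequence for tau = 5 and in two sequences for tau = 60, which shows how the choice of tau changes which patterns count as frequent.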
Experiments
Data Set: VRESCo
VRESCo is the runtime environment for Service-oriented Computing developed by VITALab@TUW
It collects usage data (i.e., events) in the form of an XML log file
The VRESCo event log file contains information about: invoked services, service rebinding, service failure, etc.
We only focus on service invocation events
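A preprocessing step of this kind might look like the sketch below. The XML layout is entirely hypothetical: the element name `event` and the attributes `type`, `session`, `time` and `service` are assumptions made for illustration and are not the actual VRESCo log schema, which the slides do not show.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical log excerpt; NOT the real VRESCo format.
SAMPLE_LOG = """
<events>
  <event type="ServiceInvoked" session="s1" time="12.0" service="Y"/>
  <event type="ServiceInvoked" session="s1" time="10.0" service="X"/>
  <event type="ServiceFailed" session="s1" time="13.0" service="Y"/>
  <event type="ServiceInvoked" session="s2" time="11.0" service="X"/>
</events>
"""


def invocation_sequences(xml_text):
    """Keep only invocation events and group them into time-ordered sequences."""
    root = ET.fromstring(xml_text.strip())
    by_session = defaultdict(list)
    for event in root.iter("event"):
        if event.get("type") != "ServiceInvoked":
            continue  # ignore rebinding, failure and other event types
        timestamp = float(event.get("time"))
        by_session[event.get("session")].append((timestamp, event.get("service")))
    # Sort each session by timestamp and keep only the service names.
    return {
        session: [service for _, service in sorted(events)]
        for session, events in by_session.items()
    }


sequences_by_session = invocation_sequences(SAMPLE_LOG)
```

The resulting per-session lists of service names are the kind of input a sequential pattern miner expects.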
PrefixSpan: min_supp = 25%
PrefixSpan: min_supp = 50%
PrefixSpan: min_supp = 66%
MiSTA: min_supp = 32%, tau = 5 s
MiSTA: min_supp = 32%, tau = 60 s
MiSTA: min_supp = 32%, tau = 300 s
Results
The service logs coming from the VRESCo runtime environment contain frequent patterns of services;
Those patterns contain information about: invoked services, service rebinding, service failure, etc.;
Those patterns can be extracted by considering the order in which services co-occur in a sequence and, additionally, the time between invocations;
Such inferred knowledge can be used to enhance SBAs: e.g., by means of novel design tools like service recommendation.
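One possible way to turn mined patterns into a recommendation is sketched below. It assumes the patterns are available as a mapping from service tuples to support values; the function name, the scoring rule and the toy values are hypothetical and are not results from the experiments.

```python
def recommend_next(patterns, recent, top_k=3):
    """Suggest services that frequently follow the most recent invocations.

    patterns: {tuple_of_services: support}; recent: list of invoked services.
    """
    scores = {}
    for pattern, support in patterns.items():
        head, candidate = pattern[:-1], pattern[-1]
        # A pattern applies when all but its last item match the recent tail.
        if head and tuple(recent[-len(head):]) == head:
            scores[candidate] = max(scores.get(candidate, 0.0), support)
    # Highest support first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))[:top_k]


# Toy patterns with made-up support values.
toy_patterns = {
    ("X", "Y"): 0.66,
    ("X", "Z"): 0.50,
    ("X", "Y", "W"): 0.40,
    ("Y",): 0.90,
}
after_x = recommend_next(toy_patterns, ["A", "X"])
after_xy = recommend_next(toy_patterns, ["X", "Y"])
```

With these toy values, a designer who has just invoked X would be offered Y and then Z, and after X followed by Y the suggestion would be W.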
Conclusions
Event logs collected by complex software systems represent a huge source of information (knowledge)
Find sequences of frequently co-invoked services from SBA event logs using Sequential Pattern Mining (SPM)
Two SPM algorithms, PrefixSpan and MiSTA, were run on a real-world SBA event log (VRESCo)
Experimental results show that some services are often invoked together in a frequent sequence
Exploit such inferred knowledge to enhance SBAs: e.g., by means of novel design tools like service recommendation
References
– [CW96] J. E. Cook and A. L. Wolf, “Discovering models of software processes from event-based data”. Technical Report CUCS-819-96, Computer Science Dept., Univ. of Colorado, 1996.
– [AGL98] R. Agrawal, D. Gunopulos, and F. Leymann, “Mining Process Models from Workflow Logs”. In Sixth International Conference on Extending Database Technology, pp. 469–483, 1998.
– [vdAWM04] W. van der Aalst, T. Weijters, and L. Maruster, “Workflow Mining: Discovering Process Models from Event Logs”. IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 9, pp. 1128–1142, Sep. 2004.
– [LOPST11] C. Lucchese, S. Orlando, R. Perego, F. Silvestri, and G. Tolomei, “Identifying task-based sessions in search engine query logs”. In WSDM ’11. ACM, 2011, pp. 277–286.
– [PHMP01] J. Pei, J. Han, B. Mortazavi-Asl, and H. Pinto, “PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth”. In ICDE ’01. IEEE, 2001.
– [GNPP06] F. Giannotti, M. Nanni, D. Pedreschi, and F. Pinelli, “Mining sequences with temporal annotations”. In SAC ’06. ACM, 2006, pp. 593–597.