Xiao Liu CITR - Centre for Information Technology Research Swinburne University of Technology,...

Post on 13-Jan-2016

Xiao Liu

CITR - Centre for Information Technology Research

Swinburne University of Technology, Australia

xliu@swin.edu.au

Temporal Verification in Grid/ Scientific Workflows

2

Content

Grid/Scientific Workflows
Temporal QOS Framework
Setting Temporal Constraints in Scientific Workflows
SwinDeW-G Grid Workflow Management System
Additional Information
Research areas in Workflow Technology Program
Data Mining Techniques in Workflow area
Optimization Algorithms in Workflow area

3

Grid/Scientific Workflow

Grid Workflow Management System

A type of workflow management system aiming at supporting large-scale sophisticated scientific and business processes in complex e-science and e-business applications, by facilitating the resource sharing and computing power of the underlying grid infrastructure.

Scientific Workflow Management System

A type of workflow management system aiming at supporting complex scientific processes in many e-science applications such as climate modelling and astronomy data processing. It may or may not be built upon grid infrastructure; it can also run on a cluster or a peer-to-peer (P2P) platform.

4

How Are Grids Used

High-performance computing

Collaborative data-sharing

Collaborative design

Drug discovery

Financial modeling

Data center automation

High-energy physics

Life sciences

E-Business

E-Science

Natural language processing & Data Mining

Utility computing

From www.gridbus.org

5

An Example Grid Application

From www.gridbus.org

6

Grid Architecture

From www.gridbus.org

7

Grid Workflow Engine

From www.gridbus.org

9

Temporal Verification

In reality, complex scientific and business processes are normally time constrained. Hence, time constraints are often set when they are modelled as grid workflow specifications.

Temporal constraints mainly include three types:

Upper bound constraint
Lower bound constraint
Fixed-time constraint

Temporal verification is used to identify any temporal violations so that we can handle them in time.

10

Temporal QOS Framework

Constraint Setting: setting temporal constraints according to temporal QOS specifications.

Checkpoint Selection: selecting necessary and sufficient checkpoints at which to conduct temporal verification.

Temporal Verification: verifying the consistency states at the selected checkpoints. Temporal consistency states: SC (Strong Consistency), WC (Weak Consistency), WI (Weak Inconsistency), SI (Strong Inconsistency).

Temporal Adjustment: handling temporal violations.

12

Setting Temporal Constraints

Problem Statement

In scientific workflow systems, temporal consistency is critical to ensure the timely completion of workflow instances. To monitor and guarantee the correctness of temporal consistency, temporal constraints are often set and then verified. However, most current work adopts user specified temporal constraints without considering system performance, and hence may result in frequent temporal violations that deteriorate the overall workflow execution effectiveness.

Granularity of temporal constraints

Coarse-grained constraints refer to those assigned to the entire workflow or workflow segments.
Fine-grained constraints refer to those assigned to individual activities.

13

A Motivating Example

This workflow segment contains 12 activities which are modelled by SPN (Stochastic Petri Net) with additional graphic notations. For simplicity, we denote these activities as X1 to X12. The workflow process structure is composed of four SPN based building blocks, i.e. a choice block for data collection from two radars at different locations (activities X1 to X4), a compound block of parallelism and iteration for data updating and pre-processing (activities X6 to X10), and two sequence blocks for data transferring (activity X5, and activities X11 to X12).

14

Two Basic Requirements

Temporal constraints should be well balanced between user requirements and system performance.

Clients often suggest coarse-grained temporal constraints based on their own interests while having limited knowledge about the actual performance of workflow systems. Therefore, user specified constraints are prone to cause frequent temporal violations.

Temporal constraints should facilitate both overall coarse-grained control and local fine-grained control.

Both coarse-grained temporal constraints and fine-grained temporal constraints should be supported. However, note that coarse-grained temporal constraints and fine-grained temporal constraints are not in a simple relationship of linear culmination and decomposition. Meanwhile, it is impractical to set fine-grained temporal constraints manually for a large number of activities in scientific workflows.

15

A Probabilistic Strategy

Probability based temporal consistency

A novel probability based temporal consistency, which utilises the weighted joint distribution of workflow activity durations, is proposed to facilitate setting temporal constraints.

Two assumptions on activity durations

Assumption 1: The distribution of activity durations can be obtained from workflow system logs. Without loss of generality, we assume all activity durations follow the normal distribution model, denoted as N(µ, σ²).

Assumption 2: The activity durations are independent of each other.

Exception handling for the assumptions: use normal transformation and correlation analysis or, alternatively, ignore the activities that violate the assumptions when calculating the joint distribution and add them back afterwards.

16

Weighted Joint Normal Distribution

Joint normal distribution

If there are n independent variables Xi ~ N(µi, σi²) and n real numbers θi, where n is a finite natural number, then the joint distribution of these variables can be obtained with the following formula:

$$Z = \sum_{i=1}^{n} \theta_i X_i \sim N\left(\sum_{i=1}^{n} \theta_i \mu_i,\ \sum_{i=1}^{n} \theta_i^2 \sigma_i^2\right)$$

Weighted joint normal distribution

For a scientific workflow process SW which consists of n activities, we denote the duration distribution of activity ai as N(µi, σi²) with 1 ≤ i ≤ n. Then the weighted joint distribution is defined as:

$$N(\mu_{sw}, \sigma_{sw}^2) = N\left(\sum_{i=1}^{n} w_i \mu_i,\ \sum_{i=1}^{n} w_i^2 \sigma_i^2\right)$$

where wi stands for the weight of activity ai, which denotes the choice probability or iteration times associated with the workflow path to which ai belongs.
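As a rough illustration of the definition, the sketch below computes a weighted joint distribution in Python, assuming it takes the weighted-sum form N(Σ wi µi, Σ wi² σi²) for independent, normally distributed durations. The function name `weighted_joint_normal` and all the numbers in `segment` are hypothetical and chosen only for the example; they are not taken from the slides.

```python
from math import sqrt

def weighted_joint_normal(activities):
    """Return (mean, std) of sum(w_i * X_i) for independent X_i ~ N(mu_i, sigma_i^2).

    `activities` is a list of (weight, mean, std) tuples, where the weight is a
    choice probability or an iteration count, as described on the slide.
    """
    mean = sum(w * mu for w, mu, _ in activities)
    variance = sum((w ** 2) * (sigma ** 2) for w, _, sigma in activities)
    return mean, sqrt(variance)

# Hypothetical workflow segment (durations in seconds).
segment = [
    (1.0, 100.0, 10.0),  # activity in a sequence block
    (0.7, 200.0, 20.0),  # choice branch taken with probability 0.7
    (0.3, 150.0, 30.0),  # choice branch taken with probability 0.3
    (2.0, 50.0, 5.0),    # activity iterated twice
]

mu_sw, sigma_sw = weighted_joint_normal(segment)
print(f"mean = {mu_sw:.1f}, std = {sigma_sw:.2f}")
```

The weights enter the variance squared, which is why an activity iterated twice contributes four times its variance under this form.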

17

Probabilistic Specification of Activity Durations

Maximum Duration, Mean Duration, Minimum Duration

The 3σ rule states that any sample drawn from a normal distribution has a probability of 99.73% of falling into the range [µ-3σ, µ+3σ], a symmetric interval of 3 standard deviations around the mean. Accordingly, in our strategy, we have the following specification of activity durations:

Maximum Duration D(ai) = µ+3σ
Mean Duration M(ai) = µ
Minimum Duration d(ai) = µ-3σ
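The 3σ specification can be checked numerically. The following minimal sketch uses Python's standard `statistics.NormalDist`; the helper name `duration_spec` and the sample values (µ = 100, σ = 10) are assumptions made for illustration only.

```python
from statistics import NormalDist

def duration_spec(mu, sigma):
    """Maximum, mean and minimum duration of an activity under the 3-sigma rule."""
    return {"max": mu + 3 * sigma, "mean": mu, "min": mu - 3 * sigma}

# Probability that a standard normal sample falls within 3 standard deviations.
standard = NormalDist()
coverage = standard.cdf(3) - standard.cdf(-3)

spec = duration_spec(100.0, 10.0)  # hypothetical activity: mu = 100 s, sigma = 10 s
print(f"coverage = {coverage:.4f}")
print(spec)
```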

18

Probability based Temporal Consistency

19

Setting Strategy

20

Step1: Weighted Joint Normal Distribution

Here, to illustrate and facilitate the calculation of the weighted joint distribution, we analyse basic SPN based building blocks, i.e. sequence, iteration, parallelism and choice. These four building blocks consist of basic control flow patterns and are widely used in workflow modelling and structure analysis. Most workflow process models can be easily built by their compositions, and similarly for the weighted joint distribution of most workflow processes.

21

Step2: Setting Coarse-grained Constraints

The negotiation process

Client: I want the process to be completed in 48 hours.
Service provider: Let me check the probability.

22

Step2: Setting Coarse-grained Constraints

Adjust the constraint

Service provider: Sir, it's 70%. Do you agree?
Client: That's not good. How about 52 hours?

23

Step2: Setting Coarse-grained Constraints

Adjust the probability

Service provider: Then it increases to 85%.
Client: Err… how long will it take if I want to have 90%?

24

Step2: Setting Coarse-grained Constraints

Negotiation result

Service provider: It will take us 54 hours.
Client: OK, that's the deal! Let's do it!

25

Step2: Setting Coarse-grained Constraints

Service provider: OK! But, sir, I need to remind you that this is only a guarantee in the statistical sense. If we cannot make it, please blame the stupid guy who invented the strategy!

Sorry, statistically, no prediction can be 100% sure!

26

Step3: Setting Fine-grained Constraints

Setting fine-grained constraints for individual activities

Assume the probability gained from the last step is θ%, which corresponds to a normal percentile of λ. Then the fine-grained constraint for each individual activity ai is (µi + λσi).

For example, if the coarse-grained temporal constraint is of 90% consistency, which corresponds to a normal percentile of 1.28, then the fine-grained constraint for activity ai with a distribution of N(µi, σi²) is (µi + 1.28σi).
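The rule µi + λσi can be sketched in a few lines of Python. This is an illustration only: the function name `fine_grained_constraints` and the activity values are hypothetical, and λ is obtained from the standard normal inverse CDF rather than from a printed table.

```python
from statistics import NormalDist

def fine_grained_constraints(activities, probability):
    """Return (lam, constraints) where each constraint is mu_i + lam * sigma_i.

    `activities` is a list of (mu, sigma) pairs; `probability` is the agreed
    coarse-grained consistency, e.g. 0.90.
    """
    lam = NormalDist().inv_cdf(probability)  # normal percentile for the probability
    return lam, [mu + lam * sigma for mu, sigma in activities]

# Hypothetical activities: (mean, standard deviation) in seconds.
activities = [(100.0, 10.0), (250.0, 40.0)]
lam, constraints = fine_grained_constraints(activities, 0.90)
print(f"lambda = {lam:.2f}")
print([round(c, 1) for c in constraints])
```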

27

Evaluation--Specification

28

Setting Results: Coarse-grained Constraint

Negotiation for coarse-grained constraint

Constraint   Probability
6300s        66%
6360s        75%
6390s        79%
6400s        81%

WS ~ N(6210, 218²)

U(WS) = 6400, λ = 0.87
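The negotiation figures can be approximately reproduced from the joint distribution, reading the distribution on this slide as N(6210, 218²), i.e. a mean of 6210 s and a standard deviation of 218 s. The sketch below is an assumption-laden illustration: the function names are hypothetical, and because it uses the exact normal CDF rather than a printed table, its percentages may differ from the slide by about one point.

```python
from statistics import NormalDist

# Joint distribution of the workflow segment, read from the slide as N(6210, 218^2).
ws = NormalDist(mu=6210, sigma=218)

def consistency_probability(constraint_seconds):
    """Probability that the segment finishes within the proposed constraint."""
    return ws.cdf(constraint_seconds)

def constraint_for(probability):
    """Smallest constraint (in seconds) that reaches the requested probability."""
    return ws.inv_cdf(probability)

# Client adjusts the constraint; provider reports the probability.
for u in (6300, 6360, 6390, 6400):
    print(f"{u}s -> {consistency_probability(u):.1%}")

# Normal percentile for the agreed constraint U(WS) = 6400.
lam = (6400 - ws.mean) / ws.stdev
print(f"lambda = {lam:.2f}")
```

`constraint_for` covers the other direction of the negotiation, where the client names a probability and the provider answers with a duration.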

29

Setting Results: Fine-grained Constraint

31

SwinDeW-G Grid Workflow System

SwinDeW-G stands for Swinburne Decentralised Workflow for Grid.

SwinDeW-G is a peer-to-peer based scientific grid workflow system running on the SwinGrid (Swinburne service Grid) platform.

SwinGrid nodes include the Swinburne CITR (Centre for Information Technology Research) Node, the Swinburne ESR (Enterprise Systems Research laboratory) Node, the Swinburne Astrophysics Supercomputer Node, and the Beihang CROWN (China R&D environment Over Wide-area Network) Node in China. They run Linux with GT4 (Globus Toolkit) or the CROWN grid toolkit 2.5; CROWN is an extension of GT4 with more middleware and is hence compatible with GT4.

33

Research Areas in WT

http://www.swin.edu.au/ict/research/citr/wt/research.php

Peer-to-peer based, service oriented and grid workflows

SwinDeW-A: SwinDeW with agent enhanced negotiation
SwinDeW-B: SwinDeW incorporating BPEL4WS (past)
SwinDeW-G: peer-to-peer based service grid workflow system
SwinDeW-S: SwinDeW incorporating Web services (past)
SwinDeW-V: temporal constraint verification in grid workflows
SwinDeW: peer-to-peer based decentralised workflow system (past)

Service-oriented computing

SwinGrid - a Swinburne Service Grid Platform which connects Swinburne CITR nodes and the Swinburne Supercomputer with external nodes nationally and internationally, forming a Grid computing environment.

34

Recent Publications in WT

http://www.ict.swin.edu.au/personal/yyang/Publications.html

X. Liu, J. Chen and Y. Yang, A Probabilistic Strategy for Setting Temporal Constraints in Scientific Workflows, Proc. 6th International Conference on Business Process Management (BPM2008), Sept. 2008, Milan, Italy.

K. Ren, X. Liu, J. Chen, N. Xiao, J. Song, W. Zhang, A QSQL-based efficient Planning Algorithm for fully-automated Service Composition in Dynamic Service Environments, Proc. of IEEE International Conference on Services Computing (SCC2008), Honolulu, Hawaii, USA, July 2008.

J. Chen and Y. Yang, A Taxonomy of Grid Workflow Verification and Validation. Concurrency and Computation: Practice and Experience, Wiley, 20(4):347-360, 2008.

J. Chen and Y. Yang, Adaptive Selection of Necessary and Sufficient Checkpoints for Dynamic Verification of Temporal Constraints in Grid Workflow Systems. ACM Transactions on Autonomous and Adaptive Systems, 2(2):Article6, June 2007.

Q. He, J. Yan, R. Kowalczyk, H. Jin, Y. Yang, Lifetime Service Level Agreement Management with Autonomous Agents for Services Provision. Information Sciences, Elsevier, to appear.

K. Liu, J. Chen, Y. Yang and H. Jin, A Throughput Maximisation Strategy for Scheduling Transaction Intensive Workflows on SwinDeW-G. Concurrency and Computation: Practice and Experience, Wiley, to appear.

J. Yan, Y. Yang and G. K. Raikundalia. SwinDeW - A Peer-to-peer based Decentralized Workflow Management System. IEEE Transactions on Systems, Man and Cybernetics, Part A, 36(5):922-935, 2006.

36

Data Mining Techniques in Workflow area

Process Mining Overview

[Figure: process mining overview with six panels: 1) basic performance metrics; 2) process model (Start, Register order, Prepare shipment, Ship goods, (Re)send bill, Receive payment, Contact customer, Archive order, End); 3) organizational model; 4) social network; 5) performance characteristics (If … then … rules); 6) auditing/security.]

From www.processmining.org

37

Process Mining

[Figure: three example process models. (1) An order process: Register order, Prepare shipment, Ship goods, Receive payment, (Re)send bill, Contact customer, Archive order. (2) A purchasing process spanning Purchase Requisition, Purchasing, Goods Receipt, Invoice Verification and Warehouse/Stores. (3) A "Decide To Buy Computer" process with branches for desktop or laptop and for Windows or Linux.]

From www.processmining.org

1. Process Discovery
2. Conformance testing
3. Log based verification

38

ProM Framework

From www.processmining.org

39

Other Workflow Mining Topics

Successful Termination Prediction. Choosing, from a given set of potential activities, the one that in past executions most frequently led to a desired final configuration.

Identification of Critical Activities. Discovering those activities that can be considered critical in the sense that they are scheduled by the system in every successful execution.

Failure/Success Characterization. By analysing past experience, a workflow administrator may be interested in knowing which discriminating factors characterize failure or success in the executions.

Workflow Optimization. The information collected in the system logs can be profitably used to reason on the “optimality” of workflow executions.

Workflow Performance Related Analysis and Prediction. Time series mining used in the prediction of activity durations, the setting of temporal constraints and dynamic temporal verification.

40

References on Workflow Mining

G. Greco, A. Guzzo, G. Manco and D. Sacca, Mining and Reasoning on Workflows, IEEE Trans. on Knowledge and Data Engineering, Vol. 17, No. 4, pp.519-534, April 2005.

W.M.P. van der Aalst, B.F. van Dongen, J. Herbst, L. Maruster, G. Schimm, and A.J.M.M. Weijters, Workflow Mining: A Survey of Issues and Approaches. Data and Knowledge Engineering, Vol. 47, No. 2, pp.237-267, 2003.

A.K.A. de Medeiros, W.M.P. van der Aalst, and A.J.M.M. Weijters, Workflow Mining: Current Status and Future Directions, CoopIS 2003, volume 2888 of Lecture Notes in Computer Science, pages 389-406. Springer-Verlag, Berlin, 2003.

W.M.P. van der Aalst, H.T. de Beer, and B.F. van Dongen, Process Mining and Verification of Properties: An Approach based on Temporal Logic, CoopIS 2005, volume 3760 of Lecture Notes in Computer Science, pages 130-147. Springer-Verlag, Berlin, 2005.

42

Grid Resource Management System

[Figure: layered grid resource management architecture. Labels: User/Application; Higher-Level Services; Resource Broker; Grid Middleware; Grid Resource Manager (three instances); Core Grid Infrastructure Services (Information Services, Monitoring Services, Security Services); Local Resource Management (PBS, LSF, …); Resource (three instances).]

From http://www.coregrid.net

43

Grid Workflow Scheduling

[Figure: grid scheduling diagram. Labels: Grid User; Grid-Scheduler; Machine 1, Machine 2 and Machine 3, each with its own Scheduler, Schedule (time axis) and Job-Queue.]

From http://www.coregrid.net

44

A taxonomy of Grid workflow scheduling algorithms

45

GA based Scheduling

Fundamentals for GA based Scheduling

1. Encoding/Decoding
2. Genetic Operators: Crossover, Mutation and Selection
3. Fitness Evaluation Function
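The three fundamentals can be seen together in a toy example. The Python sketch below is not the scheduling algorithm of any particular grid system: it makes the simplifying assumption that tasks are independent (no workflow dependencies), encodes a schedule as a list of machine indices, and uses makespan as the fitness function. All names (`ga_schedule`, `makespan`) and parameter values are hypothetical.

```python
import random

def makespan(assignment, durations, n_machines):
    """Fitness: finishing time of the busiest machine (lower is better)."""
    loads = [0.0] * n_machines
    for task, machine in enumerate(assignment):
        loads[machine] += durations[task]
    return max(loads)

def ga_schedule(durations, n_machines, pop_size=30, generations=100,
                mutation_rate=0.1, seed=0):
    """Toy GA for independent tasks; requires at least two tasks."""
    rng = random.Random(seed)
    n = len(durations)
    # Encoding: chromosome[i] is the machine index assigned to task i.
    population = [[rng.randrange(n_machines) for _ in range(n)]
                  for _ in range(pop_size)]

    def fitness(chromosome):
        return makespan(chromosome, durations, n_machines)

    def select():
        # Selection: binary tournament between two random individuals.
        a, b = rng.sample(population, 2)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(generations):
        best = min(population, key=fitness)
        next_gen = [best[:]]  # elitism: carry the current best over unchanged
        while len(next_gen) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, n)  # crossover: one cut point
            child = p1[:cut] + p2[cut:]
            for i in range(n):  # mutation: reassign a task at random
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_machines)
            next_gen.append(child)
        population = next_gen

    best = min(population, key=fitness)
    return best, fitness(best)

durations = [4, 7, 3, 9, 5, 6, 2, 8]  # hypothetical task durations
best, span = ga_schedule(durations, n_machines=3)
print(best, span)
```

Decoding here is trivial because the chromosome is already the task-to-machine mapping; a real workflow scheduler would also have to respect task precedence when decoding.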

46

Others

Simulated Annealing
Ant Colony

Workflow Rescheduling

When any QOS constraints are violated, how can those violations be handled by rescheduling the current task list to compensate for, e.g., time or budget deficits?

47

Summary

Grid/Scientific Workflows
Temporal Verification and Temporal Adjustment to Support Temporal QOS Framework
Workflow Mining (More than process mining)
Optimization Algorithms for Workflow Scheduling and Rescheduling

48

Useful Links

www.swinflow.org
Our work on temporal verification in scientific/grid workflows

http://is.tm.tue.nl/staff/wvdaalst/
Home page of Prof. Wil van der Aalst, Workflow Research

http://www.buyya.com/
Home page of Dr. Rajkumar Buyya, Grid Research

http://www.cs.ucr.edu/~eamonn/
Home page of Eamonn Keogh, Time Series Mining

http://www.cs.ucr.edu/~eamonn/time_series_data/
UCR Time Series Database

49

The End

Any questions or comments?