Graph-Based Methods for the Representation and Analysis of Business Workflows

Amitava Bagchi

Indian Institute of Management Calcutta


ISI_Dec_05 2

References

Mukherjee Arindam, Sen Anup K and Bagchi Amitava (2004), Information analysis in workflows represented as task-precedence metagraphs, Proc WITS-2004, Workshop on Information Technology and Systems, Seattle, WA, USA, pp 32-37

Mukherjee Arindam, Sen Anup K and Bagchi Amitava (2005), Representation, Analysis and Verification of Business Processes: A Metagraph-Based Approach, Working Paper WPS-552, Indian Institute of Management Calcutta (http://www.iimcal.ac.in)

Outline

Business Process & Workflow
Metagraphs & Information Elements
Task Precedence Metagraphs (TPMGs)
Information Analysis: Graphical Algorithm
Functional & Organizational Perspectives
Workflow Verification

Objectives

To describe an AND/OR graph representation scheme for business workflows.

To present a graph traversal algorithm for the analysis of information flow in such workflows.

To extend the above method to task and resource analyses.

To outline how the structural correctness of workflows can be verified.

Business Process & Workflow

A business process consists of a set of related tasks in one or more functional areas (such as finance or marketing), which, when performed in any one of several permissible orders, enables an organization to achieve a business goal.

Example: a loan appraisal system used in a bank.


Loan Appraisal: Business Process Example

Legend (see the Loan Appraisal figure)

PD: applicant’s property data
CD: data on comparable properties
AC: applicant’s account data
APD: loan application data
AV: appraised value of property
CR: applicant’s credit rating
LA: loan amount
RLA: revised loan amount
LR: risk level of loan
AR, MR, BR: the loan risk level is acceptable, marginally bad, bad (respectively)
BP: current portfolio of bank’s loans
RE: bank’s current loan exposure
YES: the application is approved
NO: the application is rejected

Business Process & Workflow

Workflow (or Workflow Instance): a specific instance of flow of control in a business process; it is a sub-graph of the given process graph.

In practice, the terms business process and workflow are often used interchangeably.

Workflow Instance 1

Workflow Instance 2

Workflow Instance 3

[Figures: three workflow instances, each a sub-graph of the loan appraisal process graph.]

Business Process Modeling: Existing Approaches

Petri Nets & Related Formalisms
    Petri Nets (van der Aalst & van Hee 2002)
    Workflow Management Coalition (WfMC) Guidelines (http://www.wfmc.org)

Metagraph-Based Formalisms
    Metagraphs (Basu & Blanning 2000, 1999, 1994)

Petri Nets & Related Formalisms

Main focus is on the precedence relationships between tasks.

Flow of information plays a subsidiary role.

Commercial products such as IBM’s MQSeries have adopted this convention.

Widely used, and quite suitable for engineering applications.

Metagraph-Based Formalisms

A metagraph is a directed (hyper-)graph. It can be viewed as a special type of AND/OR graph.

A metagraph is typically small in size (at most a few hundred nodes) and is explicitly available, i.e., the entire graph is supplied as input to a search algorithm. So the expansion of a node just means moving to its immediate successors.

By contrast, in (implicit) game trees, new nodes are added to the graph as they are created.

Metagraph-Based Formalisms

Each node in a metagraph contains one or more information elements (items).

In Information Analysis, an input set of items is supplied at start, specifying the business information initially available.

Another set of items, called the output set, contains the target set of items that are desired as output.

Metagraph-Based Formalisms

Each arc represents a task that converts one set of items to another set of items.

The objective is to start from the input set of items, perform the tasks in the given order of precedence and derive all the items in the output set.
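The idea of arcs as tasks that convert one item set into another can be sketched in Python. This is only an illustration: the edge table below is an assumption (the slides do not list the inputs and outputs of each task), and `derivable` is a hypothetical helper that ignores OR choices and simply fires every task whose inputs are available.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical edge table: task name -> (items required, items produced).
# The item groupings are assumptions made for illustration only.
Edge = Tuple[FrozenSet[str], FrozenSet[str]]
EDGES: Dict[str, Edge] = {
    "credit_rating": (frozenset({"AC", "APD"}), frozenset({"CR"})),
    "appraise": (frozenset({"PD", "CD"}), frozenset({"AV"})),
    "loan_risk": (frozenset({"CR", "AV", "LA"}), frozenset({"LR"})),
}


def derivable(inputs: Set[str], targets: Set[str],
              edges: Dict[str, Edge] = EDGES) -> bool:
    """Fire every task whose inputs are available until nothing new appears."""
    items = set(inputs)
    changed = True
    while changed:
        changed = False
        for needed, produced in edges.values():
            if needed <= items and not produced <= items:
                items |= produced
                changed = True
    return targets <= items
```

With this table, `derivable({"AC", "APD", "PD", "CD", "LA"}, {"LR"})` holds, while starting from `{"AC", "APD"}` alone does not reach `LR`.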

Metagraph-Based Formalisms

The metagraph convention puts more emphasis on the flow of information, so it has an advantage over Petri Nets for business applications.

However, it is not widely used in practice because it suffers from certain shortcomings.

Metagraph for Loan Evaluation Process

[Figure: metagraph for the loan evaluation process.]

Nodes: Account Data (AC), Applicant Data (APD), Credit Rating (CR), Property Data (PD), Comparables Data (CD), Appraised Value (AV), Loan Amount (LA), Bank’s Portfolio (BP), Loan Risk (LR), Acceptable Risk (AR), Marginally Bad Risk (MR), Bad Risk (BR), Risk Exposure (RE), Loan Approved (YES), Loan Rejection (NO).

Edges:
e1: Calculate Credit Rating
e2: Calculate Appraised Value of Property
e4: Calculate Loan Risk
e5: Calculate a new Loan Amount
e6: Calculate Bank’s Risk Exposure
e7: Acceptable Risk Assessment
e8: Marginal Risk Assessment
e9: Approve the Loan
e10: Bad Risk Assessment
e11: Reject the Loan Application

Metagraphs

The existing metagraph model for workflows has three main shortcomings:

Flow of control is not displayed with clarity and the diagram appears cluttered.

The analysis makes use of symbolic matrices which are not easy to manipulate.

A clear distinction is not always drawn between OR joins & AND joins (or even between OR splits & AND splits).

Task Precedence Metagraphs (TPMGs)

A TPMG is a modified form of metagraph.

It is visually more appealing and is more like an AND/OR graph in appearance.

It is less cluttered so the flow of control is discerned more easily.

A TPMG is more general than a WfMC graph in that AND & OR splits and joins are not always required to be matched in pairs.

Terminology

Tasks & Propagation Edges

Init Nodes & Prop Nodes

OR Nodes & AND Nodes

Split Nodes & Join Nodes

Task Precedence Metagraphs (TPMGs)

Edges are of two types:

Tasks: shown as bold arrows; a task converts the set of items at its start to another set of items, which cannot be obtained from any other task.

Propagation Edges: shown as lightly drawn arrows; a propagation edge conveys an item from the outgoing end of a task to the incoming end of another task.

Task Precedence Metagraphs (TPMGs)

Nodes are also of two types:

Init Nodes: an init node has a single outgoing edge corresponding to a task; it is shown as a bold oval.

Prop Nodes: a prop node can have multiple outgoing edges, all of which are propagation edges; it is shown as a lightly drawn oval.

Task Precedence Metagraphs (TPMGs)

Init and Prop Nodes

On every directed path, init nodes alternate with prop nodes, i.e., a TPMG is a directed bipartite graph, just like a Petri net.

Task Precedence Metagraphs (TPMGs)

Nodes are also classified as OR nodes or AND nodes.

An OR node (identified with a + sign) shows alternate paths for flow of control.

An AND node (identified with a • sign) indicates that flow of control takes place along all the edges at the same time.

An OR (or AND) node is either a split node or a join node.

Task Precedence Metagraphs (TPMGs)

Split & Join Nodes

A split node is a node at which multiple paths begin. It is always a prop node.

A join node is a node at which multiple paths end. It is always an init node.
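The node and edge vocabulary above can be captured in a small data structure. The following Python sketch is only one possible encoding; the class names, field names and the `structure_problems` helper are assumptions introduced for illustration. It checks two of the stated rules: tasks run from init nodes to prop nodes while propagation edges run from prop nodes to init nodes, and an init node has a single outgoing edge.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Node:
    name: str
    kind: str                           # "init" or "prop"
    logic: str = "AND"                  # "AND" or "OR"
    items: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Edge:
    src: Node
    dst: Node
    kind: str                           # "task" or "propagation"


def structure_problems(edges: List[Edge]) -> List[str]:
    """Report edges that break init/prop alternation or the single-task rule."""
    expected = {"task": ("init", "prop"), "propagation": ("prop", "init")}
    problems = []
    for e in edges:
        if (e.src.kind, e.dst.kind) != expected[e.kind]:
            problems.append(
                f"{e.kind} edge {e.src.name}->{e.dst.name} breaks alternation")
    out_degree = Counter(e.src.name for e in edges)
    init_sources = {e.src.name for e in edges if e.src.kind == "init"}
    for name in sorted(init_sources):
        if out_degree[name] > 1:
            problems.append(f"init node {name} has more than one outgoing edge")
    return problems


# Hypothetical three-node fragment: init -> (task) -> prop -> (propagation) -> init.
a = Node("a", "init", items=frozenset({"x"}))
b = Node("b", "prop", "OR", frozenset({"y"}))
c = Node("c", "init", items=frozenset({"y"}))
GOOD = [Edge(a, b, "task"), Edge(b, c, "propagation")]
```

Under these assumptions, split nodes are prop nodes with several outgoing propagation edges and join nodes are init nodes with several incoming ones, so no separate flag is needed for them.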

Task Precedence Metagraphs (TPMGs)

However, a TPMG differs from a Petri Net in that every node has an associated subset of labeled items. This underscores the role of business information in a business process.

Information Analysis

Given a workflow, we seek answers to questions of the following type:

Suppose a set A of items is supplied. Starting from A, can we produce all the items in another given set B?

Is item ‘a’ essential for producing item ‘b’?

These can be formulated as graph search problems.
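One possible reading of the second question can be sketched in Python: item a is treated as essential for item b if b can be derived from the inputs, but can no longer be derived once a is removed from the inputs and from the output of every task. This reading, the `(needed, produced)` task encoding and the helper names are assumptions made for illustration; OR choices are ignored.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Task = Tuple[FrozenSet[str], FrozenSet[str]]   # (items needed, items produced)


def closure(inputs: Iterable[str], tasks: List[Task]) -> Set[str]:
    """All items obtainable by repeatedly firing tasks whose inputs are present."""
    items = set(inputs)
    changed = True
    while changed:
        changed = False
        for needed, produced in tasks:
            if needed <= items and not produced <= items:
                items |= produced
                changed = True
    return items


def is_essential(a: str, b: str, inputs: Iterable[str], tasks: List[Task]) -> bool:
    """True if b is derivable, but not when a is withheld everywhere."""
    if b not in closure(inputs, tasks):
        return False
    without_a = [(needed, produced - {a}) for needed, produced in tasks]
    return b not in closure(set(inputs) - {a}, without_a)


# Hypothetical tasks: b can be obtained via y (from x) or directly from z.
TASKS: List[Task] = [
    (frozenset({"x"}), frozenset({"y"})),
    (frozenset({"y"}), frozenset({"b"})),
    (frozenset({"z"}), frozenset({"b"})),
]
```

In this toy example, y is essential for b when only x is supplied, but not when z is supplied as well, because the third task offers an alternative route.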

Information Analysis

But a standard AND/OR graph search algorithm such as AO* (Nilsson 1980) is not appropriate for our purpose because a TPMG differs from an AND/OR graph in some ways:

TPMG: Multiple start nodes
AND/OR Graph: One start node

Information Analysis

TPMG: Both AND joins and OR joins
AND/OR Graph: Only OR joins

TPMG: Can have directed cycles
AND/OR Graph: AO* assumes it is cycle-free

Note that a project scheduling network has only AND splits/joins and no OR splits/joins.

Algorithm InfAnalysis

Algorithm InfAnalysis is an iterative graph search algorithm.

Given:
    An explicit TPMG
    An input set of items
    An output (target) set of items

Determines whether all the items in the output set can be derived.

Algorithm InfAnalysis

Algorithm InfAnalysis has some similarities with A* and AO* and makes use of an edge-marking method.

We think of the given TPMG as representing a business process, and the marked solution sub-graph produced by InfAnalysis as a workflow instance.

Algorithm InfAnalysis

Makes use of four lists:

ITEMSET: initially contains the input set of items; new items get added as nodes get expanded

TARGET: contains the items desired as output

FRONTIER: only holds init nodes; initially holds those that have all their items in ITEMSET

STACK: needed for processing OR nodes; remembers which OR alternative should be processed next

Algorithm InfAnalysis

An active node is an init node in FRONTIER with all its items in ITEMSET.

At each iteration, InfAnalysis looks for an active node in FRONTIER, processes the corresponding task, and updates ITEMSET & FRONTIER.

Algorithm InfAnalysis

If all items in TARGET belong to ITEMSET then a solution has been found (success).

If there is no active node in FRONTIER then the next OR alternative in STACK must be pursued.

If STACK is also empty then the algorithm reports failure.

Algorithm InfAnalysis

Thus the algorithm traverses the given TPMG exhaustively, looking for a workflow instance that generates, for the given input set, a set of items that contains the given output set.

When traversing a workflow instance, the edges in the instance get marked (say by colouring red).

Algorithm InfAnalysis

When the next workflow instance is examined, the marking at the corresponding OR split node is changed.

The advantage of marking is that each instance need not be traversed from scratch; the work done earlier can be remembered and partly reused.

The algorithm assumes that the TPMG is structurally valid.

Algorithm InfAnalysis

Example: For the loan appraisal process, we want to know whether, given the set of items S = { LA, PD, CD, AC, APD, BP } as input, we can produce the item YES as output.

A graph search algorithm is appropriate for such problems. To keep the algorithm simple, we do not indicate the edge markings.

Algorithm InfAnalysis

    initialize ITEMSET, FRONTIER, STACK;
    do while (TARGET is not a subset of ITEMSET) {
        if (there is an active node n in FRONTIER) then {
            remove n from FRONTIER;
            expand n, entering its init successors in FRONTIER,
                OR split nodes in STACK, and new items in ITEMSET;
        }
        // else examine next workflow instance
        else if (STACK is not empty) then {
            take next init successor p of OR node m on top of STACK;
            enter p in FRONTIER adding its items to ITEMSET;
            if (m has no other successors) then pop m;
        }
        else { announce “failure”; exit; }  // no remaining workflow instances
    }
    announce “success”; exit;
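A much simplified Python sketch of the same search is given below; it is not the authors' implementation. The assumptions are: each init node is identified with its single task; a `Task` record (a hypothetical encoding) lists the items the init node needs, the items at the prop node the task leads to, whether that prop node is an AND or OR split, and the init nodes reached next. Instead of edge marking, the stack stores whole (ITEMSET, FRONTIER) snapshots, one per OR alternative, and a `seen` set guards against looping on directed cycles. The example process is invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Task:
    needs: FrozenSet[str]               # items held by the task's init node
    produces: FrozenSet[str]            # items at the prop node the task leads to
    split: str = "AND"                  # fan-out of that prop node: "AND" or "OR"
    successors: Tuple[str, ...] = ()    # init nodes (named by task) reached next


def inf_analysis(tasks: Dict[str, Task], inputs: Iterable[str],
                 target: Iterable[str]) -> bool:
    goal = frozenset(target)
    itemset = frozenset(inputs)
    frontier = frozenset(n for n, t in tasks.items() if t.needs <= itemset)
    stack = [(itemset, frontier)]       # snapshots still to be explored
    seen = set()                        # guards against cycles
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        itemset, frontier = state
        if goal <= itemset:
            return True                 # success
        active = sorted(n for n in frontier if tasks[n].needs <= itemset)
        if not active:
            continue                    # dead end: try the next OR alternative
        n = active[0]
        task = tasks[n]
        items = itemset | task.produces
        rest = frontier - {n}
        if task.split == "OR" and task.successors:
            for s in task.successors:   # one snapshot per OR alternative
                stack.append((items, rest | {s}))
        else:
            stack.append((items, rest | frozenset(task.successors)))
    return False                        # no remaining workflow instances


def fs(*xs: str) -> FrozenSet[str]:
    return frozenset(xs)


# Hypothetical process loosely modelled on the loan example.
TASKS = {
    "rate": Task(fs("AC", "APD"), fs("CR"), "AND", ("risk",)),
    "appraise": Task(fs("PD", "CD"), fs("AV"), "AND", ("risk",)),
    "risk": Task(fs("CR", "AV", "LA"), fs("LR"), "OR", ("approve", "reject")),
    "approve": Task(fs("LR"), fs("YES")),
    "reject": Task(fs("LR"), fs("NO")),
}
```

With the inputs `{"LA", "PD", "CD", "AC", "APD"}`, the target `{"YES"}` is reachable in this toy process, whereas `{"YES", "NO"}` is not, because the OR split lets only one alternative be followed in any single workflow instance.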

Algorithm InfAnalysis: Observations

Works correctly on the example shown earlier (TPMG for loan appraisal).

But for more complex TPMGs containing OR split nodes that are not descendants of each other, the STACK must be replaced by a more flexible data structure.

Functional Perspective

Queries that relate to the execution of tasks rather than to the flow of information:

Which other tasks must be completed before a given task t can start?

If a task t cannot be executed, which other tasks become inoperable?

Functional Perspective

Algorithm InfAnalysis can be modified in a simple way to answer such queries.

For example, to find the tasks that must be completed before task t can start, consider the set S of items contained in the init node that immediately precedes t. Run InfAnalysis with the given inputs and with S as the target set; the required tasks are those in the marked sub-graph.
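For comparison, a cruder way to approximate the first query is a backward traversal over the precedence relation. The Python sketch below is not the marked sub-graph method described above: it assumes a hypothetical `preds` table mapping each task to its immediate predecessor tasks, and it treats every predecessor as required, so it overestimates when OR joins are present.

```python
from typing import Dict, Set


def tasks_before(t: str, preds: Dict[str, Set[str]]) -> Set[str]:
    """All tasks reachable backwards from t; safe on directed cycles."""
    seen: Set[str] = set()
    todo = list(preds.get(t, ()))
    while todo:
        u = todo.pop()
        if u not in seen:
            seen.add(u)
            todo.extend(preds.get(u, ()))
    seen.discard(t)     # t may reappear when it lies on a cycle
    return seen


# Hypothetical precedence table (task -> immediate predecessors).
PREDS = {
    "risk": {"rate", "appraise"},
    "approve": {"risk"},
}
```

Here `tasks_before("approve", PREDS)` yields the three tasks `rate`, `appraise` and `risk`.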

Organizational Perspective

Queries that relate to resources (i.e., the executors of tasks, whether human agents or machines):

If a resource r is unavailable, some tasks will not get performed. As a result, some other resources might become idle. Which are the resources that will become idle?

Organizational Perspective

Again, Algorithm InfAnalysis can be modified in a simple way to answer such queries.

For example, to determine the other resources that become idle when resource r is unavailable, first find the set T of tasks that r executes. We can determine which other tasks get held up because the tasks in T cannot be executed. This will tell us whether any resources have become completely idle.
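The resource query can be sketched in Python along the same lines. This is an illustration under stated assumptions, not the modified InfAnalysis itself: tasks are encoded as `(needed, produced)` item sets, `executor` is a hypothetical table mapping each task to the resource that performs it, and OR choices are ignored. A resource counts as newly idle if it executes some task when everything is available but none once r is withdrawn.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

TaskTable = Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]]


def runnable(tasks: TaskTable, inputs: Iterable[str], banned: Set[str]) -> Set[str]:
    """Names of the tasks that can fire when the banned tasks are withheld."""
    items = set(inputs)
    fired: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for name, (needed, produced) in tasks.items():
            if name not in fired and name not in banned and needed <= items:
                fired.add(name)
                items |= produced
                changed = True
    return fired


def idle_resources(tasks: TaskTable, executor: Dict[str, str],
                   inputs: Iterable[str], r: str) -> Set[str]:
    """Resources, other than r, that lose all their work when r is unavailable."""
    banned = {t for t, res in executor.items() if res == r}
    busy_before = {executor[t] for t in runnable(tasks, inputs, set())}
    busy_after = {executor[t] for t in runnable(tasks, inputs, banned)}
    return busy_before - busy_after - {r}


# Hypothetical tasks and executors, for illustration only.
TASKS: TaskTable = {
    "rate": (frozenset({"AC", "APD"}), frozenset({"CR"})),
    "appraise": (frozenset({"PD", "CD"}), frozenset({"AV"})),
    "risk": (frozenset({"CR", "AV", "LA"}), frozenset({"LR"})),
    "approve": (frozenset({"LR"}), frozenset({"YES"})),
}
EXECUTOR = {"rate": "clerk", "appraise": "appraiser",
            "risk": "analyst", "approve": "manager"}
INPUTS = {"AC", "APD", "PD", "CD", "LA"}
```

In this example, withdrawing the appraiser blocks `appraise`, which in turn holds up `risk` and `approve`, so the analyst and the manager become idle while the clerk stays busy.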

Temporal Constraints

The control structure of a workflow imposes temporal constraints on tasks. If a task precedes another task, it must be performed earlier.

If temporal information, such as the duration of tasks, is supplied, then issues arise similar to those in project scheduling.

However, the presence of directed cycles in workflows causes additional complications.
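The project-scheduling view can be illustrated with a short Python sketch that computes earliest finish times over a precedence relation. It assumes a cycle-free network with AND joins only; the durations and the `preds` table are hypothetical. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later), which raises `CycleError` on a directed cycle, a small reminder of why cycles complicate the analysis.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, Set


def earliest_finish(duration: Dict[str, float],
                    preds: Dict[str, Set[str]]) -> Dict[str, float]:
    """Earliest finish time of each task when all predecessors must complete."""
    sorter = TopologicalSorter({t: preds.get(t, set()) for t in duration})
    finish: Dict[str, float] = {}
    for t in sorter.static_order():     # raises CycleError on a directed cycle
        start = max((finish[p] for p in preds.get(t, ())), default=0)
        finish[t] = start + duration[t]
    return finish


# Hypothetical durations (in days) and precedence table.
DURATION = {"rate": 2, "appraise": 3, "risk": 1, "approve": 1}
PREDS = {"risk": {"rate", "appraise"}, "approve": {"risk"}}
```

For these values `risk` can finish no earlier than day 4 and `approve` no earlier than day 5, since `appraise` is the longer of the two parallel predecessors.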

Structural Verification

A valid workflow always serves a business goal.

Given a business process W supplied in the form of a TPMG, how do we tell whether W is valid?

To ensure the validity of W, some structural (i.e., syntactic) constraints must be imposed on W.

We now give examples of such structural constraints.

Structural Problem: Deadlock

Deadlock: Caused when an OR split node is nested with an AND join node.

In the figure, only one of the two outgoing edges at the OR split node 2 can be marked at any time. So execution cannot proceed beyond the AND join node 7.

A valid workflow must not have any deadlocks.

Structural Problem: Lack of Synchronization

Lack of Synchronization: Caused when an AND split node is nested with an OR join node.

In the figure, since both the outgoing edges at the AND split node 2 will get marked, the task (7,8) will be executed twice.

A valid workflow must not suffer from lack of synchronization.

Structural Problem: Non-Terminating Cycle

Non-Terminating Cycle: Caused when control cannot exit from a directed cycle.

This problem can be avoided when every directed cycle is well-formed, i.e., it has an OR join node lying on it through which control can enter, and an OR split node lying on it through which control can exit (see loan appraisal example).

Other Structural Errors

Examples of other structural errors that must be eliminated:

Dangling Nodes: It should be ensured that if a node in a TPMG contains items that are not target items, then the node has a successor task.
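The dangling-node rule translates directly into a small check. In the Python sketch below, the two tables (`node_items`, giving the items of each node, and `successor_task`, giving the task that follows a node, if any) are hypothetical encodings chosen for illustration.

```python
from typing import Dict, List, Optional, Set


def dangling_nodes(node_items: Dict[str, Set[str]],
                   successor_task: Dict[str, Optional[str]],
                   targets: Set[str]) -> List[str]:
    """Nodes holding some non-target item but having no successor task."""
    return sorted(
        name for name, items in node_items.items()
        if items - targets and successor_task.get(name) is None
    )


# Hypothetical fragment: n3 holds LR, which is not a target, and leads nowhere.
NODE_ITEMS = {"n1": {"CR"}, "n2": {"YES"}, "n3": {"LR"}}
SUCCESSOR_TASK = {"n1": "risk"}
```

With `{"YES"}` as the target set, only `n3` is reported: `n1` has a successor task, and `n2` holds nothing but a target item.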

Structural Verification

The structural verification algorithm TPMG_SYN traverses the workflow instances in the given TPMG one by one, looking for structural problems.

As soon as it locates a problem, it terminates with an appropriate error message.

If TPMG_SYN does not find a problem, the given TPMG models a structurally valid business process.

Structural Verification

TPMG_SYN has many similarities with Algorithm InfAnalysis.

TPMG_SYN assumes for convenience that there is one start node and one goal node. If this does not hold for the given TPMG, an AND split node can be added at the top and an OR join node at the bottom.

Algorithm TPMG_SYN

     1  initialize ITEMSET, FRONTIER, STACK; finish = false;
     2  do while (finish == false) {
     3      if (there is a non-goal active node n in FRONTIER) then {
     4          remove n from FRONTIER;
     5          expand n, entering its init successors in FRONTIER,
     6              OR split nodes in STACK, and new items in ITEMSET;
     7      }
     8      else if (there is a goal node in FRONTIER) then
     9          announce “one workflow instance scanned”;
    10      else if (STACK is not empty) then {
    11          take next init successor p of OR node m on top of STACK;
    12          enter p in FRONTIER adding its items to ITEMSET;
    13          if (m has no other successors) then pop m;
    14      }
    15      else finish = true;
    16  }
    17  announce “the given TPMG is valid”; exit;

TPMG_SYN: Detection of Errors

Illegal cycles and lack of synchronization can both be detected at line 5 when node n is expanded.

We can just check whether as a result of the expansion an OR join node has two marked incoming edges. This would be illegal in general, but in some situations it can indicate the presence of a legal directed cycle.

TPMG_SYN: Detection of Errors

Deadlock can be detected at line 8 when a goal node is not found but inactive nodes are present in FRONTIER.

Dangling nodes can also be detected at line 5 when node n is expanded.

Workflow Verification

Note that some semantic constraints are imposed by the meanings of the items contained in the TPMG nodes.

TPMG_SYN, when appropriately modified, can perform certain types of semantic verification of TPMGs.

Workflow Verification

A similar verification procedure can be devised for workflows drawn using Petri Nets or any other WfMC convention.


Thank You!