Resilience in Transactional Networks

Page 1: Resilience in Transactional Networks

05-01-2013 3rd BCGL Conference 1/22

Resilience in Transaction-Oriented Networks

Dmitry Zinoviev*, Hamid Benbrahim, Greta Meszoely+, Dan Stefanescu*

*Mathematics and Computer Science Department
+Sawyer School of Management

Suffolk University, Boston

Page 2: Resilience in Transactional Networks

Outline

Transaction-oriented networks

Network model and its interpretations

Simulation results:

  Dense and sparse networks

  Throughput amplification

  Equivalence of excessive traffic and faulty nodes

  Network as a four-phase matter

Conclusion and future work

Page 3: Resilience in Transactional Networks

Transaction-Oriented Networks

Used to execute distributed transactions (compound operations that succeed or fail atomically)

Interpretations:

Distributed database transactions (original, HPC-related interpretation)

Financial transactions (e.g., loans)

Transportation (e.g., multi-leg flights)

How resilient are these networks to externally and internally induced failures?

Page 4: Resilience in Transactional Networks

Transactions and Network

[Diagram: incoming transactions enter the network and leave it as either committed or aborted transactions.]

Page 5: Resilience in Transactional Networks

Network Model Overview

Random Erdős–Rényi network of N=1,600 identical nodes representing network hosts, with density d.

Each node can simultaneously execute up to C almost independent subtransactions. Each subtransaction takes a constant time τ0 to complete. The network is simulated for the duration of Sτ0.

Each node can be used for injecting transactions into the network and for terminating transactions. Transactions are injected uniformly across the network. The delays between subsequent transactions are drawn from the exponential distribution E(1/r).

Each transaction has L subtransactions, with L drawn from the normal distribution N(10,4).
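The model above can be sketched in a few lines of Python. This is only an illustration, not the authors' C++ simulator: the function names are hypothetical, E(1/r) is read as an exponential distribution with mean delay 1/r, and N(10,4) is read as mean 10 and variance 4 (standard deviation 2), with lengths rounded and clamped to at least one subtransaction.

```python
import random

def erdos_renyi(n, d, rng):
    """Adjacency lists of a G(n, d) random graph: each pair is linked with probability d."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < d:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def sample_transactions(count, r, n, rng, mean_len=10.0, sd_len=2.0):
    """Return (injection_time, entry_node, length) tuples.

    Assumptions: delays are exponential with mean 1/r; N(10,4) means
    mean 10 and variance 4; lengths are rounded and kept >= 1.
    """
    t = 0.0
    out = []
    for _ in range(count):
        t += rng.expovariate(r)      # rate r, so the mean delay is 1/r
        node = rng.randrange(n)      # uniform injection across the network
        length = max(1, round(rng.gauss(mean_len, sd_len)))
        out.append((t, node, length))
    return out
```

With the slide's parameters this would be called as `erdos_renyi(1600, d, random.Random(seed))`; the quadratic pair loop is slow but acceptable at that size.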

Page 6: Resilience in Transactional Networks

Opportunistic Routing

The node for the next subtransaction is chosen uniformly at random from all neighbors of the current node.

If the next node is disabled, then another neighbor is chosen. If all neighbors are disabled, the subtransaction is aborted, and the master transaction rolls back.

If a transaction is aborted, all other transactions that crossed paths with it in the past T time units (T=100) are also aborted with probability p0=0.01.

We observed very little dependence of the simulated network measures on p0.
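A minimal sketch of the routing rule, under stated assumptions: the helper names are hypothetical, "crossed paths" is read as "visited at least one common node", and the caller is assumed to have already restricted `recent_paths` to the last T time units. Choosing uniformly among the enabled neighbors gives the same distribution as repeatedly re-drawing when a disabled neighbor is picked.

```python
import random

def next_hop(current, adj, disabled, rng):
    """Pick a uniformly random enabled neighbor of `current`, or None if none is left."""
    candidates = [v for v in adj[current] if v not in disabled]
    if not candidates:
        return None      # subtransaction aborts; the master transaction rolls back
    return rng.choice(candidates)

def collateral_aborts(aborted_path, recent_paths, rng, p0=0.01):
    """Ids of other transactions aborted after crossing paths with an aborted one.

    `recent_paths` maps a transaction id to the set of nodes it visited
    within the time window (assumed to be filtered by the caller).
    """
    hit = set(aborted_path)
    return [tid for tid, nodes in recent_paths.items()
            if nodes & hit and rng.random() < p0]
```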

Page 7: Resilience in Transactional Networks

Node Shutdown

When a node is overloaded (load > C), it shuts down.

A node may fail randomly after an initial delay drawn from the exponential distribution E(Tf).

Once disabled, a node is not restarted. All subtransactions currently executed at a disabled node are aborted.
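The shutdown rules can be illustrated with a small node class. This is a sketch with hypothetical names: E(Tf) is assumed to mean an exponential distribution with mean Tf, and a subtransaction sent to an already disabled node is simply reported as aborted (in the model, routing would avoid such nodes).

```python
import random

class Node:
    """A host that runs up to `capacity` concurrent subtransactions and never restarts."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.running = set()     # ids of subtransactions executing here
        self.disabled = False

    def accept(self, sub_id):
        """Try to start a subtransaction; return the ids aborted as a result."""
        if self.disabled:
            return [sub_id]
        self.running.add(sub_id)
        if len(self.running) > self.capacity:    # overload: load > C
            return self.shut_down()
        return []

    def finish(self, sub_id):
        """Mark a subtransaction as completed."""
        self.running.discard(sub_id)

    def shut_down(self):
        """Disable the node permanently and abort everything it was executing."""
        self.disabled = True
        aborted = sorted(self.running)
        self.running.clear()
        return aborted

def random_failure_time(tf_mean, rng):
    """Delay before a spontaneous failure, assuming E(Tf) has mean Tf."""
    return rng.expovariate(1.0 / tf_mean)
```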

Page 8: Resilience in Transactional Networks

Simulation Framework

Custom-built network simulator in C++.

In each experiment, the network has been simulated for a variety of combinations of node capacities and densities (C, d):

d ∈ {0.01, 0.011, 0.015, 0.025, 0.04, 0.055, 0.075, 0.1, 0.2, 0.3, 0.5, 0.6, 0.75, 0.85, 0.99}

C ∈ {2, 3, 4, ..., 22}

Red color indicates sparse networks (they behave differently from the dense networks).

Page 9: Resilience in Transactional Networks

Failing by Overloading

Start with a fully functional network.

Gradually increase the injection rate from 0 to r0, until at least 10⁻⁶ of all transactions abort (superconductive mode ⇒ resistive mode).

The fraction of aborted transactions increases monotonically until, at some rate r1, the network chokes (resistive mode ⇒ dielectric mode).

Define ρ0 = r0 / r1.

r0 and r1 depend slightly on the simulation running time. Our results have been obtained for S=84,600 τ0 (“one day”).
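As an illustration of how the two transition rates could be read off a rate sweep, the sketch below labels each run by its fraction of aborted transactions. The function names and the `(rate, aborted, total)` input format are hypothetical, and "the network chokes" is assumed to mean that every transaction aborts.

```python
def classify_phase(aborted, total, threshold=1e-6):
    """Label a run by its fraction of aborted transactions."""
    if total <= 0:
        raise ValueError("total must be positive")
    if aborted / total < threshold:
        return "superconductive"
    if aborted == total:             # assumption: choking means all transactions abort
        return "dielectric"
    return "resistive"

def transition_rates(runs, threshold=1e-6):
    """Given (rate, aborted, total) tuples, return (r0, r1); None where not reached."""
    r0 = r1 = None
    for rate, aborted, total in sorted(runs):
        phase = classify_phase(aborted, total, threshold)
        if r0 is None and phase != "superconductive":
            r0 = rate                # first rate that leaves the superconductive mode
        if r1 is None and phase == "dielectric":
            r1 = rate                # first rate at which the network chokes
    return r0, r1
```

The ratio r0 / r1 then follows directly from the returned pair when both rates were reached.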

Page 10: Resilience in Transactional Networks

Phase Transition Injection Rates

[Plot: phase transition injection rates r0 and r1, with curves for the dense networks and for smaller d.]

Page 11: Resilience in Transactional Networks

Quadratic Amplification

Both r0(C) and r1(C) can be approximated by a power function:

$r_{0,1}(C) \approx A_{0,1}\,(C-2)^{\alpha_{0,1}}$

The exponents αi for the dense networks are ~1.7 and ~2.1, respectively. Both αi's tend to 1 as d tends to 0.

The mantissas Ai for the dense networks are ~0.7 and ~2.8, respectively. Both Ai increase and possibly diverge as d tends to 0.

Doubling node capacity almost quadruples the throughput.
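The amplification claim can be checked numerically. The sketch below reads the fit as r(C) ≈ A·(C − 2)^α, where the exponent name α and the function names are assumptions made for this example; the constants are the dense-network values quoted on the slide.

```python
def fitted_rate(capacity, a, alpha):
    """Evaluate the power-law fit A * (C - 2) ** alpha for C >= 2."""
    if capacity < 2:
        raise ValueError("the fit is only evaluated for C >= 2")
    return a * (capacity - 2) ** alpha

# Dense-network values from the slide: (A, exponent) for r0 and for r1.
R0_FIT = (0.7, 1.7)
R1_FIT = (2.8, 2.1)

def doubling_gain(capacity, fit):
    """Ratio r(2C) / r(C) under the fitted power law, for C > 2."""
    if capacity <= 2:
        raise ValueError("the gain is undefined at C <= 2, where the fitted rate is 0")
    a, alpha = fit
    return fitted_rate(2 * capacity, a, alpha) / fitted_rate(capacity, a, alpha)
```

Because of the shift by 2, the gain is ((2C − 2)/(C − 2))^α, which approaches 2^α only for large C. With exponents close to 2, this is where the "almost quadruples" reading comes from.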

Page 12: Resilience in Transactional Networks

Failing by Internal Faults

Start with a fully functional network.

Gradually increase the injection rate from 0 to r0.

At the fixed injection rate, fail random nodes after random delays.

Let m0 be the smallest fraction of failed nodes that causes the network to choke.

Page 13: Resilience in Transactional Networks

Phase Transition Fault Rate

[Plot: phase transition fault rate, with curves for the dense networks and for smaller d.]

Page 14: Resilience in Transactional Networks

Faulty Nodes Effect

Estimation of m0:

[Equation: m0(C) fitted by an erf function of log(C − 2) with fitted coefficients A.]

For the dense networks, A tends to [0...0.23]. That is, it takes no more than 23% of internally faulty nodes to choke a dense network with infinite buffer space in the presence of the highest superconductive injection rate.

Page 15: Resilience in Transactional Networks

Failing by Overloading and Internal Faults

Start with a fully functional network.

Gradually increase the injection rate from 0 to r () and simultaneously fail random nodes after random delays, until the network chokes.

Page 16: Resilience in Transactional Networks

Phase Space Summary (C=4, d=0.2)

[Phase diagram with three regions: superconductive, resistive, and dielectric.]

Page 17: Resilience in Transactional Networks

Equivalence of Excessive Traffic and Node Failures

[Plot: excessive traffic versus node failures for the dense networks.]

Page 18: Resilience in Transactional Networks

Equivalence of Excessive Traffic and Node Failures

To a first approximation, the relationship between the network resilience parameters ρ0 = r0 / r1 and m0 is almost linear, with a slope of −1.

Tolerating additional superconductive traffic Δρ0 is equivalent to disabling extra network nodes Δm0 due to internal faults:

Δρ0 ≈ −Δm0

Page 19: Resilience in Transactional Networks

A Closer Look at the Resistive Phase

[Plot: the resistive phase between r0 and r1, with a feature marked “???”.]

Page 20: Resilience in Transactional Networks

What Happens around the “knee”?

The “knee” is visible only in sparse networks.

Network state at the end of the simulation run: red circles correspond to faulty nodes, cyan circles to healthy nodes.

Page 21: Resilience in Transactional Networks

How Many Nodes Are in the GC?

Percentage of faulty (red) and healthy (blue) nodes in the respective giant component for various r's

The phase transition happens when all faulty nodes join the giant component

Two “resistive” phases: “resistive-A” (truly “resistive”) and “resistive-B” (“resistive-dielectric”)

[Plot: percentage of nodes in the giant component (0% to 100%) versus injection rate r (0.0 to 4.5), annotated “All faulty nodes join the giant component!”]
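The giant-component measurement on this slide can be sketched as follows. The function names are hypothetical, and the "respective giant component" is assumed to be the largest connected component of the subgraph induced by the faulty (or healthy) nodes alone.

```python
from collections import deque

def components(adj, members):
    """Connected components of the subgraph induced by `members` (breadth-first search)."""
    members = set(members)
    seen = set()
    result = []
    for start in members:
        if start in seen:
            continue
        seen.add(start)
        comp = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in members and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        result.append(comp)
    return result

def giant_fraction(adj, members):
    """Fraction of `members` that lie in their largest connected component."""
    members = set(members)
    if not members:
        return 0.0
    largest = max(components(adj, members), key=len)
    return len(largest) / len(members)
```

Under this reading, the transition on the slide would correspond to `giant_fraction(adj, faulty)` reaching 1.0.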

Page 22: Resilience in Transactional Networks

Conclusion

Random transactional networks can stay in four phases of interest: “superconductive” (no transactions fail), “resistive-A” and “resistive-B” (some transactions fail), and “dielectric” (all transactions fail)

Injection rates associated with the phase transitions scale almost quadratically with respect to the node capacity

At the resistive-to-dielectric phase transition, the effects of excessive network load and internal, spontaneous, and irreparable node faults are equivalent and almost perfectly anticorrelated

The phase transition between two “resistive” phases can be attributed to the evolution of the giant component of faulty nodes