Resilience in Transactional Networks

05-01-2013, 3rd BCGL Conference

Resilience in Transaction-Oriented Networks

Dmitry Zinoviev*, Hamid Benbrahim, Greta Meszoely+, Dan Stefanescu*

*Mathematics and Computer Science Department; +Sawyer School of Management

Suffolk University, Boston


Outline

Transaction-oriented networks

Network model and its interpretations

Simulation results:

Dense and sparse networks

Throughput amplification

Equivalence of excessive traffic and faulty nodes

Network as a four-phase matter

Conclusion and future work


Transaction-Oriented Networks

Used to execute distributed transactions (compound operations that succeed or fail atomically)

Interpretations:

Distributed database transactions (original, HPC-related interpretation)

Financial transactions (e.g., loans)

Transportation (e.g., multi-leg flights)

How resilient are these networks to externally and internally induced failures?


Transactions and Network

Network

Incoming transactions

Committed transactions

Aborted transactions


Network Model Overview

Random Erdős–Rényi network, N=1,600 identical nodes representing network hosts, density d.

Each node can simultaneously execute up to C almost independent subtransactions.

Each subtransaction takes constant time τ0 to complete. The network is simulated for the duration of S τ0.

Each node can be used for injecting transactions into the network and for terminating transactions. Transactions are injected uniformly across the network. The delays between subsequent transactions are drawn from the exponential distribution E(1/r).

Each transaction has L subtransactions, with L drawn from the normal distribution N(10, 4).
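The model ingredients above can be sketched in a few lines of Python. This is only an illustration, not the authors' C++ simulator: the function names are hypothetical, E(1/r) is read as an exponential distribution with mean 1/r, and N(10, 4) is read as mean 10 and variance 4 (standard deviation 2). Both readings are assumptions.

```python
import random

def erdos_renyi(n, d, rng):
    """Adjacency sets of a random graph in which each node pair is linked with probability d."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < d:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def injection_times(r, duration, rng):
    """Injection times whose gaps are exponential with mean 1/r (assumed reading of E(1/r))."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(r)  # expovariate takes the rate, so the mean gap is 1/r
        if t >= duration:
            return times
        times.append(t)

def transaction_length(rng, mean=10.0, std=2.0):
    """Number of subtransactions; std=2 assumes that the 4 in N(10, 4) is a variance."""
    return max(1, round(rng.gauss(mean, std)))

rng = random.Random(1)
adj = erdos_renyi(200, 0.1, rng)           # a small network instead of N=1,600
arrivals = injection_times(2.0, 50.0, rng)
origins = [rng.randrange(len(adj)) for _ in arrivals]  # uniform injection points
```

A smaller network is used here only to keep the pairwise loop fast.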


Opportunistic Routing

The node for the next subtransaction is chosen uniformly at random from all neighbors of the current node.

If the next node is disabled, then another neighbor is chosen. If all neighbors are disabled, the subtransaction is aborted, and the master transaction rolls back.

If a transaction is aborted, all other transactions that crossed paths with it in the past T time units (T=100) are also aborted with probability p0=0.01.

We observed very little dependence of the simulated network measures on p0.
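A minimal sketch of the routing rule, assuming that retrying among neighbors until an enabled one is found amounts to a uniform choice among the enabled neighbors. The names `next_hop` and `collateral_aborts` are hypothetical, and the bookkeeping of which transactions crossed paths in the last T time units is left to the caller.

```python
import random

def next_hop(current, adj, disabled, rng):
    """Pick the next node uniformly among the enabled neighbors of `current`.

    Returns None when every neighbor is disabled, which stands for
    "abort the subtransaction and roll back the master transaction".
    """
    candidates = sorted(v for v in adj[current] if v not in disabled)
    if not candidates:
        return None
    return rng.choice(candidates)

def collateral_aborts(recent_contacts, p0, rng):
    """Transactions that crossed paths with an aborted one each abort with probability p0."""
    return [tx for tx in recent_contacts if rng.random() < p0]

rng = random.Random(0)
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
hop = next_hop(0, adj, disabled={1, 2}, rng=rng)  # only node 3 is still enabled
```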


Node Shutdown

When a node is overloaded (load > C), it shuts down.

A node may fail randomly after an initial delay drawn from the exponential distribution E(Tf).

Once disabled, a node is not restarted. All subtransactions currently executed at a disabled node are aborted.
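The shutdown rules can be illustrated with a small class. This is a sketch under assumptions: the class `Node` and its methods are hypothetical names, and a random fault would be modeled by calling `shut_down()` at a time drawn from the exponential distribution.

```python
class Node:
    """A host that runs up to `capacity` subtransactions and never restarts once disabled."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.running = set()   # ids of subtransactions currently executing here
        self.disabled = False

    def start(self, sub_id):
        """Try to start a subtransaction; return the set of ids aborted as a result."""
        if self.disabled:
            return {sub_id}
        self.running.add(sub_id)
        if len(self.running) > self.capacity:  # load > C: overload shutdown
            return self.shut_down()
        return set()

    def finish(self, sub_id):
        """Remove a completed subtransaction from the node."""
        self.running.discard(sub_id)

    def shut_down(self):
        """Disable the node for good and abort everything it was executing."""
        self.disabled = True
        aborted, self.running = self.running, set()
        return aborted

node = Node(capacity=2)
node.start("a")
node.start("b")
aborted = node.start("c")  # the third subtransaction overloads the node
```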


Simulation Framework

Custom-built network simulator in C++

In each experiment, the network has been simulated for a variety of combinations of node capacities and densities (C, d):

d ∈ {0.01, 0.011, 0.015, 0.025, 0.04, 0.055, 0.075, 0.1, 0.2, 0.3, 0.5, 0.6, 0.75, 0.85, 0.99}

C ∈ {2, 3, 4, ..., 22}

Red color indicates sparse networks (they behave differently from the dense networks)


Failing by Overloading

Start with a fully functional network.

Gradually increase the injection rate from 0 to r0, the rate at which a fraction of at least 10⁻⁶ of all transactions abort (superconductive mode ⇒ resistive mode).

The fraction of aborted transactions increases monotonically until, at some rate r1, the network chokes (resistive mode ⇒ dielectric mode).

Define ρ0 = r0 / r1.

r0 and r1 depend slightly on the simulation running time. Our results have been obtained for S = 84,600 τ0 (“one day”).
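The rate sweep can be sketched as a scan over increasing injection rates. Everything here is an assumption for illustration: `abort_fraction` is a hypothetical callable standing in for a full simulation run, `eps` is the small abort fraction that marks the end of the superconductive mode, and `choke` is the abort fraction at which the network is treated as choked.

```python
def find_thresholds(abort_fraction, rates, eps=1e-6, choke=1.0):
    """Scan increasing rates and return (r0, r1).

    r0 is the first rate whose abort fraction reaches eps,
    r1 is the first rate whose abort fraction reaches choke.
    Either value is None if the scan never gets there.
    """
    r0 = r1 = None
    for r in rates:
        f = abort_fraction(r)
        if r0 is None and f >= eps:
            r0 = r
        if f >= choke:
            r1 = r
            break
    return r0, r1

# Toy stand-in for the simulator: no aborts up to r=1, everything aborts from r=3.
toy = lambda r: min(1.0, max(0.0, (r - 1.0) / 2.0))
rates = [i * 0.5 for i in range(9)]          # 0.0, 0.5, ..., 4.0
r0, r1 = find_thresholds(toy, rates)
ratio = r0 / r1                               # the ratio r0 / r1 defined above
```

The resolution of `rates` bounds how precisely r0 and r1 are located; a real sweep would refine the grid around each transition.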


Phase Transition Injection Rates

[Figure: phase transition injection rates; curve labels: "r1, smaller d", "r0, smaller d", "dense"]


Quadratic Amplification

Both r0(C) and r1(C) can be approximated by a power function:

$$r_{0,1}(C) \approx A_{0,1}\,(C-2)^{\alpha_{0,1}}$$

The exponents αi for the dense networks are ~1.7 and ~2.1, respectively. Both αi tend to 1 as d tends to 0.

The mantissas Ai for the dense networks are ~0.7 and ~2.8, respectively. Both Ai increase and possibly diverge as d tends to 0.

Doubling node capacity almost quadruples the throughput.
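The doubling claim can be checked with a short calculation, assuming the power function has the form r(C) ≈ A (C − 2)^α. The offset 2, the symbol α, and the function name `capacity_gain` are assumptions made for this sketch.

```python
def capacity_gain(c, alpha):
    """Ratio r(2c) / r(c) for r(C) = A * (C - 2) ** alpha; the coefficient A cancels."""
    return ((2 * c - 2) / (c - 2)) ** alpha

# Gains for the two dense-network exponents quoted above, at several capacities.
for alpha in (1.7, 2.1):
    gains = [round(capacity_gain(c, alpha), 2) for c in (6, 10, 20)]
    print(alpha, gains)
```

For large C the ratio approaches 2^α, which lies near 4 when α is close to 2. At small C the offset makes the gain from doubling larger.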


Failing by Internal Faults

Start with a fully functional network.

Gradually increase the injection rate from 0 to r0.

At the fixed injection rate, fail random nodes after random delays.

Let m0 be the smallest fraction of failed nodes that causes the network to choke.


Phase Transition Fault Rate

[Figure: phase transition fault rate; curve labels: "smaller d", "dense"]


Faulty Nodes Effect

Estimation of m0:

For the dense networks, A tends to the range [0, 0.23]. That is, it takes no more than 23% of internally faulty nodes to choke a dense network with infinite buffer space in the presence of the highest superconductive injection rate.

m0C ≈A−1 erf log C−2/− A1

2


Failing by Overloading and Internal Faults

Start with a fully functional network.

Gradually increase the injection rate from 0 to r and simultaneously fail random nodes after random delays, until the network chokes.


Phase Space Summary (C=4, d=.2)

[Figure: phase space; region labels: "dielectric", "resistive", "superconductive"]


Equivalence of Excessive Traffic and Node Failures

[Figure; curve label: "dense"]


Equivalence of Excessive Traffic and Node Failures

To a first approximation, the relationship between the network resilience parameters ρ0 (the ratio r0 / r1) and m0 is almost linear, with a slope of about −1.

Tolerating additional superconductive traffic Δρ0 is equivalent to disabling extra network nodes Δm0 due to internal faults:

$$\Delta\rho_0 \approx -\Delta m_0$$


A Closer Look at the Resistive Phase

[Figure: the resistive phase between r0 and r1; annotations: "r1", "r0", "???"]


What Happens around the “knee”?

The “knee” is visible only in sparse networks

Network state at the end of the simulation run: red circles correspond to faulty nodes, cyan circles to healthy nodes


How Many Nodes Are in the GC?

Percentage of faulty (red) and healthy (blue) nodes in the respective giant component for various r's

The phase transition happens when all faulty nodes join the giant component

Two “resistive” phases: “resistive-A” (truly “resistive”) and “resistive-B” (“resistive-dielectric”)

[Figure: percentage of nodes in the giant component (0% to 100%) versus r (0.0 to 4.5); annotation: "All faulty nodes join the giant component!"]
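The giant-component measurement can be sketched with a breadth-first search. The sketch assumes that "the respective giant component" means the largest connected component of the subgraph induced by the faulty nodes (or by the healthy nodes); the function names are hypothetical.

```python
from collections import deque

def largest_component(adj, members):
    """Largest connected component of the subgraph induced by `members` (BFS)."""
    members = set(members)
    seen, best = set(), set()
    for start in members:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in members and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    queue.append(v)
        if len(comp) > len(best):
            best = comp
    return best

def share_in_giant_component(adj, members):
    """Fraction of `members` inside their own largest component (0.0 for an empty set)."""
    members = set(members)
    if not members:
        return 0.0
    return len(largest_component(adj, members)) / len(members)

# Toy network: a path 0-1-2 plus an edge 3-4; nodes 0, 1, and 3 are faulty.
adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
faulty_share = share_in_giant_component(adj, {0, 1, 3})
```

A share of 1.0 for the faulty nodes would correspond to the state in which all faulty nodes have joined the giant component.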


Conclusion

Random transactional networks can stay in four phases of interest: “superconductive” (no transactions fail), “resistive-A” and “resistive-B” (some transactions fail), and “dielectric” (all transactions fail)

Injection rates associated with the phase transitions scale almost quadratically with respect to the node capacity

At the resistive-to-dielectric phase transition, the effects of excessive network load and internal, spontaneous, and irreparable node faults are equivalent and almost perfectly anticorrelated

The phase transition between two “resistive” phases can be attributed to the evolution of the giant component of faulty nodes
