Post on 17-Jan-2016
Dept. of Computer Science & Engineering, CUHK
Fault Tolerance and Performance Analysis in Wireless CORBA
Chen Xinyu
2002-12-09
Supervisor: Prof. Michael R. Lyu
Markers: Prof. Jerome Yen, Prof. John C.S. Lui
Outline
Motivation
Wireless CORBA
Fault Tolerant Wireless CORBA
Performance and Availability Analysis
Conclusions and Future Work
Motivation
Mobile Computing
  Permanent failures: physical damage
  Transient failures: mobile host, wireless link, environmental conditions
Fault Tolerant CORBA
  Entity replication
Wireless CORBA Architecture
[Figure: three domains. The Visited Domain and the Home Domain contain Access Bridges (ab1, ab2) and Static Hosts; the Home Domain also contains the Home Location Agent. The Terminal Domain contains the Terminal Bridge of mobile host mh1, which is connected to an Access Bridge through a GIOP Tunnel carrying GTP messages. Successive frames show mh1 and its GIOP Tunnel at different positions.]
Basic Concepts
Checkpoint: the program's state saved during failure-free execution
Repair: brings the failed device back to normal operation
Rollback: reloads the program's state saved at the most recent checkpoint
Recovery: the reprocessing of the program, starting from the most recent checkpoint and applying the logged messages, up to the point just before the failure
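The four concepts above can be illustrated with a minimal sketch. The `Process` class, its running-total state and its method names are hypothetical and are not taken from the slides; the checkpoint and the log are kept as plain attributes that stand in for stable storage.

```python
import copy


class Process:
    """Toy program (hypothetical): its state is a running total of handled messages."""

    def __init__(self):
        self.state = {"total": 0, "handled": 0}
        # Stand-ins for stable storage: they survive crash() in this sketch.
        self.checkpoint = copy.deepcopy(self.state)
        self.log = []  # messages received since the last checkpoint

    def _apply(self, msg):
        self.state["total"] += msg
        self.state["handled"] += 1

    def handle(self, msg):
        self.log.append(msg)  # log the message, then process it
        self._apply(msg)

    def take_checkpoint(self):
        # Checkpoint: save the state during failure-free execution.
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()

    def crash(self):
        self.state = None  # volatile state is lost

    def rollback(self):
        # Rollback: reload the state saved at the most recent checkpoint.
        self.state = copy.deepcopy(self.checkpoint)

    def recover(self):
        # Recovery: roll back, then reprocess the logged messages in order.
        self.rollback()
        for msg in self.log:
            self._apply(msg)


p = Process()
p.handle(1)
p.handle(2)
p.take_checkpoint()
p.handle(3)
p.crash()
p.recover()
print(p.state)
```

Repair is not modelled here; in the slides it is the step that brings the failed device back to normal operation before rollback and recovery can run.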
Device, Wireless & Mobile Issues
Device Issues: slow processor, small memory, small disk space, low power supply, physical damage
  Applying mobile host as stable storage
Wireless Issues: high bit error rate, little bandwidth, long transfer delay
  A large number of system messages or a large size of information carried in one message
Mobile Issue: handoff
  Checkpoints and logs collection
Applying Access Bridge as stable storage
Uncoordinated checkpointing
Pessimistic message logging
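A rough sketch of the Access Bridge acting as stable storage with pessimistic message logging: each message is written to the log before it is forwarded over the wireless link, so it can still be replayed if delivery fails. The `AccessBridge` class, its methods and the `deliver` callback are hypothetical names chosen for illustration; they are not the wireless CORBA interfaces. Checkpoints are stored per mobile host with no coordination between hosts, which is how uncoordinated checkpointing is assumed to look here.

```python
class AccessBridge:
    """Hypothetical stand-in for an Access Bridge used as stable storage."""

    def __init__(self, name):
        self.name = name
        self.checkpoints = {}  # mh_id -> latest checkpoint stored for that host
        self.logs = {}         # mh_id -> [(sn, message)] logged since that checkpoint

    def store_checkpoint(self, mh_id, state):
        # Uncoordinated: each mobile host checkpoints on its own schedule.
        self.checkpoints[mh_id] = dict(state)
        self.logs[mh_id] = []

    def forward(self, mh_id, sn, message, deliver):
        # Pessimistic logging: log first, deliver second.
        self.logs.setdefault(mh_id, []).append((sn, message))
        deliver(message)


ab = AccessBridge("ab1")
delivered = []
ab.store_checkpoint("mh1", {"x": 0})
ab.forward("mh1", 1, "req-1", delivered.append)


def broken_link(_message):
    # Simulates a transient wireless failure during delivery.
    raise ConnectionError("wireless link down")


try:
    ab.forward("mh1", 2, "req-2", broken_link)
except ConnectionError:
    pass

print(ab.logs["mh1"])
```

The second message never reaches the mobile host, yet it is already in the log, which is the property pessimistic logging is meant to provide.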
Fault Tolerance Architecture
[Figure: layered components on the Mobile Side and the Fixed Side. Mobile Host: Client Object, Terminal Bridge, Recovery Mechanism, ORB, Platform. Access Bridge (Mobile Support Station): Recovery Mechanism, Logging Mechanism, ORB, Platform. Static Server: Object Replica, Recovery Mechanism, Logging Mechanism, ORB, Platform. Connections are labelled GIOP Tunnel and Multicast Messages.]
Mobile Host Handoff
[Figure: a mobile host hands off among Access Bridge 1, Access Bridge 2 and Access Bridge 3; the handoff is accompanied by a location update to the Home Location Agent.]
Mobile Host Crash
[Figure: Access Bridge 1, Access Bridge 2, Access Bridge 3 and the Home Location Agent when the mobile host crashes.]
Mobile Host Recovery
[Figure: Access Bridge 1, Access Bridge 2 and Access Bridge 3 during recovery of the mobile host.]
Collect the last checkpoint and the succeeding message logs, sorted by acknowledgement sequence number (Ack. SN)
Reconnect
Messages replay
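The collection and replay steps can be sketched as follows. The record layout (a `checkpoint` stored as `(sn, state)` and `logs` stored as `(ack_sn, message)` pairs), the function name and the `apply` callback are assumptions made for this example; the slides only state that the last checkpoint and the succeeding logs are collected, sorted by Ack. SN and replayed. The sketch also assumes that at least one Access Bridge holds a checkpoint.

```python
def recover_mobile_host(bridge_records, apply):
    """Rebuild a mobile host's state from what the Access Bridges stored.

    bridge_records: list of dicts with
        "checkpoint": (sn, state) or None
        "logs": list of (ack_sn, message)
    apply: function (state, message) -> new state
    """
    checkpoints = [r["checkpoint"] for r in bridge_records
                   if r.get("checkpoint") is not None]
    # The last checkpoint is the one with the highest sequence number.
    ckpt_sn, state = max(checkpoints, key=lambda c: c[0])

    # Gather the logs that follow the checkpoint from every bridge visited.
    pending = [entry
               for r in bridge_records
               for entry in r.get("logs", [])
               if entry[0] > ckpt_sn]
    pending.sort(key=lambda entry: entry[0])  # sorted by Ack. SN

    for _, message in pending:  # messages replay
        state = apply(state, message)
    return state, [sn for sn, _ in pending]


records = [
    {"checkpoint": (0, 0), "logs": [(1, 5), (2, 7)]},     # e.g. Access Bridge 1
    {"checkpoint": (2, 12), "logs": [(4, 1), (3, 10)]},   # e.g. Access Bridge 2
    {"checkpoint": None, "logs": [(5, 2)]},               # e.g. Access Bridge 3
]
state, order = recover_mobile_host(records, lambda s, m: s + m)
print(state, order)
```

Because a mobile host may have handed off several times since its last checkpoint, the logs are spread over several Access Bridges, which is why the sort step comes before replay.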
Assumptions
Failure occurrence, message arrival and handoff events each follow a homogeneous Poisson process with its own rate parameter
Failures do not occur when the program is in the repair or rollback process
A failure is detected as soon as it occurs
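As a small illustration of the Poisson assumption, event times of a homogeneous Poisson process can be generated by summing exponentially distributed gaps. The function name, the three rate values and the time horizon below are hypothetical; they are not taken from the slides.

```python
import random


def poisson_event_times(rate, horizon, rng):
    """Event times in [0, horizon) of a homogeneous Poisson process with the given rate."""
    times = []
    t = rng.expovariate(rate)  # exponential inter-event time with mean 1 / rate
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


rng = random.Random(0)
# Hypothetical rates (events per unit time), one per assumed process.
failures = poisson_event_times(0.01, 1000.0, rng)
messages = poisson_event_times(2.0, 1000.0, rng)
handoffs = poisson_event_times(0.05, 1000.0, rng)
print(len(failures), len(messages), len(handoffs))
```

The three streams are generated independently, which matches treating failures, message arrivals and handoffs as separate processes.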
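Execution without Checkpointing
[Figure: execution timeline from 0 to t without checkpointing. Labels: failures F1 ... Fj, handoffs H1 ... Hk, messages m0(N), m1(n1), mj(1) ... mj(N), the quantities X0, Y0, Z0 and X(N). Legend: Repair (R), Handoff (H).]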
Conditional Execution Time & LST (Laplace-Stieltjes Transform)
LST and Expectation of Program Execution Time
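Execution with Equi-number Checkpointing
[Figure: execution timeline from 0 to t between checkpoints Ci-1 and Ci. Labels: failures Fi(1) ... Fi(j), handoffs Hi(1) ... Hi(k), messages mi0(a), mi1(ni1), mij(1) ... mij(a), the quantities Xi(0), Yi(0), Zi(0) and Xi(N,a). Legend: Repair + Rollback (R+C), Handoff (H), Checkpointing (C).]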
Conditional Execution Time & LST
LST and Expectation of Program Execution Time
Average Availability
Uptime interval: a period in which the program produces useful work towards its completion
Downtime interval: repair and rollback, handoff, checkpoint creation, wasted computation
Average availability: the fraction of an execution during which an MH is in an uptime interval
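Read this way, average availability is the uptime divided by the total execution time. The sketch below computes it from a labelled trace; the interval labels and the sample durations are made up for illustration and do not come from the presentation's analysis.

```python
DOWNTIME_KINDS = {"repair", "rollback", "handoff", "checkpoint", "wasted"}


def average_availability(intervals):
    """intervals: list of (kind, duration); kind is "useful" or a downtime kind."""
    uptime = 0.0
    total = 0.0
    for kind, duration in intervals:
        if kind != "useful" and kind not in DOWNTIME_KINDS:
            raise ValueError(f"unknown interval kind: {kind}")
        total += duration
        if kind == "useful":
            uptime += duration
    return uptime / total


# Hypothetical trace of one execution (durations in arbitrary time units).
trace = [
    ("useful", 6.0),
    ("checkpoint", 0.5),
    ("useful", 2.0),
    ("wasted", 1.0),   # computation lost to a failure
    ("repair", 0.5),
]
print(average_availability(trace))
```

Wasted computation counts as downtime even though the host was running, because that work has to be redone after rollback.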
Optimal Checkpointing Interval
Beneficial Condition
Equi-number Checkpointing
Equi-number checkpointing with respect to message number: the number of messages in each checkpointing interval is fixed
Equi-number checkpointing with respect to checkpoint number: the number of checkpoints is fixed
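One way to read the two variants is as two ways of splitting a run of messages into checkpointing intervals. The function names below, the handling of a short final interval and the even spreading of leftover messages are assumptions made for this sketch; the slides do not specify them.

```python
def intervals_by_message_number(n_messages, a):
    """Fix the messages per interval (a); the number of intervals follows from n_messages."""
    if a <= 0:
        raise ValueError("a must be positive")
    full, rest = divmod(n_messages, a)
    # Assumption: leftover messages form one shorter final interval.
    return [a] * full + ([rest] if rest else [])


def intervals_by_checkpoint_number(n_messages, k):
    """Fix the number of intervals (k); the messages per interval follow from n_messages."""
    if not 0 < k <= n_messages:
        raise ValueError("k must be between 1 and n_messages")
    base, extra = divmod(n_messages, k)
    # Assumption: interval sizes differ by at most one message.
    return [base + 1 if i < extra else base for i in range(k)]


print(intervals_by_message_number(10, 3))     # [3, 3, 3, 1]
print(intervals_by_checkpoint_number(10, 4))  # [3, 3, 2, 2]
```

With the first variant a longer program takes more checkpoints; with the second it takes the same number of checkpoints but spaces them further apart.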
Equi-number Checkpointing with respect to Checkpoint Number
Equi-number Checkpointing with respect to Message Number
Comparison Between Checkpointing and Without Checkpointing
Average Availability vs. Message Arrival Rate and Handoff Rate
Conclusions
Fault tolerant wireless CORBA
Equi-number checkpointing strategy
LST and expectation of program execution time
Average availability
Optimal checkpointing interval
Beneficial condition
Future Work
Analysis model
  The message queuing effect during repair and recovery
Failure detector
  Distributed consensus with link failures, process failures, and mobile disconnections
  Leads to a faster solution
  Reduces communication costs
Fault tolerance in ad hoc networks
  Without infrastructure support
  Self-organizing and adaptive
Thank You