Reliability and Redundancy: How to Minimize Downtime and Maximize Service When System Components Fail
Roy McClellan and Dave Chapman
March 11, 2010
Overview
• “reliability” - equipment and network designs that are tolerant to failures and faults
– System should continue to operate with all specified features if any single system controller and/or critical device fails
– First level of backup should not reduce the operational capabilities of P25 digital trunked system
• This presentation will address how different P25 digital network elements impact reliability
– RFSS Core and Sites
– IP Network
– Transport Network
P25 Network Overview
[Figure: P25 network diagram. Labels: P25 RFSS Core (File Servers, RFSS System Controllers, ISSI Media Gateways); IP Consoles & Control Rooms; Other P25 RF Sub-Systems; Other Vendor P25 RF Sub-Systems; Legacy RF Sub-Systems; Gateways; P25 Single Site Cell (Trunked Site Controller, P25 Base Stations); P25 Simulcast Cell (Trunked Simulcast Site Controller, P25 Base Stations)]
P25 Network Reliability – System Controllers
[Figure: P25 network diagram (RFSS core, single site cell, simulcast cell, gateways to other sub-systems)]
P25 Network Reliability – Base Stations
[Figure: P25 network diagram (RFSS core, single site cell, simulcast cell, gateways to other sub-systems)]
P25 Network Reliability - Site Network
[Figure: site network diagram; surviving label: Optical Switch]
P25 Network Reliability – Site Network Redundancy
• Components are cross-connected as shown in earlier diagram
– At sites with T1 microwave connections, if a path has multiple T1s, the T1s are distributed between the two routers
– Where newer “IP based” microwave is used, each router uses part of the bandwidth
– Devices with two Ethernet interfaces are cross-connected to two switches
• Dynamic routing protocols manage the path
– In the event of a disruption, a redundant path around the failure is dynamically selected
– Normal traffic routing resumes when the failed component becomes available
– Gateway Load Balancing Protocol (GLBP) utilized for balancing traffic to and from routers
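The rerouting behavior described on this slide can be illustrated with a small sketch. It is not how any particular routing protocol is implemented; it only shows that a cross-connected site still has a path when one switch and one router are lost. The topology, the node names and the `shortest_path` helper are all hypothetical.

```python
from collections import deque

def shortest_path(links, src, dst, failed=frozenset()):
    """Breadth-first search for a fewest-hop path that avoids failed nodes."""
    if src in failed or dst in failed:
        return None
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:      # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in links.get(node, ()):
            if neighbor not in failed and neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None                          # no surviving path

# Hypothetical cross-connected site: the device reaches two switches,
# and each switch reaches both routers.
SITE_LINKS = {
    "base_station": ["switch_a", "switch_b"],
    "switch_a": ["base_station", "router_1", "router_2"],
    "switch_b": ["base_station", "router_1", "router_2"],
    "router_1": ["switch_a", "switch_b", "core"],
    "router_2": ["switch_a", "switch_b", "core"],
    "core": ["router_1", "router_2"],
}

normal = shortest_path(SITE_LINKS, "base_station", "core")
rerouted = shortest_path(SITE_LINKS, "base_station", "core",
                         failed={"switch_a", "router_1"})
restored = shortest_path(SITE_LINKS, "base_station", "core")  # failure cleared
```

In this sketch the rerouted path runs through `switch_b` and `router_2`, and the original path returns once the failed set is empty again.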
P25 Network Reliability – Trunked Network
[Figure: Conventional vs. Trunking comparison. Labels: Queue, Active, Inactive, Available Channels, Trunking Control]
Trunked network is inherently more reliable
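One way to read this claim is sketched below, under the assumption that a conventional talkgroup is tied to one fixed channel while a trunked system assigns any working channel from a shared pool and queues the overflow. The function names, talkgroup names and channel numbers are illustrative only.

```python
def conventional_groups_served(group_to_channel, working_channels):
    """Each talkgroup is tied to one fixed channel; it is served only if
    that particular channel is still working."""
    return {group for group, channel in group_to_channel.items()
            if channel in working_channels}

def trunked_groups_served(requesting_groups, working_channels):
    """Any working channel can serve any talkgroup; requests beyond the
    number of working channels wait in a queue."""
    channels = sorted(working_channels)
    active = dict(zip(requesting_groups, channels))   # group -> channel
    queued = list(requesting_groups[len(channels):])
    return active, queued

# Hypothetical three-channel system in which channel 2 has failed.
FIXED_ASSIGNMENT = {"fire": 1, "police": 2, "ems": 3}
WORKING = {1, 3}

served_conventional = conventional_groups_served(FIXED_ASSIGNMENT, WORKING)
active_trunked, queued_trunked = trunked_groups_served(
    ["fire", "police", "ems"], WORKING)
```

With channel 2 down, the conventional "police" group has no service at all, whereas in the trunked case every group can still be served and one request simply waits for a free channel.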
P25 Network Reliability – Sub-system Independence
[Figure: P25 network diagram (RFSS core, single site cell, simulcast cell, gateways to other sub-systems)]
P25 Network Reliability – Failsoft Mode
[Figure: P25 network diagram (RFSS core, single site cell, simulcast cell, gateways to other sub-systems)]
• Subscribers will go into channel hunting mode and will operate in conventional mode if the radio fails to find a control channel
• Communication with primary dispatch centers supporting Failsoft interface is available
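The subscriber behavior described for Failsoft can be sketched as a simple decision: hunt through the candidate control channels, and fall back to conventional operation if none is found. The channel names and the idea of a single preprogrammed failsoft channel are assumptions made for illustration, not details taken from the presentation.

```python
from enum import Enum

class RadioMode(Enum):
    TRUNKED = "trunked"
    FAILSOFT_CONVENTIONAL = "failsoft (conventional)"

def select_mode(candidate_control_channels, control_channels_on_air,
                failsoft_channel):
    """Hunt for a control channel; if none is found, operate in
    conventional mode on the (hypothetical) failsoft channel."""
    for channel in candidate_control_channels:
        if channel in control_channels_on_air:
            return RadioMode.TRUNKED, channel
    return RadioMode.FAILSOFT_CONVENTIONAL, failsoft_channel

CANDIDATES = ["cc_primary", "cc_alternate"]

# Primary control channel is off the air, the alternate is still up.
mode_a = select_mode(CANDIDATES, {"cc_alternate"}, "failsoft_1")
# No control channel can be found at all.
mode_b = select_mode(CANDIDATES, set(), "failsoft_1")
```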
P25 Network Reliability – Simulcast Reliability
[Figure: P25 Simulcast Cell. Labels: Trunked Site Controller; P25 Base Stations at Satellite, Master and Satellite sites; channels CCH, CA, CB]
If not a master, there is a coverage degradation on one channel due to the loss of the failed RF repeater
P25 Network Reliability – Simulcast Reliability
[Figure: P25 Simulcast Cell. Labels: Trunked Site Controller; P25 Base Stations at Satellite, Master and Satellite sites; channels CCH, CA, CB]
Trunked Site Controller relocates the master RF repeater to a second master-capable site after a specified period of time
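The relocation rule can be sketched as a hold-off timer followed by a search for another master-capable site. The site names, the dictionary layout and the `holdoff_seconds` parameter are hypothetical; the slide only says that relocation happens after a specified period of time.

```python
def choose_master(sites, current_master, seconds_since_failure,
                  holdoff_seconds):
    """Return the site that should host the master RF repeater.

    sites maps a site name to {"up": bool, "master_capable": bool}.
    """
    if sites[current_master]["up"]:
        return current_master            # nothing to do
    if seconds_since_failure < holdoff_seconds:
        return current_master            # still inside the hold-off period
    for name, site in sites.items():
        if name != current_master and site["up"] and site["master_capable"]:
            return name                  # relocate the master here
    return None                          # no master-capable site is left

SITES = {
    "site_1": {"up": True, "master_capable": False},
    "site_2": {"up": False, "master_capable": True},   # failed master
    "site_3": {"up": True, "master_capable": True},
}

during_holdoff = choose_master(SITES, "site_2", 5, 30)
after_holdoff = choose_master(SITES, "site_2", 45, 30)
```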
P25 Network Reliability – Simulcast Reliability
[Figure: P25 Simulcast Cell. Labels: Trunked Site Controller; P25 Base Stations at Satellite, Master and Satellite sites; channels CCH, CA, CB]
Site controller re-assigns the control channel to a channel having full RF coverage
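A minimal sketch of that control channel decision follows, assuming "full RF coverage" means every site's repeater for the channel is working. The channel names CCH, CA and CB come from the figure; the site names and data layout are hypothetical.

```python
def pick_control_channel(channel_repeaters, current_control_channel):
    """Keep the current control channel if all of its repeaters work;
    otherwise move to the first channel that has full coverage.

    channel_repeaters maps channel -> {site: repeater_is_working}.
    """
    def has_full_coverage(channel):
        return all(channel_repeaters[channel].values())

    if has_full_coverage(current_control_channel):
        return current_control_channel
    for channel in channel_repeaters:
        if has_full_coverage(channel):
            return channel
    return current_control_channel       # nothing better is available

REPEATERS = {
    "CCH": {"satellite_1": False, "master": True, "satellite_2": True},
    "CA":  {"satellite_1": True,  "master": True, "satellite_2": True},
    "CB":  {"satellite_1": True,  "master": True, "satellite_2": True},
}

new_control_channel = pick_control_channel(REPEATERS, "CCH")
```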
Transport Network Reliability – Network Design
• Rely on link layer techniques such as SONET & Resilient Packet Rings to protect against link failures
• Failure detection triggers routing reconvergence/restoration
• Restoration techniques can be enhanced to bypass the failed equipment before routing convergence (protection techniques)
– MPLS
– Sub-second convergence successfully achieved in unicast and multicast IP networks
SONET Network Diagram
[Figure: SONET network diagram. Legend: Optical Switch, IP Path, SONET (Fiber or Microwave)]
SONET Network Diagram
[Figure: SONET network diagram with Automatic Loopback. Legend: Optical Switch, IP Path, SONET (Fiber or Microwave)]
Transport Network Reliability - SONET
• Ring topology - two sets of fiber strands are used, one for sending and receiving and the other as the spare set
– If a fiber cut occurs, the switch on either side of the break re-routes the traffic in the other direction using the backup ring
– Re-route happens at the physical layer, and no end devices are aware of the issue or need to take corrective action
– Failover takes <50 msec, with no noticeable impact to voice traffic
– Even IP routes do not typically update
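The loopback behavior can be sketched on a small ring. In this simplified model, traffic normally travels clockwise; when a span is cut, the node just before the break sends the traffic back around the ring in the other direction until it reaches the node on the far side of the break. The four-node ring and the function names are hypothetical, and real SONET protection switching involves signaling that is not modeled here.

```python
def walk(nodes, src, dst, step):
    """List the nodes visited going from src to dst around the ring;
    step is +1 for clockwise and -1 for counterclockwise."""
    index = nodes.index(src)
    path = [src]
    while nodes[index] != dst:
        index = (index + step) % len(nodes)
        path.append(nodes[index])
    return path

def ring_path(nodes, src, dst, cut=None):
    """Clockwise working path, looped back around the ring if it would
    cross the cut span (a pair of adjacent nodes)."""
    working = walk(nodes, src, dst, +1)
    if cut is None:
        return working
    for k, (a, b) in enumerate(zip(working, working[1:])):
        if {a, b} == set(cut):
            # Node a loops the traffic back; it travels the other way
            # around the ring and arrives at b, then continues as before.
            detour = walk(nodes, a, b, -1)
            return working[:k] + detour + working[k + 2:]
    return working                       # the cut span is not on this path

RING = ["A", "B", "C", "D"]

normal_path = ring_path(RING, "A", "C")
wrapped_path = ring_path(RING, "A", "C", cut=("B", "C"))
```

The end points A and C are unchanged in both cases, which matches the slide's point that end devices do not need to take corrective action.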
Transport Network Reliability - MPLS
[Figure: MPLS packet forwarding steps; source: Altera Corporation]
1a. Existing routing protocols establish the reachability of the destination networks
1b. Label distribution protocol (LDP) establishes label-to-destination network mappings
2. Ingress edge label switching router (LSR) receives a packet, performs layer-3 value-added services, and labels the packets
3. LSR switches the packet using label swapping
4. Egress edge LSR removes the label and delivers the packet
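Steps 2 through 4 above can be traced with a toy label table. The router names, label values and destination prefix are made up for illustration; in a real network the tables would be populated by the routing and label distribution protocols from steps 1a and 1b.

```python
# Ingress mapping: destination network -> (label to push, next router).
INGRESS_FEC = {"10.1.0.0/16": (17, "lsr_1")}

# Per-router label tables: incoming label -> (outgoing label, next router).
# An outgoing label of None means this router removes the label.
LABEL_TABLES = {
    "lsr_1": {17: (22, "lsr_2")},
    "lsr_2": {22: (35, "egress")},
    "egress": {35: (None, None)},
}

def forward(dest_network):
    """Return the (router, action, label) steps a packet goes through."""
    label, router = INGRESS_FEC[dest_network]
    trace = [("ingress", "push", label)]             # step 2: label the packet
    while label is not None:
        out_label, next_router = LABEL_TABLES[router][label]
        if out_label is None:
            trace.append((router, "pop", label))     # step 4: remove label
        else:
            trace.append((router, "swap", out_label))  # step 3: label swap
        label, router = out_label, next_router
    return trace

trace = forward("10.1.0.0/16")
```

Each intermediate router looks only at the incoming label, never at the destination address, which is the point of label swapping.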
Transport Network Reliability – MPLS
• Multiprotocol Label Switching (MPLS) optimizes the traffic flow between critical network resources
• MPLS offers a robust recovery framework that goes beyond protection rings of SONET/SDH
• MPLS meets the requirements of real-time applications with recovery times of less than 50 ms (comparable to SONET rings)
– One-to-one local protection
• MPLS-Traffic Engineering (TE) maintains separate backup paths for each Label Switched Path (LSP)
– Many-to-one local protection
• MPLS-TE maintains a single backup path to protect a set of primary LSPs traversing the network
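The difference between the two protection styles comes down to how many backup paths are kept for a protected link. The sketch below only counts them; it does not set up any tunnels, and the LSP names, hop lists and backup path labels are hypothetical.

```python
def lsps_on_link(lsps, link):
    """Names of the primary LSPs whose hop list crosses the given link."""
    return [name for name, hops in lsps.items()
            if link in list(zip(hops, hops[1:]))]

def one_to_one_backups(lsps, link):
    """One-to-one: a separate backup path for each protected LSP."""
    return {name: f"detour-{name}" for name in lsps_on_link(lsps, link)}

def many_to_one_backups(lsps, link):
    """Many-to-one: a single backup path shared by all protected LSPs."""
    shared = f"bypass-{link[0]}-{link[1]}"
    return {name: shared for name in lsps_on_link(lsps, link)}

PRIMARY_LSPS = {
    "lsp_voice": ["pe_1", "p_1", "p_2", "pe_2"],
    "lsp_data": ["pe_3", "p_1", "p_2", "pe_2"],
    "lsp_other": ["pe_1", "p_3", "pe_2"],     # does not use the link
}
PROTECTED_LINK = ("p_1", "p_2")

separate = one_to_one_backups(PRIMARY_LSPS, PROTECTED_LINK)
shared = many_to_one_backups(PRIMARY_LSPS, PROTECTED_LINK)
backup_paths_one_to_one = len(set(separate.values()))
backup_paths_many_to_one = len(set(shared.values()))
```

Here two LSPs cross the protected link, so the one-to-one style keeps two backup paths while the many-to-one style keeps one.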
Transport Network Reliability - Microwave
• Failure areas to watch for
• Where to build in redundancy
– Failure of redundant MW links will result in sites falling back to localized site trunking mode
– Every user within the coverage areas of affected sites will operate in the site trunking mode with other users operating within the same cells
• How to configure non-redundant elements to maximize reliability
– Frequency diversity
– Path diversity
– Microwave Monitored Hot Standby
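As a rough way to reason about where redundancy pays off, the sketch below computes the combined availability of parallel microwave links. It assumes the links fail independently, which is what path diversity tries to approach but which is not guaranteed in practice. The 0.999 single-link figure is a made-up example, not a value from the presentation.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

def parallel_availability(link_availability, links):
    """Probability that at least one of several links is up, assuming
    the links fail independently of each other."""
    return 1 - (1 - link_availability) ** links

def downtime_minutes_per_year(availability):
    """Expected minutes per year with no working link."""
    return (1 - availability) * MINUTES_PER_YEAR

single = parallel_availability(0.999, 1)      # hypothetical single link
redundant = parallel_availability(0.999, 2)   # two diverse links

single_downtime = downtime_minutes_per_year(single)
redundant_downtime = downtime_minutes_per_year(redundant)
```

With these example numbers, one link implies roughly 526 minutes of outage per year, and two independent links roughly half a minute.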
Summary
• From our discussion today, you should understand that a distributed system that makes use of:
– Redundant components, Failsoft mode, sub-system independence and other recovery mechanisms
– Coupled with the inherent reliability of a trunked network
– A ring topology for rapid recovery from transport disruptions
– Along with the fault tolerance, adaptive routing, and disaster recovery capabilities of IP
• Results in a radio network that can survive multiple failures and continue to provide a level of communication to its users
Transport Network Reliability - MPLS
[Figure: MPLS VPN diagram. Labels: VPN A, VPN B, MPLS Backbone, Backhaul Network]