04 CN33574EN31GLA0 FNG31 Maintenance Procedures and Tools
CN33574EN31GLA0 ©2014 Nokia Solutions and Networks. All rights reserved.
Flexi NG Maintenance Procedures and Tools
Nokia Solutions and Networks Academy
Legal notice Intellectual Property Rights All copyrights and intellectual property rights for Nokia Solutions and Networks training documentation, product documentation and slide presentation material, all of which are forthwith known as Nokia Solutions and Networks training material, are the exclusive property of Nokia Solutions and Networks. Nokia Solutions and Networks owns the rights to copying, modification, translation, adaptation or derivatives including any improvements or developments. Nokia Solutions and Networks has the sole right to copy, distribute, amend, modify, develop, license, sublicense, sell, transfer and assign the Nokia Solutions and Networks training material. Individuals can use the Nokia Solutions and Networks training material for their own personal self-development only, those same individuals cannot subsequently pass on that same Intellectual Property to others without the prior written agreement of Nokia Solutions and Networks. The Nokia Solutions and Networks training material cannot be used outside of an agreed Nokia Solutions and Networks training session for development of groups without the prior written agreement of Nokia Solutions and Networks.
Module contents
- Tools and utilities
- Fault management
- Backup and restore configuration
- Upgrade Flexi NG software
Module objectives
- Efficiently utilize the troubleshooting tools provided in Flexi NG
- Understand the Flexi NG alarm and logging systems
- Back up and restore the Flexi NG system
- Upgrade Flexi NG software
Tools and utilities
The Flexi NG provides a wealth of tools and utilities for configuring, monitoring and troubleshooting the system:
- Generic OS (Linux) tools
  • ifconfig, netstat
  • ping, tcpdump, traceroute
- FlexiPlatform tools
  • fsclish, fshascli
  • Statistics (PM9), logging system (syslog-ng), alarm system
- Application tools
  • Subscriber (IMSI) trace
  • Session database dump
- Hardware maintenance tools
  • Shelf Manager CLI and fsclish
fsclish or SCLI
• defined and structured commands with context-sensitive help and auto-completion of commands
• an interactive fsclish shell
Managed object states
- With fsclish you can monitor and alter managed object (MO) states
- Remember that an MO in FlexiPlatform can mean:
  • Cluster
  • Node
  • Recovery Group
  • Recovery Unit
fshascli
- An alternative platform command for managed objects is fshascli
- It can be used, for example, if fsclish cannot be launched for some reason
- For example:
fshascli -s /ClusterServer
nginfo script
- Symptom data collection tool that gathers a configuration and status snapshot into a tar file
- The resulting tar files can be added to problem reports
- List or unpack the contents with the "tar" command
- NOTE: It is recommended to use the nginfo troubleshooting tool in a live network only during low operational traffic volumes
- nginfo -v displays the delivery label
- nginfo -s generates a status summary
nginfo -s
With the nginfo -s tool you can query all the MO statuses from the HAS at once:
Cluster status check:
*********************
/ Cluster OK
*********************
Node status check:
******************
/CLA-0 Node OK
/CLA-1 Node OK
/AS7-0 Node OK
/AS7-1 Node OK
/AS10-0 Node OK
/AS10-1 Node OK
******************
Service status check:
**********************
NodeHA Service OK
NodeOS Service OK
OSProxy Service OK
NetworkManager Service OK
ConfMgmtActivator Service OK
ClusterHA Service OK
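When many nodes are involved, the status summary can also be checked by a script. The following is a minimal sketch, assuming that each status is printed on its own line as `<name> <type> <status>`, as in the sample output. The `FAILED` value and the function names are hypothetical, since the slide shows only `OK` statuses.

```python
KINDS = {"Cluster", "Node", "Service"}


def parse_status(report):
    """Return (name, kind, status) tuples found in nginfo -s style output."""
    entries = []
    for line in report.splitlines():
        parts = line.split()
        # Status lines have exactly three fields; headers and separators do not match.
        if len(parts) == 3 and parts[1] in KINDS:
            entries.append(tuple(parts))
    return entries


def find_problems(report):
    """Return only the entries whose status is something other than OK."""
    return [entry for entry in parse_status(report) if entry[2] != "OK"]


# Shortened sample; the FAILED line is invented for illustration only.
SAMPLE = """\
Cluster status check:
*********************
/ Cluster OK
*********************
Node status check:
******************
/CLA-0 Node OK
/AS7-1 Node FAILED
******************
Service status check:
**********************
NodeHA Service OK
"""

print(find_problems(SAMPLE))
```

The input could be produced by redirecting the nginfo -s output to a file on the CLA and reading that file into a string.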
Continuous data collection
• When the NGInfoOmDataCollector RG in CLA nodes and NGInfoDataCollector RG in AS nodes are unlocked, Flexi NG starts to collect data periodically for troubleshooting purposes
• It is recommended to keep them unlocked • The collected data is saved to the /var/SS_nginfo_om_data_collector/results directory in the CLA node
• Application internal statistics, system CPU and memory usage ratings are examples of periodically collected data
• When nginfo is run, it copies the files from the results directory to the result file of nginfo
Configuration dump
• Allows dumping the current configuration in textual format, including both application and platform configuration
• To list the current configuration, enter the following command:
show ng config
• To dump the output into a file, run the command:
fsclish -c "show ng config" >ng_config.txt
• The dump is written to the file ng_config.txt, which is stored in the current working directory.
• Note that executing this command takes some time.
• In the output, application configurations (ng and ng-admin hierarchies) are displayed first in the alphabetical order of command groups, and then generic configurations (other hierarchies) follow similarly
• Only actual configurations are included in the dump; runtime information such as alarms, internal status information and hardware information is not displayed.
ifconfig
- ifconfig displays network interface configuration and status
# ifconfig -a
bond0 Link encap:Ethernet HWaddr 00:A0:A5:62:EE:B2
inet addr:169.254.0.4 Bcast:0.0.0.0 Mask:255.255.255.0
inet6 addr: fe80::a0:a500:d62:eeb2/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
RX packets:5764520 errors:0 dropped:0 overruns:0 frame:0
TX packets:3564164 errors:1 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
...
eth4 Link encap:Ethernet HWaddr 00:A0:A5:63:B1:3C
inet addr:10.31.140.100 Bcast:10.31.140.255 Mask:255.255.255.0
inet6 addr: fe80::a0:a500:763:b13c/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:219298 errors:0 dropped:0 overruns:0 frame:0
TX packets:49139 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
Interrupt:216 Base address:0xcc00 Memory:fd8e0000-fd900000
Displaying interface status
show networking interface runtime node AS-0 iface ethrtm2_1
Showing runtime status of interfaces
ethrtm2_1
  index : 7
  node : AS-0
  type : Ethernet
  flags : UP BROADCAST RUNNING MULTICAST FP_OUTPUT
  speed : 1G
  MAC : 00:00:50:4e:40:81
  MTU : 1500
  admin state : up
  oper state : up
  Transceiver : SGMII
  Rx packets :
    total : 2
    bytes : 200
    error : 0
  Tx packets :
    total : 1349835735
    bytes : 951617099392625
    error : 0
IP addressing: fsclish
- With fsclish you can check the IP addressing in the system
show networking instance vrfgi address
. . .
ethrtm1_1
  type : dedicated
  address : 10.31.171.10/29
  owner : /AS7-0
ethrtm1_2
  type : dedicated
  address : 10.31.139.80/24
  owner : /AS7-0
Checking the networking-service configuration
• show networking-service dns
host-ping
• fsclish provides the host-ping utility
host-ping node-name AS7-0 source-vrf 1 destination-address 10.2.20.5 source-interface 10.131.179.241
• source-vrf: ID of any existing VRF. The allowed VRF ID value range is from 1 to 599. The default VRF (ID = 0, Name = default) always exists in Flexi NG.
host-traceroute
- fsclish provides the host-traceroute utility
host-traceroute node-name AS7-0 source-vrf 1 destination-address 10.31.171.9 source-interface ethrtm1_1
Monitoring routes: fsclish
With fsclish you can configure Flexi NG and view configuration data, such as routes
root@CLA-0 [SAEMUC] > show routing instance default node AS7-0 route
Codes: C - Connected, S - Static, I - IGRP, R - RIP, B - BGP, O - OSPF
       E - OSPF external, A - Aggregate, K - Kernel Remnant, H - Hidden
       P - Suppressed
S 10.10.205/24 via 10.31.139.254 ethrtm1_4 cost 0 age 74222
S 10.31.136/24 via 169.254.0.110 bond0 cost 0 age 74953
C 10.31.139/24 is directly connected ethrtm1_4
S 10.31.171/29 via 10.31.171.14 ethrtm1_1 cost 0 age 74027
C 10.31.171.8/29 is directly connected ethrtm1_1
C 10.31.171.241/32 is directly connected lo
C 10.31.171.242/32 is directly connected lo
C 169.254/24 is directly connected bond0
C 169.254.1/24 is directly connected bond1
tcpdump
- Monitor traffic in network interfaces
- See "man tcpdump" for help
- NOTE: As user plane traffic is handled predominantly in the fastpath environment, tcpdump cannot by default be used for capturing user plane packets
# tcpdump -i ethrtm1_1 vlan 359 and proto 89
tcpdump: listening on ethrtm1_3
02:14:32.442122 I 802.1Q vlan#10 P0 10.2.10.33 > 224.0.0.5: OSPFv2-hello 80:
RID 10.2.10.33 backbone [|ospf] [tos 0xc0] [ttl 0]
02:14:32.833796 I 802.1Q vlan#10 P0 10.2.10.35 > 224.0.0.5: OSPFv2-hello 80:
RID 10.2.1.35 backbone [|ospf] [tos 0xc0] [ttl 0]
02:14:34.366025 I 802.1Q vlan#10 P0 10.2.10.34 > 224.0.0.5: OSPFv2-hello 80:
RID 10.2.1.34 backbone [|ospf] [tos 0xc0] [ttl 0]
^C
14 packets received by filter
0 packets dropped by kernel
#
Tcpdump is an essential tool for analysing IP-based traffic on all Unix platforms. The user supplies the input interface (physical or logical), optional parameters and a filter specification. Without filters, tcpdump shows all packets in both directions across the interface.
Displaying the runtime information for forwarding table
• show networking [instance <vrf_name>] forwarding-table runtime node <node_name>
• show routing [instance <vrf_name>] route runtime mobile
Capturing data to files
- Network traffic can be captured into a binary file with the tcpdump option -w <filename>
- The capture file can be read with the -r <filename> option
- In Service Blades everything is in a memory file system
  • Store the capture file temporarily, e.g. under the /var/tmp directory
  • Use filters to keep the resulting file size small
  • /var/tmp refers to the directory /mnt/mstate/AS7-0/var/tmp on the CLA from where the AS node has mounted its filesystem (over NFS)
Network traffic can be captured with tcpdump into a binary file with the -w <filename> option. The file can be read later with the -r <filename> option. Note that the resulting file can grow quite large if there is a lot of traffic, especially if you do not apply any filters.
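If captures are started from a script, the command line can be assembled from its parts so that the interface, the output file and the filter are always given together. This is only a sketch: the helper names and the file name /var/tmp/ospf.pcap are hypothetical, and only the tcpdump options -i, -w and -r mentioned in this module are used.

```python
import shlex


def build_capture_command(interface, capture_file, filter_expr=""):
    """Build a tcpdump argument list that writes a filtered capture to a file."""
    argv = ["tcpdump", "-i", interface, "-w", capture_file]
    # The filter expression is appended as separate words, as on the command line.
    argv += shlex.split(filter_expr)
    return argv


def build_read_command(capture_file):
    """Build a tcpdump argument list that reads a previously written capture."""
    return ["tcpdump", "-r", capture_file]


# Hypothetical file name under /var/tmp, with the filter from the earlier example.
cmd = build_capture_command("ethrtm1_1", "/var/tmp/ospf.pcap", "vlan 359 and proto 89")
print(shlex.join(cmd))
print(shlex.join(build_read_command("/var/tmp/ospf.pcap")))
```

Requiring a filter argument in such a helper is one way to follow the advice to keep the capture file small.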
Capturing user plane packets with tcpdump
• tcpdump cannot by default capture user plane traffic because it is a Linux process and user plane packets are handled in the fastpath environment
• When the tcpdump is set to capture user plane traffic, the packets matching the tcpdump filter are passed from fastpath to the control plane (Linux environment)
• Hence, it should only be used with filters that limit the captured traffic to what is necessary
• User plane traffic capturing will cause performance degradation
• It should be disabled immediately after completing the debugging
- To enable tcpdump for user plane traffic:
fsfastpath set-tap-enable on
Subscriber trace
- The operator can activate subscriber trace in Flexi NG
  • Traffic capture for a specific IMSI
  • Up to 50 subscribers can be traced simultaneously
  • Alarm threshold for the used disk space can be set
- Different types of data can be collected for a subscriber
  • Signaling traffic capture
  • User plane traffic capture
  • Low-level logs, consisting of event logs and internal logs
- The trace files are stored in the /var/log directory on the CLA where the /TraceCtrl RG is active
- The user plane captures are encrypted by default
Once configured, the trace activation is automatically distributed to each service blade in the system, and collected data is aggregated in the management blade.
Subscriber trace logs
• Used to gather detailed information from the system for troubleshooting a suspected software fault
• Event logs
  – Generate information about software events and the IMSI database dump
  – Possible software events: subscriber create, session create, bearer create, subscriber update, session update, bearer update, subscriber delete, session delete, bearer delete
  – In combined S-GW and P-GW mode, only one actual subscriber session exists, so only one set of event logs is produced
  – File name format: TRACE_imsi_date_time_EVENT.txt
• Internal logs
  – Collect low-level internal log entries as configured in the generic log level configurations
  – File name format: TRACE_imsi_date_time_INTERNAL.txt
Subscriber trace logs are used to gather detailed information about what happens in the system for a particular end-user. This information is mainly intended for troubleshooting purposes when a fault in software is suspected, and should only be enabled when advised by NSN Customer Support.
Event logs
Subscriber trace can generate information about software events and IMSI DB dump related to a traced subscriber. The events are always related to external signaling. A single event log can represent single or multiple actual request/response messaging flows which are exchanged during the overall event. Due to this, an event log does not indicate the actual signaling that took place.
The software events are structured according to the affected entity (subscriber, session, and bearer). For example, a single new PDP context creation for a subscriber without existing connections triggers three events: one create event for a subscriber, one for a session, and one for a bearer. This allows natural traceability, for example, for primary and secondary PDP contexts (and for default and dedicated bearers in LTE).
The following software events can be logged for each traced subscriber:
• subscriber create
• session create
• bearer create
• subscriber update
• session update
• bearer update
• subscriber delete
• session delete
• bearer delete
The IMSI DB dump event can be logged for each traced subscriber for the following scenarios:
• session create
• session update
• session delete
Subscriber trace logs, cont.
In combined S-GW and P-GW mode, only one actual subscriber session exists. Thus, only one set of event logs is ever produced. The file name format for event logs is TRACE_imsi_date_time_EVENT.txt
Internal logs
Subscriber trace can collect low-level internal log entries which are related to the traced subscriber. The internal logs are generated as configured in the generic log level configurations. The configuration of those log levels is independent of the subscriber trace feature configuration. When the subscriber trace is configured to collect the internal logs for a certain subscriber, the system generates logs for this subscriber according to the generic log level configurations. These logs are then forwarded to the subscriber trace output files. Specific log entries collected with subscriber trace functionality are also visible in the system level log files. All the application level logs related to a particular subscriber must be traced, but the common system level logs are visible only in the syslog. The file name format for internal logs is TRACE_imsi_date_time_INTERNAL.txt
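The two file name formats make it possible to sort collected trace files by subscriber and log type. A minimal sketch follows, assuming that the date and time parts contain no underscores. The exact date and time formatting is not given in this module, so the sample file name is hypothetical.

```python
import re
from typing import NamedTuple, Optional

# TRACE_imsi_date_time_EVENT.txt or TRACE_imsi_date_time_INTERNAL.txt
TRACE_NAME_RE = re.compile(
    r"^TRACE_(?P<imsi>\d+)_(?P<date>[^_]+)_(?P<time>[^_]+)_(?P<kind>EVENT|INTERNAL)\.txt$"
)


class TraceFile(NamedTuple):
    imsi: str
    date: str
    time: str
    kind: str


def parse_trace_name(name: str) -> Optional[TraceFile]:
    """Split a subscriber trace log file name into its parts, or return None."""
    match = TRACE_NAME_RE.match(name)
    if match is None:
        return None
    return TraceFile(**match.groupdict())


# Hypothetical file name; the real date and time layout may differ.
print(parse_trace_name("TRACE_244060000005021_20140423_122412_EVENT.txt"))
```

Names that do not follow either format return None, so other files in the same directory are skipped.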
Subscriber trace logs, cont.
Capture points for subscriber trace
Configuring subscriber tracing
- To enable the feature in general:
set ng feature subscriber-trace-functionality enable
- To activate the tracing, you need to specify the IMSIs and the data that will be collected for the user:
add ng trace subscriber-trace imsi 244060000005021 gather-events enable gather-logs enable gather-signaling-traffic enable gather-user-traffic enable
- To generate the trace files, you need to disable tracing:
set ng trace subscriber-trace imsi 244060000005021 gather-events disable gather-logs disable gather-signaling-traffic disable gather-user-traffic disable
Disk usage monitoring for subscriber trace
- To set an alarm threshold for disk usage:
set ng trace general subscriber-trace-max-disk-usage <subscriber-trace-max-disk-usage in MB>
  • Integer: 100 - 4000. The default value is 2000.
- When the configured maximum size is reached, all traces for all subscribers are paused and alarm 71507 OUT OF DISK STORAGE is raised
- Tracing resumes automatically when the monitoring function detects that the aggregate file size is below 80% of the configured maximum
- The disk usage monitoring is only active on the CLA on which the centralized part of the subscriber trace feature (trace_ctrl process) is active
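The pause and resume rule is a simple hysteresis: tracing stops at the configured maximum and restarts only below 80% of it. The sketch below illustrates that behaviour as described here. It is not the trace_ctrl implementation, the class and method names are hypothetical, and the alarm is only noted in a comment.

```python
class TraceDiskMonitor:
    """Illustrates the pause/resume rule for subscriber trace disk usage."""

    RESUME_FRACTION = 0.8  # tracing resumes below 80% of the configured maximum

    def __init__(self, max_mb=2000):
        # Allowed range and default value as stated for the CLI parameter.
        if not 100 <= max_mb <= 4000:
            raise ValueError("max disk usage must be between 100 and 4000 MB")
        self.max_mb = max_mb
        self.paused = False

    def update(self, used_mb):
        """Feed the current aggregate trace size; return True while tracing is paused."""
        if not self.paused and used_mb >= self.max_mb:
            # This is the point where alarm 71507 OUT OF DISK STORAGE would be raised.
            self.paused = True
        elif self.paused and used_mb < self.RESUME_FRACTION * self.max_mb:
            self.paused = False
        return self.paused


monitor = TraceDiskMonitor()
for used in (1999, 2000, 1700, 1599):
    print(used, monitor.update(used))
```

With the default of 2000 MB, tracing pauses at 2000 MB and stays paused at 1700 MB, because the resume limit is 1600 MB.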
Session database dump
- With fsclish it is possible to check the session database using a subscriber’s IMSI, MSISDN or UE IP as the key
  • show ng trace data-base-dump filter imsi <imsi>
  • show ng trace data-base-dump filter msisdn <msisdn>
  • show ng trace data-base-dump filter vrf <vrf> [addressv4 <addressv4>] [addressv6 <addressv6>]
- Output is automatically adapted to the configured Flexi NG mode (different in GGSN and LTE)
- Contains information for subscriber, session and bearer(s)
  • Subscriber: IMSI, MSISDN, IMEISV, Node
  • Session, e.g.: RAT type, Assigned IP Address, APN, session profile, active PCC rule bases, …
  • Bearer, e.g.: TEID, QoS settings, throughput counters, dynamic PCC rule names, …
Triggering individual session disconnections
• Individual session disconnections are triggered based on the subscriber’s IMSI (full IMSI required, no wildcards allowed).
• Optionally, and in addition to the subscriber’s IMSI, session disconnections can be triggered based on the EPS bearer identification (EBI) value or Network layer Service Access Point Identifier (NSAPI) value.
• If the default bearer is disconnected with the command, the related dedicated bearers are also disconnected
- Valid for the GGSN, P-GW, S-GW and combined S-GW and P-GW modes.
- Trigger session disconnections using the command:
set ng-admin subscriber disconnect imsi <imsi> [bearer-id <bearer-id>]
- Only one disconnection CLI command can be executed at a time
- Visible in external signaling with suitable cause code values and in CDRs (closing reason)
Session disconnections are triggered based on the subscriber’s IMSI and, optionally, on the EBI/NSAPI value using the ng-admin subscriber disconnect command. This command only triggers the session disconnections; it does not perform them itself. The configuration instructions for triggering session disconnection are valid for the GGSN, P-GW, S-GW and combined S-GW and P-GW modes. The S-GW node does not trigger dedicated bearer deletion messages towards the P-GW node.
In the case of P-GW or combined S-GW and P-GW mode, where an S5 PMIP PDN connection linked to the specified IMSI exists, EBI triggering is not applicable since EBI is currently not supported by the PMIP protocol. In such cases, the ng-admin subscriber disconnect command silently ignores the parsed EBI (if one is given as input) and all bearers connected to the indicated IMSI are triggered for deletion.
In the case of combined S-GW and P-GW mode, with S-GW and P-GW in different nodes, it is possible that bearer disconnection is triggered in one node (for example, S-GW) and the same node triggers bearer deletion messages towards the other node (P-GW) before the latter is triggered for session disconnection by the ng-admin subscriber disconnect command. In such a case, the bearer deletion sequence in the second node (P-GW) is actually triggered by the first node (S-GW) instead of the CLI command.
For the ng-admin subscriber disconnect command, the following restrictions apply:
• Must be used only for occasional disconnections, not for triggering multiple disconnections. Multiple disconnections are not supported and therefore must not be attempted by, for example, creating automated loops.
• Not supported during in-service upgrade.
Triggering individual session disconnections, cont.
Monitoring snapshot statistics
- Flexi NG collects runtime statistics to a specific part of the file system
  • The /cdafs/pmfs pseudo-filesystem structure contains a subdirectory for each KPI
- You can check the current statistics, e.g., with the following command that provides a snapshot of the statistics related to a specific session profile:
show stats data current name 3000_Session_profile/NODE-AS10-0/SESSIONPROFILES-SUM/snapshot.txt
To monitor the user plane CPU load
• The output shows the fastpath CPU load on a given node
• show fastpath-cpu-load node-name AS7-0
• The load is displayed as a percentage of the maximum capacity
• The load consists mostly of user plane traffic, but internal messaging and control plane signaling traffic can have a minor influence on the percentage
Hardware management tools
• The HW elements of the system are principally managed with the ATCA Shelf Manager CLI (clia). Please refer to the NED documentation about clia commands.
• Some HW commands are included in fsclish
show hwi {brief|verbose} [container]
show hardware state list
show hardware state node <node name>
set hardware power on node <node name>
set hardware power off node <node name>
set hardware restart node <node name>
- The FlexiPlatform hwcli tool can also be used for HW maintenance, e.g. hwcli -t
The Logging System
Logging in the FlexiPlatform
• Each node runs a copy of the syslog-ng daemon
• All nodes send their logs to the central syslog-ng daemon running on the active CLA
[Figure: logging architecture. Labels: NE, Application, /dev/log, Proxy syslog-ng, Local logs (/var/log, if local filesystem exists), /var/log/local, TCP 610, Active CLA, syslog-ng daemon, Cluster-wide logs (/srv/Log/log/)]
Syslog in the FlexiPlatform 5 Cat
Each node runs a syslog-ng proxy that reads logs from /dev/log and kernel messages from /proc/kmsg. The syslog-ng proxy forwards all the logs over TCP to the syslog-ng master that runs in the active CLA node. The CLA where the Recovery Group /Log has the Recovery Unit (FSLogServer) running in the active role is considered to be active from the log system point of view.
Additionally, the syslog-ng proxies write all logs to a named pipe, /tmp/coroner_fifo, which is used by the coroner process to locate log entries that should be converted into alarms. If the node has local disks, the syslog-ng proxy also writes logs to the local disk (into the /var/log directory). In Flexi NG, only the CLAs have local disks.
The syslog-ng master listens to TCP port 601 and writes the logs to the local disks in the active CLA. The directory for these log files is /srv/Log/log/. The master-syslog file contains the cluster-wide log records. In addition, a separate log file is automatically generated for each node (e.g., syslog-AS10-0.log).
The syslog-ng proxies and the syslog-ng master are instances of the same executable. They use different configuration files to differentiate their behavior. The syslog-ng proxies are started by the operating system’s init scripts when the node is booted and use the Node IP address as their source IP address. The syslog-ng master is started by HAS and it binds to the redundant IP address of the /Log recovery group.
Logging in the FlexiPlatform, cont.
• Distributed components running on every node, which locally collect the node’s input data, and forward the data to the centralized server located in a management blade.
• Centralized server located in a management blade, which aggregates the data coming in from all nodes, and presents the data in the format suitable for each standard interface.
• The external interfaces are available from the management blades, providing full access to aggregated alarms, statistics and logs
Log Recovery Group
- The /Log Recovery Group runs with Active / Coldstandby redundancy on the CLAs
  • Runs on the same node as the /SSH Recovery Group (stalker)
- The CLA where the active FSLogServer Recovery Unit is running updates the cluster-wide logs in the /var/log/master-syslog file
  • Soft link to the file /srv/Log/log/syslog
  • Node-specific syslog files are automatically extracted from the same file (e.g., syslog-AS10-0.log)
# fshascli -v /Log
/Log:
RecoveryGroup /Log
  specialConstraints=(serviceInterruptionRequiresForce)
  RecoveryUnit /CLA-0/FSLogServer
    recoveryUnitType=(HARecoveryUnit)
    Process /CLA-0/FSLogServer/MasterSyslogDaemon
      command=(/sbin/syslog-ng -F -p /var/run/master-syslog-ng.pid -f /etc/syslog-ng/master-syslog-ng.conf)
      status=(nonHA)
      startMethod=(requested)
      severity=(modest)
  RecoveryUnit /CLA-1/FSLogServer
    recoveryUnitType=(HARecoveryUnit)
    Process /CLA-1/FSLogServer/MasterSyslogDaemon
      command=(/sbin/syslog-ng -F -p /var/run/master-syslog-ng.pid -f /etc/syslog-ng/master-syslog-ng.conf)
      status=(nonHA)
      startMethod=(requested)
      severity=(modest)
Log Recovery Group, cont.
Log files
The log directory contains
– Cluster-wide system logs
– Local logs for the CLA
– AS node specific logs
– Debug logs
– Alarm logs
– Audit logs

File            Contents
auth.log        All logs with facility AUTH and with any severity
debug           All logs with severity set at DEBUG except for facilities AUTH and AUTHPRIV
syslog          All logs with severity set between INFO and EMERG except for facilities AUTH and AUTHPRIV
User’s console  All logs with severity set at EMERG

Note:
– The logs are rotated weekly
– The log files will be rotated if the log file size exceeds 50 Megabytes
– Copies of 10 previous log files are saved and compressed
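The file table can be read as a small set of routing rules based on facility and severity. The following sketch encodes only the four rows of the table. It is an illustration, not the actual syslog-ng configuration, and the function name is hypothetical.

```python
# Standard syslog severities, from most to least severe.
SEVERITIES = ["EMERG", "ALERT", "CRIT", "ERR", "WARNING", "NOTICE", "INFO", "DEBUG"]


def destinations(facility, severity):
    """Return the destinations from the file table that receive a given log entry."""
    facility = facility.upper()
    severity = severity.upper()
    if severity not in SEVERITIES:
        raise ValueError("unknown severity: " + severity)

    result = []
    if facility == "AUTH":
        result.append("auth.log")  # any severity
    if facility not in ("AUTH", "AUTHPRIV"):
        # DEBUG entries go to the debug file, INFO up to EMERG go to syslog.
        result.append("debug" if severity == "DEBUG" else "syslog")
    if severity == "EMERG":
        result.append("console")  # shown on the user's console
    # The table lists no file for AUTHPRIV, so the result can be empty.
    return result


print(destinations("daemon", "err"))
print(destinations("kern", "emerg"))
print(destinations("auth", "info"))
```

An AUTHPRIV entry below EMERG returns an empty list here because the table does not name a destination for it; the real configuration may route it elsewhere.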
Syslog facilities
• Syslog entries are generated with two parameters that help filter the desired logs into different log destinations (files, external servers, console…)
• The parameters are known as facility and log level
• The facility defines what type of application generated the log
• The level defines the importance or criticality of the log entry
  – Log-error (the default in NG10)
  – Log-info
  – Log-debug
  – Log-debug2
Syslog Facility Source of logs
auth or authpriv Login authentication
cron cron subsystem
daemon System server processes
kern The Linux kernel
ftp Messages from FTP daemons
lpr Print spooling subsystem
mail mail subsystem
news news subsystem
syslog Messages from syslog itself
user The default for unspecified log entries
uucp From the uucp applications
localN locally defined facilities (N=0-7)
Syslog facilities, cont.
Syslog facilities, cont.
Syslog level  Description
EMERG         A panic condition. This is normally broadcast to all users
ALERT         A condition that should be corrected immediately
CRIT          Critical conditions, e.g., hard device errors
ERR           Errors
WARNING       Warning messages
NOTICE        Minor errors that do not necessarily require special attention
INFO          Informational messages
DEBUG         Normally used only when debugging a program
Configuring the log levels
- The log levels can be configured in fsclish
  • Per node
  • Per process
- For example, to enable debug logs on a node for a specific process:
set ng trace general log-level process session-controller node AS7-0 log-level log-debug
NOTE: It is recommended that only the log-error log level is used in real runtime environment. The other log levels should only be used in a laboratory test environment because log configurations can have a negative effect on performance.
Sending Logs to an External Log Server
• The syslog-ng in the FlexiPlatform can be configured to send a copy of all log messages to a centralised external syslog-ng server
• To configure the functionality, add the following lines to the central syslog-ng configuration file:
/etc/master-syslog-ng.conf
# Additional destination definitions
##################################
destination udp-to-logserver {
  udp("logserver-ip-or-name" localip("Directory") port(610));
};
destination tcp-to-logserver {
  tcp("logserver-ip-or-name" localip("Directory") port(610));
};
# Additional log commands
#############################################
log { source(src); destination(tcp-to-logserver); };
Operator log
• System processes write information about events in chronological order to the operator log
• These logs are meant for operating personnel as an indication of specific problems, for example regarding external interfaces. As the usage is strictly defined and controlled, they are all enabled by default
• The operator log can be accessed in the /var/log/syslog-operator.log file
• This file contains log entries of the pmip-sig, gtp-c, dia_client, radius_client and dhcp-sig processes
• Not all possible data is recorded to the syslog-operator.log file, as this would slow down the system. Therefore, operator log levels are used to generate only the needed operator log information.

Log Level  Log Code    Description
notice     LOG_NOTICE  Written for error response messages coming from peer network elements
info       LOG_INFO    Mainly used for logging incoming and outgoing messages, both internal and external.
More detailed information can be acquired by configuring lower level logs, but these will have a significant performance impact. Because of this, a new concept (with limited scope and coverage) called operator logs has been introduced.
Example of operator log entries on AS nodes, at the log level notice:
Apr 23 12:24:12 notice AS7-0 session_ctrl[1544]: [0]: [2181038110]: FAILURE_RESP_RECV_FROM_PEER IF=GTPV2 IMSI=240080000000001 IMEI=001000800000010 MSISDN=467080000000001 SourceAddress=IPV4=7.23.186.31 AdditionalInfo=Create Session Response received with Cause SYSTEM_FAILURE[72] (gtp_sc_gtpc_if.c:682) //206515
Apr 23 12:29:52 notice AS7-0 session_ctrl[1544]: [0]: [2181038142]: FAILURE_RESP_RECV_FROM_PEER IF=GTPV2 IMSI=240080000000001 IMEI=001000800000010 MSISDN=467080000000001 SourceAddress=IPV4=7.23.186.13 AdditionalInfo=Update Bearer Response received with Cause REQUEST_REJECTED[94] (gtp_sc_gtpc_if.c:620) //233340
Apr 23 12:34:19 notice AS7-0 session_ctrl[1544]: [0]: [2264924168]: FAILURE_RESP_RECV_FROM_PEER IF=PMIP IMSI=240080000000001 IMEI=001000800000010 MSISDN=467080000000001 SourceAddress=IPV6=2001:490:FF0:C203:0:0:717:BA11 AdditionalInfo=PROXY BINDING ACKNOWLEDGE received with status PROXY_REG_NOT_ENABLED[152] (pmip.c:669) //243466
Operator log, cont.
Operator log, cont.
Apr 23 13:27:58 notice AS7-0 session_ctrl[15566]: [0]: [2181038085]: FAILURE_RESP_RECV_FROM_PEER IF=GTPV1 IMSI=240080000000001 IMEI=4901371095200109 MSISDN=91240080000000001 SourceAddress=IPV4=7.23.184.11 AdditionalInfo=Update PDP Context Response received with Cause SYSTEM_FAILURE[204] (gtp_sc_gtpc_if.c:464) //35368
Apr 23 12:54:21 notice AS7-0 session_ctrl[1544]: [0]: [2550136848]: FAILURE_RESP_RECV_FROM_PEER IF=DHCPv4 IMSI=240080000000001 IMEI= MSISDN=91467080000000001 SourceAddress=IPV4=8.23.186.18 AdditionalInfo=Received DHCPNAK message (dhcp.c:138) //299160
Apr 24 08:29:10 notice AS10-0 session_ctrl[13046]: [0]: [2298478702]: FAILURE_RESP_RECV_FROM_PEER IF=DIAMETER IMSI=240080000000001 IMEI= MSISDN=91467080000000001 SourceAddress=IPV4=8.23.186.16 AdditionalInfo=CREDIT CONTROL ANSWER(Update) Failure Response Received with Cause DIA_RESOURCES_EXCEEDED[5006] on Gx interface from PCRF (dia_gx.c:4359) //318513
Apr 24 08:56:43 notice AS10-0 session_ctrl[13046]: [0]: [2164260892]: FAILURE_RESP_RECV_FROM_PEER IF=RADIUS IMSI=240080000000001 IMEI=001000800000010 MSISDN=467080000000001 SourceAddress=IPV4=8.23.186.12 AdditionalInfo=Radius Access Reject Response received with Cause RC_REJECT_FROM_SERVER[12] (sc_if_radius.c:2154) //364294
Configuring operator log levels
• Set the operator log level with the following command:
  set ng trace general operator-log log-level <log-level>
• Use the grep '<string>' <file name> command to find operator log entries that contain a given string, for example a field value.
• Use the less <file name> command to page through the contents of an operator log on screen.
• Use the tail <file name> command to show the last entries of an operator log, which can be helpful for identifying faults that have just occurred.
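When grep output becomes hard to read, the entries can also be split into named fields with a short script. The following is a minimal sketch in Python, not a Flexi NG tool: the line layout is inferred only from the sample entries in this module, and the group names `index` and `ref` for the two bracketed numbers are hypothetical labels, since their meaning is not stated here.

```python
import re

# Layout inferred from the sample operator log entries; this is an assumption,
# not a documented format.
ENTRY_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) (?P<node>\S+) (?P<process>\w+)\[(?P<pid>\d+)\]: "
    r"\[(?P<index>\d+)\]: \[(?P<ref>\d+)\]: "   # hypothetical names for the two bracketed numbers
    r"(?P<event>\w+) (?P<body>.*)$"
)

# Keys seen in the sample entries. SourceAddress values contain '=' themselves
# (IPV4=..., IPV6=...), so the body is split at these keys only.
FIELD_RE = re.compile(r"\b(IF|IMSI|IMEI|MSISDN|SourceAddress|AdditionalInfo)=")

SAMPLE = (
    "Apr 23 12:24:12 notice AS7-0 session_ctrl[1544]: [0]: [2181038110]: "
    "FAILURE_RESP_RECV_FROM_PEER IF=GTPV2 IMSI=240080000000001 "
    "IMEI=001000800000010 MSISDN=467080000000001 "
    "SourceAddress=IPV4=7.23.186.31 AdditionalInfo=Create Session Response "
    "received with Cause SYSTEM_FAILURE[72] (gtp_sc_gtpc_if.c:682) //206515"
)


def parse_operator_log_line(line):
    """Return a dict of fields for one operator log entry, or None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    body = entry.pop("body")
    marks = list(FIELD_RE.finditer(body))
    for i, mark in enumerate(marks):
        # A value runs from its key up to the start of the next known key.
        end = marks[i + 1].start() if i + 1 < len(marks) else len(body)
        entry[mark.group(1)] = body[mark.end():end].strip()
    return entry


if __name__ == "__main__":
    for key, value in parse_operator_log_line(SAMPLE).items():
        print(f"{key}: {value}")
```

Splitting at known keys rather than at every space keeps multi-word values such as AdditionalInfo intact and tolerates empty values such as `IMEI=`.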
MySQL logs
- In Flexi NG, the alarms are stored in a MySQL database
- For troubleshooting purposes, the database log files can be useful
- They are located in /var/mnt/local/MySQL_DB_Alarm
  • mysql.err (the MySQL server log file)
  • my.cnf (MySQL config file)
  • odbc.ini (ODBC configuration file)
Audit Trail Log
• Provides a centralized security event logging mechanism
• Mainly used for authentication and authorization related events
• Utilizes syslog as a transport
• Logs are stored in /var/log/auth.log on the CLA where the FSLogServer RU is running active
  – Soft link to /srv/Log/log/fsaudit/auth.log
• Logs can easily be converted to NetAct format and transferred to NetAct
• Access is restricted to root and to accounts belonging to the _nokfsuiseclog group
Configuring the Audit Trail Log
root@CLA-0 [FP] > set logging auditlog
[MX] message - Message that will logged in the syslog. All white space characters found in the message will be replaced by a single space. Note : Include the message in inverted commas. eg. "Any Message".
[OX] address - Remote address. By convention, it is the address of the remote client.
[OX] audit-id - audit id.
[OX] executable - executable file name.
[OX] facility - facility value. The default is authpriv.
[OX] hostname - Remote hostname. By convention, it is the hostname of the remote client, if it is available.
[OX] priority - Message priority. This is optional, but if left out, info will be used by default.
[OX] process-id - process id.
[OX] server-address - Server address. By convention, it is the address used on the server. Useful for multi-homed applications.
[OX] server-port - Server port. By convention, It is the port on the server from which the request came.
[OX] session-id - session id.
[OX] target-id - target id.
[OX] user - user name.
[OX] user-id - user id.
Data Provided by Audit Trail
Httpd
• Authentication logs
• Access logs
SCLI, RUIM, sudo
• Authentication logs
• Access logs
FTP
• Authentication logs (direct, NE3S)
• Access logs (direct, NE3S)
LDAP
• Authentication logs
• Access logs
SSH
• Authentication logs
Alarm Management
What is Alarm Management?
• Part of Fault Management (FM)
• FM detects, isolates and corrects failures (if possible)
• An application uses alarms to
  – Indicate faults that require corrective actions
  – Indicate a potential or impending fault
• The corrective action can be:
  – Automatic (for example HAS performs a switchover)
  – Manual (for example the operator fixes a fault by replacing a broken component)
The alarm system indicates potential faults in the system as well as faults that require corrective actions. After an alarm is raised, the fault causing the alarm must be solved. The solution can be an automatic recovery or a manual corrective action. For a potential fault, the solution consists of preventive actions. Alarms are typically used in situations where it is possible to give instructions for corrective actions in the alarm description, such as replacing a hardware unit. If a system restart is needed, the whole alarm processing will be restarted, because no alarm stays active over a system restart.
The Alarm System
[Figure: alarm system architecture inside the NE. Labels shown: Application, Convenience library, Syslog, Alarm Processor, Alarm DB (MySQL), Alarm Agent, SCLI, Tomcat, NE3S, WebUI, NetAct, Mgmt App.]
FlexiPlatform Alarm terminology
• The alarms occurring in the FlexiPlatform are based on the 3GPP 32.111 series Release 4 specifications
  – Most of the 3GPP requirements are based on ITU-T recommendation X.733
• An Alarm is generally a single instance that can be uniquely distinguished from other alarms of the same type by instance-specific attributes (Specific ID, Managed Object, Time, etc.)
• An Alarm Type defines a class of alarms: the specific problem (for example 70001), the event type (for example communications), the severity (for example Minor) and other type-specific attributes that are shared among alarms of the same alarm type
• An Alarm event is an occurrence of a task performed on an alarm by an application or operator
  – For example: raising, clearing or acknowledging an alarm
Type-specific alarm parameters are either dynamic or static. You can only modify the dynamic parameters, which are:
• Default Severity
• Autoacknowledged
• Clearing Delay
• Informing Delay
• Time to Live
• Switchover Update
Communication through syslog
• Alarm raise and clear notifications are sent by applications through convenience libraries
  – The libraries place the alarms in syslog
• Syslog records are forwarded to a dedicated file (alarms)
• The alarm processor parses alarm notifications from the alarms file
  – The cluster-wide alarms are located in the file /var/log/master-alarms on the CLA where the FSLogServer RU is running active
  – Soft link to /srv/Log/log/fsaudit/alarms
The following is the syslog configuration for filtering alarms to a file (in this case local alarms):
destination local-alarms {
    file("/var/log/alarms" template("$FULLDATE $MSGONLY\n") template_escape(no) perm(0644));
};
filter raiseftr { match("ALARM RAISE"); };
filter clearftr { match("ALARM CANCEL"); };
log { source(src); filter(raiseftr); destination(local-alarms); };
log { source(src); filter(clearftr); destination(local-alarms); };
Alarm Processor (SS_AlProcessor)
• The alarm processor is the core of the alarm system
• It is a standalone Java application
• It processes alarm notifications from the alarm file and stores them persistently in the alarm system database
• It processes notifications in batches so that correlation dependencies between events can be handled
Alarm Processor – Status Check
• The alarm processor cannot raise alarms about its own failure
• The alarm processor is therefore monitored via heartbeats
• Heartbeating means raising and clearing a special heartbeat alarm
• The heartbeat alarm is raised (70246) and cleared (70247) repeatedly at a predefined time interval (5 minutes by default)
• A lack of 70247/70246 notifications indicates a failed alarm processor
2006 May 3 09:24:03 ALARM CANCEL SP=70247 MO=MOID Wildcard AP=fshaProcessInstanceName=AlarmProcessor,fshaRecoveryUnitName=FSAlarmSystemServer….
2006 May 3 09:24:03 ALARM RAISE SP=70246 MO=fshaProcessInstanceName\
=AlarmProcessor,fshaRecoveryUnitName=FSAlarmSystemServer,\...
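The heartbeat check described above can be sketched in a few lines of Python. This is an illustration under assumptions, not a platform utility: the timestamp layout is taken from the two sample notifications, and the `tolerance` factor (how many missed intervals are accepted before the processor is considered failed) is a hypothetical parameter that this module does not define.

```python
import re
from datetime import datetime, timedelta

# Matches heartbeat notifications only (SP=70246 raise, SP=70247 cancel).
# The "YYYY Mon D hh:mm:ss" prefix is assumed from the sample lines.
HEARTBEAT_RE = re.compile(
    r"^(?P<ts>\d{4} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"ALARM (?:RAISE|CANCEL) SP=(?:70246|70247)\b"
)


def last_heartbeat(lines):
    """Return the timestamp of the newest heartbeat notification, or None."""
    latest = None
    for line in lines:
        match = HEARTBEAT_RE.match(line)
        if match is None:
            continue
        # Collapse repeated spaces so single-digit days parse cleanly.
        stamp = " ".join(match.group("ts").split())
        seen = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
        if latest is None or seen > latest:
            latest = seen
    return latest


def alarm_processor_looks_alive(lines, now, interval=timedelta(minutes=5), tolerance=2):
    """True if a heartbeat was seen within `tolerance` intervals before `now`.

    `interval` defaults to the 5 minutes stated in this module; `tolerance`
    is a hypothetical safety margin.
    """
    latest = last_heartbeat(lines)
    if latest is None:
        return False
    return now - latest <= interval * tolerance
```

In practice the lines would be read from the alarms file on the CLA, and `now` would be the current time on that node.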
Managing Alarms
• Alarms have three attributes that determine the status of the alarm:
  – Acknowledgement status
  – Severity
  – Clearing status
• A new alarm is always unacknowledged and not cleared, with the severity level being one of the allowed values
• When an operator sees an alarm and starts to work on it, the alarm should be acknowledged (that is, a human is working on it) by using the alarm browser
• If the cause of the alarm is not fixed by the operator, the operator can unacknowledge the alarm to signal that the problem is not being worked on
• Once the cause of the alarm has been solved, the alarm can be cleared by the operator using the alarm browser, or automatically by the system after a predefined time
Alarm data
• Based on the 3GPP 32.111 series Release 4 specifications and ITU-T recommendation X.733
• Alarm types
  - Type-specific
  - Instance-specific
• Type-specific alarm type
  - Defines attributes specific to an alarm type
  - Static or dynamic
  - Only dynamic parameters can be modified
• Instance-specific alarm type
  - A specific alarm raised by an application
The following dynamic parameters of the type-specific alarm type can be changed:
• Default Severity
• Autoacknowledged
• Clearing Delay
• Informing Delay
• Time to Live
• Switchover Update
The type-specific alarm parameters and their values are listed in the Type-specific alarm parameters table. The instance-specific alarm parameters and their values are listed in the Instance-specific alarm parameters table.
Alarm filtering and correlation
• Filtering based on identifying fields
• Alarm identifying fields
  - Managed Object Id
  - Specific Problem
  - Identifying Additional Information
  - Application Id
• The alarm processor performs alarm correlation by using the information provided by high availability services
  - During repair or recovery action
  - Change of state of the MO
When two alarm instances have the same values for all the identifying fields, the alarm instances are interpreted to be the same alarm. If the same alarm is raised when it is already active, the new alarm is filtered out. An exception to this case is when an alarm is repeated with different severity. In this case, the alarm is considered changed and is updated in the alarm system database.
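The filtering rule above can be illustrated with a small Python sketch. It is a simplified model under assumptions and not the alarm processor's actual implementation: the class and method names are hypothetical, and only the raise path is modelled (no clearing, correlation or database).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmKey:
    """The four identifying fields; two alarms with equal keys are the same alarm."""
    managed_object_id: str
    specific_problem: int
    identifying_additional_info: str
    application_id: str


class ActiveAlarms:
    """Hypothetical in-memory stand-in for the active alarms in the alarm database."""

    def __init__(self):
        self._severity_by_key = {}

    def raise_alarm(self, key, severity):
        """Apply the filtering rule and report what happened to the notification."""
        current = self._severity_by_key.get(key)
        if current is None:
            self._severity_by_key[key] = severity
            return "new"
        if current == severity:
            return "filtered"          # same alarm, already active: dropped
        self._severity_by_key[key] = severity
        return "updated"               # same alarm, different severity: changed

    def severity_of(self, key):
        return self._severity_by_key.get(key)
```

Making the key a frozen dataclass gives it value equality and hashing for free, which is what "same values for all the identifying fields" requires.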
Severity levels of the alarms
• Critical
  - A service-affecting condition has occurred and immediate action is required
• Major
  - A service-affecting condition has developed and urgent corrective action is required
• Minor
  - The condition does not affect services, but corrective action is needed to prevent more serious faults
• Warning
  - Indicates the detection of a potential or impending service-affecting fault before any significant effects have been felt
• Indeterminate
  - Indicates that the severity of the alarm cannot be determined
• Cleared
  - Indicates the clearing of one or more previously reported alarms
Viewing alarm data with fsclish
• Check the summary of active alarms:
  show alarm active-summary brief
root@CLA-0 [FP] > show alarm
[X] active - shows the list of all active alarms
[X] alarmcount - the count of the number of Alarms
[X] history - shows the alarm history
[X] historycount - the count of the number of Alarm Events (History)
Backup and restore
Backup types - Full
- The full backup contains the file system image as well as the configuration database
  • copy of software volumes (system image)
  • configuration volumes
  • the master state volume
  • application file systems
  • plug-ins
  • databases
- A full backup can be used for restoring the system from scratch
- It is recommended to make a full backup after commissioning, and before and after a SW upgrade
- Backup files need to be moved to a safe place
- Old backup files should be deleted from the system hard disk
Backup types - Partial
- In the partial backup the administrator can manually specify which parts of the file system are backed up
  • configuration volumes
  • the master state volume
  • application file systems
  • plug-ins
  • databases
- The partial backup cannot be used for restoring a crashed system but, for example, a corrupted database can be restored
- A partial backup should be made at regular intervals
- Backups can be scheduled with the standard cron utility
Backup file
- The backup file is created in the directory /mnt/backup on the CLA hard disk
  • ISO file
  • NetworkElementName_<full|partial>_backup_YYYYMMDD_hhmm.iso
- The image contains a METADATA file, a base image for the mips64 architecture and a base image for the i386 architecture.
  • Thus the backup image may contain several software volumes (one for each architecture) as follows:
    -rw-r--r-- root/root       95 2007-08-23 METADATA
    -rw-r--r-- root/root 30318904 2007-08-23 R_FP5_1.29.i386.img.gz
    -rw-r--r-- root/root 50324321 2007-08-23 R_FP5_1.29.mips64.img.gz
    -rw-r--r-- root/root 50324321 2007-08-23 R_FP5_1.29-INITIAL.img.gz
  • The METADATA file contains information specific to the backup ISO image, such as the delivery label, the type of each image contained, the size of each image and the name of the backup creator
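Because the backup file name encodes the network element name, the backup type and the creation time, it can be taken apart when sorting or pruning old backups. The sketch below is a hypothetical helper in Python that assumes only the naming pattern given above; the function names are illustrative.

```python
import re
from datetime import datetime

# NetworkElementName_<full|partial>_backup_YYYYMMDD_hhmm.iso
# The network element name may itself contain underscores and dots,
# so it is matched greedily up to the last "_full_backup_"/"_partial_backup_".
BACKUP_NAME_RE = re.compile(
    r"^(?P<ne>.+)_(?P<kind>full|partial)_backup_(?P<stamp>\d{8}_\d{4})\.iso$"
)


def parse_backup_name(filename):
    """Split a backup file name into its parts, or return None if it does not fit."""
    match = BACKUP_NAME_RE.match(filename)
    if match is None:
        return None
    return {
        "network_element": match.group("ne"),
        "kind": match.group("kind"),
        "created": datetime.strptime(match.group("stamp"), "%Y%m%d_%H%M"),
    }


def backup_name(network_element, kind, created):
    """Build a file name following the same pattern."""
    if kind not in ("full", "partial"):
        raise ValueError("kind must be 'full' or 'partial'")
    return f"{network_element}_{kind}_backup_{created:%Y%m%d_%H%M}.iso"
```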
Making a backup - Preparations
Before taking a backup, check that
• You have root access rights
• The network element is up and running in its normal working state
• There is enough free disk space for the backup archive file. You can estimate the amount of disk space needed from the previous backup archive files. The disk space you need during a full backup is at least twice the size of the old backup archive file.
• If necessary, free disk space by transferring backups to an external server and deleting unnecessary files
• The database recovery groups are unlocked. To check whether the database recovery group is locked or unlocked, execute the following command:
  show has state managed-object <mo-name>
• To unlock the database recovery group, execute the following command:
  set has unlock managed-object <mo-name>
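The disk space rule above (free space of at least twice the size of the previous backup archive) is simple arithmetic and can be checked with a short script. The following Python sketch is illustrative only; the function names are hypothetical, and the paths passed to `check_backup_dir` are whatever backup directory and previous archive apply on the node.

```python
import os
import shutil


def required_free_bytes(previous_backup_bytes, factor=2):
    """Free space needed for a full backup: `factor` times the previous archive size."""
    return factor * previous_backup_bytes


def has_room_for_full_backup(free_bytes, previous_backup_bytes, factor=2):
    """Pure check, easy to reason about: is the free space sufficient?"""
    return free_bytes >= required_free_bytes(previous_backup_bytes, factor)


def check_backup_dir(backup_dir, previous_backup_path):
    """Read the real numbers from the file system and apply the rule."""
    free = shutil.disk_usage(backup_dir).free
    previous = os.path.getsize(previous_backup_path)
    return has_room_for_full_backup(free, previous)
```

For example, with a previous archive of 4 GB the rule asks for at least 8 GB free in the backup directory.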
Making a full backup
- Log in to the CLA where the Recovery Unit FSClusterStateServer has the active role
  • Establish an SSH connection to the directory service:
    ssh directory
- Check the status of the Cluster State recovery unit. To check the status on a CLA node:
    show has state managed-object /CLA-[0,1]/FSClusterStateServer
  • If the Cluster State recovery unit is not active on the current node, you must perform a switchover of the recovery units with
    set has switchover managed-object /CLA-[0,1]/FSClusterStateServer
- A full software backup is made by executing the following commands:
    start backup full
    commit backup
- Check from the backup logs that the backup succeeded
- The backup file is created in the /mnt/backup directory
Making a partial backup 1/2
- Log in to the CLA where the Recovery Unit FSClusterStateServer has the active role
  • Establish an SSH connection to the directory service:
    ssh directory
- Check the status of the Cluster State recovery unit. To check the status on a CLA node:
    show has state managed-object /CLA-[0,1]/FSClusterStateServer
  • If the Cluster State recovery unit is not active on the current node, you must perform a switchover of the recovery units with
    set has switchover managed-object /CLA-[0,1]/FSClusterStateServer
Making a partial backup 2/2
- You can start the default type of partial backup with
    start backup partial
    commit backup
  • Note: the default partial backup does not include the software delivery
- You can also manually specify which parts of the file system you want to back up:
    start backup selective
    add backup delivery
    add backup config
    add backup state
    add backup filesystem
    add backup plugin
    add backup database
    commit backup
Backup logs
- A backup operation creates entries in the following logs:
  • The syslog in the /var/log directory
  • A cumulative backup.log in the /mnt/backup/log directory
  • A backup-specific NetworkElementName_<full|partial>_backup_YYYYMMDD_hhmm.log in the /mnt/backup/log directory
- The syslog contains the following kinds of entries relating to the backup process:
  • When the backup was started and by whom. For example:
    Feb 17 10:49:42 info CLA-0 fsbackup INFO full backup started
  • Information on whether the backup process was interrupted. For example:
    Feb 17 12:34:59 err CLA-0 fsbackup ERROR user interrupted
  • The result of the backup process (succeeded or failed). For example:
    Feb 17 14:47:33 info CLA-0 fsbackup INFO created backup file /mnt/backup/VirtCluster_partial_backup_20090217_1440.iso
    Feb 17 14:47:34 info CLA-0 fsbackup INFO partial backup completed successfully
    Feb 17 10:54:04 err CLA-0 fsbackup ERROR full backup failed
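Checking the backup result from the syslog can be automated by scanning the fsbackup entries in order. The Python sketch below is a minimal illustration: it recognizes only the message texts shown in the examples above, and any other fsbackup message wording is unknown to it and left unclassified. The function name and the outcome labels are hypothetical.

```python
import re

# Layout assumed from the example entries:
# "<Mon D hh:mm:ss> <priority> <node> fsbackup <INFO|ERROR> <message>"
FSBACKUP_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<priority>\w+) (?P<node>\S+) "
    r"fsbackup (?P<severity>INFO|ERROR) (?P<message>.*)$"
)


def backup_outcome(lines):
    """Return the state implied by the last recognized fsbackup entry.

    One of: "unknown", "running", "succeeded", "failed", "interrupted".
    """
    outcome = "unknown"
    for line in lines:
        match = FSBACKUP_RE.match(line.strip())
        if match is None:
            continue
        message = match.group("message")
        is_error = match.group("severity") == "ERROR"
        if message.endswith("backup started"):
            outcome = "running"
        elif message.endswith("completed successfully"):
            outcome = "succeeded"
        elif is_error and message.endswith("backup failed"):
            outcome = "failed"
        elif is_error and "interrupted" in message:
            outcome = "interrupted"
    return outcome
```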
Transferring the backup file to external server
- It is recommended to move the backups to a safe location such as an external file server
- In the active CLA node, compute an MD5 checksum of the existing backup file, for example:
    md5sum ATCA19_1.23.rw.2735_partial_backup_20090515_1326.iso > backup_20090515_1326.md5
- Transfer the backup image and the checksum file to the desired location with scp:
    scp /mnt/backup/<backup image> <username>@<external server IP address>:<target location>
    scp /mnt/backup/<checksum file> <username>@<external server IP address>:<target location>
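The checksum file is only useful if it is compared against the transferred image on the external server. On a server with md5sum available, `md5sum -c` does this directly; where it is not, the comparison can be sketched in Python as below. The function names are hypothetical, and the checksum text is assumed to be in the standard md5sum output form (digest, whitespace, file name).

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file without loading it into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(checksum_text):
    """Extract the digest from md5sum output: '<hex digest>  <file name>'."""
    return checksum_text.split()[0].lower()


def matches_checksum_text(path, checksum_text):
    """True if the file at `path` has the digest recorded in `checksum_text`."""
    return md5_of_file(path) == expected_md5(checksum_text)
```

Reading in chunks matters here because backup images can be several gigabytes in size.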
Restoration
- A complete system crash requires a valid full backup file and recommissioning of the system
  • A broken system image might also cause a reboot loop. This may be due to a logical disk crash, a corrupted file system or the accidental removal of files or directories on the system image
- A database, for example, can be restored without recommissioning from a full or partial backup file
Restoring the whole system
- A complete system crash requires restoration from a full backup
- Before starting, verify that
  • You have root access rights
  • The installation medium, that is, the field engineering workstation (FEWS), is available
  • A full backup image is available in the external storage server
  • The delivery label is not changed during commissioning
- In a full restoration the system is recommissioned by using a backup image as the base build
  • Check the Commissioning Guide for detailed instructions
- After the commissioning is finished, the following commands are executed:
    start restore backup-iso <backup.iso> full
    commit restore
Partial restoration
- The partial restoration is performed in the runtime environment to restore
  • broken software volumes
  • configuration volumes
  • master state volume
  • application file systems
  • plug-ins
  • databases
- Log in to the active CLA node and perform the restoration commands:
    start restore backup-iso <backup-iso> partial
    commit restore
  • Note: the default partial restoration does not include restoring the software delivery
A partial restore may be necessary, for example, if the databases are faulty, but the other parts of the system work normally and the network element is accessible. Partial restore is done in the runtime environment and can be performed either from a partial backup image or from a full backup image.
Selective partial restoration
- You can also specify more accurately which parts of the file system to restore:
    start restore backup-iso <backup.iso> selective
    add restore delivery
    add restore config
    add restore state
    add restore filesystem
    add restore database
    add restore plugin
    commit restore
Upgrading Flexi NG software
In-Service Upgrade
• In Flexi NG, the upgrades between sequential CD levels can be performed without service downtime in a 2N HA deployment
• Always-on PDP contexts/sessions are also supported during upgrades
  – No service break for end-users
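[Figure: Flexi NG during an in-service upgrade, with Sessions and Services shown as OK throughout. Labels shown: SB, Octeon/Linux, gtpc_sig, session_ctrl, radius_client, gwup_proxy, lib_client, Octeon/SE, gwup, MB, x86/Linux, cdr_collector, lic_client, conf_observer. Interfaces shown: Gn, Bp, X1_1, X2, X3, Gi, Radius.]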
High level 2N service node upgrade procedure
1) Active and standby are running the same SW delivery
2) Standby node is upgraded (incl. reboot)
3) Active peer detects that the standby is up again, and starts data warming
4) After data warming is complete, a controlled switchover (SWO) is initiated
5) Repeat step 2 for the new standby node (the former active)
6) Repeat step 3 (sync in the opposite direction)
[Figure: timeline of the two nodes through steps 1 to 6. Legend: Transaction sync, Data warming, Switchover, Node is down.]
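The order of the steps above can be made concrete with a toy simulation. The Python sketch below is only a model of the sequence, written under assumptions: the node names, the delivery labels and the `Node` class are hypothetical, and nothing here calls real Flexi NG commands.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str          # "active" or "standby"
    delivery: str
    warmed: bool = True


def in_service_upgrade(node_a, node_b, new_delivery):
    """Walk a 2N pair through the upgrade and return the list of actions taken."""
    actions = []
    active, standby = (node_a, node_b) if node_a.role == "active" else (node_b, node_a)
    for _ in range(2):
        # Steps 2 and 5: upgrade the current standby; the reboot drops its warm state.
        standby.delivery = new_delivery
        standby.warmed = False
        actions.append(f"upgrade {standby.name}")
        # Steps 3 and 6: the active peer warms the standby once it is up again.
        standby.warmed = True
        actions.append(f"warm {active.name}->{standby.name}")
        # Step 4: switch over only while the active still runs the old delivery.
        if active.delivery != new_delivery:
            active.role, standby.role = "standby", "active"
            active, standby = standby, active
            actions.append(f"switchover to {active.name}")
    return actions
```

The point of the ordering is that exactly one node is active at every step, so sessions are always served by a warmed node.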
Software upgrade – ISU Rollback
• The in-service upgrade rollback procedure cancels the ongoing in-service upgrade
• The rollback can only be performed before the upgrade has been committed with the commit sw-manage new command
• If the upgrade has already been committed, you need to downgrade the delivery
• The rollback command rollback sw-manage reverses the upgrade procedure. This results in the following actions:
  – The old software delivery is activated in the nodes running the new delivery, including a reboot of the nodes
  – The upgrade state is exited
  – The configuration of the old delivery is restored
Downgrading deliveries
• Flexi NG can have multiple SW deliveries installed at the same time
• The operator can choose one of the installed deliveries as the startup software
• Downgrading to a previous SW delivery requires a system restart (service break)
  1. The operator selects the previous delivery (sw_v1.0) as the startup image
  2. The operator restarts Flexi NG
  3. The system boots with the previous SW image