04 CN33574EN31GLA0 FNG31 Maintenance Procedures and Tools

CN33574EN31GLA0 ©2014 Nokia Solutions and Networks. All rights reserved. Flexi NG Maintenance Procedures and Tools

Nokia Solutions and Networks Academy

Legal notice Intellectual Property Rights All copyrights and intellectual property rights for Nokia Solutions and Networks training documentation, product documentation and slide presentation material, all of which are forthwith known as Nokia Solutions and Networks training material, are the exclusive property of Nokia Solutions and Networks. Nokia Solutions and Networks owns the rights to copying, modification, translation, adaptation or derivatives including any improvements or developments. Nokia Solutions and Networks has the sole right to copy, distribute, amend, modify, develop, license, sublicense, sell, transfer and assign the Nokia Solutions and Networks training material. Individuals can use the Nokia Solutions and Networks training material for their own personal self-development only, those same individuals cannot subsequently pass on that same Intellectual Property to others without the prior written agreement of Nokia Solutions and Networks. The Nokia Solutions and Networks training material cannot be used outside of an agreed Nokia Solutions and Networks training session for development of groups without the prior written agreement of Nokia Solutions and Networks.

Module contents

- Tools and utilities
- Fault management
- Backup and restore configuration
- Upgrade Flexi NG software

Module objectives

- Efficiently utilize the troubleshooting tools provided in Flexi NG
- Understand the Flexi NG alarm and logging systems
- Back up and restore the Flexi NG system
- Upgrade Flexi NG software

Tools and utilities

The Flexi NG provides a wealth of tools and utilities for configuring, monitoring and troubleshooting the system:

- Generic OS (Linux) tools
  • ifconfig, netstat
  • ping, tcpdump, traceroute
- FlexiPlatform tools
  • fsclish, fshascli
  • Statistics (PM9), logging system (syslog-ng), alarm system
- Application tools
  • Subscriber (IMSI) trace
  • Session database dump
- Hardware maintenance tools
  • Shelf Manager CLI and fsclish

fsclish or SCLI

• defined and structured commands with context-sensitive help and auto-completion of commands

• an interactive fsclish shell

Managed object states

- With fsclish you can monitor and alter managed object states
- Remember that MO in FlexiPlatform can mean:
  • Cluster
  • Node
  • Recovery Group
  • Recovery Unit

fshascli

- An alternative platform command for Managed Objects is fshascli
- It can be used, e.g., in case fsclish for some reason cannot be launched
- For example:

fshascli -s /ClusterServer

nginfo script

- Symptom data collection tool for gathering a configuration and status snapshot into a tar file
- The resulting tar files can be added to problem reports
- List or unpack the contents with the "tar" command
- NOTE: It is recommended to use the nginfo troubleshooting tool in a live network only during low operational traffic volumes
- nginfo -v displays the delivery label
- nginfo -s generates a status summary

nginfo -s

With the nginfo -s tool you can query the MO statuses from the HAS at once:

Cluster status check:
*********************
/ Cluster OK
*********************
Node status check:
******************
/CLA-0 Node OK
/CLA-1 Node OK
/AS7-0 Node OK
/AS7-1 Node OK
/AS10-0 Node OK
/AS10-1 Node OK
******************
Service status check:
**********************
NodeHA Service OK
NodeOS Service OK
OSProxy Service OK
NetworkManager Service OK
ConfMgmtActivator Service OK
ClusterHA Service OK
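When the status summary is saved to a file, the entries that are not OK can be picked out with a few lines of script. The following is a minimal Python sketch, not part of Flexi NG. It assumes every status line has the layout "<name> <Cluster|Node|Service> <status>" as in the output shown above, and the FAILED status in the sample text is a hypothetical value used only for illustration.

```python
import re

# Sample text modelled on the nginfo -s output; "FAILED" is a made-up status.
SAMPLE = """\
Cluster status check:
*********************
/ Cluster OK
*********************
Node status check:
******************
/CLA-0 Node OK
/AS7-0 Node FAILED
******************
Service status check:
**********************
NodeHA Service OK
"""

# Assumed line layout: <name> <Cluster|Node|Service> <status>
STATUS_LINE = re.compile(r"^(\S+)\s+(Cluster|Node|Service)\s+(\S+)$")


def parse_status(text):
    """Return a list of (name, kind, status) tuples found in the summary."""
    entries = []
    for raw in text.splitlines():
        match = STATUS_LINE.match(raw.strip())
        if match:
            entries.append(match.groups())
    return entries


def not_ok(entries):
    """Keep only the entries whose status is something other than OK."""
    return [entry for entry in entries if entry[2] != "OK"]


if __name__ == "__main__":
    for name, kind, status in not_ok(parse_status(SAMPLE)):
        print(f"{kind} {name}: {status}")
```

Section headings and separator rows do not match the assumed layout, so they are skipped without special handling.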

Continuous data collection

• When the NGInfoOmDataCollector RG in CLA nodes and NGInfoDataCollector RG in AS nodes are unlocked, Flexi NG starts to collect data periodically for troubleshooting purposes

• It is recommended to keep them unlocked

• The collected data is saved to the /var/SS_nginfo_om_data_collector/results directory in the CLA node

• Application internal statistics, system CPU and memory usage ratings are examples of periodically collected data

• When nginfo is run, it copies the files from the results directory to the result file of nginfo

Configuration dump

• Allows dumping the current configurations in textual format, including both application and platform configuration

• To list the current configurations, enter the following command:

show ng config

• To dump the output into a file, run the command:

fsclish -c "show ng config" >ng_config.txt

• The dump is written to the file ng_config.txt, which is stored in the current working directory
• Note that execution of this command takes some time
• In the output, application configurations (ng and ng-admin hierarchies) are displayed first in the alphabetical order of command groups, and then generic configurations (other hierarchies) follow similarly
• Only actual configurations are included in the dump; runtime information such as alarms, internal status information and hardware information is not displayed

ifconfig

- ifconfig displays network interface configuration and status

# ifconfig -a

bond0 Link encap:Ethernet HWaddr 00:A0:A5:62:EE:B2

inet addr:169.254.0.4 Bcast:0.0.0.0 Mask:255.255.255.0

inet6 addr: fe80::a0:a500:d62:eeb2/64 Scope:Link

UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1

RX packets:5764520 errors:0 dropped:0 overruns:0 frame:0

TX packets:3564164 errors:1 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

...

eth4 Link encap:Ethernet HWaddr 00:A0:A5:63:B1:3C

inet addr:10.31.140.100 Bcast:10.31.140.255 Mask:255.255.255.0

inet6 addr: fe80::a0:a500:763:b13c/64 Scope:Link

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:219298 errors:0 dropped:0 overruns:0 frame:0

TX packets:49139 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

Interrupt:216 Base address:0xcc00 Memory:fd8e0000-fd900000

Displaying interface status

show networking interface runtime node AS-0 iface ethrtm2_1

Showing runtime status of interfaces

ethrtm2_1
  index : 7
  node : AS-0
  type : Ethernet
  flags : UP BROADCAST RUNNING MULTICAST FP_OUTPUT
  speed : 1G
  MAC : 00:00:50:4e:40:81
  MTU : 1500
  admin state : up
  oper state : up
  Transceiver : SGMII
  Rx packets :
    total : 2
    bytes : 200
    error : 0
  Tx packets :
    total : 1349835735
    bytes : 951617099392625
    error : 0

IP addressing: fsclish

- With fsclish you can check the IP addressing in the system

show networking instance vrfgi address
. . .
ethrtm1_1
  type : dedicated
  address : 10.31.171.10/29
  owner : /AS7-0
ethrtm1_2
  type : dedicated
  address : 10.31.139.80/24
  owner : /AS7-0

check the networking-service configuration

• show networking-service dns

host-ping

• fsclish provides the host-ping utility

host-ping node-name AS7-0 source-vrf 1 destination-address 10.2.20.5 source-interface 10.131.179.241

• source-vrf: ID of any existing VRF. The allowed VRF ID value range is from 1 to 599. The default VRF (ID = 0, Name = default) always exists in Flexi NG.

host-traceroute

- fsclish provides the host-traceroute utility

host-traceroute node-name AS7-0 source-vrf 1 destination-address 10.31.171.9 source-interface ethrtm1_1

Monitoring routes: fsclish

With fsclish you can configure Flexi NG and view configuration data, such as routes

root@CLA-0 [SAEMUC] > show routing instance default node AS7-0 route
Codes: C - Connected, S - Static, I - IGRP, R - RIP, B - BGP, O - OSPF
       E - OSPF external, A - Aggregate, K - Kernel Remnant, H - Hidden
       P - Suppressed
S 10.10.205/24 via 10.31.139.254 ethrtm1_4 cost 0 age 74222
S 10.31.136/24 via 169.254.0.110 bond0 cost 0 age 74953
C 10.31.139/24 is directly connected ethrtm1_4
S 10.31.171/29 via 10.31.171.14 ethrtm1_1 cost 0 age 74027
C 10.31.171.8/29 is directly connected ethrtm1_1
C 10.31.171.241/32 is directly connected lo
C 10.31.171.242/32 is directly connected lo
C 169.254/24 is directly connected bond0
C 169.254.1/24 is directly connected bond1
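A saved copy of the route listing can be turned into structured records for comparison or reporting. The Python sketch below is illustrative only. It assumes the two line layouts visible in the listing above ("via" routes and "is directly connected" routes), and it assumes that an abbreviated prefix such as 10.10.205/24 stands for 10.10.205.0/24, that is, that the omitted trailing octets are zero.

```python
import re

# Two lines copied from the route listing above.
SAMPLE = """\
S 10.10.205/24 via 10.31.139.254 ethrtm1_4 cost 0 age 74222
C 10.31.171.8/29 is directly connected ethrtm1_1
"""

VIA_ROUTE = re.compile(r"^(\w) (\S+) via (\S+) (\S+) cost (\d+) age (\d+)$")
CONNECTED_ROUTE = re.compile(r"^(\w) (\S+) is directly connected (\S+)$")


def expand_prefix(prefix):
    """Pad an abbreviated prefix with zero octets (assumed notation)."""
    address, length = prefix.split("/")
    octets = address.split(".")
    octets += ["0"] * (4 - len(octets))
    return ".".join(octets) + "/" + length


def parse_routes(text):
    """Return one dict per recognised route line; other lines are ignored."""
    routes = []
    for raw in text.splitlines():
        line = raw.strip()
        match = VIA_ROUTE.match(line)
        if match:
            code, prefix, gateway, iface, cost, age = match.groups()
            routes.append({"code": code, "prefix": expand_prefix(prefix),
                           "gateway": gateway, "iface": iface,
                           "cost": int(cost), "age": int(age)})
            continue
        match = CONNECTED_ROUTE.match(line)
        if match:
            code, prefix, iface = match.groups()
            routes.append({"code": code, "prefix": expand_prefix(prefix),
                           "gateway": None, "iface": iface})
    return routes


if __name__ == "__main__":
    for route in parse_routes(SAMPLE):
        print(route)
```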

tcpdump

- Monitor traffic in network interfaces
- See "man tcpdump" for help
- NOTE: As user plane traffic is handled predominantly in the fastpath environment, tcpdump cannot by default be used for capturing user plane packets

# tcpdump -i ethrtm1_1 vlan 359 and proto 89

tcpdump: listening on ethrtm1_3

02:14:32.442122 I 802.1Q vlan#10 P0 10.2.10.33 > 224.0.0.5: OSPFv2-hello 80:

RID 10.2.10.33 backbone [|ospf] [tos 0xc0] [ttl 0]

02:14:32.833796 I 802.1Q vlan#10 P0 10.2.10.35 > 224.0.0.5: OSPFv2-hello 80:

RID 10.2.1.35 backbone [|ospf] [tos 0xc0] [ttl 0]

02:14:34.366025 I 802.1Q vlan#10 P0 10.2.10.34 > 224.0.0.5: OSPFv2-hello 80:

RID 10.2.1.34 backbone [|ospf] [tos 0xc0] [ttl 0]

^C

14 packets received by filter

0 packets dropped by kernel

#

Tcpdump is an essential tool for analysing IP-based traffic on all Unix platforms. The user supplies the input interface (physical or logical), optional parameters and a filter specification. Without filters, tcpdump shows all packets in both directions across the interface.

Displaying the runtime information for forwarding table

• show networking [instance <vrf_name>] forwarding-table runtime node <node_name>
• show routing [instance <vrf_name>] route runtime mobile

Capturing data to files

- Network traffic can be captured into a binary file with the tcpdump -w <filename> option
- The capture file can be read with the -r <filename> option
- In Service Blades everything is in a memory file system
  • Store the capture file temporarily, e.g. under the /var/tmp directory
  • Use filters to keep the resulting file size small
  • /var/tmp refers to the directory /mnt/mstate/AS7-0/var/tmp on the CLA from where the AS node has mounted its filesystem (over NFS)

Network traffic can be captured with tcpdump into a binary file with the -w <filename> option. The file can be read later on with the -r <filename> option. Note that, especially if you do not apply any filters, the resulting file can grow quite large if there is a lot of traffic.
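The capture and read-back steps can be wrapped in a small helper so that a filter is always supplied deliberately. This Python sketch only builds the tcpdump argument lists and does not run them. The helper names and the file name ospf.pcap are hypothetical; the -i, -w and -r options and the filter expression are the ones used in this module.

```python
import shlex


def capture_command(interface, outfile, filter_expr=""):
    """Build a tcpdump argument list that writes packets to a capture file."""
    argv = ["tcpdump", "-i", interface, "-w", outfile]
    if filter_expr:
        # tcpdump accepts the filter expression as trailing arguments.
        argv += shlex.split(filter_expr)
    return argv


def read_command(infile):
    """Build a tcpdump argument list that reads a saved capture file."""
    return ["tcpdump", "-r", infile]


if __name__ == "__main__":
    # /var/tmp is the temporary location suggested for Service Blades.
    capture = capture_command("ethrtm1_1", "/var/tmp/ospf.pcap",
                              "vlan 359 and proto 89")
    print(shlex.join(capture))
    print(shlex.join(read_command("/var/tmp/ospf.pcap")))
```

The lists can be passed to subprocess.run on the node where the capture is taken; keeping the filter as an explicit argument makes it harder to start an unfiltered capture by accident.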

Capturing user plane packets with tcpdump

• tcpdump cannot by default capture user plane traffic because it is a Linux process and user plane packets are handled in the fastpath environment
• When tcpdump is set to capture user plane traffic, the packets matching the tcpdump filter are passed from fastpath to the control plane (Linux environment)
• Hence, it should only be used with filters that limit the captured traffic to what is necessary
• User plane traffic capturing will cause performance degradation
• It should be disabled immediately after completing the debugging
- To enable tcpdump for user plane traffic:

fsfastpath set-tap-enable on

Subscriber trace

• The operator can activate subscriber trace in Flexi NG
• Traffic capture for a specific IMSI
• Up to 50 subscribers can be traced simultaneously
• Alarm threshold for the used disk space can be set
- Different types of data can be collected for a subscriber
  • Signaling traffic capture
  • User plane traffic capture
  • Low-level logs, consisting of Event logs and Internal logs
- The trace files are stored in the /var/log directory on the CLA where the /TraceCtrl RG is active
- The user plane captures are encrypted by default

Once configured, the trace activation is automatically distributed to each service blade in the system, and collected data is aggregated in the management blade.

Subscriber trace logs

• Used to gather detailed information in the system for troubleshooting a suspected software fault
• Event logs
  – Generate information about software events and IMSI database dump
  – Possible software events: subscriber create, session create, bearer create, subscriber update, session update, bearer update, subscriber delete, session delete, bearer delete
  – In combined S-GW and P-GW mode, only one actual subscriber session exists, so only one set of event logs is produced
  – File name format: TRACE_imsi_date_time_EVENT.txt
• Internal logs
  – Collect low-level internal log entries as configured in the generic log level configurations
  – File name format: TRACE_imsi_date_time_INTERNAL.txt

Subscriber trace logs are used to gather detailed information about what happens in the system for a particular end-user. This information is mainly intended for troubleshooting purposes when a fault in software is suspected, and should only be enabled when advised by NSN Customer Support.

Event logs

Subscriber trace can generate information about software events and IMSI DB dump related to a traced subscriber. The events are always related to external signaling. A single event log can represent single or multiple actual request/response messaging flows which are exchanged during the overall event. Due to this, an event log does not indicate the actual signaling that took place.

The software events are structured according to the affected entity (subscriber, session, and bearer). For example, a single new PDP context creation for a subscriber without existing connections triggers three events: one create event for a subscriber, one for a session, and one for a bearer. This allows natural traceability, for example, for primary and secondary PDP contexts (and for default and dedicated bearers in LTE).

The following software events can be logged for each traced subscriber:
• subscriber create
• session create
• bearer create
• subscriber update
• session update
• bearer update
• subscriber delete
• session delete
• bearer delete

The IMSI DB dump event can be logged for each traced subscriber for the following scenarios:
• session create
• session update
• session delete

Subscriber trace logs, cont.

In combined S-GW and P-GW mode, only one actual subscriber session exists. Thus, only one set of event logs is always produced. The file name format for event logs is TRACE_imsi_date_time_EVENT.txt

Internal logs

Subscriber trace can collect low-level internal log entries which are related to the traced subscriber. The internal logs are generated as configured in the generic log level configurations. Configuration of those log levels is independent of the subscriber trace feature configuration. When the subscriber trace is configured to collect the internal logs for a certain subscriber, the system, following the generic log level configurations, generates logs for this subscriber. These logs are then forwarded to the subscriber trace output files. Specific log entries collected with subscriber trace functionality are also visible in the system level log files. All the application level logs related to a particular subscriber must be traced, but the common system level logs are visible only in the syslog. The file name format for internal logs is TRACE_imsi_date_time_INTERNAL.txt

Subscriber trace logs, cont.

Capture points for subscriber trace

Configuring subscriber tracing

- To enable the feature in general:

set ng feature subscriber-trace-functionality enable

- To activate the tracing, you need to specify the IMSIs and the data that will be collected for the user:

add ng trace subscriber-trace imsi 244060000005021 gather-events enable gather-logs enable gather-signaling-traffic enable gather-user-traffic enable

- To generate the trace files, you need to disable tracing:

set ng trace subscriber-trace imsi 244060000005021 gather-events disable gather-logs disable gather-signaling-traffic disable gather-user-traffic disable
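Because the activation and deactivation commands differ only in the verb and the enable/disable keyword, they can be generated from the IMSI to avoid typing errors. The Python sketch below is a hypothetical helper, not a Flexi NG tool. It reproduces the two command forms shown above and assumes that all four data types are switched together; the 15-digit limit is the standard maximum IMSI length.

```python
# The four data types that appear in the commands above.
OPTIONS = (
    "gather-events",
    "gather-logs",
    "gather-signaling-traffic",
    "gather-user-traffic",
)


def trace_command(imsi, enable):
    """Return the fsclish command text for one IMSI.

    enable=True gives the "add ... enable" form that activates tracing,
    enable=False gives the "set ... disable" form that stops it.
    """
    if not (imsi.isascii() and imsi.isdigit() and len(imsi) <= 15):
        raise ValueError("IMSI must be 1 to 15 decimal digits")
    verb = "add" if enable else "set"
    state = "enable" if enable else "disable"
    flags = " ".join(f"{option} {state}" for option in OPTIONS)
    return f"{verb} ng trace subscriber-trace imsi {imsi} {flags}"


if __name__ == "__main__":
    print(trace_command("244060000005021", enable=True))
    print(trace_command("244060000005021", enable=False))
```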

Disk usage monitoring for subscriber trace

- To set an alarm threshold for disk usage:

set ng trace general subscriber-trace-max-disk-usage <subscriber-trace-max-disk-usage in MB>

• The value is an integer in the range 100 - 4000. The default value is 2000.
• When the configured maximum size is reached, all traces for all subscribers are paused and alarm 71507 OUT OF DISK STORAGE is raised
• Tracing resumes automatically when the monitoring function detects that the aggregate file size is below 80% of the configured maximum
• The disk usage is monitored by the centralized part of the subscriber trace feature (trace_ctrl process)
• The disk usage monitoring is only active on the CLA on which the centralized trace_ctrl process is active
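The pause and resume rule is a simple hysteresis: tracing stops at the configured maximum and restarts only once usage has fallen below 80% of it. The Python sketch below models that rule to make the behaviour between the two thresholds explicit. It is not the trace_ctrl implementation; the class and method names are hypothetical.

```python
class TraceDiskMonitor:
    """Toy model of the pause/resume rule for subscriber trace disk usage."""

    def __init__(self, max_mb=2000):
        # Allowed range and default value as stated above.
        if not 100 <= max_mb <= 4000:
            raise ValueError("maximum must be between 100 and 4000 MB")
        self.max_mb = max_mb
        self.paused = False

    def update(self, used_mb):
        """Feed the aggregate trace file size; return True while paused."""
        if not self.paused and used_mb >= self.max_mb:
            self.paused = True   # the point where alarm 71507 is raised
        elif self.paused and used_mb < 0.8 * self.max_mb:
            self.paused = False  # usage fell below 80% of the maximum
        return self.paused


if __name__ == "__main__":
    monitor = TraceDiskMonitor(max_mb=1000)
    for used in (999, 1000, 850, 799):
        print(used, "paused" if monitor.update(used) else "tracing")
```

With a 1000 MB maximum, a usage of 850 MB leaves tracing paused because it is still above the 800 MB resume point.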

Session database dump

- With fsclish it is possible to check the session database using a subscriber’s IMSI, MSISDN or UE IP as the key

• show ng trace data-base-dump filter imsi <imsi>
• show ng trace data-base-dump filter msisdn <msisdn>
• show ng trace data-base-dump filter vrf <vrf> [addressv4 <addressv4>] [addressv6 <addressv6>]

- Output is automatically adapted to the configured Flexi NG mode (different in GGSN and LTE)
- Contains information for subscriber, session and bearer(s)
  • Subscriber: IMSI, MSISDN, IMEISV, Node
  • Session, e.g.: RAT type, Assigned IP Address, APN, session profile, active PCC rule bases, …
  • Bearer, e.g.: TEID, QoS settings, throughput counters, dynamic PCC rule names

Triggering individual session disconnections

• Triggering individual session disconnections based on the subscriber’s IMSI (full IMSI required, no wildcards allowed).

• Optionally, and in addition to the subscriber‘s IMSI, session disconnections can be triggered based on the EPS bearer identification (EBI) value or Network layer Service Access Point Identifier (NSAPI) value.

• In case default bearer is disconnected with the command, the related dedicated bearers are also disconnected

- Valid for the GGSN, P-GW, S-GW and combined S-GW and P-GW modes
- Trigger session disconnections using the command:

set ng-admin subscriber disconnect imsi <imsi> [bearer-id <bearer-id>]

- Only one disconnection CLI command can be executed at a time
- Visible in external signaling with suitable cause code values and in CDRs (closing reason)

Session disconnections are triggered based on the subscriber’s IMSI and, optionally, on the EBI/NSAPI value using the ng-admin subscriber disconnect command. This command only triggers the session disconnections; it does not perform them itself.

The configuration instructions for triggering session disconnections are valid for the GGSN, P-GW, S-GW and combined S-GW and P-GW modes. The S-GW node does not trigger dedicated bearer deletion messages towards the P-GW node.

In the case of P-GW or combined S-GW and P-GW mode, where an S5 PMIP PDN connection linked to the specified IMSI exists, EBI triggering is not applicable since EBI is currently not supported by the PMIP protocol. In such cases, the ng-admin subscriber disconnect command silently ignores the parsed EBI (if one is given as input) and all bearers connected to the indicated IMSI are triggered for deletion.

In the case of combined S-GW and P-GW mode, with S-GW and P-GW in different nodes, it is possible that bearer disconnection is triggered in one node (for example, S-GW) and the same node triggers bearer deletion messages towards the other node (P-GW) before the latter is triggered for session disconnection by the ng-admin subscriber disconnect command. In such a case, the bearer deletion sequence in the second node (P-GW) is actually triggered by the first node (S-GW) instead of the CLI command.

For the ng-admin subscriber disconnect command, the following restrictions apply:
• Must be used only for occasional disconnections. Multiple disconnections are not supported and therefore must not be attempted by, for example, creating automated loops.
• Not supported during in-service upgrade.

Triggering individual session disconnections, cont.

Monitoring snapshot statistics

- Flexi NG collects runtime statistics to a specific part of the file system
  • /cdafs/pmfs pseudo-filesystem structure contains a subdirectory for each KPI
- You can check the current statistics, e.g., with the following command that provides a snapshot of the statistics related to a specific session profile:

show stats data current name 3000_Session_profile/NODE-AS10-0/SESSIONPROFILES-SUM/snapshot.txt

To monitor the user plane CPU load

• The output shows the fastpath CPU load on a given node

show fastpath-cpu-load node-name AS7-0

• The load is displayed as a percentage of the maximum capacity

• The load consists mostly of user plane traffic, but internal messaging and control plane signaling traffic can have a minor influence on the percentage

Hardware management tools

• The HW elements of the system are principally managed with the ATCA Shelf Manager CLI (clia). Please refer to the NED documentation about clia commands.

• Some HW commands are included in fsclish

show hwi {brief|verbose} [container]
show hardware state list
show hardware state node <node name>
set hardware power on node <node name>
set hardware power off node <node name>
set hardware restart node <node name>

- FlexiPlatform hwcli tool can also be used for HW maintenance, e.g. hwcli -t

The Logging System

Logging in the FlexiPlatform

• Each node runs a copy of the syslog-ng daemon
• All nodes send their logs to the central syslog-ng daemon running on the active CLA

[Figure: log flow from the nodes to the active CLA. Labels: Application, /dev/log, Proxy syslog-ng, Local logs, /var/log/local, If local filesystem exists, NE, TCP 610, Active CLA, syslog-ng daemon, Cluster-wide logs, /srv/Log/log/, /var/log]

Syslog in the FlexiPlatform 5 Cat

Each node runs a syslog-ng proxy that reads logs from /dev/log and kernel messages from /proc/kmsg. The syslog-ng proxy forwards all the logs over TCP to the syslog-ng master that runs in the active CLA node. The CLA where the Recovery Group /Log has the Recovery Unit (FSLogServer) running in the active role is considered to be active from the log system point of view.

Additionally, the syslog-ng proxies write all logs to a named pipe, /tmp/coroner_fifo, which is used by the coroner process to locate log entries that should be converted into alarms. If the node has local disks, the syslog-ng proxy also writes logs to the local disk (into the /var/log directory). In Flexi NG, only the CLAs have local disks.

The syslog-ng master listens to TCP port 601 and writes the logs to the local disks in the active CLA. The directory for these log files is /srv/Log/log/. The master-syslog file contains cluster-wide log records. In addition, a separate log file is automatically generated for each node (e.g., syslog-AS10-0.log).

The syslog-ng proxies and the syslog-ng master are instances of the same executable. They use different configuration files to differentiate their behavior. The syslog-ng proxies are started by the operating system’s init scripts when the node is booted and use the Node IP address as their source IP address. The syslog-ng master is started by HAS and binds to the redundant IP address of the /Log recovery group.

Logging in the FlexiPlatform, cont.

• Distributed components running on every node, which locally collect the node’s input data, and forward the data to the centralized server located in a management blade.

• Centralized server located in a management blade, which aggregates the data coming in from all nodes, and presents the data in the format suitable for each standard interface.

• The external interfaces are available from the management blades, providing full access to aggregated alarms, statistics and logs

Log Recovery Group

- The /Log Recovery Group runs on Active / Coldstandby redundancy on the CLAs
  • Runs on the same node as /SSH Recovery Group (stalker)
- The CLA where the active FSLogServer Recovery Unit is running updates the cluster-wide logs in the /var/log/master-syslog file
  • Soft link to file /srv/Log/log/syslog
  • Node-specific syslog files are automatically extracted from the same file (e.g., syslog-AS10-0.log)

# fshascli -v /Log
/Log:
RecoveryGroup /Log
  specialConstraints=(serviceInterruptionRequiresForce)
RecoveryUnit /CLA-0/FSLogServer
  recoveryUnitType=(HARecoveryUnit)
Process /CLA-0/FSLogServer/MasterSyslogDaemon
  command=(/sbin/syslog-ng -F -p /var/run/master-syslog-ng.pid -f /etc/syslog-ng/master-syslog-ng.conf)
  status=(nonHA)
  startMethod=(requested)
  severity=(modest)
RecoveryUnit /CLA-1/FSLogServer
  recoveryUnitType=(HARecoveryUnit)
Process /CLA-1/FSLogServer/MasterSyslogDaemon
  command=(/sbin/syslog-ng -F -p /var/run/master-syslog-ng.pid -f /etc/syslog-ng/master-syslog-ng.conf)
  status=(nonHA)
  startMethod=(requested)
  severity=(modest)

Log Recovery Group, cont.

Log files

The log directory contains
– Cluster-wide system logs
– Local logs for the CLA
– AS node specific logs
– Debug logs
– Alarm logs
– Audit logs

File Contents

auth.log All logs with facility AUTH and with any severity

debug All logs with severity set at DEBUG except for facilities AUTH and AUTHPRIV

syslog All logs with severity set between INFO and EMERG except for facilities AUTH and AUTHPRIV

User’s console All logs with severity set at EMERG

Note:
- The logs are rotated weekly
- The log files will be rotated if the log file size exceeds 50 Megabytes
- Copies of 10 previous log files are saved and compressed
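The file table can be read as a routing rule from a log entry's facility and severity to its destinations. The Python sketch below encodes only what the table states and is not taken from the syslog-ng configuration. The function name is hypothetical, the severity names are the standard syslog levels, and entries with facility AUTHPRIV get no destination here because the table does not say where they are written.

```python
# Standard syslog severities, most severe first.
SEVERITIES = ("EMERG", "ALERT", "CRIT", "ERR",
              "WARNING", "NOTICE", "INFO", "DEBUG")


def destinations(facility, severity):
    """Return the destinations the table above gives for one log entry."""
    facility = facility.upper()
    severity = severity.upper()
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity}")

    result = []
    if facility == "AUTH":
        result.append("auth.log")           # any severity
    if facility not in ("AUTH", "AUTHPRIV"):
        # DEBUG goes to the debug file, INFO..EMERG to syslog.
        result.append("debug" if severity == "DEBUG" else "syslog")
    if severity == "EMERG":
        result.append("console")            # user's console
    return result


if __name__ == "__main__":
    for facility, severity in (("daemon", "ERR"), ("kern", "DEBUG"),
                               ("auth", "EMERG")):
        print(facility, severity, "->", destinations(facility, severity))
```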

Syslog facilities

• Syslog entries are generated with two parameters to aid in filtering the desired logs into different log destinations (files, external servers, console…)

• The parameters are known as facility and loglevel
• The facility defines what type of application generated the log
• The level defines the importance or criticality of the log entry
  – Log-error (the default in NG10)
  – Log-info
  – Log-debug
  – Log-debug2

Syslog Facility     Source of logs
auth or authpriv    Login authentication
cron                cron subsystem
daemon              System server processes
kern                The Linux kernel
ftp                 Messages from FTP daemons
lpr                 Print spooling subsystem
mail                Mail subsystem
news                News subsystem
syslog              Messages from syslog itself
user                The default for unspecified log entries
uucp                Messages from the uucp applications
localN              Locally defined facilities (N = 0-7)

Syslog facilities, cont.

Syslog facilities, cont.

Syslog level   Description
EMERG          A panic condition. This is normally broadcast to all users
ALERT          A condition that should be corrected immediately
CRIT           Critical conditions, e.g., hard device errors
ERR            Errors
WARNING        Warning messages
NOTICE         Minor errors that do not necessarily require special attention
INFO           Informational messages
DEBUG          Normally used only when debugging a program
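On the wire, syslog combines the facility and the level into a single priority (PRI) value: facility code times 8 plus severity code. The sketch below shows that arithmetic. The numeric codes are the standard syslog values from RFC 5424, not taken from this training material, and the function names are hypothetical.

```python
# Standard syslog facility codes (RFC 5424); local0..local7 are 16..23.
FACILITIES = {
    "kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4, "syslog": 5,
    "lpr": 6, "news": 7, "uucp": 8, "cron": 9, "authpriv": 10, "ftp": 11,
}
FACILITIES.update({"local%d" % n: 16 + n for n in range(8)})

# Severity codes 0..7, most severe first (same order as the table above).
LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def encode_pri(facility, level):
    """Combine a facility name and a level name into the syslog PRI value."""
    return FACILITIES[facility] * 8 + LEVELS.index(level)


def decode_pri(pri):
    """Split a PRI value back into (facility name, level name)."""
    code, severity = divmod(pri, 8)
    name = next(n for n, c in FACILITIES.items() if c == code)
    return name, LEVELS[severity]
```

An `auth` entry at level `info` therefore carries PRI 4 * 8 + 6 = 38, which is what a receiving syslog server uses to route it.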

Configuring the log levels

- The log levels can be configured in fsclish
  • Per node
  • Per process

- For example, to enable debug logs on a node for a specific process:
  set ng trace general log-level process session-controller node AS7-0 log-level log-debug

NOTE: It is recommended that only the log-error log level is used in a real runtime environment. The other log levels should only be used in a laboratory test environment, because more verbose log configurations can have a negative effect on performance.

Sending Logs to an External Log Server

• The syslog-ng in the FlexiPlatform can be configured to send a copy of all log messages to a centralised external syslog-ng server

• To configure the functionality, add the following lines to the central syslog-ng configuration file:

/etc/master-syslog-ng.conf

# Additional destination definitions
##################################
destination udp-to-logserver {
    udp("logserver-ip-or-name" localip("Directory") port(610));
};
destination tcp-to-logserver {
    tcp("logserver-ip-or-name" localip("Directory") port(610));
};

# Additional log commands
#############################################
log { source(src); destination(tcp-to-logserver); };

Operator log

• System processes write information about events in chronological order to the operator log
• These logs are meant for operating personnel as an indication of specific problems, for example regarding external interfaces. Because their usage is strictly defined and controlled, they are all enabled by default
• The operator log can be accessed in /var/log/syslog-operator.log
• This file contains log entries of the pmip-sig, gtp-c, dia_client, radius_client and dhcp-sig processes
• Not all possible data is recorded in the syslog-operator.log file, as this would slow down the system. Therefore, operator log levels are used to generate only the needed operator log information.

Log Level   Log Code     Description
notice      LOG_NOTICE   Written for error response messages coming from peer network elements
info        LOG_INFO     Mainly used for logging incoming and outgoing messages, both internal and external

More detailed information can be acquired by configuring lower level logs, but these will have a significant performance impact. For this reason, a new concept with limited scope and coverage, called operator logs, has been introduced.

Example of operator log entries on AS nodes, at the log level notice:

Apr 23 12:24:12 notice AS7-0 session_ctrl[1544]: [0]: [2181038110]: FAILURE_RESP_RECV_FROM_PEER IF=GTPV2 IMSI=240080000000001 IMEI=001000800000010 MSISDN=467080000000001 SourceAddress=IPV4=7.23.186.31 AdditionalInfo=Create Session Response received with Cause SYSTEM_FAILURE[72] (gtp_sc_gtpc_if.c:682) //206515

Apr 23 12:29:52 notice AS7-0 session_ctrl[1544]: [0]: [2181038142]: FAILURE_RESP_RECV_FROM_PEER IF=GTPV2 IMSI=240080000000001 IMEI=001000800000010 MSISDN=467080000000001 SourceAddress=IPV4=7.23.186.13 AdditionalInfo=Update Bearer Response received with Cause REQUEST_REJECTED[94] (gtp_sc_gtpc_if.c:620) //233340

Apr 23 12:34:19 notice AS7-0 session_ctrl[1544]: [0]: [2264924168]: FAILURE_RESP_RECV_FROM_PEER IF=PMIP IMSI=240080000000001 IMEI=001000800000010 MSISDN=467080000000001 SourceAddress=IPV6=2001:490:FF0:C203:0:0:717:BA11 AdditionalInfo=PROXY BINDING ACKNOWLEDGE received with status PROXY_REG_NOT_ENABLED[152] (pmip.c:669) //243466

Operator log, cont.

Apr 23 13:27:58 notice AS7-0 session_ctrl[15566]: [0]: [2181038085]: FAILURE_RESP_RECV_FROM_PEER IF=GTPV1 IMSI=240080000000001 IMEI=4901371095200109 MSISDN=91240080000000001 SourceAddress=IPV4=7.23.184.11 AdditionalInfo=Update PDP Context Response received with Cause SYSTEM_FAILURE[204] (gtp_sc_gtpc_if.c:464) //35368

Apr 23 12:54:21 notice AS7-0 session_ctrl[1544]: [0]: [2550136848]: FAILURE_RESP_RECV_FROM_PEER IF=DHCPv4 IMSI=240080000000001 IMEI= MSISDN=91467080000000001 SourceAddress=IPV4=8.23.186.18 AdditionalInfo=Received DHCPNAK message (dhcp.c:138) //299160

Apr 24 08:29:10 notice AS10-0 session_ctrl[13046]: [0]: [2298478702]: FAILURE_RESP_RECV_FROM_PEER IF=DIAMETER IMSI=240080000000001 IMEI= MSISDN=91467080000000001 SourceAddress=IPV4=8.23.186.16 AdditionalInfo=CREDIT CONTROL ANSWER(Update) Failure Response Received with Cause DIA_RESOURCES_EXCEEDED[5006] on Gx interface from PCRF (dia_gx.c:4359) //318513

Apr 24 08:56:43 notice AS10-0 session_ctrl[13046]: [0]: [2164260892]: FAILURE_RESP_RECV_FROM_PEER IF=RADIUS IMSI=240080000000001 IMEI=001000800000010 MSISDN=467080000000001 SourceAddress=IPV4=8.23.186.12 AdditionalInfo=Radius Access Reject Response received with Cause RC_REJECT_FROM_SERVER[12] (sc_if_radius.c:2154) //364294
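When many such entries need to be summarised, for example to count failures per interface or per cause, a small parser helps. The following Python sketch is inferred purely from the sample entries above; the field layout is an assumption, not a documented format, and `parse_operator_entry` is a hypothetical helper name.

```python
import re

# Layout inferred from the sample entries above (an assumption, not a spec).
ENTRY_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<level>\w+) (?P<node>\S+) (?P<process>\w+)\[(?P<pid>\d+)\]: "
    r"\[\d+\]: \[\d+\]: (?P<event>\w+) "
    r"IF=(?P<interface>\S+) IMSI=(?P<imsi>\d*) IMEI=(?P<imei>\d*) "
    r"MSISDN=(?P<msisdn>\d*) SourceAddress=(?P<source>\S+) "
    r"AdditionalInfo=(?P<info>.*?)"
    r"(?: \((?P<origin>[^()]+:\d+)\))?(?: //\d+)?$"
)

# A cause looks like NAME[code], e.g. SYSTEM_FAILURE[72].
CAUSE_RE = re.compile(r"(\w+)\[(\d+)\]")


def parse_operator_entry(line):
    """Parse one operator log line into a dict, or return None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    cause = CAUSE_RE.search(fields["info"])
    # Some entries (for example the DHCPNAK one) carry no cause code.
    fields["cause"] = (cause.group(1), int(cause.group(2))) if cause else None
    return fields
```

Feeding the lines of /var/log/syslog-operator.log through this function and grouping on `interface` or `cause` gives a quick overview of which peer interface is rejecting requests.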

Operator log, cont.

Configuring operator log levels

• Set the operator log level with the following command:
  set ng trace general operator-log log-level <log-level>
• Use the grep '<string>' <file name> command to find operator log entries that contain a given field value.
• Use the less <file name> command to print the contents of an operator log to the screen page by page.
• Use the tail <file name> command to show the last entries of an operator log, which can be helpful for identifying faults that have just occurred

MySQL logs

- In Flexi NG, the alarms are stored in a MySQL database
- For troubleshooting purposes, the database log files can be useful
- They are located in /var/mnt/local/MySQL_DB_Alarm
  • mysql.err (the MySQL server log file)
  • my.cnf (MySQL config file)
  • odbc.ini (ODBC configuration file)

Audit Trail Log

• Provides a centralized security event logging mechanism
• Mainly used for authentication and authorization related events
• Utilizes syslog as a transport
• Logs are stored in /var/log/auth.log on the CLA where the FSLogServer RU is running active
  – Soft link to /srv/Log/log/fsaudit/auth.log
• Logs are easily modifiable to NetAct format and transferable to NetAct
• Access is restricted to root and to accounts belonging to the _nokfsuiseclog group

Configuring the Audit Trail Log

root@CLA-0 [FP] > set logging auditlog
[MX] message - Message that will logged in the syslog. All white space characters found in the message will be replaced by a single space. Note : Include the message in inverted commas. eg. "Any Message".
[OX] address - Remote address. By convention, it is the address of the remote client.
[OX] audit-id - audit id.
[OX] executable - executable file name.
[OX] facility - facility value. The default is authpriv.
[OX] hostname - Remote hostname. By convention, it is the hostname of the remote client, if it is available.
[OX] priority - Message priority. This is optional, but if left out, info will be used by default.
[OX] process-id - process id.
[OX] server-address - Server address. By convention, it is the address used on the server. Useful for multi-homed applications.
[OX] server-port - Server port. By convention, It is the port on the server from which the request came.
[OX] session-id - session id.
[OX] target-id - target id.
[OX] user - user name.
[OX] user-id - user id.

Data Provided by Audit Trail

Httpd
• Authentication logs
• Access logs

SCLI, RUIM, sudo
• Authentication logs
• Access logs

FTP
• Authentication logs (direct, NE3S)
• Access logs (direct, NE3S)

LDAP
• Authentication logs
• Access logs

SSH
• Authentication logs

Alarm Management

What is Alarm Management?

• Part of Fault Management (FM)
• FM detects, isolates and corrects failures (if possible)
• An application uses alarms to
  – Indicate faults that require corrective actions
  – Indicate a potential or impending fault
• The corrective action can be:
  – Automatic (for example HAS performs a switchover)
  – Manual (for example the operator fixes a fault by replacing a broken component)

The alarm system indicates potential faults in the system as well as faults that require corrective actions. After an alarm is raised, the fault causing the alarm must be solved. The solution can be an automatic recovery or a manual corrective action. For a potential fault, the solution consists of preventive actions. Alarms are typically used in situations where it is possible to give instructions for corrective actions in the alarm description, such as replacing a hardware unit. If a system restart is needed, the whole alarm processing will be restarted, because no alarm stays active over a system restart.

The Alarm System

[Figure: the alarm system inside the NE. Components shown: Application, convenience library, Syslog, Alarm Processor, Alarm DB (MySQL), Alarm Agents, SCLI, Tomcat and NE3S, with WebUI, NetAct and management applications as external clients.]

FlexiPlatform Alarm terminology

• The alarms occurring in the FlexiPlatform are based on the 3GPP 32.111 series Release 4 specifications
  – Most of the 3GPP requirements are based on ITU-T recommendation X.733
• An Alarm is generally a single instance which can be uniquely identified from other alarms of the same type by instance-specific attributes (Specific ID, Managed Object, Time, etc.)
• An Alarm Type defines a class of alarms that share the specific problem (70001), event type (communications), severity (Minor) and other type-specific attributes
• An Alarm event is an occurrence of a task performed on an alarm by an application or operator
  – For example: raising, clearing or acknowledging an alarm

Type-specific alarm parameters are either dynamic or static. You can only modify the dynamic parameters, which are
• Default Severity
• Autoacknowledged
• Clearing Delay
• Informing Delay
• Time to Live
• Switchover Update

Communication through syslog

• Alarm raise and clear notifications are sent by applications through convenience libraries
  – The libraries place the alarms in syslog
• Syslog records are forwarded to a dedicated file (alarms)
• The alarm processor parses alarm notifications from the alarms file
  – The cluster-wide alarms are located on the CLA where the FSLogServer RU is running active, in the file /var/log/master-alarms
  – Soft link to /srv/Log/log/fsaudit/alarms

The following is the syslog configuration for filtering alarms to a file (in this case local alarms):

destination local-alarms { file("/var/log/alarms" template("$FULLDATE $MSGONLY\n") template_escape(no) perm(0644)); };
filter raiseftr { match("ALARM RAISE"); };
filter clearftr { match("ALARM CANCEL"); };
log { source(src); filter(raiseftr); destination(local-alarms); };
log { source(src); filter(clearftr); destination(local-alarms); };

Alarm Processor (SS_AlProcessor)

• The alarm processor is the alarm system’s core
• It is a standalone Java application
• It processes alarm notifications from the alarm file and stores them persistently in the alarm system database
• It processes notifications by grouping them into batches for handling correlation dependencies between events

Alarm Processor – Status Check

• The alarm processor cannot raise alarms about its own failure
• The alarm processor is monitored via heartbeats
• Heartbeating works by raising and clearing a special heartbeat alarm
• The heartbeat alarm is raised (70246) and cleared (70247) repeatedly at a predefined time interval (5 minutes by default)
• A lack of 70247/70246 notifications indicates a failed alarm processor

2006 May 3 09:24:03 ALARM CANCEL SP=70247 MO=MOID Wildcard AP=fshaProcessInstanceName=AlarmProcessor,fshaRecoveryUnitName=FSAlarmSystemServer….
2006 May 3 09:24:03 ALARM RAISE SP=70246 MO=fshaProcessInstanceName\
=AlarmProcessor,fshaRecoveryUnitName=FSAlarmSystemServer,\...
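The heartbeat check described above can be approximated by scanning the alarms file for the most recent 70246/70247 notification and comparing its age with the heartbeat interval. The Python sketch below assumes the line layout of the two sample notifications. The function names and the tolerance of two missed intervals are illustrative assumptions, not product behaviour.

```python
import re
from datetime import datetime, timedelta

HEARTBEAT_IDS = {"70246", "70247"}

# Matches the start of an alarm notification, e.g.
# "2006 May 3 09:24:03 ALARM RAISE SP=70246 ..."
ALARM_RE = re.compile(
    r"^(\d{4} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ALARM (RAISE|CANCEL) SP=(\d+)"
)


def last_heartbeat(lines):
    """Return the timestamp of the newest heartbeat notification, or None."""
    latest = None
    for line in lines:
        match = ALARM_RE.match(line)
        if match and match.group(3) in HEARTBEAT_IDS:
            # Collapse repeated spaces; %b assumes English month names.
            text = " ".join(match.group(1).split())
            stamp = datetime.strptime(text, "%Y %b %d %H:%M:%S")
            if latest is None or stamp > latest:
                latest = stamp
    return latest


def processor_looks_alive(lines, now, interval=timedelta(minutes=5), tolerance=2):
    """True if a heartbeat was seen within `tolerance` intervals before `now`."""
    latest = last_heartbeat(lines)
    return latest is not None and now - latest <= interval * tolerance
```

With the default 5-minute interval, a heartbeat logged at 09:24:03 keeps the check passing until 09:34:03; after that the alarm processor would be flagged as possibly failed.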

Managing Alarms

• Alarms have three attributes that determine the status of the alarm:
  – Acknowledgement status
  – Severity
  – Clearing status
• A new alarm is always unacknowledged and not cleared, with the severity level being one of the allowed values
• When an operator sees an alarm and starts to work on it, the alarm should be acknowledged (i.e. a human is working on it) by using the alarm browser
• If the cause of the alarm is not fixed by the operator, the operator can unacknowledge the alarm to signal that the problem is not being worked on
• Once the cause of the alarm has been solved, the alarm can be cleared by the operator using the alarm browser, or automatically by the system after a predefined time
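The alarm life cycle above can be pictured as a small state object. The following Python sketch models only the three status attributes from this slide; the class and method names are hypothetical and do not correspond to any FlexiPlatform API.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    """Minimal model of the alarm status attributes (illustrative only)."""
    specific_problem: int
    severity: str
    acknowledged: bool = False  # a new alarm is always unacknowledged
    cleared: bool = False       # and not cleared

    def acknowledge(self):
        """An operator has started working on the alarm."""
        self.acknowledged = True

    def unacknowledge(self):
        """The operator signals that nobody is working on the problem."""
        self.acknowledged = False

    def clear(self):
        """The cause has been solved (by the operator or automatically)."""
        self.cleared = True

    def needs_attention(self):
        """Still active and nobody is working on it."""
        return not self.cleared and not self.acknowledged
```

An alarm created as `Alarm(70001, "minor")` starts out needing attention, stops doing so once acknowledged, and needs attention again if the operator unacknowledges it before it is cleared.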

Alarm data

• Based on the 3GPP 32.111 series Release 4 specifications and ITU-T recommendation X.733
• Alarm types
  - Type-specific
  - Instance-specific
• Type-specific alarm type
  - Defines attributes specific to an alarm type
  - Static or dynamic
  - Only dynamic parameters can be modified
• Instance-specific alarm type
  - A specific alarm raised by an application

The following dynamic parameters of a type-specific alarm type can be modified:
• Default Severity
• Autoacknowledged
• Clearing Delay
• Informing Delay
• Time to Live
• Switchover Update

The type-specific alarm parameters and their values are listed in the Type-specific alarm parameters table. The instance-specific alarm parameters and their values are listed in the Instance-specific alarm parameters table.

Alarm filtering and correlation

• Filtering is based on identifying fields
• Alarm identifying fields
  - Managed Object Id
  - Specific Problem
  - Identifying Additional Information
  - Application Id
• The alarm processor performs the alarm correlation by using the information provided by high availability services
  - During repair or recovery action
  - Change of state of the MO

When two alarm instances have the same values for all the identifying fields, the alarm instances are interpreted to be the same alarm. If the same alarm is raised when it is already active, the new alarm is filtered out. An exception to this case is when an alarm is repeated with different severity. In this case, the alarm is considered changed and is updated in the alarm system database.
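The duplicate-filtering rule can be sketched as a lookup keyed on the four identifying fields. The Python example below is a simplified illustration under that assumption; the dictionary keys and the function names are hypothetical, and the real alarm processor also applies correlation, which is not modelled here.

```python
def identity(alarm):
    """Build the key from the four identifying fields listed above."""
    return (
        alarm["managed_object"],
        alarm["specific_problem"],
        alarm["identifying_info"],
        alarm["application_id"],
    )


def process_raise(active, alarm):
    """Apply the filtering rule to one raise notification.

    `active` maps identity keys to the currently active alarms.
    Returns "raised", "updated" or "filtered".
    """
    key = identity(alarm)
    current = active.get(key)
    if current is None:
        active[key] = dict(alarm)  # first occurrence: store a copy
        return "raised"
    if current["severity"] != alarm["severity"]:
        # Same alarm repeated with a different severity: update it.
        current["severity"] = alarm["severity"]
        return "updated"
    # Same alarm, same severity, already active: drop the duplicate.
    return "filtered"
```

Raising the same alarm twice therefore leaves one entry in `active`, while a third raise with a different severity changes that entry instead of adding a new one.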

Severity levels of the alarms

• Critical
  - A service-affecting condition has occurred and immediate action is required
• Major
  - A service-affecting condition has developed and urgent corrective action is required
• Minor
  - The condition does not affect services, and corrective actions are needed to prevent serious faults
• Warning
  - Indicates the detection of a potential service-affecting fault before any defects are detected
• Indeterminate
  - Indicates that the severity of the alarm cannot be determined
• Cleared
  - Indicates the clearing of one or more previously reported alarms

Viewing alarm data with fsclish

• Check the summary of active alarms:
  show alarm active-summary brief

root@CLA-0 [FP] > show alarm
[X] active - shows the list of all active alarms
[X] alarmcount - the count of the number of Alarms
[X] history - shows the alarm history
[X] historycount - the count of the number of Alarm Events (History)

Backup and restore

Backup types - Full

- The full backup contains the file system image as well as the configuration database
  • copy of software volumes (system image)
  • configuration volumes
  • the master state volume
  • application file systems
  • plug-ins
  • databases
- A full backup can be used for restoring the system from scratch
- It is recommended to make a full backup after commissioning, and before and after a SW upgrade
- Backup files need to be moved to a safe place
- Old backup files should be deleted from the system hard disk

Backup types - Partial

- In the partial backup the administrator can manually specify which parts of the file system are backed up
  • configuration volumes
  • the master state volume
  • application file systems
  • plug-ins
  • databases
- The partial backup cannot be used for restoring a crashed system but, e.g., a corrupted database can be restored
- A partial backup should be made at regular intervals
- Backups can be scheduled with the standard cron utility

Backup file

- The backup file is created in the directory /mnt/backup on the CLA hard disk
  • ISO file
  • NetworkElementName_<full|partial>_backup_YYYYMMDD_hhmm.iso
- The image contains a METADATA file, a base image for the mips64 architecture and a base image for the i386 architecture.
  • Thus the backup image may contain several software volumes (one for each architecture) as follows:

    -rw-r--r-- root/root       95 2007-08-23 METADATA
    -rw-r--r-- root/root 30318904 2007-08-23 R_FP5_1.29.i386.img.gz
    -rw-r--r-- root/root 50324321 2007-08-23 R_FP5_1.29.mips64.img.gz
    -rw-r--r-- root/root 50324321 2007-08-23 R_FP5_1.29-INITIAL.img.gz

  • The METADATA file contains information specific to the backup ISO image, such as the delivery label, the type of each image contained, the size of each image and the name of the backup creator
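Because the backup file name encodes the network element name, the backup type and the creation time, housekeeping scripts can recover these from the name alone. The Python sketch below parses the naming pattern given above; `parse_backup_name` is a hypothetical helper, and it assumes the timestamp is always the last part of the name before the `.iso` extension.

```python
import re
from datetime import datetime

# NetworkElementName_<full|partial>_backup_YYYYMMDD_hhmm.iso
BACKUP_NAME_RE = re.compile(
    r"^(?P<ne>.+)_(?P<kind>full|partial)_backup_(?P<date>\d{8})_(?P<time>\d{4})\.iso$"
)


def parse_backup_name(filename):
    """Return (network element name, backup type, creation time) or None."""
    match = BACKUP_NAME_RE.match(filename)
    if match is None:
        return None
    created = datetime.strptime(match.group("date") + match.group("time"), "%Y%m%d%H%M")
    return match.group("ne"), match.group("kind"), created
```

Sorting the parsed results by creation time makes it easy to pick the newest full backup or to list old images that can be moved off the CLA hard disk.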

Making a backup - Preparations

Before taking a backup, check that
• You have root access rights
• The network element is up and running in its normal working state.
• There is enough free disk space for the backup archive file. You can estimate the amount of disk space needed from the previous backup archive files. The disk space you need during a full backup is at least twice the size of the old backup archive file.
• If necessary, free disk space by transferring backups to an external server and deleting unnecessary files.
• The database recovery groups are unlocked. To check whether the database recovery group is locked or unlocked, execute the following command:
  show has state managed-object <mo-name>
• To unlock the database recovery group, execute the following command:
  set has unlock managed-object <mo-name>
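The disk space rule from the checklist above (free space of at least twice the size of the previous backup archive) is easy to automate. The Python sketch below is one possible way to do it; the function names are hypothetical, and the factor of 2 is taken directly from the rule stated for full backups.

```python
import os
import shutil


def enough_space_for_full_backup(free_bytes, previous_backup_bytes, factor=2):
    """Apply the rule of thumb: free space >= factor * size of the old archive."""
    return free_bytes >= factor * previous_backup_bytes


def check_backup_dir(backup_dir, previous_backup_path):
    """Check the rule against a real directory and a previous backup file."""
    free_bytes = shutil.disk_usage(backup_dir).free
    previous_bytes = os.path.getsize(previous_backup_path)
    return enough_space_for_full_backup(free_bytes, previous_bytes)
```

On a CLA this could be called as `check_backup_dir("/mnt/backup", "/mnt/backup/<previous backup image>")` before starting the backup, with the placeholder replaced by an existing archive name.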

Making a full backup

- Log in to the CLA where the recovery unit FSClusterStateServer has the active role
  • Establish an SSH connection to the directory service:
    ssh directory
- Check the status of the Cluster State recovery unit. To check the status on a CLA node:
  show has state managed-object /CLA-[0,1]/FSClusterStateServer
  • If the Cluster State recovery unit is not active on the current node, you must perform a switchover of the recovery units with
    set has switchover managed-object /CLA-[0,1]/FSClusterStateServer
- A full software backup is made by executing the following commands:
  start backup full
  commit backup
- Check from the backup logs that the backup succeeded
- The backup file is created in the /mnt/backup directory

Making a partial backup 1/2

- Log in to the CLA where the recovery unit FSClusterStateServer has the active role
  • Establish an SSH connection to the directory service:
    ssh directory
- Check the status of the Cluster State recovery unit. To check the status on a CLA node:
  show has state managed-object /CLA-[0,1]/FSClusterStateServer
  • If the Cluster State recovery unit is not active on the current node, you must perform a switchover of the recovery units with
    set has switchover managed-object /CLA-[0,1]/FSClusterStateServer

Making a partial backup 2/2

- You can start the default type of partial backup with
  start backup partial
  commit backup
  • Note: the default partial backup does not include the backup of the software delivery
- You can also manually specify which parts of the file system you want to back up
  start backup selective
  add backup delivery
  add backup config
  add backup state
  add backup filesystem
  add backup plugin
  add backup database
  commit backup

Backup logs

- A backup operation creates entries in the following logs:
  • The syslog in the /var/log directory.
  • A cumulative backup.log in the /mnt/backup/log directory.
  • A backup-specific NetworkElementName_<full|partial>_backup_YYYYMMDD_hhmm.log in the /mnt/backup/log directory.
- The syslog contains the following kinds of entries relating to the backup process:
  • When the backup was started and by whom. For example:
    Feb 17 10:49:42 info CLA-0 fsbackup INFO full backup started
  • Information on whether the backup process was interrupted. For example:
    Feb 17 12:34:59 err CLA-0 fsbackup ERROR user interrupted
  • The result of the backup process (succeeded or failed). For example:
    Feb 17 14:47:33 info CLA-0 fsbackup INFO created backup file /mnt/backup/VirtCluster_partial_backup_20090217_1440.iso
    Feb 17 14:47:34 info CLA-0 fsbackup INFO partial backup completed successfully
    Feb 17 10:54:04 err CLA-0 fsbackup ERROR full backup failed
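To check the result of a backup from a script rather than by eye, the fsbackup syslog entries can be scanned for the messages shown above. The following Python sketch relies only on the wording of those sample entries; real logs may contain other fsbackup messages, and `backup_outcome` is a hypothetical helper name.

```python
import re

# Everything after "fsbackup INFO" or "fsbackup ERROR" is the message text.
FSBACKUP_RE = re.compile(r"\bfsbackup (INFO|ERROR) (.*)$")


def backup_outcome(lines):
    """Return (outcome, image path) for the most recent backup in `lines`.

    Outcome is "succeeded", "failed", "interrupted", "running" or "unknown".
    """
    outcome, image = "unknown", None
    for line in lines:
        match = FSBACKUP_RE.search(line.strip())
        if match is None:
            continue
        level, text = match.groups()
        if text.startswith("created backup file"):
            image = text.split()[-1]  # last token is the ISO path
        elif text.endswith("backup completed successfully"):
            outcome = "succeeded"
        elif level == "ERROR" and text.endswith("backup failed"):
            outcome = "failed"
        elif level == "ERROR" and text == "user interrupted":
            outcome = "interrupted"
        elif text.endswith("backup started"):
            # A new run starts: forget the result of any earlier run.
            outcome, image = "running", None
    return outcome, image
```

Run over the lines of /var/log/syslog, this returns the state of the latest backup together with the path of the ISO image it created, if any.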

Transferring the backup file to external server

- It is recommended to move the backups to a safe location such as an external file server
- On the active CLA node, compute an MD5 checksum of the existing backup file, e.g.
  md5sum ATCA19_1.23.rw.2735_partial_backup_20090515_1326.iso > backup_20090515_1326.md5
- Transfer the backup image and the checksum file to the desired location with scp
  scp /mnt/backup/<backup image> <username>@<external server IP address>:<target location>
  scp /mnt/backup/<checksum file> <username>@<external server IP address>:<target location>
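After the transfer, the checksum file lets the receiving side confirm that the image arrived intact. On a Linux server `md5sum -c` does this directly; where only Python is available, the same check can be sketched as below. The function names are hypothetical, and the code assumes the checksum file uses the usual md5sum layout, with the hex digest as the first field of the first line.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5sum_file(checksum_path, image_path):
    """Compare the digest stored by md5sum with the digest of the image."""
    with open(checksum_path) as handle:
        expected = handle.readline().split()[0]
    return expected.lower() == md5_of_file(image_path)
```

Reading in chunks keeps memory use flat even for backup images of several gigabytes.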

Restoration

- A complete system crash requires a valid full backup file and recommissioning of the system
  • A broken system image might also cause a reboot loop. This may be due to a logical disk crash, a corrupted file system or the accidental removal of files or directories on the system image
- A database, for example, can be restored without recommissioning from a full or partial backup file

Restoring the whole system

- A complete system crash requires restoration from a full backup
- Before starting, verify that
  • You have root access rights
  • The installation medium, that is, the field engineering workstation (FEWS), is available
  • A full backup image is available in the external storage server
  • The delivery label is not changed during commissioning
- In a full restoration the system is re-commissioned by using a backup image as the base build
  • Check the Commissioning Guide for detailed instructions
- After the commissioning is finished, the following commands are executed:
  start restore backup-iso <backup.iso> full
  commit restore

Partial restoration

- The partial restoration is performed in the runtime environment to restore
  • broken software volumes
  • configuration volumes
  • master state volume
  • application file systems
  • plug-ins
  • databases
- Log in to the active CLA node and perform the restoration commands
  start restore backup-iso <backup-iso> partial
  commit restore
  • Note: the default partial restoration does not include restoring the software delivery

A partial restore may be necessary, for example, if the databases are faulty, but the other parts of the system work normally and the network element is accessible. Partial restore is done in the runtime environment and can be performed either from a partial backup image or from a full backup image.

Selective partial restoration

- You can also specify more accurately which parts of the file system to restore
  start restore backup-iso <backup.iso> selective
  add restore delivery
  add restore config
  add restore state
  add restore filesystem
  add restore database
  add restore plugin
  commit restore

Upgrading Flexi NG software

In-Service Upgrade

• In Flexi NG, upgrades between sequential CD levels can be performed without service downtime in a 2N HA deployment
• Always-on PDP contexts/sessions are also supported during upgrades
  – No service break for end-users

[Figure: Flexi NG process layout during an in-service upgrade, with sessions and services shown as OK. SB: Octeon/Linux (gtpc_sig, session_ctrl, radius_client, gwup_proxy, lib_client) and Octeon/SE (gwup). MB: x86/Linux (cdr_collector, lic_client, conf_observer). Interfaces shown: Gn, Bp, X1_1, X2, X3, Gi, Radius.]

High level 2N service node upgrade procedure

1) Active and standby are running the same SW delivery
2) Standby node is upgraded (incl. reboot)
3) Active peer detects that standby is up again, and starts data warming
4) After data warming is complete, a controlled switchover (SWO) is initiated
5) Repeat step 2
6) Repeat step 3 (sync in opposite direction)

[Figure: timeline of the two nodes through the steps above, marking the phases Transaction sync, Data warming, Switchover and Node is down.]

Software upgrade – ISU Rollback

• The in-service upgrade rollback procedure cancels the ongoing in-service upgrade
• The rollback can only be performed before the upgrade has been committed with the commit sw-manage new command
• If the upgrade has already been committed, you need to downgrade the delivery
• The rollback command rollback sw-manage reverses the upgrade procedure. This results in the following actions:
  – The old software delivery is activated in the nodes running the new delivery, including a reboot of the nodes
  – The upgrade state exits
  – The configuration of the old delivery is restored

Downgrading deliveries

• The Flexi NG can have multiple SW deliveries installed at the same time.
• The operator can choose one of the installed deliveries as the startup software
• The downgrade to a previous SW delivery requires a system restart (service break)
  1. Operator selects the previous delivery (sw_v1.0) as the startup image
  2. Operator restarts Flexi NG
  3. System boots with the previous SW image