OS90515 NetAct Admin Tasks and Monitoring

OS90515EN81GLA0 ©2014 Nokia Solutions and Networks. All rights reserved. System Self Monitoring and Administration Tasks NetAct 8 Administration 1 OS9051-81A



Slide 1: System Self Monitoring and Administration Tasks
NetAct 8 Administration 1
Nokia Solutions and Networks Academy
Module Objectives
At the end of this module the student will be able to:
Describe the entities supervised by Self Monitoring.
Manage the Self Monitoring Subsystem.
Describe the functionality of HP Systems Insight Manager in the Self Monitoring subsystem.
Report the system services status of NetAct.
Examine the current state of the NetAct system through the summary reports and logs of the Preventive Health Check tool.
Set up the global view and management of logs using the Log Monitor Tool.
Prepare NetAct system for Hardware Maintenance.
Manage NetAct system shutdown.
Module Content
NetAct Self Monitoring
Overview
Self Monitoring is based on the O&M Agent (version 3.4)
Self Monitoring groups the mechanisms by which NetAct monitors itself in both hardware and software
Self Monitoring produces alarms (FM) and counters (PM) that are administered and consumed by NetAct itself
The results of the self-monitoring can be seen in the performance management reports for the PM data and as alarms for the FM data
*
NetAct Self Monitoring
Hardware Monitoring
During NetAct installation, the NetAct hardware to be monitored is integrated into HP SIM. When the integration is complete, the HP SIM service starts to monitor NetAct hardware resources, including server hardware, DCN routers and switches, firewalls, and storage devices.
HP SIM triggers alarms in NetAct Monitor when a predefined alarm severity criterion is met.
Alarm ID
Alarm Text
Managed Object
Hardware / Server
Hardware / DCN
Hardware / Storage
NetAct Self Monitoring
Hardware Monitoring: Hardware Alarms
To find out more about a particular hardware alarm, check the information in HP SIM.
Hardware alarms must first be cleared in HP SIM; the corresponding alarm in NetAct Monitor is then cancelled automatically.
*
NetAct Self Monitoring
Software Monitoring
*
Administering Self Monitoring
Alarms (FM): Adaptation for monitoring the NetAct System (com.nsn.netact-8)
*
Administering Self Monitoring
Stopping O&M Agent
Log in to any NetAct virtual machine (VM), and switch to the root user
To stop the Integration Framework component in each VM, enter:
[root]# smanager.pl stop service OMAgentAF-<vm name>
Log in to the VM where the Core component of O&M Agent is running, and switch to the root user.
To stop the Core component of O&M Agent, enter:
[root]# smanager.pl stop service OMAgent
Starting O&M Agent
Log in to the VM where the Core component of O&M Agent is stopped, and switch to the root user.
Check if O&M Agent is running by using the command:
[root]# smanager.pl status
If not, then to start the Core component of O&M Agent, enter:
[root]# smanager.pl start service OMAgentHA
To start the Integration Framework component in each virtual machine, enter:
[root]# smanager.pl start service OMAgentAF-<vm name>
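The stop and start sequences above can be collected into a small dry-run helper. The following Python sketch only builds the command strings in the order described on this slide and executes nothing. The function names and the VM names passed in are hypothetical; the Core service names are copied as they appear above (OMAgent for stop, OMAgentHA for start).

```python
from typing import List


def stop_omagent_commands(vm_names: List[str]) -> List[str]:
    """Stop order from the text: the Integration Framework component
    on each VM first, then the Core component."""
    cmds = [f"smanager.pl stop service OMAgentAF-{vm}" for vm in vm_names]
    cmds.append("smanager.pl stop service OMAgent")
    return cmds


def start_omagent_commands(vm_names: List[str]) -> List[str]:
    """Start order from the text: the Core component first, then the
    Integration Framework component on each VM."""
    cmds = ["smanager.pl start service OMAgentHA"]
    cmds += [f"smanager.pl start service OMAgentAF-{vm}" for vm in vm_names]
    return cmds


if __name__ == "__main__":
    # "vm1" and "vm2" are placeholder VM names for illustration only.
    for line in stop_omagent_commands(["vm1", "vm2"]):
        print(line)
```

Printing the list before running anything makes it easy to review the order against the procedure.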
*
Administering Self-Monitoring
Specific objects are automatically created in NetAct for Self Monitoring
NetAct instance
NE3SWS object
Administering Self-Monitoring
Log Files
Administering Self-Monitoring
HP Systems Insight Manager
HP SIM is used to monitor NetAct servers, switches, storage devices etc.
HP SIM automatically discovers and identifies HP devices, but non-HP devices such as storage must be integrated with it at the time of NetAct installation
HP SIM agents collect FM and PM data from the hardware and forward it to the HP SIM service, which sends it to the O&M Agent
HP SIM classifies all hardware alarms into one of the following categories
Server hardware faults
DCN hardware faults
Storage hardware faults
Administering Self-Monitoring
HP SIM uses agents to collect FM and PM data
Log in to the servers as the root user and use the service command to check whether the following agents are running. Start any agent that is not running, again with the service command:
hp-health
hp-ilo
hp-snmp-agents
hpsmhd
hpvca
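The check-then-start routine for the agents listed above could be scripted along the following lines. This is a minimal sketch, not part of the course material: it assumes that `service <agent> status` returns a non-zero exit code when the agent is not running (the usual init-script convention), and the function names are hypothetical. The runner is injectable so that the logic can be tried without touching a real server.

```python
import subprocess
from typing import Callable, List, Sequence

# Agent names as listed in the text.
HP_AGENTS = ("hp-health", "hp-ilo", "hp-snmp-agents", "hpsmhd", "hpvca")


def agent_command(agent: str, action: str) -> List[str]:
    """Build a `service <agent> <action>` command as an argument list."""
    if action not in ("status", "start"):
        raise ValueError(f"unsupported action: {action}")
    return ["service", agent, action]


def ensure_agents_running(
    agents: Sequence[str] = HP_AGENTS,
    run: Callable[[List[str]], int] = subprocess.call,
) -> List[str]:
    """Check each agent; start those whose status check is non-zero.

    Assumption: a non-zero exit code from `service <agent> status`
    means the agent is not running. Returns the agents that were started.
    """
    started = []
    for agent in agents:
        if run(agent_command(agent, "status")) != 0:
            run(agent_command(agent, "start"))
            started.append(agent)
    return started
```

On a real server the function would be called as root with the default runner; passing a stub for `run` gives a dry run.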
Administering Self-Monitoring
HP Systems Insight Manager: Checking status, Starting and Stopping
Log in to any NetAct virtual machine (VM), and switch to the root user.
To check the HP SIM status, enter:
[root]# /opt/cpf/bin/smanager.pl status service hpsim
Log in to the node where the HP SIM service is running. To check the HP SIM status, enter:
[root]# /opt/mx/bin/mxstatus -v
To start HP SIM, enter:
[root]# /opt/cpf/bin/smanager.pl start service <hp sim service name>
To stop HP SIM, enter:
[root]# /opt/cpf/bin/smanager.pl stop service <hp sim service name>
*
Administering Self-Monitoring
[Screenshots: browsing the Self Monitoring adaptations. Callouts recovered from the slides:]
2. Select PM Adaptation for Self Monitoring
3. Select Release
Click on ID for more details
*
NetAct 8 Administration Tasks
Revising the Status of NetAct Services
Service Manager
Log in to any virtual machine (VM) that is not the vCenter VM, and then switch to the root user.
View the status of all services by entering: smanager.pl status
*
Revising the Status of NetAct Services
Service Manager
Started: The service is running and being monitored by the Service Monitoring application
Stopping
Stopped: The service was stopped by an smanager command or was manually stopped
Recovering: The service has failed to recover, and the Service Monitoring application is trying to start the service
Failed
Maintenance: The service is not monitored by the Service Monitoring application
No connection: The smanager application cannot establish a connection with the service
Frozen
Config Error
Locked
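The state list above lends itself to a simple filter that picks out every service that is not in the Started state. The output format of smanager.pl status is not shown in this transcript, so the sketch below assumes the status has already been parsed into hypothetical (service, state) pairs. Treating every state other than Started as worth a look is also an assumption made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Short paraphrases of the descriptions in the state list above.
# States without a description in this transcript map to None.
STATE_DESCRIPTIONS: Dict[str, Optional[str]] = {
    "Started": "running and monitored by the Service Monitoring application",
    "Stopping": None,
    "Stopped": "stopped by an smanager command or manually",
    "Recovering": "Service Monitoring is trying to start the service",
    "Failed": None,
    "Maintenance": "not monitored by the Service Monitoring application",
    "No connection": "smanager cannot connect to the service",
    "Frozen": None,
    "Config Error": None,
    "Locked": None,
}


def not_started(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (service, state) pairs whose state is not 'Started'."""
    result = []
    for service, state in pairs:
        if state not in STATE_DESCRIPTIONS:
            raise ValueError(f"unknown state: {state!r}")
        if state != "Started":
            result.append((service, state))
    return result
```

For example, `not_started([("db", "Started"), ("hpsim", "Failed")])` returns only the hpsim entry.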
Preventive Health Check (PHC)
Diagnostic Tool
The tool is modular and covers test cases for both RHEL and NetAct. The RHEL module test cases perform the operating system and hardware-related tests. NetAct test cases are grouped into the following sub-modules:
Configuration management
Fault management
High availability
WebSphere Application Server
PHC is a command-line tool that is used to check the current state of the NetAct system through the summary report it generates and the logs that it collects. It is especially useful:
after every NetAct deployment
when there are failures within NetAct
*
Preventive Health Check (PHC)
Collect symptoms to resolve the problems.
Check the health of the system after NetAct deployment (after fresh installation or upgrade).
Verify all nodes using central remote execution utility.
Generate consolidated summary report and collect logs from all the nodes on the node where the tool is invoked.
The system administrator can run the tool from one VM as the mhcf user and, as a result, get the status of all the VMs. Every tool execution creates a new folder that contains the summary of all the findings. The generated reports are available in .html, .log, and .xml file formats. All collected logs are saved in a .tar file.
*
Preventive Health Check (PHC)
Running PHC
Preventive Health Check is installed on all cluster nodes. Therefore, you can run the health check from any node.
To run Preventive Health Check:
Log in as mhcf user to the node where you want to run the health check.
Change the directory to /opt/oss/NSN-mhcf.
Start Preventive Health Check by executing:
./mhcf.pl -c <configuration file> -loglevel <value>
where:
• <value> - value of the log level (default value 4)
For example:
./mhcf.pl -c data/cfg_netact8.xml -loglevel 4
To list all the available test cases, execute:
./mhcf.pl -c data/cfg_netact8.xml -list
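For scripted runs, the two invocations above can be generated by one helper. The Python sketch below only assembles the argument list; the function name `phc_command` is hypothetical, and only the options that appear on this slide (-c, -loglevel, -list) are used.

```python
from typing import List, Optional

# Directory from which mhcf.pl is started, as given in the text.
PHC_DIR = "/opt/oss/NSN-mhcf"


def phc_command(
    config_file: str,
    loglevel: Optional[int] = None,
    list_only: bool = False,
) -> List[str]:
    """Build the mhcf.pl argument list.

    list_only=True produces the test-case listing call; otherwise an
    optional -loglevel value is appended.
    """
    cmd = ["./mhcf.pl", "-c", config_file]
    if list_only:
        cmd.append("-list")
    elif loglevel is not None:
        cmd += ["-loglevel", str(loglevel)]
    return cmd


if __name__ == "__main__":
    print(" ".join(phc_command("data/cfg_netact8.xml", loglevel=4)))
```

The resulting list could be passed to `subprocess.run(cmd, cwd=PHC_DIR)` when logged in as the mhcf user.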
Preventive Health Check (PHC)
PHC Test Cases: RHEL
Test case
Checks the hardware specification such as processor architecture, processor type and speed, processor count, RAM size, and hard disk drive size
RHEL message Log Check
Checks for error messages (in messages.log file) in the /var/log directory. Test fails if errors are found
RHEL secure Log Check
Checks for error messages (in secure log file) in the /var/log directory. Test fails if errors are found
Node Performance Check – System Load Check
Checks that the load on the CPU (application and operating system) is less than 80%. Test fails if it exceeds 80%. You can check the current CPU load in the log file.
Duplicate IP Check
Checks whether the system IP is used by another machine in the network and reports the MAC address of that machine. Test fails if the same IP is found.
NTP Synch Check
*
Preventive Health Check (PHC)
Configuration Management Cases
Test case title
Test case Description
CM Import File Count
Checks that the number of files in the CM Import folders is less than the threshold of 10000. Test fails if the number of files exceeds the threshold. Check the south import directory to view the current number of files.
NetAct Information Cases
Test case title
Test case Description
NetAct Base Info
Lists system information such as RHEL version, Java version, Oracle version, ClusterID, ClusterName, and Domain name. Test fails if it is not able to fetch the required information.
DB Size Check
Checks that the DB usage is below the 80% threshold. Test fails if usage exceeds the threshold.
Default System users Check
*
Preventive Health Check (PHC)
Fault Management Cases
Test case title
Test case Description
FM Alarm Threshold
Checks that the average alarm insertion time is less than the threshold of 120 seconds for each alarm in the database. Test fails if insertion time is greater than the threshold.
DB Buffer checks and FM alarm checks
Checks that the FX alarm table buffer hit ratio is greater than 98 and that the active alarm count is less than the threshold of 100000. Test fails if either condition is not met.
GEP Status
Checks that the GEP instance of servicemix is running. Test fails if GEP instance of servicemix is not running.
FM Trap Delivery Check
Checks that the trap delivery is working. This is done by checking if mediation is listening to the traps and traps are auto forwarded. Test fails if any of these checks fail because the OFaS files are not written to the import directory
GEP Process Memory Check
Logs the Used Memory, Committed Memory, and Max Memory. If Preventive Health Check is unable to retrieve this data, the test fails.
FM Import File Count
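The pass/fail logic of the first two fault management cases in this table can be written out as plain comparisons. This is an illustrative sketch only: how PHC actually obtains the values is not described here, so the metrics are passed in through a hypothetical `FmMetrics` record, and values exactly equal to a threshold are treated as failures, which is an interpretation of the wording.

```python
from dataclasses import dataclass
from typing import Dict

# Thresholds quoted in the table above.
AVG_INSERTION_MAX_SECONDS = 120
BUFFER_HIT_RATIO_MIN = 98
ACTIVE_ALARM_MAX = 100000


@dataclass
class FmMetrics:
    """Hypothetical container for values already read from the system."""
    avg_insertion_time_s: float
    buffer_hit_ratio: float
    active_alarm_count: int


def fm_check_results(m: FmMetrics) -> Dict[str, bool]:
    """Return True (pass) or False (fail) per test case title."""
    return {
        "FM Alarm Threshold":
            m.avg_insertion_time_s < AVG_INSERTION_MAX_SECONDS,
        "DB Buffer checks and FM alarm checks":
            m.buffer_hit_ratio > BUFFER_HIT_RATIO_MIN
            and m.active_alarm_count < ACTIVE_ALARM_MAX,
    }
```

A record such as `FmMetrics(5.0, 99.5, 1200)` passes both cases, while raising the active alarm count to 100000 makes the second case fail.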
*
Preventive Health Check (PHC)
Performance Management Cases
Test case title
Test case Description
PM files in import folder
Checks that the number of files in the SMI folder is less than the threshold of 20000 and that the number of files in the error folder is 0. The test case fails if either check fails. The current number of files for each check is recorded in the log file.
PM files in export folder
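The "PM files in import folder" check above amounts to counting files in two folders and comparing the counts with fixed limits. A minimal Python sketch of that idea follows. The folder paths are not given in this transcript, so they are parameters here, and counting only the regular files directly inside each folder (no recursion) is an assumption.

```python
import os

# Threshold quoted in the table above.
SMI_MAX_FILES = 20000


def count_files(folder: str) -> int:
    """Count regular files directly inside folder (non-recursive)."""
    with os.scandir(folder) as entries:
        return sum(1 for entry in entries if entry.is_file())


def pm_import_check(smi_folder: str, error_folder: str) -> bool:
    """Pass only if the SMI folder holds fewer than 20000 files
    and the error folder is empty."""
    return (
        count_files(smi_folder) < SMI_MAX_FILES
        and count_files(error_folder) == 0
    )
```

Both folder arguments are placeholders; the real locations would have to be taken from the NetAct documentation.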
*
Preventive Health Check (PHC)
WebSphere Application Server Cases
Test Case Title
Test Case Description
Websphere status : nodeagent
Checks that the Node Agent process is running. Test fails if the process is not running
Websphere status : cluster
Checks that the JMSCluster and SOLCluster processes are running. Test fails if either of these processes is not running
Websphere status : listener
Checks that the Websphere listener process is running. Test fails if the process is not running
Websphere status : jdbc_connection
Checks that the WAS JDBC is connected to the database. Test fails if the WAS JDBC is not connected
Temporary directory disk space Check
Checks that the free space in the /tmp directory of the node where WAS services are running is at least 1048576 bytes. Test fails if free disk space is less than the threshold
NTP Sync Check
Checks that the nodes are synchronized with NTP server. Test fails if it is not synchronized
Jetty Webserver Test
Checks that the Jetty webserver is running. Test fails if it is not running
WAS Heap Dump Checks
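The temporary directory disk space check in this table is easy to reproduce by hand. The sketch below uses Python's standard `shutil.disk_usage` to compare the free space of a directory with the 1048576-byte threshold quoted above. The function name is hypothetical, and this is not how PHC itself is implemented.

```python
import shutil

# Threshold quoted in the table above (1048576 bytes = 1 MiB).
MIN_FREE_BYTES = 1048576


def tmp_space_ok(path: str = "/tmp", min_free: int = MIN_FREE_BYTES) -> bool:
    """Return True if the file system holding path has at least
    min_free bytes of free space."""
    return shutil.disk_usage(path).free >= min_free


if __name__ == "__main__":
    print("free space check passed:", tmp_space_ok())
```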
*
Preventive Health Check (PHC)
PHC Test Cases: NetAct Module
High Availability Cases (one case per HA service; brief list, complete list in the document)
Test case title
Test case Description
NFS service check
Net FS service check
OMAgent AF service check
Dirsrv service check
tomcat_vcentplg service check
db service check
Pulse-jbi service check
Pulse-j2ee service check
OM Agent service check
hpsim service check
*
Log Monitor Tool
Central Point for Visualizing all VMs logs
As there are many virtual machines representing a single NetAct cluster, it is difficult to check logs on each of these virtual machines and troubleshoot the problems in NetAct system.
*
Log Monitor Tool
Central Point for Visualizing all VMs logs
When log monitor is enabled, logs are mounted from all virtual machines, and log monitor creates the following directories
Logs
Application Logs
Log Monitor Tool
To enable collecting the logs from the virtual machines:
Log in to vcenterselfmon virtual machine as omc user.
To view log directories of:
NetAct nodes excluding all Performance Manager virtual machines, execute:
./logmonitorconsole.sh -enable
Or
NetAct and NetAct Performance Manager VMs except VMs running was_linas services, execute:
./logmonitorconsole.sh -enableWithPM
This invokes the tool and collects the logs from the virtual machines. The collected logs are stored in the /var/opt/oss/logc directory.
A log directory specific to the Log Monitor tool is created during installation. However, log folders specific to each virtual machine are created when the tool is invoked. When enabled, the script collects logs at run time.
Log directories can be viewed only as the omc user.
*
Log Monitor Tool
To disable collecting the logs from the virtual machines:
Log in to vcenterselfmon virtual machine as omc user.
To disable collecting the logs of:
NetAct nodes excluding all Performance Manager virtual machines, execute:
./logmonitorconsole.sh -disable
Or
NetAct and NetAct Performance Manager VMs except VMs running was_linas services, execute:
./logmonitorconsole.sh -disableWithPM
To check the status of log collection for:
NetAct nodes excluding all Performance Manager virtual machines, execute:
./logmonitorconsole.sh -status
Or
NetAct and NetAct Performance Manager VMs except VMs running was_linas services, execute:
./logmonitorconsole.sh -statusWithPM
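The six logmonitorconsole.sh options shown on these slides follow one pattern: an action (enable, disable, status) with an optional WithPM suffix. The Python sketch below builds the matching command from those two choices. The function name is hypothetical, and only the options that appear above are generated.

```python
from typing import List

# Actions that appear on the slides.
ACTIONS = ("enable", "disable", "status")


def logmonitor_command(action: str, with_pm: bool = False) -> List[str]:
    """Build the logmonitorconsole.sh call for the given action.

    with_pm=True selects the variant that also covers the NetAct
    Performance Manager VMs (the ...WithPM options).
    """
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    option = f"-{action}WithPM" if with_pm else f"-{action}"
    return ["./logmonitorconsole.sh", option]


if __name__ == "__main__":
    print(" ".join(logmonitor_command("status", with_pm=True)))
```

As described above, the commands are run as the omc user on the vcenterselfmon virtual machine.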
*
Administration Tasks
Shutting down NetAct
When there is a need to shut down the NetAct environment, shut down the applications and services first.
This is the recommended shutdown order of NetAct:
Check the status of the services.
Shut down the NetAct services.
Shut down the NetAct virtual machines.
Shut down the ESXi hosts.
Shut down the vCenter host.
*
Administration Tasks
Shutting down NetAct: Shutting down the NetAct services
Log in to the virtual machine (VM) hosting the dirsrv service, and switch to the root user.
To stop the vManager service so that it does not handle VM reset operations, enter:
[root]# /opt/cpf/install/bin/cpfvmanager_configure.sh --disable_vcenter --disable_alarms --stop
To stop the NetAct services, enter:
[root]# smanager.pl stop NetAct
Note:
Some services may be unstoppable. This is expected if the service name appears with a (u) when you check its status.
When the VMs are restarted later, the services also restart automatically. As an alternative, you can also restart the services manually.
If you plan to manually restart the services later, enter:
[root]# smanager.pl maintenance all on
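As a review aid, the service shutdown commands from this slide can be listed in order by a small dry-run helper. The Python sketch below only returns the command strings and runs nothing. The function name is hypothetical, the commands are kept in the order in which they appear on the slide, and the option spacing of the cpfvmanager_configure.sh call is a reading of the slide text rather than a verified syntax.

```python
from typing import List


def service_shutdown_commands(manual_restart_later: bool = False) -> List[str]:
    """Return the root commands for shutting down the NetAct services,
    in the order they appear in the text."""
    cmds = [
        # Stop vManager so that it does not handle VM reset operations.
        "/opt/cpf/install/bin/cpfvmanager_configure.sh "
        "--disable_vcenter --disable_alarms --stop",
        # Stop the NetAct services.
        "smanager.pl stop NetAct",
    ]
    if manual_restart_later:
        # Only needed when the services will be restarted manually.
        cmds.append("smanager.pl maintenance all on")
    return cmds


if __name__ == "__main__":
    for line in service_shutdown_commands(manual_restart_later=True):
        print("[root]#", line)
```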
*
Administration Tasks
Shutting down NetAct: Shutting down the NetAct VMs
Perform the following procedure on all VMs in the system, except on the vCenter Virtual Machine:
Access the vSphere Web client
From the vCenter home page, click vCenter → VMs and Templates.
Select your vCenter server, and click the Related Objects tab.
Select the Virtual Machines tab.
Right-click a VM that is not the vCenter VM, and then select Shut Down Guest OS.
To confirm the shut down, click Yes.
*
Administration Tasks
Shutting down NetAct: Shutting down the ESXi hosts
Perform the following procedure on all ESXi hosts in the system, except on the host where the vCenter VM resides.
From the vCenter Web client, click the Hosts tab.
Right-click the host that you want to shut down, and then select Enter Maintenance Mode.
From the Confirm Maintenance Mode dialog, deselect the “Move powered-off and suspended virtual machines to other hosts in the cluster” checkbox, and then click Yes.
Right-click the host that you entered into maintenance mode, and then select Shut Down.
On the confirmation dialog, enter a reason for shutting down the host, and then click OK.
From the Recent Tasks pane, check that the task is completed.
Repeat steps 2 through 6 for all other remaining hosts, except on the host where the vCenter VM resides.
Refresh the site to check that the host is properly shut down and that state is saved on the vCenter.
Warning:
*
Administration Tasks
Shutting down NetAct: Shutting down the vCenter host
Shutting down the ESXi host where the vCenter virtual machine (VM) is located will also automatically shut down the vCenter virtual machine. Make sure to shut down this host last.
Right-click the host where the vCenter VM is residing, and then select Shut Down.
On the confirmation dialog, enter a reason for shutting down the host, and then click OK.
Refresh the site to check that the host is properly shut down.