Troubleshoot EMC Celerra/VNX Integration



KB Article: 3035
Updated: 8/11/2015
Category: Usage
Affected versions: NSS 9.0, NSS 9.5

How to: Troubleshoot EMC Celerra/VNX Integration

Summary

The purpose of this document is to serve as an information bank for EMC-related problems. It covers the most common problems and the recommended steps and tools needed to solve them.

The document is divided into three parts:

Quotas not locking
Quotas not updating
Error 1450

Quotas Not Locking

If the EMC quotas aren't locking, the first thing to check is whether the NSS server receives heartbeats from the CEPA-server. A failure to receive heartbeats will result in quotas not locking. There are three main ways to check for heartbeat errors:

1. In the Notice field in Quota Server.
2. In the System\History-tab in Quota Server.
3. In the Application Event log in Windows.

This is the most common message if no heartbeats are received:

"Failed to receive heartbeats from EMC CEPA. Locking on EMC quotas will not be fully operational. Please check if it is installed and configured."

The three most common reasons behind the failure to receive heartbeats:

1. Failed communication from the CEPA-server, i.e. stopped/crashed EMC CAVA-service.
2. Missing or incorrectly configured endpoint at HKEY_LOCAL_MACHINE\SOFTWARE\EMC\Celerra Event Enabler\CEPP\CQM\Configuration.
3. Endpoint still claimed after an NSS Quota Server service stop.

It can take several seconds for the QS Service instance to de-register as the endpoint for CEPA; this is carried out during service shutdown. If the service is restarted before de-registration has completed, its attempt to connect to CEPA is refused (the endpoint is still ‘claimed’), so no heartbeats reach QS. To reset the connection, restart the QS Service again, this time making sure that the original shutdown has actually completed.

Note: Although the service manager may show the services as stopped, it is more reliable to monitor the process in Task Manager, because applications can send ‘completed’ messages before their actions have actually completed (QS is sometimes guilty of this).
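The wait described in the note can be scripted. The following is only a minimal sketch: `is_running` is a hypothetical callable that you would implement yourself (for example by querying the process list for the Quota Server process, whose name is not given in this article), and the timeout and polling values are arbitrary.

```python
import time
from typing import Callable


def wait_until_stopped(
    is_running: Callable[[], bool],
    timeout_s: float = 60.0,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until is_running() reports False or the timeout expires.

    is_running is a hypothetical check supplied by the caller; this
    article does not name the Quota Server process to look for.
    Returns True if the process stopped, False on timeout.
    """
    deadline = clock() + timeout_s
    while is_running():
        if clock() >= deadline:
            return False  # still running: do not start the service yet
        sleep(poll_s)
    return True  # process is gone: safe to start the service again
```

Only start the service again once the function has returned True. A False result means the original shutdown, and with it the de-registration from CEPA, may still be in progress.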

Endpoint check:

Make sure that the endpoint is correctly configured on the CEPA-server(s). If NSS and CEPA run on the same machine, the endpoint can be set to just Northern. If the CEPA-server is external, the endpoint must point to the IP address that all the information should be sent to, in this case the NSS server(s).

A single NSS server: Northern@<IP>
Multiple NSS servers: Northern@<IP1>;Northern@<IP2> etc.

Disregard the angle brackets when setting the endpoint: the value is Northern@ followed directly by the IP address.
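As an illustration of the endpoint format, the sketch below builds the value from a list of NSS server addresses. The function name is hypothetical and the addresses in the usage note are documentation placeholders, not values from a real environment.

```python
import ipaddress
from typing import Sequence


def build_endpoint(nss_ips: Sequence[str], name: str = "Northern") -> str:
    """Build the CEPA endpoint value described in this article.

    An empty list models the co-resident case, where the endpoint is
    just the name. Otherwise each address becomes name@ip, and the
    entries are joined with semicolons.
    """
    if not nss_ips:
        return name
    parts = []
    for ip in nss_ips:
        # Raises ValueError for anything that is not a plain IP address,
        # including values that still carry angle brackets.
        ipaddress.ip_address(ip)
        parts.append(f"{name}@{ip}")
    return ";".join(parts)
```

For example, build_endpoint(["192.0.2.10", "192.0.2.11"]) gives Northern@192.0.2.10;Northern@192.0.2.11, and build_endpoint([]) gives Northern.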

If you are receiving heartbeats the problem is either:

1. Account & permissions-related.
2. A communication problem between CEPP and CEPA.

1 a) EMC CAVA Service running with the wrong account

No errors are displayed in this case, which makes it difficult to troubleshoot. The only obvious symptom is that the files cannot be blocked. Make sure that the EMC CAVA service runs with an account that has administrative rights on the CIFS servers managed by Quota Server.

1 b) EMC CAVA Service running with the SYSTEM account

If the CQM application is co-resident (i.e. the NSS services and the CEPA server run on the same host), the EMC CAVA service can run with the Local System account. However, this configuration is strongly discouraged. The Local System account is easily affected by security policies enforced on the server, which can, for example, prevent connections from the network.

1 c) NSS Services running with the wrong account

The NSS Quota Server service account should belong to at least the “Backup Operators” and “Power Users” groups on the VNX / Celerra CIFS server. If it does not, quotas may fail to lock without any errors being logged in the NSS trace files or in the Data Mover log files.

2. Communication problem between CEPP and CEPA


Log in to the EMC Control Station and run a CEPA pool check. This provides a status report of the CEPA pool. Any problems with the communication between CEPP and CEPA are displayed in the pool information. This is the command for a pool check:

$ server_cepp <server name> -pool -info

The command will produce a result similar to this:

server name :
pool_name = Northern
server_required = No
access_checks_ignored = 0
req_timeout = 5000ms
retry_timeout = 1000ms
pre_events = OpenFileWrite, CreateFile, RenameFile, DeleteFile, CloseModified, CreateDir, RenameDir, DeleteDir, SetAclFile
post_events =
post_err_events =
CEPP Servers:
IP = xx.xx.xx.xx, state = ONLINE, rpc = MS-RPC over SMB, cava version = 6.0.4.0, nt status = SUCCESS, server name = server.domain.com

If there are any problems on this end, they are shown in the bottom row (the entry under CEPP Servers). Check the 'state' and the 'nt status' fields.

Common state errors:

ERROR_CEPP_NOT_FOUND - Insufficient account permissions.
OFFLINE - NSS Quota Server not running or not registered as a CQM application.

Common status errors:

OBJECT_NAME_NOT_FOUND - CEPP is unable to communicate with the EMC CAVA-service on the CEPA-server.
CONNECTION_DISCONNECTED - Connection rejected, possibly because of closed ports, a firewall or insufficient account permissions. This error can also occur if the cepp.conf file points to the wrong server (e.g. a server that does not have the EMC CEE Framework installed).
INVALID_PARAMETER - Account problems of a more complex nature. The MS RPC account is incorrectly mapped and configured in the domain.

If the problem should persist on this end (CEPP & CEPA), you need to contact EMC support in order to receive further assistance.
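The state and status checks above can be sketched as a small lookup. This is only an illustration: it assumes the CEPP Servers row has the comma-separated key = value layout of the sample output in this article, which may differ between releases, and the function names are hypothetical. The hint texts are taken from the lists above.

```python
# Hints copied from the state and status lists in this article.
STATE_HINTS = {
    "ERROR_CEPP_NOT_FOUND": "Insufficient account permissions.",
    "OFFLINE": "NSS Quota Server not running or not registered as a CQM application.",
}
STATUS_HINTS = {
    "OBJECT_NAME_NOT_FOUND": "CEPP cannot communicate with the EMC CAVA service on the CEPA server.",
    "CONNECTION_DISCONNECTED": "Connection rejected: closed ports, firewall, permissions or wrong server in cepp.conf.",
    "INVALID_PARAMETER": "MS RPC account incorrectly mapped and configured in the domain.",
}


def parse_server_row(row: str) -> dict:
    """Split 'key = value, key = value' into a dictionary (assumed layout)."""
    fields = {}
    for part in row.split(","):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def diagnose(row: str) -> list:
    """Return the hints that match the row's 'state' and 'nt status' fields."""
    fields = parse_server_row(row)
    hints = []
    if fields.get("state") in STATE_HINTS:
        hints.append(STATE_HINTS[fields["state"]])
    if fields.get("nt status") in STATUS_HINTS:
        hints.append(STATUS_HINTS[fields["nt status"]])
    return hints
```

A row with state = ONLINE and nt status = SUCCESS yields an empty list. Any other combination returns the matching explanations, and values not listed in this article return nothing.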

Quotas Not Updating

Disabled CIFS Notifications

The most common reason behind quotas not updating synchronously on EMC is the absence of CIFS notifications. NSS 8.x, 9.0 and 9.5 rely on CIFS notifications in order to update quotas. No CIFS notifications means no usage level update.

A quick way to verify that the server receives CIFS notifications is to open the trace file named ncl_trace_qsserver_statistics.txt and search for the term "CIFS notifications". How big is this number? If it is zero, no CIFS notifications are being received. If it is larger than zero, how large is it? Does the number change over time or does it remain unchanged? Does the number of CIFS notifications really match the size and activity of the environment?

One way to see if the number of CIFS notifications is correct is to compare it with the number of CheckEvents in the same statistics log. These two numbers should be fairly close to each other. A large difference is usually a sign that CIFS notifications are turned off for a majority of the CIFS servers.
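The comparison can be expressed as a simple check. In this sketch the two counts are assumed to have been read manually from the statistics file, since its exact layout is not described here. The function name and the 50 % threshold are hypothetical choices. The article only says the numbers should be "fairly close".

```python
def assess_notifications(cifs_notifications: int, check_events: int,
                         max_relative_gap: float = 0.5) -> str:
    """Classify the relation between the two counters.

    max_relative_gap is an arbitrary illustration value, not a
    threshold documented by Northern or EMC.
    """
    if cifs_notifications == 0:
        return "no notifications"  # CIFS notifications are not arriving at all
    # Relative difference between the counters, between 0 and 1.
    gap = abs(cifs_notifications - check_events) / max(cifs_notifications, check_events)
    if gap > max_relative_gap:
        return "large gap"  # notifyonwrite is likely off on many CIFS servers
    return "plausible"
```

Take readings at two points in time as well: a count that never changes in an active environment deserves the same attention as a large gap.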

CIFS notifications need to be enabled for ALL CIFS servers used. The setting responsible for this is called 'notifyonwrite' and it is disabled by default.

This command enables CIFS notifications on the CIFS server:

$ server_mount server_2 -option notifyonwrite ufs1 /ufs1 (where ufs1 is the file system name and /ufs1 is its mount point)

Consult with your EMC technical account manager if you are unsure of the implications of enabling CIFS notifications in your environment.

Empty CIFS Notifications

Another common reason behind quotas not updating is empty CIFS notifications. An empty CIFS notification reports that one or several changes have occurred within the file system, but the CIFS server is unable to deliver a complete message describing these changes because its command buffer has overflowed. An empty notification can be likened to the error message "changes occurred in a share, but no details can be provided". NSS responds by re-scanning the quota path, or the entire share when multiple quotas are configured on it, in order to calculate current usage levels.

An abnormal rate of empty notifications could lead to a state of constant rescanning. In this scenario, the file change notifications get stuck in the scan queue and processing is significantly delayed. In the worst case, this could continuously degrade major Quota Server features such as quota locking.

Read more about empty CIFS notifications in KB1785 (About: Handling of Empty CIFS Notifications in NSS).

Error 1450

For versions 9.5 or earlier, this problem shows up as Error 1450 in the Windows Application Event Log. Error 1450 means "Insufficient system resources exist to complete the requested service". The message refers to resource exhaustion on the EMC CIFS server: all available CIFS/SMB threads on the CIFS server are consumed.

Because of the insufficient resources on the CIFS server, Quota Server is unable to perform operations on the target storage and process quota usage level updates. This could seriously harm Quota Server functionality (i.e. quotas not updating, miscalculated quotas and failed locking).


Illustration:

Description: Failed to queue for notification on drive root: \\device\fs1$ Error:1450.

The Error 1450 entries in the Windows Application Event log can be matched to a specific message on the EMC Control Station:

2013-09-26 09:26:36: VC: 3:[vdm_002v] Too many access from CAVA server xx.xx.xx.xx:
2013-09-26 09:26:36: VC: 3:[vdm_002v] without the EMC VirusChecking privilege:

The IP address in this message is the IP address of the NSS server (and of the CEPA/CAVA-server if everything runs on the same machine). Through cooperation with EMC engineers, it has been established that the combination of these two error messages is a reliable indicator that all available CIFS/SMB threads are consumed at the time the error is reported. The error messages are written as soon as NSS tries to spawn a thread to perform a required action but is denied by the EMC CIFS server.
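When searching a larger log for this message, a pattern match can pull out the VDM name and the reported IP address. The sketch below assumes the line layout of the sample message in this article. The function name is hypothetical and real log lines may vary.

```python
import re

# Matches e.g. "... VC: 3:[vdm_002v] Too many access from CAVA server 192.0.2.10:"
# The address is captured up to the trailing colon, so this handles
# IPv4-style values only.
_TOO_MANY = re.compile(
    r"\[(?P<vdm>[^\]]+)\] Too many access from CAVA server (?P<ip>[^\s:]+)"
)


def find_thread_exhaustion(lines):
    """Return (vdm, ip) pairs for every matching log line."""
    hits = []
    for line in lines:
        match = _TOO_MANY.search(line)
        if match:
            hits.append((match.group("vdm"), match.group("ip")))
    return hits
```

The returned addresses can then be compared with the NSS server address and with the timestamps of the Error 1450 entries in the Windows Application Event log.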

EMC's default maximum number of threads, in both EMC Celerra and EMC VNX OE for File environments, is 256 for systems with more than 1 GB of memory. In a highly active environment this can become a bottleneck. The number of threads can be increased by changing a specific EMC parameter. Northern's experience shows that the resource exhaustion can be greatly mitigated (or in some cases even resolved) by increasing the maximum number of threads available.

IMPORTANT:

EMC customers should always consult EMC technical personnel for expert advice on the effect that a change of this setting may have on the EMC Datamover and the specific environment in question. This is an EMC setting within EMC technology. Northern provides this information to assist customers in investigating, together with EMC personnel, the most appropriate action to resolve resource exhaustion. Northern makes no claim as to the applicability of these settings in a specific customer's environment, and shall not be held responsible for any ill effect arising from the use of these settings.

How to increase the number of threads:

$ server_setup server_X -P cifs -o start=XXX (Where XXX decides the number of available CIFS threads. Default is 256)

A more detailed explanation is available in EMC's document Configuring and Managing CIFS on VNX (P/N 300-013-429 Rev 02, page 65).

Please note once again that EMC personnel must be consulted prior to changing this parameter!

Other considerations:

Error 1450 is directly linked to the amount of activity that NSS must monitor, which is a combination of system activity and the scope of the quota policies configured. If the number of available threads cannot be increased successfully, it may be possible to address these two factors instead: reduce the rate of activity on the device, or reduce the scope of the quota policies.

NSS subscribes to notifications of file system changes. When a change notification is received, NSS scans the individual folder where the change occurred in order to establish the new quota usage level. These operations (notification and scan) require system resources. It is therefore always wise to review quota policies and ensure that no unnecessary quotas are configured. Additionally, it may be possible to reduce the number of quotas configured and to prioritize specific file shares, avoiding high-level 'general monitoring' quotas (this monitoring can be achieved with NSS' reporting capabilities). Note that hard and soft quotas require the same level of access to CIFS threads in order to perform monitoring operations.

Northern has seen excessive load generated by the constant writing of temporary internet files to remote storage devices in Virtual Desktop environments. Non-business-related streaming media has been seen to generate huge amounts of traffic to remote Internet Explorer temporary file caches, tying up resources and destroying system performance. Limiting this kind of traffic is one possible way to avoid resource exhaustion.

For advanced troubleshooting, please contact the Technical Support team at Northern ([email protected]).

ADDITIONAL RESOURCES
KB2884 How to: Configure EMC & NSS
KB1785 About: Handling of Empty CIFS Notifications in NSS