Improving Production SQL Server Operations

Contents

Improving Production Operations of SQL Server
Service-Level Expectations
Best Practices
Operations Management
Support Management
Change Management
Capacity Management
Summary
For More Information

Improving Production SQL Server Operations
White Paper
July 15, 2002

Avoiding the common problems that cause unexpected SQL Server outages leads to high performance and availability in your production SQL Server environment. Examples include detecting capacity issues and data anomalies early and avoiding surprise losses of availability.

Ensuring SQL Server performance and availability requires that DBAs implement processes and tools to effectively operate and support SQL Server. For guidance, DBAs can use the Microsoft Operations Framework (MOF). MOF is a comprehensive resource providing SQL Server DBAs with guidance in the form of white papers, operations guides, assessment tools, best practices, case studies, templates, support tools, and services. The purpose of this white paper is to outline some of the MOF concepts and processes and the NetIQ SQL management tools that can be quickly implemented in a SQL Server environment to provide immediate results in production operations.

© 1995-2002 NetIQ Corporation, all rights reserved.

Improving Production Operations of SQL Server

Microsoft SQL Server provides a reliable and scalable database solution. However, unless you apply some basic processes to the management of your SQL Server environment, you are likely to encounter common problems, unpredictable performance, unexpected hardware upgrades, and the unhappy users that result.

For the DBA, applying basic processes and solutions to improve SQL Server availability and performance produces happier users and a more stable, predictable work environment. Satisfied users and stability also help the organization the DBA supports. During database outages, user productivity suffers, as does the credibility of, and confidence in, the corporate IT department.

The Microsoft Operations Framework (MOF) provides technical guidance on effectively operating and supporting Microsoft technologies, including SQL Server. MOF is extremely comprehensive, offering white papers, operations guides, assessment tools, best practices, case studies, templates, support tools and services. This extensive knowledge set helps address the people, process, technology, and management issues related to maintaining high availability and performance. Using examples from the NetIQ SQL Management Suite, this white paper outlines several of the MOF best practices for SQL Server, and shows you how to quickly improve production SQL Server operations.

Service-Level Expectations

The first step toward improving production SQL Server operations is identifying the level of availability and performance that SQL Server users need to keep the business running. Then, determine whether that level of service is possible with the systems, processes, and budget available.

Service Level Agreements (SLAs) have grown in popularity and evolved since they were first introduced in the 1960s. Early SLAs were used by IT departments to measure technical services such as data center uptime; today's SLAs are much broader in reach. Now, SLAs are developed through detailed back-and-forth discussions between IT departments and business customers, and customers often use outside sources for research, benchmarking, and final negotiations.

For a quicker solution, we recommend developing service-level expectations. Even if you do not have the time and resources to develop formal SLAs, you can identify key metrics to determine whether SQL Server availability and performance meet the needs of your business users.

Service-level expectation criteria include:

• Acceptable transaction response times
• Daily/weekly maintenance time
• Acceptable recovery times
• Acceptable information loss (if any)
• Fault reporting and resolution system
• Possible budgetary impact
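
Some of these criteria can be checked mechanically once they are written down as numbers. The following Python sketch is illustrative only: the `ServiceLevelExpectation` class, the choice of a 95th-percentile response-time target, and the backup-schedule check are hypothetical, not part of MOF or any NetIQ product. It tests measured response times against an acceptable response time and estimates worst-case information loss as the longest gap between consecutive backups.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class ServiceLevelExpectation:
    # Hypothetical targets agreed with business users.
    p95_response_ms: float        # acceptable transaction response time (95th percentile)
    max_data_loss_minutes: float  # acceptable information loss


def p95(samples_ms):
    # With n=20, quantiles() returns 19 cut points; the last one is the 95th percentile.
    # method="inclusive" keeps the estimate within the observed range of samples.
    return quantiles(samples_ms, n=20, method="inclusive")[-1]


def worst_case_loss(backup_minutes):
    # Backup start times in minutes after midnight, repeating every 24 hours.
    # A failure just before a backup loses everything since the previous one,
    # so the worst case is the longest gap between consecutive backups.
    times = sorted(backup_minutes)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    gaps.append(times[0] + 24 * 60 - times[-1])  # gap that wraps past midnight
    return max(gaps)


def check(sle, response_ms, backup_minutes):
    return {
        "response_time_ok": p95(response_ms) <= sle.p95_response_ms,
        "data_loss_ok": worst_case_loss(backup_minutes) <= sle.max_data_loss_minutes,
    }
```

For example, four backups a day give a worst-case loss of 360 minutes, which fails a 60-minute expectation; backing up every 15 minutes brings the worst case down to 15 minutes.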

Once you determine the criteria against which you want to measure expectations, many automated monitoring and management tools, such as NetIQ AppManager, can help you track availability and performance. These solutions continuously monitor the SQL Server environment and alert you when exception conditions occur.

[Screenshot: NetIQ AppManager automatically runs user transactions at regular intervals and tracks response times. Spikes in the chart highlight exceptions to normal response times.]

Best Practices

After you have identified service-level expectations for your production operation, focus improvement efforts on the following four key areas:

• Operations management – Managing current SQL Server performance and tracking values over a longer period to identify trends

• Support management – Troubleshooting and avoiding data problems

• Change management – Identifying changes made to systems and how changes differ from systems in a model or golden state

• Capacity management – Ensuring current systems are optimized and accurately predicting when new systems are necessary

You can improve production operations by implementing best practices in any or all of these areas.

Operations Management

Operations management involves ensuring that SQL Server runs at optimal performance. It is extremely important to monitor performance from both a short-term and a long-term perspective.

To address immediate problems, you need to know about outages and trouble areas before users experience them. That requires real-time monitoring of the following SQL Server information:

• Buffer cache hit ratio
• CPU and I/O usage
• Full scans
• Lock waits
• Logins
• Memory usage
• Network activity
• Oldest open transactions
• Page life expectancy
• Page lookups and requests
• Page splits
• Physical disk activity
• Procedure cache hit ratios
• Procedure cache sizes
• Processes
• Read ahead pages
• Replication latency and transaction counts
• Server response times
• SQL batches
• SQL compilations and recompilations
• Table lock escalations
• Tempdb usage
• Work files and work tables
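
Many of these values are typically exposed as cumulative totals since the instance started (for example, page splits) or as a ratio value paired with a separate base value (for example, buffer cache hit ratio), so a monitoring script has to do a little arithmetic before the numbers mean anything. The Python sketch below assumes you already collect raw samples from a source such as the `master.dbo.sysperfinfo` table or Windows Performance Monitor; `CounterSample`, `per_second`, and `ratio_percent` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CounterSample:
    seconds: float  # when the sample was taken
    value: int      # cumulative total since the instance started


def per_second(earlier: CounterSample, later: CounterSample) -> Optional[float]:
    """Turn two samples of a cumulative counter into an average rate per second."""
    elapsed = later.seconds - earlier.seconds
    if elapsed <= 0:
        raise ValueError("samples must be in time order")
    delta = later.value - earlier.value
    if delta < 0:
        return None  # counter went backwards, e.g. SQL Server restarted between samples
    return delta / elapsed


def ratio_percent(value: int, base: int) -> Optional[float]:
    """Ratio-style counter (such as buffer cache hit ratio) as a percentage of its base."""
    if base == 0:
        return None  # nothing measured yet
    return 100.0 * value / base
```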

Tracking these values over time allows you to determine if a short-term anomaly is really an exception to your performance record or a regular occurrence at a specific time. For example, does a spike in I/O usage on Tuesday morning represent a one-time exception or an event that occurs every Tuesday morning?
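
One simple way to answer the Tuesday-morning question is to keep a baseline for each weekday and hour and compare every new sample with the history for the same time slot. The sketch below is a hypothetical illustration; the threshold of three standard deviations (`k=3.0`) and the `min_spread` floor are assumptions you would tune for your own data.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def build_baseline(history):
    """history: (datetime, value) samples, e.g. disk I/O per second.
    Returns {(weekday, hour): (mean, population standard deviation)}."""
    buckets = defaultdict(list)
    for ts, value in history:
        buckets[(ts.weekday(), ts.hour)].append(value)
    return {slot: (mean(values), pstdev(values)) for slot, values in buckets.items()}


def is_exception(baseline, ts: datetime, value, k=3.0, min_spread=1.0):
    """True if value is unusually high for this weekday and hour."""
    slot = (ts.weekday(), ts.hour)
    if slot not in baseline:
        return True  # no history for this time slot yet, so flag it for review
    avg, spread = baseline[slot]
    # min_spread stops a perfectly flat history from flagging tiny deviations
    return value > avg + k * max(spread, min_spread)
```

With this approach, an I/O spike that recurs every Tuesday at 9:00 becomes part of that slot's baseline and stops being flagged, while the same value on a quiet Wednesday morning is reported.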

Tracking performance over time also helps when an alternate DBA needs to work on an unfamiliar system. Instead of having to guess if current system loads are normal, this DBA can use real historical information to recognize normal conditions versus exceptions that need to be addressed.

NetIQ DiagnosticManager stores data for more than 20 specific performance areas, and can give you either a real-time view or a view of performance over time.

Support Management

Support management deals with data issues. DBAs commonly receive requests to trace and track changes to data, for example, “Who released that purchase order?” or “Is a user or an application entering this incorrect data?”

A common way to determine the cause of data changes is to investigate stored procedures and triggers to pinpoint where the application changed data in a specific way, as well as the events that led the user or application to make the changes. This in-depth investigation can be challenging, especially in complex environments or with complex applications.

Another common situation involves seemingly small changes that have a huge impact on both the database and user productivity. For example, a company updates its customer database from five-digit ZIP Codes to ZIP+4 with an update procedure, but forgets that the database also includes Canadian customers. The minor update effectively eliminates the company's ability to ship to thousands of customers. The company restores the backup from the previous evening and asks users to re-key any data entered during the day.
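
The ZIP+4 mistake comes from an update whose `WHERE` clause is broader than the rows it was meant to change. The following sketch uses Python's built-in `sqlite3` module as a stand-in for SQL Server, and the `customers` and `zip4_lookup` tables are hypothetical. It restricts the update to US rows that still hold a plain five-digit ZIP Code with a known +4 extension, counts the qualifying rows first and aborts if there are more than expected, and runs inside a transaction so a failure rolls back.

```python
import sqlite3

# A row qualifies only if it is a US address that still holds a plain
# five-digit ZIP Code with a known +4 extension in the lookup table.
QUALIFYING = """
    country = 'US'
    AND postal_code GLOB '[0-9][0-9][0-9][0-9][0-9]'
    AND EXISTS (SELECT 1 FROM zip4_lookup z WHERE z.zip5 = customers.postal_code)
"""


def create_tables(conn):
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT, postal_code TEXT);
        CREATE TABLE zip4_lookup (zip5 TEXT PRIMARY KEY, plus4 TEXT);
    """)


def upgrade_to_zip4(conn, max_rows):
    """Append +4 extensions; abort without changes if more rows qualify than expected."""
    with conn:  # commits on success, rolls back if an exception escapes
        planned = conn.execute(
            f"SELECT count(*) FROM customers WHERE {QUALIFYING}").fetchone()[0]
        if planned > max_rows:
            raise ValueError(f"{planned} rows qualify, expected at most {max_rows}")
        cur = conn.execute(f"""
            UPDATE customers
            SET postal_code = postal_code || '-' ||
                (SELECT plus4 FROM zip4_lookup z WHERE z.zip5 = customers.postal_code)
            WHERE {QUALIFYING}""")
        return cur.rowcount
```

Because the filter checks both the country and the current format, Canadian postal codes are never touched, and running the procedure a second time changes nothing.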

Using the SQL Server transaction log can help you avoid a full restore like the one described above. NetIQ RecoveryManager (powered by Lumigent Log Explorer) is the only tool that exposes the contents of the transaction log to the DBA. RecoveryManager allows you to research the source of data changes and back out erroneous transactions while the database is online, keeping SQL Server available while fixing problems that would otherwise be time- and resource-intensive.

NetIQ RecoveryManager makes data in the SQL Server transaction log useful. It provides online undo capability for individual transactions, restores dropped tables, and permits investigation of data change history by users and applications.
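
Conceptually, online undo relies on the log holding a before-image and an after-image for each change. The toy Python model below illustrates that idea only; it is not how SQL Server formats its transaction log or how RecoveryManager works, and `LoggedTable` is a hypothetical class. It records who made each change, answers "who changed this row?", and reverses a single transaction by re-applying its before-images in reverse order. A real tool would also have to detect later transactions that touched the same rows; this sketch does not.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Row = Dict[str, object]


@dataclass
class LogRecord:
    txn_id: int
    user: str
    key: int
    before: Optional[Row]  # None: the row did not exist (the change was an insert)
    after: Optional[Row]   # None: the row was deleted


class LoggedTable:
    """Toy table whose every change is logged with before- and after-images."""

    def __init__(self) -> None:
        self.rows: Dict[int, Row] = {}
        self.log: List[LogRecord] = []

    def write(self, txn_id: int, user: str, key: int, new_row: Optional[Row]) -> None:
        before = self.rows.get(key)
        self.log.append(LogRecord(
            txn_id, user, key,
            dict(before) if before is not None else None,
            dict(new_row) if new_row is not None else None))
        if new_row is None:
            self.rows.pop(key, None)
        else:
            self.rows[key] = dict(new_row)

    def history(self, key: int) -> List[LogRecord]:
        """Answers questions like "who released that purchase order?"."""
        return [rec for rec in self.log if rec.key == key]

    def undo(self, txn_id: int, undo_txn_id: int, user: str = "dba") -> None:
        """Back out one transaction by restoring its before-images, newest first.
        The undo is itself logged, so it can be audited or reversed."""
        records = [rec for rec in self.log if rec.txn_id == txn_id]
        for rec in reversed(records):
            self.write(undo_txn_id, user, rec.key, rec.before)
```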

Change Management

In most organizations, multiple people have access to servers. Multiple DBAs might share responsibility for a computer; even more commonly, the DBA shares access to a computer with the Windows or network administrator. In these environments, it is vital to know what changes are made to SQL Servers and how those changes compare to a known-good working state. When changes are not tracked, problems occur easily. For example, a DBA and a network administrator share the administration of a SQL Server database server. The network administrator decides to make the server a backup domain controller (BDC) as a precaution in case there is a problem with the network's primary domain controller (PDC). SQL Server performance remains fine until the PDC fails and the BDC is promoted; the server then has to carry the domain controller workload as well, and SQL Server performance suffers. Being aware of this type of change in a production SQL Server environment is definitely a best practice.

SQL Server areas for which to maintain change histories include the following:

• Environment (hardware and operating system)
• Application software and services
• Database schema (tables, columns, indexes)
• Security (permissions, groups, roles)
• SQL Server configuration settings
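
Comparing a server against its known-good state amounts to diffing two sets of settings, and a change history is a series of such diffs. In the Python sketch below, the setting names and values in `GOLDEN` are hypothetical examples drawn from the areas listed above; how you collect them (for instance from `sp_configure` output or an inventory script) is left open.

```python
MISSING = "<absent>"

# Hypothetical golden state covering a few of the areas listed above.
GOLDEN = {
    "os.domain_role": "member server",
    "sql.max degree of parallelism": 0,
    "security.sysadmin_logins": ("sa",),
}


def diff_config(golden, current):
    """Return {setting: (golden_value, current_value)} for every setting that differs."""
    changes = {}
    for name in sorted(golden.keys() | current.keys()):
        old = golden.get(name, MISSING)
        new = current.get(name, MISSING)
        if old != new:
            changes[name] = (old, new)
    return changes


def change_history(snapshots):
    """snapshots: (label, settings) pairs in time order.
    Returns (label, changes since the previous snapshot) for each later snapshot."""
    return [(label, diff_config(previous, settings))
            for (_, previous), (label, settings) in zip(snapshots, snapshots[1:])]
```

In the BDC example above, a nightly snapshot would surface the change of `os.domain_role` the morning after it happened, long before the PDC failed.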

NetIQ ConfigurationManager keeps a history of more than 200 configuration settings affecting performance. These settings range from the hardware through the operating system to the SQL Server instances themselves.

Capacity Management

With the constant demands and everyday firefighting involved in database administration, capacity management and planning are often ignored. Unfortunately, ignoring capacity management usually results in unpredicted and unbudgeted hardware requests.

The first best practice in capacity management is making sure current servers are used optimally. Are stored procedures or triggers consistently performing poorly? If so, direct your performance tuning efforts toward them to get the best return from available hardware.
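
If you can capture procedure execution durations, for example from a SQL Profiler trace, finding the consistently expensive procedures is a grouping exercise. This Python sketch assumes a hypothetical list of `(procedure_name, duration_ms)` pairs; it ranks procedures by total time consumed and ignores procedures with only a handful of runs, so that one slow call is not mistaken for a consistent problem.

```python
from collections import defaultdict
from statistics import median


def worst_procedures(executions, top=3, min_runs=5):
    """executions: (procedure_name, duration_ms) pairs, e.g. exported from a trace.
    Returns (name, runs, total_ms, median_ms) tuples, most total time first."""
    durations = defaultdict(list)
    for name, ms in executions:
        durations[name].append(ms)
    stats = [(name, len(d), sum(d), median(d))
             for name, d in durations.items()
             if len(d) >= min_runs]  # skip rarely run procedures: one slow call is not a pattern
    stats.sort(key=lambda s: s[2], reverse=True)
    return stats[:top]
```

Ranking by total time rather than by the single slowest call points tuning effort at the procedures that consume the most hardware overall.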

Of course, even with tuning, you need to address database growth. Businesses today have an insatiable need for data, and they want immediate access to that data. To understand how SQL Servers are growing, you need to gather long-term growth statistics on:

• Database growth
• Table growth
• Index growth
• Table fragmentation

Once you understand how table and index growth drive overall database growth over time, you can predict when hardware upgrades will be required.
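
Once growth statistics are collected, a straight-line fit gives a first estimate of when a database will outgrow its disk. The Python sketch below assumes roughly linear growth, which real databases often do not follow, so treat the result as a planning estimate; the function names, sample sizes, and capacity are hypothetical.

```python
import math
from datetime import date, timedelta


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need samples from at least two different dates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_full_date(samples, capacity_mb):
    """samples: (date, size_mb) pairs, e.g. monthly database sizes.
    Returns the date the fitted trend reaches capacity_mb, or None if it is not growing.
    A date in the past means the trend line is already above capacity."""
    start = min(d for d, _ in samples)
    xs = [(d - start).days for d, _ in samples]
    ys = [size for _, size in samples]
    slope, intercept = fit_line(xs, ys)  # slope is in MB per day
    if slope <= 0:
        return None
    days_until_full = (capacity_mb - intercept) / slope
    return start + timedelta(days=math.ceil(days_until_full))
```

For instance, a database measured at 10,000 MB, 10,300 MB, and 10,600 MB at 30-day intervals is growing 10 MB per day, so a 20,000 MB volume would be reached about 1,000 days after the first sample.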

DiagnosticManager highlights the databases with the greatest growth both in space and rows. Within each database, the product displays the tables that have been responsible for that growth.

Summary

This white paper covered the following best practices:

• Determine realistic user expectations
• Use tools to monitor performance and availability automatically
• Be able to recover quickly from outages
• Track changes to your environment
• Capture metrics for accurate capacity planning

Implementing these best practices through processes and tools will help you improve SQL Server production operations. You will achieve higher availability and performance while spending less time on routine and mundane issues.

For More Information

NetIQ’s SQL Management Suite is the industry’s most comprehensive solution dedicated to improving the production operations of Microsoft SQL Server. With the Suite, DBAs and IT professionals can improve performance and availability with automated operations, in-depth diagnosis, granular data recovery, and configuration management.

The SQL Management Suite consists of four products:

AppManager for SQL Server is the most widely adopted solution for automatically managing distributed SQL Servers from a central easy-to-use console. AppManager allows you to optimize performance, run pre-packaged management reports, and ensure availability through automated event detection and correction.

DiagnosticManager for SQL Server provides real-time performance and status information, enabling administrators to quickly diagnose and correct SQL Server problems. You will be able to quickly identify the root causes of problems and take action, reducing downtime and improving availability.

RecoveryManager for SQL Server provides real-time recovery using SQL Server transaction log analysis. DBAs can research the source of data changes and back out transactions in error, keeping applications available.

ConfigurationManager for SQL Server provides comprehensive change history and configuration reporting for SQL Servers. By notifying DBAs and NT administrators of changes to key database and system configuration settings, ConfigurationManager helps reduce downtime caused by less-than-optimal change control procedures.

The products are available both as individual products and as a bundled suite.

For more information on the SQL Management Suite and to download a free 30-day trial, please visit http://www.netiq.com/products/sql.

For more information on the Microsoft Operations Framework, please visit http://www.microsoft.com/sql/techinfo/administration/2000/default.asp.