
Chapter 4

Data Movement Process

CommVault Concepts & Design Strategies: https://www.createspace.com/3726838


Understanding how CommVault® software moves data within the production and protected environment is essential to configuring the physical and logical environments and to improving performance.

Data Movement Concepts

Chunks

CommVault software writes protected data to media in chunks. During data protection jobs, indexes and the CommServe database are updated as each chunk is written to media. For index-based jobs this creates points at which a job can be resumed if network, client, or Media Agent problems occur; the job can then continue from the last successfully written chunk. It also allows indexed jobs to be recovered up to the most recent chunk if the job fails to complete. This partial recovery is performed using the restore by job option, which recovers data up to the last successfully written chunk.

As a general rule, the larger the chunk size, the more efficient the protection operation. When jobs run over unreliable links, however, such as WAN backups, decreasing the chunk size may improve overall performance. If a disruption occurs during the job, any data written to media after the last chunk update to the index cache and CommServe server must be rewritten; a smaller chunk size limits the amount of data that has to be re-transmitted over the link. For reliable clients, Media Agents, and networks, a larger chunk size will improve performance.

Chunk sizes are determined by the job type being performed and the media being used. Depending on the media type, the default chunk sizes are as follows:

Disk storage uses 2 GB chunks.

Tape media writes chunks based on whether the job type is indexed or non-indexed:

o 4 GB chunks for index-based backups.

o 16 GB chunks for non-indexed backups.
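As a quick sketch, the defaults above can be expressed as a small lookup. The sizes are the defaults stated in this section; the function name and structure are illustrative, not a CommVault API:

```python
def default_chunk_size_gb(media_type: str, indexed: bool = True) -> int:
    """Return the default chunk size in GB described in this section.

    Disk always uses 2 GB chunks; tape uses 4 GB for index-based
    jobs and 16 GB for non-indexed jobs.
    """
    if media_type == "disk":
        return 2
    if media_type == "tape":
        return 4 if indexed else 16
    raise ValueError(f"unknown media type: {media_type}")
```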

Chunk size can be configured using the following methods:

Tape Media – Chunk size can be set globally in the Media Management applet in Control Panel.

Tape or Disk Media – Chunk size can be set at the data path level in the storage policy copy. This is done in the data path properties in the Data Path tab of the policy copy. Settings at the data path level override settings configured in Control Panel.


Blocks

The Block size setting determines the size of the blocks used when writing data to protected storage. It can be modified to meet hardware requirements and also to improve performance. The default block size CommVault software uses is 64 KB. Block size can be set in the data path properties in the Data Path tab of the storage policy copy. It is important to note that block size is hardware dependent. This means that increasing this setting requires all network cards, Host Bus Adapters, switches, operating systems, and drives to support the block size. Consider this aspect not only in the production environment but also in any DR locations where recovery operations may be performed.

Data Interface Pairs

Data Interface Pairs (DIPs) are used to explicitly define the physical IP network path data will take from source to target. This is done by specifying source and destination network interfaces using host name or IP address. When multiple paths from source to target exist, multiple DIPs can be configured, allowing multi-stream operations to use separate network paths for their streams. This permits the aggregate bandwidth of multiple physical source-to-target connections to improve data movement performance.

Data Interface Pairs can be configured in several different ways:

Job Configuration tab of client properties – can be used to configure source and target paths for a client.

Data Interface Pairs applet in Control Panel – can be used to configure source and target paths for clients and Media Agents.

DataIFPairs.exe – resource pack utility allows bulk entry of multiple DIPs using an answer file.
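To illustrate the concept, the following sketch models how a multi-stream job might spread its streams across configured DIPs in round-robin fashion. The host names and function are hypothetical illustrations of the behavior described above, not CommVault APIs:

```python
from itertools import cycle

# Hypothetical data interface pairs: (source interface, destination interface).
dips = [
    ("client-nic1.example.com", "ma-nic1.example.com"),
    ("client-nic2.example.com", "ma-nic2.example.com"),
]

def assign_streams_to_paths(num_streams, pairs):
    """Spread job streams across the configured DIPs round-robin,
    so the aggregate bandwidth of all physical paths can be used."""
    path = cycle(pairs)
    return [next(path) for _ in range(num_streams)]
```

With two DIPs and a four-stream job, streams alternate between the two physical paths, so each path carries two streams.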

The following diagram illustrates the use of data interface pairs from client to Media Agent for primary backups, and Media Agent to Media Agent when auxiliary copies run to generate secondary copies. When multi-streaming jobs, streams can use separate physical connections from source to destination to improve performance.


Data Streams

Data Streams are what CommVault software uses to move data from source to destination. The source can be production data or CommVault protected data. A destination stream will always be to CommVault protected storage. Understanding the data stream concept will allow a CommCell environment to be optimally configured to meet protection and recovery windows. This concept will be discussed in great detail in the following sections.

Primary Protection Streams

Primary data protection streams originate at the source file or application that is being protected. One or more read operations will be used to read the source data. Once the data is read from the source, it is processed by the iDataAgent and then sent to the Media Agent as Job Streams. The Media Agent then processes the data, arranges the data into chunks, and writes the data to storage in Device Streams.

The following diagram illustrates the stream movement process from source to destination. One or more read operations are performed on source data, which is then moved to the Media Agent as job streams. The Media Agent writes the data to protected storage as device streams.

Configuring Source Read Streams

Content requiring protection is defined within a subclient. Each subclient will contain one or more streams for data protection jobs. For most iDataAgents, it is possible to multi-stream subclient operations. Depending on performance requirements and how the data is organized in the production environment, multi-streaming source data can be done by adding more subclients or increasing the streams for an individual subclient.


Multiple Subclients

There are many advantages to using multiple subclients in a CommCell environment. These advantages are discussed throughout this book. This section will focus only on the performance aspects of using multiple subclients.

Running multiple subclients concurrently allows multi-stream read and data movement during protection operations. This can be used to improve data protection performance and, when using multi-stream restore methods, it can also improve recovery times. Using multiple subclients to define content is useful in the following situations:

Using multiple subclients to define data on different physical drives – This method can be used to optimize read performance by isolating subclient contents to specific physical drives. By running multiple subclients concurrently, each will read content from a specific drive, which can improve read performance.

Using multiple subclients for iDataAgents that don’t support multi-stream operations – This method can be used for agents such as the Exchange mailbox agent to improve performance by running data protection jobs on multiple subclients concurrently.

Using multiple subclients to define different backup patterns – This method can be used when the amount of data requiring protection is too large to fit into a single operation window. Different subclients can be scheduled to run during different protection periods, making use of multiple operation windows to meet protection needs.

Multi-Stream Subclients

For iDataAgents that support multi-streaming, individual subclients can be set to use multiple read streams for data protection operations. Depending on the iDataAgent being used, this can be done through the Data Readers setting or the Data Streams setting.

Data Readers

Data Readers determine the number of concurrent read operations that will be performed when protecting a subclient. By default, the number of readers permitted for concurrent read operations is based on the number of physical disks available; the limit is one reader per physical disk. If there is one physical disk with two logical partitions, setting the readers to 2 will have no effect. Having too many simultaneous read operations on a single disk could cause the disk heads to thrash, slowing down read operations and potentially shortening the life of the disk. The Data Readers setting is configured in the General tab of the subclient and defaults to two readers.


Allow multiple readers within a drive or Mount Point

When a disk array containing several physical disks is addressed logically by the OS as a single drive letter, the Allow multiple readers within a drive or mount point option can be used as an override. This allows a backup job to take advantage of the fast read access of a RAID array. If this option is not selected, the CommVault software will use only one read operation during data protection jobs.
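The reader rules above can be sketched as follows. This is an illustrative model of the behavior described, not CommVault code:

```python
def effective_read_streams(data_readers, physical_disks,
                           allow_multiple_per_drive=False):
    """Model of the Data Readers rules: by default one reader per
    physical disk is allowed, so the configured count is capped at
    the number of physical disks. The 'Allow multiple readers
    within a drive or mount point' override uses the configured
    count as-is (e.g., for a RAID array seen as one drive)."""
    if allow_multiple_per_drive:
        return data_readers
    return min(data_readers, physical_disks)
```

For example, two readers against one physical disk still yields one read stream unless the override is enabled.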

The following diagram illustrates a client using multiple readers defined through two subclients to multi-stream data protection jobs. Subclient 1 is defined using two data readers. Subclient 2 is defined with three data readers and Allow multiple readers within a drive or mount point enabled.

Data Streams

Some iDataAgents will be configured using data streams and not data readers. For example, Microsoft SQL and Oracle subclients use data streams to determine the number of job streams that will be used for data protection operations. Data Streams are configured in the Storage Device tab of the subclient. Although they will be configured differently in the subclient, they still serve the same purpose of multi-streaming data protection operations.

Source Read Streams Best Practices

Using multiple streams for data protection operations can provide better data protection and restore performance. Before modifying stream settings, consider the following best practice guidelines:

There should not be more than one stream configured for each physical drive. By default, CommVault software will automatically use one stream per physical drive regardless of the number of streams configured. However, if Allow multiple readers within a drive or mount point is selected, the software will use the number of streams specified. This can cause disk contention, which may slow down performance and shorten the physical life of the disks.


If multiple subclients are configured to read from the same physical disk, consider scheduling the subclients to run at different times to prevent contention on disks.

Only increase the number of streams to help meet protection windows. Multi-streaming data protection jobs is done to improve performance; if windows are being met, there is no reason to alter streams. Every extra stream configured on source data will use a corresponding stream in the protected environment. Using too many read streams may result in other jobs not being able to run until storage stream resources become available.

When a job runs, it can only use the number of streams that are currently available in the storage environment. This means that if a subclient is configured to use four streams but the Media Agent and storage only have two streams available, the job will use only two streams. You can use the advanced backup option Reserve Resources Before Scan to reserve the number of streams configured for the job to ensure adequate streams are available. This option should only be used for mission critical jobs, as the streams reserved will remain locked for the duration of the job.

When multi-streaming a subclient for MS-SQL, DB2, or Sybase, the streams cannot be multiplexed to a single tape. Each stream will have to be written to a separate tape, and there must be an equivalent number of drives available to restore all streams concurrently during restore operations. The streams can be combined to a tape during secondary copy auxiliary jobs, but they would then have to be pre-staged to a disk library prior to being recovered. If multiple databases are being protected, it is recommended to use separate subclients for different databases if performance needs to be improved. When single-stream operations of multiple subclients are used, the streams can be multiplexed or combined to tape.

Job Streams

Job Streams for primary data protection jobs are network streams running from the client to the Media Agent. The number of concurrent job streams that can run in an environment is based on the number of streams Media Agents are configured to accept, the number of streams a library will accept, and the number of storage policy device streams configured. The number of job streams a Media Agent will accept is determined by the Maximum number of parallel data transfer operations setting configured in the General tab of the Media Agent properties. This option defaults to 100 streams and is dependent on the Optimize for concurrent LAN backups option being selected in the Control tab. That option is enabled by default, and it is recommended not to change this setting. Library and storage policy device stream configuration will be discussed in detail in the next section.

Viewing the number of streams per job in the Job Controller

Fields in the job controller can be customized to show additional information. The fields Number of Data Readers and Data Readers in Use can be added to view the number of job streams being attempted and used for each job. Refer to CommVault documentation for more information on customizing the Job Controller.

Device Streams

As Job Streams are received by the Media Agent, data is put into chunk format and is written to media as Device Streams. The number of device streams that can be used will depend on the library type, library configuration, and storage policy configuration.

Tape Library Device Streams

For tape libraries, one sequential write operation can be performed to each drive. If there are eight drives in the library, then no more than eight device streams will be used. By default, each job stream will write to a device stream. To allow multiple job streams to be written to a single tape drive, multiplexing can be enabled. The multiplexing factor determines how many job streams can be written to a single device stream. If a multiplexing factor of four is set and there are eight drives, a total of thirty-two job streams can be written to eight device streams.
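The arithmetic above can be sketched as a one-line model (an illustration of the rule, not a CommVault API):

```python
def max_job_streams(drives, multiplexing_factor=1):
    """For a tape library, each drive accepts one device stream;
    multiplexing lets several job streams share each device stream,
    so the job stream ceiling is drives * multiplexing factor."""
    device_streams = drives  # one sequential write per drive
    return device_streams * multiplexing_factor
```

With eight drives and a multiplexing factor of four, up to thirty-two job streams can run; with no multiplexing, the limit is eight.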

The following diagram illustrates multiple job streams being multiplexed into device streams within the Media Agent. Multiplexing to tape libraries can improve write performance by keeping drive buffers filled, allowing the drives to write faster.


Disk Library Device Streams

For disk libraries, the number of device streams is based on the total number of mount path writers for all mount paths within the library. If a disk library has two mount paths with ten writers each, a total of twenty device streams can write to the library. It is important to note that since disk libraries allow multiple concurrent write operations, multiplexing is not recommended. By increasing the number of mount path writers, more job streams can be written to device streams on a one-to-one ratio. If network, Media Agent, and disk resources are adequate, increasing the number of writers for a mount path will have a positive effect on data protection performance.

The following diagram illustrates four job streams writing to a disk library with two mount paths. Each job stream equates to a device stream when writing to disk libraries.
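The disk library rule above is simply a sum over mount paths; the following minimal sketch models it (illustrative, not a CommVault API):

```python
def disk_library_device_streams(writers_per_mount_path):
    """Total device streams for a disk library is the sum of the
    configured writers across all of its mount paths."""
    return sum(writers_per_mount_path)
```

Two mount paths with ten writers each yield twenty device streams, matching the example in the text.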

Storage Policy Device Streams

Device streams are configured in the properties of the storage policy. The general rule of thumb is that the number of device streams configured in a storage policy should always equal the number of drives or writers of all libraries defined in the storage policy primary copy. Configuring fewer streams can be used to throttle parallel throughput, but that does not make maximum efficient use of the devices, and there are other means to restrict allocation of devices. If the number of device streams is greater than the total number of resources available, no benefit will be gained. The CommVault software uses a throttling mechanism to always use the lowest stream value throughout the data movement process.
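The throttling rule — always use the lowest stream value along the data movement path — can be modeled as a simple minimum. This is a sketch of the behavior described, with illustrative parameter names:

```python
def usable_streams(job_streams, media_agent_limit,
                   library_streams, policy_device_streams):
    """The stream count actually used is throttled to the lowest
    value along the path: subclient job streams, the Media Agent's
    parallel transfer limit, the library's drives/writers, and the
    storage policy's configured device streams."""
    return min(job_streams, media_agent_limit,
               library_streams, policy_device_streams)
```

For example, a subclient configured for four streams against a library with only two available streams runs with two, as described earlier in this chapter.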

Moving Data from Production to Protected Storage

This section provides a brief overview of running data protection jobs. For more information and step-by-step guides, use CommVault online documentation. There are three methods for running data protection jobs for production data:

Job scheduling

On-demand jobs

Scripting


Job Scheduling

Jobs are scheduled for subclients through a dedicated schedule at the subclient or data set level, or through the use of schedule policies. Whichever method is used, it is important to note that jobs always run at the subclient level. If a schedule policy is being used to back up five client file systems and each client has two subclients, a total of ten jobs will run when the policy executes.

On-Demand Jobs

On-demand jobs can be run at the subclient or data set level. On-demand jobs run immediately.

Scripting

Scripts can be manually created or automatically generated through the CommCell console. Once a script is created, it can be run at the client by manually executing the script or by calling the script from a separate scheduling mechanism or another script. When a data protection script executes, it will contact the CommServe server and then, based on the script's parameters, the CommServe server will execute the job.

Secondary Copy Streams

For most data centers, performance is the key requirement when performing primary backups from the production servers. When copying data to secondary copies, media management becomes the primary focus. Grouping data with like retention and storage requirements allows for more efficient media management, and using multiple secondary copies allows data with like requirements to be managed on the same media set. Options such as combine to streams and multiplexing secondary copies improve overall performance and media management.

Multiple Secondary Copies

One method for grouping data with like protection requirements is to use multiple secondary copies. A month-end selective copy for compliance reasons and a daily synchronous copy for off-site DR can be used to separate data to meet different retention and media management requirements.

By associating subclients with secondary copies, the data will be copied and managed in subclient groups using the same media sets. When multiple subclients are associated with a copy, and when subclients are using multiple streams, the management of the streams on media and the movement of the streams becomes important. If the secondary copy is not configured correctly, media management requirements will not be met. One method to properly consolidate and manage streams in secondary copies is to use the combine to streams option.

Combine to Streams

A storage policy can be configured to allow the use of multiple streams for primary copy backup. Multi-streaming of backup data is done to improve backup performance. Normally, each stream used for the primary copy requires a corresponding stream on each secondary copy. In the case of tape media for a secondary copy, multi-stream storage policies will consume multiple media. The combine to streams option can be used to consolidate multiple streams from source data onto fewer media when secondary copies are run. This allows for better media management and the grouping of like data onto media for storage.

Example: You back up a home folders subclient to a disk library using three streams to maximize performance. The total size of protected data is 600 GB. You want to consolidate those three streams onto a single 800 GB capacity tape for off-site storage.

Solution: By creating a secondary copy and setting Combine to Streams to 1, you will serially place each stream onto the media.

A combine to streams setting of 1 will take streams A, B, and C and serially write them to one tape.
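As a quick sanity check on the example, streams written serially fit on one tape only if their total size fits the tape's capacity. The sketch below is a simplified model that ignores compression and media overhead; the function and stream sizes are illustrative:

```python
def fits_on_tape(stream_sizes_gb, tape_capacity_gb):
    """Streams combined serially onto one tape fit only if their
    total size is within the tape's capacity (compression and
    media overhead ignored in this simplified check)."""
    return sum(stream_sizes_gb) <= tape_capacity_gb

# Three streams totaling 600 GB onto one 800 GB tape, per the example.
home_folder_streams = [200, 200, 200]
```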

In some cases, using the combine to streams option may not be the best method to manage data. Multi-streaming backup data is done to improve performance. When those streams are consolidated to the same media set, they can only be recovered in a single-stream operation. Though combining to streams has a media consolidation benefit, it will have a negative effect on restore performance.

Another reason not to combine to streams is for multi-streamed backups of SQL, DB2, and Sybase subclients. When these iDataAgents use a single subclient with multi-streaming enabled, the streams must be restored in the same sequence they were backed up in. If the streams are combined to the same tape, they must be pre-staged to disk before they can be recovered. In this case, not enabling combine to streams and placing each stream on separate media will bypass the pre-staging of the data and also allow multiple streams to be restored concurrently, making the restore process considerably faster. Note that this only applies to subclients that have been multi-streamed. If multiple subclients have been single-streamed and combined to media, they will NOT have to be pre-staged prior to recovery.


Moving Data to Secondary Copies

When you want to back up production servers, you either schedule the job at the client level or through a schedule policy. In either case you choose which server you want to back up. Once data is in the backup environment, it is no longer tied to the production server; instead, it is managed by the storage policy. The following methods can be used to move data to secondary copies:

Auxiliary copy

o Schedule

o On demand

o Save as script

o Automatic copy

Inline copy

Parallel copy

Deferred copy

Auxiliary Copy Operations

Before discussing auxiliary copy operations, a very important distinction in terms must be made. I am referring to the difference between secondary copies and auxiliary copy operations. Secondary copies allow you to configure where the data will go (data path), how long it will stay there (retention), and what data you want to copy (subclient associations). An auxiliary copy operation determines when the data will be copied (scheduler), plus media and resource management options such as the CommVault VaultTracker™ feature.

An auxiliary copy operation can be scheduled, run on demand, saved as a script, or set as an automatic copy. When configuring auxiliary copy operations, there are several options you can configure:

Allocate the number of drives to use during the auxiliary copy

Which secondary copies you want to include in the auxiliary copy

Start new media and mark media full, which can be used to isolate jobs on media

VaultTracker™ options, which can be used to export and track media using VaultTracker™ policies and reports

Job priorities to assign different job priorities for auxiliary copies

Automatic Copy

Most jobs run once per day, and a normal schedule can be used for auxiliary copies. The automatic copy allows you to set a check interval for source data to be copied. This can be a great advantage when jobs are being run multiple times per day or if you are unsure when the source data will be available for copy.


Example: A critical database is running transaction log backups every four hours. You want to run an auxiliary copy of the source transaction logs to a secondary location, in this case an off-site disk library.

Solution: Schedule the transaction logs to back up every four hours. Then set the automatic copy option to check for source data. If source data is present, the auxiliary copy will run, creating an additional copy of the data.
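The automatic copy behavior described above can be modeled as a simple check-interval routine: at each check, copy whatever uncopied source data exists, otherwise do nothing. This is an illustrative sketch of the logic, not the actual scheduler:

```python
def automatic_copy_check(uncopied_jobs):
    """Model of one automatic-copy check interval: if uncopied
    source data is present, an auxiliary copy runs and the jobs
    are copied; if not, nothing happens until the next check."""
    if uncopied_jobs:
        copied = list(uncopied_jobs)   # aux copy picks up all pending jobs
        uncopied_jobs.clear()          # they are no longer pending
        return copied
    return []
```

A four-hourly transaction log schedule paired with such a check ensures each new log backup is copied shortly after it lands, without hand-tuning a second schedule.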

Inline Copy

The Inline Copy feature allows you to create additional copies of data at the same time you are performing primary backups. This feature can be useful when you need to get two copies of data done quickly. Data is passed from the client to the Media Agent as job streams. The Media Agent then creates two sets of device streams, each going to the appropriate library. This can be a quick method for creating multiple copies, but there are some caveats:

Inline Copy is not supported if Client Side Deduplication has been enabled.

If the primary copy fails, the secondary copy will also fail.

Since both copies are made at the same time, twice as many library resources will be required, which may prevent other jobs from running.

Since backup data is streamed, data will be sent to both libraries simultaneously, which may cause overall performance to degrade. Basically, your job will run only as fast as the slowest resource.

Inline copy receives two streams from the client server and sends those streams to two different libraries.


Parallel Copy

A parallel copy will generate two secondary copy jobs concurrently when an auxiliary copy job runs. Both secondary copies must have the Enable Parallel Copy option enabled, and the destination libraries must be accessible from the same Media Agent.

Two secondary copies are run in parallel through the Media Agent.

Deferred Copy

Deferring an auxiliary copy will prevent a copy from running for a specified number of days. Setting this option will result in data not aging from the source location, regardless of the retention on the source, until the auxiliary copy is completed. This option is traditionally used in Hierarchical Storage Management (HSM) strategies where data will remain in a storage policy copy for a certain period of time. After that time period, the data will be copied to another storage policy copy and deleted from the source once the copy is completed. Although this method was implemented because traditional HSM solutions worked this way, with CommVault software it is recommended to copy data to multiple HSM copies to provide for disaster recovery as well as HSM archiving. This concept will be discussed in more detail in the Compliance, Information Management & eDiscovery chapter.
