Today I will cover the following topics:
• The changes that have occurred to mainframe storage technology since DB2 was introduced
• The various IBM copy technologies, as well as EMC's and Hitachi's
• How IBM's and BMC's utilities exploit these technologies
2
This section is not meant to be an in-depth, cover-every-detail presentation of
mainframe storage. It is meant to be an overview of the drastic changes that
have occurred in mainframe storage technology since DB2 V1 was released.
When DB2 was released in 1983, the IBM 3330, IBM 3350, and IBM 3380
storage devices were in wide use. The drive capacities were 100 to 200MB,
300MB, and 1.26GB per drive respectively. As each model was released, IBM
was able to reduce the relative footprint per megabyte, as well as the
power consumption per megabyte. They were connected to the mainframe
system using bus and tag channel cables with a maximum distance of 400 feet (122
meters). Compare this to now, where the storage offerings include the
IBM DS8800 series of storage systems that can hold up to 3 petabytes of
storage depending on how they are configured. They can be connected to mainframes,
UNIX servers, and Windows servers. With FICON and Fibre Channel
connections, the maximum distance has been extended to the point that the
computers connected to the storage can be in another building across town.
3
This picture is meant to bring home just how much storage technology has
changed.
The IBM Mass Storage cartridge holds 50MB, half of a 3330-1, and required
large, complex hardware to be able to use it.
The Samsung 128GB micro SD card came from one of my music players and
fits in my hand. The card holds about 1,000 FLAC music files and is still well
over half empty.
4
Storage used by DB2 in the 1980s had these common characteristics:
• Each mainframe disk device was a single physical drive
• Since these physical drives tended to be rather large, the cabinets that
housed them required a bit of floor space
• This space requirement meant that data centers with large storage
requirements were also very large. It was not uncommon for disk storage to
be housed on multiple floors above and below the computer room. This
allowed large amounts of storage (for that time) to be connected to the
mainframe and still be within the 400 foot (122 meters) limit of the bus and
tag cables.
• Hard disk failures, known as "head crashes", always meant an outage. At a
minimum an application was down; in the worst case, the mainframe that
needed that drive was down. The outage was usually extended because
the field engineer had to determine exactly what was wrong. If they were
fortunate enough to have the correct parts on hand, they could begin the
necessary repairs. If not, they had to have the parts shipped or
delivered from another location. Once the repairs were performed, the
recovery process began.
5
Later, cache storage was added to the storage controller for the 3380 storage
line. With the updated hardware, and microcode updates to support it,
read response times were improved for data that was regularly accessed.
1989 saw the introduction of the IBM 3990/3390. Drive capacity was
increased to 946MB for the 3390-1 and increased again for the 3390-2, 3390-3,
and 3390-9. The amount of cache was increased to allow more data to be
staged in cache. Non-volatile storage (NVS) cache was introduced at this time, which
allowed cached data to be updated without having to write back to the disk
immediately. The NVS cache used a battery backup to retain the data in the
case of a power failure.
ESCON channels were introduced, which used fiber optic technology to
communicate between the mainframe and the controllers. This brought two major
improvements. The first was that bus and tag cables were no longer needed
for devices that supported ESCON. The second was that the 400 foot limitation
was no longer an issue. With the addition of chained ESCON directors, disk
storage and the mainframe could be as far as 60 kilometers (approx. 37
miles) apart.
6
The entire approach to mainframe storage was changed with the
announcement of StorageTek's Iceberg and the arrival of EMC's
Symmetrix line of disks.
By using small form factor server drives, the floor space required per
megabyte was drastically reduced. Power requirements were reduced as well.
Using a large memory cache to hold the emulated volume images meant that
reads and writes could be handled faster. Multiple processors run software
to handle I/O from the mainframe, to manage the cache memory and volume
emulation, and to read and write to the arrays of drives that store the
actual data. This meant a single enclosure could conceptually be connected to a
mainframe, a UNIX server, and a Windows server and provide each with its
storage requirements.
Setting up the drives that actually store the data in RAID arrays, along with
the use of hot spares, helped address catastrophic disk failure. If a drive
should fail, the field engineer could be dispatched to replace it and everything
could continue operating without an outage.
7
While each mainframe storage vendor has its own implementation of
"intelligent storage", they all share some common characteristics:
• They use small form factor drives, from SATA drives like you have in your
PC to solid state disks
• The drives are configured in some sort of RAID array with hot spares for
redundancy
• They all have large cache memories
• They all use multiple processors to manage I/O with the mainframe,
manage the cache memory, and manage the physical drives
Instead of a terabyte of data requiring lots of floor space, petabytes of storage
now reside in an enclosure the size of a small walk-in closet.
Additionally, the migration from channels using bus and tag cables to channels
using ESCON improved speeds and allowed the storage to be located farther
away. The migration from ESCON to FICON increased the speed at which data is
transferred, increased the number of concurrent I/O operations, and increased
the distance the storage can be from the mainframe.
8
As the processing power, memory and internal communication grew, more copy
functions were moved to the storage controller. Now you can keep volumes in sync
within the data center, between data centers, as well as keeping the disaster recovery
site up to date. These functions are being done at the intelligent storage level rather
than using mainframe cycles. You can also make an on demand backup of data sets or
groups of data sets that you can use to recover quickly should a problem be discovered
during a scheduled batch update.
While IBM supplies a mechanism to perform these functions via Advanced Copy
Services, and all the storage vendors support these functions in some form or fashion,
all the storage vendors feel compelled to use different terminology for what is
effectively, if not exactly, the same thing. The rest of this presentation is an attempt to
cover what features each storage provider offers, the marketing names they use
for these features, and how they relate to the Advanced Copy Services functions.
In addition, the use of small form factor drives has allowed the support for SSD, SAS,
and SATA drives. This has enabled vendors to exploit automated tier software to
migrate the data to the appropriate performance drive based on customer defined
criteria. This is done within the controller in the background without any overt action
on the part of the application accessing the data.
The last section will cover how DB2 utilities exploit these features.
9
This next topic will cover the various disaster recovery and data migration
offerings from IBM.
These are covered in depth in the DFSMS Advanced Copy Services manual
(SC23-6847-00).
11
Peer to Peer Remote Copy is one feature described in the IBM’s DFSMS
Advanced Copy Services manual. It allows one volume to be kept in sync
with another volume within a data center or between data centers. The source
volume or primary is kept in sync with the target volume or secondary as long
as the PPRC relationship is kept active. This is known as a duplexed pair. The
actual copy operations are handled at the storage hardware level.
The name Metro Mirror was coined when the technology allowed larger distances
between storage boxes without impacting performance. This was important because
Metro Mirror uses synchronous PPRC, meaning a write I/O is not complete
until the secondary volume has acknowledged it. This is good
because the data is safely mirrored. The downside of synchronous PPRC is that
as the distance between the storage devices increases, applications waiting on
that write I/O to complete are impacted. Latency increases because, at best, the
signal cannot travel faster than the speed of light.
The maximum recommended distance is 100 kilometers (62 miles) for ESCON and 200
kilometers (124 miles) for FICON.
12
This diagram is an example of a PPRC or Metro Mirror setup.
The DS8800 disk subsystem being used by Production has devices that
are in a PPRC/Metro Mirror relationship with the DS8800 subsystem used
by the Disaster Recovery site.
In the case of a DR event, they can run the CRECOVER jobs at the DR site, run
any additional recovery jobs for the database subsystems, and then begin
running at the DR site.
I do want to point out that there will need to be additional recovery operations
in this scenario. It would be quite surprising if the failure that caused the DR
event happened at a point where you could safely bring up your database
subsystems without having to run various recovery jobs to ensure data
integrity.
13
The CQUERY command is used to determine the PPRC status of a device.
The CESTPAIR command establishes the duplex pair between two devices.
The CSUSPEND command suspends the duplex operation between the two
devices.
The CRECOVER command sets the status of the secondary device to simplex,
where it can then be brought online to another system.
These commands require the following information: the Storage Subsystem
ID (SSID), storage control serial number, Channel Connection Address
(CCA), and Logical Subsystem (LSS) of the primary and secondary devices.
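As a sketch of how these commands fit together (all device numbers, serial numbers, SSIDs, and LSS values below are hypothetical, and the exact operand syntax should be verified against the Advanced Copy Services manual), a TSO sequence might look like this:

```
CQUERY   DEVN(X'8000')

CESTPAIR DEVN(X'8000')                -
         PRIM(X'3000' 12345 X'00' X'00') -
         SEC(X'4000' 67890 X'00' X'01')  -
         MODE(COPY)

CSUSPEND DEVN(X'8000')                -
         PRIM(X'3000' 12345 X'00' X'00') -
         SEC(X'4000' 67890 X'00' X'01')

CRECOVER DEVN(X'9000')                -
         PRIM(X'3000' 12345 X'00' X'00') -
         SEC(X'4000' 67890 X'00' X'01')
```

CQUERY shows the current pair state; CESTPAIR establishes the duplex pair with a full initial copy; CSUSPEND pauses mirroring; and CRECOVER, issued at the recovery site, places the secondary in simplex so it can be varied online.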
14
XRC was created to address high availability disaster recovery. This is done
through the use of the system data mover, journal data sets, high speed
communications, storage hardware that has XRC architecture support, and
similar configurations at the remote site.
It does not have to be a single vendor environment, but the storage hardware
must have similar configurations, i.e. cache size, NVS size, number of devices,
and similar capacity.
Like using DB2's system level backup, XRC requires careful planning and
specialized configurations to ensure it works as expected.
XRC is now called z/OS Global Mirror (formerly Global Mirror for zSeries).
15
Global Mirror uses PPRC-XD to keep primary volumes in sync with
secondary volumes, using asynchronous communication over much greater
distances than Metro Mirror supports.
Flashcopy is used to create point in time copies of the secondary volumes at
the remote site.
Consistency groups are set up to tell the software which group of volumes
belongs to some application or even some database subsystem, to ensure that all
writes to any member of the group are dependent on the other writes within the
group.
Finally, there is also additional software to manage, monitor, and control the
activity, to ensure that the data at the remote site is operationally
consistent.
This is not intended to be an in depth look at how Global Mirror works, but
rather a high level view of a rather complex process.
16
Consistency groups are groups of PPRC duplexed volumes that comprise an
application or database subsystem. When defined as a consistency group, all
write I/Os to those volumes are treated as dependent upon one another. When a
normal PPRC duplexed pair that is not in a consistency group gets an error, it
changes to a suspended state and write I/Os continue to occur to the primary
volume. This changes with a consistency group. In that situation, when an error
occurs that would prevent the write from being successful, all of the devices
are put into a long wait state and all I/Os to those devices are held. This
keeps any inconsistent updates from being propagated to the remote devices.
The reality is that some of the inconsistent I/Os may have been transmitted,
but the Global Mirror software and process has a way of resolving that
situation.
17
Slide with automation:
1. Write to local; write complete to application.*
2. Autonomically, or on a user-specified interval, a consistency group (CG) is formed on local.
   * If a write comes in to local, write complete is held only until the CG is formed on local (fast).
3. The CG is sent to remote via Global Copy (drain). If writes come in to local, the IDs of tracks with changes are recorded.
4. After all consistent data for the CG is received at the remote site, a FlashCopy with 2-phase commit occurs.
5. Consistency complete to local.
6. Tracks with changes (after the CG) are copied to remote via Global Copy, and FlashCopy Copy-on-Write preserves the consistent image.
18
Global Mirror keeps track of the updates to the consistency groups that have
been propagated, so that should an error occur it can use the consistent
Flashcopy image to restore the remote volumes to the consistency point.
These Flashcopy volumes are not brought online to the remote system. They
are used to maintain the latest consistency point.
20
This next section covers Flashcopy, a copy process that provides a
point in time copy of extents from a source volume to a target volume.
The range of extents can run from the first track of the volume to the last.
It can also be a subset, or even a group, of extents.
This operation can occur between any pair of volumes within the same
enclosure.
I want to emphasize that Flashcopy works on extents, i.e. ranges of tracks. It
is utilities like DFSMSdss and Innovation's FDR that turn a Flashcopy data set
request into a range of extents.
Once a Flashcopy establish request is issued, a background copy is started by
default, and the Flashcopy relationship is ended once all tracks have been
copied from the source to the target.
If NOCOPY is specified, no background copy is initiated, but tracks will be
copied as they are updated on the source. A write to the source will be held up
long enough to copy the track. The Flashcopy relationship will remain active
until all tracks are copied or a Flashcopy withdraw command is issued.
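As an illustration of how a utility turns a data set request into a Flashcopy, here is a sketch of a DFSMSdss job that requires fast replication (the data set names are hypothetical; see the DFSMSdss reference for the full keyword list):

```
//FCCOPY   EXEC PGM=ADRDSSU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  COPY DATASET(INCLUDE(PROD.DB2.DATASET))     -
       RENAMEUNCONDITIONAL(TEST.DB2.DATASET) -
       FASTREPLICATION(REQUIRED)
/*
```

FASTREPLICATION(REQUIRED) tells DFSMSdss to fail the copy rather than fall back to slow track-by-track movement; FASTREPLICATION(PREFERRED) is the default and allows the fallback.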
21
This slide is an attempt to describe what happens with Flashcopy.
When a data set copy is started with FASTREPLICATION(REQUIRED), a background copy is
started within the storage controller.
If a write occurs to the source before the background copy has copied that track, the
controller will copy the track before allowing the write to the volume.
It is also possible to begin using the target data set while the background copy is running.
If a read occurs to a track that has not yet been copied, it will be retrieved from the
source data set.
Once all of the tracks have been copied from the source to the target, the Flashcopy
relationship is ended.
If the NOCOPY keyword is specified, the background copy does not occur. Instead, as
each source track is written, the track data as it was before the update is copied to
the target location.
Other things to note: volume flashcopies must be between like devices, just as
physical data set flashcopies must be. Logical data set flashcopies can occur between different
device types as long as extended format data sets are copied to an extended format
SMS managed volume. See the DFSMSdss manual for considerations and exceptions.
My thanks to the tsmguru.com web page for the example I have presented in this slide.
22
There are other types of flashcopy besides volume flashcopy and data set
flashcopy.
Space Efficient Flashcopy involves special volumes called space efficient
volumes. The difference between space efficient volumes and regular volumes
is that space efficient volumes share the same physical repository, or pool, of
storage. Since the only data moved is the pre-update data, the actual
physical storage needed should be much less. This would be useful if you want
to do volume backups to tape at a specific point in time. You cannot do a data
set space efficient flashcopy, as it is a volume operation only.
Incremental Flashcopy reduces the amount of time the flashcopy relationship
is active. For instance, when you take a volume flashcopy from VOL1 to TARGET1,
the background copy is initiated. Once some batch cycle has completed and
everything is verified as good, you can take an incremental flashcopy before
the next batch cycle runs, so that point in time is saved should you need to
recover back to it.
23
What problem is Remote Pair Flashcopy solving?
If flashcopy is used to make copies of DB2 tables to be migrated to another
subsystem, and the target volume or volumes are PPRC mirrored for DR
purposes, the DR site will be out of sync during the flashcopy operation
because the duplexed pair is suspended. Once the flashcopy is complete, the
volumes are resynced. The problem is that this causes all of the data that
was flashcopied to be sent along the link to the remote volume. If this is
happening with multiple volumes, it is possible to flood the link and take even
longer for the remote site to be synced up.
24
In this slide, a DB2 copy is run using Flashcopy from the primary device to
another device within the same controller.
Drive A and drive A' are a duplex pair.
Drive B and drive B' are a duplex pair.
A flashcopy occurs from drive A to drive B, and the PPRC relationships are
suspended.
Once the flashcopy is complete, A and A' and B and B' are set back to duplex,
and both pairs are out of sync.
All of the data that was copied from A to B, plus any additional updates to B,
must be written to B'. Any updates to drive A will also have to be written to
drive A'.
While these updates are being written, the remote site is not an exact mirror of
the local site. If the local site should experience an outage, the remote site
would not have the current data.
25
The problems described in the previous slide are solved by Remote Pair Flashcopy.
It is enabled by including the FCTOPPRCPRIMARY keyword, with PMR or PMP specified, in the DFSMSdss copy statement.
The requirements for remote pair flashcopy to work are that both the source volume and the target volume are mirrored at both the local and remote sites, and are in the same enclosure at each location.
If they are, the flashcopy command is sent to the remote storage subsystem and performed there while it is being executed on the local subsystem.
This addresses the flooding of the link, because the only data sent is the flashcopy command itself. Also, both local and remote sites remain in sync.
When preserve mirror preferred (PMP) is specified and either one of the volumes is not mirrored, or some other condition would prevent remote pair flashcopy from being successful, the target volume will be suspended if it is mirrored.
When preserve mirror required (PMR) is specified and either one of the volumes is not mirrored, or some other condition would prevent remote pair flashcopy from being successful, the copy will fail.
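A minimal DFSMSdss sketch of a full volume copy using remote pair flashcopy (the volume serials are hypothetical). In the DFSMSdss syntax the preserve mirror options are spelled PRESMIRREQ, PRESMIRPREF, and PRESMIRNONE, corresponding to the PMR and PMP behavior described above:

```
//RPFC     EXEC PGM=ADRDSSU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  COPY FULL INDYNAM(SRC001) OUTDYNAM(TGT001) -
       FASTREPLICATION(REQUIRED)             -
       FCTOPPRCPRIMARY(PRESMIRREQ)
/*
```

With PRESMIRREQ, the copy fails outright if the remote pair operation cannot be performed, which keeps the DR mirror from ever being silently suspended.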
26
In this slide, a DB2 copy is run using Remote Pair Flashcopy from the primary
device to another device within the same controller.
Drive A and drive A’ are a duplex pair
Drive B and drive B’ are a duplex pair
A Flashcopy occurs from drive A to drive B and FCTOPPRCPRIMARY(PMP)
was specified.
The Flashcopy request is sent to the remote controller and is performed on the
PPRC secondary volumes.
When both Flashcopies complete both sites are in sync. The only traffic across
the link was the Flashcopy request and any regular updates that have occurred
to the primary volumes.
27
EMC introduced the first Symmetrix product in 1990 with a capacity of 24
drives. This was a departure from the IBM 3390 type drives sold by IBM and
the other storage vendors of the time.
RAID-1 arrays were used to provide redundancy in case of drive failure. To
enhance performance, both drives were used for reads and writes.
EMC provided a way to create a temporary drive image through the use of the
Business Continuance Volume (BCV). This drive image could then be used to
take volume backups on another system or to make that data available on a test
system. It could also be used to mirror to an SRDF volume to migrate a point
in time copy of the data to another site.
As we go through this section you will see that many of the functions EMC
provides seem similar to those provided by IBM’s Advanced Copy Services.
There are differences though.
31
In 1994 EMC introduced the Symmetrix Remote Data Facility (SRDF), which allowed
customers to replicate their data from one enclosure to another. With SRDF/S
the data was written synchronously, meaning the host write was not complete
until the cached image of the remote device was updated. Like PPRC,
performance was impacted as the distance increased.
With asynchronous SRDF the data is sent to the remote site in intervals. Although
this means that the remote device is not an exact mirror of the local device,
this technique allows them to be at greater distances from one another and
requires less bandwidth than SRDF/S.
32
SRDF/Star extends SRDF so that SRDF replication can be one data center to
many data centers or many data centers to one data center.
Consistency Group functionality is provided to a set of SRDF devices with
SRDF/CG
33
The various software products under the Timefinder name all perform local
replication services.
TF/Mirror uses the pool of BCVs to create temporary copies of volumes.
TF/Clone is the former EMCSNAP product. It can perform snapshot copies of data sets or
volumes. In a RAID-5 or RAID-6 environment, the TF/Mirror operation is
converted to a TF/Clone call when using clone emulation mode.
The TF/CG product is the Consistency Group component.
34
To create a mirrored pair, you issue an establish command with the BCV device number and the source device number.
When you wish to use the BCV image for local backups, or to bring it online to another system, you issue the split command specifying the BCV or BCVs to be split. Unlike PPRC, the BCV can be brought online without issuing an additional command to the control unit.
The query command provides information about the state of the device, including whether it is mirrored and to what.
The re-establish command will mirror the BCV to the source again and copy any updated tracks to the BCV.
The restore command copies the data from the BCV back to the source volume. One example where this would be useful is if you are testing a database migration. After you have mirrored all of the volumes used by the DB2 subsystem while it is down, you can split them and run through the migration process. To roll it back, bring down the DB2 subsystem and issue a restore for all the volumes, and you can start all over again.
The incremental restore differs from the restore because it only restores the changed tracks. The controller keeps track of any source tracks that have been updated. If an incremental restore command is issued, the only data copied back are the BCV tracks that correspond to the updated source tracks.
35
EMC's EMCSNAP product was rebranded as TF/Clone.
It provides the ability to create point in time copies of data sets or volumes and
does not require BCVs. As mentioned before, when TF/Mirror operations are
performed in a RAID-5 or RAID-6 environment with clone emulation, or at
Enginuity level 5874 or higher, TF/Clone is used instead.
TF/Clone is also used to create space efficient snapshots using virtual devices,
or VDEVs. Virtual device snapshots track the pre-updated data through the
use of pointers and tables, using the SNAPPOOL to store the physical data.
The VDEVs are actual devices that can be brought online and used by the OS.
SNAPPOOL devices are not available to the OS.
The TF/Clone operations can be found in the Mainframe Enablers TimeFinder/Clone
Mainframe Snap Facility Product Guide.
As with SRDF and TF/Mirror, TF/Clone also supports consistency groups.
36
Over time, due to customer demand, EMC has added support for the Advanced
Copy Services APIs. This allows customers to use a common command set to
manage all of their DASD, rather than one set for EMC, one for IBM, and one
for Hitachi.
EMC enables different features in the storage controller's configuration. The
Compatible Peer feature and Extended Compatible feature provide PPRC and
XRC support. The underlying actions still use SRDF and other existing
EMC functions. Flashcopy is supported natively as of Enginuity level 5773.
Remote Pair Flashcopy is supported on the latest VMAX subsystems: the
VMAX 20K and 40K running Enginuity 5876.251.161.
Running EMC in a native Global Mirror setup is not supported.
Also, you do not need to run EMC's SCF started task to perform native
Flashcopy.
37
Hitachi TrueCopy is Hitachi's metro mirror, or synchronous PPRC, between two
physical storage subsystems.
Hitachi ShadowImage is a version of PPRC used within the same storage
subsystem.
A volume can have one TrueCopy secondary and up to two ShadowImage
secondaries. It can have up to three ShadowImage secondaries if there is no
TrueCopy mirror.
Hitachi Universal Replicator is a remote replication technology that uses
asynchronous communication and journaling to assist in consistency. There is
no distance limit. It is a pull technology, i.e. the target controller at the remote
site manages the replication process.
41
Hitachi supports some of the IBM Flashcopy offerings, such as Flashcopy
V2, Flashcopy SE, and Flashcopy Remote Copy.
Like the other storage vendors, Hitachi also offers a tiered storage solution that
moves data automatically between SSD, SAS, and SATA drives using
customer defined rules.
42
This is the list of DB2 utilities that can use Flashcopy to take an image copy or
read an image copy.
The DFSMSdss address space actually performs the allocation and the various
Flashcopy commands, via API calls from the DB2 utilities.
45
These are the zparm entries used to control whether Flashcopy will be used and, in
some cases, whether it is required or preferred.
The defaults are in bold lettering.
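For reference, the DSN6SPRM parameters involved (as documented for DB2 10; verify the names and permitted values against your DB2 level) include:

```
FLASHCOPY_COPY           NO | YES
FLASHCOPY_LOAD           NO | YES
FLASHCOPY_REORG_TS       NO | YES
FLASHCOPY_REBUILD_INDEX  NO | YES
FLASHCOPY_REORG_INDEX    NO | YES
FLASHCOPY_PPRC           NONE | PREFERRED | REQUIRED
CHECK_FASTREPLICATION    PREFERRED | REQUIRED
REC_FASTREPLICATION      NONE | PREFERRED | REQUIRED
```

The first five set the utility-level default for FLASHCOPY YES/NO, while FLASHCOPY_PPRC governs interaction with PPRC primaries and the FASTREPLICATION parameters control CHECK and RECOVER behavior.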
46
System level backup uses Flashcopy to create a point in time copy of all the
volumes that belong to a DB2 subsystem.
It requires that DB2's catalogs are not on the same volumes as user or system
catalogs.
Separate SMS pools must be created and used for the DB2 logs and the rest of the
DB2 data.
System level backups can be used to recover individual tablespace objects.
Because of the different requirements that must be met to have a successful
system level backup, it is important to plan everything that needs to be done
and to coordinate with the DASD team to ensure everything is set up and
configured correctly.
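A system level backup is driven by the BACKUP SYSTEM utility; once the SMS copy pools for the data and logs are defined, the control statement itself is short. A sketch (the subsystem name and utility ID here are hypothetical):

```
//BACKUP   EXEC DSNUPROC,SYSTEM=DB2P,UID='SLBFULL'
//SYSIN    DD *
  BACKUP SYSTEM FULL
/*
```

FULL copies both the database and log copy pools; BACKUP SYSTEM DATA ONLY copies just the database copy pool.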
47
These are the DB2 utilities that use the FLASHCOPY YES or FLASHCOPY
CONSISTENT keyword.
When FLASHCOPY CONSISTENT is specified, uncommitted work will be backed out
during a SHRLEVEL CHANGE operation.
If the CHECK zparm is set to use Flashcopy, CHECK will use Flashcopy when
doing a SHRLEVEL CHANGE check.
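As a sketch (the database, tablespace, and template names are hypothetical), a COPY invocation that asks for a consistent Flashcopy image copy might look like:

```
//COPY1    EXEC DSNUPROC,SYSTEM=DB2P,UID='FCCOPY'
//SYSIN    DD *
  TEMPLATE FCTMPL DSN('DB2P.FCIC.&DB..&TS.')
  COPY TABLESPACE DBX.TSX
       FLASHCOPY CONSISTENT
       FCCOPYDDN(FCTMPL)
       SHRLEVEL CHANGE
/*
```

FCCOPYDDN names the template used to allocate the VSAM Flashcopy image copy data set, and CONSISTENT tells the utility to back out uncommitted work from the copy.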
48
In order for BMC's utilities to exploit Flashcopy and the other hardware
functions for copy and recovery, an XBM subsystem must be up and running on
every member of a sysplex, and all must be part of the same XBM XCF group. This
is required to support snapshots in a data sharing environment. The Snapshot
for DB2 license module must be present and the Snapshot for DB2 component
must be active.
The other items that must be in place for everything to work are:
• Allow Instant Snapshots must be set to 1 (yes) in SSI Options for Flashcopy
only support
• Allow SSI Assisted Snapshots must be set to 1 (yes) in SSI Options for the
other hardware support (PPRC, ShadowImage, and BCVs)
• If the ability to fall back to a software snapshot is required, then a snapshot
template must be active with some cache storage allocated, and Allow SSI
Assisted Snapshots set to 1 (yes)
49
SUF (the Snapshot Upgrade Feature) also provides the ability to perform copies from the offline mirrors of
PPRC duplexed pairs, EMC BCVs, and Hitachi ShadowImage mirrors.
This is provided through settings in the XBM subsystem's SSI Options and the
snapshot template, and does not require any special keywords from the BMC
utilities.
The BMC utilities can request data set level images, or "Instant Snapshot"
copies, through the use of a keyword.
50
The BMC utilities that overtly use hardware are Copy Plus for DB2, Recover
Plus for DB2, Reorg Plus for DB2, and, through its use of Copy Plus for DB2
and Recover Plus for DB2, Recovery Manager for DB2.
The Copy Plus for DB2 syntax DSSNAP, with YES or AUTO, will use hardware
features to perform the copy.
Recover Plus for DB2 will use XBM SUF to perform a hardware based
recovery when XBMSSID is specified and the image copy entry is in BMCXCOPY,
or when the SYSCOPY entry indicates it is a Flashcopy image copy.
Reorg Plus for DB2's SIXSNAP option tells Reorg Plus whether
to use the Instant Snapshot technology of XBM and SUF to create a copy of
storage-group-defined nonpartitioned indexes in a SHRLEVEL
REFERENCE or SHRLEVEL CHANGE partial tablespace reorg.
51