Experiences with D/R Procedures

Of ADABAS Data on Mainframes

Natural Conference Boston

Dieter W. Storr May 2004

[email protected]

May 2004 Dieter W. Storr -- [email protected]


Different Disaster, Different Action

Unplanned downtime
    Machine outages
    Network outages
    Software failures

Disaster
    Site / data center loss
    Catastrophic failure


Leading Causes of Downtime
Source: DRJ Summer 2002, Volume 15, Number 3

Power outage            29%
Storm damage            11%
Flood                   10%
Terrorism / sabotage     8%


Other Causes of Downtime

Fire

Earthquake

Computer Crime


LA Times Downtime

Flood damage, 21 April 2002:
    A 14-inch pipe that supplies the fire-sprinkler system burst, and water flooded through the Orange County facility
    Half the facility stood in more than a foot of muddy water
    Affected areas: editorial, ad ops, IT, HR
    ADABAS was not affected


LA Times Downtime

Bomb alarm, 14 June 2002:
    A bomb was believed to have been left in the Bank of America branch that is set into the Times Building
    Security swept the building
    DBAs observed the system from home


LA Times Downtime

Bomb alarm, 29 July 2002:
    An intruder claimed to have a bomb and darted into the garage
    Security swept the building
    OP stopped CA7, so PLOGCOPY could not start automatically
    Two PLOGs filled up and ADABAS was locked
    DBAs later started the PLCOPY jobs manually


LA Times Downtime

Power outage, 29 August 2002 (3:43 P.M.):
    The city utility (DWP) had a power grid problem: flood water leaked into a DWP transformer
    There were actually two spikes/outages: the first started the UPS switchover, which was interrupted by the second, which took the UPS down


LA Times Downtime

Power Outage - cont'd

The network was back in service after a short delay. Our Unix-based servers were restarted and checked.

There was no evidence of damage to the Sybase Adaptive Server Enterprise (ASE, formerly: Sybase SQL Server) servers.


LA Times Downtime

Power Outage - cont'd
    Mainframe recovery was delayed due to corruption of the Hardware Management Console (HMC)
    OP did a power-on reset, which restored the HMC
    Operations IPLed, and Technical Support proceeded with system checkout procedures
    Although the Enterprise Storage Server (ESS) had an error indicator, it was still up and did not add to any outages
    IBM reset the error indicator without impact


LA Times Downtime

Power Outage - cont'd
    Started ADABAS servers manually: Parm Error 23, DIB block remained after an abnormal termination
    Started all servers with IGNDIB=YES
    18:25 ADABAS IS ACTIVE
    No ADAN58 message


LA Times Downtime

ADAN58 Message (ADA71: ADAN5A)

ADAN58 BUFFER-FLUSH START RECORD DETECTED DURING AUTORESTART.
       THE NUCLEUS WILL T E R M I N A T E AFTER AUTORESTART.
       IN CASE OF POWER FAILURE, THE DATABASE MIGHT BE INCONSISTENT
       BECAUSE OF PARTIALLY WRITTEN BLOCKS.
       O N L Y IN THIS CASE, REPAIR THE DATABASE BY RESTORE AND
       REGENERATE; OTHERWISE RESTART THE NUCLEUS.

ADAN5A: FILES MODIFIED DURING AUTORESTART: files


Power Failure During Buffer Flush

A B C D    old block
E F C H    updated block
E F C D    partially updated block on disk
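The figure can be replayed as a small simulation. This is only an illustrative sketch: the function name `flush_block` and the idea of four equally sized "sectors" per block are assumptions made for the example, not a description of how ADABAS writes blocks.

```python
def flush_block(disk_block, new_block, sectors_written):
    """Overwrite a block in place, sector by sector, stopping early to mimic a power loss."""
    result = list(disk_block)
    for i in range(min(sectors_written, len(new_block))):
        result[i] = new_block[i]
    return result


old_block = ["A", "B", "C", "D"]
updated_block = ["E", "F", "C", "H"]

# Hypothetical scenario: power fails after two of the four sectors reached the disk.
on_disk = flush_block(old_block, updated_block, sectors_written=2)
print(on_disk)

# The block is neither the old nor the new version: it is partially written.
is_partial = on_disk != old_block and on_disk != updated_block
print(is_partial)
```

A block in this state cannot be repaired by autorestart alone, which is why the ADAN58 message points to restore and regenerate for the power-failure case.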


Nucleus Restart After Power Failure - IGNDIB=YES

<snip>
ADA200 00230 User exit 2 active.
ADA201 00230 PLOG2 closed. ADAP3X2P submitted.
ADAN21 00230 PROTECTION-LOG PLOGR1 STARTED
ADAN02 00230 NUCLEUS-RUN WITH PROTECTION-LOG 00677
ADAL02 00230 2002-08-29 18:25:18 CLOGRS IS ACTIVE
ADAN03 00230 ADABAS COMING UP
ADAN5A 00230 FILES MODIFIED DURING AUTORESTART:
ADAN5A 00230 00038 00057 00069 00072 00073 00074
ADAN5A 00230 00075 00076 00104 00138 00139 00148
ADAN5A 00230 00195 00221 00243
ADAN19 00230 RUNNING WITH ASYNCHRONOUS BUFFERFLUSH
ADAN8Y 00230 FILE-LEVEL CACHING INITIALIZED
ADAN80 00230 ADABAS DYNAMIC CACHING ENVIRONMENT ESTABLISHED.
ADAN01 00230 A D A B A S V6.2.2 IS ACTIVE
ADAN01 00230 MODE = MULTI I S O L A T E D
ADAN01 00230 RUNNING WITHOUT RECOVERY-LOG
ADA800 00230 User exit 8 active.
<snip>
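After a restart like this one, the ADAN5A lines tell the DBA which files to check. A minimal sketch of pulling the file numbers out of a saved copy of the nucleus log follows; the helper name `files_modified_during_autorestart` is hypothetical, and the parsing assumes the message layout shown above (message id, DBID, then file numbers).

```python
NUCLEUS_LOG = """\
ADAN03 00230 ADABAS COMING UP
ADAN5A 00230 FILES MODIFIED DURING AUTORESTART:
ADAN5A 00230 00038 00057 00069 00072 00073 00074
ADAN5A 00230 00075 00076 00104 00138 00139 00148
ADAN5A 00230 00195 00221 00243
ADAN19 00230 RUNNING WITH ASYNCHRONOUS BUFFERFLUSH
"""


def files_modified_during_autorestart(log_text):
    """Collect the file numbers listed on ADAN5A message lines."""
    files = []
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] != "ADAN5A":
            continue
        # parts[1] is the DBID; the header line carries words, the rest carry numbers.
        for token in parts[2:]:
            if token.isdigit():
                files.append(int(token))
    return files


modified = files_modified_during_autorestart(NUCLEUS_LOG)
print(len(modified), modified)
```

The resulting list could feed the batch and online checks that were run once the nucleus was up.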


LA Times Downtime

Power Outage - cont'd
    Switched all PLOGs
    Checked batch and online
    There was no evidence of damage to any of the ADABAS components


Other LA Times Disasters

1965: Watts riots

1971: Sylmar quake 6.5

1987: Whittier quake 5.9

1992: LA riots

1994: Northridge quake 6.7

6 Feb 1998: El Niño, flooding in B-1 computer room

15 April 1999: Power failure ‘news editing’


ADABAS Recovery

Command Log (CLOG) Failure - I/O Error
    Restore or reallocate/format the CLOG
    ADABAS will come up through Autorestart normally
    No data loss if CLOG is not used


ADABAS Recovery

Protection Log (PLOG) Failure - I/O Error
    Restore or reallocate/format the PLOG
    Take a full back-up of the database
    ADABAS will come up through Autorestart normally
    Restart batch jobs
        Restartable batch jobs = OK
        Non-restartable batch jobs = check


ADABAS Recovery

TEMP and SORT Failure - I/O Error
    Restore or reallocate/format the TEMP/SORT dataset
    Different actions for the utilities
        See the ADABAS Utilities manuals


ADABAS Recovery

DSIM Failure - I/O Error
    Restore or reallocate/format a DSIM dataset
    Different actions for the utilities
        See the ADABAS Utilities manuals


ADABAS Recovery

Recovery Aid Dataset Failure - I/O Error
    Restore or reallocate/format an RLOG dataset
    Prepare the RLOG dataset
        ADARAI PREPARE RLOGSIZE / RLOGDEV….
    Different actions for the utilities
        See the ADABAS Utilities manuals
    Take a full back-up of the database
        This will start the first generation of the RLOG dataset


ADABAS Recovery

ASSO/DATA Failure - I/O Error
    Copy PLOG twice - ADARES PLCOPY
    Restore or reallocate/format DATA dataset(s)
    Instead of reallocating/formatting and restoring all DATA volumes, system specialists can
        Reallocate and format the new volume
        Restore the VTOC chain
        Restore and regenerate only the files that were located on the failed volume
    Otherwise, . . .


ADABAS Recovery

ASSO/DATA Failure - I/O Error
    Restore entire database
        ADASAV RESTORE [OVERWRITE = for GCB]
        ADASAV RESTONL [OVERWRITE] include PLOG
    Start nucleus with UTIONLY=YES
    Regenerate updates from end of last save (SYN2)
        ADARES REGENERATE PLOGNUM=xxx
        ADARES FROMCP=SYN2,FROMBLK=xxx


ADABAS Recovery

ASSO/DATA Failure - I/O Error
    Utilities may need to be rerun (see ADARES):
        ADALOD LOAD FILE=xxx
        ADALOD UPDATE FILE=xxx
        ADALOD UPDATE FILE=xxx,DDISN
        ADAINV INVERT FILE=xxx,FIELD=xx
    Lock files to rerun utilities
        ADADBS OPERCOM LOCKU=xx
    Unlock utility-only status
        ADADBS OPERCOM UTIONLY=NO


ADABAS Recovery

ASSO/DATA Failure - I/O Error
    Rerun the regenerate function for the relevant files
    Unlock the regenerated files
        ADADBS OPERCOM UNLOCKU=xx
    Don't repeat these steps if ADARES points out:
        ADALOD LOAD FILE=nn
        ADARES REGENERATE FILE=nn
        ADADBS REFRESH FILE=nn
    Nucleus is ready


ADABAS Recovery

WORK 1 Failure - I/O Error
    Restore or reallocate/format the WORK dataset
    Restore and regenerate the entire database to avoid inconsistencies: open transactions
        See ASSO/DATA failure


ADABAS Recovery

WORK 2/3 Failure - I/O Error
    End the database normally (ADAEND) to avoid open transactions in part 1 of WORK
    Restore or reallocate/format the WORK dataset
    Restart the database normally
    If the database abends, restore and regenerate the entire database - see ASSO/DATA failure


ADABAS Recovery

Failure in Data Storage Blocks

//DDSIIN DD DSN=SAVE.SIBA….
//       DD DSN=PLCOPY.LOG1…
//       DD DSN=PLCOPY.LOG2…
//DDCARD DD *
ADARES REPAIR DSRABN=xxx-yyy
ADARES FILE=n1,n2,n3

Failure in DSST

ADADCK DSCHECK FILE=xxx
ADADCK REPAIR

CALL SAG ! !


ADABAS Recovery

Nucleus Ends With RC 77 - Not Restartable
No more space for Checkpoint File (CP)
    Rename old WORK
    Allocate/format new WORK with old space
    Change high-used RABN and high-used ISN
    Restart nucleus with new WORK and UTIONLY=YES
    Nucleus is in "crippled mode" - no user has access
    Expand the database
    Stop the nucleus normally
    Rename old WORK and restart the nucleus with old WORK (autorestart)


ADABAS Recovery

Nucleus Ends With RC 77 - Not Restartable
No more space for user files
    Rename old WORK
    Allocate/format new WORK with old space
    Restart nucleus with new WORK and UTIONLY=YES
    Nucleus is in "crippled mode" - no user access
    Expand database
    Stop nucleus normally
    Rename old WORK and restart nucleus with old WORK (autorestart)


ADABAS Recovery

Nucleus Abends - Missed DE Values
    Descriptor is marked in FDT as DE; the value doesn't exist in ASSO, but does in DATA.
    Check:
        ADAICK ICHECK FILE=xxx[,NOOPEN]
        ADAVAL VALIDATE FILE=xxx,DESCRIPTOR=yy
    Solution 1:
        ADAULD UNLOAD FILE=xxx,UTYPE=EXF
        ADALOD LOAD FILE=xxx,LWP=yyyyK
    Solution 2:
        ADADBS RELEASE FILE=xxx,DESCRIPTOR=yy
        ADAINV INVERT FILE=xxx,FIELD=yy,LWP=...

CALL SAG ! !


Back-up Possibilities
    ADASAV to tape / disk
        Including Fast Dump Restore, DFDSS
    Delta Save Facility (DSF)
    Delta Save QDUMP (Legent)
    Disk mirroring (hardware level)
        FlashCopy of Enterprise Storage Server (ESS)
        Peer-to-Peer Remote Copy Extended Distance (PPRC-XD)
        OC-3 links two EMC disc arrays
    Replication
        Stand-by systems
        Restore and Regenerate
        Entire Transaction Server


ADABAS Disaster Recovery

How to back-up

Collect recovery data

Restore w/o nucleus

Start nucleus w/ UTIONLY=YES

Regenerate w/ nucleus

Switch UTIONLY=NO


ADABAS 6.2.2 Back-up at LA Times

Timeline: 21:00, 01:00, 02:00, 03:00, 8:00 - 11:00, 12:00

Jobs shown on the timeline:
    ADAP1BKF    Online SAVE
    ADAP1PLC    (FEOFPL)
    ADAP1PLC    PLOG switch
    BRM/ABARS   Several jobs
    ADAP1BKO    Copy tapes
    DFDSS       Full-volume back-up
                Weekly

Backed up: ASSO / DATA / WORK / etc.; PDS, GDGs etc.
Pick-up by Recall


Production Database Back-ups

DB   GB (4/03; 8/03)   Cartridge, 3490 silo   Number of 3490 carts   Disk 3390 (3399) (4/03; 8/03)
1    4.9; 4.9          15 min                 2                      < 2 min; < 2 min
2    30.0; 36.7        150+ min; 224+ min     42                     < 35 min; < 45 min
3    11.6; 17.1        110+ min               19                     < 15 min; < 22 min
4    9.7; 9.9          90+ min                9                      < 15 min; < 15 min
5    5.2; 7.3          28 min                 5                      < 5 min; < 7 min

ADASAV SAVE BUFNO=2,TTSYN=60
Record format . . . : VB
Record length . . . : 27994
Block size  . . . . : 27998
BUFNO=30


Back-up to SMS Disk Pool

Run times are consistently at least 80% lower when writing to disk instead of cartridge

Run times are consistently around 60% lower when copying from disk to cartridge (compared with cart to cart)

DFSMShsm: automate your storage management tasks

[Figure: SMS Production Storage Pool managed by DFSMShsm]


Back-up to Disk Pool

No cartridge errors

No cartridge drive errors

No cartridges get accidentally ejected from the silo

Smaller back-up window

Smaller maintenance windows

Less impact to application processes

Greater confidence that the data you need will be there when you need it


IBM Magstar 3494/Virtual Tape Server

Linear design
    1 - 18 frames
Configuration flexibility
    SCSI, FC, ESCON, FICON
    3590, 3490E, VTS
High availability
    Dual robotics
    Dual library manager

>42 old 3490 carts will fit on 1 new 3494 cart

5 x 3390 volumes fit on one 3494 cart

One 3494 cart can be read in 45 seconds into the VTS disk cache (RAID-5)


Virtual Tape Concept

Virtual tape drives
    Appear as multiple 3490E tape drives
    3490E Media 1 and 2 support
    Shared / partitioned like real tape drives
Tape Volume Caching
    All data access is to cache
    Improves 'mount' performance
    LRU cache management
Volume Stacking
    Fully utilizes physical cart capacity
    Reduces physical cart requirement
    Reduces footprint requirement

[Figure: virtual drives 1 to n (addresses 180, 181, ... 19F) hold virtual volumes 1 to n in the tape volume cache; logical volumes 1 to n are stacked on a Magstar 3590 cartridge with 30/60 GB capacity*]

* assumes 3:1 compression
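The "LRU cache management" bullet can be illustrated with a toy model. This is a sketch of the general least-recently-used idea only; the class name `TapeVolumeCache`, the `recall` callback and the capacity counted in volumes are assumptions for the example and say nothing about how the VTS is actually implemented.

```python
from collections import OrderedDict


class TapeVolumeCache:
    """Toy LRU cache: keeps the most recently mounted virtual volumes on disk."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._volumes = OrderedDict()  # volser -> data, least recently used first

    def mount(self, volser, recall):
        """Return (data, cache_hit); recall() stands in for reading the physical cartridge."""
        if volser in self._volumes:
            self._volumes.move_to_end(volser)  # mark as most recently used
            return self._volumes[volser], True
        data = recall(volser)
        self._volumes[volser] = data
        if len(self._volumes) > self.capacity:
            self._volumes.popitem(last=False)  # evict the least recently used volume
        return data, False

    def cached(self):
        return list(self._volumes)


cache = TapeVolumeCache(capacity=2)
hits = [cache.mount(v, recall=lambda volser: f"data of {volser}")[1]
        for v in ["V1", "V2", "V1", "V3"]]
print(hits, cache.cached())
```

In this model a mount that hits the cache never touches a cartridge, which is the effect behind the "improves 'mount' performance" bullet.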


Performance Tests

Input    Output   MM.SS    Storage
Adabas   Disk     42.63    526125 tracks 3390
Adabas   VTS      46.43    31 log. 3490 tapes
Disk     VTS      42.47    31 log. 3490 tapes
VTS      VTS      48.38    31 log. 3490 tapes
Disk     VTS      39.39    31 log. 3490 tapes
VTS      3590     47.86    1 phys. 3590 tape
Adabas   3490     216.27   51 phys. 3490 tapes
Adabas   VTS      52.47    39 log. 3490 tapes


Collecting Data For Recovery

Block Ranges SYN1 - SYN2
For ADASAV RESTORE

From ADASAV SAVE:
PROTECTION LOG PLOGNUM=64, SYN1=4695, SYN2=4698

From ADAREP:
SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4699 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4700 DUAL ADAP1BKF
SYNV 28 UTI 2002-09-23 21:02:08 64 4702 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 21:02:08 64 4703 DUAL ADAP1PLC
<snip>
EOD  00 ET  2002-09-23 23:30:03 64 4747 DUAL ADAPRREP
SYNS 53 ET  2002-09-23 23:30:25 64 4749 DUAL ADAP1REP
SYNV 28 UTI 2002-09-23 23:30:30 64 4750 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 23:30:31 64 4751 DUAL ADAP1PLC
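Collecting these values by eye is error-prone under D/R pressure, so a small script can help. The following is a rough sketch that assumes the ADAREP checkpoint lines have been saved as text in the nine-column layout shown above; the function names and the dictionary keys are made up for this example.

```python
CHECKPOINTS = """\
SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4699 DUAL ADAP1BKF
SYNV 28 UTI 2002-09-23 21:02:08 64 4702 DUAL ADAP1PLC
"""


def parse_checkpoints(text):
    """Turn checkpoint report lines into dictionaries (assumed column layout)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 9:
            continue  # skip <snip> markers and anything that is not a checkpoint line
        rows.append({
            "name": fields[0],
            "timestamp": fields[3] + " " + fields[4],
            "plognum": int(fields[5]),
            "block": int(fields[6]),
            "job": fields[8],
        })
    return rows


def save_block_range(rows, job):
    """Return (PLOGNUM, SYN1 block, SYN2 block) written by the given save job."""
    syn1 = next(r for r in rows if r["name"] == "SYN1" and r["job"] == job)
    syn2 = next(r for r in rows
                if r["name"] == "SYN2" and r["job"] == job and r["block"] >= syn1["block"])
    return syn1["plognum"], syn1["block"], syn2["block"]


save_range = save_block_range(parse_checkpoints(CHECKPOINTS), "ADAP1BKF")
print(save_range)
```

For the sample lines this reproduces the values reported by ADASAV SAVE: PLOGNUM=64, SYN1=4695, SYN2=4698.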


Collecting Data For Recovery

Block Ranges SYN2 - End
For ADARES REGENERATE

From ADAREP:
SYN1 06 UTI 2002-09-23 21:00:09 64 4695 DUAL ADAP1BKF
SYNP 06 UTI 2002-09-23 21:00:12 64 4696 DUAL ADAP1BKF
SYN2 06 UTI 2002-09-23 21:01:37 64 4698 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4699 DUAL ADAP1BKF
SYNV 0A UTI 2002-09-23 21:01:40 64 4700 DUAL ADAP1BKF
SYNV 28 UTI 2002-09-23 21:02:08 64 4702 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 21:02:08 64 4703 DUAL ADAP1PLC
<snip>
EOD  00 ET  2002-09-23 23:30:03 64 4747 DUAL ADAPRREP
SYNS 53 ET  2002-09-23 23:30:25 64 4749 DUAL ADAP1REP
SYNV 28 UTI 2002-09-23 23:30:30 64 4750 DUAL ADAP1PLC
SYNP 28 UTI 2002-09-23 23:30:31 64 4751 DUAL ADAP1PLC


Collecting Data For Recovery

Dataset Name From Back-up Job (GDG)
For ADASAV RESTORE

ADABAS.PRODOFFD.DB1.BACKUP.FULL.G0842V00 CATALOGED


Collecting Data For Recovery

Dataset Names From PLOG Copy Jobs (GDG)
Matching block numbers 4695 - End
For ADASAV RESTORE and ADARES REGENERATE

DDSIAUS1 OUTPUT VOLUME=WRK015, SESSION NR=64
FROMBLK= 1214, FROMTIME=2002-09-23 03:30:24
TOBLK= 4701, TOTIME= 2002-09-23 21:01:42
ADABAS.PROD.DB1.PLOG.COPY.G7170V00

DDSIAUS1 OUTPUT VOLUME=WRK015, SESSION NR=64
FROMBLK= 4702, FROMTIME=2002-09-23 21:02:08
TOBLK= 4748, TOTIME= 2002-09-23 23:30:03
ADABAS.PROD.DB1.PLOG.COPY.G7171V00

DDSIAUS1 OUTPUT VOLUME=WRK004, SESSION NR=64
FROMBLK= 4749, FROMTIME=2002-09-23 23:30:25
TOBLK= 4791, TOTIME= 2002-09-24 03:30:33
ADABAS.PROD.DB1.PLOG.COPY.G7172V00
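Matching block numbers to PLOG copy generations is a simple range-overlap check. The sketch below uses the three datasets listed above, all from session 64; the function name `copies_covering` and the tuple layout are assumptions for illustration.

```python
# (dataset name, FROMBLK, TOBLK) taken from the PLOG copy job output above
PLOG_COPIES = [
    ("ADABAS.PROD.DB1.PLOG.COPY.G7170V00", 1214, 4701),
    ("ADABAS.PROD.DB1.PLOG.COPY.G7171V00", 4702, 4748),
    ("ADABAS.PROD.DB1.PLOG.COPY.G7172V00", 4749, 4791),
]


def copies_covering(copies, from_blk, to_blk=None):
    """Return the datasets whose block range overlaps from_blk..to_blk (open end if None)."""
    selected = []
    for dsn, first, last in copies:
        if last < from_blk:
            continue  # ends before the range of interest starts
        if to_blk is not None and first > to_blk:
            continue  # starts after the range of interest ends
        selected.append(dsn)
    return selected


# Blocks SYN1..SYN2 of the online save (4695-4698) for the restore step
for_restore = copies_covering(PLOG_COPIES, 4695, 4698)
# Blocks from SYN2 (4698) to the end for the regenerate step
for_regenerate = copies_covering(PLOG_COPIES, 4698)
print(for_restore)
print(for_regenerate)
```

With these inputs the restore step needs only generation G7170V00, while the regenerate step needs all three generations in order.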


Recovery - Part 1 - W/O Nucleus
ADASAV RESTONL

<snip>
//RESTONL EXEC ADASAVRD
//DDREST1 DD DISP=SHR,BUFNO=30,
//           DSN=ADABAS.PRODOFFD.DB1.BACKUP.FULL.G0842V00
//DDPLOG  DD DISP=SHR,BUFNO=30,
//           DSN=ADABAS.PROD.DB1.PLOG.COPY.G7170V00
//DDKARTE DD *
ADASAV RESTONL BUFNO=2,OVERWRITE
//REPORT  EXEC ADAREP
//DDKARTE DD *
ADAREP NOFILE
//


Recovery - Part 2
Start the ADABAS nucleus with normal JCL (UTIONLY=YES)

<snip>
ADAN21 00215 PROTECTION-LOG PLOGR1 STARTED
ADAN02 00215 NUCLEUS-RUN WITH PROTECTION-LOG 00064
ADAL02 00215 2002-09-21 21:20:29 CLOGRS IS ACTIVE
ADAN03 00215 ADABAS COMING UP
ADAN19 00215 RUNNING WITH ASYNCHRONOUS BUFFERFLUSH
ADAN8Y 00215 FILE-LEVEL CACHING INITIALIZED
ADAN80 00215 ADABAS DYNAMIC CACHING ENVIRONMENT ESTABLISHED.
ADAN01 00215 A D A B A S V6.2.2 IS ACTIVE
ADAN01 00215 MODE = MULTI I S O L A T E D
ADAN01 00215 RUNNING WITHOUT RECOVERY-LOG
ADA800 00215 User exit 8 active.
ADA801 00215 ADAP1PLC submitted.


Recovery - Part 2 - With Nucleus
ADARES REGENERATE

<snip>
//REGEN   EXEC ADARES
//DDSIIN  DD DISP=SHR,BUFNO=30,
//           DSN=ADABAS.PROD.DB1.PLOG.COPY.G7170V00
//        DD DISP=SHR,BUFNO=30,
//           DSN=ADABAS.PROD.DB1.PLOG.COPY.G7171V00
//        DD DISP=SHR,BUFNO=30,
//           DSN=ADABAS.PROD.DB1.PLOG.COPY.G7172V00
//DDKARTE DD *
ADARES REGENERATE PLOGDBID=215,PLOGNUM=64
ADARES FROMCP=SYN2,FROMBLK=4698
ADARES TOCP=EOD,TOBLK=00000
<snip>

ADARES TOCP=EOD,TOBLK=00000 is not needed
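Typing the control statements by hand during a test invites typos, so they can be generated from the collected values. This is a hedged sketch: the helper `regenerate_cards` is hypothetical, and it emits only the ADARES parameters that appear on the slide (PLOGDBID, PLOGNUM, FROMCP, FROMBLK), leaving out the TOCP/TOBLK statement that the slide marks as not needed.

```python
def regenerate_cards(dbid, plognum, syn2_block):
    """Build the ADARES REGENERATE control statements shown on the slide."""
    return [
        f"ADARES REGENERATE PLOGDBID={dbid},PLOGNUM={plognum}",
        f"ADARES FROMCP=SYN2,FROMBLK={syn2_block}",
    ]


# Values collected earlier: DBID 215, PLOG session 64, SYN2 at block 4698
cards = regenerate_cards(215, 64, 4698)
print("\n".join(cards))
```

The printed lines would be placed after `//DDKARTE DD *` in the job; checking the generated job against the ADABAS Utilities manual before a real recovery is still advisable.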


Recovery - Part 3 - With Nucleus
    Lock files to re-run utilities
        See regenerate report
        ADADBS OPERCOM LOCKU=fnr
        or SYSAOS: A / I / L / F
        or modify command /F jobname,LOCKU=fnr
    Unlock utility-only status for users
        ADADBS OPERCOM UTIONLY=NO
        or SYSAOS: A / I / L / U
        or modify command /F jobname,UTIONLY=NO


Recovery - Part 3 - With Nucleus
    Re-run the utilities - if necessary
        ADALOD LOAD / UPDATE / DDISN
        ADAINV INVERT FILE=xxx,FIELD=xx
    Unlock files
        ADADBS OPERCOM UNLOCKU=fnr
        or SYSAOS: A / I / L / F / N
        or modify command /F jobname,UNLOCKU=fnr


Delta Save Facility (DSF)

[Figure: Delta Save. Components shown: the nucleus (DSF=YES) with buffer pool and Delta Log (DLOG) of changed RABNs; ASSO and DATA; the dual protection log (DDPLOGR1, DDPLOGR2); ADARES PLCOPY (DSF=YES) writing the PLOG copy (DDSIAUS1) and the extracted blocks to DSIM (DDDSIM); ADASAV SAVE DELTA (DSF=YES) writing the changed blocks to DDSAVE1.]


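Delta Save Facility

[Figure: Delta Save restore. Components shown: ADASAV RESTORE (DSF=YES); full save image, online or offline (DDREST1); delta save images (DDDELT1-8); DSIM dataset (DDDSIM) with RABNs extracted from the PLOG for online images; ASSO and DATA as the restore target.]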


Delta Save QDUMP (CCA - now: TSI)

[Figure: components shown are ASSO, DATA, the MPM (ADABAS and utilities), ADAIOR, the QDUMP front end (RABN-WRITE), CSA, and the QDUMP control program with read and write subtasks and an internal buffer.]

http://www.treehouse.com/qdump.shtml


Disk Mirroring

Benefits
    Asynchronous disk mirroring can provide better physical protection by supporting extended physical distances.
    No loss of committed transactions in synchronous storage (mirroring/RAID) on a CPU failure


Disk Mirroring

Limitations
    No protection from data corruption introduced by the hardware / software
    Secondary site is not guaranteed to be transactionally consistent, because data is moved at the disk/track/sector or bit level (in the case of asynchronous mirroring).
    Client applications must be restarted after a failure and need to be aware of the failure


Disk Mirroring

Limitations
    Synchronous mirroring and RAID devices can add overhead to application performance.
    Redundant/specialized high availability hardware/software can be expensive and restricted to use for backup purposes only.
    Secondary copy of data is not available for use: low hardware utilization.
    Need to replicate everything on disk, no selectivity of data replication


Example For Disk Mirroring

[Figure: the main platform (S/390, UNIX) with an EMC 5700 and the back-up / hot site (S/390, UNIX) with an EMC 5700, 12-15 miles apart, connected by an OC-3 link; SRDF remote mirrored, synchronized.]


Dedicated line broadband speeds and prices

T-1 - 1.544 megabits per second (24 DS0 lines)
    Ave. cost $400.-$650./mo.
T-3 - 43.232 megabits per second (28 T1s)
    Ave. cost $6,000.-$16,000./mo.
OC-3 - 155 megabits per second (100 T1s)
    Ave. cost $20,000.-$45,000./mo.
OC-12 - 622 megabits per second (4 OC3s), no price
OC-48 - 2.5 gigabits per second (4 OC12s), no price
OC-192 - 9.6 gigabits per second (4 OC48s), no price

Source: http://www.infobahn.com/research-information.htm
Prices updated: 16 March 2004
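To get a feel for these line speeds, here is a back-of-the-envelope calculation of how long a full copy would take. It is a sketch under stated assumptions: the 61.4 GB figure is the total size reported for the five production databases in the D/R test results of this talk, gigabytes are taken as decimal (10^9 bytes), and protocol overhead and compression are ignored, so real transfers would differ.

```python
# Nominal line rates from the list above, in megabits per second
LINK_MBIT_PER_S = {
    "T-1": 1.544,
    "T-3": 43.232,
    "OC-3": 155.0,
    "OC-12": 622.0,
}


def transfer_hours(gigabytes, mbit_per_s):
    """Hours needed to move the given amount of data at the nominal line rate."""
    bits = gigabytes * 1e9 * 8
    seconds = bits / (mbit_per_s * 1e6)
    return seconds / 3600


for name, rate in LINK_MBIT_PER_S.items():
    print(f"{name:6s} {transfer_hours(61.4, rate):8.1f} hours for 61.4 GB")
```

Under these assumptions a T-1 line would need days for an initial full copy, while an OC-3 link stays under an hour, which is one reason the mirroring example uses OC-3.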


Peer-to-Peer Remote Copy Extended Distance (PPRC-XD)

PPRC = 60 miles - PPRC-XD = continent

[Figure: two ESS (Shark) systems connected by PPRC, with FlashCopy]

IBM ESS DASD
HDS also supports PPRC

Also see TimeFinder from EMC


External Back-up Systems

Fast Copy of Data
    Snapshot
        No data movement
        A virtual copy by copying pointers
    Copy process
        Physical copy is made asynchronously from the logical copy
        No impact on applications working on the original data
    Specific hardware required
        Software works only with the hardware
    Works on volume level
        Some snapshot-only tools also work on dataset level
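The "virtual copy by copying pointers" idea can be shown with a toy volume. This sketch uses a redirect-on-write scheme chosen purely for illustration; the class `Volume` and its methods are invented here and do not describe how FlashCopy, TimeFinder or any other product is implemented.

```python
class Volume:
    """Toy volume: logical blocks point at physical blocks in a shared store."""

    def __init__(self, blocks):
        self._store = dict(enumerate(blocks))      # physical block id -> data
        self._next_id = len(blocks)
        self.pointers = list(range(len(blocks)))   # logical block -> physical block id

    def snapshot(self):
        """Copy only the pointer table; no data blocks are moved."""
        return list(self.pointers)

    def write(self, logical, data):
        """Write to a new physical block so existing snapshots keep the old data."""
        self._store[self._next_id] = data
        self.pointers[logical] = self._next_id
        self._next_id += 1

    def read(self, pointers):
        return [self._store[p] for p in pointers]


vol = Volume(["A", "B", "C"])
snap = vol.snapshot()          # taken instantly, independent of data size
vol.write(1, "X")              # application keeps updating the source

current_view = vol.read(vol.pointers)
snapshot_view = vol.read(snap)
print(current_view, snapshot_view)
```

The physical back-up can then be read from the snapshot view at leisure while the application continues to update the source.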


Snapshot & Physical Copy

IBM
    Hardware: Enterprise Storage Server
    Software: FlashCopy
    http://www.share.org/proceedings/sh98/data/S3087.PDF

EMC
    Hardware: Symmetrix Remote Data Facility
    Software: EMC TimeFinder
    http://www.emc.com/interactive_center/media/timefinder/tf_noRC.html


How It Works

[Figure: within a pre-defined time window the source data is suspended (read only: update requests are queued), the snapshot is taken ("snap"), and the source data is resumed for read / update. The snapshot is read only and is used to make the physical backup.]

Source: SAG


Replication

Benefits
    Warm standby systems can be configured over a Wide Area Network, providing protection from site failures.
    Ability to more quickly swap to the standby system in the event of failure, as backup database is already on-line.
    Data corruption is typically not replicated as transactions are logically reproduced rather than I/O blocks mirrored.


Replication

Benefits
    Automatic switch over for clients using a switching mechanism, no client restart needed.
    Originating applications are minimally impacted as replication takes place asynchronously after commit of the originating transaction.
    The warm standby database is available for read-only operations, allowing better utilization of backup systems.


Replication

Benefits
    Ability to resynchronize and easily switch back to primary system when it becomes available without loss of data.


Replication

Limitations
    Warm standby system will be out-of-date by transactions committed at the active database that have not been applied to the standby.
    Protection is limited to components supporting Warm Standby (e.g. DBMS data sources may be protected but file systems may not be supported).


Entire Transaction Propagator

The Entire Transaction Propagator allows for asynchronous data replication.

Replicated data can be updated and synchronized with master data at user specified intervals.


OS/390 Recovery Procedures
Prepared by the Mainframe Recovery Team

Recovering the OS/390 platform

The ABARS aggregates

The ADABAS databases


Mainframe Recovery Procedures
(flowchart by M. Makofske, 77263, draft of January 24, 2002)

Phases: Initial Setup, Pre-IPL Procedures, Post-IPL Procedures

Steps shown in the flowchart:
    Restore SYSR.DRP libraries
    Reserve Cypress tape drives
    IPL SunGard floor system; check settings
    Connect Times and SunGard catalogs
    Restore SYS002 and OS7PC0
    Copy and print SYSLOG
    Check clock and reset, if needed
    Change JES2 parm to P=NOREQ
    RHSMTREP, RHSMDISM, RHSMDELV
    RSMSWORK, RSMSPRM
    Verify shipments from Recall
    Load OS/390 documentation into BookManager
    Go to Pre-IPL Procedures
    Recover remaining system volumes
    Initialize production volumes
    Initialize work volumes
    IPL Times system
    Begin application (ABARS) restores
    Restore remaining system catalogs
    Restore HSM and TMC datasets
    Restore page volumes
    Insert third-party software passwords
    Import MVSCAT catalog entries
    VARY OFF work, production and page packs
    Restore Times PROCLIBs
    Restore ADABAS production volumes


OS/390 D/R Times (SUNGARD)
    About 2400 tapes
    Shipping time from storage to the mainframe?
        4 hours ahead for tape staging
    OS/390 and ABARS aggregates
        5 hours planned, 7+ hours with problems
    ADABAS databases
        Approx. 2-3 hours for tape restore and regenerate
        Next test Nov 1: approx. 45 minutes from disk pool


Experiences From D/R Tests
    Problems IPLing on an unfamiliar CPU (6 hours duration)
        Initial setup (restore SYS.. libraries)
        Pre-IPL procedures (restore Adabas, work, spool volumes, etc.)
        Post-IPL procedures (DFHSM in disaster mode, etc.)
        Application restores
    Tape drive offline problems, Import MVSCAT typo errors, etc.
    Recovered wrong volumes, generation errors
    Initialize work volumes - conversion to SMS (DFSMShsm)
    TMC recovery problems caused BRM recovery problems, too


Experiences From D/R Tests
    Sent wrong cartridges with system dates to storage
    Fewer channels for tapes at our offsite (2 instead of 4) = double restore time


Experiences From D/R Tests

RESTONL abended with SB00, no PLOG restored; the Recovery Aid flag was on in the saved database.

REGENERATE deleted a file and pointed out that the ADALOD job had to be repeated, but the input dataset was not saved

We did a full volume restore (DFDSS), restored the database and forgot to format the dual protection logs.

Missed protection logs

BRM restored wrong aggregates

Missing full-volume restores - (Database 2)

Missing volumes in Work Storage Pool - (Database 3)


Experiences From D/R Tests

BRM: Back-up and Recovery Manager
ABARS: Aggregate Back-up and Recovery Support
(ABARS = not: Air conditioning and refrigeration industry services <smile>)

Recovered (-1) aggregates instead of (0) - (all databases)
Recovered only SOME files on aggregate (0) - (Database 1)
BRM/ABARS was not properly recovered (wrong version of BRM database)
Once those problems were resolved (several hours later), the ADABAS recovery ran smoothly.

5 databases (61.4 GB) restored and regenerated in 3.5 hours (tape/cart)
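The last figure translates into an average recovery rate that is handy for sizing future tests. A small worked calculation, assuming decimal units (1 GB = 1000 MB) and treating the 3.5 hours as pure restore-plus-regenerate time:

```python
def restore_rate(gigabytes, hours):
    """Return the average rate as (GB per hour, MB per second)."""
    gb_per_hour = gigabytes / hours
    mb_per_second = gigabytes * 1000 / (hours * 3600)
    return gb_per_hour, mb_per_second


gb_per_hour, mb_per_second = restore_rate(61.4, 3.5)
print(f"{gb_per_hour:.1f} GB/hour, {mb_per_second:.2f} MB/second")
```

That works out to roughly 17.5 GB per hour, or a little under 5 MB per second averaged over the whole tape-based recovery.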


How Far is 'Far Enough?'
(http://www.drj.com/articles/spr03/1602-02.html)

Alternate Facility

Offsite Storage

Facility

Answer = 105 miles

…so the survey


Lessons Learned
(http://www.drj.com/articles/spr02/1502-07.html)

Distance is key
    Streets, bridges, tunnels, airports are closed

Tape recovery is not effective

All applications are critical

Inconsistent back-up is no back-up at all

People-dependent processes do not suffice

Two sites are not enough

People are irreplaceable; so is information


Lessons Learned
(http://www.drj.com/articles/spr02/1502-07.html)

Companies that relied on tape or on a third-party provider found in many cases that they had difficulty meeting their recovery time objectives

All disasters are possible


Helpful Links

Software AG - ADABAS Recovery
    http://www.softwareag.com/adabas/news/vers_7.htm
    http://servline24.softwareag.com/SecuredServices/ <Knowledge Center - ADABAS>

ADABAS Restart and Recovery (Operations Manual)
    http://servline24.softwareag.com/SecuredServices/ <Knowledge Center - Product Documentation>

University of Arkansas - D/R Plan
    http://www.uark.edu/staff/drp/

Disaster Recovery Journal
    http://www.drj.com


Helpful Links

FlashCopy
    http://www.share.org/proceedings/sh97/data/S9111.PDF
    http://www.storage.ibm.com/hardsoft/products/ess/pubs/f2ahs05.pdf

Shark (ESS)
    http://www.almaden.ibm.com/cs/shark/
    http://www.storage.ibm.com/hardsoft/disk/index.html

State of the Art Storage
    http://www.networkmagazine.com/article/NMG20010104S0002/2

EMC TimeFinder
    http://www.emc.com/products/software/timefinder.jsp

Entire Transaction Propagator (SAG)
    http://servline24.softwareag.com/SecuredServices/document/html/etp151/pdf/man.pdf


Thank you!

Questions?