Documentation for snapckp: Taking Ingres Checkpoints by ...

The Software That Manages eBusiness Computer Associates Documentation for snapckp: taking Ingres Checkpoints by Splitting Mirrors Presented To Pacific States Marine Fisheries Commission On 3 Feb, 2002 Site ID: 167710 S P Maybury Senior Consultant Computer Associates International, Inc. One Computer Associates Plaza Islandia, NY 11749 www.ca.com

1 Introduction

This documentation completes the deliverables for CA Project 13891 for the Pacific States Marine Fisheries Commission (CA Site ID 167710), executed by Ron Duursma and Simon Maybury. Scripts to perform mirror-split checkpoints were written and installed on the PSMFC blueback machine during the week of 6-jan-03; this is the documentation for these scripts.

This document consists of a brief general introduction and a detailed description of the 15 steps involved in the snapckp cycle. The complete scripts are included in Appendices A-D.

2 General Introduction to Logical Volumes and Mirrors

In modern, mission-critical environments data must be available at all times. This poses a huge problem for database and system administrators, who must maintain regular backups of the data while ensuring that the data remains available. Ingres has long supported the concept of online backup and restore, in which transactions are briefly stalled and users are permitted back into the database while the files are being copied. Changes during the online copy are recorded in the Ingres journal and dump files and are used to get back to a clean copy at any desired point in time since the backup. This works well enough for most environments, but changes to the Ingres schema or installation meta-data are disallowed during the data archiving phase. At sites with huge databases the data archiving phase can last for many hours. We can avoid this constraint if we use disk mirroring.

There are many vendors of disk-mirroring software/hardware and the commands for manipulating the "virtual disks" are different for each vendor. This document therefore first describes the concept in general terms, covering the steps that are common to the procedure regardless of which disk-mirroring package you may be using. It then describes in detail the actual commands and procedures as they are performed on a Solaris machine using the Veritas Volume Manager.

Regardless of the terminology the vendor may use, disk mirroring is a concept in which real physical disks and real file systems are mapped to multiple "virtual disks" and "virtual file systems". Typically an array of RAID disks will duplicate each of the real disks and real file systems one or more times, such that a single real disk maps to multiple identical 'mirrored' copies of the same data. These copies are maintained in real time by the vendor mirroring software. If one of the physical disks fails, no worries: we have another up-to-the-second copy of the same data on one or more physical devices that we can use just as if it were the original. We can even replace the broken disk and the vendor software will 'catch up' the disk until it is once again an exact backup copy of the online data.

3 How We Use Disk Mirroring for Fast Ingres Backups

We make use of the 'split-off then join and catchup' capability that is built in by the mirroring vendor. We modify the Ingres checkpoint template file to call a script that splits the set of virtual disks during the Ingres checkpoint. This is just after the Ingres flush of all data to disk, so both copies of the virtual data disks contain identical, consistent and static copies of the Ingres data. This virtual disk split takes only a moment since no real I/O occurs to copy the actual Ingres data.

Once the virtual disks are split we allow Ingres users to continue working immediately on the single virtual disk that remains online. Meanwhile we have a static and consistent copy of the data on the split-off disk and can copy it "at our leisure" without impacting online processing.

Once the data has been safely copied out, we use the 'catch up' capability of the mirroring software to bring the offline disk back in phase with the online disk so that we once again have multiple up-to-the-second copies of the data.

That's the simple concept. The devil is in the details - and those details are specific for each vendor of mirroring software or hardware. The details below are for the Veritas Volume Manager software on a Solaris system.

4 Walkthrough of the snapckp cycle

This section describes the starting configuration, a summary of the splitting cycle and a detailed examination of the scripts which implement the splitting.

Useful reference links to Veritas Volume Manager documentation: <http://www.sun.com/products-n-solutions/hardware/docs/Software/Storage_Software/VERITAS_Volume_Manager/>

4.1 Starting Configuration

The ptagis3 database is 30GB in size and has 8 data locations. The dba is Ingres. The script is run as root.

The blueback filesystem includes 8 data volumes which do not belong to a volume group. Each of these data volumes hosts a filesystem used by an Ingres data location. The ptagis3 database is extended to these 8 locations; the entire database is contained in these 8 volumes. II_DATABASE (and hence iidbdb) does not share a filesystem with any of these 8 locations.

Each volume consists of two mirror plexes. The "primary" plex (-01) is implemented in a RAID5 array. The other plex (-02) is implemented on a set of internal disks. During the splitting cycle the "primary" plex remains unaffected and continues to support the production volumes, whether mirrored or not.

Non-persistent fast resync was enabled for the data volumes. This causes the volume manager to maintain a memory-resident map of each plex with the equivalent of a checksum for each track of the plex. During resilvering, the checksums on corresponding tracks are compared. If the checksums are identical, no resilvering is performed. If the two plexes are very similar, this can reduce the resilvering time from over an hour to a couple of minutes.

4.2 Summary of the Splitting Cycle

The splitting cycle is described briefly below:

• cron starts the snapckp script.

• The plex configuration within the volumes is checked (both plexes must be present and ready to split).

• The database servers are closed and sessions removed (necessary only for offline ckp).

• An offline "snap" checkpoint is run:

  o The checkpoint template file runs snapshot for each location. For location 1 only, the Unix file buffers are flushed and the mirror is split.

• A normal offline checkpoint of iidbdb is run.

• The database servers are reopened and production recommences.

• The "snapshot" volumes are fsck'ed and mounted alone elsewhere in the filesystem.

• A tar | gzip is performed for each location, running 8 in parallel. This creates the checkpoint files, which can be used directly in a normal rollforward.

• We remove any session with pwds in the newly-mounted snapshot volumes and unmount them.

• We perform a "snapback" operation to resilver the snapshot plexes from their partner plexes.

• The checkpoint files and supporting admin files are backed up to tape.

• Checkpoint files and various log files are managed.

• The system is now back in its starting configuration.

4.3 Detailed Description of the Splitting Cycle

There are two main scripts:

• /usr/ingres/snapckp/snapckp

• /usr/ingres/snapshot

a customized checkpoint template file:

• $II_SYSTEM/ingres/files/cktmpl.snap

and several general purpose Ingres scripts which may exist in some form in many installations:

• close_servers - closes all registered DBMS servers to new connections

• show_servers - checks whether the DBMS servers are closed

• rm_sessions - forcibly removes all user sessions from all registered DBMS servers

• open_servers - opens all registered DBMS servers to new connections

• freemb_after_ckp - checks space in the checkpoint filesystem

• runbg - runs a command in the background (not related to Ingres)

The following sections follow the execution of functions within the snapckp script, which covers the whole splitting cycle.

4.3.1 Check the Status of the mirrors

    READYTOSPLIT() {
        F_NAME="READYTOSPLIT"

        log "Preparing and checking the status of the mirrors."

        problem=0

        for i in ${SNAPVOLLIST}
        do
            # Check to see if the -02 plexes are ready to be split
            log "Checking vol${i}-02"
            plstate=`${VXBIN}/vxprint -p vol$i-02 | grep '^pl' | awk '{ print $7 }'`
            log "Initial state of plex vol$i-02 is $plstate"
            if [ "$plstate" != "SNAPDONE" ]
            then
                log "Initial state of plex vol$i-02 is $plstate"
                vxplex convert state=SNAPDONE vol${i}-02
                checkstat 5
                log "Current state of plex vol$i-02 is $plstate"
            fi
        done
    } # end READYTOSPLIT

$SNAPVOLLIST, defined at the start of the script, is a list of volume numbers for the volumes that are to be split. The entire script assumes that it is the -02 plexes that will be split.

All the Veritas (vx) commands are in the $VXBIN location. This function relies upon a particular format for the output of vxprint -p. Compatibility should be verified whenever Veritas is upgraded.

Despite the name "SNAPDONE", this state means that the plex is READY to be split. We first check the -02 plex in each volume. If the state is not SNAPDONE, we attempt to force the plex into this state. If this is not successful we fail.

This function is rerunnable if the mirrors have not yet been split.
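Because the state check depends on the column layout of vxprint -p, it can help to exercise the parsing step on its own. The following is a minimal sketch under stated assumptions: the helper name plex_state and the two sample records are invented for illustration, and only the idea of taking the seventh field of a "pl" record comes from the function above. It runs against sample text, not a live volume manager.

```shell
#!/bin/sh
# Hypothetical helper: read vxprint-style records on stdin and print the
# seventh field (the STATE column in the layout the script assumes) of the
# plex record whose name is given in $1.
plex_state() {
    awk -v plex="$1" '$1 == "pl" && $2 == plex { print $7 }'
}

# Illustrative sample only; real vxprint output may differ between releases.
sample='pl vol01-01 vol01 ENABLED 8395200 - ACTIVE - -
pl vol01-02 vol01 ENABLED 8395200 - SNAPDONE - -'

state=$(printf '%s\n' "$sample" | plex_state vol01-02)
if [ "$state" = "SNAPDONE" ]; then
    echo "vol01-02 is ready to split"
else
    echo "vol01-02 is in state '$state'; not ready"
fi
```

Re-running a check like this against captured output after a Veritas upgrade is one way to confirm that the seventh field is still the plex state.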

4.3.2 Set the DBMS Servers CLOSED

    CLOSE_DBMS() {
        F_NAME="CLOSE_DBMS"
        log "Setting servers closed on ${PROSRV}"
        $T su - ingres -c "${SNAPCKP}/close_servers"
        checkstat 10

        # Use the show_servers shell script to verify that the servers are closed.
        # When the servers are closed "IIMONITOR> CLOSED" is displayed in the output,
        # "IIMONITOR> OPEN" is displayed if they are open.
        CHK_CLSE=`su - ingres -c "${SNAPCKP}/show_servers" | grep CLOSED | tail -1 | awk '{ print $2 }'`

        if [[ ${CHK_CLSE} != "CLOSED" ]]
        then
            print "\n${F_NAME}: ERROR closing servers on ${PROSRV}."
            print "${F_NAME}: Have a DBA run ${SNAPCKP}/close_servers on ${PROSRV}."
            print "${F_NAME}: checkstat ERROR CODE 10.\n"
            /bin/false
            checkstat 10
        else
            print "\n\nServers are CLOSED... Continuing..."
        fi
    } # end CLOSE_DBMS

This function closes the DBMS servers to new connections from any user except ingres (so beware cron jobs running as ingres that may start subsequently).

The close_servers script relies on the name server registry and will not close any unregistered DBMS servers.

It is considered that running these (*_servers, *_sessions) scripts as ingres at this level is the simplest solution. If for other reasons these scripts need to be runnable as root, then the "su - " would be unnecessary here.

This function is rerunnable at all times in the cycle.

4.3.3 Shut down the remote command server

    RMCMDSTP() {
        F_NAME="RMCMDSTP"
        $T su - ingres -c "rmcmdstp"
    }

We shut down the rmcmd server because this script performs an offline checkpoint of iidbdb. If the iidbdb checkpoint is not to be performed, or is to be an online checkpoint, then RMCMDSTP (and ingstart -rmcmd in the OPENDBMS function) can be omitted.

If you are running one, this is a more elegant way of shutting it down than allowing rm_sessions to remove the session.

If you are not running one, no harm is done.

This function is rerunnable at all times in the cycle.

4.3.4 Remove any remaining ingres users in anticipation of an offline checkpoint

    RM_SESSIONS() {
        F_NAME="RM_SESSIONS"
        log "Removing any remaining ingres sessions from DBMS servers"
        $T su - ingres -c "${SNAPCKP}/rm_sessions"
        checkstat 20

        log "${F_NAME}: All Ingres users Removed"
    } # end RM_SESSIONS

This function will remove any remaining ingres sessions. If sessions cannot be immediately removed (eg they are running a long transaction, rolling back a large transaction or in an abnormal state) then it is possible that the offline checkpoint will fail to obtain its exclusive lock on the database.

This function is not rerunnable after the servers have been reopened.

4.3.5 Perform the SNAP Checkpoint

    CKP_SNAP() {
        F_NAME="CKP_SNAP"
        log "Performing a bcv checkpoint"

        # Env var II_CKTMPL_FILE is ignored in Ingres 2.6 so we use the symbol table
        $T su - ingres -c "ingsetenv II_CKTMPL_FILE ${II_CKTMPL_BCV}"
        $T su - ingres -c "ckpdb -l +w '#x' $TARGETDB"
        checkstat 25
        $T su - ingres -c "ingunset II_CKTMPL_FILE"

        # QQQ II_CKTMPL_FILE remains set on error!!

        # Save the dbg file
        $T su - ingres -c "mv ckpdb.dbg ckpdb.dbg-$TARGETDB"
    } # end CKP_SNAP

With Ingres 2.6 SP1, the environment variable II_CKTMPL_FILE is ignored by ckpdb, so we use ingsetenv to place this assignment temporarily into the Ingres symbol table. One consequence is that this is not fail-safe: if the ingunset fails or is never reached, the assignment remains, even after this process has exited.
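One possible way to narrow this window is to register the cleanup in an EXIT trap, so that it runs whether the checkpoint command succeeds or fails. The sketch below is not taken from the snapckp script. It uses stand-in functions (fake_ingsetenv, fake_ingunset) and a scratch file in place of the Ingres symbol table so that it can run anywhere; all of those names are assumptions.

```shell
#!/bin/sh
# Stand-ins for ingsetenv/ingunset; a scratch file plays the role of the
# Ingres symbol table entry. All names here are hypothetical.
workdir=$(mktemp -d)
symtab=$workdir/symbol.tbl
fake_ingsetenv() { echo "$1=$2" > "$symtab"; }
fake_ingunset()  { rm -f "$symtab"; }

run_with_template() {
    (
        # The EXIT trap fires whether the wrapped command succeeds or fails.
        trap 'fake_ingunset II_CKTMPL_FILE' EXIT
        fake_ingsetenv II_CKTMPL_FILE "$1"
        shift
        "$@"
        exit $?
    )
}

# Simulate a failing checkpoint command with "false".
run_with_template /tmp/cktmpl.example false
rc=$?
if [ -e "$symtab" ]; then
    echo "assignment leaked (rc=$rc)"
else
    echo "assignment removed (rc=$rc)"
fi
```

A trap does not help if the process is killed with SIGKILL or the machine goes down, so checking for a leftover assignment at the start of the next run would still be prudent.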

This function is not rerunnable in general. Careful investigation must be made concerning the state of the snapshot plexes. To be safe, resilver the mirrors (SNAPBACK function) and restart snapckp from the beginning.

The +w flag will cause the ckpdb to wait until the last session exited, but would prevent all new sessions (including ingres) from starting until the checkpoint has run. So if the blocking lock remains in place forever, no sessions, even ingres sessions, will be able to start.

The entire snapckp process is compatible with online or offline checkpoints, with or without journaling. To perform an online checkpoint (allowing users to update the database during the checkpoint), remove the -l flag from ckpdb. For this to be effective, also remove the CLOSE_SERVERS, RMCMDSTP, RM_SESSIONS and OPEN_SERVERS functions. The benefit of an online checkpoint in this case is not the extra uptime of the system but that there is no interruption to the users' sessions at checkpoint time.

#x creates a ckpdb.dbg file with verbose diagnostics in pwd. It is unnecessary for the splitting process, as is the subsequent preservation of this file by renaming.

The file referenced by $II_CKTMPL_BCV is of course the customized cktmpl file which performs the SNAP checkpoint. Only the WSDD line should be customized for the SNAP procedure; the WRDD line should reverse whatever will be done in the TAR_GZIP function below (though this cktmpl file would not normally be used for a rollforward). Here is the WSDD line:

WSDD: /usr/local/ingres/snapckp/snapshot %M

%M is the location number. This allows the snapshot script to take action only for location 1:

4.3.5.1 snapshot

    if [ $# -ne 1 ]
    then
        echo Usage: $modname location number
        exit 1
    fi

    if [ $1 -eq 1 ]
    then
        log "sync sync sync"
        sync; sleep 5; sync; sleep 5; sync

        log "splitting plex 02 from data volumes"
        $T vxassist -o name=%v-snap snapshot vol01 vol02 vol03 vol04 vol05 vol06 vol07 vol08
        checkstat 2
    fi

    log "completed successfully"

We perform a sync; sleep 5; sync... to ensure that the Unix file buffers are flushed to disk for all files in the target volumes. Although Ingres performs only O_SYNC writes to the data locations, it is conceivable that other processes own open files in the volumes. If there are pending writes at split time, fsck or mount MAY have a problem later.

The splitting is effected by the vxassist snapshot statement. All listed volumes are split logically at an instant. This takes only a few seconds.

The snapshot process automatically creates a new volume for each of the split plexes. The name of this volume can be controlled somewhat; we add "-snap" to the source volume name.

4.3.6 Perform the iidbdb CHECKPOINT

    CKP_IIDBDB() {
        log "Performing physical offline checkpoint of IIDBDB"
        $T su - ingres -c "ckpdb -l +w '#x' iidbdb"
        checkstat 30
        $T su - ingres -c "mv ckpdb.dbg ckpdb.dbg.iidbdb"
    } # end CKP_IIDBDB

In this system II_DATABASE is not included in the split volumes, and it is simply for convenience that the iidbdb checkpoint is included here.

If II_DATABASE were included in the split volumes, then a TAR_GZIP of iidbdb should be performed after remounting.

If this step fails, it can easily be rerun, now or later, unless it is imperative that the iidbdb checkpoint is synchronized with the ptagis3 checkpoint.

4.3.7 Reopen the servers and restart rmcmd

    OPEN_DBMS() {
        F_NAME=OPEN_DBMS

        # Open servers and verify that they opened
        log "Setting servers open"
        $T su - ingres -c "${SNAPCKP}/open_servers"
        checkstat 35

        CHK_OPEN=`su - ingres -c "${SNAPCKP}/show_servers" | grep OPEN | tail -1 | awk '{ print $2 }'`
        if [[ ${CHK_OPEN} != "OPEN" ]]
        then
            log "${F_NAME}: ERROR opening servers"
            log "${F_NAME}: Have a DBA run ${SNAPCKP}/open_servers."
            log "${F_NAME}: checkstat ERROR CODE 35.\n"
            log "`date`: ${F_NAME} resuming processing"
        else
            log "Servers are OPEN... Continuing...\n"
        fi

        log "Restarting rmcmd"
        $T su - ingres -c "ingstart -rmcmd"
    } # end OPEN_DBMS

After the servers are reopened, production operations can resume (after only a few seconds' interruption!). Consider that the production volumes are now not protected by the mirrors that were split. Having a third mirror or RAID5 plex for each of the volumes considerably increases the robustness of the disk subsystem through this phase.

The remote command server needs to be started only if you need it (for VDBA) and if you shut it down (for an offline checkpoint).

Once successfully past this point, DO NOT GO BACK to rerun any earlier function unless you are rerunning the entire snapckp script.


4.3.8 fsck the snapshot volume

    FSCK_SNAP() {
        F_NAME="FSCK_SNAP"

        volct=0
        for i in ${SNAPVOLLIST}
        do
            (( volct = $volct + 1 ))
            log "Running fsck on ${SNAPDEVSTEM}/vol${i}-snap"
            cmd="fsck -y ${SNAPDEVSTEM}/vol${i}-snap"
            $T $SNAPCKP/runbg fsck_snap${volct} "$cmd" &
        done

        log "Waiting for background jobs"
        wait

        # runbg touches /tmp/<task>.ok on success, so count those files
        okct=`ls /tmp/fsck_snap*.ok | wc -l`
        if [ $okct != $volct ]
        then
            log "Attempted to fsck $volct vols; succeeded for only $okct"
            /bin/false
            checkstat 40
        fi
    } # end FSCK_SNAP

Now that production has been restored, timing is not so critical. In this system where we will mount the snapshot plexes back on the same computer, we expect an overwhelming resource consumption for an hour or so while the data files are tar | gzipped. Appropriate activities during this period would include low-impact user access but NOT heavy batch processing.

To maximize the chances of a successful mount, we perform an fsck -y on all the snapshot volumes. We run the 8 fscks in parallel using the runbg script and the process takes about 6 minutes. This runbg script allows us to execute any command in the background.

After waiting for the 8 backgrounded fscks, we check the interface files to detect errors.

4.3.8.1 runbg

    #!/bin/ksh
    # Wrapper for any command cmd that touches a file to indicate success or failure
    # This enables it to be run in the background and for the parent process
    # to check for status
    T=

    modname=`basename $0`

    if [ -z "$1" ]
    then
        echo "Usage: $modname task cmd"
        exit 1
    else
        task=$1
    fi

    if [ -z "$2" ]
    then
        echo "Usage: $modname task cmd"
        exit 1
    else
        cmd=$2
    fi

    statfile=/tmp/$task

    rm -f ${statfile}*
    echo "$modname: $cmd"
    eval $cmd
    err=$?
    if [ $err -ne 0 ]
    then
        touch ${statfile}.fail
        exit 1
    fi

    touch ${statfile}.ok

$1 or task is simply a string that will uniquely identify this instance of runbg from amongst all possible concurrently-running instances. Typically the task string consists of a descriptive string and a sequence number.

After completion, each runbg process will touch a $statfile.ok or $statfile.fail and exit. The parent process is responsible for checking for these files.
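The status-file protocol between runbg and its parent can be sketched in isolation. This is a minimal illustration, assuming a scratch directory instead of /tmp and a shell function run_task standing in for the runbg script; the job names are invented.

```shell
#!/bin/sh
# Sketch of the status-file protocol. statdir replaces /tmp so the example
# cannot collide with real runs; run_task is a hypothetical stand-in for runbg.
statdir=$(mktemp -d)

run_task() {
    task=$1
    shift
    rm -f "$statdir/$task.ok" "$statdir/$task.fail"
    if "$@"; then
        touch "$statdir/$task.ok"
    else
        touch "$statdir/$task.fail"
        return 1
    fi
}

started=0
for n in 1 2 3; do
    started=$((started + 1))
    if [ "$n" -eq 2 ]; then
        run_task "job$n" false &    # simulate one failing job
    else
        run_task "job$n" true &
    fi
done
wait

# The parent only sees the marker files, exactly as in the snapckp functions.
okct=$(ls "$statdir"/job*.ok 2>/dev/null | wc -l | tr -d ' ')
failct=$(ls "$statdir"/job*.fail 2>/dev/null | wc -l | tr -d ' ')
echo "started=$started ok=$okct failed=$failct"
```

Comparing the count of .ok files with the count of jobs started is what lets the parent detect a failure in any background job without tracking individual process IDs.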

4.3.9 Mount the snapshot volumes as READONLY

    MOUNT_SNAP() {
        F_NAME="MOUNT_SNAP"

        log "Saving copy of vfstab as vfstab.bak"
        cp /etc/vfstab /etc/vfstab.bak

        log "appending mount entries for snapvols to vfstab"
        cat /etc/vfstab.snap >> /etc/vfstab

        for i in ${SNAPVOLLIST}
        do
            log "Mounting ${SNAPDEVSTEM}/vol${i}-snap as readonly"
            $T mount -r ${SNAPDEVSTEM}/vol${i}-snap
            checkstat 45
        done
    } # end MOUNT_SNAP

After backing up vfstab, we add entries (previously prepared in vfstab.snap) for the snapshot volumes. Then we mount them readonly (read-write is unnecessary).

4.3.10 Verify that there is enough space and perform the effective CHECKPOINT

    TAR_GZIP() {
        F_NAME="TAR_GZIP"

        # QQQ If there are no ckp files this process gets into trouble
        # We should check size_of_last_ckp for a valid number
        size_of_last_ckp=`su - ingres -c "$SNAPCKP/freemb_after_ckp $TARGETDB" \
            | grep LAST_CKP_MB | awk '{ print $2 }'`
        freemb_after_ckp=`su - ingres -c "$SNAPCKP/freemb_after_ckp $TARGETDB" \
            | grep EXCESS | awk '{ print $2 }'`

        # QQQ Do not run delete_oldest_ckp because it removes ckp references
        # from the cnf file. Need to run ckp_cleanup instead
        su - ingres -c "alterdb $TARGETDB -delete_oldest_ckp"

        if [ ${freemb_after_ckp} -gt 100 ]
        then
            log "Performing tar|gzip of snap directories"

            # Get last ckpno
            ckpno=`su - ingres -c "infodb $TARGETDB" | awk '
                /Checkpoint History for Journal/ {
                    getline; getline;
                    while ( getline > 0 ) {
                        if ( $0 ~ /Checkpoint History for Dump/ ) { break }
                        lastckpno = $6
                    }
                    print lastckpno
                }
            '`

            log "Found last checkpoint number = $ckpno"
            if [ -z "$ckpno" ]
            then
                /bin/false
            elif [ $ckpno -eq 0 ]
            then
                /bin/false
            fi
            checkstat 50

            volct=0
            for i in $SNAPDIRLIST
            do
                (( volct = $volct + 1 ))
                ckpfile=`printf "c%04d%03d.ckp" ${ckpno} ${i}`
                ckppath=$ckpdir/$ckpfile
                sourcedir=/usr/db${i}-snap/ingII/ingres/data/default/$TARGETDB
                log "Starting tar|gzip of $sourcedir"
                cmd="cd $sourcedir; /bin/tar cvf - . | gzip > $ckppath"
                $T su - ingres -c "$SNAPCKP/runbg tar_gzip${volct} '$cmd'" &
            done

            log "Waiting for background tar_gzip jobs"
            wait

            log "All background jobs finished"

            okct=`ls /tmp/tar_gzip*.ok | wc -l`
            if [ $okct != $volct ]
            then
                log "Attempted to tar|gzip $volct vols; succeeded for only $okct"
                /bin/false
                checkstat 10
            fi

            log "Background checks OK"

            # Check total size of ckp files and compare to previous ckp total
            size_of_this_ckp=`su - ingres -c "$SNAPCKP/freemb_after_ckp $TARGETDB" \
                | grep LAST_CKP_MB | awk '{ print $2 }'`

            if [ `expr $size_of_this_ckp - $size_of_last_ckp` -gt 500 ]
            then
                log "***** Checkpoint files much smaller than previous ckp - Check!!"
                log "***** SIZE_OF_THIS_CKP = ${size_of_this_ckp}MB"
                log "  and SIZE_OF_PREV_CKP = ${size_of_last_ckp}MB *****"
            else
                log "***** SIZE_OF_THIS_CKP = ${size_of_this_ckp}MB"
                log "  and SIZE_OF_PREV_CKP = ${size_of_last_ckp}MB *****"
            fi
        else
            log "***** Not enough room for checkpoints *****"
            log
            log "estimated ckp size: $size_of_last_ckp Mb"
            log "Estimated space available after ckp: $freemb_after_ckp Mb"
            log "Please ensure sufficient space on $ckpdir"
            log
            log "Run ${SNAPCKP}/ckpcleanup to remove unnecessary files."
            log "If this is not sufficient, either lower ckp_keep_count"
            log "in ckp_cleanup or get more disk space."
            log "${F_NAME}: ERROR CODE 50"

            /bin/false
            checkstat 50
        fi
    } # end TAR_GZIP

Much of the TAR_GZIP function is concerned with estimating the space required for the gzip files and comparing it with the space available in the checkpoint filesystem.

The essential part of this function is simply running tar | gzip from each snapshot-mounted data directory to checkpoint files in the checkpoint directory. All 8 processes are run in parallel using runbg. Unless there is a LOT of CPU available or you do not have large amounts of cache available to the checkpoint volume filesystem, this process will probably be CPU-bound.
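The checkpoint files written by this function are named from the checkpoint number and the location number with the printf format c%04d%03d.ckp. As a small illustration of that naming step only (the wrapper function ckp_file_name and the example numbers are assumptions):

```shell
#!/bin/sh
# Build the checkpoint file name for checkpoint number $1 and location $2,
# using the same printf format as the TAR_GZIP function.
# Pass plain decimal numbers: a leading zero would be read as octal by %d.
ckp_file_name() {
    printf 'c%04d%03d.ckp\n' "$1" "$2"
}

ckp_file_name 12 3    # prints c0012003.ckp
```

The same pattern explains the cNNNN00[1-8].ckp names that a rollforward expects to find for an 8-location database.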

Currently this function runs alterdb -delete_oldest_ckp. Although simple, this method has the disadvantage that it removes the checkpoint entry from the database cnf file, rendering recovery to that checkpoint a multistep process.

It is recommended that delete_oldest_ckp is removed and that additional lines are added to the FILE_MNGR function below, to remove checkpoint files older than (say) 3 days, or to count (?valid) checkpoints and keep only (say) three of them. The dump directory should probably contain 30 or 40 checkpoints worth of files because they are small and could conceivably be of use.
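An age-based rule of this kind could look roughly like the following. This is a dry-run sketch, not part of the delivered scripts: the function name old_ckp_files, the scratch directory and the 3-day threshold are all illustrative assumptions, and it only lists candidates rather than deleting them.

```shell
#!/bin/sh
# Dry-run sketch: list checkpoint files older than a given number of days.
old_ckp_files() {
    # $1 = checkpoint directory, $2 = age in days
    find "$1" -type f -name 'c*.ckp' -mtime +"$2" -print
}

# Demonstration on a scratch directory with one backdated file.
ckpdir=$(mktemp -d)
touch -t 200301060300 "$ckpdir/c0001001.ckp"    # backdated: an old checkpoint
touch "$ckpdir/c0002001.ckp"                     # current: a fresh checkpoint

old_ckp_files "$ckpdir" 3
# To delete instead of list, pipe the output to: xargs rm -f
```

Listing first and deleting only after review is a deliberate choice here, since a removed checkpoint file cannot be regenerated.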

4.3.11 Ensure no one is in any ${SNAPDIRSTEM} directories so we can unmount the filesystems

    FUSER_STOP() {
        F_NAME="FUSER_STOP"
        log "Removing processes with pwds in SNAP filesystems"
        for dir in ${SNAPDIRLIST}
        do
            PIDS=`/etc/fuser -c ${SNAPDIRSTEM}${dir}-snap 2>/dev/null`
            for pid in `print ${PIDS}`
            do
                log "Process ${pid} has a pwd in ${SNAPDIRSTEM}${dir}"
                ps -ef | /usr/bin/grep ${pid} | /usr/bin/grep -v grep
                log "Terminating process ${pid} with \"kill -9\""
                $T kill -9 ${pid}
            done
        done

        # This is checkstat ERROR CODE 55
    } # end FUSER_STOP

It is possible that another process set a pwd somewhere within the mounted snapshot filesystems during the tar|gzip. To ensure a successful unmount, we first kill any such process.

4.3.12 Unmount the snapshot volumes

    UMOUNT_SNAP() {
        F_NAME="UMOUNT_SNAP"
        log "Unmounting snapvols"
        status=0
        for i in ${SNAPVOLLIST}
        do
            $T umount ${SNAPDEVSTEM}/vol${i}-snap || status=1
        done

        if [ $status -eq 1 ]
        then
            log "Failed to unmount at least one snapvol"
            log "Exiting..."
            /bin/false
            checkstat 60
        fi

        # Remove vfstab entries for snap vols because boot fails if there are
        # volumes that cannot be fsck'd
        log "Removing vfstab entries for *-snap"
        cp /etc/vfstab /etc/vfstab.bak1
        sed '/-snap/d' /etc/vfstab.bak1 > /etc/vfstab

        log "${F_NAME} completed successfully"
    }

This function is very fast.

We remove the snapshot volume entries from vfstab so that a boot is possible in the absence of these volumes (which are destroyed during the SNAPBACK process).
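The vfstab clean-up is a plain sed line deletion, which can be tried safely on a scratch copy. In the sketch below the three vfstab entries are invented for illustration (device paths, mount points and filesystem type are assumptions); only the sed '/-snap/d' expression and the copy-then-filter pattern come from the function above.

```shell
#!/bin/sh
# Work on a scratch copy; the entries below are invented for illustration.
work=$(mktemp -d)
cat > "$work/vfstab" <<'EOF'
/dev/vx/dsk/vol01 /dev/vx/rdsk/vol01 /usr/db1 vxfs 2 yes -
/dev/vx/dsk/vol01-snap /dev/vx/rdsk/vol01-snap /usr/db1-snap vxfs 2 no ro
/dev/vx/dsk/vol02 /dev/vx/rdsk/vol02 /usr/db2 vxfs 2 yes -
EOF

# Same pattern as UMOUNT_SNAP: keep a backup, then filter it into place.
cp "$work/vfstab" "$work/vfstab.bak1"
sed '/-snap/d' "$work/vfstab.bak1" > "$work/vfstab"

remaining=$(grep -c . "$work/vfstab")
snapleft=$(grep -c -- '-snap' "$work/vfstab")
echo "entries kept: $remaining, snap entries left: $snapleft"
```

Note that the pattern matches any line containing "-snap", so a production entry whose device or mount point happened to contain that string would be removed as well.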

4.3.13 Return the snapshot plexes to their original locations

    SNAPBACK() {
        F_NAME="SNAPBACK"

        volct=0
        for i in ${SNAPVOLLIST}
        do
            (( volct = $volct + 1 ))
            log "Starting snapback for vol${i}-snap..."
            cmd="${VXBIN}/vxassist snapback vol${i}-snap"
            $T ${SNAPCKP}/runbg snapback${volct} "$cmd" &
        done

        log "Waiting for snapbacks..."
        log "vxprint -h will show the state of the -02 plexes as SNAPTMP until"
        log "they are synchronized, when they will be either SNAPDONE or ACTIVE"
        wait

        okct=`ls /tmp/snapback*.ok | wc -l`

        if [ $okct != $volct ]
        then
            log "Attempted to snapback $volct vols; succeeded for only $okct"
            /bin/false
            checkstat 40
        fi

        log "${F_NAME} completed successfully"
    }

We run vxassist snapback for each snapshot volume. This can take over an hour if fast resync is not enabled, but only a few seconds if it is enabled and there are very few differences between the original plexes and the snapshots. This is another reason to minimize updates to the production volumes; more physically distributed updates cause longer resilvering times.

The final state of the -02 plexes will be either SNAPDONE or ACTIVE. Either state provides complete production functionality and is a suitable starting state for the next run of this script.

The net effect of this exercise is the same as if a physical tar | gzip checkpoint had been made, but the production system was available for 98% of the potential downtime.

4.3.14 Backup the checkpoint to tape

    BACKUP_CKP() {
        F_NAME="BACKUP_CKP"

        log "Backing up checkpoint files to tape"
    }

Backing up the checkpoint filesystem (or at least the recently-created checkpoint files) AND the dump directory files (for both the target database AND iidbdb!) is strongly recommended!

4.3.15 Manage any files created by this script

    FILE_MNGR() {
        # Delete all files in the ${PROD_LOG} directory that are older than 60 days
        log "Cleaning up Files in ${PROD_LOG}"
        $T find ${PROD_LOG} -mtime +60 -type f -print | xargs rm -f
        checkstat 75
    } # end FILE_MNGR

The best place for file management is in the script that created the files (unless they are interface files for communication with a downstream system).

The files created by the checkpoint process (checkpoint files, created here by the TAR_GZIP function, and files in the dump directory) must be managed.

Generally at least 16 checkpoints worth of dump directory files are worth keeping, because they support the default rollforward from any registered checkpoint (16 are registered for each database in the aaaaaaaa.cnf file in the database directory, the "ROOT" location in infodb output).

There may not be sufficient space to keep 16 sets of checkpoint files on disk; this function is an appropriate place from which to remove the ones we don't want. First determine the business rule: do you want to keep N days worth of checkpoints on disk, or the last M valid checkpoints? The latter will involve parsing the infodb output.
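As a simpler stand-in for the "last M checkpoints" rule, the checkpoint number can be read from the file names instead of from infodb. The sketch below takes that shortcut, so it cannot tell valid checkpoints from invalid ones; it only lists the files that fall outside the newest M checkpoint numbers. The function name surplus_ckp_files and the demonstration files are assumptions.

```shell
#!/bin/sh
# List checkpoint files that fall outside the newest $2 checkpoint numbers.
# Relies only on the cNNNNLLL.ckp naming; it does not know which checkpoints
# infodb regards as valid.
surplus_ckp_files() {
    dir=$1
    keep=$2
    # Characters 2-5 of the file name are the checkpoint number.
    keepers=$(ls "$dir" | grep '^c[0-9]\{7\}\.ckp$' | cut -c2-5 | sort -u | tail -n "$keep")
    for f in $(ls "$dir" | grep '^c[0-9]\{7\}\.ckp$'); do
        num=$(echo "$f" | cut -c2-5)
        case " $(echo $keepers) " in
            *" $num "*) ;;              # one of the newest: keep
            *) echo "$f" ;;             # older: candidate for removal
        esac
    done
}

# Demonstration: four checkpoints on disk, keep the newest three.
demo=$(mktemp -d)
for f in c0001001 c0001002 c0002001 c0003001 c0004001; do
    touch "$demo/$f.ckp"
done
surplus_ckp_files "$demo" 3
```

For the demonstration directory this lists the two files of checkpoint 0001. A version that honours the "valid" flag would need to take its checkpoint numbers from the Checkpoint History for Journal section of infodb instead.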

4.4 Restarting the snapckp script

If the snapckp script fails, it is usually reasonable to restart the script after correcting the problem. Snapckp contains a mechanism for restart. Each function has been allocated an identifying integer (5, 10, 15, ...). You must determine the target restarting function and provide the corresponding integer on the command line thus:

snapckp -r 45

There is also a menu available for running individual functions.

In most cases, it makes sense to restart from the function which failed.
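The mapping between restart integers and functions can be sketched as follows. The function order and codes are taken from the snapckp menu in Appendix A; the helper name steps_between is hypothetical and only prints the names of the functions that a given -r/-s pair would cover, it does not run them.

```shell
# Function names in execution order; the N-th name has restart code N*5.
STEPS="READYTOSPLIT CLOSE_DBMS RMCMDSTP RM_SESSIONS CKP_SNAP CKP_IIDBDB
OPEN_DBMS FSCK_SNAP MOUNT_SNAP TAR_GZIP FUSER_STOP UMOUNT_SNAP SNAPBACK
BACKUP_CKP FILE_MNGR"

# Hypothetical helper: print the functions covered by "-r $1 -s $2".
# The stop code defaults to 75, the largest code.
steps_between() {
    start=$1
    stop=${2:-75}
    code=0
    for step in $STEPS
    do
        code=$((code + 5))
        if [ "$code" -ge "$start" ] && [ "$code" -le "$stop" ]
        then
            echo "$step"
        fi
    done
}
```

For example, steps_between 45 50 prints MOUNT_SNAP and TAR_GZIP, which is the range that snapckp -r 45 -s 50 would execute.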

4.5 To recover the ptagis3 database from the previous checkpoint

First be sure that you want/need to do this.

Identify the checkpoint you want to restore. Run infodb ptagis3 and identify the required checkpoint number (NNNN) from the Checkpoint History for Journal section.
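A minimal sketch of extracting the checkpoint numbers from infodb output is shown here. It follows the same assumptions as the awk code in the TAR_GZIP function of Appendix A: two header lines follow the "Checkpoint History for Journal" title, the checkpoint number is the sixth field of each entry, and the section ends at "Checkpoint History for Dump". The function name is hypothetical.

```shell
# Hypothetical helper: read infodb output on stdin and print the checkpoint
# number (sixth field) of every entry in the Checkpoint History for Journal
# section, oldest first.
journal_ckp_numbers() {
    awk '
        /Checkpoint History for Journal/ { insec = 1; skip = 2; next }
        /Checkpoint History for Dump/    { insec = 0 }
        insec && skip > 0                { skip--; next }   # column headers
        insec && NF >= 6                 { print $6 }
    '
}
```

It would be used as infodb ptagis3 | journal_ckp_numbers, run as the ingres user; the last number printed corresponds to the most recent checkpoint.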

Check that you have all the required files on disk (by restoring from backup if necessary):

• the checkpoint files: `ingprenv II_CHECKPOINT`/ingres/ckp/default/ptagis3/cNNNN00[1-8].ckp

• the current ptagis3 cnf file: /usr/db1/ingres/data/default/ptagis3/aaaaaaaa.cnf There is a backup copy in the dump directory.

• the archived cnf file for the target checkpoint: `ingprenv II_DUMP`/ingres/dmp/default/ptagis3/c000NNNN.dmp

• if this is an online checkpoint, you will also need in the dump directory the range of dump files (dDDDDDDD.dmp - dEEEEEEE.dmp) specified in the infodb output in the Checkpoint History for Dump line for the checkpoint.

• If you want to rollforward journal files, then you must also have in the ptagis3 journal directory all the journal files from the target checkpoint to the end of the checkpoint interval which contains the target time for the end of the journal replay (normally as late as possible). This range of journal file numbers can be read from the Checkpoint History for Journal lines starting with the one for the target checkpoint.

• the checkpoint template file $II_SYSTEM/ingres/files/cktmpl.def with a cat | gunzip | tar xvf on the WRDD line. There is a copy called cktmpl.gzip.
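The first and third checks in the list above can be sketched as a small script. The function name check_restore_files and its argument order are hypothetical; the file names follow the cNNNNLLL.ckp and c000NNNN.dmp patterns given above. The sketch does not cover the dump files of an online checkpoint or the journal files, whose ranges must still be read from infodb.

```shell
# Hypothetical helper: report which files needed to restore a checkpoint
# are missing.
#   $1 = checkpoint directory   $2 = dump directory
#   $3 = checkpoint number, written WITHOUT leading zeros (e.g. 12, not 0012)
#   $4 = number of data locations (defaults to 8, as for ptagis3)
# Prints one "MISSING: path" line per absent file; returns 1 if any is absent.
check_restore_files() {
    ckpdir=$1
    dmpdir=$2
    ckpno=$3
    nloc=${4:-8}
    missing=0
    loc=1
    while [ "$loc" -le "$nloc" ]
    do
        f=$ckpdir/$(printf 'c%04d%03d.ckp' "$ckpno" "$loc")
        [ -f "$f" ] || { echo "MISSING: $f"; missing=1; }
        loc=$((loc + 1))
    done
    # Archived cnf file for the target checkpoint
    f=$dmpdir/$(printf 'c%07d.dmp' "$ckpno")
    [ -f "$f" ] || { echo "MISSING: $f"; missing=1; }
    return $missing
}
```

Any file reported as missing would need to be restored from backup before running the rollforward.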

Check that the rollforwarddb flags in the Rollforward script are appropriate (+c -j for restoration to a checkpoint; +c +j for restoration to a checkpoint followed by replaying journal files). #m8 will perform parallel rollforward.
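The two flag combinations can be illustrated with a short sketch. The function rollforward_cmd is hypothetical and only prints the command line it would use; the contents of the site's Rollforward script are not reproduced in this document, so this is not a copy of it.

```shell
# Hypothetical helper: print the rollforwarddb command line for a restore mode.
#   $1 = database name
#   $2 = "ckp" (restore the checkpoint only)
#        "jnl" (restore the checkpoint, then replay journal files)
rollforward_cmd() {
    case $2 in
        ckp) flags="+c -j" ;;
        jnl) flags="+c +j" ;;
        *)   echo "usage: rollforward_cmd dbname ckp|jnl" >&2
             return 1 ;;
    esac
    echo "rollforwarddb $flags $1"
}
```

For example, rollforward_cmd ptagis3 jnl prints rollforwarddb +c +j ptagis3.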

Run /usr/ingres/Rollforward ptagis3

Appendix A snapckp - Script to perform a mirror-split cycle

#!/bin/ksh
#
##############################################################################
# Script: snapckp
# Description: This script makes a clean snapshot of the data volumes
# mounted at /usr/db[1-8], mounts the snapshot volumes and copies the
# data files to tar|gzip'ed Ingres checkpoint files. It then re-establishes
# the snapshot plex as a mirror to the original data plexes.
#
# The resulting checkpoint files can be used for a normal rollforward
# of the database.
#
# A physical gzip checkpoint of iidbdb is part of the process.
#
# The process has the same net effect for a database as taking a gzip
# checkpoint for ptagis3 and iidbdb, but the offline time for the
# database is minimal.
#
# Both online or offline checkpoints are supported (change the ckpdb
# flags in CKP_SNAP), and rollforward can invoke journal replaying if
# required (this implies that the database is journalled!).
#
# This script must be run as root
#
# This script has command line options to allow the execution of each
# portion of the process. The allowed command line options are as follows:
#
# snapckp -m All (This will present a menu for Restart)
# snapckp -r ${NUM} (restart from checkstat ERROR ${NUM})
# snapckp -r ${NUM} -s NUM (restart from ERROR ${NUM} stop after ${NUM})
#
# The only values for ${NUM} are 5-75 in increments of 5
# The Menu has a description of what the error codes mean or where the
# process failed. This will assist in troubleshooting.
#
##############################################################################
# Date        Rev  Who          Change made
# 08-jan-2003 1.1  S P Maybury  Original derived from bcvtorpt 1.7
##############################################################################
# Define Variables
##############################################################################
#T=echo   # Used for Testing this script
T=

MODNAME=`basename $0`   # The name of this script

VXBIN="/usr/sbin"   # Location of vx utilities
SNAPCKP=/usr/ingres/snapckp   # Ingres script directory

# Don't forget to change this in
# $II_SYSTEM/ingres/files/cktmpl.bcv
# if you move snapshot

PROD_LOG=/var/log   # Directory for log files

II_SYSTEM=`su - ingres -c 'echo MARKER $II_SYSTEM' | awk '/MARKER/ { print $2 }'`

# Checkpoint template files
II_CKTMPL_BCV=${II_SYSTEM}/ingres/files/cktmpl.bcv   # bcv (split) ckp - calls
                                                     # snapshot which splits
                                                     # the mirrors

LOG_FILE="${PROD_LOG}/$MODNAME.log.`date +%Y%m%d`"   # LOG file for this script

TARGETDB=ptagis3   # Target of the bcv (split) ckpdb
                   # and the tar|gzip. Any other dbs
                   # sharing the data locations will
                   # be ignored (though iidbdb is
                   # always checkpointed).

# Construct ckpdir (where the checkpoint files go) dynamically
ckpdir=`su - ingres -c 'printf "\nMARKER "; ingprenv II_CHECKPOINT' | awk '/MARKER/ { print $2 }'`

ckpdir=$ckpdir/ingres/ckp/default/$TARGETDB

SNAPDEVSTEM="/dev/vx/dsk/rootdg"   # directory for device files
SNAPDIRSTEM="/usr/db"   # DATA Mount point directory

##############################################################################
# If you change the number of data volumes/locations for ptagis3 then
# the lists below must be updated.
# Maybe get the lists from iidbdb and df -k - it is important to follow
# the volNN convention!!
##############################################################################
SNAPDIRLIST="1 2 3 4 5 6 7 8"   # List of suffixes for snap dirs
SNAPVOLLIST="01 02 03 04 05 06 07 08"   # List of suffixes for snap vols
DBLOC_LIST="001 002 003 004 005 006 007 008"   # Checkpoint location suffixes

# Administrative variables for restart, etc
CHOICE=""   # Used for command line options
STOP_DEF=75   # Default stop value for -r -s options
STOP=75   # Should match largest ERROR code

##############################################################################
# Trap and log for the following signals: HUP INT QUIT KILL TERM
##############################################################################

#trap 'DATE=`date` ; print "\n$DATE: ERROR: HUP caught\n" >> $LOG_FILE ; exit' HUP
#trap 'DATE=`date` ; print "\n$DATE: ERROR: Interrupt caught\n" >> $LOG_FILE ; exit' INT
#trap 'DATE=`date` ; print "\n$DATE: ERROR: Quit caught\n" >> $LOG_FILE ; exit' QUIT
#trap 'DATE=`date` ; print "\n$DATE: ERROR: Terminate caught\n" >> $LOG_FILE ; exit' TERM

##############################################################################
# Define Functions
##############################################################################

# log() is a wrapper for print that adds line prefix info
log() {
    msg=$1
    STAMP="`date '+%a %H:%M:%S %Z'` ${MODNAME}"
    print "${STAMP}: $msg"
}

# Check the return code for status on an action
# The return code of all successful actions should be zero.
# If it is not, an error message is printed and the script exits
checkstat() {
    err=$?
    if [ $err -ne 0 ]
    then
        log "${F_NAME}: ERROR $err while $msg"
        log "${F_NAME}: checkstat ERROR CODE $1"
        exit $1
    fi
}

# Query the user to continue
RETURN() {
    print -n "\nPlease hit \"Enter\" to Continue: "
    read
} # end RETURN

# ERROR Code definitions defined for each function.
ERR_HELP() {
    clear
    print ""
    print "ERROR CODE 5: Verification of bcv mirrors on ${PROSRV} failed or is incomplete."
    print "    Function READYTOSPLIT."
    print "ERROR CODE 10: Closing DBMS servers failed."
    print "    Function CLOSE_DBMS."
    print "ERROR CODE 15: Failed to stop remote command server."
    print "    Function RMCMDSTP"
    print "ERROR CODE 20: Removing remaining users failed."
    print "    Function RM_SESSIONS"
    print "ERROR CODE 25: SNAP checkpoint failed."
    print "    Function CKP_SNAP"
    print "ERROR CODE 30: Physical offline checkpoint of iidbdb failed."
    print "    Function CKP_IIDBDB"
    print "ERROR CODE 35: Open database servers for production"
    print "    Function OPEN_DBMS"
    print "ERROR CODE 40: Checking SNAP volumes before mounting failed"
    print "    Function FSCK_SNAP"
    print "ERROR CODE 45: Mounting SNAP volumes failed."
    print "    Function MOUNT_SNAP"
    print "ERROR CODE 50: tar | gzip failed."
    print "    Function TAR_GZIP"
    print "ERROR CODE 55: Removing processes with fuser failed."
    print "    Function FUSER_STOP"
    print "ERROR CODE 60: Unmount of ${SNAPDIRSTEM} filesystems failed."
    print "    Function UMOUNT_SNAP"
    print "ERROR CODE 65: SNAPBACK failed."
    print "    Function SNAPBACK"
    print "ERROR CODE 70: BACKUP_CKP failed"
    print "    Function BACKUP_CKP"
    print "ERROR CODE 75: FILE_MNGR failed"
    print "    Function FILE_MNGR"
} # end ERR_HELP

# Verify the target volumes are ready to be split
READYTOSPLIT() {

    F_NAME="READYTOSPLIT"

    log "Preparing and checking the status of the mirrors."

    problem=0
    for i in ${SNAPVOLLIST}
    do
        # Check to see if the -02 plexes are ready to be split
        log "Checking vol${i}-02"
        plstate=`${VXBIN}/vxprint -p vol$i-02 | grep "pl" | awk '{ print $7 }'`
        log "Initial state of plex vol$i-02 is $plstate"
        if [ "$plstate" != "SNAPDONE" ]
        then
            log "Initial state of plex vol$i-02 is $plstate"
            vxplex convert state=SNAPDONE vol${i}-02
            checkstat 5

            log "Current state of plex vol$i-02 is $plstate"
        fi
    done
} # end READYTOSPLIT

# Close the ingres servers on ${PROSRV}. This must be done before an offline
# checkpoint.

CLOSE_DBMS() {
    F_NAME="CLOSE_DBMS"
    log "Setting servers closed on ${PROSRV}"

    $T su - ingres -c "${SNAPCKP}/close_servers"
    checkstat 10

    # Use the show_servers shell script to verify that the servers are closed.
    # When the servers are closed "IIMONITOR> CLOSED" is displayed in the output,
    # "IIMONITOR> OPEN" is displayed if they are open.

    CHK_CLSE=`su - ingres -c "${SNAPCKP}/show_servers" | grep CLOSED | tail -1 | awk '{ print $2 }'`
    if [[ ${CHK_CLSE} != "CLOSED" ]]
    then
        print "\n${F_NAME}: ERROR closing servers on ${PROSRV}."
        print "${F_NAME}: Have a DBA run ${SNAPCKP}/close_servers on ${PROSRV}."
        print "${F_NAME}: checkstat ERROR CODE 10.\n"
        /bin/false
        checkstat 10
    else
        print "\n\nServers are CLOSED... Continuing..."
    fi
} # end CLOSE_DBMS

# Shut down remote command server
RMCMDSTP() {

    F_NAME="RMCMDSTP"
    $T su - ingres -c "rmcmdstp"

}

# Remove any remaining ingres users in anticipation of an offline checkpoint
RM_SESSIONS() {

    F_NAME="RM_SESSIONS"
    log "Removing any remaining ingres sessions from DBMS servers"
    $T su - ingres -c "${SNAPCKP}/rm_sessions"
    checkstat 20

    log "${F_NAME}: All ingres users removed"
} # end RM_SESSIONS

# This is the BCV Checkpoint
CKP_SNAP() {

    F_NAME="CKP_SNAP"
    log "Performing a bcv checkpoint"

-lL.se ^\tp mrA« 'pl4line wait

    # Env var II_CKTMPL_FILE is ignored in Ingres 2.6 so we use the symbol table
    $T su - ingres -c "ingsetenv II_CKTMPL_FILE ${II_CKTMPL_BCV}"
    $T su - ingres -c "ckpdb -l +w '#x' $TARGETDB"
    checkstat 25
    $T su - ingres -c "ingunset II_CKTMPL_FILE"

    # QQQ II_CKTMPL_FILE remains set on error!!

    # Save the dbg file
    $T su - ingres -c "mv ckpdb.dbg ckpdb.dbg.$TARGETDB"

} # end CKP_SNAP

# Perform the iidbdb CHECKPOINT
CKP_IIDBDB() {

    log "Performing physical offline checkpoint of IIDBDB"
    $T su - ingres -c "ckpdb +j +w '#x' iidbdb"
    checkstat 30
    $T su - ingres -c "mv ckpdb.dbg ckpdb.dbg.iidbdb"

} # end CKP_IIDBDB

# reopen the servers and restart rmcmd
OPEN_DBMS()
{
    F_NAME=OPEN_DBMS

    # Open servers and verify that they opened
    log "Setting servers open"
    $T su - ingres -c "${SNAPCKP}/open_servers"
    checkstat 1

    CHK_OPEN=`su - ingres -c "${SNAPCKP}/show_servers" | grep OPEN | tail -1 | awk '{ print $2 }'`
    if [[ ${CHK_OPEN} != "OPEN" ]]
    then
        log "${F_NAME}: ERROR Opening servers"
        log "${F_NAME}: Have a DBA run ${SNAPCKP}/open_servers."
        log "${F_NAME}: checkstat ERROR CODE 35.\n"
        log "`date`: ${F_NAME} resuming processing"
    else
        log "Servers are OPEN... Continuing...\n"
    fi

    log "Restarting rmcmd"
    $T su - ingres -c "ingstart -rmcmd"

} # end OPEN_DBMS

# fsck the snapshot volume
FSCK_SNAP() {

    F_NAME="FSCK_SNAP"

    volct=0
    for i in ${SNAPVOLLIST}
    do
        (( volct = $volct + 1 ))
        log "Running fsck on ${SNAPDEVSTEM}/vol${i}-snap"
        cmd="fsck -y ${SNAPDEVSTEM}/vol${i}-snap"
        $T $SNAPCKP/runbg fsck_snap ${volct} "$cmd" &
    done

    log "Waiting for background jobs"
    wait

    okct=`ls /tmp/fsck_snap*.ok | wc -l`
    if [ $okct != $volct ]
    then
        log "Attempted to fsck $volct vols; succeeded for only $okct"
        /bin/false
        checkstat 40
    fi
} # end FSCK_SNAP

# Mount the snapshot volumes as READONLY
MOUNT_SNAP() {

    F_NAME="MOUNT_SNAP"

    log "Saving copy of vfstab as vfstab.bak"
    cp /etc/vfstab /etc/vfstab.bak

    log "appending mount entries for snap vols to vfstab"
    cat /etc/vfstab.snap >> /etc/vfstab

    for i in ${SNAPVOLLIST}
    do
        log "Mounting ${SNAPDEVSTEM}/vol${i}-snap as readonly"
        $T mount -r ${SNAPDEVSTEM}/vol${i}-snap
        checkstat 45
    done

} # end MOUNT_SNAP

# Verify that there is enough space and perform the effective CHECKPOINT
TAR_GZIP() {

    F_NAME="TAR_GZIP"

    # QQQ If there are no ckp files this process gets into trouble
    # We should check size_of_last_ckp for a valid number.
    size_of_last_ckp=`su - ingres -c "$SNAPCKP/freemb_after_ckp $TARGETDB" \
        | grep LAST_CKP_MB | awk '{ print $2 }'`
    freemb_after_ckp=`su - ingres -c "$SNAPCKP/freemb_after_ckp $TARGETDB" \
        | grep EXCESS | awk '{ print $2 }'`

    # QQQ Do not run delete_oldest_ckp because it removed ckp references
    # from the cnf file. Need to run ?ckp_cleanup instead
    su - ingres -c "alterdb $TARGETDB -delete_oldest_ckp"

    if [ ${freemb_after_ckp} -gt 100 ]
    then

        log "Performing tar | gzip of snap directories"

        # Get last ckpno
        ckpno=`su - ingres -c "infodb $TARGETDB" | awk '
            /Checkpoint History for Journal/ {
                getline; getline;
                while ( getline > 0 ) {
                    if ( $0 ~ /Checkpoint History for Dump/ ) { break }
                    lastckpno = $6
                }
                print lastckpno
            }'`

        log "Found last checkpoint number = $ckpno"
        if [ -z "$ckpno" ]
        then
            /bin/false
        elif [ $ckpno -eq 0 ]
        then
            /bin/false
        fi
        checkstat 50

        volct=0
        for i in $SNAPDIRLIST
        do
            (( volct = $volct + 1 ))
            ckpfile=`printf "c%04d%03d.ckp" ${ckpno} ${i}`
            ckppath=$ckpdir/$ckpfile
            sourcedir=/usr/db${i}-snap/ingII/ingres/data/default/$TARGETDB
            log "Starting tar|gzip of $sourcedir"
            cmd="cd $sourcedir; /bin/tar cvf - . | gzip > $ckppath"
            $T su - ingres -c "$SNAPCKP/runbg tar_gzip ${volct} '$cmd'" &
        done

        log "Waiting for background targzip jobs"
        wait

        log "All background jobs finished"

        okct=`ls /tmp/tar_gzip*.ok | wc -l`
        if [ $okct != $volct ]
        then
            log "Attempted to tar|gzip $volct vols; succeeded for only $okct"
            /bin/false
            checkstat 50
        fi

        log "Background checks OK"

        # Check total size of ckp files and compare to previous ckp total
        size_of_this_ckp=`su - ingres -c "$SNAPCKP/freemb_after_ckp $TARGETDB" \
            | grep LAST_CKP_MB | awk '{ print $2 }'`

        if [ `expr $size_of_this_ckp - $size_of_last_ckp` -gt 600 ]
        then
            log "***** Checkpoint files much smaller than previous ckp - Check !! *****"
            log "***** SIZE_OF_THIS_CKP = ${size_of_this_ckp}MB"
            log " and SIZE_OF_PREV_CKP = ${size_of_last_ckp}MB *****"
        else
            log "***** SIZE_OF_THIS_CKP = ${size_of_this_ckp}MB"
            log " and SIZE_OF_PREV_CKP = ${size_of_last_ckp}MB *****"
        fi

    else

        log "***** Not enough room for checkpoints *****"
        log
        log "estimated ckp size: $size_of_last_ckp Mb"
        log "Estimated space available after ckp: $freemb_after_ckp Mb"
        log "please ensure sufficient space on $ckpdir"
        log
        log "Run ${SNAPCKP}/ckp_cleanup to remove unnecessary files."
        log "If this is not sufficient, either lower ckpkeep_count"
        log "in ckp_cleanup or get more disk space."
        log "${F_NAME}: ERROR CODE 50"

        /bin/false
        checkstat 50

    fi
} # end TAR_GZIP

# Ensure no one is in any ${SNAPDIRSTEM} directories so we can unmount
# the filesystems.
FUSER_STOP() {
    F_NAME="FUSER_STOP"
    log "Removing processes with pwds in SNAP filesystems"
    for dir in ${SNAPDIRLIST}
    do
        PIDS=`/etc/fuser -c ${SNAPDIRSTEM}${dir}-snap 2>/dev/null`
        for pid in `print ${PIDS}`
        do
            log "Process ${pid} has a pwd in ${SNAPDIRSTEM}${dir}"
            ps -ef | /usr/bin/grep ${pid} | /usr/bin/grep -v grep
            log "Terminating process ${pid} with \"kill -9\""
            $T kill -9 ${pid}
        done
    done

    # This is checkstat ERROR CODE 55
} # end FUSER_STOP

UMOUNT_SNAP() {

    F_NAME="UMOUNT_SNAP"
    log "Unmounting snap vols"
    status=0
    for i in ${SNAPVOLLIST}
    do
        $T umount ${SNAPDEVSTEM}/vol${i}-snap || status=1
    done

    if [ $status -eq 1 ]
    then
        log "Failed to unmount at least one snapvol"
        log "Exiting..."
        /bin/false
        checkstat 60
    fi

    # Remove fstab entries for snapvols because boot fails if there are
    # volumes that cannot be fsck'd
    log "Removing vxfstab entries for *-snap"
    cp /etc/vfstab /etc/vfstab.bak1
    sed '/-snap/d' /etc/vfstab.bak1 > /etc/vfstab

    log "${F_NAME} completed successfully"
}

SNAPBACK() {

    F_NAME="SNAPBACK"

    volct=0
    for i in ${SNAPVOLLIST}
    do
        (( volct = $volct + 1 ))
        log "Starting snapback for vol${i}-snap..."
        cmd="${VXBIN}/vxassist snapback vol${i}-snap"
        $T ${SNAPCKP}/runbg snapback ${volct} "$cmd" &
    done

    log "Waiting for snapbacks..."
    log "vxprint -h will show the state of the -02 plexes as SNAPTMP until"
    log "they are synchronized, when they will be either SNAPDONE or ACTIVE"
    wait

    okct=`ls /tmp/snapback*.ok | wc -l`
    if [ $okct != $volct ]
    then
        log "Attempted to snapback $volct vols; succeeded for only $okct"
        /bin/false
        checkstat 40
    fi

    log "${F_NAME} completed successfully"
}

# Backup the checkpoint to tape. This is done on ${PRONIS}.
BACKUP_CKP() {

    F_NAME="BACKUP_CKP"

    log "Backing up checkpoint files to tape"
}

FILE_MNGR() {

    # Delete all files in the ${PROD_LOG} directory that are older than 60 days
    log "Cleaning up Files in ${PROD_LOG}"
    $T find ${PROD_LOG} -mtime +60 -type f -print | xargs rm -f
    checkstat 75

} # end FILE_MNGR

# This menu is for SA use to rerun any one function of the script.

Menu_all() {
    CHOICE=$1
    while [[ ${CHOICE} = "" ]]
    do
        clear
        print "\r\n\n\n\t\t\tBCV to RPT Run Functions Menu \n"
        print "\t\t 1) Run Function READYTOSPLIT 'checkstat ERROR CODE 5'"
        print "\t\t 2) Run Function CLOSE_DBMS   'checkstat ERROR CODE 10'"
        print "\t\t 3) Run Function RMCMDSTP     'checkstat ERROR CODE 15'"
        print "\t\t 4) Run Function RM_SESSIONS  'checkstat ERROR CODE 20'"
        print "\t\t 5) Run Function CKP_SNAP     'checkstat ERROR CODE 25'"
        print "\t\t 6) Run Function CKP_IIDBDB   'checkstat ERROR CODE 30'"
        print "\t\t 7) Run Function OPEN_DBMS    'checkstat ERROR CODE 35'"
        print "\t\t 8) Run Function FSCK_SNAP    'checkstat ERROR CODE 40'"
        print "\t\t 9) Run Function MOUNT_SNAP   'checkstat ERROR CODE 45'"
        print "\t\t10) Run Function TAR_GZIP     'checkstat ERROR CODE 50'"
        print "\t\t11) Run Function FUSER_STOP   'checkstat ERROR CODE 55'"
        print "\t\t12) Run Function UMOUNT_SNAP  'checkstat ERROR CODE 60'"
        print "\t\t13) Run Function SNAPBACK     'checkstat ERROR CODE 65'"
        print "\t\t14) Run Function BACKUP_CKP   'checkstat ERROR CODE 70'"
        print "\t\t15) Run Function FILE_MNGR    'checkstat ERROR CODE 75'"
        print "\t\t H) Definition of checkstat ERROR CODE\'s"
        print "\t\t q) Quit this Menu"
        print "\n\t\t NOTE: Only the Function you select will be executed."
        print -n "\t\t Please select an option: "
        read CHOICE
    done

    case ${CHOICE} in
        1) READYTOSPLIT >> ${LOG_FILE} ;;
        2) CLOSE_DBMS >> ${LOG_FILE} ;;
        3) RMCMDSTP >> ${LOG_FILE} ;;
        4) RM_SESSIONS >> ${LOG_FILE} ;;
        5) CKP_SNAP >> ${LOG_FILE} ;;
        6) CKP_IIDBDB >> ${LOG_FILE} ;;
        7) OPEN_DBMS >> ${LOG_FILE} ;;
        8) FSCK_SNAP >> ${LOG_FILE} ;;
        9) MOUNT_SNAP >> ${LOG_FILE} ;;
        10) TAR_GZIP >> ${LOG_FILE} ;;
        11) FUSER_STOP >> ${LOG_FILE} ;;
        12) UMOUNT_SNAP >> ${LOG_FILE} ;;
        13) SNAPBACK >> ${LOG_FILE} ;;
        14) BACKUP_CKP >> ${LOG_FILE} ;;
        15) FILE_MNGR >> ${LOG_FILE} ;;
        H|h) ERR_HELP ; RETURN ; Menu_all ;;
        q|Q) print "\n\t\t Exiting \n" ; exit ;;
        *) : ;;
    esac
} # end Menu_all

# The RUN_ALL function is designed to run the entire snapckp process.
# If any one piece needs to be run again then this script needs to run
# with the following options "snapckp -m All".
# The menu will allow any one portion of the script to
# be executed again. If you need to run more than one function then you need to
# execute "snapckp -r 20 -s 75" to run from checkstat ERROR 20 through 75.
RUN_ALL() {

    log "Running function READYTOSPLIT"
    READYTOSPLIT >> ${LOG_FILE}
    log "Completed function READYTOSPLIT"

    log "Running function CLOSE_DBMS"
    CLOSE_DBMS >> ${LOG_FILE}
    log "Completed function CLOSE_DBMS"

    log "Running function RMCMDSTP"
    RMCMDSTP >> ${LOG_FILE}
    log "Completed function RMCMDSTP"

    log "Running function RM_SESSIONS"
    RM_SESSIONS >> ${LOG_FILE}
    log "Completed function RM_SESSIONS"

    log "Running function CKP_SNAP"
    CKP_SNAP >> ${LOG_FILE}
    log "Completed function CKP_SNAP"

    log "Running function CKP_IIDBDB"
    CKP_IIDBDB >> ${LOG_FILE}
    log "Completed function CKP_IIDBDB"

    log "Running function OPEN_DBMS"
    OPEN_DBMS >> ${LOG_FILE}
    log "Completed function OPEN_DBMS"

    log "Running function FSCK_SNAP"
    FSCK_SNAP >> ${LOG_FILE}
    log "Completed function FSCK_SNAP"

    log "Running function MOUNT_SNAP"
    MOUNT_SNAP >> ${LOG_FILE}
    log "Completed function MOUNT_SNAP"

    log "Running function TAR_GZIP"
    TAR_GZIP >> ${LOG_FILE}
    log "Completed function TAR_GZIP"

    log "Running function FUSER_STOP"
    FUSER_STOP >> ${LOG_FILE}
    log "Completed function FUSER_STOP"

    log "Running function UMOUNT_SNAP"
    UMOUNT_SNAP >> ${LOG_FILE}
    log "Completed function UMOUNT_SNAP"

    log "Running function SNAPBACK"
    SNAPBACK >> ${LOG_FILE}
    log "Completed function SNAPBACK"

    log "Running function BACKUP_CKP"
    BACKUP_CKP >> ${LOG_FILE}
    log "Completed function BACKUP_CKP"

    log "Running function FILE_MNGR"
    FILE_MNGR >> ${LOG_FILE}
    log "Completed function FILE_MNGR"

    log "END OF SNAPCKP"

} # end RUN_ALL

##############################################################################
# MAIN PORTION OF THIS SCRIPT
##############################################################################

# Determine if another snapckp process is running

CHK_PROC=`ps -ef | grep snapckp | egrep -v 'grep|more|vi|cat|tail' | wc -l`
if [[ ${CHK_PROC} -gt "1" ]]
then
    print "\nERROR: snapckp is currently running!\n"
    exit
fi

# Check user is root
id=`id | sed 's/.*(\(.*\)) gid.*/\1/'`
if [ "$id" != "root" ]
then
    echo "You must run this script as root"
    echo "Exiting..."
    exit 1
fi

log "**********************************************************************"

log " START $modname"

log "Starting $modname"
echo "Starting $modname"

# Clean out /tmp to ensure that we do not have a space problem

/usr/bin/find /tmp -type f ! -name "II*" -print | xargs rm -f

# make sure that the user does not just put in a "-" argument on the command
# line.
if [[ $1 = "-" ]]
then
    print "\nNo option entered.\n"
    exit
fi

# Get any command line arguments for processing. These options should not be
# published to the general public. They are for SA/DBA USE ONLY!!!!
#
# This script must be run as root because it mounts and unmounts filesystems
# and runs fsck
#
# The -r means to re-run from the function that exited from a checkstat ERROR.
# The -s means to stop when the function completed.
#
# EXAMPLE: "snapckp -r 5 -s 20"
#
# This will start with the function with checkstat ERROR 5 and stop when
# the function with checkstat ERROR 20 has completed.
#
# If you wish to run from checkstat ERROR 30 and continue to completion run
# the following: "snapckp -r 30"
#
# To run one function execute the following: "snapckp -r 30 -s 30"
#
# If you run "snapckp -m All" you will be displayed the menu to run one
# function at a time. It also will allow you to display what each checkstat
# ERROR means.

while getopts :m:r:s:h args
do
    case ${args} in
        m) MENU=$OPTARG
           case ${MENU} in
               All) Menu_all ;;
               *) print "\nInvalid option entered for \"-m\" \"${MENU}\"\n" ;;
           esac
           exit ;;
        s) STOP=$OPTARG ;;
        r) MENU=$OPTARG ;;
        h) ERR_HELP ; exit ;;
        :) print "\nNo option entered \"${OPTARG}\" \n" ; exit ;;
        \?) print "\nInvalid option entered \"-${OPTARG}\" \n" ; exit ;;
        " ") print "\nInvalid option entered \"-${OPTARG}\" \n" ; exit ;;
        *) MENU=RUN_ALL ;;
    esac
done

# QQQ wall users and sleep 15?

# If no command line options are entered run the entire process

if [[ ${MENU} = "" || ${MENU} < 1 ]]
then
    RUN_ALL
    log "SNAPCKP completed successfully\n"
    print >> ${LOG_FILE}
else
    (( CHKSTOP = STOP % 5 ))
    if (( ${STOP} > ${STOP_DEF} ))
    then
        print "\nInvalid STOP value entered! Exiting...\n"
        exit 4
    fi
    if [[ ${CHKSTOP} -ne 0 ]]
    then
        print "\nInvalid STOP value entered! Exiting...\n"
        exit 4
    fi
    if (( ${MENU} > ${STOP} ))
    then
        print "\nCan not STOP before you start! Exiting...\n"
        exit 4
    fi

    # Log an entry in the ${LOG_FILE} stating we restarted from -r #
    # and are attempting to stop at -s #.

    print "\n `date '+%a %H:%M:%S'` : Restarting snapckp with the following" >> ${LOG_FILE}
    print " restart option \"-r ${MENU}\" Running through ${STOP}" >> ${LOG_FILE}

    while [[ ${MENU} -le ${STOP} ]]
    do
        case ${MENU} in
            5) Menu_all 1 ;;
            10) Menu_all 2 ;;
            15) Menu_all 3 ;;
            20) Menu_all 4 ;;
            25) Menu_all 5 ;;
            30) Menu_all 6 ;;
            35) Menu_all 7 ;;
            40) Menu_all 8 ;;
            45) Menu_all 9 ;;
            50) Menu_all 10 ;;
            55) Menu_all 11 ;;
            60) Menu_all 12 ;;
            65) Menu_all 13 ;;
            70) Menu_all 14 ;;
            75) Menu_all 15 ;;
            *) print "\nInvalid option entered for \"-r\" \"${MENU}\"\n" ; exit ;;
        esac
        (( MENU = MENU + 5 ))
    done

    log "Restarted SNAPCKP completed successfully"
fi

Appendix B snapshot - Script called from cktmpl.def to split the mirrors

#!/bin/ksh
#
# snapshot
#
# Wrapper for vxassist snapshot
#
# $1 - The location number. This script is called by ckpdb
#      according to cktmpl.bcv once per data location. For the
#      first location (1), we perform the bcv split; we ignore
#      an invocation with any other location, ie if you want this
#      script to do something the 1 is mandatory as the second parameter.
#
# 30-sep-1998 S P Maybury Original
#
# 10-dec-1998 S P Maybury Add sync's before bcv split to flush pending
#                         writes to disk and avoid 'Write pendings' error
#                         5013 from subsequent bcv restore.
#
# 08-jan-2003 S P Maybury Convert to Solaris/Veritas

T=
export T

modname=`basename $0`

log() {
    msg=$1
    echo `date '+%a %H:%M:%S'` `hostname` $modname: $msg
}

checkstat() {
    err=$?
    if [ $err -ne 0 ]
    then
        echo $modname: Error $err while $msg
        exit $1
    fi
}

if [ $# -ne 1 ]
then
    echo Usage: $modname location_number
    exit 1
fi

if [ $1 -eq 1 ]
then
    log "Sync sync sync"
    sync; sleep 5; sync; sleep 5; sync

    log "splitting plex 02 from data volumes"
    $T vxassist -o name=%v-snap snapshot vol01 vol02 vol03 vol04 vol05 vol06 vol07 vol08
    checkstat 2
fi

log "completed successfully"

Appendix C DBMS Server Scripts open_servers, close_servers, show_servers, rm_sessions

#!/bin/ksh
# Script to open all dbms servers to new connections
#
# 5-oct-1998 S P Maybury Original

T=
export T

modname=`basename $0`

log() {
    msg=$1
    echo `date '+%a %H:%M:%S'` `hostname` $modname: $msg
}

checkstat() {
    err=$?
    if [ $err -ne 0 ]
    then
        echo $modname: Error $err while $msg
        exit $1
    fi
}

wfile=/tmp/$modname.tmp

echo "show ingres" | iinamu | grep "INGRES " | sed "s/IINAMU> //" \
    | awk '{ print $3 }' > $wfile

echo "show loader" | iinamu | grep "LOADER " | sed "s/IINAMU> //" \
    | awk '{ print $3 }' >> $wfile

status=0
for server in `cat $wfile`
do
    log "Closing server $server"
    echo "set server open" | iimonitor $server
    err=$?
    if [ $err -ne 0 ]
    then
        echo $modname: Error $err while $msg
        status=1
    fi
done

exit $status

#!/bin/ksh
#
# Script to close all dbms servers to new connections
#
# 5-oct-1998 S P Maybury Original

T=
export T

modname=`basename $0`

log()
{
    msg=$1
    echo "`date '+%a %H:%M:%S'` `hostname` $modname: $msg"
}

checkstat()
{
    err=$?
    if [ $err -ne 0 ]
    then
        echo $modname: Error $err while $msg
        exit $1
    fi
}

wfile=/tmp/$modname.tmp

echo "show ingres" | iinamu | grep "INGRES " | sed "s/IINAMU> //" \
    | awk '{ print $3 }' > $wfile

echo "show loader" | iinamu | grep "LOADER " | sed "s/IINAMU> //" \
    | awk '{ print $3 }' >> $wfile

status=0
for server in `cat $wfile`
do
    log "Closing server $server"
    echo "set server closed" | iimonitor $server
    err=$?
    if [ $err -ne 0 ]
    then
        echo $modname: Error $err while $msg
        status=1
    fi
done

exit $status

#!/bin/ksh
#
# Script to show the listen state of all dbms servers
#
# 5-oct-1998 S P Maybury Original

T=
export T

modname=`basename $0`

log()
{
    msg=$1
    echo "`date '+%a %H:%M:%S'` `hostname` $modname: $msg"
}

checkstat()
{
    err=$?
    if [ $err -ne 0 ]
    then
        echo $modname: Error $err while $msg
        exit $1
    fi
}

log "Starting..."

status=0
for server in `echo show | iinamu | grep "INGRES " | sed "s/IINAMU> //" | awk '{ print $3 }'`
do
    log "showing server $server"
    echo "show server listen" | iimonitor $server
    checkstat 1
done

#!/bin/ksh
#
# Script to remove all user sessions from all dbms servers in current
# installation.
#
# 7-oct-1999 S P Maybury Original
# 7-jan-2003 S P Maybury Removed remsh bits

T=
export T

modname=`basename $0`

log()
{
    msg=$1
    echo "`date '+%a %H:%M:%S'` `hostname` $modname: $msg"
}

checkstat()
{
    err=$?
    if [ $err -ne 0 ]
    then
        echo $modname: Error $err while $msg
        exit $1
    fi
}

wfile=/tmp/$modname.tmp

. /usr/ingres/dbaenv

echo "show ingres" | iinamu | grep "INGRES " | sed "s/IINAMU> //" \
    | awk '{ print $3 }' > $wfile

echo "show loader" | iinamu | grep "LOADER " | sed "s/IINAMU> //" \
    | awk '{ print $3 }' >> $wfile

for server in `cat $wfile`
do
    log "The following sessions will be removed from server $server:"
    echo "show user sessions" | iimonitor $server \
        | grep session | sed -e 's/IIMONITOR>//'

    for session in `echo "show user sessions" | iimonitor $server \
        | grep session | sed -e 's/IIMONITOR>//' | awk '{ print $2 }' \
        | awk -F: '{ print $1 }'`
    do
        echo Removing session $session
        $T echo "remove $session" | iimonitor $server
    done
done

# Give the servers a chance to actually remove the sessions
sleep 5

# Check no sessions remaining
session_count=`$ING_LOCAL/ctsessions`
echo session_count=$session_count
[[ $session_count -eq 0 ]]
checkstat 1

Appendix D freemb_after_ckp - Space-checking script; runbg - Run cmd in background

#!/bin/ksh
# freemb_after_ckp
# Script that checks the size of the last checkpoint and the
# space available in the /chkpnt filesystem. If there is
# sufficient space then the return status is 0, else it is 1.
# There must be at least one valid set of checkpoint files in ckpdir
#
# 27-oct-1998 S P Maybury Original
#

modname=`basename $0`

if [ -z "$1" ]
then
    echo "Usage: $modname dbname"
    exit 1
else
    TARGETDB=$1
fi

DBLOC_LIST="001 002 003 004 005 006 007 008"   # Checkpoint location suffixes

ckpdir=`ingprenv II_CHECKPOINT`/ingres/ckp/default/$TARGETDB

# Any checkpoint files present? The glob is left unquoted so that the
# shell expands it.
if ls $ckpdir/c*.ckp > /dev/null 2>&1
then
    latest_ckpfile=`ls -rt $ckpdir/c*.ckp | tail -1`
    latest_ckpno=`basename $latest_ckpfile | cut -c 2-5`

    # Sum the size (in KB) of the latest checkpoint file at each location
    ckpsiz=0
    for dbloc in $DBLOC_LIST
    do
        fileb=`ls -l $ckpdir/c${latest_ckpno}${dbloc}.ckp | awk '{ print $5 }'`
        filek=`expr $fileb / 1024`
        ckpsiz=`expr $ckpsiz + $filek`
    done
    ckpm=`expr $ckpsiz / 1024`
else
    ckpm=0
fi

# Space available
topdir=`echo $ckpdir | cut -f 2 -d "/"`
space_avail=`df -k $ckpdir | grep "/$topdir" | awk '{ print $4 }'`
space_avm=`expr $space_avail / 1024`

excess=`expr $space_avm - $ckpm`
echo "LAST_CKP_MB $ckpm"
echo "EXCESS $excess"
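A caller has to turn the two output lines of freemb_after_ckp into a go/no-go decision. The following is a minimal sketch of one way to do that, not part of the installed scripts: the function name has_room_for_ckp is hypothetical, and the rule "EXCESS greater than zero means there is room" is an assumption about what counts as sufficient space.

```shell
#!/bin/sh
# has_room_for_ckp (hypothetical helper)
# Reads freemb_after_ckp-style output on stdin, e.g.
#   LAST_CKP_MB 100
#   EXCESS 250
# and returns 0 if an EXCESS line is present with a value above zero
# (assumed threshold), otherwise returns 1.
has_room_for_ckp() {
    awk '
        $1 == "EXCESS" { found = 1; ok = ($2 > 0) }
        END { exit ((found && ok) ? 0 : 1) }
    '
}
```

It could be used as `freemb_after_ckp mydb | has_room_for_ckp || echo "not enough space"`, where mydb stands for the database name.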

#!/bin/ksh
# runbg
# Wrapper for any command cmd that touches a file to indicate success or failure
# This enables it to be run in the background and for the parent process
# to check for status

T=

modname=`basename $0`

if [ -z "$1" ]
then
    echo "Usage: $modname task cmd"
    exit 1
else
    task=$1
fi

if [ -z "$2" ]
then
    echo "Usage: $modname task cmd"
    exit 1
else
    cmd=$2
fi

statfile=/tmp/$task

rm -f ${statfile}*
echo "$modname: $cmd"
eval $cmd
err=$?
if [ $err -ne 0 ]
then
    touch ${statfile}.fail
    exit 1
fi

touch ${statfile}.ok
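Because runbg reports its result only through the files /tmp/<task>.ok and /tmp/<task>.fail, the parent process has to poll for them. The sketch below shows one way a parent might wait; the function name wait_for_task, the one-second poll interval and the timeout return code 2 are all assumptions made for illustration, not taken from the installed scripts.

```shell
#!/bin/sh
# wait_for_task PREFIX [MAX_POLLS]   (hypothetical helper)
# PREFIX is the status file path without its suffix, e.g. /tmp/mytask.
# Returns 0 when PREFIX.ok appears, 1 when PREFIX.fail appears,
# and 2 if neither file shows up within MAX_POLLS one-second polls.
wait_for_task() {
    prefix=$1
    polls=${2:-60}
    while [ "$polls" -gt 0 ]
    do
        if [ -f "$prefix.ok" ]; then
            return 0
        fi
        if [ -f "$prefix.fail" ]; then
            return 1
        fi
        sleep 1
        polls=$((polls - 1))
    done
    return 2
}
```

A parent could start `runbg mytask "some command" &` and then call `wait_for_task /tmp/mytask 600`, where mytask and the command are placeholders.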

i420.txt 2/4/03 10:04:48 AM Pacific Standard Time [email protected]


PERFORMANCE DATA GATHERING UTILITY (i420)

This document provides technical details about the i420 utility, as well as some sample reports generated by the associated rp reporting utility.

NOTE: This software is supplied by Computer Associates on an unsupported basis.

Prerequisites

The i420 utility requires the imadb database to be available. In installations where each DBMS server is configured with a specific list of databases to handle, this means that the DBA must ensure that there is a DBMS server that handles imadb.

There is a one-time installation step that must be done in each Ingres installation: the DBA must run the script i420.install, which registers the table dmfstats. For the most part, i420 relies on canned imadb tables, but the ones provided for DMF cache stats are missing the information about the page size; hence the need to register an additional table.

Collecting performance data

Once i420.install has been run, i420 needs to be scheduled to run at frequent periodic intervals. We recommend at least hourly. During testing, we noticed that there appears to be an interaction between i420 and the process of starting an Ingres DBMS server. It is recommended that i420 be turned off during periods where Ingres DBMS servers may be recycled.
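One way to follow the advice about switching collection off while servers are recycled is to have the scheduler call a small wrapper that checks for a pause flag first. This is only a sketch: the function name run_i420_if_enabled, the flag-file convention and the paths in the usage note are assumptions, and only the -o flag comes from the i420 usage text.

```shell
#!/bin/sh
# run_i420_if_enabled PAUSE_FLAG COMMAND [ARGS...]   (hypothetical wrapper)
# Runs COMMAND unless the file PAUSE_FLAG exists. An operator would create
# the flag file before recycling DBMS servers and remove it afterwards.
run_i420_if_enabled() {
    pause_flag=$1
    shift
    if [ -f "$pause_flag" ]; then
        echo "i420 collection paused"
        return 0
    fi
    "$@"
}
```

An hourly scheduled job could then run something like `run_i420_if_enabled /tmp/i420.pause i420 -o /var/tmp/perf.dat`, with both paths being placeholders.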

User Interface

The command-line syntax for i420 is shown below.

Usage: i420 [+i|-i] [+l|-l] [-o filename]
    -i (default)   do not produce summary info on stdout
    +i             produce summary info on stdout
    -l             do not append records to output file
    +l (default)   append records to output file
    -o filename    override default output file name perf.dat

The default settings are appropriate for batch usage. Note that there are two kinds of output. One is "interactive" output, controlled via the +/-i flags. This is intended to produce a quick snapshot on standard output, similar to what the output from trace point dm420 would look like. Here is a partial example:

Tuesday, February 04,2003 America Online: crougliD Page: 1

| Server = atlhpdv3::/@/tmp/ii.20792 | Cache 2K |

Hit Rate = 94.4% (3097 / 3280)

Single
    FLIMIT=31, WBEND=500, WBSTART=600, MLIMIT=750, TOTAL=1000
    1000 free, 0 fixed, 0 modified

Group
    1000 buffers (16 pages each), 1000 free, 0 fixed, 0 modified

| Server = atlhpdv3::/@/tmp/ii.20792 | Cache 4K |

Hit Rate = <undefined>

Single
    FLIMIT=129, WBEND=2064, WBSTART=2477, MLIMIT=3096, TOTAL=4128
    4128 free, 0 fixed, 0 modified

Group
    254 buffers (8 pages each), 254 free, 0 fixed, 0 modified

The other kind of output is "logged" output, controlled by the +/-l flags. This is the core capability of i420. By turning on logged output, i420 writes performance data to a flat file (perf.dat by default, but the -o flag can be used to specify a different path and filename), which can then later be processed by the (stand-alone) utility rp, which constructs six different types of reports. Note that rp does not need to be, and is not expected to be, run on the same machine where the statistics were collected. Each line in perf.dat is self-contained, identifying the timestamp, server name and data item. It is even possible to append perf.dat files from different installations onto a single file and have rp report the results either aggregated or separated.
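Since every line in perf.dat is self-contained, combining files from several installations should need nothing more than concatenation. The sketch below illustrates that idea; the function name combine_perf_files and the file names in the usage note are hypothetical, and the actual record layout of perf.dat is not shown here.

```shell
#!/bin/sh
# combine_perf_files OUTFILE INFILE...   (hypothetical helper)
# Concatenates per-installation data files into OUTFILE, in the order given.
# OUTFILE must not be one of the input files.
combine_perf_files() {
    out=$1
    shift
    cat "$@" > "$out"
}
```

For example, `combine_perf_files all_perf.dat siteA/perf.dat siteB/perf.dat` would produce a single file that could then be passed to rp.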

Generating Performance Reports

To generate performance reports, you must obtain the perf.dat file for the installation you want to report on. You then run the rp tool, anywhere you have Ingres and imadb. (The imadb database is only required because rp creates some temporary tables and needs to be connected to a database; all temporary tables generated by rp are session scope tables, so they all disappear when the program finishes).

The rp utility has the following command-line syntax:

Usage: rp <datfile> <repno>
    datfile is output from i420
    Values for repno:
        1 - DMF Cache Report, all vnodes, all servers, all caches
        2 - DMF Cache Report, this vnode, all servers, all caches
        3 - DMF Cache Report, all vnodes, by cache size
        4 - DMF Cache Report, installation summary
        5 - Locking System Report
        6 - Logging System Report

Instead of the numerical report id's, the following syntax is also supported:

-dmf1, -dmf2, -dmf3, -dmf4, -lock, -log.
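The named flags appear to correspond, in order, to report numbers 1 through 6. The helper below spells out that assumed correspondence; rp_report_number is a hypothetical name and is not part of rp itself.

```shell
#!/bin/sh
# rp_report_number FLAG   (hypothetical helper)
# Prints the numeric report id assumed to match a named rp flag.
# Returns 1 for a flag that is not in the documented list.
rp_report_number() {
    case $1 in
        -dmf1) echo 1 ;;
        -dmf2) echo 2 ;;
        -dmf3) echo 3 ;;
        -dmf4) echo 4 ;;
        -lock) echo 5 ;;
        -log)  echo 6 ;;
        *)     return 1 ;;
    esac
}
```

Under that assumption, `rp perf.dat -lock` and `rp perf.dat 5` would request the same Locking System Report.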

Partial samples for reports 1, 2, 4, 5 and 6 follow. Because the sample only had one vnode, reports 2 and 3 would be identical, so only report 2 is shown.

NOTE: due to existing limitations in the imadb implementation, reports 1 and 2 do not work as advertised. There is no way (via imadb) to get per-server statistics the way trace point dm420 allows. The most useful cache report is 3. In a single-server installation, such as is typical on Sun Solaris or Windows NT/2000, this is not a problem.

============================ Type 1 Report ============================

SERVER report for Server embhpdb1::/@/tmp/ii.19872, 2K Cache, Databases: (iidbdb, imadb)

DATE      TIME   HITS  FIX   RATE   IOWT GWT SYNC GSYN READ WRITE GREAD GWRITE
4-dec-00  19:45  4421  6451  68.5%
          20:45   417   512  81.4%
          21:45   407   470  86.5%
          22:45   407   470  86.5%
          23:45   407   470  86.5%
TOTALS:   4.00   6059  8373  72.4%
HOURLY:   1.00   1513  2092  72.4%

[The IOWT through GWRITE values of this sample are not recoverable from the source.]

SERVER report for Server embhpdb1::/@/tmp/ii.2174, 2K Cache, Databases: (east)

DATE      TIME   HITS     FIX      RATE    IOWT GWT SYNC GSYN READ  WRITE   GREAD GWRITE
5-dec-00  06:45  1638804  1743170   94.0%  0    0   0    0     826  103549
          07:45  1793015  1918662   93.4%  0    0   0    0     140  125501
          08:45   680855   713507   95.4%  0    0   0    0     517   32135
          09:45  1493439  1518666   98.3%  0    0   0    0      13   25214
          10:45     1406     1460   96.3%  0    0   0    0      49       2
          11:45    15333    25826   59.3%  0    0   0    0    9062     134
          12:45     1114     1114  100.0%  0    0   0    0       0       0

[The GREAD and GWRITE values, the rows from 13:45 to 23:46, and the TOTALS (17.01 hours) and HOURLY lines of this sample are not recoverable from the source.]

SERVER report for Server embhpdb1::/@/tmp/ii.2174, 4K Cache, Databases: (east)

DATE      TIME   HITS     FIX      RATE   IOWT GWT SYNC GSYN READ WRITE   GREAD GWRITE
5-dec-00  06:45  1123113  1201673  93.4%
          07:45  1267328  1360503  93.1%
          08:45  1720498  1845021  93.2%
          09:45   790431   851918  92.7%
          10:45    46593    56301  82.7%
          11:45   510401   608038  83.9%
          12:45        0        0   0.0%
          13:45       40       61  65.5%
          14:45        0        0   0.0%
          15:45    28928    30194  95.8%
          16:45        0        0   0.0%
          17:45        0        0   0.0%
          18:45        0        0   0.0%
          19:45        0        0   0.0%
          20:45        0        0   0.0%
          21:45        0        0   0.0%
          22:45        0        0   0.0%
          23:46        0        0   0.0%
TOTALS:   17.01  5487332  5953709  92.2%  0    0   0    0    758  464245  1     0
HOURLY:   1.00    322641   350063  92.2%  0    0   0    0     44   27296  0     0

[The per-row IOWT through GWRITE values of this sample are not recoverable from the source.]

Type 2 Report

VNODE report for node embhpdb1, 2K Cache

DATE      TIME   HITS       FIX        RATE   IOWT GWT  SYNC GSYN  READ     WRITE     GREAD    GWRITE
4-dec-00  19:45  173886880  194723920  89.2%  0    108  0    1314  3243281  16987024  1150816  1506
TOTALS:   46.03  297212416  338782080  87.7%  0    396  0    3450  8020366  31596801  3219102  5568
HOURLY:   1.00     6457513    7360694  87.7%  0      8  0      74   174257    686501    69941   120

[The rows from 4-dec-00 20:45 to 6-dec-00 17:46 of this sample are not recoverable from the source.]

VNODE report for node embhpdb1, 4K Cache

DATE      TIME   HITS       FIX        RATE   IOWT GWT SYNC   GSYN READ     WRITE     GREAD    GWRITE
TOTALS:   46.03  218450432  240453056  90.8%  0    0   40816  0    1495806  18810882  1519348  0
HOURLY:   1.00     4746257    5224306  90.8%  0    0     886  0      32499    408702    33010  0

[The rows from 4-dec-00 19:45 to 6-dec-00 17:46 of this sample are not recoverable from the source.]

Type 4 Report

SUMMARY report for all vnodes, all caches

DATE      TIME   HITS       FIX        RATE   IOWT GWT  SYNC   GSYN  READ     WRITE     GREAD    GWRITE
4-dec-00  19:45  312371072  345047296  90.5%  0    108   8286  1314  3775775  27795118  1610800  1506
TOTALS:   46.03  515662976  579235136  89.0%  0    396  40816  3450  9516172  50407683  4738450  5568
HOURLY:   1.00    11203773   12585001  89.0%  0      8    886    74   206757   1095204   102951   120

[The rows from 4-dec-00 20:45 to 6-dec-00 17:46 of this sample are not recoverable from the source.]

Type 5 Report

Locking System Statistics for vnode embhpdb1

DATE TIME REQUESTS WAITS WTS/10M ESCALATE DEADLOCK

[Sample rows from 4-dec-00 19:45 to 6-dec-00 17:46, with TOTALS (46.03 hours) and HOURLY lines; the column values are not recoverable from the source.]

Type 6 Report

Logging System Statistics for vnode embhpdb1

DATE TIME TRANSAC WRITES WRITE IOs S-WTS(/10M) F-WTS(/10M) KBYTES UTIL % LOGUSE

[Sample rows from 4-dec-00 19:45 to 6-dec-00 17:46, with TOTALS (46.03 hours) and HOURLY lines; the column values are not recoverable from the source.]


#!/bin/ksh
#
# This script manages files created during the checkpoint process.
# The strategy is to minimise use of filespace while retaining all
# files for ${CKP_KEEP_COUNT} valid checkpoints, if possible.
#
# The following files are managed:
# Ckp files:      (II_CHECKPOINT)/ingres/ckp/default/<dbname>/cCCCC???.ckp
# Jnl files:      (II_JOURNAL)/ingres/jnl/default/<dbname>/jJJJJJJJ.jnl
# Dmp files:      (II_DUMP)/ingres/dmp/default/<dbname>/dDDDDDDD.dmp
# Arch cnf files: (II_DUMP)/ingres/dmp/default/<dbname>/c000CCCC.dmp
# Lst files:      (II_DUMP)/ingres/dmp/default/<dbname>/cCCCC000.lst
#
# Checkpoint Files:
# 0) Never delete the only set of non-zero-length files for a valid ckp
# 1) Delete any files associated with any invalid checkpoint.
# 2) Delete any file associated with a checkpoint that is no longer
#    referenced in the current database cnf file.
# 3) Keep all files for the last ${CKP_KEEP_COUNT} valid checkpoints for
#    each database and delete the rest.
#
# Dump Files:
# 1) Delete any files associated with any invalid checkpoint.
# 2) Delete any file associated with a checkpoint that is no longer
#    referenced in the current database cnf files in this installation.
#
# Journal Files:
# 1) Keep all journal files for interval from oldest retained
#    checkpoint to present. Even if intervening checkpoints are invalid.
#    Zip them.
# 2) Delete the rest
#
# cdmp and lst Files:
# 1) Delete any files associated with any invalid checkpoint.
# 2) Delete any file associated with a checkpoint that is no longer
#    referenced in the current database cnf files in this installation.
#
# Customisation for prosrv02/prorpt02, which share checkpoint files on
# the NFS-mounted filesystem /chkpnt:
# Each day all the databases are copied from prosrv01 to prorpt01
# using the bcv mechanism. This is done within the context of a
# trisplus offline checkpoint, which achieves the necessary valid
# physical fileset for the trisplus database.
#
# Only offline checkpoints are used. Physical checkpoints are run only
# on prorpt02. When a checkpoint is run on prorpt02, an offline null
# checkpoint is run on prosrv02 with the database frozen in the same
# state in order to synchronize checkpoint numbers, ie the latest checkpoint
# listed for the prosrv02 database will match the real checkpoint files
# created on prorpt02.
#
# History:
#  2-nov-1998 S P Maybury Original
# 31-dec-1998 S P Maybury Change CKP_KEEP_COUNT for trisplus to 1.
# 11-oct-1999 S P Maybury Update for Ingres II and change CKP_KEEP_COUNT
#                         for trisplus to 2
# 22-oct-1999 S P Maybury Remove -s flag from iidbdb - unnecessary in IngresII
#                         Determine oldest_jnlno_reqd independently of the
#                         existence of the c*.dmp file.
# 01-mar-2000 R W Kuehn   Added a function to retain the last three c*.lst files
#                         in the /dump/ingres/dmp/default/<dbname> directory.
# 10-jul-2000 S P Maybury Rewrite to simplify script.
#                         Retain referenced cdmp and lst files.
#                         Enable host-specific configuration by changing
#                         prosrv01 to prosrv02
#                         Change CKP_KEEP_COUNT from 2 to 1 (1 ckp is now 45%
#                         of /chkpnt) for trisplus
#                         Zip remaining jnl files.
#                         Remove all gz files from jnl directory near start

T=
export T

modname=`basename $0`

log()
{
    if [ -n "$1" ]
    then
        msg="$1"
        echo "`date '+%a %H:%M:%S'` ${hostname} ${modname}: $msg"
    else
        echo
    fi
}

checkstat()
{
    err=$?
    if [ $err -ne 0 ]
    then
        echo $modname: Error $err while $msg
        # Send false to stderr so that if this script
        # is run from remsh, stderr from remsh is also
        # false. This can be redirected to a file and
        # executed to yield an equivalent $?
        echo false >&2
        exit $1
    else
        # As above, this translates to $?=0
        echo true >&2
    fi
}

repstat()
{
    err=$?
    if [ $err -ne 0 ]
    then
        echo $modname: Error $err while $msg
    fi
}

# Set up environment
. /etc/.ingres_profile
. ~ingres/dbaenv

MAX_CKPNO=9999
MAX_DMPNO=9999999
MAX_JNLNO=9999999

# Note on checkpoint numbers
#
# To facilitate comparison between strings derived from filenames (possibly
# with leading zeros) and ckpnos read from infodb, we write ckpnos with leading
# zeros to list files, using %04d format

CREATE_LISTS()
{
    # Make lists from infodb and directories for comparison
    log
    log "*** DATABASE ${dbname} ***"
    log

    log "*********************"
    log "* Preparing Lists   *"
    log "*********************"
    log "making infodb output file $infodb_file"
    infodb ${dbname} > $infodb_file

    log "Defining top and bottom of lists"
    jlist_prehead=`grep -n "Checkpoint History for Journal" $infodb_file | cut -f 1 -d :`
    dlist_prehead=`grep -n "Checkpoint History for Dump" $infodb_file | cut -f 1 -d :`
    dlist_footline=`grep -n "Cluster Journal History" $infodb_file | cut -f 1 -d :`

    jlist_headline=`expr $jlist_prehead + 2`
    jlist_footline=`expr $dlist_prehead - 1`
    jlist_len=`expr $jlist_footline - $jlist_headline`
    jlist_headline=`expr $jlist_prehead + 3`
    dlist_headline=`expr $dlist_prehead + 3`
    dlist_len=`expr $dlist_footline - $dlist_headline`

    log "extracting jnl ckp list"
    tail +$jlist_headline $infodb_file | head -$jlist_len \
        | awk '{ if ( $9 ~ 0 ) { val="INV" } else { val="VAL" }; \
                 printf("%04d %d %d %s %s\n",$6,$7,$8,val,$10) }' > $jlist
    checkstat 10
    # Format: ckpno 1st_jnl last_jnl VAL/INV ONLINE/OFFLINE
    cat $jlist

    # Check jlist
    if [ `cat $jlist | wc -l` -eq 0 ]
    then
        log "No checkpoints listed in infodb - exiting with no cleanup attempted"
        exit 0
    fi

    latest_referenced_ckpno=`tail -1 $jlist | awk '{ print $1 }'`
    log "Latest referenced ckpno = ${latest_referenced_ckpno}"

    log "extracting dmp ckp list"
    tail +$dlist_headline $infodb_file | head -$dlist_len \
        | awk '{ if ( $9 ~ 0 ) { val="INV" } else { val="VAL" }; \
                 printf("%04d %d %d %s %s\n",$6,$7,$8,val,$10) }' > $dlist
    checkstat 20
    # Format: ckpno 1st_dmp last_dmp VAL/INV ONLINE/OFFLINE
    cat $dlist

    log "making a list of dmpno, ckpno from dlist"
    grep ONLINE $dlist | awk '{
        ckpno=$1; first_dmpno=$2; last_dmpno=$3; valid=$4
        if ( first_dmpno != 0 )
        {
            if ( last_dmpno == 0 )
            {
                last_dmpno=first_dmpno
            }
            dmpno=first_dmpno
            while ( dmpno <= last_dmpno )
            {
                printf("%07d %04d %s\n",dmpno,ckpno,valid)
                dmpno=dmpno + 1
            }
        }
    }' >> $dmpckp_list
    cat $dmpckp_list

    latest_referenced_dmpno=`tail -1 $dmpckp_list | awk '{ print $1 }'`
    log "Latest referenced dmpno = ${latest_referenced_dmpno}"

    log "making a list of checkpoint files"
    ls -1 ${CKPDIR}/c*.ckp | rev | cut -f 1 -d "/" | rev | cut -c 2-5 \
        > $file_ckpnos
    checkstat 30
    cat $file_ckpnos | sort -u

    log "making a list of cdmp files"
    ls -1 ${DMPDIR}/c*.dmp | rev | cut -f 1 -d "/" | rev | cut -c 5-8 \
        > $file_cdckpnos
    checkstat 40
    cat $file_cdckpnos

    log "making a list of lst files"
    ls -1 ${DMPDIR}/c*.lst | rev | cut -f 1 -d "/" | rev | cut -c 2-5 \
        > $file_lsts
    checkstat 40
    cat $file_lsts

    log "making a list of ddump files"
    ls -1 $DMPDIR/d*.dmp | rev | cut -f 1 -d "/" | rev | cut -c 2-8 \
        | sed -e s/^0// -e s/^0// -e s/^0// | sed -e s/^0// \
        | sed -e s/^0// -e s/^0// -e s/^0// > $file_ddmpnos
    repstat 95
    cat $file_ddmpnos

} # end CREATE_LISTS

CHECK_CKP_FILES()
{
    # Check files for CKP_KEEP_COUNT checkpoints working backwards
    # Make a list of:
    # - checkpoints to keep in keep_jlist

    ok_ckp_count=0
    checked_ckp_count=0
    oldest_ckpno_reqd=$MAX_CKPNO
    oldest_dmpno_reqd=$MAX_DMPNO
    oldest_jnlno_reqd=$MAX_JNLNO # By default, do not keep jnl files
    log "We need ${CKP_KEEP_COUNT} checkpoints on disk"
    for ckpno in `grep "VAL" $jlist | cut -f 1 -d " " | sort -r \
        | awk '{ printf("%04d ",$1) }'`
    do
        if [ $ok_ckp_count -lt ${CKP_KEEP_COUNT} ]
        then
            log "*** Checking required files for checkpoint $ckpno ***"

            ckp_files_present=0
            cdmp_file_present=0
            lst_file_present=0
            ddmp_files_present=0

            log "checking for ${DATA_LOC_COUNT} ckp files for ckpno $ckpno"
            if [ `grep "^$ckpno" $file_ckpnos | wc -l` -eq ${DATA_LOC_COUNT} ]
            then
                log "Found them"
                ckp_files_present=1
            else
                log "**************************************************"
                log "* Checkpoint files missing for checkpoint $ckpno *"
                log "**************************************************"
            fi

            log "checking for cdmp file for ckpno $ckpno"
            if [ `grep "^$ckpno" $file_cdckpnos | wc -l` -eq 1 ]
            then
                log "Found it"
                cdmp_file_present=1
            else
                log "cdmp file for ckpno ${ckpno} missing"
            fi

            log "checking for lst file for ckpno $ckpno"
            if [ `grep "^$ckpno" $file_lsts | wc -l` -eq 1 ]
            then
                log "Found it"
                lst_file_present=1
            else
                log "lst file for ckpno ${ckpno} missing"
            fi

            ddmp_files_present=1
            if [ `grep "^$ckpno" $dlist | wc -l` -eq 1 ]
            then
                log "checking for all ddmp files for ckpno $ckpno"
                for dmpno in `grep " $ckpno " $dmpckp_list | cut -f 1 -d " "`
                do
                    if [ `grep "^$dmpno" $file_ddmpnos | wc -l` -eq 0 ]
                    then
                        log "dump file $dmpno missing for checkpoint $ckpno"
                        ddmp_files_present=0
                    else
                        log "Found dump file $dmpno "
                    fi
                done
            else
                log "No dump files required"
            fi

            if [ $ckp_files_present -eq 1 \
                -a $cdmp_file_present -eq 1 \
                -a $lst_file_present -eq 1 \
                -a $ddmp_files_present -eq 1 ]
            then
                log "Adding ckpno ${ckpno} to keep_jlist"
                grep "^$ckpno" $jlist >> $keep_jlist
                ok_ckp_count=`expr $ok_ckp_count + 1`
                log "ok_ckp_count = $ok_ckp_count"
                log "Need checkpoint $ckpno"
            fi

            jnlno_field=`grep "^$ckpno" $jlist | cut -f 2 -d " "`
            if [ $jnlno_field -ne 0 ]
            then
                oldest_jnlno_reqd=$jnlno_field
            fi

            if [ $oldest_jnlno_reqd -eq $MAX_JNLNO ]
            then
                log "No journal files required"
            else
                log "Oldest jnlno required = $oldest_jnlno_reqd"
            fi
            log
            log
        fi
    done

    if [ $ok_ckp_count -ge ${CKP_KEEP_COUNT} ]
    then
        log
        log "Got required number of checkpoints"
        log
    else
        log
        log "We have only ${ok_ckp_count} checkpoints of ${CKP_KEEP_COUNT} required"
        needed_ckp_count=`expr ${CKP_KEEP_COUNT} - ${ok_ckp_count}`
        log "We will keep checkpoint files for the latest ${needed_ckp_count} valid ckps"
        log "even though we do not have all the supporting files."

        extra_ckp_count=0
        for ckpno in `grep "VAL" $jlist | awk '{ print $1 }' | sort -r`
        do
            if [ ! -r $keep_jlist ] || [ `grep "^$ckpno" $keep_jlist | wc -l` -eq 0 ]
            then
                if [ ${extra_ckp_count} -lt ${needed_ckp_count} ] && \
                    [ `grep "^$ckpno" $file_ckpnos | wc -l` -eq ${DATA_LOC_COUNT} ]
                then
                    log "Adding ckpno ${ckpno} to keep_jlist"
                    grep "^$ckpno" $jlist >> $keep_jlist
                    extra_ckp_count=`expr ${extra_ckp_count} + 1`
                fi
            fi
        done
    fi

    # We need journals forward continuously from after the oldest ckp kept
    if [ -r $keep_jlist ]
    then
        candidate_jnlno=`awk '{ print $2 }' $keep_jlist | grep -v "^0$" | sort | head -1`
        if [ -n "${candidate_jnlno}" ]
        then
            oldest_jnlno_reqd=${candidate_jnlno}
            log "Oldest jnlno required = $oldest_jnlno_reqd"
        fi
    else
        log "We do not have any checkpoints, so we will keep all the journals"
    fi

} # end CHECK_CKP_FILES

CLEAN_CKP()
{
    # Manage ${CKPDIR}/c*.ckp files
    # This function must not be called before keep_jlist is complete

    ls -1t ${CKPDIR}/c*.ckp | while read name
    do
        ckpno=`echo ${name} | rev | cut -f 1 -d "/" | rev | cut -c 2-5`

        if [ `grep "^$ckpno" $jlist | wc -l` -eq 0 ]
        then
            if [ $ckpno -le $latest_referenced_ckpno ]
            then
                log "Removing ${name} - not referred to"
                $T /usr/bin/rm ${name}
                checkstat 90
            fi
        elif [ `grep "^$ckpno" $jlist | grep VAL | wc -l` -eq 0 ]
        then
            log "Removing ${name} - part of an invalid checkpoint"
            $T /usr/bin/rm ${name}
            checkstat 100
        elif [ `grep "^$ckpno" $keep_jlist | wc -l` -eq 0 ]
        then
            log "Removing ${name} - not from a required checkpoint"
            $T /usr/bin/rm ${name}
            checkstat 110
        fi
    done

} # end CLEAN_CKP

CLEAN_CDMP()
{
    # Manage ${DMPDIR}/c*.dmp files
    # Remove files no longer referenced or for invalid ckps

    ls -1t ${DMPDIR}/c*.dmp | while read name
    do
        ckpno=`echo ${name} | rev | cut -f 1 -d "/" | rev | cut -c 5-8`

        if [ `grep "^$ckpno" $jlist | wc -l` -eq 0 ]
        then
            # This file is not referred to
            if [ $ckpno -le $latest_referenced_ckpno ]
            then
                log "Removing ${name} - not referred to"
                $T /usr/bin/rm ${name}
                repstat 120
            fi
        elif [ `grep "^$ckpno" $jlist | grep VAL | wc -l` -eq 0 ]
        then
            log "Removing ${name} - part of an invalid checkpoint"
            $T /usr/bin/rm ${name}
            repstat 130
        fi
    done

} # end CLEAN_CDMP

CLEAN_LST()
{
    # Manage ${DMPDIR}/c*.lst files
    # Remove files no longer referenced or for invalid ckps

    ls -1t ${DMPDIR}/c*.lst | while read name
    do
        ckpno=`echo ${name} | rev | cut -f 1 -d "/" | rev | cut -c 2-5`

        if [ `grep "^$ckpno" $jlist | wc -l` -eq 0 ]
        then
            # This file is not referred to
            if [ $ckpno -le $latest_referenced_ckpno ]
            then
                log "Removing ${name} - not referred to"
                $T /usr/bin/rm ${name}
                repstat 140
            fi
        elif [ `grep "^$ckpno" $jlist | grep VAL | wc -l` -eq 0 ]
        then
            log "Removing ${name} - part of an invalid checkpoint"
            $T /usr/bin/rm ${name}
            repstat 150
        fi
    done

} # end CLEAN_LST

CLEAN_DDMP()
{
    # Manage ${DMPDIR}/d*.dmp files
    # Remove files no longer referenced or for invalid ckps

    ls -1t ${DMPDIR}/d*.dmp | while read name
    do
        dmpno=`echo ${name} | rev | cut -f 1 -d "/" | rev | cut -c 2-8`

        if [ `grep "^$dmpno" $dmpckp_list | wc -l` -eq 0 ]
        then
            # This file is not referred to
            if [ $dmpno -le $latest_referenced_dmpno ]
            then
                log "Removing ${name} - not referred to"
                $T /usr/bin/rm ${name}
                repstat 160
            fi
        elif [ `grep "^$dmpno" $dmpckp_list | grep VAL | wc -l` -eq 0 ]
        then
            log "Removing ${name} - part of an invalid checkpoint"
            $T /usr/bin/rm ${name}
            repstat 170
        fi
    done

} # end CLEAN_DDMP

CLEAN_JNL()
{
    # Remove all journal files prior to (first after oldest required ckp)
    # This function must not be called before $oldest_jnlno_reqd is set

    log "removing all gzipped jnl files prior to ${oldest_jnlno_reqd}"
    # To make this rerunnable we must check that each zipped file is prior to
    # $oldest_jnlno_reqd
    ls -1t ${JNLDIR}/j*.jnl.gz | while read name
    do
        jnlno=`echo ${name} | rev | cut -f 1 -d "/" | rev | cut -c 2-8`

        if [ $jnlno -lt $oldest_jnlno_reqd ]
        then
            # Journal file is not required
            log "Removing zipped jnl file ${name}"
            $T /usr/bin/rm ${name}
            checkstat 180
        fi
    done

    # gzip all remaining journal files
    # For this to be runnable at any time, we must not zip the latest file
    latest_jnl_file=`ls -1 ${JNLDIR}/j*.jnl | sort | tail -1`

    ls -1t ${JNLDIR}/j*.jnl | while read name
    do
        if [ "${name}" != "${latest_jnl_file}" ]
        then
            log "gzipping jnl file ${name}"
            $T gzip ${name}
            checkstat 190
        fi
    done

} # end CLEAN_JNL

#################################################################
# MAIN
#################################################################

# Define temp file stem
temp_stem=/tmp/ckp_clean_

# Clean up temp files produced by previous runs of this script
rm -f ${temp_stem}*

# Global Defaults - reset these below for individual installations or databases
# if necessary

CKPLOC=`ingprenv | grep "II_CHECKPOINT" | cut -f 2 -d =`
DMPLOC=`ingprenv | grep "II_DUMP" | cut -f 2 -d =`
JNLLOC=`ingprenv | grep "II_JOURNAL" | cut -f 2 -d =`

CKP_KEEP_COUNT=1 # ckp, ddmp (at ckp granularity) files

hostname=`hostname`

# Set defaults specific for hostname
if [ "$hostname" = "prosrv02" ]
then
    # Be careful with this! If enabled, this script will delete LATER
    # (unreferenced) checkpoints created on prorpt02.
    MANAGE_CKP_FILES=N
else
    MANAGE_CKP_FILES=Y
fi

log "MANAGE_CKP_FILES= ${MANAGE_CKP_FILES}"

# Get list of databases and loop through it
for dbname in `cat $ING_LOCAL/ckp_clean_${hostname}_dblist`
do
    infodb_file=${temp_stem}${dbname}_infodb
    jlist=${temp_stem}${dbname}_jlist
    dlist=${temp_stem}${dbname}_dlist
    keep_jlist=${temp_stem}${dbname}_keep_jlist
    dmpckp_list=${temp_stem}${dbname}_dmpckp_list
    file_ckpnos=${temp_stem}${dbname}_file_ckpnos
    file_cdckpnos=${temp_stem}${dbname}_file_cdckpnos
    file_lsts=${temp_stem}${dbname}_file_lsts
    file_ddmpnos=${temp_stem}${dbname}_file_ddmpnos

    # Set database-specific parameters
    case ${dbname} in
        "trisplus")
            CKP_KEEP_COUNT=1
            DATA_LOC_COUNT=16
            ;;
        "iidbdb")
            CKP_KEEP_COUNT=16
            DATA_LOC_COUNT=1
            ;;
        *)
            ;;
    esac

    log "CKP_KEEP_COUNT = $CKP_KEEP_COUNT"
    log "DATA_LOC_COUNT = $DATA_LOC_COUNT"

    # Identify managed directories
    CKPDIR=${CKPLOC}/ingres/ckp/default/${dbname}
    DMPDIR=${DMPLOC}/ingres/dmp/default/${dbname}
    JNLDIR=${JNLLOC}/ingres/jnl/default/${dbname}

    CREATE_LISTS
    CHECK_CKP_FILES

    if [ "${MANAGE_CKP_FILES}" = "Y" ]
    then
        CLEAN_CKP
    fi

    CLEAN_CDMP
    CLEAN_LST
    CLEAN_DDMP
    CLEAN_JNL

    log " -- End of cleanup for database ${dbname}"
done