
Page 1: Db2 12 for z/OS and Asynchronous System Managed Structure ...

GSE UK Virtual Conference 2021 Virtually the best way to learn about Z

Db2 12 for z/OS and Asynchronous System Managed Structure Duplexing: Update

Mark Rader

IBM

November 2021

Session 2AP


Page 2: Db2 12 for z/OS and Asynchronous System Managed Structure ...


GSE UK Conference 2021 Charity Raffle

• The GSE UK Region team hope that you find this presentation and others that follow useful and help to expand your knowledge of z Systems.

• Please consider showing your appreciation by kindly donating to our charities this year, Royal National Lifeboat Institution (RNLI) & Guide Dogs for the Blind. Then follow the link on your receipt to enter your receipt number & amount donated into the GSE Raffle. You will get a raffle entry for every pound donated.

• Follow the link below or scan the QR Code:

http://uk.virginmoneygiving.com/GuideShareEuropeUKRegion


Page 3: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Agenda

Db2 data sharing and coupling facility (CF) structure duplexing

System managed duplexing of Db2 lock structure (LOCK1)

Asynchronous system managed duplexing of LOCK1 structure

Customer experiences

[Diagram: two CECs (CEC1, CEC2), each with a z/OS LPAR (DB2A, DB2B) and an ICF holding Lock1 and SCA]

Page 4: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Parallel Sysplex basics for Db2 for z/OS

— Multiple Db2 members in a Db2 data sharing group

— Coupling facility (CF) structures used for high-speed sharing of locks, status information, and data (tables, indexes)

— ssnmIRLM allocates lock structure: dsngrpnm_LOCK1 (LOCK1)

— ssnmMSTR allocates shared communication area: dsngrpnm_SCA (SCA)

— ssnmDBM1 allocates group buffer pool (GBP) for each local BP with shared data

• dsngrpnm_GBPn (GBP0, GBP1, GBP2,…GBP8K0, …GBP16K0, …GBP32K, … )

— LOCK1 and SCA are required structures

• If Db2 cannot allocate them, Db2 fails

• All IRLMs required to rebuild LOCK1

• All Db2s required to rebuild SCA
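
The naming pattern above can be sketched in a few lines of Python. This is only an illustration: the group name `DSNDB0G` and the helper `structure_names` are hypothetical, and the sketch assumes each local buffer pool `BPx` pairs with a group buffer pool `GBPx`, as the GBP0 and GBP8K0 examples suggest.

```python
from typing import Dict, List


def structure_names(group_name: str, buffer_pools: List[str]) -> Dict[str, str]:
    """Return the CF structure names a data sharing group would use.

    group_name corresponds to dsngrpnm on the slide; buffer_pools lists
    the local buffer pools that hold shared data (hypothetical input).
    """
    names = {
        "LOCK1": f"{group_name}_LOCK1",  # allocated by ssnmIRLM
        "SCA": f"{group_name}_SCA",      # allocated by ssnmMSTR
    }
    for bp in buffer_pools:
        gbp = "G" + bp                   # BP0 -> GBP0, BP8K0 -> GBP8K0
        names[gbp] = f"{group_name}_{gbp}"  # allocated by ssnmDBM1
    return names


if __name__ == "__main__":
    for short_name, full_name in structure_names("DSNDB0G", ["BP0", "BP8K0"]).items():
        print(short_name, "->", full_name)
```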

© Copyright IBM Corporation 2021

Page 5: Db2 12 for z/OS and Asynchronous System Managed Structure ...

‘Original’ Parallel Sysplex configuration

— 2 central electronic complexes (CECs) with production LPARs

— 2 external Coupling Facilities (CFs)

— Total: 4 CECs

— SCA and LOCK1 isolated from MSTRs and IRLMs

— GBPs spread across CFs

— Failure of any single CEC tolerated

• Structures rebuilt to other CF by connectors

• Db2 member(s) restarted on other LPAR to release retained locks


Page 6: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Second Parallel Sysplex configuration

— 1 external CF

— 1 integrated CF (ICF)

— 3 CECs

— SCA and LOCK1 isolated from MSTR and IRLM

— GBPs spread across CFs

• Duplexed GBPs

o “User managed duplexing”

o User = Db2

• Primary GBPs spread across CFs

o Secondary GBP on opposite CF

Page 7: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Third Parallel Sysplex configuration

— 2 integrated CFs, 2 CECs

— SCA and LOCK1 not isolated from MSTR and IRLM

— Duplexed GBPs spread across CFs

— What if one CEC is lost?

— It depends…

— Could be loss of 1 Db2 member

— Could be loss of whole Db2 data sharing group

• Because loss of SCA or LOCK1 and one Db2 means the SCA or LOCK1 cannot be rebuilt
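
The "it depends" reasoning can be written down as a small decision function. This is a deliberately simplified sketch, not a description of actual XES or Db2 logic: the function name and the CEC labels are hypothetical, and it only encodes the rule stated on the slides, that a structure cannot be rebuilt if it is lost together with a Db2 member (all IRLMs are required to rebuild LOCK1, all Db2s to rebuild SCA).

```python
def outcome_of_cec_loss(failed_cec: str, lock1_cec: str, sca_cec: str,
                        duplexed: bool = False) -> str:
    """Classify the impact of losing one CEC in a 2-CEC, 2-ICF sysplex.

    lock1_cec / sca_cec name the CEC whose ICF holds the (simplex)
    structure. duplexed=True means LOCK1 and SCA have a copy in each ICF.
    """
    if duplexed:
        # A surviving instance of each structure exists on the other CEC.
        return "one member lost"
    if failed_cec in (lock1_cec, sca_cec):
        # The structure and a Db2 member (with its IRLM) fail together,
        # so the survivors cannot rebuild the lost structure.
        return "group lost"
    return "one member lost"


if __name__ == "__main__":
    print(outcome_of_cec_loss("CEC1", lock1_cec="CEC1", sca_cec="CEC2"))
    print(outcome_of_cec_loss("CEC1", lock1_cec="CEC2", sca_cec="CEC2"))
    print(outcome_of_cec_loss("CEC1", "CEC1", "CEC1", duplexed=True))
```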


Page 8: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Third Parallel Sysplex configuration: recommended

— System managed duplexing

• 2 integrated CFs, 2 CECs

• SCA and LOCK1 duplexed (synchronous)

• Now: “synchronous system managed duplexing”, or synchronous SMD

— User managed duplexing

• Duplexed GBPs spread across CFs

— If either CEC fails

• Db2 on other CEC available

• Restart failed Db2 on other LPAR

o Release retained locks

— Increased cost of lock requests

• Performance and CPU costs

— New with Db2 12: asynchronous CF duplexing mitigates the cost and performance impact

• Can also call this “asynchronous system managed duplexing”, or asynchronous SMD

Page 9: Db2 12 for z/OS and Asynchronous System Managed Structure ...

System managed duplexing


Page 10: Db2 12 for z/OS and Asynchronous System Managed Structure ...

CF structure rebuild

— Rebuild

• Process by which sysplex allocates new instance of CF structure, populates the structure with data, and proceeds to use the new structure instance

— Simplex rebuild

• Discards the old instance, uses the new

— Duplex rebuild

• Uses both instances, synchronizing updates so the structure contents remain the same

Page 11: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Two ways to accomplish the rebuild

— User managed

• Can be tailored to application

• Significant programming effort (on part of exploiter, e.g. Db2)

o Exploiter must coordinate process and propagate data

• Must be at least one active connector to do the work

• Variety of ways to propagate data to new structure

o Copy from old structure

o Reconstruct from in-storage data

o Reconstruct from DASD and logs

o Start fresh with empty structure

— System managed

• General purpose

• Virtually no development cost

o System coordinates process and propagates data

• System does the work; can rebuild even if there are no connectors

• To propagate data to new structure, the system must have access to the old structure instance


Page 12: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Structure rebuild use cases

— User managed rebuild

• Planned reconfiguration

• Failure recovery

— System managed rebuild

• Planned reconfiguration

• Limited/no use for failure recovery (must copy old structure)

— User managed duplex

• Only for cache structures

• Db2 group buffer pools

— System managed duplex

• Improved availability

• Robust failure recovery

Page 13: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Cost factor estimates for synchronous System Managed duplexing

— Percentage of requests that get duplexed

• Cache 20% is typical (ranges 1% to 100%)

• List 100% (or close to it)

• Lock 100%

— Cost of duplexed request vs. simplex request

• z/OS CPU = 3x to 4x

• CF CPU = 4x to 5x

• CF Link = 6x to 8x

— Total impact to system would depend on particular structures, request frequency, read/write ratios

— Typical: significant increase in request service time relative to simplex (magnified by distance)

• 2nd order effects on workload difficult to predict

o Queuing

o Contention

o Timeouts

o Includes impact on CFs themselves

Examples: Cache [not Db2 GBPs]; List: Db2 SCA; Lock: Db2 LOCK1

Highly visible
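
To see how the two factors combine, the following sketch weights the per-request multiplier by the fraction of requests that get duplexed. The linear weighting and the function name `blended_cost_factor` are assumptions made for illustration, not a formula from the presentation; the percentages and the z/OS CPU range (3x to 4x) are the estimates quoted above.

```python
def blended_cost_factor(duplexed_fraction: float, duplexed_multiplier: float) -> float:
    """Average cost per request relative to simplex (1.0 = no change).

    Assumes non-duplexed requests cost 1.0 and duplexed requests cost
    duplexed_multiplier, weighted by the fraction that gets duplexed.
    """
    if not 0.0 <= duplexed_fraction <= 1.0:
        raise ValueError("duplexed_fraction must be between 0 and 1")
    return (1.0 - duplexed_fraction) + duplexed_fraction * duplexed_multiplier


if __name__ == "__main__":
    zos_cpu_low, zos_cpu_high = 3.0, 4.0  # z/OS CPU multiplier range from the slide
    for kind, fraction in (("cache", 0.20), ("list", 1.00), ("lock", 1.00)):
        low = blended_cost_factor(fraction, zos_cpu_low)
        high = blended_cost_factor(fraction, zos_cpu_high)
        print(f"{kind}: z/OS CPU about {low:.1f}x to {high:.1f}x simplex")
```

Under this weighting a lock structure, where every request is duplexed, pays the full multiplier, which is why LOCK1 is the structure where the cost is most visible.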


Page 14: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Synchronous System Managed CF structure duplexing

[Diagram: on z/OS, an exploiter and XES (split/merge) drive one request to two coupling facilities, CF1 and CF2]

1. Request in

2a. Split req out; 2b. Split req out

3a+b. Exchange of RTE signals

4a+b. Exchange of RTC signals

5a. Response; 5b. Response

6. Response out

The coupling facilities communicate to coordinate processing of the request so that both structure instances make the same update.

RTE = ready to execute

RTC = ready to complete

Distance impacts?
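
The six numbered steps can be walked through with a toy model. This is a sketch under stated assumptions, not the XES or CFCC implementation: the class `StructureInstance`, the function `duplexed_lock_request`, and the resource and owner names are all hypothetical. It only illustrates the point of the slide, that neither instance applies the update until the CFs have exchanged signals, and that the requester sees a response only after both instances have completed.

```python
class StructureInstance:
    """One instance of a duplexed lock structure (toy model)."""

    def __init__(self, name):
        self.name = name
        self.locks = {}
        self.pending = None

    def receive(self, resource, owner):
        # Steps 2a/2b: the split request arrives; signal "ready to execute".
        self.pending = (resource, owner)
        return "RTE"

    def execute(self, peer_signal):
        # Step 3: apply the update only once the peer is also ready to execute.
        if peer_signal != "RTE":
            raise RuntimeError("peer is not ready to execute")
        resource, owner = self.pending
        self.locks[resource] = owner
        self.pending = None
        return "RTC"

    def respond(self, peer_signal):
        # Steps 4 and 5: respond only once the peer is ready to complete.
        if peer_signal != "RTC":
            raise RuntimeError("peer is not ready to complete")
        return "OK"


def duplexed_lock_request(cf1, cf2, resource, owner):
    """Step 1 (request in) through step 6 (merged response out)."""
    rte1 = cf1.receive(resource, owner)
    rte2 = cf2.receive(resource, owner)
    rtc1 = cf1.execute(rte2)
    rtc2 = cf2.execute(rte1)
    resp1 = cf1.respond(rtc2)
    resp2 = cf2.respond(rtc1)
    return resp1 == "OK" and resp2 == "OK"


if __name__ == "__main__":
    cf1, cf2 = StructureInstance("CF1"), StructureInstance("CF2")
    print(duplexed_lock_request(cf1, cf2, "PAGE.A", "DB2A"), cf1.locks == cf2.locks)
```

Every step in `duplexed_lock_request` sits on the requester's critical path, which is why the distance between the CFs matters so much for synchronous SMD.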

Page 15: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Asynchronous system managed duplexing for LOCK1


Page 16: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Asynchronous System Managed CF structure duplexing

[Diagram: on z/OS, an exploiter and XES send requests to the primary structure instance in CF1 (seq# = n); CF1 mirrors updates to the secondary instance in CF2 (seq# <= n)]

1. Request in

2. Req out

3. Response (returns OSN(s) and LCOSNP)

4. Response out (returns primary seq#)

5. Async mirroring of commands’ sequenced structure object updates (returns response indicating safe arrival/store, and returns LCOSN)

6. Ordered execution of sequenced structure updates

Query/syncpoint secondary OSN (at commit points)

Knows both primary and secondary OSNs

Distance impacts?

OSN = Operation Sequence Number

LCOSN = Last OSN completed by secondary

LCOSNP = LCOSN known to primary

Values known to each system are likely different.
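
The role of the sequence numbers can be illustrated with a small model. This is a sketch only: the class `AsyncDuplexedStructure` and its method names are hypothetical, and real CF-to-CF mirroring is far more involved. It shows the idea in the diagram: the primary answers immediately and returns an OSN, the secondary applies updates later in OSN order, and at a commit point the requester checks whether the secondary has caught up to its OSN.

```python
from collections import deque


class AsyncDuplexedStructure:
    """Toy model of a primary and secondary lock structure instance."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.osn = 0              # last OSN assigned by the primary
        self.lcosn = 0            # last OSN completed by the secondary
        self.in_flight = deque()  # updates sent but not yet applied

    def request(self, resource, owner):
        """Steps 1 to 4: update the primary and return its OSN at once."""
        self.osn += 1
        self.primary[resource] = owner
        self.in_flight.append((self.osn, resource, owner))
        return self.osn

    def mirror(self, max_updates=1):
        """Steps 5 and 6: the secondary applies updates in OSN order."""
        for _ in range(min(max_updates, len(self.in_flight))):
            osn, resource, owner = self.in_flight.popleft()
            self.secondary[resource] = owner
            self.lcosn = osn

    def hardened(self, osn):
        """Commit-point check: has the secondary completed this OSN?"""
        return osn <= self.lcosn


if __name__ == "__main__":
    s = AsyncDuplexedStructure()
    first = s.request("PAGE.A", "DB2A")
    second = s.request("PAGE.B", "DB2B")
    s.mirror()
    print(s.hardened(first), s.hardened(second))
```

In the usage at the bottom, the first request is hardened after one mirroring step while the second is not yet, which is the lag the sync-up processing on the following slides has to deal with.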

Page 17: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Terminology distinction, as suggested by Frank Kyne

— Synchronous system managed duplexing

• Function available since 2001 as ‘system managed duplexing’

• ‘synchronous’ in this case is relative to the program making the lock request; it cannot continue processing until both instances of the lock structure are updated

• XES lock request to the CF may be synchronous or asynchronous relative to the host processor

o Synchronous system managed duplexing may still have asynchronous lock requests, especially in cases of distance between hosts and CF LPARs

• Short cut: ‘synchronous SMD’

— Asynchronous system managed duplexing

• Function introduced in 2016, first exploited by Db2 LOCK1 structure

• ‘asynchronous’ is relative to the program making the lock request; the program can continue processing while the secondary instance of the lock structure is updated

• XES lock request to the CF may be synchronous or asynchronous relative to the host processor

o Asynchronous system managed duplexing may involve synchronous or asynchronous lock requests

• Short cut: ‘asynchronous SMD’


Page 18: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Asynchronously duplexed structure may require “sync up” since secondary instance normally lags the primary

— z/OS may need to make sure the secondary instance gets caught up before it can allow traditional failure processing to proceed

— Each system maintains a Secondary Update Recovery Table (SURT) to log local in-flight updates not yet known to have been hardened in the secondary structure instance

— If sync up is needed, a sysplex wide coordinator is nominated to:

• Gather the logs (SURTs) and use them to…

• Reconstruct the final result of uncommitted in-flight requests, and

• Update secondary instance to match the reconstructed results

— Connectors:

• Might have requests held/delayed until sync up is completed

• Might need to back out uncommitted transactions related to requests with ADupReqSeqNum values higher than the highest hardened request

Is it really worth all this effort? … Applies when the SURT for some connector is “lost”

Failover for async duplex might be a little slower than for sync duplex
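
The sync-up steps can be sketched as a merge of per-system logs. This is a toy model, not the z/OS algorithm: the function `sync_up`, the tuple layout of a SURT entry, and the rule of stopping at the first gap in sequence numbers are all assumptions made for illustration. It shows why a missing SURT can force back-out of requests numbered higher than the highest hardened request.

```python
def sync_up(secondary, lcosn, surts):
    """Bring the secondary up to date from the gathered SURTs.

    secondary: dict of resource -> owner held by the secondary instance
    lcosn:     last sequence number the secondary is known to have completed
    surts:     one list per surviving system of (seq, resource, owner)
    Returns (highest hardened sequence number, sequence numbers to back out).
    """
    # Gather all logged in-flight updates the secondary has not completed.
    merged = sorted(e for surt in surts for e in surt if e[0] > lcosn)
    for seq, resource, owner in merged:
        if seq != lcosn + 1:
            # A gap: some update was only in a SURT that is no longer available.
            break
        secondary[resource] = owner
        lcosn = seq
    backed_out = [seq for seq, _, _ in merged if seq > lcosn]
    return lcosn, backed_out


if __name__ == "__main__":
    secondary = {}
    surts = [[(3, "PAGE.A", "DB2A"), (4, "PAGE.B", "DB2A")],
             [(6, "PAGE.C", "DB2B")]]  # update 5 was on a failed system
    print(sync_up(secondary, 2, surts), secondary)
```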


Page 19: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Near simplex service times!

[Chart: service times, Sync SMD (orange) vs Async SMD (gray), at short and long distance]

Short = 150m; Long = 10km

e.g. 5 µsec -> 5.75 µsec

“Down” is comparing the ‘orange’ Sync SMD service time with the ‘gray’ Async SMD service time

Page 20: Db2 12 for z/OS and Asynchronous System Managed Structure ...

With less overhead!

Up is good!

[Chart: throughput, Sync SMD (orange) vs Async SMD (green), at short and long distance]

Short = 150m; Long = 10km

“Up” is comparing the ‘orange’ Sync SMD throughput with the ‘green’ Async SMD throughput

Page 21: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Lock structure duplexing is now practical in more situations

— Performance is very similar to simplex operation

• Even at distance!

— With all the benefits of robust duplex failure recovery

• Though failover is a bit more work since secondary is not identical copy

• And recovery process is not necessarily transparent to exploiter

— Exploiter participation?

• Not needed for lock structures without record data if connectors support system managed processes (XES can do it all)

• Is needed for structures with record data

o Exploiter must be prepared to roll back a transaction initiated by a failed connector that could not be committed because updates are missing in the surviving secondary

o Db2 LOCK1 has record data, so Db2 must participate as exploiter

o Db2 will not write the log output buffer until the corresponding lock information is updated in the secondary structure


Page 22: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Requirements for asynchronous duplexing of CF lock structures

— At least two peer connected coupling facilities

• CFLEVEL=21 minimum service level 02.16, or CFLEVEL=22 or higher; realistically CFCC 23

• z13 GA2+, z13s, z14, z14 ZR1, z15

• If z14-GA2, need driver 36

• z13 should also have driver update

— z/OS V2.2 with APARs:

• OA47796, OA49148, OA51945, OA52015, OA52618 (minimum)

— z/OS V2.3 with APARs:

• OA52618 (minimum)

— Db2® 12 with enabling APAR PI66689

— IRLM 2.3 with APAR PI68378

— All systems in the Sysplex must be capable of using async duplex protocols
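
The z/OS APAR lists above lend themselves to a simple checklist. The sketch below is illustrative only: the table `REQUIRED_ZOS_APARS` copies the minimum APARs from this slide, the function `missing_apars` is hypothetical, and the installed list would have to come from your own maintenance records. It does not cover the CF level, driver, Db2 or IRLM requirements, and additional APARs may be required.

```python
from typing import Iterable, List

# Minimum z/OS APARs per release, as listed on the slide.
REQUIRED_ZOS_APARS = {
    "2.2": {"OA47796", "OA49148", "OA51945", "OA52015", "OA52618"},
    "2.3": {"OA52618"},
}


def missing_apars(zos_release: str, installed: Iterable[str]) -> List[str]:
    """Return the required APARs that are not in the installed list."""
    if zos_release not in REQUIRED_ZOS_APARS:
        raise ValueError(f"no APAR list recorded for z/OS V{zos_release}")
    return sorted(REQUIRED_ZOS_APARS[zos_release] - set(installed))


if __name__ == "__main__":
    print(missing_apars("2.2", ["OA47796", "OA49148", "OA52618"]))
    print(missing_apars("2.3", ["OA52618"]))
```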

* Note: additional APARs may be required

Page 23: Db2 12 for z/OS and Asynchronous System Managed Structure ...

RMF: CF Usage Summary

STATUS – Appending ‘A’ indicates that the structure is async duplexed (for this case, secondary structure instance)

Page 24: Db2 12 for z/OS and Asynchronous System Managed Structure ...

RMF: Report of Coupling Facility Structure Activity


Page 25: Db2 12 for z/OS and Asynchronous System Managed Structure ...

RMF: Report of Coupling Facility Structure Activity (continued)

Things to note in the reports (prior slide):

— Big difference in number of requests to each lock structure instance

• Primary instance is target for the actual lock requests

• Secondary is target for “inquiries” as to progress

— Response times are different

• Locks vs “inquiries”

• The requests have very different natures


Page 26: Db2 12 for z/OS and Asynchronous System Managed Structure ...

RMF: Asynchronous CF Duplexing Summary

— New section in the RMF CF Activity Report

• Values only reported for the secondary structure

• Customer example follows

— Response times are different

• Locks vs “inquiries”

• The requests have very different natures

[Report excerpt: customer example for structure Prod_LOCK1]

Page 27: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Customer experiences


Page 28: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Customer experiences

European bank: synchronous SMD to asynchronous SMD

North American bank: simplex (no duplexing) to asynchronous SMD

North American financial institution: synchronous SMD to asynchronous SMD

Note: there are more successful customer experiences; those included here shared performance data or charts

Page 29: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Customer experience

— European bank

• 2 sites, 2-3 km apart

• Had deployed System Managed duplexing except on first workday of each month

o Too expensive then

• Now use async CF duplexing on all days of the month

Page 30: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Customer experience

— North American bank (1|2)

• Had 2-CEC and 2-ICF configuration, and was exposed to ‘double failure’ situation; chose not to use synchronous SMD (although did test it)

• Implemented asynchronous SMD in May of 2019

[Charts: LPAR1 to CF (two panels)]

Page 31: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Customer experience

— North American bank (2|2)

• Had 2-CEC and 2-ICF configuration, and was exposed to ‘double failure’ situation; chose not to use synchronous SMD (although did test it)

• Implemented asynchronous SMD in May of 2019

[Charts: LPAR2 to CF (two panels)]

Page 32: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Customer experience

— North American financial institution (1|3) – synchronous SMD

• Has all-ICF configuration; used synchronous SMD and tolerated high lock request service time

• Synchronous requests

• Service time and requests almost identical

• Variability may be due to other CF factors

• Service time much higher than simplex mode; expect 5 µsec or less if simplex

• Requests peak at about 26 M per 15 minute RMF interval

[Charts: Prod_LOCK1 on CFP1 and Prod_LOCK1 on CFP2]

Page 33: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Customer experience

— North American financial institution (2|3) – synchronous SMD

• Has all-ICF configuration; used synchronous SMD and tolerated high lock request service time

• Asynchronous requests

• Service time and requests almost identical

• Async time about 2x sync time

• Requests peak at 86 M per RMF interval (over 3x higher than sync)

[Charts: Prod_LOCK1 on CFP1 and Prod_LOCK1 on CFP2]
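
The interval peaks quoted on these two slides are easier to compare as per-second rates. The arithmetic below is a sketch that assumes the 86 M asynchronous peak was measured over the same 15 minute RMF interval as the 26 M synchronous peak; the slide states the interval length only for the synchronous figure. The function name is hypothetical.

```python
def requests_per_second(requests_per_interval: float,
                        interval_minutes: float = 15.0) -> float:
    """Convert a per-interval request count to an average per-second rate."""
    return requests_per_interval / (interval_minutes * 60.0)


if __name__ == "__main__":
    sync_peak = requests_per_second(26e6)   # synchronous requests at peak
    async_peak = requests_per_second(86e6)  # asynchronous requests at peak
    print(f"sync:  about {sync_peak:,.0f} requests/second")
    print(f"async: about {async_peak:,.0f} requests/second")
    print(f"ratio: {async_peak / sync_peak:.1f}x")
```

The ratio is independent of the interval length as long as both peaks use the same one, and it agrees with the "over 3x higher than sync" remark.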

Page 34: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Customer experience

— North American financial institution (3|3) – asynchronous SMD

• Implemented asynchronous SMD in 2020

• Async SMD

• Sync service time closer to simplex

• Peak 6.6 µsec

• Requests peak at 15 M per RMF interval

• Major difference on secondary lock structure

[Charts: Prod_LOCK1 (CFP1) and Prod_LOCK1 (CFP1)]

Page 35: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Are you a candidate for async CF duplexing?

— Are you System Managed duplexing your LOCK1 structure?

• Should you be?

• Are you exposed to the availability risk of not duplexing your LOCK1 and SCA?

— If you are a candidate for asynchronous CF duplexing, please let me know.

• This technology can be very helpful, and there are a number of installations that should be doing this.


Page 36: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Session summary

— Asynchronous CF duplexing for lock structures provides

• Robust failure recovery

• Simplex-like response time

• Even at distance

— Request: are you interested in deploying async system managed duplexing?

• I am interested in working with you.

• Please consider sharing your data; I will anonymize it…


Page 37: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Acknowledgements

— Among many others, including the customers who have shared their experiences, this presentation would not be possible without the contributions of:

— Mark Brooks - IBM Poughkeepsie

— Frank Kyne - Watson and Walker

— Dave Surman - IBM Poughkeepsie


Page 38: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Time for Q & A


Page 39: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Notices and disclaimers

— © 2021 International Business Machines Corporation. No part of this document may be reproduced or transmitted in any form without written permission from IBM.

— U.S. Government Users Restricted Rights — use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM.

— Information in these presentations (including information relating to products that have not yet been announced by IBM) has been reviewed for accuracy as of the date of initial publication and could include unintentional technical or typographical errors. IBM shall have no responsibility to update this information. This document is distributed “as is” without any warranty, either express or implied. In no event shall IBM be liable for any damage arising from the use of this information, including but not limited to loss of data, business interruption, loss of profit or loss of opportunity. IBM products and services are warranted per the terms and conditions of the agreements under which they are provided.

— IBM products are manufactured from new parts or new and used parts. In some cases, a product may not be new and may have been previously installed. Regardless, our warranty terms apply.

— Any statements regarding IBM's future direction, intent or product plans are subject to change or withdrawal without notice.

— Performance data contained herein was generally obtained in controlled, isolated environments. Customer examples are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual performance, cost, savings or other results in other operating environments may vary.

— References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business.

— Workshops, sessions and associated materials may have been prepared by independent session speakers, and do not necessarily reflect the views of IBM. All materials and discussions are provided for informational purposes only, and are neither intended to, nor shall constitute legal or other guidance or advice to any individual participant or their specific situation.

— It is the customer’s responsibility to ensure its own compliance with legal requirements and to obtain advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulatory requirements that may affect the customer’s business and any actions the customer may need to take to comply with such laws. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the customer follows any law.

Page 40: Db2 12 for z/OS and Asynchronous System Managed Structure ...

Notices and disclaimers

— Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. IBM does not warrant the quality of any third-party products, or the ability of any such third-party products to interoperate with IBM’s products. IBM expressly disclaims all warranties, expressed or implied, including but not limited to, the implied warranties of merchantability and fitness for a particular purpose.

— The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents, copyrights, trademarks or other intellectual property right.

— IBM, the IBM logo, ibm.com and [names of other referenced IBM products and services used in the presentation] are trademarks of International Business Machines Corporation, registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at: www.ibm.com/legal/copytrade.shtml

Page 41: Db2 12 for z/OS and Asynchronous System Managed Structure ...


Please submit your session feedback!

• Do it online at https://conferences.gse.org.uk/2021/feedback/2AP

• This session is 2AP

Page 42: Db2 12 for z/OS and Asynchronous System Managed Structure ...


Become a member of GSE UK

• Company or individual membership available

• Benefits include:

• GSE Annual Conference: Receive 5 free places + 2 free places for trainees

• 20% discount on fees for IBM Technical Conferences

• 20% discount on IBM Training Courses in Europe

• 15% discount for IBM STG Technical Conferences in the USA

• 20% discount on the fee for taking the Mainframe Technology Professional (MTP) exams

• European events – via GSE HQ

• Contact [email protected] for details
