UKOUG Tech15 - Overheads of RAC?

Zahid Anwar, Senior Oracle DBA Consultant, Version 1, 6th December 2015, UKOUG

Transcript of UKOUG Tech15 - Overheads of RAC?

Page 1: UKOUG Tech15 - Overheads of RAC?

Zahid Anwar, Senior Oracle DBA Consultant, Version 1

6th December 2015

UKOUG

Overheads of RAC?

Page 2: UKOUG Tech15 - Overheads of RAC?

Introducing Version 1

A values-driven organisation that aims to prove that IT can make a real difference to our clients’ businesses

IT Transformation to deliver business benefit in a cost-efficient manner through our service excellence, innovation and service improvement

A circa €75m/£53m, 700-strong business with bases across the UK and Ireland.

Broader and better than niche players, better service than global players, nearshore and onshore rather than offshore; values-driven approach to delivering trusted advice to customers

Page 3: UKOUG Tech15 - Overheads of RAC?

About Version 1

7 Areas of Deep Expertise

• Enterprise Architecture & Change

• Microsoft Solutions

• Oracle Solutions

• Development Technologies and Services

• Business Intelligence & Analytics

• Infrastructure & Cloud Services

• Licence Management

3 Key Technology Specialisations

2 Delivery Practices

• Business Solutions

• Managed Services

4 Main Sectors

• Commercial, Financial, Public, Utilities

[Chart: sector split of 26%, 21%, 35% and 18%]

Page 4: UKOUG Tech15 - Overheads of RAC?

Our History

Page 5: UKOUG Tech15 - Overheads of RAC?

A bit about myself

• Senior Oracle DBA Consultant (10+ years’ experience)

• An Oracle Certified Master, 2nd in Version 1

• Aspiring to become an Oracle ACE

• Oracle 10g & 11g Certified RAC Expert and 11gR2 RAC and GI Certified Expert

• Exadata and ODA Specialist

• Follow me on:

– facebook.ZedDBA.co.uk (blog)

– twitter.ZedDBA.co.uk or @ZedDBA

– LinkedIn.ZedDBA.co.uk

– www.ZedDBA.co.uk (coming soon!)

Page 6: UKOUG Tech15 - Overheads of RAC?

What is RAC?

• Instance

– Comprises Oracle-related memory and OS processes on a server

• Database

– Consists of a collection of data files, control files and redo logs located on disk

[Diagram: Server, Instance, Database]

Page 7: UKOUG Tech15 - Overheads of RAC?

What is RAC?

• Real Application Clusters (RAC)

– Allows multiple instances to run on separate servers (nodes), concurrently accessing a single database

– The single database is placed on shared storage accessible to all nodes

– Instances communicate over an Interconnect network

[Diagram: Node 1 (Instance 1) and Node 2 (Instance 2), linked by the Interconnect, with the Database on Shared Storage]

Page 8: UKOUG Tech15 - Overheads of RAC?

Why use RAC?

• The main aim of RAC is to implement a clustered database to provide:

– Increased availability/resilience

– Increased scalability

– Improved maintainability

– Reduction in total cost of ownership

• Commodity Hardware

• Consolidation Platform

• Need to take Oracle RAC licenses into consideration

Page 9: UKOUG Tech15 - Overheads of RAC?

Cache Fusion

• Cache Fusion

– Allows Oracle RAC to “fuse” the in-memory data cached (physically separate) on each node into a single Global Cache (GC)

– Through a set of dedicated RAC background processes

– Using the Interconnect for communicating GC messages and for transferring data blocks

Page 10: UKOUG Tech15 - Overheads of RAC?

Access Times

• Locally in the Instance Local Cache

– Access time: nanoseconds (ns) 1,000,000,000th (billionth) of a second

• Remote in another Instance Cache (Global Cache)

– Access time: microseconds (μs) 1,000,000th (millionth) of a second

• On Disk

– Access time: milliseconds (ms) 1,000th (thousandth) of a second (spinning disks)

– Access time: microseconds (μs) 1,000,000th of a second (SSD/Flash)

– Access time: microseconds (μs) 1,000,000th of a second (NV RAM)
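
To put the units above side by side, here is a minimal sketch that only compares the orders of magnitude named on the slide. It does not measure anything; the dictionary and function names are illustrative assumptions.

```python
# Units from the slide, expressed in nanoseconds so the arithmetic stays exact.
NS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000}


def slowdown(slower: str, faster: str) -> int:
    """Return how many times longer one `slower` unit is than one `faster` unit."""
    return NS_PER_UNIT[slower] // NS_PER_UNIT[faster]


if __name__ == "__main__":
    # Global Cache (microseconds) compared with the local cache (nanoseconds).
    print("us vs ns:", slowdown("us", "ns"))
    # Spinning disk (milliseconds) compared with the Global Cache (microseconds).
    print("ms vs us:", slowdown("ms", "us"))
    # Spinning disk (milliseconds) compared with the local cache (nanoseconds).
    print("ms vs ns:", slowdown("ms", "ns"))
```

Each step down the list (local cache, Global Cache, spinning disk) is roughly three orders of magnitude slower, which is why a remote cache hit is still preferable to a read from spinning disk.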

Page 11: UKOUG Tech15 - Overheads of RAC?

RAC Related Terminology

• Resource

– An object, where access is controlled at instance level

• Global Resource

– An object, where access is controlled at cluster level

• Enqueue

– Serialises local access to a resource

• Global Enqueue

– Serialises global access to a resource

Page 12: UKOUG Tech15 - Overheads of RAC?

RAC Services

• Global Cache Service (GCS)

– Implements Cache Fusion

– Coordinates access privileges to database blocks for instances

– Responsible for block transfers between instances

– Guarantees the data integrity by employing global access levels

• Global Enqueue Service (GES)

– Performs concurrency control (locks) on the dictionary cache, library cache and transactions

– Performs deadlock detection

• Global Resource Directory (GRD)

– Records the owner of each resource and its current state

– Distributed across all instances

– Maintained by GCS and GES

Page 13: UKOUG Tech15 - Overheads of RAC?

RAC Background Processes

• Each RAC instance has the standard set of background processes:

– PMON

– SMON

– LGWR

– DBWn

– ARCn

• Additional background processes to support Global Cache Service and Global Enqueue Service:

– LMSn

– LMD0

– LCK0

– LMON

– DIAG

Page 14: UKOUG Tech15 - Overheads of RAC?

RAC Background Processes

• LMSn

– Global Cache Service Process (Cache Fusion)

– Manages resources and provides resource control among Oracle RAC instances

– Up to 36 LMSn processes, where n is 0-9 or a-z

– Maintains a lock database for Global Cache Service (GCS) and buffer cache resources

– This process receives, processes, and sends GCS requests, block transfers, and other GCS-related messages

Page 15: UKOUG Tech15 - Overheads of RAC?

RAC Background Processes

• LMD0

– Global Enqueue Service Daemon 0 Process

– One LMD0 process per instance

– Manages incoming remote resource requests from other instances

– LMD0 processes enqueue resources managed under Global Enqueue Service

– In particular, LMD0 processes incoming enqueue request messages and controls access to global enqueues

– It also performs distributed deadlock detection

Page 16: UKOUG Tech15 - Overheads of RAC?

RAC Background Processes

• LCK0

– Instance Enqueue Background Process

– One LCK0 process per instance

– Assists LMSn processes

– Manages global enqueue requests and cross-instance broadcasts

– The process handles all requests for resources other than data blocks

• For example, LCK0 manages library and row cache requests

Page 17: UKOUG Tech15 - Overheads of RAC?

RAC Background Processes

• LMON

– Global Enqueue Service Monitor Process

– One LMON process per instance

– Monitors an Oracle RAC cluster to manage global resources

– LMON maintains instance membership within Oracle RAC

– The process detects instance transitions and performs reconfiguration of GES and GCS resources

Page 18: UKOUG Tech15 - Overheads of RAC?

RAC Background Processes

• DIAG

– Diagnostic Capture Process

– Performs diagnostic dumps

– DIAG performs diagnostic dumps requested by other processes and dumps triggered by process or instance termination

– In Oracle RAC, DIAG performs global diagnostic dumps requested by remote instances

Page 19: UKOUG Tech15 - Overheads of RAC?

Block Mastering

• In RAC, every data block is mastered by an instance

– The master keeps track of the state of the block, which is maintained in the Global Resource Directory (GRD)

– Mastered in block ranges (128 blocks since 10g)

– Block ranges are uniformly mastered between instances so that Global Cache grants are evenly distributed across all instances

[Diagram: File 1, blocks 000,001 – 384,000, mastered as follows]

Instance 1: 000,001 – 128,000

Instance 2: 128,001 – 256,000

Instance 3: 256,001 – 384,000
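
The idea of mastering blocks in fixed-size ranges spread evenly over the instances can be sketched as a toy model. The round-robin assignment below is an assumption made purely for illustration: Oracle's real mapping is internal and is not reproduced here, and the slide's diagram simplifies it further into three contiguous ranges. The function names are hypothetical.

```python
from collections import Counter

RANGE_SIZE = 128  # blocks per mastered range, as stated on the slide (since 10g)


def master_of(block_no: int, instances: int, range_size: int = RANGE_SIZE) -> int:
    """Return the 1-based instance mastering a 1-based block number.

    Toy model only: ranges are dealt out round-robin, which is an assumption
    and not Oracle's actual algorithm.
    """
    if block_no < 1 or instances < 1 or range_size < 1:
        raise ValueError("block_no, instances and range_size must be positive")
    range_index = (block_no - 1) // range_size  # which 128-block range
    return range_index % instances + 1          # deal ranges out in turn


def mastering_counts(total_blocks: int, instances: int) -> Counter:
    """Count how many of the first `total_blocks` blocks each instance masters."""
    return Counter(master_of(b, instances) for b in range(1, total_blocks + 1))


if __name__ == "__main__":
    for block in (1, 128, 129, 257, 385):
        print(f"block {block} -> instance {master_of(block, 3)}")
    print(mastering_counts(3840, 3))
```

Whatever the real mapping is, the property the slide cares about is the one the counter shows: each instance ends up mastering an equal share of the blocks, so grant requests are spread across the cluster.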

Page 20: UKOUG Tech15 - Overheads of RAC?

Global Cache Examples

Page 21: UKOUG Tech15 - Overheads of RAC?

Global Cache Example: Read From Disk

[Diagram: Resource Master, Instance 1, Instance 2, Instance 3; SCNs shown: 1000, 1000]

Instance 2 requests shared read on block

1. Request to obtain a Shared Resource

2. Request is Granted

3. Read Request

4. Block Delivered

Page 22: UKOUG Tech15 - Overheads of RAC?

Global Cache Example: Read to Write

[Diagram: Resource Master, Instance 1, Instance 2, Instance 3; SCNs shown: 1000, 1000, 1001]

Instance 3 requests exclusive read on block

1. Request to obtain an Exclusive Resource

2. Instruct to transfer block for exclusive access

3. Transfer block

4. Update Resource Master with Resource Status

Page 23: UKOUG Tech15 - Overheads of RAC?

Global Cache Example: Write to Write

[Diagram: Resource Master, Instance 1, Instance 2, Instance 3; SCNs shown: 1000, 1001, 1001, 1002]

Instance 2 requests exclusive read on block

1. Request to obtain an Exclusive Resource

2. Instruct to transfer block for exclusive access

3. Transfer block *

4. Update Resource Master with Resource Status

* The instance will create a Past Image of the dirty block before transferring. This is to reduce recovery times upon instance failure.

Page 24: UKOUG Tech15 - Overheads of RAC?

Global Cache Waits

Page 25: UKOUG Tech15 - Overheads of RAC?

Global Cache Waits: Active Session History (ASH)

• Block-Related Wait Events

– gc current block 2-way

– gc current block 3-way

– gc cr block 2-way

– gc cr block 3-way

• Wait event indicates that a block arrived from the resource master (2-way) or from another instance instructed by the resource master (3-way)

current block = the first time a block is read into the buffer

cr block (consistent read) = subsequently, when a block is transferred to another instance

Page 26: UKOUG Tech15 - Overheads of RAC?

Global Cache Waits: Active Session History (ASH)

• Message-Related Wait Events

– gc current grant 2-way

– gc cr grant 2-way

• Wait event indicates that no block was received because it wasn’t cached in any instance; instead, a grant was given to read the block from disk or modify it
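
The 2-way, 3-way and grant cases described on the block-related and message-related slides can be sketched as a small classifier. This is a toy model under stated assumptions: the function name and its return strings are hypothetical, the labels drop the current/cr distinction, and treating a request where the requester is itself the master as involving fewer instances is an assumption of the sketch rather than something the slides state.

```python
from typing import Optional


def classify_gc_request(requester: int, master: int, holder: Optional[int] = None) -> str:
    """Classify a block request by how many instances take part.

    requester: instance asking for the block
    master:    instance that masters the block (resource master)
    holder:    instance currently caching the block, or None if no instance has it
    """
    if holder == requester:
        return "local cache hit"
    involved = {requester, master}
    if holder is not None:
        involved.add(holder)
    if len(involved) == 1:
        # Requester is the master and nobody else caches the block.
        return "local (no interconnect message)"
    kind = "grant" if holder is None else "block"
    return f"gc {kind} {len(involved)}-way"


if __name__ == "__main__":
    # Not cached anywhere: the master grants permission to read from disk.
    print(classify_gc_request(requester=2, master=1))
    # Cached by the master itself: the master ships the block.
    print(classify_gc_request(requester=2, master=1, holder=1))
    # Cached by a third instance: the master instructs it to ship the block.
    print(classify_gc_request(requester=3, master=1, holder=2))
```

Because the set of participants is only ever the requester, the master and at most one holder, the count can never exceed three, however many instances the cluster has.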

Page 27: UKOUG Tech15 - Overheads of RAC?

Global Cache Waits: Active Session History (ASH)

• Contention-Related Wait Events

– gc current block busy

– gc cr block busy

– gc buffer busy acquire/release

• Wait event indicates a hot block that could not be shipped immediately, normally due to a remote log flush, high concurrency or a block that has already been requested

Page 28: UKOUG Tech15 - Overheads of RAC?

Global Cache Waits: Active Session History (ASH)

• Load-Related Wait Events

– gc current block congested

– gc cr block congested

• Wait event due to High Load, CPU saturation or High Interconnect Traffic

Page 29: UKOUG Tech15 - Overheads of RAC?

Global Cache Waits: AWR Report

• Global Cache Waits can be observed in AWR:

Page 30: UKOUG Tech15 - Overheads of RAC?

Global Cache Waits: AWR Report

• Global Cache Load Profile & Efficiency Percentage:

Page 31: UKOUG Tech15 - Overheads of RAC?

Global Cache Waits: AWR Report

• GCS and GES Statistics:

Page 32: UKOUG Tech15 - Overheads of RAC?

Global Cache Waits: Oracle Enterprise Manager

• Cluster Cache Coherency:

Page 33: UKOUG Tech15 - Overheads of RAC?

Global Cache Waits: Oracle Enterprise Manager

• Cluster Cache Coherency:

Page 34: UKOUG Tech15 - Overheads of RAC?

Scalability

• Scalability is the relationship between workload and resources at increased increments

• For RAC, resources are increased by adding nodes

• Scalability can be:

– linear – direct relationship between workload and resources

– non-linear – more resources are required with increased workload

[Chart: Workload against Resource, showing linear and non-linear scaling]

Page 35: UKOUG Tech15 - Overheads of RAC?

Scalability

• RAC overheads:

– Global Cache Service (Blocks)

– Global Enqueue Service (Locks)

• Because of these overheads, it is impossible to achieve linear scalability

• As a general observation, expect 10% RAC overhead per instance (a scale factor of 1.8 for two instances)

– Overheads do decrease with more instances as the GCS workload is more evenly distributed across the cluster
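
The 1.8 scale factor follows from simple arithmetic: two instances each losing 10% give 2 × 0.9 = 1.8. A minimal sketch of that rule of thumb is below. The function name is hypothetical, and two simplifications are assumptions of the sketch: a single instance is treated as having no RAC overhead, and the overhead is held flat even though the slide notes that it decreases as instances are added.

```python
def effective_capacity(instances: int, overhead: float = 0.10) -> float:
    """Estimate capacity in 'single-instance units' under a flat per-instance overhead."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in the range [0, 1)")
    if instances == 1:
        # Assumption: with one instance there is no Global Cache traffic to pay for.
        return 1.0
    return instances * (1.0 - overhead)


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        capacity = effective_capacity(n)
        print(f"{n} instance(s): {capacity:.1f}x (linear would be {n}.0x)")
```

Under this model the gap to linear scaling grows with each node added, which is the "non-linear" curve from the previous slide.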

Page 36: UKOUG Tech15 - Overheads of RAC?

Demo

Page 37: UKOUG Tech15 - Overheads of RAC?

Dynamic Remastering

• New feature in 10gR1, improved in 10gR2 and further enhanced in 11g

• When an instance accesses an object frequently, that instance becomes the master of the object

– Reduces GC grants and block transfers

• The view V$GCSPFMASTER_INFO shows objects that have been remastered; this information is also available in the AWR report

Page 38: UKOUG Tech15 - Overheads of RAC?

Dynamic Remaster

Demo

Page 39: UKOUG Tech15 - Overheads of RAC?

Reducing Global Cache Waits Further

• Partitioning Workloads

– Partition workloads by application using Database Services

– This means common data will be accessed within a given instance or isolated to a particular instance

– Reduces Remote Global Cache Requests

• Partitioning Data

– Distribute data using partitions and access it using Database Services

Page 40: UKOUG Tech15 - Overheads of RAC?

Reducing Global Cache Waits Further

• Minimise Lock Usage

– Avoid unnecessary parsing

– Increase Shared Pool size

– Bind variables

– Cursor sharing

Page 41: UKOUG Tech15 - Overheads of RAC?

Reducing Global Cache Waits Further

• Use Automatic Segment Space Management (ASSM) – MUST

– Eliminates old linked freelists and replaces them with bitmap freelists

– Performs much faster and scales better

– Hence Oracle recommends ASSM for RAC

• Increase Sequence Caches

– With NOORDER if possible
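
Why a larger sequence cache helps can be shown with back-of-the-envelope arithmetic: each time an instance exhausts its cached range it has to fetch a new one, which means updating the sequence's data dictionary entry. The sketch below counts those refills for one instance. The function name is hypothetical, and the model deliberately ignores ORDER, instance restarts and any other source of cache loss.

```python
def cache_refills(nextval_calls: int, cache_size: int) -> int:
    """Number of times a sequence cache must be refilled to serve `nextval_calls` values."""
    if nextval_calls < 0 or cache_size < 1:
        raise ValueError("nextval_calls must be >= 0 and cache_size >= 1")
    # Ceiling division using integers only, so large counts stay exact.
    return -(-nextval_calls // cache_size)


if __name__ == "__main__":
    calls = 1_000_000
    for cache in (20, 1_000, 10_000):
        print(f"cache {cache}: {cache_refills(calls, cache)} refills for {calls} values")
```

Fewer refills means fewer dictionary updates to coordinate across instances, which is the contention a larger cache is meant to avoid.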

Page 42: UKOUG Tech15 - Overheads of RAC?

Reducing Global Cache Waits Further

• Write Contention

– Write “hot spots” due to frequent changes to the same data blocks across all instances

– Other instances request blocks that are being changed

– Blocks can’t be transferred until pending redo is flushed to the redo logs

– Latency for a deferred block transfer becomes dependent on the log write

– Avoid write “hot spots” using Data Partitioning and Database Services

Page 43: UKOUG Tech15 - Overheads of RAC?

Reducing Global Cache Waits Further

• Write Contention Continued…

– Place redo logs on fast storage, e.g. SSD, and on separate disks from other I/O-busy disks

– 99% of write “hot spots” are due to Indexes, therefore:

• Use Global Hash Partitioned Indexes

• Use Locally Partitioned Indexes

• Drop Unused Indexes
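
One way to picture why hash partitioning an index relieves a write hot spot: an index on an ever-increasing key sends consecutive inserts to the same right-hand leaf block, whereas hashing the key first scatters consecutive values over several partitions. The sketch below illustrates only that scattering effect. SHA-256 is a stand-in chosen for the example, not Oracle's internal hash function, and the function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable


def partition_for(key: int, partitions: int) -> int:
    """Map a key to a partition number in [0, partitions) using a stand-in hash."""
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    digest = hashlib.sha256(str(key).encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def spread(keys: Iterable[int], partitions: int) -> Counter:
    """Count how many of the given keys land in each partition."""
    return Counter(partition_for(k, partitions) for k in keys)


if __name__ == "__main__":
    # 1,000 consecutive, sequence-style keys spread over 8 partitions.
    for partition, count in sorted(spread(range(1, 1001), 8).items()):
        print(f"partition {partition}: {count} keys")
```

With the keys spread out, concurrent inserts from different instances are far less likely to need the same index block at the same moment.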

Page 44: UKOUG Tech15 - Overheads of RAC?

Cache Fusion Accelerator 12c

• New in 12.1.0.2

– OS kernel (Linux & Solaris only) module that can respond directly to certain lock requests via RDSv3

– Lock state saved in memory shared by the database and the kernel

– Saves user/kernel context switches, frees up CPU cycles in LMS and speeds up messages

– Will be incorporated into Engineered Systems

– Improves scalability, bridging the gap between linear and non-linear

– I’ve not tested this YET!

Page 45: UKOUG Tech15 - Overheads of RAC?

Other Performance Tips

• Recovery

– RECOVERY_PARALLELISM – parallel instance recovery

– FAST_START_PARALLEL_ROLLBACK – parallel recovery of a terminated transaction

• Redo Size

– Size appropriately so as to avoid aggressive checkpointing

– Set FAST_START_MTTR_TARGET to a reasonable value, so as to balance aggressive writing of dirty blocks to disk against longer recovery times

Page 46: UKOUG Tech15 - Overheads of RAC?

Other Performance Tips

• Ensure cluster is reasonably balanced

– Load Balancing

– Database Services

– Still balanced after a node failure

• Parallel Query

– May increase Global Cache waits but spreads the load across the cluster, thereby increasing performance of queries

– Useful for Large Full Table Scans or DML

Page 47: UKOUG Tech15 - Overheads of RAC?

Vertical and Horizontal Scaling

• Vertical Scaling (add more CPU and/or Memory)

– Pros:

• Avoids Global Cache Waits by using Local Cache instead of Global Cache

• Simpler to manage a Large Single Instance

Page 48: UKOUG Tech15 - Overheads of RAC?

Vertical and Horizontal Scaling

• Vertical Scaling (add more CPU and/or Memory)

– Cons:

• Limitation on adding more CPU and Memory

• No High Availability

• Can become very expensive

Page 49: UKOUG Tech15 - Overheads of RAC?

Vertical and Horizontal Scaling

• Horizontal Scaling (add more nodes, scale out)

– Pros:

• Gets around max CPU and/or Memory limitations

• High Availability

• Highly Scalable

– 100 nodes in 11g

– 64 hub nodes in 12c, with unlimited leaf nodes (FlexCluster)

• Increases network bandwidth to storage across the cluster

• Can use cheaper commodity servers

• RAC Rollable Patching

Page 50: UKOUG Tech15 - Overheads of RAC?

Vertical and Horizontal Scaling

• Horizontal Scaling (add more nodes, scale out)

– Cons:

• Complexity

• Overcapacity to survive failover

• Additional skillset required

• Increases maintenance (can decrease if used as consolidation platform)

• Takes longer to patch the cluster the larger it gets

• Overheads

Page 51: UKOUG Tech15 - Overheads of RAC?

Summary

• RAC does what it says on the tin (High Availability and Highly Scalable)

– Doesn’t come for free (some overheads)

– Near Linear Scalability

– Regardless of the number of instances, the maximum number of instances involved in a block request is 3 (2-way or 3-way grant/block transfer)

– Gives Maximum Scalability

– RAC IS STILL GREAT

Page 54: UKOUG Tech15 - Overheads of RAC?

Thank you for listening!

– facebook.ZedDBA.co.uk (blog)

– twitter.ZedDBA.co.uk or @ZedDBA

– LinkedIn.ZedDBA.co.uk

– www.ZedDBA.co.uk (coming soon!)