Oracle Database 12c Best Practices for Data Availability and Disaster Protection
Larry M. Carpenter
Master Product Manager
Oracle High Availability Systems
Copyright © 2013, Oracle and/or its affiliates. All rights reserved. 2
Agenda
High Availability (HA) Business Challenge
Oracle Maximum Availability Architecture (MAA)
Oracle MAA Reference Architectures
Customer Deployments
Is This Better?
What if the second engine isn’t started until after the first one fails?
HA Business Challenges
– Reduce cost and increase return on investment
– Eliminate risk of downtime and data loss
Old School HA: Generic Cold Failover
Cold Start = High Risk
Idle Assets = High Cost
– Generic volume manager & file system
– Generic backup software
– Generic cold failover cluster (production server, failover server)
– Storage mirroring to identical storage at an idle DR site
Battle Scars
Examples Where HA Infrastructure and Processes Failed
American Eagle Outfitters - retail: 8-day outage
– Disk failure, followed by mirrored disk failure. Restore from local backup failed. Restore using copy at DR site also failed
– http://www.computerworld.com/s/article/9182159/American_Eagle_Outfitters_learns_a_painful_service_provider_lesson
State of Virginia - government: 5-day outage
– SAN memory failure, problem mirrored to standby SAN
– http://www.computerworld.com/s/article/9182719/Update_Virginia_s_IT_outage_continues_3_agencies_still_affected
Tieto - cloud infrastructure provider in Sweden: 5-day outage
– Storage array failed, unable to read tape backups used for DR
– http://www.channelregister.co.uk/2012/01/16/tieto_vnx5700/
Enterprises Need a Better Approach to HA
Requirement | Solution Profile
Protect from outages | Any type, anywhere
Reduce recovery time | Zero, seconds or minutes
Prevent data loss | Zero or seconds
Minimize risk | Continuous validation, test whenever
Eliminate complexity | Simpler, pre-integrated
Increase ROI | Reduce cost, utilize all assets
Agenda
High Availability (HA) Business Challenge
Oracle Maximum Availability Architecture (MAA)
Oracle MAA Reference Architectures
Customer Deployments
Oracle Maximum Availability Architecture (MAA)
Oracle MAA Design Principles Eliminate Risk and Increase Return on Investment
Data Protection at Every Level
Strong Fault Isolation
Active HA/DR
Principle #1: Data Protection at Every Level
Oracle-Aware Data Validation
Oracle Data blocks have a well-defined structure
– Block header is kept consistent with payload
– Enables validation of both physical and logical intra-block consistency
Oracle ensures block validity is maintained as it traverses I/O path
– Extensive corruption checks
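As a rough illustration of keeping a block header consistent with its payload, the sketch below stores a CRC32 of the payload in a toy header and re-checks it on read. The 8-byte header layout, the function names and the choice of CRC32 are assumptions made for this example; they are not Oracle's actual block format or checksum algorithm.

```python
import struct
import zlib

# Hypothetical toy layout: 4-byte block number + 4-byte CRC32 of the payload.
HEADER = struct.Struct(">II")


def make_block(block_no: int, payload: bytes) -> bytes:
    """Build a block whose header records a checksum of its payload."""
    return HEADER.pack(block_no, zlib.crc32(payload)) + payload


def is_valid(block: bytes, expected_block_no: int) -> bool:
    """Check that the header agrees with both the payload and the address."""
    if len(block) < HEADER.size:
        return False
    block_no, stored_crc = HEADER.unpack_from(block)
    payload = block[HEADER.size:]
    return block_no == expected_block_no and stored_crc == zlib.crc32(payload)


good = make_block(7, b"row data")
# Flip one bit in the payload to simulate corruption somewhere on the I/O path.
bad = good[:-1] + bytes([good[-1] ^ 0x01])
```

Checking the block number as well as the checksum lets the reader catch a block that is internally intact but was written to, or read from, the wrong location.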
Principle #2: Strong Fault Isolation
Oracle-Aware Database Replication
Data Guard transmits redo blocks directly from SGA: like a memcpy over the network
Redo received / applied by running Oracle instance: continuous Oracle-integrated data validation
• Better performance since no disk I/O
• Better isolation from lower layer faults
• Better network utilization: only redo sent
• Transactional consistency: always
• Corrupted blocks auto-repaired
• Database-integrated application failover
[Diagram: Oracle Database architecture, with redo sent from system memory (SGA) over TCP/IP to standby databases]
Why Not Use Storage Remote Mirroring? Inadequate Protection for Mission Critical Oracle Databases
“…(storage uses) a remote mirroring model…any potential
data corruption would be copied faithfully and expeditiously
to the other side”
VP Global Marketing of a Leading Storage Company
Principle #3: Active HA/DR
Oracle-Aware Active Clustering and Offload to DR Systems for High ROI
All components active
– Servers
– Storage
– Remote sites
Easy scale-out
– Add capacity online
Rolling maintenance
Best recovery time: already hot
Least risk: you know it is working
[Diagram: production site and secondary site connected over LAN/WAN]
Agenda
High Availability (HA) Business Challenge
Oracle Maximum Availability Architecture (MAA)
Oracle MAA Reference Architectures
Customer Deployments
How to Apply MAA Principles
Begin with a Business Impact Analysis
Assess impact of downtime and data loss
Define service level objectives
– Recovery time (RTO): how long can you afford to be down
– Recovery point (RPO): how much data can you afford to lose
– Performance: pre and post failure
General Approach
Reduce cost
Reduce risk
Consolidate Standardize Simplify
Set of reference
HA architectures
Business Impact Analysis
Differentiate between critical and non-critical functions
– What is the cost of downtime and data loss
Heavily influenced by:
– Cost of implementing high availability
– The end customer, internal or external
– Regulatory compliance
Leads to tiering data and applications with regard to priority
Oracle Database 12c MAA
Real database consolidation - high density and manage as one
Enterprise-scale backup and recovery
Zero data loss at any distance
Global service management and real transparent application failover
Not Your Opa’s Maximum Availability Architecture
Oracle Database 12c MAA: Three Standard Reference Architectures
BRONZE: Minutes to days of downtime; data protected as of last backup
SILVER: Seconds to minutes of downtime; near-zero data loss
GOLD: Zero application outage; zero data loss
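One way to picture how service level objectives map onto the three tiers is a small lookup like the sketch below. The slides give only qualitative ranges, so the numeric cut-offs (`SILVER_RTO_LIMIT`, `SILVER_RPO_LIMIT`) and the `Objective` and `pick_tier` names are assumptions chosen for illustration, not Oracle guidance.

```python
from dataclasses import dataclass

# Assumed cut-offs; the slides only say "seconds to minutes" and "near-zero".
SILVER_RTO_LIMIT = 10 * 60   # seconds of tolerable downtime
SILVER_RPO_LIMIT = 60        # seconds of tolerable data loss


@dataclass(frozen=True)
class Objective:
    rto_seconds: float  # how long the business can afford to be down
    rpo_seconds: float  # how much data (measured in time) it can afford to lose


def pick_tier(obj: Objective) -> str:
    """Map service level objectives to a reference architecture tier."""
    if obj.rto_seconds == 0 or obj.rpo_seconds == 0:
        return "GOLD"    # zero application outage or zero data loss
    if obj.rto_seconds < SILVER_RTO_LIMIT or obj.rpo_seconds < SILVER_RPO_LIMIT:
        return "SILVER"  # seconds to minutes of downtime, near-zero data loss
    return "BRONZE"      # minutes to days of downtime, data as of last backup


payments = pick_tier(Objective(rto_seconds=0, rpo_seconds=0))
reporting = pick_tier(Objective(rto_seconds=24 * 3600, rpo_seconds=24 * 3600))
```

In practice the cut-offs would come out of the business impact analysis rather than from fixed constants.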
Bronze HA Tier: Low Cost Single Instance
RTO of Minutes to Days, RPO From Last Backup
Minimize the cost of HA
– Consolidate to reduce cost
– Use HA features included with Oracle Database Enterprise Edition
– Backups are first line of defense against media and site failures
– Secure offsite tape storage for archival and DR
– Enterprise Manager - Database as a Service
[Diagram: Single Instance MAA database backed up to a ZFS Backup Appliance, with off-site tape or cloud storage for archival and DR (on premise or cloud)]
Oracle Database Architecture Requires memory, processes and database files
System Resources
New Multitenant Architecture Memory and processes required at multitenant container level only
System Resources
High Availability at Bronze Tier
Oracle Corruption Protection
Automatic Storage Management (ASM)
Flashback Technologies: Drop, Query, Transaction, Table, and Database
Online Redefinition
Online Reorganization
Edition Based Redefinition
Online File Move
Online Patching
Oracle Restart
Recovery Manager (RMAN)
Fast Recovery Area
Oracle Secure Backup
Automatic Storage Management (ASM)
ASM supports ALL data – database files, filesystems, Clusterware files (OCR, Voting Disk)
Built-in mirroring protects from disk failures
Auto-repair of corrupt blocks using a valid mirror copy
[Diagram labels: Database, 3rd Party FS, Application; ASM Cluster & Single Node File System (ACFS), ACFS Snapshot, Dynamic Volume Manager; ASM Disk Group holding DB datafiles, OCR and voting files, Oracle binaries, 3rd party file systems; ASM instance managing Oracle DB files]
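The idea of repairing a corrupt block from a valid mirror copy can be sketched in a few lines. This is a toy model under stated assumptions: the mirror copies are in-memory `bytearray` objects, corruption is detected with a CRC32 recorded at write time, and `read_with_repair` is a hypothetical helper name. It does not describe how ASM itself detects or repairs corruption.

```python
import zlib


def checksum_ok(data: bytes, crc: int) -> bool:
    """True when the data still matches the checksum recorded at write time."""
    return zlib.crc32(data) == crc


def read_with_repair(copies: list, crc: int) -> bytes:
    """Return a valid copy and overwrite any corrupt mirror copies with it."""
    good = next((c for c in copies if checksum_ok(bytes(c), crc)), None)
    if good is None:
        raise IOError("all mirror copies are corrupt")
    for i, c in enumerate(copies):
        if not checksum_ok(bytes(c), crc):
            copies[i] = bytearray(good)  # repair from the valid mirror copy
    return bytes(good)


original = b"extent contents"
crc = zlib.crc32(original)
# The first mirror copy has been damaged; the second is intact.
copies = [bytearray(b"extent c0ntents"), bytearray(original)]
data = read_with_repair(copies, crc)
```

After the call, the reader gets the correct data and both mirror copies are valid again, which is the behavior the slide describes as auto-repair.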
Flashback Technologies
Rewind Button for Oracle Databases
Fast point-in-time recovery (PITR) without expensive restore operation
Error investigation
– View data as of previous point in time
Error correction
– Back-out a transaction
– Incorrect table updates
– Rewind the entire database
Rolling upgrades, Snapshot Standby
Example: a wrong update between t1 and t2, undone with Flashback Table
Table @t1 (before the wrong update)
Row | Col-1 | Col-.. | Col-n
Row-1 | abby | 1234 | officer
Row-2 | ben | 8834 | mgr
Row-3 | Charlie | 9837 | officer
Row-n | tom | 8793 | vp
Table @t2 (after the wrong update)
Row | Col-1 | Col-.. | Col-n
Row-1 | tom | 1234 | vp
Row-2 | ben | 8834 | vp
Row-3 | charlie | 9837 | vp
Row-n | tom | 8793 | vp
[Diagram: after a wrong batch update, Flashback Database rewinds the database from DB @ T2 back to DB @ T1]
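For the error-investigation and error-correction cases, the statements involved look roughly like the ones generated below. The helper names and the table name `staff` are hypothetical; the SQL follows the `AS OF TIMESTAMP` and `FLASHBACK TABLE ... TO TIMESTAMP` forms as commonly documented, and should be checked against the Oracle SQL reference before use.

```python
import re
from datetime import datetime

_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*$")
_TS_FORMAT = "YYYY-MM-DD HH24:MI:SS"


def _ts(when: datetime) -> str:
    """Render a Python datetime as an Oracle TO_TIMESTAMP expression."""
    return "TO_TIMESTAMP('{}', '{}')".format(
        when.strftime("%Y-%m-%d %H:%M:%S"), _TS_FORMAT
    )


def _check(name: str) -> str:
    """Reject anything that is not a simple unquoted identifier."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def flashback_query(table: str, when: datetime) -> str:
    """Error investigation: view the table as of a previous point in time."""
    return f"SELECT * FROM {_check(table)} AS OF TIMESTAMP {_ts(when)}"


def flashback_table(table: str, when: datetime) -> list:
    """Error correction: rewind one table (row movement must be enabled)."""
    t = _check(table)
    return [
        f"ALTER TABLE {t} ENABLE ROW MOVEMENT",
        f"FLASHBACK TABLE {t} TO TIMESTAMP {_ts(when)}",
    ]


t1 = datetime(2013, 9, 23, 10, 0, 0)  # a time before the wrong update
statements = [flashback_query("staff", t1)] + flashback_table("staff", t1)
```

Running the query first lets an operator confirm that the data at t1 is what they expect before rewinding the table.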
Oracle Recovery Manager (RMAN)
Unique knowledge of database file formats and recovery procedures
– Oracle block validation
– Online block-level recovery
– Native encryption, compression
– Table/partition-level recovery
– Oracle Multitenant support
Tape and cloud backups
Unified Management
[Diagram labels: Oracle Enterprise Manager (Backup and Recovery), RMAN, Data Files, Fast Recovery Area (FRA), Oracle Secure Backup, Tape Drive, Oracle Public Cloud, Amazon S3]
Enterprise Manager: Database as a Service
Unplanned Outages and Planned Maintenance
Bronze Service – Single Instance MAA
Type | Events | Downtime | Data Loss Potential
Unplanned outages | Instance or node failures | Minutes | Zero
Unplanned outages | Data corruptions, database failures or site failures | Hours to days | As of last backup
Planned maintenance | Online File Move, Online Reorganization and Redefinition, Online Patching, app upgrade with Edition-Based Redefinition | Zero to near-zero | Zero
Planned maintenance | Operating system or database upgrades | Minutes to hours | Zero
Planned maintenance | Platform migrations or application upgrades | Hours to a day | Zero
Oracle Data Protection
Bronze Service – Single Instance MAA
Capability | Physical Block Corruption | Logical Block Corruption
Dbverify, Analyze | Physical block checks | Logical checks for intra-block and inter-object consistency
RMAN | Physical block checks during backup and restore | Intra-block logical checks
RMAN | Automatic validation during backup and restore (to disk, tape)
Database | In-memory block and redo checksum | In-memory intra-block checks
ASM | Automatic corruption detection and repair using extent pairs
Exadata | HARD checks on write | HARD checks on write
The slide groups these rows into manual checks, auto checks and runtime checks.
Silver HA Tier: Real-Time Recovery
RTO of Seconds to Minutes, RPO of Near-Zero
Bronze plus:
Server and instance HA
– RAC One Node
Database and site HA/DR
– Active Data Guard
– GoldenGate
– Site Guard
Global Load Balancing
– Global Data Services
[Diagram: Site A and Site B, each running RAC One Node with backups, linked by Active Data Guard and GoldenGate, with Global Data Services across both sites]
Oracle RAC One Node
Instance and Server Failover
On failure of
– a database (DB) instance
– or the server hosting the DB
Oracle RAC One Node will fail over the database instance to another server in the cluster
Online Database Relocation also minimizes downtime during scheduled maintenance operations
[Diagram: databases DBA to DBE running on Node1, Node2 and Node3 over Oracle Grid Infrastructure, a public network, and an Oracle (Flex) ASM based pool of shared storage]
Active Data Guard Best Protection, Highest Performance, All Data Types and Applications
Integrated Database and Application Failover
Data Guard Fast-Start Failover
1. Data Guard automatic failover
2. Role-based database services start automatically
3. FAN breaks clients out of TCP timeout, TAF/FCF causes applications to quickly reconnect to new primary
[Diagram: application tier and database tier across a primary site and a standby site, with Data Guard redo transport from the primary database to the standby database; after failover the standby database is the new primary database]
Oracle GoldenGate
Flexible Logical Replication
Many to one replication: operational data store
Subset replication: data integration
Active/Active update anywhere: distributed high availability
[Diagram: bi-directional replication between Oracle and non-Oracle source and target databases; on each side Capture writes trail files, Pump sends them over TCP/IP (LAN / WAN / Internet), and Delivery applies the received trail files]
Minimize Planned Downtime
Database Rolling for Maintenance That Can’t Be Done Online
Active Data Guard or GoldenGate
– Primary and standby begin at version n
– Defer replication, upgrade standby to n+1
– Resynchronize standby with primary
– Switch production – only downtime
– Upgrade original primary to version n+1 and resynchronize
[Diagram: database versions at each stage, from primary version n and standby version n to primary version n+1 and standby version n+1]
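The rolling sequence above can be walked through with a small state model. The `Database` class, the integer version numbers and the log messages are all made up for this sketch; it only tracks roles and versions and does not perform any real upgrade or replication.

```python
from dataclasses import dataclass


@dataclass
class Database:
    name: str
    version: int
    role: str  # "primary" or "standby"


def rolling_upgrade(a: Database, b: Database) -> list:
    """Walk two databases from version n to n+1, logging each step."""
    primary, standby = (a, b) if a.role == "primary" else (b, a)
    target = primary.version + 1
    log = []

    # Defer replication and upgrade the standby while production keeps running.
    standby.version = target
    log.append(f"upgraded {standby.name} to {target}")

    # Let the upgraded standby catch up with changes made on the primary.
    log.append(f"resynchronized {standby.name}")

    # Switch production: the only step that involves downtime.
    primary.role, standby.role = "standby", "primary"
    log.append(f"switched production to {standby.name}")

    # Upgrade the original primary, which is now the standby.
    primary.version = target
    log.append(f"upgraded {primary.name} to {target} and resynchronized")
    return log


site_a = Database("A", 12, "primary")
site_b = Database("B", 12, "standby")
steps = rolling_upgrade(site_a, site_b)
```

Only the role switch interrupts production, which is why the slide calls it the only downtime.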
Oracle Site Guard
Automation for Site Switchover and Failover
EM Cloud Control Plug-in
– Database Lifecycle Management pack
Supports:
– Oracle Database
– Fusion Applications & Fusion Middleware
– Data Guard and storage replication
– Extensible to integrate additional infrastructure components
Global Data Services (GDS)
Automated Workload Management for Replicated Databases
Unified management
Workload routing & runtime load balancing
Global service failover & management
Benefits:
– Higher availability
– Improved performance
– Better manageability
[Diagram: Global Data Services spanning APAC and EMEA data centers, with Order Entry and Human Resources services routed to read-write and read-only databases replicated by Active Data Guard and GoldenGate]
Unplanned Outages and Planned Maintenance
Silver Service: Data Protection
Type | Events | Downtime | Data Loss Potential
Unplanned outages | Instance or node failures | Seconds (vs minutes) | Zero
Unplanned outages | Data corruptions, database failures or site failures | Seconds to minutes (vs hours to days) | Near-zero
Planned maintenance | Online File Move, Online Reorganization and Redefinition, Online Patching, app upgrade with Edition-Based Redefinition | Zero to near-zero | Zero
Planned maintenance | Operating system or database upgrades | Seconds to minutes (vs minutes to hours) | Zero
Planned maintenance | Platform migrations or application upgrades | Seconds to minutes (vs hours to a day) | Zero
Oracle Data Protection
Silver Service: Real-Time Recovery
Capability | Physical Block Corruption | Logical Block Corruption
Dbverify, Analyze | Physical block checks | Logical intra-block and inter-object consistency
RMAN | Physical block checks during backup and restore | Intra-block logical checks
RMAN | Automatic validation during backup, restore, replication
Active Data Guard | Strong isolation eliminates single point of failure; continuous physical block checking at standby; automatic repair of physical corruptions; automatic failover | Detect lost write corruption, auto shutdown and failover; intra-block logical checks at standby
Database | In-memory block and redo checksum | In-memory intra-block checks
ASM | Automatic corruption detection and repair using extent pairs
Exadata | HARD checks on write | HARD checks on write
The slide groups these rows into manual checks, auto checks and runtime checks.
Gold HA Tier: Maximum Availability
Zero Application Outage, Zero Data Loss
Silver plus:
Scalability and HA
– Oracle RAC
Zero application outage
– Application Continuity
Zero data loss over WAN
– Active Data Guard Far Sync
Zero downtime maintenance
– Oracle GoldenGate
[Diagram: Site A and Site B, each running Oracle RAC with backups, linked by Active Data Guard Far Sync and GoldenGate, with Application Continuity for clients]
Oracle RAC
Active-Active Clusters - Best Consolidation with Oracle Multitenant
All instances active on all nodes
Best consolidation with Oracle Multitenant
– Single SGA and single set of background processes per CDB instance
– More efficient, more scalable
Resource Manager prevents contention
– Prioritize resources between different user groups and pluggable databases
Application Continuity
Masks Unplanned & Planned Outages
Replays in-flight (DML) work on recoverable errors
Masks many hardware, software, network, storage errors and outages when successful
Improves end-user experience and productivity without requiring custom application development
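The general idea of replaying in-flight work after a recoverable error can be sketched as a retry loop. This is a generic illustration, not how Application Continuity is implemented: `RecoverableError`, `run_with_replay` and the string "sessions" are hypothetical, and the sketch ignores the safeguards a real implementation needs, such as making sure a transaction that already committed is not applied twice.

```python
class RecoverableError(Exception):
    """Stand-in for an outage after which the work can safely be retried."""


def run_with_replay(work, reconnect, max_attempts: int = 3):
    """Run work(session); on a recoverable error, reconnect and replay it."""
    session = reconnect()
    for attempt in range(1, max_attempts + 1):
        try:
            return work(session)
        except RecoverableError:
            if attempt == max_attempts:
                raise              # replay did not succeed: surface the error
            session = reconnect()  # obtain a fresh session, then replay


calls = {"n": 0}


def flaky(session):
    """Fail on the first call, succeed on the replay."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise RecoverableError("connection lost")
    return f"committed on {session}"


sessions = iter(["s1", "s2", "s3"])
result = run_with_replay(flaky, lambda: next(sessions))
```

From the caller's point of view the first failure is invisible: the call simply returns once the replay succeeds on a new session.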
Active Data Guard Far Sync
Zero Data Loss Protection with Application Continuity at ANY Distance
[Diagram: the primary in New York ships redo SYNC to a Far Sync instance, which forwards it ASYNC (compressed) through Data Guard to the standby in London; zero data loss failover]
GoldenGate Zero Downtime Maintenance
Bi-Directional Replication
– Source and target begin at version n
– Defer replication, upgrade target to n+1
– Start bi-directional replication
– Synchronize source and target
– Target is ready to accept new connections
– Migrate users when they make new connections
– Zero downtime
– Upgrade or decommission original source
[Diagram: GoldenGate replicating between the source (version n) and the target (version n, then version n+1)]
MAA Deployment Architectures Address the Complete Range of Enterprise HA Requirements
BRONZE
RTO = Minutes to days, RPO = Since last backup
ASM, RMAN, Flashback, OSB, Oracle Restart, Online Redef/Reorg, EBR
• Minimize CapEx
• Minimize OpEx
SILVER
RTO = Seconds to minutes, RPO = Near zero
Bronze plus: RAC One Node, Active Data Guard, GoldenGate, Global Data Services, Oracle Site Guard
• No single point of failure
• Real-time data protection
• Fast failover
• Minimal planned downtime
• High ROI
GOLD
Zero application outage, zero data loss
Silver plus: Oracle RAC, Active Data Guard, Application Continuity, GoldenGate
• Scalable performance
• Zero application outage
• Zero data loss at any distance
• Zero downtime for maintenance
MAA Solves HA Business Challenges
– MAA active architectures reduce cost and increase ROI
– MAA eliminates risk of downtime and data loss
Agenda
High Availability (HA) Business Challenge
Oracle Maximum Availability Architecture (MAA)
Oracle MAA Reference Architectures
Customer Deployments
PayPal's Critical Application Architecture
Extreme Performance
• 300+K executions/sec
• Real-time analysis of 99.99% of critical transactions
• Avg 40 ms response for 99.99%
• 10X performance compared to pre-Exadata system
HA and MAA
• 99.99% availability
• MAA technologies (RAC, ASM, ADG, Exadata, Flashback, GG)
• All disk groups using high redundancy
• Active Data Guard for auto block corruption repair and DR
• Rolling upgrade using ASM, Exadata, CRS, Data Guard, and GoldenGate
Mission-critical Databases
• Production databases (primary data center): 2 X Exadata X2-8, 2 X Full Storage Expansion
• Active Data Guard standby (DR data center): offload queries and reads, corruption protection, symmetric system
• Data Guard ASYNC redo transport over WAN, 650+ miles (30ms)
• GoldenGate real-time data integration, ETL targets, Test/Dev
Production and Standby Clusters = 8 Exadata Racks
3 identical Architectures = 24 Exadata Racks + Test/Dev Resources supporting our Critical Applications.
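To put the 99.99% availability figure in concrete terms, the arithmetic below converts an availability percentage into the downtime it permits. The function name and the 365-day year are choices made for this example.

```python
def allowed_downtime_minutes(availability_pct: float, days: float = 365.0) -> float:
    """Minutes of downtime permitted over the period at a given availability."""
    return (1.0 - availability_pct / 100.0) * days * 24 * 60


# 0.01% of the 525,600 minutes in a 365-day year: roughly 52.56 minutes.
per_year = allowed_downtime_minutes(99.99)
```

At this level, one multi-hour outage would use up several years of downtime budget, which is why the slide pairs the availability target with a list of MAA technologies.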
Oracle Database 12c MAA at PayPal
[Diagram, shown in three successive builds; labels consolidated below]
Mission-Critical Payment Processing Databases
• Primary data center and DR data center, WAN, 650+ miles
• Read-write services and read-only services
Standby databases
• Active Data Guard standby: offload queries and reads, automatic corruption repair
• Data Guard physical standby: supports DR
Redo transport
• Data Guard ASYNC
• Data Guard cascade (ARCH)
• Data Guard 12c FAST SYNC, ASYNC
• Active Data Guard 12c Real Time Cascade
• Active Data Guard 12c Far Sync - compressed
Offload
• Offload read-mostly
• Offload real-time data mining
Oracle Database 12c Global Data Services
• Global service management and High Availability
• Global load balancing and routing
Our Project
2012: Wells Fargo initiated a project for the brokerage to migrate away from using costly hardware-based replication and an “All or Nothing” data center failover model for Disaster Recovery (DR).
Our Goal:
– Enable a more flexible component level failover
– Reduce overall costs
– Leverage the DR hardware to improve ROI
What We Had
[Diagram, summarized]
– One site: Apps/Services (RW), Oracle servers active, SAN
– Other site, 400+ miles away: Apps/Services, Oracle servers off line, SAN
– Storage level (async) replication, with databases replicated as a single consistency group
– RMAN tape backup over 10GigE; tape copies shipped
What We Deployed
[Diagram, summarized]
– Two sites, each with Apps/Services (RO and RW), Oracle Grid and SAN; each site hosts a mix of active and read-only databases
– Active Data Guard ships redo records (async) between the sites
– RW calls follow the active DB
– Backup NAS at each site with R/W and R/O shares; RMAN backups over 10GigE
– NAS replication of backups between the sites
Resources
OTN HA Portal:
http://www.oracle.com/goto/availability
Maximum Availability Architecture (MAA):
http://www.oracle.com/goto/maa
Exadata on OTN:
http://www.oracle.com/technetwork/database/exadata/index.html
Oracle HA Customer Success Stories on OTN:
http://www.oracle.com/technetwork/database/features/ha-casestudies-098033.html