Realworld Rac Perf
8/8/2019 Realworld Rac Perf
Real Life RAC Performance Tuning
Arup Nanda
Lead DBA, Starwood Hotels
-
Who am I
Oracle DBA for 13 years and counting
Working with OPS since 1999
Speak at conferences, write papers, books
-
Why This Session
I get emails like this:
  We are facing performance issues in RAC. What should I do next?
Real Life Advice
Common Issues (with Wait Events)
Dispelling Myths
Formulate a Plan of Attack
Real Life Case Study
proligence.com/downloads.html
-
Our RAC Implementation
Oracle 10g RAC in March 2004
  Itanium platform running HP-UX
  Oracle 10.1.0.2
  Result: Failed
Second Attempt: Dec 2004
  10.1.0.3
  Result: Failed Again
Third Attempt: March 2005
  10.1.0.4
  Result: Success!
-
Challenges
Technology
  Lone ranger
  A lot of mystery and disconnected facts!
People
  Building a team that could not only deliver, but also sustain the delivered parts
Each day we learned something new
In today's session: real performance issues we faced and how we resolved them, along with wait events.
-
Why RAC Performance?
All tuning concepts for a single instance apply to RAC as well
RAC has other complexities
  More than 1 buffer cache
  Multiple caches: library cache, row cache
  Interconnect
  Pinging
  Global Locking
-
Why RAC Perf Tuning
We want to make sure we identify the right problem and go after it
  ... not just a problem
-
[Diagram: two-node RAC architecture. Node1 and Node2 each show the stack VIP, Service, Listener, Instance, ASM, Clusterware, Op Sys. Other labels: Switch, Public Interface, Cache, Cache Fusion, Lock Manager, Interconnect Switch, Storage, OCR, Voting.]
-
Areas of Concern in RAC
More than 1 buffer cache
Multiple caches: library cache, row cache
Interconnect
Global Locking
-
Cache Issues
Two caches require synchronization
What that means:
  A changed block in one instance, when requested by another, must be sent across via a bridge
  This bridge is the Interconnect
-
Interconnect Performance
Interconnect must be on a private LAN
Port aggregation to increase throughput
  APA on HP-UX
If using Gigabit Ethernet, use Jumbo Frames
-
Checking Interconnect Used
Identify the interconnect used
  $ oifcfg getif
  lan902 172.17.1.0 global cluster_interconnect
  lan901 10.28.188.0 global public
Is lan902 the bonded interface? If not, then set it
  $ oifcfg setif
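
The check above can be scripted. The following is a minimal sketch, assuming the four-column layout shown on the slide (interface, subnet, scope, role); the function name `parse_oifcfg_getif` is hypothetical, and the sample text is the slide's own output rather than a live call to oifcfg.

```python
def parse_oifcfg_getif(output: str) -> dict:
    """Map each role (e.g. 'cluster_interconnect') to its interface and subnet."""
    roles = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or unexpected lines
        interface, subnet, _scope, role = fields
        roles[role] = {"interface": interface, "subnet": subnet}
    return roles


# Output copied from the slide, not fetched from a real cluster.
sample = """\
lan902 172.17.1.0 global cluster_interconnect
lan901 10.28.188.0 global public
"""

roles = parse_oifcfg_getif(sample)
print("interconnect:", roles["cluster_interconnect"]["interface"])
print("public:", roles["public"]["interface"])
```

Whether the reported interface is the bonded one still has to be confirmed at the OS level; the sketch only extracts what oifcfg reports.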
-
Pop Quiz
If I have a very fast interconnect, I can perform the same work in a multi-node RAC as in a single server with faster CPUs. True/False?
Since cache fusion is now write-write, a fast interconnect will compensate for a slower I/O subsystem. True/False?
-
Cache Coherence Times
The time is a sum of time for:
  Finding the block in the cache
  Identifying the master
  Getting the block onto the interconnect
  Transfer speed of the interconnect
  Latency of the interconnect
  Receiving the block by the remote instance
  Creating the consistent image for the user
[Side labels on the slide: CPU, CPU, Interconnect]
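
To make the sum concrete, here is a small sketch. Every number is invented for illustration (microseconds), and the split of steps into CPU-bound and interconnect-bound is an assumption based on the slide's side labels, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class BlockTransfer:
    # All values are hypothetical microsecond figures, for illustration only.
    find_block: float
    identify_master: float
    send_to_interconnect: float
    wire_transfer: float
    wire_latency: float
    receive_block: float
    build_consistent_image: float

    def interconnect_time(self) -> float:
        # Assumed interconnect-bound steps: transfer speed and latency.
        return self.wire_transfer + self.wire_latency

    def cpu_time(self) -> float:
        # Assumed CPU-bound steps: everything else.
        return (self.find_block + self.identify_master
                + self.send_to_interconnect + self.receive_block
                + self.build_consistent_image)

    def total(self) -> float:
        return self.cpu_time() + self.interconnect_time()


baseline = BlockTransfer(50, 30, 40, 60, 100, 40, 80)
faster_wire = BlockTransfer(50, 30, 40, 30, 50, 40, 80)  # interconnect twice as fast

print(baseline.total(), faster_wire.total())
```

With these made-up figures, halving the interconnect time shortens the total by only a fifth, because the CPU-bound steps are untouched.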
-
So it all boils down to:
Block Access Cost
  more blocks -> more time
  Parallel Query
Lock Management Cost
  More coordination -> more time
  Implicit Cache Checks
  Sequence Numbers
Interconnect Cost
  Latency
  Speed
  more data to transfer -> more time
-
Hard Lessons
In RAC, problem symptoms may not indicate the correct problem!
Example:
  When the CPU is too busy to receive or send packets via UDP, the packets fail, and the Clusterware thinks the node is down and evicts it.
-
OS Troubleshooting
OS utilities to troubleshoot CPU issues:
  top
  glance
OS utilities to troubleshoot process issues:
  truss
  strace
  dbx
  pstack
-
Reducing Latency
A factor of technology
TCP has the highest latency
UDP is better (over Ethernet)
Proprietary protocols are usually better
  HyperFabric by HP
  Reliable Datagram (RDP)
  Direct Memory Channel
Infiniband
  UDP over Infiniband
  RDP over Infiniband
-
Start with AWR
-
gc current|cr grant 2-way
[Diagram: Instance 1 and Instance 2 over a shared Database. Components shown: Session, LMS, LGWR, Log Buffer. Labels: gc current block request, gc current grant 2-way.]
-
gc current|cr block 2-way
[Diagram: Instance 1 and Instance 2 over a shared Database. Components shown: Session, LMS, LGWR, Log Buffer. Labels: gc current block request, log file sync, gc current block 2-way.]
-
gc current|cr block 3-way
[Diagram: Instance 1, Instance 2 and Instance 3. Components shown: Session, Requestor, Master, Holder. Label: gc current block 3-way.]
-
RAC related Stats
-
RAC Stats contd.
-
Other GC Block Waits
gc current/cr block lost
  Lost blocks due to interconnect or CPU
gc current/cr block busy
  The consistent read request was delayed, most likely by an I/O bottleneck
gc current/cr block congested
  Long run queues and/or paging due to memory deficiency
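
The three waits above amount to a small lookup table. A minimal sketch, assuming event names of the form "gc current block lost" or "gc cr block busy"; the helper name `likely_cause` is hypothetical, and the causes are simply the ones stated on the slide.

```python
# Suggested causes, taken from the slide text.
LIKELY_CAUSE = {
    "lost": "blocks lost due to interconnect or CPU problems",
    "busy": "request delayed, most likely by an I/O bottleneck",
    "congested": "long run queues and/or paging due to memory deficiency",
}


def likely_cause(event: str):
    """Return the suggested cause for a 'gc current|cr block <suffix>' wait, else None."""
    words = event.lower().split()
    is_gc_block = (
        len(words) == 4
        and words[0] == "gc"
        and words[1] in ("current", "cr")
        and words[2] == "block"
    )
    return LIKELY_CAUSE.get(words[3]) if is_gc_block else None


for name in ("gc cr block busy", "gc current block lost", "db file sequential read"):
    print(name, "->", likely_cause(name))
```

The lookup is only a first hint; the cause still needs to be confirmed from OS and I/O statistics.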
-
Hung or Slow?
Check V$SESSION for WAIT_TIME
  If 0, then it's not waiting; it's hung
When hung:
  Take a systemstate dump from all nodes
  Wait some time
  Take another systemstate dump
  Check change in values. If unchanged, then system is hung
-
Chart a Plan
Rule out the obvious
Start with AWR Report
Start with Top-5 Waits
See if they include any significant waits, especially RAC related
Go on to RAC Statistics
Base your solution on the wait event
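
As a rough illustration of the Top-5 step, the sketch below separates RAC-related waits from the rest. The event list and timings are invented for the example, and treating every event whose name starts with "gc " as RAC related is a simplifying assumption.

```python
def rac_related(top_waits):
    """Keep only global cache waits (event names starting with 'gc ')."""
    return [(event, secs) for event, secs in top_waits if event.startswith("gc ")]


# Hypothetical Top-5 section: (event name, seconds waited).
top5 = [
    ("CPU time", 4200),
    ("gc buffer busy", 3100),
    ("db file sequential read", 2500),
    ("gc cr block busy", 900),
    ("log file sync", 400),
]

gc_waits = rac_related(top5)
gc_share = sum(secs for _, secs in gc_waits) / sum(secs for _, secs in top5)
print(gc_waits)
print(f"RAC-related share of Top-5 time: {gc_share:.0%}")
```

A large share suggests moving on to the RAC statistics; a small one suggests the problem is ordinary single-instance tuning.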
-
Rule out the obvious
Is interconnect private?
Is interconnect on UDP?
Do you see high CPU?
Do you see I/O bottlenecks?
How about memory?
Are the apps spread evenly?
Do you see lost blocks?
-
Make Simple Fixes
Strongly consider RAID 0+1
Highest possible number of I/O paths
Use fastest interconnect possible
Use a private, collision-free domain for the interconnect
Cache and NOORDER sequences
-
Enterprise Manager
-
Buffer Busy
Cause
  Instance wants to bring something from disk to the buffer cache
  Delay, due to space not available
  Delay, because the source buffer is not ready
  Delay, I/O is slow
  Delay, because redo log is being flushed
In summary
  Log buffer flush -> gc buffer busy
-
Parallel Query
One major issue in RAC is parallel query that goes across many nodes
[Diagram: QC and Slave processes spread across Instance 1 and Instance 2, communicating via the Interconnect.]
-
Restricting PQ
Define Instance Groups
  Specify in init.ora
  prodb1.instance_groups='pqgroup1'
  prodb2.instance_groups='pqgroup2'
Specify Instance Groups in Session
  SQL> alter session set parallel_instance_group='pqgroup1';
-
Forcing PQ on both Nodes
Define a common Instance Group
  prodb1.instance_groups='pqgroup1'
  prodb1.instance_groups='pq2nodes'
  prodb2.instance_groups='pqgroup2'
  prodb2.instance_groups='pq2nodes'
Specify Instance Groups in Session
  SQL> alter session set parallel_instance_group='pq2nodes';
-
Vital Cache Fusion Views
gv$cache_transfer: Monitor blocks transferred by object
gv$class_cache_transfer: Monitor block transfer by class
gv$file_cache_transfer: Monitor the blocks transferred per file
gv$temp_cache_transfer: Monitor the transfer of temporary tablespace blocks
-
Hot Tables
Tables, e.g. Rate Plans
  Small
  Compact blocks
  High updates
  High reads
Symptoms
  gc buffer busy waits
Solution
  Fewer rows per block
  High PCTFREE, INITRANS
  ALTER TABLE ... MINIMIZE RECORDS_PER_BLOCK
-
Hot Sequences
Symptoms:
  High waits on Sequence Number latch
  High waits on SEQ$ table
Solution:
  Increase the cache
  Make it NOORDER
  Especially the AUDSES$ sequence in SYS, used in Auditing
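
Why a bigger cache helps can be shown with simple arithmetic. The sketch below uses a simplified model, assumed for illustration: each time an instance exhausts its cached range it makes one update to SEQ$, so the number of updates is the number of values requested divided by the cache size, rounded up. The function name and workload figure are hypothetical.

```python
import math


def seq_dictionary_updates(nextvals: int, cache_size: int) -> int:
    """Approximate SEQ$ updates needed to serve `nextvals` values.

    Simplified model: one update per cached range. A cache_size of 1
    stands in for NOCACHE, where every value costs an update.
    """
    return math.ceil(nextvals / max(cache_size, 1))


requests = 10_000  # hypothetical workload
for cache in (1, 20, 1000):
    print(f"cache {cache:>4}: {seq_dictionary_updates(requests, cache)} updates")
```

The model ignores ordering; with ORDER the instances must additionally coordinate on every value, which is the cost NOORDER avoids.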
-
Read Only? Say So.
Reading table data from other instances creates gc * contention
Suggestion:
  Move read-only tables to a single tablespace
  Make this tablespace read only
  SQL> alter tablespace ROD read only;
-
Partitioning
Partitioning creates several segments for the same table (or index)
  => more resources
  => less contention
-
Monotonically Increasing Index
Problem:
  Reservation ID, a sequence-generated key
  Index is heavy on one side
Symptoms
  Buffer busy waits
  Index block splits
Solutions:
  Reverse key indexes
  Hash partitioned index (even if the table is not partitioned), 10gR2
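
Both solutions work by scattering consecutive keys. The sketch below illustrates the idea only: digit reversal stands in for the byte reversal of a reverse key index, and a modulo stands in for the database's hash function. Neither is Oracle's actual algorithm, and the key range and partition count are made up.

```python
from collections import Counter


def reverse_key(key: int, width: int = 8) -> str:
    """Zero-pad the key and reverse its digits (a stand-in for byte reversal)."""
    return str(key).zfill(width)[::-1]


def hash_partition(key: int, partitions: int = 4) -> int:
    """Toy hash: modulo on the key (not the database's real hash function)."""
    return key % partitions


keys = range(1000, 1010)  # ten consecutive sequence-generated IDs

# The leading character is a crude proxy for where a key sorts in the index.
plain_leading = {str(k).zfill(8)[0] for k in keys}
reversed_leading = {reverse_key(k)[0] for k in keys}
per_partition = Counter(hash_partition(k) for k in keys)

print("plain:", sorted(plain_leading))
print("reversed:", sorted(reversed_leading))
print("hash partitions:", dict(per_partition))
```

In plain order all ten keys share one leading digit, so they land next to each other; reversed, they start with ten different digits, and the toy hash spreads them over all four partitions.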
-
Library Cache
In RAC, Library Cache is global
So, parsing cost is worse than non-RAC
Solutions:
  Minimize table alters, drops, creates, truncates
  Use PL/SQL stored programs, not unnamed blocks
-
Log Files
In 10g R2, the log files are in a single location:
$CRS_HOME/log//
  racg
  crsd
  cssd
  evmd
  client
  cssd/oclsmon
$ORACLE_HOME/racg/dump
-
Case Study
-
Diagnosis
ifconfig -a shows no congestion or dropped packets
top shows 1% idle time on node 2
Top processes
  LMS and LMD
  And, several Netbackup processes
-
Further Diagnosis
SQL:
  select * from v$instance_cache_transfer
  where class = 'data block'
  and instance = 1;
Output:
  INSTANCE CLASS       CR_BLOCK  CR_BUSY CR_CONGESTED CURRENT_BLOCK CURRENT_BUSY CURRENT_CONGESTED
  -------- ---------- --------- -------- ------------ ------------- ------------ -----------------
         1 data block 162478682  5097149       477721     347917908      2950144          16320267
After some time:
  INSTANCE CLASS       CR_BLOCK  CR_BUSY CR_CONGESTED CURRENT_BLOCK CURRENT_BUSY CURRENT_CONGESTED
  -------- ---------- --------- -------- ------------ ------------- ------------ -----------------
         1 data block 162480580  5097185       477722     347923719      2950376          16320269
See increases
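
The increases are easier to see as differences. This short sketch subtracts the first snapshot from the second, using the figures from the slide; the variable names are illustrative.

```python
COLUMNS = [
    "CR_BLOCK", "CR_BUSY", "CR_CONGESTED",
    "CURRENT_BLOCK", "CURRENT_BUSY", "CURRENT_CONGESTED",
]

# Values copied from the two outputs on the slide (instance 1, class 'data block').
first = [162478682, 5097149, 477721, 347917908, 2950144, 16320267]
second = [162480580, 5097185, 477722, 347923719, 2950376, 16320269]

deltas = {name: after - before for name, before, after in zip(COLUMNS, first, second)}

for name, delta in deltas.items():
    print(f"{name:<18} +{delta}")
```

Because the view's counters are cumulative, only the deltas between two samples describe what is happening in the interval.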
-
Diagnosis:
  CPU starvation of the LMS/LMD processes caused GC waits.
Solution:
  Killed the Netbackup processes
  LMD and LMS got the CPU
-
Increasing Interconnect Speed
Faster Hardware
  Gigabit Ethernet; not Fast
  Infiniband, even if IP over IB
NIC settings
  Duplex Mode
  Highest Top Bit Rate
TCP Settings
  Flow Control Settings
  Network Interrupts for CPU
  Socket Receive Buffer
LAN Planning
  Private LANs
  Collision Domains
-
High Speed Interconnects
Oracle will support RDS over Infiniband
  http://oss.oracle.com/projects/rds/
On 10 Gig Ethernet as well
-
In summary: Planning
Adequate CPU, Network, Memory
Sequences: cache, noorder
Tablespaces: read only
Un-compact small hot tables
Keep undo and redo on fastest disks
Avoid full table scans of large tables
Avoid DDLs and unnamed PL/SQL blocks
-
In summary: Diagnosis
Start with AWR
Identify symptoms and assign causes
Don't be fooled into treating gc waits as interconnect issues
Find the correlation between dropped packets in network, CPU issues from sar, and gc buffer lost in sysstat reports.
-
Thank You!
Download from: proligence.com/downloads.html