Oracle RAC 12c Rel. 2 for Continuous Availability


Transcript of Oracle RAC 12c Rel. 2 for Continuous Availability

Page 1: Oracle RAC 12c Rel. 2 for Continuous Availability
Page 2: Oracle RAC 12c Rel. 2 for Continuous Availability

Copyright © 2016, Oracle and/or its affiliates. All rights reserved.

Oracle Real Application Clusters (RAC) 12c Release 2 – For Continuous Availability

Markus Michalewicz, Senior Director of Product Management, Oracle RAC Development

[email protected] | @OracleRACpm | http://www.linkedin.com/in/markusmichalewicz | http://www.slideshare.net/MarkusMichalewicz

Page 3: Oracle RAC 12c Rel. 2 for Continuous Availability

Safe Harbor Statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Page 4: Oracle RAC 12c Rel. 2 for Continuous Availability

Oracle Maximum Availability Architecture (MAA)

Production Site:
• RAC – Scalability; Server HA
• ASM – ASM mirroring
• Flashback – Human error correction
• RMAN, Oracle Secure Backup – Backup to disk, tape or cloud

Active Replica:
• Active Data Guard – Data Protection, DR; Query Offload
• GoldenGate – Active-active replication; Heterogeneous

Across sites:
• Edition-based Redefinition, Online Redefinition, Data Guard, GoldenGate – Minimal downtime maintenance, upgrades, migrations
• Enterprise Manager Cloud Control – Coordinated Site Failover
• Application Continuity – Application HA
• Global Data Services – Service Failover / Load Balancing

Page 5: Oracle RAC 12c Rel. 2 for Continuous Availability


Page 6: Oracle RAC 12c Rel. 2 for Continuous Availability

Program Agenda

1. High Availability Improvements
2. Continuous Availability Features

Page 7: Oracle RAC 12c Rel. 2 for Continuous Availability


Page 8: Oracle RAC 12c Rel. 2 for Continuous Availability

RAC High Availability Improvements

• Reduced failure detection time for an increased number of monitored components
• Reduced time to recover from local failures due to reduced reconfiguration times
• Prevention of system or database failures using ML-based real-time analysis of diagnostic data

Page 9: Oracle RAC 12c Rel. 2 for Continuous Availability


Page 10: Oracle RAC 12c Rel. 2 for Continuous Availability

More Components Checked More Frequently – to detect failures sooner and to recover faster

Oracle Clusterware checks:
• more components
– Multiple public networks checked with Ping Targets
• more frequently
– VIPs checked every second
– 30-second CSS misscount default; zero brownout allows for less
• more efficiently
– Agent changes allow for more checks using fewer resources
– Data from auxiliary systems are taken into account
– Engineered System-optimized failure detection and fencing
• and offline
– Offline monitoring of failed components for faster recovery

Page 11: Oracle RAC 12c Rel. 2 for Continuous Availability


Page 12: Oracle RAC 12c Rel. 2 for Continuous Availability

Smart Fencing

Page 13: Oracle RAC 12c Rel. 2 for Continuous Availability

Node Eviction Basics (http://www.slideshare.net/MarkusMichalewicz/oracle-clusterware-node-management-and-voting-disks)

• Pre-12.2, node eviction follows a rather "ignorant" pattern. Example in a 2-node cluster: the node with the lowest node number survives.
• Customers must not base their application logic on which node survives the split brain, as this may(!) change in future releases.
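The "lowest node number survives" rule can be sketched in a few lines (an illustration only, with hypothetical node numbers — not Oracle's actual fencing logic):

```python
# Pre-12.2 style tie-break (illustrative only): when a split brain
# produces equally sized sub-clusters, the sub-cluster containing the
# lowest node number survives.
def surviving_subcluster(subclusters):
    """subclusters: list of sets of node numbers, all equally sized."""
    return min(subclusters, key=lambda sc: min(sc))

# Example: a 2-node cluster splits into two 1-node sub-clusters.
print(surviving_subcluster([{2}, {1}]))  # -> {1}
```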

Page 14: Oracle RAC 12c Rel. 2 for Continuous Availability

Node Weighting in Oracle RAC 12c Release 2 – Idea: everything equal, let the majority of work survive

• Node Weighting is a new feature that considers the workload hosted in the cluster during fencing.
• The idea is to let the majority of work survive, if everything else is equal. Example: in a 2-node cluster, the node hosting the majority of services (at fencing time) is meant to survive.
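The idea above can be sketched as follows (hypothetical node and service counts; a minimal illustration, not the actual Node Weighting implementation):

```python
# Node Weighting sketch: everything else being equal, the sub-cluster
# hosting the majority of services at fencing time survives.
def weighted_survivor(subclusters, services_per_node):
    """subclusters: list of sets of node names (equally sized);
    services_per_node: node -> number of services hosted at fencing time."""
    def weight(subcluster):
        return sum(services_per_node.get(n, 0) for n in subcluster)
    return max(subclusters, key=weight)

# 2-node cluster: node2 hosts 3 services, node1 hosts 1 -> node2 survives.
print(weighted_survivor([{"node1"}, {"node2"}],
                        {"node1": 1, "node2": 3}))  # -> {'node2'}
```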

Page 15: Oracle RAC 12c Rel. 2 for Continuous Availability

Let's Define "Equal"

• A three-node cluster will benefit from Node Weighting if three equally sized sub-clusters are built as a result of the failure, since two differently sized sub-clusters are not equal.
• Secondary failure consideration (e.g., a public network card failure) can influence which node survives; it will be enhanced successively.
• A fallback scheme is applied if considerations do not lead to an actionable outcome (a "conflict").

Page 16: Oracle RAC 12c Rel. 2 for Continuous Availability

CSS_CRITICAL – Fencing with Manual Override

• CSS_CRITICAL can be set on various levels / components to mark them as "critical" so that the cluster will try to preserve them in case of a failure.
• CSS_CRITICAL will be honored if no other technical reason prohibits survival of the node which has at least one critical component at the time of failure. In a "conflict", the node may be evicted despite its workload; the workload will fail over.
• A fallback scheme is applied if CSS_CRITICAL settings do not lead to an actionable outcome.

On the server level (requires a server restart):

crsctl set server css_critical {YES|NO}

On the database or service level:

srvctl modify database -help | grep critical
... -css_critical {YES | NO}  Define whether the database or service is CSS critical
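The decision flow above — honor CSS_CRITICAL when possible, fall back otherwise — can be sketched as follows (illustrative only; the specific fallback rule shown here is an assumption):

```python
def pick_survivor(subclusters, critical_nodes):
    """Illustrative fencing decision with a CSS_CRITICAL override.
    subclusters: list of sets of node numbers;
    critical_nodes: nodes hosting at least one css_critical=YES component."""
    flagged = [sc for sc in subclusters if sc & critical_nodes]
    if len(flagged) == 1:
        return flagged[0]          # exactly one side is critical: honor it
    # "Conflict" (none or both sides critical): apply a fallback scheme,
    # e.g. lowest node number survives (assumed here for illustration).
    return min(subclusters, key=lambda sc: min(sc))

print(pick_survivor([{1}, {2}], critical_nodes={2}))     # -> {2}
print(pick_survivor([{1}, {2}], critical_nodes={1, 2}))  # -> {1} (fallback)
```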

Page 17: Oracle RAC 12c Rel. 2 for Continuous Availability

Recovery Buddies

Page 18: Oracle RAC 12c Rel. 2 for Continuous Availability

Near Zero Reconfiguration Time with Recovery Buddies (a.k.a. Buddy Instances)

• Recovery Buddies:
– Track block changes on a buddy instance
– Quickly identify blocks requiring recovery during reconfiguration
– Allow rapid processing of transactions after failures

Page 19: Oracle RAC 12c Rel. 2 for Continuous Availability

Near Zero Reconfiguration Time with Recovery Buddies – How it works under the hood

• Buddy Instance mapping is simple (random) – e.g. I1 → I2, I2 → I3, I3 → I4, I4 → I1
• Recovery buddies are assigned during startup
• RMS0 on each recovery buddy instance maintains an in-memory area for redo log changes
• The in-memory area is used during recovery, eliminating the need to physically read the redo

(Diagram: cluster "MyCluster" with instances I1–I4; I2 is the recovery buddy for I1, I3 for I2, I4 for I3, and I1 for I4.)
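The round-robin mapping from the example can be sketched as:

```python
def assign_recovery_buddies(instances):
    """Round-robin buddy mapping as in the example above:
    I1 -> I2, I2 -> I3, ..., In -> I1 (assigned at instance startup)."""
    n = len(instances)
    return {instances[i]: instances[(i + 1) % n] for i in range(n)}

buddies = assign_recovery_buddies(["I1", "I2", "I3", "I4"])
print(buddies)  # -> {'I1': 'I2', 'I2': 'I3', 'I3': 'I4', 'I4': 'I1'}
```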

Page 20: Oracle RAC 12c Rel. 2 for Continuous Availability

How Recovery Buddies Help Reduce Recovery Time

Without Recovery Buddies: Detect → Evict → Elect Recovery → Read Redo → Apply Recovery
With Recovery Buddies: Detect → Evict → Elect Recovery → Read Redo (from memory) → Apply Recovery

Up to 4x faster, since the redo no longer has to be physically read.
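To illustrate how serving the redo from memory shortens the timeline, here is a toy calculation. The phase durations are entirely hypothetical and made up to mirror the slide's claim; "up to 4x" is Oracle's figure, not derived from these numbers:

```python
# Hypothetical phase durations in ms -- illustration only, not measured data.
without_buddies = {"detect": 500, "evict": 200, "elect": 100,
                   "read_redo": 2600, "apply": 600}
# With a recovery buddy, the redo is served from the buddy's in-memory
# area instead of being physically read (assumed residual cost: 100 ms).
with_buddies = dict(without_buddies, read_redo=100)

t_without = sum(without_buddies.values())  # 4000 ms
t_with = sum(with_buddies.values())        # 1500 ms
print(round(t_without / t_with, 2))
```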

Page 21: Oracle RAC 12c Rel. 2 for Continuous Availability

Database Hang Manager

Page 22: Oracle RAC 12c Rel. 2 for Continuous Availability

Overlooked and Underestimated – Hang Manager: why having a Hang Manager is useful

• Customers experience database hangs for a variety of reasons: high system load, workload contention, network congestion, general errors, etc.
• Before Hang Manager was introduced with Oracle RAC 11.2.0.2, Oracle required quite some information to troubleshoot a hang, e.g.:
– System state dumps
– For RAC: global system state dumps
• Customers usually had to reproduce "the" hang with additional events to analyze it

Page 23: Oracle RAC 12c Rel. 2 for Continuous Availability

Introduction to Hang Manager – How it works

• Always on, as enabled by default
• Reliably detects database hangs
• Autonomically resolves hangs
• Considers QoS policies for hang resolution
• Logs all detected hangs and their resolutions

(Diagram: the DIAG0 process runs a DETECT → ANALYZE → VERIFY loop over sessions; once a session is confirmed hung, a victim is EVALUATEd, taking the QoS policy into account.)

Page 24: Oracle RAC 12c Rel. 2 for Continuous Availability

Hang Manager Optimizations with Oracle RAC 12c – Tuning under the hood

• Hang Manager auto-tunes itself by periodically collecting instance- and cluster-wide hang statistics
• Metrics like cluster health and instance health are tracked over a moving average
• This moving average is considered during resolution
• Holders waiting on SQL*Net break/reset are fast-tracked
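The moving-average tracking described above can be sketched as follows (the window size and metric are assumptions for illustration, not Hang Manager internals):

```python
from collections import deque

class MovingAverage:
    """Sketch of tracking a health metric over a moving average, as the
    slide describes for Hang Manager's auto-tuning."""
    def __init__(self, window=5):
        self.samples = deque(maxlen=window)

    def add(self, value):
        """Record a new sample and return the current moving average."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)

ma = MovingAverage(window=3)
for health in [90, 80, 70, 60]:
    avg = ma.add(health)
print(avg)  # -> 70.0  (mean of the last 3 samples: 80, 70, 60)
```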

Page 25: Oracle RAC 12c Rel. 2 for Continuous Availability

DBMS_HANG_MANAGER.Sensitivity – A new SQL interface to set Hang Manager sensitivity

• Early warning exposed via a V$ view
• Sensitivity can be set higher if the default level is too conservative
• Hang Manager considers QoS policies and data during the validation process

Hang Sensitivity Level | Description | Note
NORMAL | Hang Manager uses its default internal operating parameters to try to meet typical requirements for any environment. | Default
HIGH | Hang Manager is more alert to sessions waiting in a chain than at the NORMAL level. |

Page 26: Oracle RAC 12c Rel. 2 for Continuous Availability


Page 27: Oracle RAC 12c Rel. 2 for Continuous Availability

Oracle Autonomous Health Framework (AHF) – Working for you continuously

• Integrates next-generation tools running as components, 24/7
• Discovers potential issues and notifies or takes corrective actions
• Speeds up issue diagnosis and recovery
• Preserves database and server availability and performance
• Autonomously monitors and manages resources to maintain SLAs

Page 28: Oracle RAC 12c Rel. 2 for Continuous Availability

AHF – Availability by Platform

Tool | Linux x86-64 | zLinux | Solaris (SPARC) | HP-UX Itanium | IBM AIX | Windows x86-64
Cluster Verification Utility (CVU) | ✔ | ✔ (March 2015) | ✔ | ✔ (August 2015) | ✔ (August 2015) | ✔ (August 2015)
ORAchk | ✔ | ✔ | ✔ | ✔ | ✔ | ✔
Cluster Health Monitor (CHM) | ✔ | ✗ (not planned) | ✔ | ✗ (not planned) | ✔ | ✔
Cluster Health Advisor (CHA) | ✔ (since 12.2.0.1) | ✗ (not planned) | ✗ (future release) | ✗ (not planned) | ✗ (future release) | ✗ (not planned)
Trace File Analyzer (TFA) | ✔ | ✔ | ✔ | ✔ (no TFA web) | ✔ | ✔ (no TFA web)
Hang Manager | ✔ | ✔ | ✔ | ✔ | ✔ | ✔
Memory Guard | ✔ | ✗ (not planned) | ✔ | ✗ (not planned) | ✔ | ✔
Quality of Service Management (QoS) | ✔ | ✗ (not planned) | ✔ | ✗ (not planned) | ✔ | ✔

Page 29: Oracle RAC 12c Rel. 2 for Continuous Availability

Cluster Health Monitor (CHM) – Generates a diagnostic metrics view of cluster and databases

• Always on – enabled by default
• Provides detailed OS resource metrics
• Assists node eviction analysis
• Locally logs all process data
• User can define pinned processes
• Listens to CSS and GIPC events
• Categorizes processes by type
• Supports plug-in collectors (e.g. traceroute, netstat, ping, etc.)
• New CSV output for ease of analysis

(Diagram: an osysmond process on every node collects OS data and forwards it to the ologgerd master, which stores it in the 12c Grid Infrastructure Management Repository, GIMR.)

Page 30: Oracle RAC 12c Rel. 2 for Continuous Availability

Introducing Oracle 12c Cluster Health Advisor (CHA) – Proactive Health Prognostics System

• Real-time monitoring of Oracle RAC database systems and their hosts
• Early detection of impending as well as ongoing system faults
• Diagnoses and identifies the most likely root causes
• Provides corrective actions for targeted triage
• Generates alerts and notifications for rapid recovery

Full presentation: http://www.oracle.com/technetwork/database/options/clustering/ahf/learnmore/oracle-12cr2-cha-3623186.pdf
Recorded web seminar: https://www.youtube.com/watch?v=TbdkGsmSgcQ

Page 31: Oracle RAC 12c Rel. 2 for Continuous Availability

Cluster Health Advisor (CHA) Architecture Overview

• cha – cluster node resource
• Single Java ochad daemon per node
• Reads Cluster Health Monitor data directly from memory
• Reads DB ASH data from SMR without a DB connection
• Uses OS and DB models and data to perform prognostics
• Stores analysis and evidence in the GI Management Repository
• Sends alerts to EMCC Incident Manager per target

(Diagram: ochad feeds OS data from CHM and DB data into the Node Health and Database Health Prognostics Engines, which use the OS and DB models, store results in the GIMR, and raise EMCC alerts.)

Page 32: Oracle RAC 12c Rel. 2 for Continuous Availability

Cluster Health Advisor – Scope of Problem Detection: best-effort immediate guided diagnosis

• Over 30 node and database problems have been modeled
• Over 150 OS and DB metric predictors identified
• Problem detection in 12.2.0.1 includes:
– Interconnect, Global Cache and cluster problems
– Host CPU and memory, PGA memory stress
– IO and storage performance issues
– Reconfiguration and recovery issues
– Workload and session abnormal variations

Page 33: Oracle RAC 12c Rel. 2 for Continuous Availability

Cluster Health Advisor – Data Sources and Data Points

A CHA Data Point contains > 150 signals (statistics and events) from multiple sources: OS, ASM, network, and DB (ASH, AWR session, system and PDB statistics). Statistics are collected at a 1-second internal sampling rate, then synchronized, smoothed and aggregated into a Data Point every 5 seconds.

Example Data Point:

Time | CPU | ASM IOPS | Network % util | Network packets dropped | Log file sync | Log file parallel write | GC CR request | GC current request | GC current block 2-way | GC current block busy | Enq: CF contention
15:16:00 | 0.90 | 4100 | 13% | 0 | 2 ms | 600 us | 0 | 0 | 300 us | 1.5 ms | 0
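The sampling-and-aggregation step can be sketched as follows (a simple mean stands in for CHA's synchronize/smooth/aggregate pipeline; the sample values are hypothetical):

```python
def aggregate_data_points(samples, width=5):
    """Aggregate 1-second samples into one data-point value per `width`
    seconds by averaging each full window."""
    points = []
    for i in range(0, len(samples) - width + 1, width):
        window = samples[i:i + width]
        points.append(sum(window) / width)
    return points

# Ten 1-second IOPS samples -> two 5-second data-point values.
iops_1s = [4000, 4200, 4100, 4050, 4150,   # first 5-second window
           800, 820, 790, 805, 785]        # second window
print(aggregate_data_points(iops_1s))  # -> [4100.0, 800.0]
```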

Page 34: Oracle RAC 12c Rel. 2 for Continuous Availability

Models Capture the Dynamic Behavior of All Normal Operation

• The release ships with conservative models to minimize false warnings
• A model captures the normal load phases and their statistics over time, and thus the characteristics for all load intensities and profiles. During monitoring, any data point similar to one of the stored vectors is NORMAL.
• One could say that the model REMEMBERS the normal operational dynamics over time

(Chart: IOPS, user commits (/sec), log file parallel write (usec) and log file sync (usec) plotted over a day, e.g. at 10:00, 2:00 and 6:00.)

In-Memory Reference Matrix (part of the "normality" model):

IOPS | … | 2500 | 4900 | 800 | …
User Commits | … | 10000 | 21000 | 4400 | …
Log File Parallel Write | … | 2350 | 4100 | 22050 | …
Log File Sync | … | 5100 | 9025 | 4024 | …
… | … | … | … | … | …
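The "similar to one of the stored vectors" test can be sketched as follows (the relative-difference similarity measure and tolerance are assumptions for illustration, not CHA's actual algorithm):

```python
def is_normal(data_point, reference_matrix, tolerance=0.25):
    """A data point is NORMAL if it is similar to at least one stored
    reference vector. Similarity here: every signal is within a relative
    tolerance of the vector's value (an assumed, simplified measure)."""
    for vector in reference_matrix:
        if all(abs(obs - ref) <= tolerance * max(ref, 1)
               for obs, ref in zip(data_point, vector)):
            return True
    return False

# Reference vectors: (IOPS, user commits, log file parallel write, log file sync)
model = [(2500, 10000, 2350, 5100),
         (4900, 21000, 4100, 9025),
         (800, 4400, 22050, 4024)]
print(is_normal((4800, 20500, 4000, 9200), model))    # -> True  (near 2nd vector)
print(is_normal((10500, 20000, 4050, 10250), model))  # -> False (IOPS deviant)
```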

Page 35: Oracle RAC 12c Rel. 2 for Continuous Availability

Cluster Health Advisor – CHA Model: Find Similarity with Normal Values

Observed values (part of a Data Point) are compared against the In-Memory Reference Matrix (part of the "normality" model); the difference is the residual: Observed − Predicted = Residual. For example, observed values of IOPS = 10500, User Commits = 20000, Log File Parallel Write = 4050 and Log File Sync = 10250 yield residuals of 5600, −1000, −50 and 325, respectively.

CHA estimator/predictor: "Based on my normality model, the value of IOPS should be in the vicinity of ~4900, but it is reported as 10500; this is causing a residual of ~5600 in magnitude."

CHA fault detector: "Such a high magnitude of residuals should be tracked carefully! I'll keep an eye on the incoming sequence of this signal (IOPS), and if it remains deviant I'll generate a fault on it."
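The residual computation can be reproduced directly. The predicted values below are back-derived from the slide's observed values and residuals; they are illustrative, not output of a real CHA model:

```python
# Residual = Observed - Predicted, per signal, using the slide's numbers.
signals   = ["IOPS", "User Commits", "Log File Parallel Write", "Log File Sync"]
observed  = [10500, 20000, 4050, 10250]
predicted = [4900, 21000, 4100, 9925]   # back-derived for illustration

residuals = [o - p for o, p in zip(observed, predicted)]
print(residuals)  # -> [5600, -1000, -50, 325]

# A fault detector tracks signals whose residuals stay large in magnitude
# (the threshold of 1000 is an assumption for illustration):
deviant = [s for s, r in zip(signals, residuals) if abs(r) > 1000]
print(deviant)  # -> ['IOPS']
```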

Page 36: Oracle RAC 12c Rel. 2 for Continuous Availability

Cluster Health Advisor (CHA) Operation Overview

• SRVCTL lifecycle daemon management
• Enabled by default – activates when the first RAC instance starts
• New CHACTL command line tool for all local operations
• Java GUI tool available on OTN soon
• Integrated into EMCC Incident Manager and notifications
• Monitoring has no impact on DB performance or availability

(Diagram: the CHACTL client, CHA Java GUI client and SRVCTL drive the local CHADDriver, which runs the Node Health and Database Health Prognostics Engines over OS, CHM and DB data using the OS and DB models, stores results in the GIMR, and feeds EM Cloud Control.)

Page 37: Oracle RAC 12c Rel. 2 for Continuous Availability

CHA Command Line Operations

Checking for health issues and corrective actions with CHACTL QUERY DIAGNOSIS:

$ chactl query diagnosis -db oltpacdb -start "2016-10-28 01:52:50" -end "2016-10-28 03:19:15"
2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_1) [detected]
2016-10-28 01:47:10.0 Database oltpacdb DB Control File IO Performance (oltpacdb_2) [detected]
2016-10-28 02:59:35.0 Database oltpacdb DB Log File Switch (oltpacdb_1) [detected]
2016-10-28 02:59:45.0 Database oltpacdb DB Log File Switch (oltpacdb_2) [detected]

Problem: DB Control File IO Performance
Description: CHA has detected that reads or writes to the control files are slower than expected.
Cause: The Cluster Health Advisor (CHA) detected that reads or writes to the control files were slow because of an increase in disk IO. The slow control file reads and writes may have an impact on checkpoint and Log Writer (LGWR) performance.
Action: Separate the control files from other database files and move them to faster disks or Solid State Devices.

Problem: DB Log File Switch
Description: CHA detected that database sessions are waiting longer than expected for log switch completions.
Cause: The Cluster Health Advisor (CHA) detected high contention during log switches because the redo log files were small and the redo logs switched frequently.
Action: Increase the size of the redo logs.

Page 38: Oracle RAC 12c Rel. 2 for Continuous Availability

Cluster Health Advisor – Command Line Operations

HTML diagnostic health output available (-html <file_name>)

Page 39: Oracle RAC 12c Rel. 2 for Continuous Availability

Using EMCC for Alerts and Corrective Actions

Page 40: Oracle RAC 12c Rel. 2 for Continuous Availability

Using the CHA GUI to Perform Root-Cause Analysis – Overview

• Standalone Java GUI client
• Must be run on a local cluster node
• Can be run against a live GIMR or an MDB (dump) file:
  chactl export repository -format mdb -start '2017-05-01 00:00:00' -end '2017-05-10 00:00:00'
• Used internally for development
• Will be available and maintained on Oracle Technology Network soon

Page 41: Oracle RAC 12c Rel. 2 for Continuous Availability

Calibrating CHA to Your RAC Deployment – Overview

• Calibration goal: increase sensitivity and accuracy with sufficient warning
• The release ships with conservative models to minimize false warnings:
– DEFAULT_CLUSTER for each cluster node
– DEFAULT_DB for each database instance
• Use your own data for periods of "normal operations" to increase sensitivity:
– Recommended minimum 6-hour period
– Should include all normal workload phases for that model
• Models may be changed dynamically online using CHACTL

Page 42: Oracle RAC 12c Rel. 2 for Continuous Availability

Calibrating CHA to Your RAC Deployment

Choosing a data set for calibration – defining "normal":

$ chactl query calibration -cluster -timeranges 'start=2016-10-28 07:00:00,end=2016-10-28 13:00:00'
Cluster name : mycluster
Start time : 2016-10-28 07:00:00
End time : 2016-10-28 13:00:00
Total Samples : 11524
Percentage of filtered data : 100%

1) Disk read (ASM) (Mbyte/sec)
MEAN   MEDIAN  STDDEV  MIN   MAX
0.11   0.00    2.62    0.00  114.66
<25     <50    <75    <100   >=100
99.87%  0.08%  0.00%  0.02%  0.03%

2) Disk write (ASM) (Mbyte/sec)
MEAN   MEDIAN  STDDEV  MIN   MAX
0.01   0.00    0.15    0.00  6.77
<50      <100   <150   <200   >=200
100.00%  0.00%  0.00%  0.00%  0.00%

3) Disk throughput (ASM) (IO/sec)
MEAN   MEDIAN  STDDEV  MIN   MAX
2.20   0.00    31.17   0.00  1100.00
<5000    <10000  <15000  <20000  >=20000
100.00%  0.00%   0.00%   0.00%   0.00%

4) CPU utilization (total) (%)
MEAN   MEDIAN  STDDEV  MIN   MAX
9.62   9.30    7.95    1.80  77.90
<20     <40    <60    <80    >=80
92.67%  6.17%  1.11%  0.05%  0.00%

Page 43: Oracle RAC 12c Rel. 2 for Continuous Availability

Calibrating CHA to Your RAC Deployment

Creating a new CHA model with CHACTL:

• Create and store the new model:
  $ chactl calibrate cluster -model daytime -timeranges 'start=2016-10-28 07:00:00,end=2016-10-28 13:00:00'
• Begin using the new model:
  $ chactl monitor cluster -model daytime
• Confirm the new model is being used:
  $ chactl status -verbose
  monitoring nodes svr01, svr02 using model daytime
  monitoring database qoltpacdb, instances oltpacdb_1, oltpacdb_2 using model DEFAULT_DB

Page 44: Oracle RAC 12c Rel. 2 for Continuous Availability

Program Agenda

1. High Availability Improvements
2. Continuous Availability Features

Page 45: Oracle RAC 12c Rel. 2 for Continuous Availability

Availability for Applications – Application Continuity

• Availability during planned maintenance
• Continuous availability

Page 46: Oracle RAC 12c Rel. 2 for Continuous Availability


Page 47: Oracle RAC 12c Rel. 2 for Continuous Availability

Oracle Real Application Clusters 12c Release 2 – Continuous Service Availability

Real Application Service Levels – "Always Running":
• Scales PDBs and services
• 2-second detection on EXA
• Recovery in low seconds
• Drains work gradually
• Recovers in-flight work with AC

Page 48: Oracle RAC 12c Rel. 2 for Continuous Availability

Oracle Active Data Guard 12c Release 2 – Continuous Service Availability

• Recover in-flight work with Application Continuity
• ADG sessions survive a standby role change
• Drain, then switch over; AC recovers stragglers:

  Switchover to <db_resource_name> [wait]

(Diagram: the Data Guard Observer coordinates FAILOVER between the RAC primary at Site A and the RAC standby at Site B.)

Page 49: Oracle RAC 12c Rel. 2 for Continuous Availability

Application Continuity – In-flight work continues

• Replays in-flight work on recoverable errors
• Masks hardware, software, network, storage errors and timeouts
• 12.1: JDBC-Thin, UCP, WebLogic Server, 3rd-party Java application servers
• 12.2: OCI, ODP.NET unmanaged, JDBC Thin on XA, Tuxedo, SQL*Plus
• Supported with RAC, RAC One Node, and Active Data Guard

Page 50: Oracle RAC 12c Rel. 2 for Continuous Availability

Application Continuity Under the Covers

1 – Normal Operation
• Client marks database requests
• Server decides which calls can and cannot be replayed
• Directed: client holds original calls, their inputs, and validation data

2 – Outage Phase 1: Reconnect
• Checks replay is enabled
• Verifies timeliness
• Creates a new connection
• Checks the target database is valid for replay
• Uses Transaction Guard to guarantee the last outcome

3 – Outage Phase 2: Replay
• Replays captured calls
• Ensures results returned to the app match the original
• On success, returns control to the application

Page 51: Oracle RAC 12c Rel. 2 for Continuous Availability


Steps to use Application Continuity

Check                     What to do

Identify Requests         Return connections to the pool: UCP, WebLogic Active GridLink, 3rd-party containers using UCP, OCI Session Pool, ODP.NET Unmanaged, Tuxedo

JDBC Deprecated Classes   Replace non-standard classes (MOS 1364193.1); use AC orachk to identify them

Side Effects              Use disable, or another connection, if a request should not be replayed

Callbacks                 UCP and WLS with labels: do nothing. 12.2: set FAILOVER_RESTORE=LEVEL1. Else register a callback for applications that change state outside requests

Mutable Functions         Grant keeping mutable values, e.g. sequence.nextval

Page 52: Oracle RAC 12c Rel. 2 for Continuous Availability


Run  the  AC  Assessments


How effective is Application Continuity for your application? Where Application Continuity is not in effect, what steps need to be taken?

Steps:

1  Analyze and report coverage

2  Report usage of deprecated Java classes

[Diagram: application traces are read as input by the orachk assessment tool; the user reads the output report]

https://blogs.oracle.com/WebLogicServer/entry/using_orachk_for_coverage_analysis


Available  in  ORAchk  
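The assessments above ship with ORAchk. A minimal sketch of invoking the two checks from the command line; the flag names (-acchk, -javahome, -apptrc, -appjar) follow the blog post linked above and may differ between ORAchk versions, and all paths here are hypothetical examples:

```shell
# Run the Application Continuity assessments bundled with ORAchk.
# Flag names per the blog post above; all paths are examples only.

# 1 - Coverage analysis: point ORAchk at database trace files
#     captured while the application ran against an AC-enabled service.
./orachk -acchk -javahome /usr/java/jdk1.8.0 \
  -apptrc /u01/app/oracle/diag/rdbms/orcl/orcl1/trace

# 2 - Deprecated-class report: scan the application's jar files
#     for non-standard oracle.sql concrete classes (MOS 1364193.1).
./orachk -acchk -javahome /usr/java/jdk1.8.0 -appjar /opt/myapp/lib
```

Both runs produce an HTML report; the coverage report flags the requests where replay would be disabled.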

Page 53: Oracle RAC 12c Rel. 2 for Continuous Availability


For owned sequences:

ALTER SEQUENCE [sequence] [KEEP|NOKEEP];

CREATE SEQUENCE [sequence] [KEEP|NOKEEP];

Grant and revoke for other users:

GRANT [KEEP DATE TIME | KEEP SYSGUID] [TO user];

REVOKE [KEEP DATE TIME | KEEP SYSGUID] [FROM user];

GRANT KEEP SEQUENCE ON [sequence] [TO user];

REVOKE KEEP SEQUENCE ON [sequence] [FROM user];


Grant Mutables: keep original function results at replay

Page 54: Oracle RAC 12c Rel. 2 for Continuous Availability


Decide  if  any  requests  should  not  be  replayed  

e.g. Autonomous Transactions, UTL_HTTP, UTL_URL, UTL_FILE, UTL_FILE_TRANSFER, UTL_SMTP, UTL_TCP, UTL_MAIL, DBMS_JAVA callouts, EXTPROC

 

 


Don’t Want to Replay: disable replay for requests that should not be replayed

 

Use another connection or the disable API

             

Page 55: Oracle RAC 12c Rel. 2 for Continuous Availability


Configuration

FAILOVER_TYPE = TRANSACTION for Application Continuity

FAILOVER_RESTORE = LEVEL1 for common states restored at failover

AQ_HA_NOTIFICATIONS = TRUE for FAN with the OCI driver, ODP.NET, Tuxedo, SQL*Plus

 


Set Service Attributes

For Java: use a replay data source (local or XA):

datasource=oracle.jdbc.replay.OracleDataSourceImpl

For OCI, ODP.NET, Tuxedo, SQL*Plus: replay is on when enabled on the service.
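These attributes are set on the database service itself. A minimal sketch using srvctl against a hypothetical database (orcl) and service (appsvc); the retry and delay values are illustrative only:

```shell
# Configure a service for Application Continuity (12c).
# Database name, service name, and timing values are examples.
srvctl modify service -db orcl -service appsvc \
  -failovertype TRANSACTION \
  -commit_outcome TRUE \
  -failover_restore LEVEL1 \
  -failoverretry 30 \
  -failoverdelay 10 \
  -notification TRUE

# Verify the resulting service configuration.
srvctl config service -db orcl -service appsvc
```

FAILOVER_TYPE=TRANSACTION together with COMMIT_OUTCOME=TRUE (Transaction Guard) is what enables replay; clients connecting through this service pick the settings up automatically.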

Page 56: Oracle RAC 12c Rel. 2 for Continuous Availability


Killing Sessions – Extended DBA Commands                               Replays?

alter system kill session … noreplay                                   BEST METHOD

dbms_service.disconnect_session([service], dbms_service.noreplay)      BEST METHOD

srvctl stop service -db orcl -instance orcl2 -force                    YES

srvctl stop service -db orcl -node rws3 -force                         YES

srvctl stop service -db orcl -instance orcl2 -noreplay -force          NO

srvctl stop service -db orcl -node rws3 -noreplay -force               NO

alter system kill session … immediate                                  YES


Page 57: Oracle RAC 12c Rel. 2 for Continuous Availability


Availability for applications – Application Continuity


Availability during Planned Maintenance

Continuous Availability

Page 58: Oracle RAC 12c Rel. 2 for Continuous Availability


What is the best way to apply maintenance?

1 – Update in Place:

• Complex build process repeated for each node
• Error prone
• Longest downtime and maintenance window
• Have to create a backup (no built-in fallback plan)
• How do you enforce standardization?

2 – Clone, Update and Switch:

• Complex build process repeated for each node
• Error prone
• Shorter downtime and maintenance window
• Built-in fallback
• How do you enforce standardization?

3 – Deploy Gold Image, Switch:

• Build gold image once, use everywhere
• Fewest steps, simplest process
• Shortest downtime and maintenance window
• Built-in fallback
• Built-in standardization

Page 59: Oracle RAC 12c Rel. 2 for Continuous Availability


What is the best approach to handling software drift?

Scan:

• Drift not seen until a scan takes place
• Scanning unchanged targets is unnecessary work
• Does not prevent drift

Trigger Alert:

• No time lag between drift and alert
• No extra work
• Does not prevent drift

Prevent:

• Locked configurations cannot drift
• Can trigger an alert if unauthorized changes are attempted
• Can trigger an alert if authorized changes are made

Page 60: Oracle RAC 12c Rel. 2 for Continuous Availability


Streamline the Distribution Process

• Ship only once – to a customer, to a site, to a pool

• Ship to interested parties only – subscribers

• Ship only what is necessary – updated modules, updated files, updated blocks

• Deploy non-disruptively – ship any time, choose when to use it

Page 61: Oracle RAC 12c Rel. 2 for Continuous Availability


• Simple
• Prevent errors, enable easy corrections
• Use Gold Images for all scenarios
• Enable mass operations on 1000s of nodes

Rapid  Home  Provisioning  and  Maintenance  

Page 62: Oracle RAC 12c Rel. 2 for Continuous Availability


Build  Inventory  of  Gold  Images  


Create  once  on  RHP  Server    

[Diagram: installed homes – DB 11.2.0.4.1, DB 12.1.0.2 Custom, Grid 11.2.0.4.3, WLS 12.2.1 – promoted to gold images on the RHP Server]

RHP Server:

• Uptake the current estate by promoting existing homes to gold images

• Create new homes and promote them to gold images after validation

• Assign states to images for lifecycle management

• Oracle internal users: import images from GIaaS
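Promoting an installed home into the gold-image inventory is a single rhpctl call. A hedged sketch; the image name and home path are hypothetical, and exact options vary by release:

```shell
# Promote an already-installed Oracle home into a gold image
# on the RHP Server (image name and path are examples only).
rhpctl import image -image db12201_base \
  -path /u01/app/oracle/product/12.2.0.1/dbhome_1

# List the gold images now held by the RHP Server.
rhpctl query image
```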

Page 63: Oracle RAC 12c Rel. 2 for Continuous Availability


Supported targets and environments – manage existing and create new Pools, Homes, and Databases

• Patch and upgrade existing deployments
– No prerequisites (config, agent, daemon…) for targets
– Database and Grid Infrastructure 11.2.0.3, 11.2.0.4, 12.1.0.2, 12.2.0.1

• Provision, scale, patch and upgrade new clusters and databases
– 11.2.0.4, 12.1.0.2, 12.2.0.1

• Bare metal, VMs, CDBs, non-CDBs

• SI (standalone, Restart, Grid Infrastructure), RAC One, RAC

• Linux, Solaris, AIX

• Generic software homes

Page 64: Oracle RAC 12c Rel. 2 for Continuous Availability


Easy to create the Server, start managing the current estate

• RHP Server is fully self-contained
– Commodity hardware or engineered systems; can be clustered for HA
– Enabled with a single srvctl command
– Lightweight; can co-exist with other functions

• No new software needed on targets

• No run-time dependency between Server and targets
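The "single srvctl command" above can be sketched as follows; the storage path and disk group name are hypothetical, and option names follow the 12.2 srvctl reference:

```shell
# Enable the Rapid Home Provisioning Server on an existing
# Grid Infrastructure cluster (path and disk group are examples).
srvctl add rhpserver -storage /rhp/images -diskgroup RHPDATA
srvctl start rhpserver
srvctl status rhpserver
```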


Page 65: Oracle RAC 12c Rel. 2 for Continuous Availability