striking SPARCs a migration story

Björn Rost

@brost

Björn Rost

• founder, manager and DBA

• president RAC SIG

• ACE Director

about us

• Software production company founded 2001

• mostly J2EE

• logistics

• telecommunication

• media and publishing

• customers demand full lifecycle support

• hardware resale

• datacenter operations

• 3rd party software

agenda

• motivation

• Oracle VM for SPARC

• estimating performance impact

• SLOB

• data migration TTS

• issues

• ASH/AWR comparisons

the situation

• 2TB OLTP RAC database

• x4150 servers, 4 cores each

• max memory 36GB

• workload IO-bound

hw refresh options

                 x4170 M2      T4-1
CPU              4c 2.4GHz     8c 2.85GHz
RAM              max 72GB      max 512GB
virtualization   zones         LDom, zones
lifecycle        EOL 08/12     available

why SPARC?

• license: hard partitioning

• high memory/CPU ratio

• product lifecycle

• reliability

• Oracle-on-Oracle


How much of your CPU is idle?

hard partitioning

• how much CPU is needed?

• pay only for what you need

• pay-as-you-grow

• add cores to VM one-by-one
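A minimal sketch of what that looks like with Oracle VM for SPARC (the domain name db1prod and the core counts are illustrative); ldm set-core allocates whole cores, which is what the hard-partitioning rules expect:

root@primary:~# ldm set-core 4 db1prod    # start with the 4 cores that are licensed today
root@primary:~# ldm set-core 5 db1prod    # grow later, one core (and one core's license) at a time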

HW: 2x T4-1, 128GB RAM, 3 years support                      68,000

SW license: EE, RAC, PART, DIAG, TUNE + 3 years support   1,221,760


hard partitioning

• client already licensed: 2 nodes, 4 cores each

• cpu utilization <75% (ok)

• modern CPUs have >4 cores

• and become more powerful per core

• why pay for cores that are not needed?


hard partitioning

http://www.oracle.com/us/corporate/pricing/partitioning-070609.pdf

• soft partitioning (license whole box)

• VMware

• several others

• hard partitioning (license a subset)

• Solaris capped Zones

• Oracle VM (with special config)

• Oracle VM for Sparc (LDom)

• some special (mainframe) hw

• DSD, LPAR, vPar


hard partitions on x64

• Oracle VM with pinned CPUs

• overhead (especially for IO)

• no more O-Motion or OVM failover

• Solaris Capped Zone (see the config sketch after this list)

• already need Solaris (on Oracle HW?)

• small extra step to go SPARC

• no “hard” isolation

• RAC is supported, but would you?
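For reference, a minimal sketch of a CPU-capped zone of the kind accepted for hard partitioning (the zone name dbzone and the path are made up for illustration):

root@host:~# zonecfg -z dbzone
zonecfg:dbzone> create
zonecfg:dbzone> set zonepath=/zones/dbzone
zonecfg:dbzone> add capped-cpu
zonecfg:dbzone:capped-cpu> set ncpus=4      # cap the zone at 4 CPUs
zonecfg:dbzone:capped-cpu> end
zonecfg:dbzone> commit
zonecfg:dbzone> exit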


OVM for SPARC

• hypervisor built in hardware

• zero overhead

• strict isolation of CPU, mem, IO

• PCIe direct I/O (DIO) for HBAs

• supported with RAC

• supported for hard partitioning
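Putting it together, a rough sketch of carving out a guest domain (names and sizes are illustrative, not our exact setup):

root@primary:~# ldm add-domain db1prod
root@primary:~# ldm set-core 4 db1prod      # whole cores only, for hard partitioning
root@primary:~# ldm set-memory 96G db1prod
root@primary:~# ldm bind-domain db1prod
root@primary:~# ldm start-domain db1prod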


VM challenges

• avoid overhead

• CPU

• mem

• clocks, RT scheduling

• don’t waste IO latency in vm layers

• make VM as robust as possible

• other VMs must not influence prod VM


LDom PCIe DIO

root@primary:~# ldm list-io -l
NAME                TYPE   BUS    DOMAIN   STATUS
----                ----   ---    ------   ------
niu_0               NIU    niu_0  primary
[niu@480]
niu_1               NIU    niu_1  primary
[niu@580]
pci_0               BUS    pci_0  primary  IOV
[pci@400]
pci_1               BUS    pci_1  primary  IOV
[pci@500]
/SYS/MB/PCIE0       PCIE   pci_0  db1prod  OCC
[pci@400/pci@2/pci@0/pci@8]
    SUNW,assigned-device@0
    SUNW,assigned-device@0,1
/SYS/MB/PCIE2       PCIE   pci_0  primary  EMP
[pci@400/pci@2/pci@0/pci@4]
/SYS/MB/SASHBA      PCIE   pci_0  primary  OCC
[pci@400/pci@2/pci@0/pci@e]
    scsi@0/iport@1
    scsi@0/iport@2
    scsi@0/iport@80/cdrom@p7,0
    scsi@0/iport@v0/disk@w365ae6ad45951589,0
/SYS/MB/NET0        PCIE   pci_0  primary  OCC
[pci@400/pci@1/pci@0/pci@4]
    network@0
    network@0,1
/SYS/MB/PCIE1       PCIE   pci_1  db1test  OCC
[pci@500/pci@2/pci@0/pci@a]
    SUNW,assigned-device@0
    SUNW,assigned-device@0,1

http://portrix-systems.de/blog/brost/using-direct-io-with-ldoms/
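As a sketch, handing a PCIe slot (here the HBA slot /SYS/MB/PCIE0 from the listing above) to a guest domain with direct I/O looks roughly like this; the slot first has to be released from the root domain, which requires a reboot of that domain:

root@primary:~# ldm remove-io /SYS/MB/PCIE0 primary
(reboot the primary/root domain)
root@primary:~# ldm add-io /SYS/MB/PCIE0 db1prod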


Sol 11 vnet

• Solaris 11 Crossbow network virtualization

• LACP

• DLMP (new, great if your switch does not support bonding)

• build vnics on top

• everything is possible
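A hedged sketch with dladm (link and VNIC names are illustrative): build a DLMP aggregation over two physical ports and stack VNICs for the public and private cluster networks on top:

root@db1prod:~# dladm create-aggr -m dlmp -l net0 -l net1 aggr0
root@db1prod:~# dladm create-vnic -l aggr0 pubnet0
root@db1prod:~# dladm create-vnic -l aggr0 privnet0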


memory per CPU

• memory means caching

• caching means less IO

• you don’t license the DB per GB of RAM

• so the more the merrier

• NUMA

• performance penalty for SMP


memory per CPU

[bar chart: maximum memory per server, from 72GB on the x4170 M2 (Nehalem) up to 512GB on the T4 and T5, with the X3-2/X4-2 (Sandy Bridge) and 3-channel Intel E5 systems at 256-384GB]

product lifecycle

• the x4170 M2 has been EOL since 08/2012

• the T4 is still being sold

• Sun even had roadmaps that guaranteed availability


RAS

• T4-4 has hot-swap PCI

• needed for RAC?

• average MTBF?

• perceived reliability?


Oracle on Oracle

• engineered together

• optimized for Oracle DB workloads

• support under one roof


planning

• CMT history not all bright

• horrible single-thread performance

• results in unacceptable user experience

• insanely expensive RAM

perf estimates

• benchmark!

• find an estimate of impact on latency

• show better throughput at high utilization

perf estimates

problem

we did not have our own test machine (yet)

and there were no useful public benchmarks available

but Oracle was kind enough to run some tests for us

perf benchmarks

• loop around SQL that works from the buffer cache (eliminates IO)

• count executions per second

• we wrote this ourselves first
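A minimal sketch of that kind of home-grown loop (table and predicate are made up; the point is that the lookup table fits entirely in the buffer cache, so the loop measures CPU rather than IO):

DECLARE
  v_cnt NUMBER;
BEGIN
  FOR i IN 1..500000 LOOP
    SELECT COUNT(*) INTO v_cnt
    FROM small_lookup                    -- hypothetical cache-resident table
    WHERE id = MOD(i, 1000) + 1;
  END LOOP;
END;
/
-- run one copy per thread and divide the loop count by the elapsed time
-- to get executions per second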

perf benchmarks

[chart: time to complete the benchmark (0-500s) vs. threads per core (0.5, 1, 2, 4) for x5570 @2.93GHz and T4 @2.85GHz]

results - latency

• at very low utilization

• 90s vs 119s (30%)

• this is Intel Turbo Boost

• at 1 thread/core same response time

results - throughput

• T4 handles concurrency much better

• hw threads work really well

• ~30% less response time at 4 threads/core

results - summary

• single thread “in the same ballpark”

• multi-threaded advantage for T4

• more work/core = more work/$license$

deal!


more benchmarks

• SLOB

• LIO test mode

• there are “modes” for PIO, writes, ...

• one suite of tools for multiple platforms

• very simple (and open) logic

• easy to set up, easy to compare

http://kevinclosson.wordpress.com/2012/02/06/introducing-slob-the-silly-little-oracle-benchmark/

SLOB intro

1. download

2. unzip

3. compile (minor tweaking for Solaris)

4. ./setup.sh for users (~80MB table per user)

5. set SGA large enough to hold all test data

6. modify readers.sh to loop 500k times

7. runit.sh with different # of threads

8. look at SLOBops/s

9. will generate AWR report for LIO and waits
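The steps above as a rough shell sketch (tablespace name and user count are illustrative; the exact scripts differ slightly between SLOB versions):

#> tar xzf SLOB.tar.gz && cd SLOB       # after downloading from the URL above
#> (cd wait_kit && make)                # compile the wait kit; minor tweaks needed on Solaris
#> ./setup.sh SLOBTBS 64                # creates 64 users with ~80MB of data each
# (size the SGA to cache all test data and raise the loop count as in steps 5-6)
#> ./runit.sh 0 64                      # 0 writers, 64 readers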

SLOB setup

DECLARE
  x NUMBER := 1;
  fluff varchar2(128) := 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';

BEGIN
  FOR i IN 1..10000 LOOP
    insert into seed values (x, fluff, NULL, NULL, NULL, NULL,
      NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, fluff);
    x := x + 1;
  END LOOP;
END;
/

SLOB setup

insert into cf1 select * from user1.seed where rownum = 1 ;
commit;

alter table cf1 minimize records_per_block;
truncate table cf1 ;
commit;

insert /*+ APPEND */ into cf1 select * from user1.seed order by dbms_random.value();
commit;

create unique index i_cf1 on cf1(custid) NOPARALLEL PCTFREE 0 tablespace $TABLESPACE;

alter index i_cf1 SHRINK SPACE COMPACT;

exec DBMS_STATS.GATHER_TABLE_STATS('$user', 'cf1', estimate_percent=>100, block_sample=>TRUE, degree=>2);

SLOB reader.sql

DECLARE
  x NUMBER := 0;
  v_r PLS_INTEGER;

BEGIN
  dbms_random.initialize(UID * 7777);

  FOR i IN 1..500000 LOOP
    v_r := dbms_random.value(257, 10000) ;
    SELECT COUNT(c2) into x FROM cf1 where custid > v_r - 256 AND custid < v_r;
  END LOOP;

END;
/

SLOBops/s

#> ./runit.sh 0 64
Tm 1179

64*500000 SLOBops / 1179s = 27142 SLOBops/s

27142 SLOBops/s / 8 cores = 3393 SLOBops/s/c

SLOB results

[bar chart: peak SLOBops/s per core (0-5000) for x5570, E5-2640, E5-2690, SPARC64, T4 and T5]

thanks to Philippe Fierens for additional data (twitter: @pfierens, blog: http://pfierens.blogspot.de/)

SLOB results

[chart: time to complete 500k SLOB loops (0-1600s) vs. threads per core (0.25, 0.5, 1, 2, 4, 8) for E5-2640, T4 and T5]

data migration

• ~1800GB

• ~8 hours of downtime at night acceptable

data migration

• rman backup/restore

• not across endianness

• rman convert database

• only supported for migrating SPARC to Exadata

• datapump export/import?

• too long

TTS to the rescue

• cross-platform-transportable tablespace

• metalink 371556.1

TTS steps

1. check prerequisites

2. make TS read-only

3. dp export metadata

4. rman convert tablespace TBS to platform

5. move TBS copies to destination

6. import metadata

7. make TBS rw
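A hedged sketch of the core commands for a single tablespace (tablespace, directory and file names are illustrative; MOS note 371556.1 describes the full procedure):

alter tablespace APPDATA read only;

#> expdp system directory=DP dumpfile=tts_appdata.dmp transport_tablespaces=APPDATA

RMAN> convert tablespace APPDATA
      to platform 'Solaris[tm] OE (64-bit)'
      format '/stage/%U';

-- after moving the converted copies to the SPARC side:
#> impdp system directory=DP dumpfile=tts_appdata.dmp transport_datafiles='+DATA/prod/appdata_01.dbf'

alter tablespace APPDATA read write;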

TTS data copy

• move data on SAN

• avoid 1GbE network copy

• ASM

• nope, partitions not compatible

• ZFS!

• works with block device

• automatic endian conversion
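A rough sketch of the idea (pool name and LUN are made up): create the pool on a shared SAN LUN on the x64 side, put the converted datafile copies there, then export the pool and import it on the SPARC host; ZFS is endian-neutral, so the same pool can be read on both architectures:

# on the old x64 node
root@oldnode:~# zpool create transferpool c0t600A0B80001234d0
root@oldnode:~# zfs create transferpool/tts
  ... write the converted datafile copies to /transferpool/tts ...
root@oldnode:~# zpool export transferpool

# on the new SPARC node
root@db1prod:~# zpool import transferpool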

issues

• clusterware does not know the Solaris 11 netN interface names (bug 13604285)

• falls back to “generic” probing for interconnect health

• trouble with network virtualization (eviction issues until sol patch)

• we forgot to transfer SPM baselines (stupid us)
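For the record, a hedged sketch of how the SPM baselines could have been carried across (staging table and owner are illustrative):

-- on the old database
exec DBMS_SPM.CREATE_STGTAB_BASELINE(table_name => 'SPM_STAGE', table_owner => 'SYSTEM');
variable n number
exec :n := DBMS_SPM.PACK_STGTAB_BASELINE(table_name => 'SPM_STAGE', table_owner => 'SYSTEM');

-- move SPM_STAGE with data pump, then on the new database:
exec :n := DBMS_SPM.UNPACK_STGTAB_BASELINE(table_name => 'SPM_STAGE', table_owner => 'SYSTEM');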

post mortem

• how to measure success of the change?

• is “stuff” faster?

• by how much?

post mortem ASH

• ASH/AWR is fantastic

• archive of SQL performance data

• compare avg sql runtime

• before and after change

ASH

• automatically on by default

• sampled, detailed activity data

• samples taken (in mem) every second

• v$active_session...

• write AWR snapshots to disk

• DBA_HIST_ACTIVE_SESSION...

• increase default keep time

• can keep some baselines forever

• awrextr.sql/awrload.sql after migration
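A hedged example of those retention tweaks (values and snapshot ids are illustrative):

-- keep 60 days of AWR snapshots, taken every 30 minutes (both values in minutes)
exec DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(retention => 60*24*60, interval => 30);

-- preserve a pre-migration window as a baseline that never expires
exec DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(start_snap_id => 1234, end_snap_id => 1244, baseline_name => 'pre_sparc_migration', expiration => NULL);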

post mortem AWR

SELECT
  sql_id,
  ROUND(sum(elapsed_time_delta)/sum(executions_delta)) exectime,
  sum(executions_delta) executions,
  sum(elapsed_time_delta) total_time
FROM dba_hist_sqlstat a
JOIN dba_hist_snapshot b
  ON a.snap_id = b.snap_id
WHERE b.begin_interval_time > to_timestamp('201301220845','YYYYMMDDHH24MI')
  AND b.end_interval_time < to_timestamp('201301221345','YYYYMMDDHH24MI')
  AND executions_delta > 0
GROUP BY sql_id

post mortem AWR

with new_system as (
  SELECT
    sql_id,
    ROUND(sum(elapsed_time_delta)/sum(executions_delta)) exectime,
    sum(executions_delta) executions,
    sum(elapsed_time_delta) total_time
  FROM dba_hist_sqlstat a
  JOIN dba_hist_snapshot b
    ON a.snap_id = b.snap_id
  WHERE b.begin_interval_time > to_timestamp('201301220845','YYYYMMDDHH24MI')
    AND b.end_interval_time < to_timestamp('201301221345','YYYYMMDDHH24MI')
    AND executions_delta > 0
  GROUP BY sql_id
),
old_system as (
  SELECT
    sql_id,
    ROUND(sum(elapsed_time_delta)/sum(executions_delta)) exectime,
    sum(executions_delta) executions,
    sum(elapsed_time_delta) total_time
  FROM dba_hist_sqlstat a
  JOIN dba_hist_snapshot b
    ON a.snap_id = b.snap_id
  WHERE b.begin_interval_time > to_timestamp('201301150845','YYYYMMDDHH24MI')
    AND b.end_interval_time < to_timestamp('201301151345','YYYYMMDDHH24MI')
    AND executions_delta > 0
  GROUP BY sql_id
)
select new_system.sql_id,
  new_system.exectime newtime,
  old_system.exectime oldtime,
  round(greatest(old_system.exectime,new_system.exectime)/least(old_system.exectime,new_system.exectime)*sign(old_system.exectime-new_system.exectime),0) speedup,
  new_system.executions,
  new_system.total_time newtime,
  old_system.total_time oldtime
from new_system, old_system
where new_system.sql_id = old_system.sql_id
order by new_system.total_time desc;


post mortem ASH

SQL_ID           NEWTIME    OLDTIME    SPEEDUP EXECUTIONS    NEWTIME    OLDTIME
------------- ---------- ---------- ---------- ---------- ---------- ----------
94qfy61wzr979     756417    9169090         12       5142 3889494264 1.0323E+11
9fqj3jnfyupvf     894891   15599003         17       2920 2613080481 1.1609E+11
82gkmapmt6apc        726        300         -2    3462071 2515070807 1211844016
5bzjuzcrh52gy  577309159 1170336372          2          4 2309236634 4681345488
gfrtvm879j1ry     517999     176268         -3       3469 1796936890  284848872
annj0ct5d49mp     563926    8964813         16       2727 1537827089 2.0153E+10
a5h8sbx7b4gzf   18930618  491125857         26         54 1022253386 1.3752E+10
ggcsmyw22ygpr     825227     233433         -4       1221 1007601900  465464972
bsjug2fdvkn3a      85490     652245          8      11356  970824082 2.4279E+10
1cphs8cnbhpsm        275      17161         62    1755282  482780358 5.0277E+10
d739f3ystssbf     125212     226918          2       2714  339825406  168372874
crjus93xycs2w   48755724  249751087          5          6  292534344 1998008696
6mrjk196ayz7r       1423      26445         19     192648  274203984 7557523980
6j5f8rc00kb0x        838      19019         23     193281  161903300 5446650848
abr4rgpaagv17       1045      18193         17     147923  154573924 1068976356
09caxbcryq43r        945      18916         20     162908  153892043 4546423200
79wuh7f3rfqfy        376       5280         14     287087  107972760 2275145736
dvmdvrsmhz76b  104031863  160488695          2          1  104031863  320977390
7wra7pmvddday   96859189   35919437         -3          1   96859189   71838874
7bqnjnuptpatm      10585     116958         11       8927   94490590 1.0220E+10
3aj76umkhstnf       2392      18974          8      33688   80588976  167996156
0bc7w9wr441zq        488       7678         16     164108   80126122 1852890834
akrvwc6urkax3       1047      43265         41      70221   73524606 1337942240
0k4nn1g1fwqr8        348      19872         57     191774   66719694 5630835446
agxyq3xtmbtxt        196       2172         11     251529   49349839  895362046
12qw5z09ufub8        267      15673         59     140150   37465359 1934266074
g434abph3dq42        172       1866         11     189748   32709417  509570152
9vp8s2hu7sg41        185       1494          8     166841   30867239  203240886
gag61u6hjrsvk       8658     124536         14       3091   26760809 4033231394
56f1b4nwg7vc9        163       3828         23      67637   11008187 1955368074
gxk98nqkg9rkz        196       5322         27      47439    9285292 1394379424
69subccxd9b03        668      47280         71       4017    2683163 1985395704
c618bg1pzysfb      22832    1678854         74          7     159822   30219378
3t5zba5fc90aj        987      50171         51        155     152919    1705820

Summary

• reconsider SPARC platform

• save money with hard partitioning

• know and utilize your benchmark tools

• don’t fear endian conversion

• use your ASH/AWR data

thank you


twitter: @brost

http://portrix-systems.de/blog/

[email protected]