striking SPARCs a migration story

Björn Rost

@brost

Björn Rost

• founder, manager and DBA

• president RAC SIG

• ACE Director

about us

• Software production company founded 2001

• mostly J2EE

• logistics

• telecommunication

• media and publishing

• customers demand full lifecycle support

• hardware resale

• datacenter operations

• 3rd party software

agenda

• motivation

• Oracle VM for SPARC

• estimating performance impact

• SLOB

• data migration TTS

• issues

• ASH/AWR comparisons

the situation

• 2TB OLTP RAC database

• x4150 servers, 4 cores each

• max memory 36GB

• workload IO-bound

hw refresh options

                 x4170 M2      T4-1
CPU              4c 2.4GHz     8c 2.85GHz
RAM              max 72GB      max 512GB
virtualization   zones         LDom, zones
lifecycle        EOL 08/12     available

why SPARC?

• license: hard partitioning

• high memory/CPU ratio

• product lifecycle

• reliability

• Oracle-on-Oracle


How much of your CPU is idle?

hard partitioning

• how much CPU is needed?

• pay only for what you need

• pay-as-you-grow

• add cores to VM one-by-one
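A minimal sketch of what that looks like with Oracle VM for SPARC (the domain name db1prod and the core counts are illustrative); ldm set-core allocates whole cores, which is what the hard-partitioning rules expect:

root@primary:~# ldm set-core 4 db1prod    # start with the 4 cores that are licensed today
root@primary:~# ldm set-core 5 db1prod    # grow later, one core (and one core's license) at a time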

HW: 2x T4-1, 128GB RAM, 3 years support                      68,000

SW license: EE, RAC, PART, DIAG, TUNE + 3 years support   1,221,760


hard partitioning

• client already licensed: 2 nodes, 4 cores each

• cpu utilization <75% (ok)

• modern CPUs have >4 cores

• and become more powerful per core

• why pay for cores that are not needed?


hard partitioning

http://www.oracle.com/us/corporate/pricing/partitioning-070609.pdf

• soft partitioning (license whole box)

• VMware

• several others

• hard partitioning (license a subset)

• Solaris capped Zones

• Oracle VM (with special config)

• Oracle VM for Sparc (LDom)

• some special (mainframe) hw

• DSD, LPAR, vPar


hard partitions on x64

• Oracle VM with pinned CPUs

• overhead (especially for IO)

• no more O-Motion or OVM failover

• Solaris Capped Zone (see the config sketch after this list)

• already need Solaris (on Oracle HW?)

• small extra step to go SPARC

• no “hard” isolation

• RAC is supported, but would you?
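For reference, a minimal sketch of a CPU-capped zone of the kind accepted for hard partitioning (the zone name dbzone and the path are made up for illustration):

root@host:~# zonecfg -z dbzone
zonecfg:dbzone> create
zonecfg:dbzone> set zonepath=/zones/dbzone
zonecfg:dbzone> add capped-cpu
zonecfg:dbzone:capped-cpu> set ncpus=4      # cap the zone at 4 CPUs
zonecfg:dbzone:capped-cpu> end
zonecfg:dbzone> commit
zonecfg:dbzone> exit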


OVM for SPARC

• hypervisor built in hardware

• zero overhead

• strict isolation of CPU, mem, IO

• PCIe direct I/O (DIO) for HBAs

• supported with RAC

• supported for hard partitioning
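Putting it together, a rough sketch of carving out a guest domain (names and sizes are illustrative, not our exact setup):

root@primary:~# ldm add-domain db1prod
root@primary:~# ldm set-core 4 db1prod      # whole cores only, for hard partitioning
root@primary:~# ldm set-memory 96G db1prod
root@primary:~# ldm bind-domain db1prod
root@primary:~# ldm start-domain db1prod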


VM challenges

• avoid overhead

• CPU

• mem

• clocks, RT scheduling

• don’t waste IO latency in vm layers

• make VM as robust as possible

• other VMs must not influence prod VM


LDom PCIe DIO

root@primary:~# ldm list-io -l
NAME                TYPE   BUS    DOMAIN   STATUS
----                ----   ---    ------   ------
niu_0               NIU    niu_0  primary
[niu@480]
niu_1               NIU    niu_1  primary
[niu@580]
pci_0               BUS    pci_0  primary  IOV
[pci@400]
pci_1               BUS    pci_1  primary  IOV
[pci@500]
/SYS/MB/PCIE0       PCIE   pci_0  db1prod  OCC
[pci@400/pci@2/pci@0/pci@8]
    SUNW,assigned-device@0
    SUNW,assigned-device@0,1
/SYS/MB/PCIE2       PCIE   pci_0  primary  EMP
[pci@400/pci@2/pci@0/pci@4]
/SYS/MB/SASHBA      PCIE   pci_0  primary  OCC
[pci@400/pci@2/pci@0/pci@e]
    scsi@0/iport@1
    scsi@0/iport@2
    scsi@0/iport@80/cdrom@p7,0
    scsi@0/iport@v0/disk@w365ae6ad45951589,0
/SYS/MB/NET0        PCIE   pci_0  primary  OCC
[pci@400/pci@1/pci@0/pci@4]
    network@0
    network@0,1
/SYS/MB/PCIE1       PCIE   pci_1  db1test  OCC
[pci@500/pci@2/pci@0/pci@a]
    SUNW,assigned-device@0
    SUNW,assigned-device@0,1

http://portrix-systems.de/blog/brost/using-direct-io-with-ldoms/
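As a sketch, handing a PCIe slot (here the HBA slot /SYS/MB/PCIE0 from the listing above) to a guest domain with direct I/O looks roughly like this; the slot first has to be released from the root domain, which requires a reboot of that domain:

root@primary:~# ldm remove-io /SYS/MB/PCIE0 primary
(reboot the primary/root domain)
root@primary:~# ldm add-io /SYS/MB/PCIE0 db1prod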


Sol 11 vnet

• Solaris 11 Crossbow network virtualization

• LACP

• DLMP (new, great if your switch does not support bonding)

• build vnics on top

• everything is possible
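A hedged sketch with dladm (link and VNIC names are illustrative): build a DLMP aggregation over two physical ports and stack VNICs for the public and private cluster networks on top:

root@db1prod:~# dladm create-aggr -m dlmp -l net0 -l net1 aggr0
root@db1prod:~# dladm create-vnic -l aggr0 pubnet0
root@db1prod:~# dladm create-vnic -l aggr0 privnet0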


memory per CPU

• memory means caching

• caching means less IO

• you don’t license the DB per GB of RAM

• so the more the merrier

• NUMA

• performance penalty for SMP


memory per CPU

[bar chart: maximum memory per server, from 72GB on the x4170 M2 (Nehalem) up to 512GB on the T4 and T5, with the X3-2/X4-2 (Sandy Bridge) and 3-channel Intel E5 systems at 256-384GB]

product lifecycle

• the x4170 M2 has been EOL since 08/2012

• the T4 is still being sold

• Sun even had roadmaps that guaranteed availability


RAS

• T4-4 has hot-swap PCI

• needed for RAC?

• average MTBF?

• perceived reliability?


Oracle on Oracle

• engineered together

• optimized for Oracle DB workloads

• support under one roof


planning

• CMT history not all bright

• horrible single-thread performance

• results in unacceptable user experience

• insanely expensive RAM

perf estimates

• benchmark!

• find an estimate of impact on latency

• show better throughput at high utilization

perf estimates

problem

we did not have our own test machine (yet)

and there were no useful public benchmarks available

but Oracle was kind enough to run some tests for us

perf benchmarks

• loop around SQL that works from the buffer cache (eliminates IO)

• count executions per second

• we wrote this ourselves first
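A minimal sketch of that kind of home-grown loop (table and predicate are made up; the point is that the lookup table fits entirely in the buffer cache, so the loop measures CPU rather than IO):

DECLARE
  v_cnt NUMBER;
BEGIN
  FOR i IN 1..500000 LOOP
    SELECT COUNT(*) INTO v_cnt
    FROM small_lookup                    -- hypothetical cache-resident table
    WHERE id = MOD(i, 1000) + 1;
  END LOOP;
END;
/
-- run one copy per thread and divide the loop count by the elapsed time
-- to get executions per second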

perf benchmarks

[chart: time to complete the benchmark (0-500s) vs. threads per core (0.5, 1, 2, 4) for x5570 @2.93GHz and T4 @2.85GHz]

results - latency

• at very low utilization

• 90s vs 119s (30%)

• this is Intel Turbo Boost

• at 1 thread/core same response time

results - throughput

• T4 handles concurrency much better

• hw threads work really well

• ~30% less response time at 4 threads/core

results - summary

• single thread “in the same ballpark”

• multi-threaded advantage for T4

• more work/core = more work/$license$

deal!


more benchmarks

• SLOB

• LIO test mode

• there are “modes” for PIO, writes, ...

• one suite of tools for multiple platforms

• very simple (and open) logic

• easy to set up, easy to compare

http://kevinclosson.wordpress.com/2012/02/06/introducing-slob-the-silly-little-oracle-benchmark/

SLOB intro

1. download

2. unzip

3. compile (minor tweaking for Solaris)

4. ./setup.sh for users (~80MB table per user)

5. set SGA large enough to hold all test data

6. modify readers.sh to loop 500k times

7. runit.sh with different # of threads

8. look at SLOBops/s

9. will generate AWR report for LIO and waits
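The steps above as a rough shell sketch (tablespace name and user count are illustrative; the exact scripts differ slightly between SLOB versions):

#> tar xzf SLOB.tar.gz && cd SLOB       # after downloading from the URL above
#> (cd wait_kit && make)                # compile the wait kit; minor tweaks needed on Solaris
#> ./setup.sh SLOBTBS 64                # creates 64 users with ~80MB of data each
# (size the SGA to cache all test data and raise the loop count as in steps 5-6)
#> ./runit.sh 0 64                      # 0 writers, 64 readers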

SLOB setup

DECLARE
  x NUMBER := 1;
  fluff varchar2(128) := 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX';

BEGIN
  FOR i IN 1..10000 LOOP
    insert into seed values (x, fluff, NULL, NULL, NULL, NULL,
      NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, fluff);
    x := x + 1;
  END LOOP;
END;
/

SLOB setup

insert into cf1 select * from user1.seed where rownum = 1 ;
commit;

alter table cf1 minimize records_per_block;
truncate table cf1 ;
commit;

insert /*+ APPEND */ into cf1 select * from user1.seed order by dbms_random.value();
commit;

create unique index i_cf1 on cf1(custid) NOPARALLEL PCTFREE 0 tablespace $TABLESPACE;

alter index i_cf1 SHRINK SPACE COMPACT;

exec DBMS_STATS.GATHER_TABLE_STATS('$user', 'cf1', estimate_percent=>100, block_sample=>TRUE, degree=>2);

SLOB reader.sql

DECLARE
  x NUMBER := 0;
  v_r PLS_INTEGER;

BEGIN
  dbms_random.initialize(UID * 7777);

  FOR i IN 1..500000 LOOP
    v_r := dbms_random.value(257, 10000) ;
    SELECT COUNT(c2) into x FROM cf1 where custid > v_r - 256 AND custid < v_r;
  END LOOP;

END;
/

SLOBops/s

#> ./runit.sh 0 64
Tm 1179

64*500000 SLOBops / 1179s = 27142 SLOBops/s

27142 SLOBops/s / 8 cores = 3393 SLOBops/s/c

SLOB results

[bar chart: peak SLOBops/s per core (0-5000) for x5570, E5-2640, E5-2690, SPARC64, T4 and T5]

thanks to Philippe Fierens for additional data (twitter: @pfierens, blog: http://pfierens.blogspot.de/)

SLOB results

[chart: time to complete 500k SLOB loops (0-1600s) vs. threads per core (0.25, 0.5, 1, 2, 4, 8) for E5-2640, T4 and T5]

data migration

• ~1800GB

• ~8 hours of downtime at night acceptable

data migration

• rman backup/restore

• not across endianness

• rman convert database

• only supported for migrating SPARC to Exadata

• datapump export/import?

• too long

TTS to the rescue

• cross-platform-transportable tablespace

• metalink 371556.1

TTS steps

1. check prerequisites

2. make TS read-only

3. dp export metadata

4. rman convert tablespace TBS to platform

5. move TBS copies to destination

6. import metadata

7. make TBS rw
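A hedged sketch of the core commands for a single tablespace (tablespace, directory and file names are illustrative; MOS note 371556.1 describes the full procedure):

alter tablespace APPDATA read only;

#> expdp system directory=DP dumpfile=tts_appdata.dmp transport_tablespaces=APPDATA

RMAN> convert tablespace APPDATA
      to platform 'Solaris[tm] OE (64-bit)'
      format '/stage/%U';

-- after moving the converted copies to the SPARC side:
#> impdp system directory=DP dumpfile=tts_appdata.dmp transport_datafiles='+DATA/prod/appdata_01.dbf'

alter tablespace APPDATA read write;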

TTS data copy

• move data on SAN

• avoid 1GbE network copy

• ASM

• nope, partitions not compatible

• ZFS!

• works with block device

• automatic endian conversion
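A rough sketch of the idea (pool name and LUN are made up): create the pool on a shared SAN LUN on the x64 side, put the converted datafile copies there, then export the pool and import it on the SPARC host; ZFS is endian-neutral, so the same pool can be read on both architectures:

# on the old x64 node
root@oldnode:~# zpool create transferpool c0t600A0B80001234d0
root@oldnode:~# zfs create transferpool/tts
  ... write the converted datafile copies to /transferpool/tts ...
root@oldnode:~# zpool export transferpool

# on the new SPARC node
root@db1prod:~# zpool import transferpool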

issues

• clusterware does not know the Solaris 11 netN interface names (bug 13604285)

• falls back to “generic” probing for interconnect health

• trouble with network virtualization (eviction issues until sol patch)

• we forgot to transfer SPM baselines (stupid us)
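For the record, a hedged sketch of how the SPM baselines could have been carried across (staging table and owner are illustrative):

-- on the old database
exec DBMS_SPM.CREATE_STGTAB_BASELINE(table_name => 'SPM_STAGE', table_owner => 'SYSTEM');
variable n number
exec :n := DBMS_SPM.PACK_STGTAB_BASELINE(table_name => 'SPM_STAGE', table_owner => 'SYSTEM');

-- move SPM_STAGE with data pump, then on the new database:
exec :n := DBMS_SPM.UNPACK_STGTAB_BASELINE(table_name => 'SPM_STAGE', table_owner => 'SYSTEM');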

post mortem

• how to measure success of the change?

• is “stuff” faster?

• by how much?

post mortem ASH

• ASH/AWR is fantastic

• archive of SQL performance data

• compare avg sql runtime

• before and after change

ASH

• automatically on by default

• sampled, detailed activity data

• samples taken (in mem) every second

• v$active_session...

• write AWR snapshots to disk

• DBA_HIST_ACTIVE_SESSION...

• increase default keep time

• can keep some baselines forever

• awrextr.sql/awrload.sql after migration
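A hedged example of those retention tweaks (values and snapshot ids are illustrative):

-- keep 60 days of AWR snapshots, taken every 30 minutes (both values in minutes)
exec DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(retention => 60*24*60, interval => 30);

-- preserve a pre-migration window as a baseline that never expires
exec DBMS_WORKLOAD_REPOSITORY.CREATE_BASELINE(start_snap_id => 1234, end_snap_id => 1244, baseline_name => 'pre_sparc_migration', expiration => NULL);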

post mortem AWR

SELECT
  sql_id,
  ROUND(sum(elapsed_time_delta)/sum(executions_delta)) exectime,
  sum(executions_delta) executions,
  sum(elapsed_time_delta) total_time
FROM dba_hist_sqlstat a
JOIN dba_hist_snapshot b
  ON a.snap_id = b.snap_id
WHERE b.begin_interval_time > to_timestamp('201301220845','YYYYMMDDHH24MI')
  AND b.end_interval_time < to_timestamp('201301221345','YYYYMMDDHH24MI')
  AND executions_delta > 0
GROUP BY sql_id

post mortem AWR

with new_system as (
  SELECT
    sql_id,
    ROUND(sum(elapsed_time_delta)/sum(executions_delta)) exectime,
    sum(executions_delta) executions,
    sum(elapsed_time_delta) total_time
  FROM dba_hist_sqlstat a
  JOIN dba_hist_snapshot b
    ON a.snap_id = b.snap_id
  WHERE b.begin_interval_time > to_timestamp('201301220845','YYYYMMDDHH24MI')
    AND b.end_interval_time < to_timestamp('201301221345','YYYYMMDDHH24MI')
    AND executions_delta > 0
  GROUP BY sql_id
),
old_system as (
  SELECT
    sql_id,
    ROUND(sum(elapsed_time_delta)/sum(executions_delta)) exectime,
    sum(executions_delta) executions,
    sum(elapsed_time_delta) total_time
  FROM dba_hist_sqlstat a
  JOIN dba_hist_snapshot b
    ON a.snap_id = b.snap_id
  WHERE b.begin_interval_time > to_timestamp('201301150845','YYYYMMDDHH24MI')
    AND b.end_interval_time < to_timestamp('201301151345','YYYYMMDDHH24MI')
    AND executions_delta > 0
  GROUP BY sql_id
)
select new_system.sql_id,
  new_system.exectime newtime,
  old_system.exectime oldtime,
  round(greatest(old_system.exectime,new_system.exectime)/least(old_system.exectime,new_system.exectime)*sign(old_system.exectime-new_system.exectime),0) speedup,
  new_system.executions,
  new_system.total_time newtime,
  old_system.total_time oldtime
from new_system, old_system
where new_system.sql_id = old_system.sql_id
order by new_system.total_time desc;


post mortem ASH

SQL_ID           NEWTIME    OLDTIME    SPEEDUP EXECUTIONS    NEWTIME    OLDTIME
------------- ---------- ---------- ---------- ---------- ---------- ----------
94qfy61wzr979     756417    9169090         12       5142 3889494264 1.0323E+11
9fqj3jnfyupvf     894891   15599003         17       2920 2613080481 1.1609E+11
82gkmapmt6apc        726        300         -2    3462071 2515070807 1211844016
5bzjuzcrh52gy  577309159 1170336372          2          4 2309236634 4681345488
gfrtvm879j1ry     517999     176268         -3       3469 1796936890  284848872
annj0ct5d49mp     563926    8964813         16       2727 1537827089 2.0153E+10
a5h8sbx7b4gzf   18930618  491125857         26         54 1022253386 1.3752E+10
ggcsmyw22ygpr     825227     233433         -4       1221 1007601900  465464972
bsjug2fdvkn3a      85490     652245          8      11356  970824082 2.4279E+10
1cphs8cnbhpsm        275      17161         62    1755282  482780358 5.0277E+10
d739f3ystssbf     125212     226918          2       2714  339825406  168372874
crjus93xycs2w   48755724  249751087          5          6  292534344 1998008696
6mrjk196ayz7r       1423      26445         19     192648  274203984 7557523980
6j5f8rc00kb0x        838      19019         23     193281  161903300 5446650848
abr4rgpaagv17       1045      18193         17     147923  154573924 1068976356
09caxbcryq43r        945      18916         20     162908  153892043 4546423200
79wuh7f3rfqfy        376       5280         14     287087  107972760 2275145736
dvmdvrsmhz76b  104031863  160488695          2          1  104031863  320977390
7wra7pmvddday   96859189   35919437         -3          1   96859189   71838874
7bqnjnuptpatm      10585     116958         11       8927   94490590 1.0220E+10
3aj76umkhstnf       2392      18974          8      33688   80588976  167996156
0bc7w9wr441zq        488       7678         16     164108   80126122 1852890834
akrvwc6urkax3       1047      43265         41      70221   73524606 1337942240
0k4nn1g1fwqr8        348      19872         57     191774   66719694 5630835446
agxyq3xtmbtxt        196       2172         11     251529   49349839  895362046
12qw5z09ufub8        267      15673         59     140150   37465359 1934266074
g434abph3dq42        172       1866         11     189748   32709417  509570152
9vp8s2hu7sg41        185       1494          8     166841   30867239  203240886
gag61u6hjrsvk       8658     124536         14       3091   26760809 4033231394
56f1b4nwg7vc9        163       3828         23      67637   11008187 1955368074
gxk98nqkg9rkz        196       5322         27      47439    9285292 1394379424
69subccxd9b03        668      47280         71       4017    2683163 1985395704
c618bg1pzysfb      22832    1678854         74          7     159822   30219378
3t5zba5fc90aj        987      50171         51        155     152919    1705820

Summary

• reconsider SPARC platform

• save money with hard partitioning

• know and utilize your benchmark tools

• don’t fear endian conversion

• use your ASH/AWR data

thank you


twitter: @brost

http://portrix-systems.de/blog/

[email protected]