dNFS for DBA’s
Marcin Przepiorowski December 2016
2 © 2016 Delphix Corporation
About me
• Oracle consultant/DBA since 2000
• Co-developer of OraSASH, a free ASH/AWR-like repository
• Blogger
Delphix Data as a Service Platform
Diagram labels: Application Files, Databases, Any Storage; Dev, QA, Stage
• Collect: Non-disruptively sync with source data. Compress an initial copy and store only change data.
• Control: Mask copies. Manage all changes to the source and its copies via a single point of control.
• Consume: Create ten copies in the space of one (10:1). Deliver data in minutes with powerful self-service features.
Features: Compress, Mask, Provision, Retain, Branch, Bookmark, Refresh, Rewind, Integrate
Agenda
1. Network
2. Configuration
3. Examples
Michael Coles https://pixabay.com/en/water-hose-garden-wet-gardening-815475/
Network
8 Gb Fibre Channel ≈ 10 Gb Ethernet NFS
Network
Throughput is one dimension to measure.
Latency is even more important.
Latency has an impact on:
- single block reads
- real throughput
Distance: 100 m
• A dog can run 70 km/h, so 100 m takes about 5 s
• A dog can carry a 2 TB SSD drive
• Throughput = 2 * 1024 GB / 5 s = 409 GB/s
• Latency: 5 s, RTT: 10 s
Other – RFC 1149 IP over Avian Carriers
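The dog arithmetic can be checked with a few lines of Python. This is only a sketch of the slide's numbers; the function name `courier_link` is made up for illustration. It shows why huge throughput does not help when every round trip costs ten seconds.

```python
def courier_link(distance_m, speed_kmh, payload_gb):
    """Return (one-way latency in s, RTT in s, throughput in GB/s) for a courier."""
    speed_ms = speed_kmh * 1000 / 3600      # km/h -> m/s
    one_way_s = distance_m / speed_ms       # time to deliver the payload
    rtt_s = 2 * one_way_s                   # the dog has to come back
    throughput_gbs = payload_gb / one_way_s
    return one_way_s, rtt_s, throughput_gbs

# Slide values: 100 m, 70 km/h, 2 TB SSD (2 * 1024 GB)
one_way, rtt, throughput = courier_link(100, 70, 2 * 1024)
print(f"one-way latency: {one_way:.2f} s, RTT: {rtt:.2f} s")
print(f"throughput: {throughput:.0f} GB/s")

# The slide rounds the one-way time to 5 s
rounded_throughput = 2 * 1024 / 5
print(f"throughput with 5 s latency: {rounded_throughput:.1f} GB/s")
```

The exact one-way time is slightly above 5 s, so the unrounded throughput comes out a little lower than the slide's 409 GB/s.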
Network
Recommended latency for (d)NFS < 1 ms
Jumbo Frames enabled
Network
1 TCP stream vs multiple TCP streams
http://www.ishn.com/ext/resources/todaysnews3/traffic-422.jpg
Network
Download Managers
Network
NFS opens one TCP stream
dNFS opens multiple TCP streams
Direct NFS
• dNFS is supported from Oracle 11g onward
• Unix/Linux, Windows
• Uses ODM for file system calls
• Talks directly to the filer

File type                     Supported
Control file                  YES
Data file                     YES
Redo log file                 YES
Archive/Flashback log file    YES
Backup files                  YES
Temp file                     YES
Datapump dump file            YES
OCR files                     NO
spfile                        YES
passwd file                   YES
ASM files                     YES
Voting files                  NO
Audit files                   NO
Database trace files          NO
External tables               YES (12c)
dNFS configuration
Enable
$ cd $ORACLE_HOME/rdbms/lib/
$ make -f ins_rdbms.mk dnfs_on
rm -f /u01/app/oracle/11.2.0.4/db1/lib/libodm11.so; cp /u01/app/oracle/11.2.0.4/db1/lib/libnfsodm11.so /u01/app/oracle/11.2.0.4/db1/lib/libodm11.so

Disable
$ cd $ORACLE_HOME/rdbms/lib/
$ make -f ins_rdbms.mk dnfs_off
rm -f /u01/app/oracle/11.2.0.4/db1/lib/libodm11.so; cp /u01/app/oracle/11.2.0.4/db1/rdbms/lib/libodm11.so.dummy /u01/app/oracle/11.2.0.4/db1/lib/libodm11.s
dNFS configuration
§ No configuration is required for the basic use case.
§ The oranfstab configuration file is optional
§ But sometimes ALTER DATABASE MOUNT fails with
ORA-00600: internal error code
UEK Kernel bug ID 1460787.1
dNFS configuration
Mount information is searched in this order:
1. $ORACLE_HOME/dbs/oranfstab
2. /etc/oranfstab
3. /etc/mtab

Sample oranfstab:
server: Delphix                        # This is only a name
path: 192.168.166.141                  # IP of NFS server
local: 192.168.166.142                 # IP of interface on DB server
export: /oraclenfs mount: /oradata1    # mount points
dNFS configuration
§ dNFS looks up each data file path in oranfstab
§ If it is not found, information from /etc/mtab is used
§ A multipath configuration requires an oranfstab entry for every file system
§ File systems without a matching entry won’t use multipath
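The lookup described above can be illustrated with a small Python sketch. It is not Oracle's implementation: the function names `parse_oranfstab` and `find_mount` are hypothetical, and the longest-prefix matching rule is an assumption made for the example. It only shows the idea that a data file either matches an export/mount pair in oranfstab or falls through to /etc/mtab.

```python
SAMPLE_ORANFSTAB = """\
server: Delphix
path: 192.168.166.141
local: 192.168.166.142
export: /oraclenfs mount: /oradata1
"""

def parse_oranfstab(text):
    """Return a list of (export, mount) pairs found in oranfstab-style text."""
    pairs = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        if not line.startswith("export:"):
            continue
        # Line looks like "export: /oraclenfs mount: /oradata1"
        export_part, _, mount_part = line.partition("mount:")
        export = export_part[len("export:"):].strip()
        mount = mount_part.strip()
        if export and mount:
            pairs.append((export, mount))
    return pairs

def find_mount(datafile, pairs):
    """Return the (export, mount) pair whose mount point contains datafile, or None.

    Assumption for this sketch: the longest matching mount point wins.
    """
    best = None
    for export, mount in pairs:
        prefix = mount.rstrip("/") + "/"
        if datafile.startswith(prefix):
            if best is None or len(mount) > len(best[1]):
                best = (export, mount)
    return best

pairs = parse_oranfstab(SAMPLE_ORANFSTAB)
for path in ("/oradata1/system01.dbf", "/u02/other/users01.dbf"):
    match = find_mount(path, pairs)
    source = f"oranfstab entry {match}" if match else "no entry, fall back to /etc/mtab"
    print(f"{path}: {source}")
```

A file system that only resolves through /etc/mtab is the case the slide warns about: it still works, but without multipath.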
dNFS configuration
Multipath
server:DE
local:172.16.169.142
path:172.16.169.141
local:192.168.166.24
path:192.168.166.23
export:/orc_timeflow-79/datafile mount:/mnt/provision/SLOB/datafile
export:/orc_timeflow-79/temp mount:/mnt/provision/SLOB/temp
export:/orc_timeflow-79/archive mount:/mnt/provision/SLOB/archive
dNFS configuration
§ MOS
- Recommended Patches for Direct NFS Client (Doc ID 1495104.1); 11.2.0.4 and 12.1.0.2 look OK
- How To Setup DNFS (Direct NFS) On Oracle Release 11.2 (Doc ID 1452614.1)
- Step by Step - Configure Direct NFS Client (DNFS) on Linux (11g) (Doc ID 762374.1)
- Step by Step - Configure Direct NFS Client (DNFS) on Windows (Doc ID 1468114.1)
§ Blogs
- http://blog.oracle48.nl/wordpress/direct-nfs-configuring-and-network-considerations-in-practise/
- http://www.slideshare.net/yvelikanov/sharing-experience-implementing-direct-nfs#
dNFS configuration
§ TCP stack parameters
- Oracle recommends increasing buffers to 4 MB
- This is not enough for fast LANs (10 Gb)
- Other settings are recommended as well
§ On a fast LAN, 16 MB buffers can be fully utilized
§ Some systems require a change of NFS block size
https://docs.oracle.com/database/122/LADBI/checking-tcp-network-protocol-buffer-for-direct-nfs-client.htm
dNFS configuration
Linux ( Red Hat >= 6.3 )
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_rmem = 4096 16777216 16777216
net.ipv4.tcp_wmem = 4096 4194304 16777216
https://docs.delphix.com/display/DOCS/Target+DB+and+OS+Configuration+Options+for+Improved+Performance
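A quick way to sanity-check such settings is to parse the three-value buffer lines and compare the maximum against the 16 MB target. The Python below is a minimal sketch; the helper name `parse_tcp_mem` is invented for the example, and it works on the text of the settings rather than reading the live system.

```python
TARGET_MAX = 16 * 1024 * 1024   # 16 MB, the buffer size suggested for fast LANs

SETTINGS = """\
net.ipv4.tcp_rmem = 4096 16777216 16777216
net.ipv4.tcp_wmem = 4096 4194304 16777216
"""

def parse_tcp_mem(line):
    """Parse 'name = min default max' into (name, (min, default, max))."""
    name, _, values = line.partition("=")
    minimum, default, maximum = (int(v) for v in values.split())
    return name.strip(), (minimum, default, maximum)

results = {}
for line in SETTINGS.splitlines():
    name, (minimum, default, maximum) = parse_tcp_mem(line)
    results[name] = maximum >= TARGET_MAX
    print(f"{name}: default {default // 1024} KiB, max {maximum // 2**20} MiB, "
          f"meets 16 MB target: {results[name]}")
```

On a real host the same lines could be taken from the output of `sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem`.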
dNFS configuration
Solaris 11
ipadm set-prop -p max_buf=16777216 tcp
ipadm set-prop -p _cwnd_max=4194304 tcp
ipadm set-prop -p send_buf=4194304 tcp
ipadm set-prop -p recv_buf=16777216 tcp

NFS block size: change from 32 kB to 1 MB
/etc/system
set nfs:nfs3_bsize=0x100000
https://docs.delphix.com/display/DOCS/Target+DB+and+OS+Configuration+Options+for+Improved+Performance
dNFS configuration
AIX
Disable delayed ACK
tcp_nodelayack=1
NFS
nfs_max_read_size=524288
nfs_max_write_size=524288
Fix for 64k NFS block size
6.1 IV24594
7.1 IV24688
https://docs.delphix.com/display/DOCS/Target+DB+and+OS+Configuration+Options+for+Improved+Performance
Monitoring
NFS
nfsiostat, netstat, ss, wireshark
dNFS
v$dnfs_stats, v$dnfs_channels
netstat, ss, wireshark
Examples
Chart: IOPS (0 to 80,000) vs. number of SLOB processes (1, 2, 4, 8, 16, 24, 32, 40, 48); series: dNFS, NFS
IOPS – response time – 40 processes
1.0 ms
0.61 ms
Chart: CPU utilization at 40 k IOPS; series: dNFS, NFS
Impact of Oracle parameters
FILESYSTEMIO_OPTIONS
- SETALL
- NONE / ASYNCH
NFS can use the OS cache, depending on the value of this parameter
dNFS does not use the OS file system or the OS cache
Chart: IOPS (0 to 700,000) vs. number of SLOB processes (1, 2, 4, 8, 16, 24, 32, 40, 48); series: NFS big dataset, NFS small dataset, dNFS big dataset, dNFS small dataset
INVESTIGATION
Real life example
                                          NFS        dNFS
Block #2: noparallel FULL table scan
Stat: query - elapsed (s)               : 57.43      146.04
Stat: query - row count                 : 648,300    648,300
Stat: query - physical reads (MB/s)     : 176.38     69.362
Block #3: parallel FULL table scans…
Stat: parallel4 - MB/s                  : 179.43     260.75
Stat: parallel8 - MB/s                  : 177.24     450.42
Stat: parallel12 - MB/s                 : 173.45     459.39
Stat: parallel16 - MB/s                 : 171.98     453.24
Examples - NFS - RedHat 6.8 UEK 3.8.13
Block #2: noparallel FULL table scan
Stat: query - elapsed (s) : 39.49
Stat: query - row count : 5,067,786
Stat: query - physical reads (MB/s) : 1,002.59
Block #3: parallel FULL table scans...
Stat: parallel4 - MB/s : 1,174.03
Stat: parallel8 - MB/s : 1,177.87
Stat: parallel12 - MB/s : 1,173.68
Stat: parallel16 - MB/s : 1,176.12
Examples - dNFS - RedHat 6.8 UEK 3.8.13
Block #2: noparallel FULL table scan...
Stat: query - elapsed (s) : 74.71
Stat: query - row count : 5,067,786
Stat: query - physical reads (MB/s) : 529.943
Block #3: parallel FULL table scans...
Stat: parallel4 - MB/s : 1,013.95
Stat: parallel8 - MB/s : 1,152.86
Stat: parallel12 - MB/s : 1,169.18
Stat: parallel16 - MB/s : 1,174.38
Example – table full scan
NFS
Event               Waits      Total Wait Time (s)   Avg (ms)   % DB time   Wait Class
direct path read    384,380    1341                  3          97.4        User I/O
DB CPU                         35.1                             2.5

dNFS
Event               Waits      Total Wait Time (s)   Avg (ms)   % DB time   Wait Class
direct path read    164,147    1402                  9          98.8        User I/O
DB CPU                         119.                             8.4
Investigation
§ Top function : copy_user_generic_unrolled
§ It is used when no CPU-level optimization is available
§ Fast string operations (Enhanced REP MOVSB/STOSB) are unsupported.
Linux version 3.8.13-118.14.1.el6uek.x86_64 (mockbuild@x86-ol6-builder-04) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-16) (GCC) ) #2 SMP Mon Oct 31 17:33:13 PDT 2016
Command line: ro root=/dev/mapper/vg_dnfstargetdb1-lv_root rd_NO_LUKS
Disabled fast string operations
Investigation
§ perf record -g -F9999 -p 6101
§ perf script | gzip > dnfs2.txt.gz
§ gunzip dnfs2.txt.gz
§ stackcollapse-perf.pl dnfs2.txt > dnfs2_1.folded
§ flamegraph.pl dnfs2_1.folded > dnfs2_1.svg
Flame graph annotations: OS call, OS call, Top function
http://www.newsweek.pl/biznes/animacja-reklamy,artykuly,41747,1,1,1.html
Investigation - Network
“bandwidth-delay product refers to the product of a data link's capacity (in bits per second) and its round-trip delay time (in seconds)”
https://en.wikipedia.org/wiki/Bandwidth-delay_product
Investigation - Network
If BDP (in bytes) > TCP window, then the TCP session will not be able to use all of the available bandwidth.
Maximum bandwidth (bits/s) = (window size in bytes * 8) / RTT (s)
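A worked example helps to get a feel for these formulas. The Python sketch below uses assumed numbers, not measurements from the tests: a 10 Gb link, a 1 ms round trip and a 256 KiB window.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product in bytes: how much data must be in flight to fill the link."""
    return bandwidth_bps * rtt_s / 8

def max_bandwidth_bps(window_bytes, rtt_s):
    """Upper bound on the throughput of one TCP session with the given window."""
    return window_bytes * 8 / rtt_s

LINK_BPS = 10e9        # 10 Gb Ethernet
RTT_S = 0.001          # assumed round-trip time of 1 ms
WINDOW = 256 * 1024    # illustrative TCP window of 256 KiB

bdp = bdp_bytes(LINK_BPS, RTT_S)
cap = max_bandwidth_bps(WINDOW, RTT_S)

print(f"BDP: {bdp / 1e6:.2f} MB")
print(f"Max bandwidth with a {WINDOW // 1024} KiB window: {cap / 1e9:.2f} Gb/s")
print("Window limits throughput:", bdp > WINDOW)
```

With these inputs the BDP is larger than the window, so a single session is capped at roughly 2.1 Gb/s no matter how fast the link is. Several sessions in parallel, or a bigger window, raise that cap.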
Chart: bytes in flight (0 to 600,000 bytes), x-axis 1 to 136; series: NFS, dNFS
Investigation - Network
TCP settings at the OS level are the same for both runs, and TCP windows can be as big as 16 MB
net.ipv4.tcp_rmem = 4096 16777216 16777216
net.ipv4.tcp_wmem = 4096 4194304 16777216
Investigation - Network
NFS
LADDR           LPORT  RADDR           RPORT  SWND     CWND      RWND
192.168.166.23  2049   192.168.166.24  971    8379904  1744860   4196612

dNFS
LADDR           LPORT  RADDR           RPORT  SWND     CWND      RWND
192.168.166.23  2049   192.168.166.24  36189  71608    44740     4196612
192.168.166.23  2049   192.168.166.24  48924  247052   16777216  4196612
192.168.166.23  2049   192.168.166.24  50202  247052   16777216  4196612
192.168.166.23  2049   192.168.166.24  38596  247052   16777216  4196612
Investigation - Network
strace output of Oracle process
setsockopt(32, SOL_SOCKET, SO_SNDBUF, [262144], 4) = 0
setsockopt(32, SOL_SOCKET, SO_RCVBUF, [262144], 4) = 0
bind(32, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.166.24")}, 16) = 0
connect(32, {sa_family=AF_INET, sin_port=htons(2049), sin_addr=inet_addr("192.168.166.23")}, 16) = 0
setsockopt(32, SOL_SOCKET, SO_SNDBUF, [1056768], 4) = 0
setsockopt(32, SOL_SOCKET, SO_RCVBUF, [1056768], 4) = 0
Investigation - Network
§ The kernel allocates 2 x SO_RCVBUF to leave room for overhead
§ The tcp_adv_win_scale parameter controls how much of the buffer is reserved for overhead. The default value is 1
TCP Window = buffer - buffer/2^tcp_adv_win_scale
TCP Window = 512k – 512k/2^1 = 256k
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt, http://man7.org/linux/man-pages/man7/socket.7.html
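The window formula is easy to evaluate for other values of tcp_adv_win_scale. The Python below is a sketch of the formula for positive values only; the function name `tcp_window` is made up, and the 262144-byte request is the first SO_RCVBUF value from the strace output, doubled as described above.

```python
def tcp_window(buffer_bytes, adv_win_scale):
    """TCP window = buffer - buffer / 2**adv_win_scale (positive scale values only)."""
    if adv_win_scale <= 0:
        raise ValueError("this sketch only covers positive tcp_adv_win_scale values")
    return buffer_bytes - buffer_bytes // 2 ** adv_win_scale

REQUESTED = 262144          # SO_RCVBUF requested by the Oracle process
BUFFER = 2 * REQUESTED      # the kernel doubles the requested size

windows = {scale: tcp_window(BUFFER, scale) for scale in (1, 2, 4)}
for scale, window in windows.items():
    print(f"tcp_adv_win_scale={scale}: window {window} bytes ({window // 1024} KiB)")
```

The formula gives 256 KiB, 384 KiB and 480 KiB for scale values 1, 2 and 4. The dNFS connections listed earlier show an SWND of 247052 bytes, a little below the 256 KiB predicted for the default.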
Investigation - Network
tcp_adv_win_scale = 2
Stat: query - physical reads (MB/s) : 721.704
LADDR           LPORT  RADDR           RPORT  SWND    CWND      RWND
192.168.166.23  2049   192.168.166.24  34770  375056  16777216  4196612
192.168.166.23  2049   192.168.166.24  46897  375056  16777216  4196612

tcp_adv_win_scale = 4
Stat: query - physical reads (MB/s) : 712.985
LADDR           LPORT  RADDR           RPORT  SWND    CWND      RWND
192.168.166.23  2049   192.168.166.24  44832  471056  6970492   4196612
192.168.166.23  2049   192.168.166.24  39377  471056  2925996   4196612
Investigation - Latency
Chart: % of max throughput (0 to 100) vs. additional latency (0, 0.1, 0.2, 0.4, 0.5, 1, 2); series: NFS, dNFS
tc qdisc add dev eth1 root netem delay Xms
Who is the winner?
It depends
Run a test with your workload and network
Marcin Przepiorowski @pioro
Thank you for attending my session
Q&A
See my blog for a white paper