ACFS Under Scrutiny Luca Canali, CERN Dawid Wojcik, CERN UKOUG Conference, Birmingham, Nov 2010.

ACFS Under Scrutiny

Luca Canali, CERN

Dawid Wojcik, CERN

UKOUG Conference, Birmingham, Nov 2010

Outline

ACFS – cluster file system for 11gR2 ASM
• Use cases
• Architecture
• Installation and setup
• Some investigations of the internals

ACFS at CERN
• Use cases
• Deployment
• Performance tests

Conclusions

ACFS Use Cases

Use cases for ACFS with Oracle
• Automatic Diagnostic Repository (ADR) for 11g RDBMS
  – Unified logging structure for RAC
• RDBMS related usage: BFILEs, Data Pump dump files, ETL files, miscellaneous log files (RMAN), etc.
• Can also be used for Oracle RDBMS home binaries
  – Shared or non-shared
  – Allows snapshots to be taken before applying a patch or a patchset

Use cases of ACFS as a generic file system
• Can be deployed for custom applications and application servers
• No need to have an RDBMS installation, only clusterware 11.2
• Performance, maintenance and high availability
• DBAs will find it easy to use

ASM and ACFS – Architecture

[Diagram: layered architecture. Components shown: Grid Infrastructure; ASM; Oracle database files; ASM Dynamic Volume Manager (ADVM); ASM Cluster File System (ACFS); third-party file system (optional); Oracle RAC or single-instance DBs; applications.]

ASM and ACFS

ASM: volume manager and cluster file system for Oracle DB files

ACFS: POSIX-compliant cluster file system
• Built on top of ASM disk groups
• For ‘all other files’ (DB files not supported on ACFS yet)

ACFS leverages ASM and CRS
• Performance
• Manageability
• High availability

Ref: ACFS Technical Overview and Deployment Guide [ID 948187.1]

Automatic Storage Management

ASM (Automatic Storage Management)
• Oracle’s cluster file system and volume manager for Oracle databases
• HA: fault tolerant, online storage reorganization/addition
• Performance: stripe and mirror everything
• Commodity HW: Physics databases at CERN use ASM normal redundancy (similar to RAID 1+0 across multiple disks and storage arrays)

[Diagram: DATA_DiskGroup and RECOVERY_DiskGroup laid out across Failgroup1, Failgroup2, Failgroup3 and Failgroup4.]

ASM Dynamic Volume Manager

ASM Dynamic Volume Manager (ADVM)
• New feature in Oracle Clusterware 11.2
• Volumes are implemented as ASM files
• Exposed to the OS as block devices: /dev/asm/volume_name-x
• Configurable redundancy, stripe width and stripe columns
• A dirty region logging file is created if redundancy is mirror or high
• On top of an ADVM volume one can create any file system (ext3, ACFS, ...)

Volumes can be resized online
• The file system must also support online resize (ACFS; grow only: ext2, ext3, ext4)

Further investigations on internals: v$asm_volume, v$asm_file, x$kffxp

ASM Cluster File System

What is ACFS
• ASM-based Cluster File System – new in Oracle 11.2
  – Built on top of ADVM volumes
  – Can be used cluster-wide or single-node only
• Multi-platform support (11.2.0.2)
• Can be shared using NFS, CIFS, …
• Online file system expansion / shrink
• Mirror protection when using NORMAL or HIGH redundancy diskgroups/volumes
• Read-only snapshots built-in
• Replication, security realms, encryption and tagging introduced in 11.2.0.2

ACFS Integration with the Oracle Software Stack

ASM, ADVM and ACFS are integrated with Oracle 11gR2 clusterware

ASM and clusterware are tightly integrated in 11gR2
• A single ‘GRID HOME’ is used
• Notable: administration is simplified by storing OCR and voting disk(s) in ASM

ADVM and ACFS resources are managed by clusterware
• Eases maintenance and the learning curve for the DBA

ACFS Crash Recovery

In case of a node crash or forced dismount of ACFS, recovery is needed (three levels)
• ASM in RAC will use surviving nodes to recover
  – ASM uses ‘internal files’ such as ACD (Active Change Directory) and COD (Continuing Operations Directory) for this purpose
• ADVM volumes with normal or high redundancy have an associated dirty region logging file (high redundancy by default); recovery is run by ASM processes
• ACFS utilizes a Metadata Transaction Log

ACFS setup

Setting up ACFS for the first time
• The quick way: asmca

The alternative CLI setup
• Create and enable volume (enabled on all nodes by default)
  – asmcmd volcreate -G {diskgroup_name} -s {size} {vol_name}
  – /dev/asm/{vol_name}-x device is created (Linux)
• Create ACFS file system
  – mkfs -t acfs /dev/asm/{vol_name}-x
• Register the ACFS general-purpose file system with CRS (allows the file system to be mounted automatically by CRS)
  – acfsutil registry -a -f {vol_path} {acfs_mount_point}
  – If ACFS will be used for an Oracle home, use this instead:
      srvctl add filesystem -d {vol_path} -v {volume_name} -g {diskgroup_name}
    This allows ACFS, ASM and DB dependencies to be maintained
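
The CLI steps above can be strung together as a small script. The following is only a dry-run sketch: it prints the commands rather than executing them, it uses asmcmd's volcreate subcommand for the volume-creation step, and the disk group name, size, volume name, device suffix and mount point in the example call are hypothetical placeholders, not values from the talk.

```shell
#!/bin/sh
# Dry-run sketch: prints the CLI setup steps instead of executing them.
# All argument values used in the example call are hypothetical.
acfs_setup_commands() {
    diskgroup=$1    # ASM disk group name
    size=$2         # volume size, e.g. 80G
    volume=$3       # ADVM volume name
    device=$4       # block device of the volume, /dev/asm/{vol_name}-x
    mountpoint=$5   # directory where the file system will be mounted

    # 1. Create the ADVM volume inside the disk group.
    echo "asmcmd volcreate -G $diskgroup -s $size $volume"
    # 2. Format the volume device as ACFS.
    echo "mkfs -t acfs $device"
    # 3. Register the file system so that CRS mounts it automatically.
    echo "acfsutil registry -a -f $device $mountpoint"
}

acfs_setup_commands DATA 80G testvol /dev/asm/testvol-123 /mnt/acfs
```

Reviewing the printed commands before running them by hand is deliberate here, since the actual device suffix is only known after the volume has been created.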

ASM and ACFS internals

New ASM background processes in 11gR2
• Used to manage the interaction between ADVM and ASM, IO fencing and clusterware membership
  – One can see them with ps -elf | grep ASM
• Volume Driver Background (VDBG)
• Volume Background (VBG#) processes
• Volume Membership Background (VMB)
• ACFS background process

More details in Metalink note [ID 883028.1]

ACFS, ADVM and Linux

Kernel modules needed for ADVM and ACFS
• oracleacfs, oracleadvm, oracleoks
  – Can be seen at OS level with lsmod | grep oracle
• Binaries in $GRID_HOME/install/usm/
  – One can check the location with acfsroot version_check

How to remove ACFS-related kernel modules
• Modules are proprietary (non-GPL) and trigger a kernel tainting message in /var/log/messages
• If you don’t want to use ACFS or are concerned about kernel tainting: acfsroot uninstall
• See also note [ID 1082851.1]
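
The lsmod check mentioned above can be made a little stricter than a plain grep. The helper below is an illustrative sketch (the function name check_acfs_modules is made up for this example): it reads lsmod output on standard input and reports which of the three modules named on the slide are not loaded.

```shell
#!/bin/sh
# Reads `lsmod` output on stdin and lists any of the three modules named
# on the slide that are not loaded. Exit status 0 means all are present.
check_acfs_modules() {
    awk '
        BEGIN { need["oracleacfs"]; need["oracleadvm"]; need["oracleoks"] }
        ($1 in need) { seen[$1] = 1 }
        END {
            missing = 0
            for (m in need)
                if (!(m in seen)) { print "missing: " m; missing = 1 }
            exit missing
        }'
}

# Usage on a live system:
#   lsmod | check_acfs_modules && echo "all ACFS/ADVM modules loaded"
```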

ACFS Command Line Tools

Main command line interface – acfsutil
• Display file system information, resize file system, register mount points, create snapshots, …
• Can be used to configure the new 11.2.0.2 features: security realms, encryption, replication and tagging
• Most operations can also be done via the GUI tool asmca

Other utilities
• Typical Linux/UNIX: fsck, mkfs, mount, umount
• acfsdbg – debug tool
• advmutil – display ADVM information, tune ADVM

ACFS Snapshots

ACFS snapshots provide point-in-time images
• Can be used for consistent backups
• Performed online
• Copy-on-first-write mechanism (before-images shared between snapshots)
• Snapshots are stored within the same file system

Snapshots are visible in the CLI or in V$ASM_ACFSSNAPSHOTS
• You can read snapshots in /mount_point/.ACFS/snaps/

Limited to 63 snapshots per ACFS file system
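
As an illustration of the snapshot-before-patching use case, the sketch below prints an acfsutil snap sequence around a change. It is a dry run only; the mount point /mnt/acfs and the snapshot name before_patch are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for a snapshot-protected change.
snapshot_before_patch() {
    mountpoint=$1   # ACFS mount point (hypothetical in the example call)
    snapname=$2     # snapshot name chosen by the operator

    # Take the point-in-time image while the file system stays online.
    echo "acfsutil snap create $snapname $mountpoint"
    # The snapshot content is readable under the .ACFS/snaps directory.
    echo "ls $mountpoint/.ACFS/snaps/$snapname"
    # After the change has been validated, drop the snapshot again;
    # this matters because of the per-file-system snapshot limit.
    echo "acfsutil snap delete $snapname $mountpoint"
}

snapshot_before_patch /mnt/acfs before_patch
```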

ACFS Replication

File system level replication from one primary site to another site
• Can replicate a whole ACFS file system or only tagged fragments
• Changes on the primary system are captured in replication logs
• Replication logs are sent by background processes to the destination cluster and replayed there
• Logs are deleted from both systems after being applied
• Replication logs are stored in the same file system – check for free space!
• Replication can be set up with acfsutil

Possible use case:
• Replicate ACFS file system data in a Data Guard setup

CERN experience with ACFS

Physics DB HW, a typical setup
• Dual-CPU quad-core blade servers, 24GB memory, Intel Nehalem low power; 2.26GHz clock
• Redundant power, mirrored local disks, 4 NICs (2 private / 2 public), dual HBAs, “RAID 1+0 like” with ASM

ACFS and ASM on low-cost storage

Advantages
• High performance
• Reliability
• Low cost
  – Normal redundancy ASM disk groups instead of complicated RAIDs
  – Cheap SATA disks rather than more expensive enterprise solutions
  – Can provide mirroring across storage arrays
• Online operations (grow/shrink, add/remove disks)

Disadvantages
• Can only be used on nodes with clusterware installed
  – Unless exported via NFS
• Some operations cause cluster-wide sync – performance impact
• Async IO not supported – cannot put DB data files on ACFS

ACFS use cases at CERN

ACFS is used in production at CERN
• General-purpose cluster file system for backup & monitoring cluster – fast and reliable
• Repository of Oracle binaries
• Temporary storage for large exports/imports

Other usages expected after moving to Oracle 11.2
• Automatic Diagnostic Repository (ADR)
• Export/import directory for each cluster DB
• Local Oracle homes (?) – snapshots can be used before patching

ACFS test setup

• 64 SATA II, 7.2k RPM, 400GB lower-end disks
• JBOD configuration – visible to ASM as 64 LUNs
• 45% of each disk’s capacity used for the DATA diskgroup
  – For improved IOPS and throughput (OS level partitioning)
• ASM normal redundancy used – 10.5TB diskgroup
• Two ADVM normal redundancy volumes, 800GB and 80GB, created for tests
• ACFS, ext2 and ext3 file systems created on ADVM volumes
• No difference in speed between small and large file systems in any of the tests (80GB vs 800GB)
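
The quoted disk group size can be cross-checked with simple arithmetic. The sketch below assumes decimal disk sizes (1 GB = 10^9 bytes), reports binary terabytes, and ignores ASM metadata overhead; under those assumptions 64 disks × 400 GB × 45% comes to about 10.5 TiB, which suggests the 10.5TB figure is the raw size before mirroring.

```shell
#!/bin/sh
# Raw capacity of the DATA disk group: disks x disk size x fraction used.
# Assumes decimal disk sizes (1 GB = 10^9 bytes), ignores ASM metadata.
diskgroup_tib() {
    awk -v n="$1" -v gb="$2" -v pct="$3" 'BEGIN {
        raw_gb = n * gb * pct / 100          # 64 * 400 * 0.45 = 11520 GB
        tib = raw_gb * 1e9 / (2 ^ 40)        # convert bytes to TiB
        printf "%.1f TiB raw, %.1f TiB after two-way mirroring\n", tib, tib / 2
    }'
}

diskgroup_tib 64 400 45
# prints: 10.5 TiB raw, 5.2 TiB after two-way mirroring
```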

Tests conducted

Tests conducted using
• dd tool – different operations and different operation block sizes (all file sizes of 70GB)
• bonnie++ – generic file system tests (ver. 1.96)

Tests presented
• Comparing ACFS, ext2, ext3 and encrypted ACFS (AES 192-bit)
• ADVM used in all tests
• dd command sequential write (synchronous and asynchronous)
• dd command sequential read (synchronous)
• bonnie++ – file system block write, rewrite and read; file creation and deletion speed
• Multithread tests – workload run from 1 node and 2 nodes
  – Running the same workload (2 threads)
  – Running one workload while listing directories on the second node (10x/s)
  – Streaming tests – one thread writing, second thread reading
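
The slides do not give the exact dd invocations, so the following is only a guess at what such a test matrix might look like. It assumes GNU dd, uses oflag=sync for the synchronous write case, and a hypothetical test file path; the block count is derived so that every run moves the same 70GB.

```shell
#!/bin/sh
FILE_KB=$((70 * 1024 * 1024))   # 70 GB test file, expressed in kB

# Prints a dd command line for one sequential test (dry run only).
dd_command() {
    mode=$1     # write-async | write-sync | read
    bs_kb=$2    # operation block size in kB
    file=$3     # test file on the file system under test (hypothetical)
    count=$((FILE_KB / bs_kb))   # blocks needed to cover the whole file
    case $mode in
        write-async) echo "dd if=/dev/zero of=$file bs=${bs_kb}k count=$count" ;;
        write-sync)  echo "dd if=/dev/zero of=$file bs=${bs_kb}k count=$count oflag=sync" ;;
        read)        echo "dd if=$file of=/dev/null bs=${bs_kb}k count=$count" ;;
        *)           echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

# Same block sizes as on the result charts.
for bs in 1024 256 128 32 8; do
    dd_command write-sync "$bs" /mnt/acfs/testfile
done
```

Keeping the total file size fixed while varying the block size means differences between runs reflect per-operation overhead rather than the amount of data written.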

Write test results in our environment

Asynchronous sequential write [MB/s], by operation block size [kB]

               1024    256    128     32      8
ACFS async      218    217    215    214    217
EXT2 async      226    229    232    231    229
EXT3 async      210    215    205    203    205

Synchronous sequential write [MB/s], by operation block size [kB]

               1024    256    128     32      8
ACFS sync       148    120    102     56     23
EXT2 sync        92    109    113     63     25
EXT3 sync        90    109    115     62     24

Read and write results in our environment

Synchronous sequential read [MB/s], by operation block size [kB]

               1024    256    128     32      8
ACFS sync       208    207    208    208    208
EXT2 sync       276    219    209    116     54
EXT3 sync       278    218    215    116     54

Sequential read and write - encrypted ACFS (AES 192-bit) [MB/s], by operation block size [kB]

                    1024    256    128     32      8
ACFS async write      39     39     39     38     38
ACFS sync write      5.5    5.4    5.4    5.2    4.5
ACFS sync read       7.4    7.5    7.4    7.9    7.7

bonnie++ test results in our environment

bonnie++ throughput tests [MB/s]

              Block write   Block rewrite   Block read
ACFS                  201             152          201
Encr. ACFS             25             6.7          7.5
EXT2                  210             155          207
EXT3                  191             138          207

bonnie++ random file creation test [files/s]

              Files created/s   Files deleted/s
ACFS                     7500              6200
Encr. ACFS               2800              4800
EXT2                     7800             24000

Multithread test results in our environment

bonnie++ throughput tests - multithread comparison [average MB/s per thread]

                                             Write   Rewrite   Read
1 thread                                       201       152    201
2 threads (same node)                           98        82    144
2 threads (different nodes)                    183       126    165
1 thread (ls command 10x/s on second node)      54        26    198

bonnie++ throughput tests - write and reader threads [average MB/s per thread]

                                         Write thread   Read thread
Threads on the same node (1MB block)              161           154
Threads on the 2 nodes (1MB block)                 72            88
Threads on the same node (8kB block)              194           120
Threads on the 2 nodes (8kB block)                212           126

Conclusions

ADVM and ACFS
• Cluster file system integrated in 11.2
• Leverages ASM features for non-RDBMS files

ACFS usage at CERN
• Positive experience
  – Currently used to provide a cluster file system for our custom DB monitoring
• Being considered for the 11g RDBMS deployments
  – To support log files on ADR, …
• ACFS performance tests
  – Positive results, more tests in progress

Acknowledgments

CERN-IT DB group and in particular: Jacek Wojcieszuk

More info:

http://cern.ch/it-dep/db

http://cern.ch/canali
