Experience with NetApp at CERN IT/DB
CERN IT Department, CH-1211 Geneva 23, Switzerland (www.cern.ch/it)

Giacomo Tenaglia, on behalf of Eric Grancher and Ruben Gaspar Aparicio
Outline
• NAS-based usage at CERN
• Key features
• Future plans
Storage for Oracle at CERN
• 1982: Oracle at CERN: PDP-11, mainframe, VAX VMS, Solaris SPARC (32- and 64-bit)
• 1996: Solaris SPARC with OPS, then RAC
• 2000: Linux x86, single node, DAS
• 2005: Linux x86_64 / RAC / SAN
  – Experiment databases and part of WLCG stayed on SAN until 2012
• 2006: Linux x86_64 / RAC / NFS (IBM/NetApp)
• 2012: all production primary Oracle databases (*) on NFS

(*) apart from ALICE and LHCb online
Network topology
• All 10 Gb/s Ethernet
• Same network for storage and cluster interconnect
[Diagram: servers serverA–serverE and filers filer1–filer4 connected through two private Ethernet switches (Private 1 and Private 2) carrying both CRS and storage traffic; each filer HA pair has an internal interconnect; a separate Ethernet switch serves the public network.]
Domains: space/filers
| Domain  | Total size (TB) | Used for backup (TB) | # of filers |
|---------|-----------------|----------------------|-------------|
| des-nas | 47.4            | 62.6                 | 10          |
| shosts  | 204             |                      | 4           |
| gen3    | 97              |                      | 4           |
| rac10   | 59              |                      | 6           |
| rac11   | 59              |                      | 6           |
| castor  | 154             |                      | 18          |
| acc     | 281             |                      | 8           |
| db disk |                 | 1000                 | 2           |
| TOTAL   | 901.4           | 1062.6               | 58          |
Typical setup
Impact of storage architecture on Oracle stability at CERN
Key features
• Flash cache
• RAID-DP
• Snapshots
• Compression
Flash cache
• Helps increase random IOPS from disk
  – Very good for OLTP-like workloads
• Lives in the filer, so it does not get wiped when database servers reboot
• For databases, decide which volumes to cache:

    fas3240> priority on
    fas3240> priority set volume volname cache=[reuse|keep]

• 512 GB modules, one per controller
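To verify the cache is actually absorbing the random-read load, a hedged console sketch (Data ONTAP 7-mode; the flexscale option and stats preset are assumptions based on NetApp's Flash Cache documentation, and dbvol is an illustrative volume name):

    fas3240> options flexscale.enable on           # turn Flash Cache on (module must be installed)
    fas3240> priority set volume dbvol cache=keep  # prefer keeping dbvol blocks in the cache
    fas3240> stats show -p flexscale-access        # hit/miss counters for the Flash Cache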
IOPS and Flash cache
Key features
• Flash cache
• RAID-DP
• Snapshots
• Compression
Disk and redundancy (1/2)
• Disks get larger and larger
  – speed stays roughly constant → performance issue
  – bit error rate stays constant (10^-14 to 10^-16) → growing availability issue
• With x the disk size in bytes and α the bit error rate, the probability of hitting an unrecoverable read error while rebuilding after a single disk failure is p = 1 − (1 − α)^(8x) for RAID 1, and p = 1 − (1 − α)^(8nx) for RAID 5 with n surviving data disks (see the table and worked example on the next slide)
Disks, redundancy comparison (2/2)
Data loss probability for different disk types and RAID group sizes (n = 5, 14, 28):

1 TB SATA desktop, bit error rate 10^-14:

| RAID level     | n = 5    | n = 14   | n = 28   |
|----------------|----------|----------|----------|
| RAID 1         | 7.68E-02 |          |          |
| RAID 5 (n+1)   | 3.29E-01 | 6.73E-01 | 8.93E-01 |
| ~RAID 6 (n+2)  | 1.60E-14 | 1.46E-13 | 6.05E-13 |
| ~triple mirror | 8.00E-16 | 8.00E-16 | 8.00E-16 |

1 TB SATA enterprise, bit error rate 10^-15:

| RAID level     | n = 5    | n = 14   | n = 28   |
|----------------|----------|----------|----------|
| RAID 1         | 7.96E-03 |          |          |
| RAID 5 (n+1)   | 3.92E-02 | 1.06E-01 | 2.01E-01 |
| ~RAID 6 (n+2)  | 1.60E-16 | 1.46E-15 | 6.05E-15 |
| ~triple mirror | 8.00E-18 | 8.00E-18 | 8.00E-18 |

450 GB FC, bit error rate 10^-16:

| RAID level     | n = 5    | n = 14   | n = 28   |
|----------------|----------|----------|----------|
| RAID 1         | 4.00E-04 |          |          |
| RAID 5 (n+1)   | 2.00E-03 | 5.58E-03 | 1.11E-02 |
| ~RAID 6 (n+2)  | 7.20E-19 | 6.55E-18 | 2.72E-17 |
| ~triple mirror | 3.60E-20 | 3.60E-20 | 3.60E-20 |

10 TB SATA enterprise, bit error rate 10^-15:

| RAID level     | n = 5    | n = 14   | n = 28   |
|----------------|----------|----------|----------|
| RAID 1         | 7.68E-02 |          |          |
| RAID 5 (n+1)   | 3.29E-01 | 6.73E-01 | 8.93E-01 |
| ~RAID 6 (n+2)  | 1.60E-15 | 1.46E-14 | 6.05E-14 |
| ~triple mirror | 8E-17    | 8E-17    | 8E-17    |
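As a sanity check, a worked example under the model from the previous slide, for the 1 TB SATA desktop disk (x = 10^12 bytes, α = 10^-14):

$$p_{\text{RAID 1}} = 1-(1-\alpha)^{8x} = 1-\left(1-10^{-14}\right)^{8\cdot 10^{12}} \approx 1-e^{-0.08} \approx 7.68\times 10^{-2}$$

$$p_{\text{RAID 5},\,n=14} = 1-(1-\alpha)^{8nx} = 1-\left(1-10^{-14}\right)^{8\cdot 14\cdot 10^{12}} \approx 1-e^{-1.12} \approx 6.73\times 10^{-1}$$

Both values match the table above.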
Key features
• Flash cache
• RAID-DP
• Snapshots
• Compression
Snapshots
• T0: take snapshot 1
• T1: file changed (the changed blocks are written to new locations; snapshot 1 keeps pointing to the old ones)
• T2: take snapshot 2 (shares all blocks unchanged since snapshot 1)
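A minimal console sketch of this timeline (Data ONTAP 7-mode syntax; dbvol is an illustrative volume name):

    fas3240> snap create dbvol snap1   # T0: snapshot 1, consumes no extra space initially
    (file on dbvol is modified)        # T1: new blocks written, old blocks held by snap1
    fas3240> snap create dbvol snap2   # T2: snapshot 2, shares unchanged blocks with snap1
    fas3240> snap list dbvol           # show both snapshots and the space they hold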
Snapshots for backups
• With current data growth, restoring a database within a reasonable time is no longer possible with "traditional" backup/restore techniques
• Example: 100 TB database, 10 GbE network, 4 tape drives
  – Tape drive restore performance: ~120 MB/s
  – Restore takes ~58 hours (and can be much longer in practice)
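The 58-hour figure follows directly, as a worked check:

$$t = \frac{100\ \text{TB}}{4\times 120\ \text{MB/s}} = \frac{10^{14}\ \text{B}}{4.8\times 10^{8}\ \text{B/s}} \approx 2.1\times 10^{5}\ \text{s} \approx 58\ \text{hours}$$

and that assumes all four drives stream at full speed with no remount or verification overhead.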
Snapshots and Real Application Testing
[Diagram: the production workload (inserts, updates, deletes, PL/SQL) is captured on the original 10.2 database; a clone is taken, upgraded to 11.2, and the captured workload is replayed against it.]
Snapshots and Real Application Testing

[Diagram: the same capture/clone/upgrade/replay flow, with SnapRestore® used to roll the clone back to its pre-replay snapshot so the captured workload can be replayed repeatedly.]
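A hedged sketch of the storage side of that loop (7-mode syntax; volume and snapshot names are illustrative, and SnapRestore needs the corresponding license):

    fas3240> snap create dbvol pre_replay       # snapshot the upgraded clone before replaying
    (replay the captured workload, inspect the results)
    fas3240> snap restore -s pre_replay dbvol   # roll the whole volume back in seconds
    (replay again, e.g. with different 11.2 parameters)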
Key features
• Flash cache
• RAID-DP
• Snapshots
• Compression
NetApp compression factor
| Dataset                         | Uncompressed (GB) | Compressed (GB) | Compression ratio |
|---------------------------------|-------------------|-----------------|-------------------|
| One day of AISDB prod redo logs | 281.3             | 100.7           | 2.8               |
| Recent one-day ACCLOG datafile  | 118.1             | 49.4            | 2.4               |
| CMSR full backup                | 997.3             | 297.7           | 3.4               |
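For reference, a hedged sketch of switching compression on for a volume (Data ONTAP 7-mode; the sis flags are assumptions based on NetApp's storage-efficiency documentation, and backupvol is an illustrative name):

    fas3240> sis on /vol/backupvol                       # enable the storage-efficiency engine on the volume
    fas3240> sis config -C true -I true /vol/backupvol   # turn on background (-C) and inline (-I) compression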
Compression: backup on disk
[Diagram: RMAN file backups are written both to one tape copy and to a disk buffer.]

• Disk buffer raw capacity: ~1700 TiB (576 × 3 TB disks)
• Usable capacity: 1000 TiB, holding ~2 PiB of uncompressed data thanks to compression
Future: Data ONTAP Cluster-Mode
• Non-disruptive upgrades and operations: the "immortal cluster"
• Interesting new features:
  – Internal DNS load balancing
  – Export policies: fine-grained access control for NFS exports (see the sketch below)
  – Encryption and compression at the storage level
  – NFS 4.1 implementation, parallel NFS (pNFS)
• Scale-out architecture: up to 24 nodes (512 theoretical)
• Seamless data moves for capacity or performance rebalancing and for hardware replacement
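A hedged sketch of what such an export policy looks like in clustered ONTAP (vserver, policy, network and volume names are all illustrative):

    cluster1::> vserver export-policy create -vserver vs1 -policyname dbpolicy
    cluster1::> vserver export-policy rule create -vserver vs1 -policyname dbpolicy -clientmatch 10.1.2.0/24 -rorule sys -rwrule sys
    cluster1::> volume modify -vserver vs1 -volume dbvol -policy dbpolicy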
Architecture view: ONTAP Cluster-Mode
Possible implementation
Logical components
pNFS
• Part of the NFS 4.1 standard (client caching, Kerberos, ACLs)
• Coming with ONTAP 8.1RC2
• Not yet natively supported by Oracle; client support is in RHEL 6.2 (see the mount sketch below)
• Control protocol: keeps data servers and the metadata server synchronized
• pNFS runs between client and metadata server (MDS): the client asks the MDS where the data is stored, then accesses it directly
• Storage access protocols: file-based, block-based and object-based
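A hedged sketch of mounting with NFS 4.1/pNFS from a RHEL 6.2 client (filer name and paths are illustrative):

    # request NFS v4 minor version 1 so the client can negotiate pNFS layouts
    mount -t nfs4 -o minorversion=1 filer1:/vol/dbvol /mnt/dbvol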
Summary
• Good reliability
  – Six years of operations with minimal downtime
• Good flexibility
  – Same setup for different uses and workloads
• Scales to our needs
Q&A
Thanks!
[email protected], [email protected]