Chapter 12: Mass-Storage Systems (Storage Management-Secondary storage)
Trends in the Enterprise Storage...
Transcript of Trends in the Enterprise Storage...
Tom Coughlan1
Trends in the Enterprise Storage Market
Tom CoughlanSr. Engineering Manager, Red HatFeburary, 2012
Tom Coughlan2
➔Big Data, ➔Unstructured Data, ➔Scale-Out vs. Scale-Up,➔Virtualization,➔pNFS➔Solid State Storage...
What is the future of SAN, NAS, DAS?
What role will Linux play in the new environment?
Hot Topics
Tom Coughlan3
The amount of data is exploding
● IBM estimates:● Every day, 2.5 exabytes of data are created.● 90% of the data in the world today was created within the past
two years.
● IDC projections:● transactional data will grow
at a 21.8% CAGR● unstructured data will grow
at a 61.7% CAGR
0 1 2 3 4 5 60
2
4
6
8
10
12
14
16
18
20
http://www-01.ibm.com/software/data/bigdata/http://searchstorage.techtarget.com/magazineContent/Object-storage-gains-steam-as-unstructured-data-grows
Tom Coughlan4
Much of this data is unstructured
● Total Archived Capacity, by Content Type, Worldwide, 2008-2015 (Petabytes) (ESG)
2008 2009 2010 2011 2012 2013 2014 20150
50000
100000
150000
200000
250000
300000
350000
FileE-mailDatabase
h20195.www2.hp.com/v2/GetPDF.aspx/4AA0-4382ENW.pdf
Tom Coughlan5
Big Data Analytics
● Healthcare, government, entertainment, social networking, oil and gas, retail...
● Business intelligence, strategy, product support, product panning, development, just-in-time capacity planning...
● Requires:● High volume (so cost control is critical)● Low latency (streaming)● Data integration: text, audio, video..., from retail,
medical, sensor, seismic, climate, satellite, (even databases)...
Tom Coughlan6
How to deal with this?
● So far, big data adherents have
1)Preferred to avoid shared storage, to minimize latency, and cost.
2)When more capacity is required, scale-out (add nodes)● Keep the storage close to the processor● Add processing power and storage together ● Use commodity parts● File replication for data persistence, as needed
ApplicationServer
ApplicationServer
ApplicationServer
IP Network
Tom Coughlan7
Scale-out, shared nothing
● Advantages:● Performs well for highly distributable problems● Inexpensive commodity hardware● Takes advantage of high performance local storage
(PCIe flash)
Tom Coughlan8
Scale-out, shared nothing
● Disadvantages:● Latency increases for queries that span nodes, and for
replication.● Specialized functions previously done in the storage
controller must be implemented in the o.s.: ● distributed fs, global namespace, data replication, backup,
encryption, snapshot, thin provisioning, remote replication, deduplication, compression, proactive error detection, ease of management, ...
Tom Coughlan9
Contrast with the more traditional approach- shared storage, scale-up
SAN or NAS
ApplicationServer
ApplicationServer
● Smaller number of nodes, more tightly-coupled, shared resources, specialized storage servers.
● When more capacity is required, scale-up existing nodes.
Ctrlr 1 Ctrlr 2 Ctrlr 1 Ctrlr 2
Tom Coughlan10
Scale up – add horsepower to existing nodes
● Advantages:● Some applications (single-threaded with large data sets)
can not be easily partitioned.● Centralized data protection, management, backup...
● Disadvantages:● Scaling limits... eventually you hit a wall
● ...and, if you do add another server+storage cluster, the lack of a global namespace can make it difficult to manage/load-balance the environment
● Proprietary, vendor lock-in● Generally more expensive
Tom Coughlan11
Shared storage
● Advantages:● Data is available to multiple machines
● Server virtualization provides load balancing
● Centralized data protection, management, backup...
● Disadvantages:● Access coordination can impact performance● Can be more expensive than DAS
Tom Coughlan12
As big data moves to the enterprise
● Take advantage of the scale-out approach● Control cost
● but keep shared storage● Virtualization● Ease-of-management, data protection, specialized
functions.
Tom Coughlan13
Scale-out, with shared storage
ApplicationServer
ApplicationServer
ApplicationServer
IP Network
NAS/SAN Shared Storage
● Scale-out NAS
● pNFS
● iSCSI and FCoE
Tom Coughlan14
Scale-out NAS (the hardware approach)
● Hardware vendors solve the storage controller bottleneck by “clustering” the controllers together.
● The group appears as one to the o.s.
ApplicationServer
ApplicationServer
ApplicationServer
IP Network
SAN or NAS
Ctrlr 1 Ctrlr 2 Ctrlr 1 Ctrlr 2 Ctrlr 1 Ctrlr 2
Tom Coughlan15
Scale-out NAS (the software approach) - Gluster Distributed Filesystem
IP NetworkGluster Global Namespace
Mirror Mirror
Distribute
Applicationw. Native GlusterFS
GlusterServer
xfs
GlusterServer
xfs
GlusterServer
xfs
GlusterServer
xfs
Applicationw. Native GlusterFS
Tom Coughlan16
Gluster with Non-native Clients
IP NetworkGluster Global Namespace
Application- NFS- CIFS
Mirror Mirror
Distribute
GlusterServer
xfs
GlusterServer
xfs
GlusterServer
xfs
GlusterServer
xfs
Applicationw. Native GlusterFS
Applicationw. Native GlusterFS
Tom Coughlan17
pNFS
● Access an NFS Metadata Server, then R/W the storage directly
● Data may be File, or Object, or Block Based
ApplicationServer
ApplicationServer
ApplicationServer
IP Network
SAN or NAS NFS metadata
Data
Tom Coughlan18
iSCSI and FCoE
● Lower-cost shared block storage● Traditional db, and virtualization workloads ● May pair nicely with pNFS:
ApplicationServer
ApplicationServer
ApplicationServer
IP Network
LANNFS metadata
Data
Tom Coughlan19
Conclusions
● In the strict big data approach, with no shared storage, the Linux system must perform the specialized functions previously performed by the storage controller.
● Currently underway:● efficient snapshot● thin provisioning● disk encryption - dm-crypt● integration of LVM with md RAID● PCIe flash performance optimizations
● Future:● hierarchical storage
Tom Coughlan20
Disk Density
30% CAGR
http://www.storagenewsletter.com/news/disk/hdd-technology-trends-ibm
Tom Coughlan21
Disk Capacity
http://en.wikipedia.org/wiki/File:Hard_drive_capacity_over_time.png
Tom Coughlan22
Disk Max. Sustained Bandwidth
10-15% CAGR
http://www.storagenewsletter.com/news/disk/hdd-technology-trends-ibm
Tom Coughlan23
Disk Access Time = seek time + latency
Ave. Seek Time
http://www.storagenewsletter.com/news/disk/hdd-technology-trends-ibm
Tom Coughlan24
Disk Latency
7,200 RPM
10,000 RPM
15,000 RPM
http://www.storagenewsletter.com/news/disk/hdd-technology-trends-ibm
Tom Coughlan25
Conclusions (cont.)
● Flash has arrived just in time.
● Shared storage (NAS, SAN) will remain prominent, with additional emphasis on cost effective scale-out.
● Gluster● pNFS● iSCSI, FCoE
● Initiator and target
● More storage boxes => better management is required:● libStorageMgmt
● http://sourceforge.net/projects/libstoragemgmt/
● An opportunity for Linux as a low-cost scale-out storage server.
Tom Coughlan26
Thank-you.