Big Data – Trends to Watch
media.govtech.net/.../2012/...to_Big_Data_NetApp.pdf
![Page 1: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/1.jpg)
Big Data – Trends to Watch
Bill Peterson
NetApp
September, 2012
![Page 2: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/2.jpg)
Bill Peterson
@thebillp
![Page 3: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/3.jpg)
What I hope to accomplish today...
![Page 4: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/4.jpg)
...and avoid this.
![Page 5: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/5.jpg)
What is “Big Data”?
“Big Data” refers to datasets whose volume, speed and complexity are beyond the ability of typical tools to capture, store, manage and analyze.
Coined by Francis Diebold, professor of economics at the University of Pennsylvania, in 2000, when “Big” meant gigabytes per day.
Volume, Speed, Complexity
![Page 6: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/6.jpg)
Quantifying The Big Data Challenge
60 Zettabytes: estimated size of the digital universe in 2020
5 Billion smart phones
30 Billion pieces of new content to Facebook per month
Growth Over the Next Decade:
– Servers (Phys/VM): 10x
– Data/Information: 50x
– #Files: 75x
– IT Professionals: <1.5x
Data sources shown: Sensors, Video, Music, Location, Weblogs
Source: Gantz, John and Reinsel, David, “Extracting Value from Chaos”, IDC IVIEW, June 2011, page 4.
![Page 7: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/7.jpg)
The Big Data Push
[Figure: workloads plotted by data structure (high to low) against performance, from small block, random I/O (100s KIOPS) to large block, sequential I/O (100s GB/sec). Workloads shown: Tier-1 BP OLTP, Tier-2 BP OLTP, DSS/DW, Collaboration, App Dev, Web Infra, NoSQL, Columnar DBs, IT Infra, Content Repositories, Sat Ground Stations, FMV, DVS, HPC, Tech Comp, Home Dirs.]
![Page 8: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/8.jpg)
What Does This Mean to You?
[Figure: business or mission velocity over time, 2010 to 2020, with an inflection point and two outcomes: “Information Becomes a Propellant to the Organization” and “Data Becomes a Burden to IT Infrastructure”.]
You are also at an Inflection Point: you also have a decision to make, as “business as usual” may not cut it!
![Page 9: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/9.jpg)
Dispelling the Misconceptions About Big Data
![Page 10: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/10.jpg)
Big Data Is NOT New
30 PB of New Data Annually
![Page 11: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/11.jpg)
Big Data = Big Analytics = Hadoop?
That’s What The Media Hype Implies, but it is NOT true!
Traditional analytics (BI/DSS/DW) dominates the analytics market
Like other technologies vying to gain broad adoption in Enterprise IT (e.g., Traditional Analytics, HPC & Cloud), Hadoop shows promise
Market sizes shown: Hadoop $77 M; Analytics $35 B; HPC $29 B; Cloud $23 B; BPaaS $87 B; Enterprise IT $3.6 T
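To put the slide's market figures in proportion, here is a minimal arithmetic sketch. It uses only the Hadoop ($77 M), Analytics ($35 B) and Enterprise IT ($3.6 T) values labeled on the slide; the function and constant names are illustrative, not from the original deck.

```python
def share_percent(part_usd: float, whole_usd: float) -> float:
    """Return part as a percentage of whole."""
    if whole_usd <= 0:
        raise ValueError("whole_usd must be positive")
    return 100.0 * part_usd / whole_usd


# Figures as labeled on the slide (US dollars).
HADOOP_USD = 77e6
ANALYTICS_USD = 35e9
ENTERPRISE_IT_USD = 3.6e12

if __name__ == "__main__":
    print(f"Hadoop vs. analytics market: {share_percent(HADOOP_USD, ANALYTICS_USD):.2f}%")
    print(f"Analytics vs. Enterprise IT: {share_percent(ANALYTICS_USD, ENTERPRISE_IT_USD):.2f}%")
```

On these numbers Hadoop is roughly 0.22% of the analytics market, which is the sense in which traditional analytics "dominates".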
![Page 12: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/12.jpg)
![Page 13: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/13.jpg)
Why Are Decision Support Systems Important?
[Figure: “A Business” diagram connecting Customers, Products & Services and Management, with OLTP, DW and $$$$ labels.]
DSS enables businesses to run “Closed Loop”, ultimately improving their business through the use of feedback mechanisms.
![Page 14: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/14.jpg)
Big Analytics – An Emerging Market
Cloud & Cyber
Open Source Distributors
Integration Services
Legacy DBs
NoSQL / Column DBs
Middleware & Apps
Compute, Storage, Network
![Page 15: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/15.jpg)
Analytics & Enterprise Apps Environment
[Figure: layered environment diagram. Labels shown: Reporting/Dashboard/Visualization; Applications; Analytics; OLAP; ETL; OLTP; Data Management; Storage File Systems; Storage Data Management; Shared Storage Infrastructure; Management; Storage (all other storage, i.e. internal DAS). Data Sources: Mobile Devices, Sensors, Logs, Location/GPS, Content Repositories, Other.]
![Page 16: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/16.jpg)
What Does Hadoop Look Like Today?
Runs on a collection of cheap, commodity servers, in a distributed, shared nothing architecture
Two key components:
– HDFS: Hadoop Distributed File System
– MapReduce: Programming model for processing and generating large datasets
[Figure: HDFS and MapReduce components: NameNode, Secondary NameNode, JobTracker, and several DataNodes / TaskTrackers.]
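The MapReduce programming model named on this slide can be illustrated with a minimal single-process sketch in Python. This is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and on a real cluster the map and reduce tasks would run in parallel across the DataNode / TaskTracker machines rather than in one loop.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is not new", "big analytics is not hadoop"]
    print(word_count(sample))
```

The point of the model is that `map_phase` and `reduce_phase` see only one line or one key at a time, which is what lets a framework spread them over many shared-nothing servers.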
![Page 17: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/17.jpg)
Ethernet’s Relentless March
Data will be growing by 50x, but bandwidth only by 10x!
[Figure: bandwidth in MB/sec (log scale, 0.1 to 100,000) over time (1990 to 2016) for SCSI/FCP, Infiniband, ATM, FDDI and Ethernet, with iSCSI, FCIP, FCoE, iWarp and pNFS marked.]
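The mismatch stated on this slide (data growing 50x, bandwidth 10x) can be made concrete with a little arithmetic. The sketch below assumes that the time to move a dataset is simply its volume divided by the available bandwidth, which is a simplification; the function name is illustrative.

```python
def transfer_time_growth(data_growth: float, bandwidth_growth: float) -> float:
    """Factor by which the time to move a whole dataset changes,
    assuming time = data volume / bandwidth."""
    if bandwidth_growth <= 0:
        raise ValueError("bandwidth_growth must be positive")
    return data_growth / bandwidth_growth


if __name__ == "__main__":
    factor = transfer_time_growth(data_growth=50, bandwidth_growth=10)
    print(f"Moving the full dataset takes {factor:.0f}x longer")
```

Under that assumption, moving the whole dataset over the network takes five times longer at the end of the period than at the start.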
![Page 18: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/18.jpg)
Why Should You Care?
It’s the Value of your data
Top line revenue
– Leverage their data assets into business advantage
Bottom line savings
– Lower the cost of compliance
– Manage ever growing data efficiently
Example: 5 Billion Records; Anywhere, Anytime; Faster time to market; 50% Increase in Revenue
Example: Over 1PB of data; Growth of 175% YOY; 90 days of data within 24 hours of a failure
![Page 19: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/19.jpg)
AutoSupport: Hadoop Use Case at NetApp
“Call-home” service for all NetApp® systems
Foundation of NetApp proactive support strategies
Machine-generated data doubles every 16 months
CHALLENGE: 4 weeks to run a query on 24 billion unstructured records
BENEFIT: Time reduced from 4 weeks to 10.5 hours
CHALLENGE: Impossible to run a query on 240 billion unstructured records
BENEFIT: Previously impossible, now achievable in just 18 hours
NETAPP SOLUTION: 10-node Hadoop Cluster w/ shared storage
“NetApp ASUP is a mission-critical application”
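The first benefit on this slide (4 weeks reduced to 10.5 hours) can be expressed as a speedup factor. A minimal sketch, assuming a week of continuous 24-hour runtime; the names below are illustrative.

```python
HOURS_PER_WEEK = 7 * 24  # assumes the original query ran around the clock


def speedup(before_hours: float, after_hours: float) -> float:
    """Ratio of the old runtime to the new runtime."""
    if after_hours <= 0:
        raise ValueError("after_hours must be positive")
    return before_hours / after_hours


if __name__ == "__main__":
    before = 4 * HOURS_PER_WEEK  # 4 weeks = 672 hours
    print(f"Speedup: {speedup(before, 10.5):.0f}x")  # prints "Speedup: 64x"
```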
![Page 20: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/20.jpg)
Analytics of Tomorrow
Traditional & Big Analytics side-by-side for years to come
Hadoop moves to shared, virtualized infrastructure, for better efficiency and ease of management:
– Hadoop remains logically distributed, shared nothing, but runs on a virtualized shared everything architecture (e.g., FlexPod for VMware + E-Series)
– Same as above, except Hadoop becomes logically shared everything, as HDFS is replaced by a parallel file system (e.g., Lustre Cluster, StorNext or GPFS)
Enterprise class resiliency (no SPoF) and reliability with HPC-like performance (no need for triplicas)
Use of a single copy of data for the map phase (higher storage utilization)
Natural intersection with Cloud (Analytics as a Service)
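The storage-utilization point on this slide can be illustrated with simple arithmetic. The sketch below assumes HDFS's default replication factor of 3 (the "triplicas") and compares it with a single copy; it ignores any RAID or other protection overhead that shared storage would add, and the function names and the 100 TB example are hypothetical.

```python
def raw_capacity_needed(dataset_tb: float, copies: int) -> float:
    """Raw capacity (TB) required to hold `copies` full copies of a dataset."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return dataset_tb * copies


def utilization(copies: int) -> float:
    """Fraction of raw capacity that holds unique data."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 / copies


if __name__ == "__main__":
    dataset_tb = 100.0  # hypothetical dataset size
    for copies in (3, 1):
        raw = raw_capacity_needed(dataset_tb, copies)
        print(f"{copies} copies: {raw:.0f} TB raw, {utilization(copies):.0%} utilization")
```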
![Page 21: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/21.jpg)
Summary
Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private companies have struggled with Big Data for decades)
Analytics: Traditional BI/DSS analytics still dominate. Importance of newer NoSQL & Columnar DB applications, enabled by MapReduce, will grow with the growth of multi-structured data
Big Data applications, such as Hadoop, will need to adopt shared, virtualized infrastructure (and its management benefits) if they are to be widely adopted by Enterprise IT
![Page 22: Big Data – Trends to Watchmedia.govtech.net/.../2012/...to_Big_Data_NetApp.pdf · Despite the hype, Big Data is not new and is more than just analytics! (Many agencies and private](https://reader033.fdocuments.us/reader033/viewer/2022050309/5f7136dcf5ac0711e37b15af/html5/thumbnails/22.jpg)
YOU’VE GOT
I’VE GOT rambling responses that sound like
@thebillp or [email protected]