Big Data and Hadoop - Introduction, Architecture - HDFS and MapReduce, Ecosystem
hadoop reference architecture · hortonworks.com/wp-content/uploads/2012/02/ref-architecture-ubs.pdf
hadoop reference architecture
financial services webinar series
all information in this document is confidential & proprietary
financial services industry issues
© tresata inc. 2011
A. Huge Financial Services industry vertical…
B. …utilized very limited data capabilities which…
C. …drove poor consumer underwriting decisions…
D. …that yielded severely negative financial results
Figure 1: Consumer Credit Loans vs. Delinquency Rates Trends [chart, 1990 to 2010; Consumer Loans ($T), axis 0 to 3.0; Delinquency Rate (%), axis 0.0% to 5.0%]
Figure 2: Mortgage 1-4 Loans vs. Delinquency Rates Trends [chart, 1990 to 2010; Mortgage 1-4 Loans ($T), axis 0 to 12.0; Delinquency Rate (%), axis 0.0% to 12.0%]
Source: Federal Reserve 2011
big impact on consumer behavior
• Debt Overhang
• 'Negative' Equity Homes
• Damaged Credit Scores
• Zip Code Myth
Source: Tresata Analytics 2012
required big data capabilities
A. Current Capabilities
1. Sub Samples
2. Limited variables
3. Limited time series
4. Very high cost
B. Future Capabilities
1. All customers
2. All variables
3. Full time series
4. Better Analytics
Major Data Analytics & Technology Challenge!
[chart axes: Time (Daily), Variables, Individual Customers; regions: Current Industry Capabilities, Future Industry Capabilities]
Source: Tresata Analytics 2011
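The gap between current and future capabilities on this slide can be made concrete with simple arithmetic: the data footprint scales as customers × variables × daily observations. The sketch below is illustrative only. The function name `cell_count` and every number in it are hypothetical placeholders, not figures from the deck.

```python
def cell_count(customers: int, variables: int, days: int) -> int:
    """Number of (customer, variable, day) cells in a full data cube."""
    return customers * variables * days


# Hypothetical inputs, chosen only to show how the three axes multiply.
# "Current": a sub-sample of customers, few variables, a short time series.
sampled = cell_count(customers=100_000, variables=20, days=12)
# "Future": all customers, all variables, a full daily time series.
full = cell_count(customers=10_000_000, variables=500, days=365 * 5)

growth = full // sampled
print(f"sampled cells: {sampled:,}")
print(f"full cells:    {full:,}")
print(f"growth factor: {growth:,}x")
```

Because the three axes multiply, widening each of them at once grows the footprint far faster than widening any single one, which is the scaling problem the slide calls a major technology challenge.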
where are we in the hype cycle?
delivery & comprehension paradigm
Source: Cap Gemini 2011
a new big data architecture
Enterprise Data Systems / Enterprise File Systems (HDFS, NFS, etc.)
Hadoop Environment
Data & Analytics Platform
• Broader leverage of EDS/EFS
• Abstracted modularity
• As-a-service Infrastructure
• Automated Provision, Spin-Up & Scale IaaS
• Auto-deploy, auto-scale Hadoop clusters (don't need heavily staffed SME/support, instead use commodity model)
• Unified, converging story over time for indexing & algorithms
• Grow machine & sentiment feedbase
Enterprise Distributed Compute (hpc / grid)
cpu gpu fpga
Traditional BI / DW / OLAP
Pvt or Public Cloud (Kit & Operating Model)
Smart Data Analytics Clouds
• Broader leverage of compute
• Abstracted modularity
storage & access paradigm
delivery & comprehension paradigm
Batch Real-time Hybrid
• Unified, near real-time "big charts"
• Synergy with traditional "quants/algs"
enterprise ready hadoop stack
Tresata Analytics Engine
Security & Monitoring
Hadoop Operating System
Infrastructure
Data Platform
Server Hardware & Operating System
Apache
• High bandwidth backplane
• Industry approved & certified security
• Fully managed platform, e2e
• Private or public cloud
• Hadoop distribution (like Hortonworks)
• Distribution support
Tresata Certified & Packaged BDaaS Stack
• Extensible Analytics Platform
• Total View of Customer Engine
• Proprietary Algorithms & Analytics
do not add complexity
Example Enterprise Architecture
make the start easy
Data In
Feed IN data from multiple warehouses or SORs
Info Out
Feed BACK info to Business Processes
if it's data…you can move it
Courtesy the genius of xkcd
build vs. buy
EDW EDW EDW
Commodity Hardware (HP, Dell, Super Micro, etc.)
Server Operating System (Linux)
Data Storage (HDFS, NFS)
Data Processing (MapReduce)
Extract
Data Access / Data Pipeline
Sqoop, Oozie, Avro, Pig, Hive, HBase
Data Security (Kerberos, Custom)
Business Applications
Viz, EV, Report, Analyze, Model
Hadoop Big Data Platform
Tresata Platform
Source: Tresata Illustration 2012
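The data processing layer in this stack is MapReduce, which follows a map, shuffle, reduce pattern. The following is a minimal single-process sketch of that pattern in plain Python. It is not Hadoop code, and the record layout (`customer_id`, `status`) and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def status_mapper(record):
    """Emit one (status, 1) pair per loan record."""
    yield record["status"], 1


def count_reducer(key, values):
    """Sum the counts emitted for one key."""
    return sum(values)


# Made-up sample records (hypothetical schema).
records = [
    {"customer_id": 1, "status": "current"},
    {"customer_id": 2, "status": "delinquent"},
    {"customer_id": 3, "status": "current"},
]

counts = reduce_phase(shuffle(map_phase(records, status_mapper)), count_reducer)
print(counts)
```

On a real cluster the map and reduce functions run in parallel across nodes and the shuffle moves data over the network. The sketch keeps only the logical data flow.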
focus on use cases
Structured & Unstructured Data Sources
• Client Data
• Social Data
• Market Data
Tresata Data Engine (fully engineered on Hadoop)
Store & Process
• Clean
• Parse
• De-dupe
• Match
• Join
Compute
• Clustering
• Graph
• Classifiers
• Sentiment
• Behavior
Visualize
• Heatmaps
• Benchmark
• Geotag
• Outliers
• Networks
Multiple Use Cases
• Marketing
• Risk
• Trading
Source: Tresata Illustration 2012
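The store-and-process steps named on this slide (clean, parse, de-dupe, match, join) can be sketched as a small chain of functions. This is a plain-Python illustration under stated assumptions, not the Tresata engine: the `id,name,zip` input layout, the `handle` field, the sample records, and the choice of (name, zip) as the match key are all hypothetical.

```python
def clean(raw):
    """Trim the line and collapse internal runs of whitespace."""
    return " ".join(raw.split())


def parse(line):
    """Parse 'id,name,zip' into a record with a lower-cased name."""
    customer_id, name, zip_code = (field.strip() for field in line.split(","))
    return {"id": customer_id, "name": name.lower(), "zip": zip_code}


def dedupe(records):
    """Keep the first record seen for each id."""
    seen = {}
    for record in records:
        seen.setdefault(record["id"], record)
    return list(seen.values())


def match_key(record):
    """Key used to match records across sources (assumed: name and zip)."""
    return (record["name"], record["zip"])


def join(clients, social):
    """Inner-join client records to social records on the match key."""
    by_key = {match_key(s): s for s in social}
    return [
        {**c, "handle": by_key[match_key(c)]["handle"]}
        for c in clients
        if match_key(c) in by_key
    ]


# Made-up sample inputs: messy client lines and one social record.
raw_clients = [
    "  1, Ada  Lovelace ,28202",
    "1,Ada Lovelace,28202",
    "2,Alan Turing,10001",
]
social = [{"name": "ada lovelace", "zip": "28202", "handle": "@ada"}]

clients = dedupe(parse(clean(line)) for line in raw_clients)
joined = join(clients, social)
print(joined)
```

Exact-key matching is the simplest possible choice here. Production entity matching typically needs fuzzier rules, which this sketch does not attempt.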
how hortonworks can help
• Training and Certification – www.hortonworks.com/training/
• Hortonworks Data Platform and Support
– www.hortonworks.com/hortonworksdataplatform/
– www.hortonworks.com/technology/techpreview/
– www.hortonworks.com/support/
• Educational Webinars – www.hortonworks.com/webinars/