Non-Stop Hadoop for Hortonworks
Transcript of Non-Stop Hadoop for Hortonworks
© Hortonworks Inc. 2013
Your Presenters
• Jagane Sundar (@jagane)
  – CTO of Big Data at WANdisco
  – Co-founder of AltoStor and former Director of Engineering in Yahoo’s Hadoop group
  – Managed Hadoop 0.20.204 release for Yahoo
• Rohit Bakhshi (@Rohit2b)
  – Product Management at Hortonworks
  – Focus on HDP Platform Services, Hadoop Core and Windows enablement
  – Enjoys live jazz and espresso
Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop in the MDA
• WANdisco’s role in the MDA
• Q&A
Existing Data Architecture
Diagram: existing data architecture
• Applications: Business Analytics, Custom Applications, Packaged Applications
• Data System (Repositories): RDBMS, EDW, MPP
• Sources: Existing Sources (CRM, ERP, Clickstream, Logs)
• Operational Tools: Manage & Monitor
• Dev & Data Tools: Build & Test
Existing Data Architecture
Diagram: existing data architecture
• Applications: Business Analytics, Custom Applications, Packaged Applications
• Data System (Repositories): RDBMS, EDW, MPP
• Sources: Existing Sources (CRM, ERP, Clickstream, Logs)
Source: IDC
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020
© Hortonworks Inc. 2013 - Confidential
Modern Data Architecture Enabled
Diagram: modern data architecture
• Applications: Business Analytics, Custom Applications, Packaged Applications
• Data System (Repositories): RDBMS, EDW, MPP
• Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
• Operational Tools: Manage & Monitor
• Dev & Data Tools: Build & Test
Drivers of Hadoop Adoption
Architectural: A Modern Data Architecture
Complement your existing data systems: the right workload in the right place
New Business Applications: Types of Big Data
• CRM, ERP
• Server log
• Clickstream
• Sentiment/Social
• Machine/Sensor
• Geo-locations
Opportunity in types of data
1. Sentiment: Understand how your customers feel about your brand and products – right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming automatically from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Research logs to diagnose process failures and prevent security breaches
6. Unstructured (txt, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents
Requirements for Hadoop Adoption
3 Requirements for Hadoop’s Role in the Modern Data Architecture
• Integrated: Interoperable with existing data center investments
• Key Services: Platform, operational and data services essential for the enterprise
• Skills: Leverage your existing skills: development, operations, analytics
Requirements for Enterprise Hadoop
1. Key Services: Platform, operational and data services essential for the enterprise
2. Skills: Leverage your existing skills: development, analytics, operations
3. Integrated: Engineered with existing data center investments
Diagram: Hortonworks Data Platform (HDP)
• Deployment options: OS/VM, Cloud, Appliance
• Layers: Core, Platform Services, Operational Services, Data Services
• Enterprise Readiness: High Availability, Disaster Recovery, Rolling Upgrades, Security and Snapshots
• Components shown: HDFS, YARN, MapReduce, Tez, Hive & HCatalog, Pig, HBase, Sqoop, Flume, NFS, WebHDFS, Knox*, Oozie, Ambari, Falcon*
• Other labels: Load & Extract
Requirements for Enterprise Hadoop
1. Key Services: Platform, operational and data services essential for the enterprise
2. Skills: Leverage your existing skills: development, analytics, operations
   • Develop: Collect, Process, Build
   • Analyze: Explore, Query, Deliver
   • Operate: Provision, Manage, Monitor
3. Integration: Engineered with existing data center investments
Familiar and Existing Tools
1. Key Services: Platform, operational and data services essential for the enterprise
2. Skills: Leverage your existing skills: development, analytics, operations
   • Develop: Collect, Process, Build
   • Analyze: Explore, Query, Deliver
   • Operate: Provision, Manage, Monitor
   • Tool shown: BusinessObjects BI
3. Integration: Interoperable with existing data center investments
Requirements for Enterprise Hadoop
3. Integration: Engineered with existing data center investments
Integrated with:
• Applications: Business Intelligence, Developer IDEs, Data Integration
• Systems: Data Systems & Storage, Systems Management
• Platforms: Operating Systems, Virtualization, Cloud, Appliances
Diagram: modern data architecture
• Applications: Business Analytics, Custom Applications, Packaged Applications
• Data System (Repositories): RDBMS, EDW, MPP
• Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
• Operational Tools: Manage & Monitor
• Dev & Data Tools: Build & Test
WANdisco in the Modern Data Architecture
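Diagram: modern data architecture with WANdisco
• Layers: Applications, Data System, Sources, Infrastructure, Operational Tools, Dev & Data Tools
• Data systems: RDBMS, EDW, MPP
• Products shown: HANA, BusinessObjects BI
• Sources: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)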
Non-Stop Hadoop for Hortonworks
• Non-stop technology delivers continuous uptime with no data loss
• One Hadoop cluster across data centers at any distance
• Eliminates the bottleneck of a single active NameNode
• Automatic backup, failover and recovery within and across data centers
• LAN-speed read and write
Today’s Topics
• Introduction
• Drivers for the Modern Data Architecture (MDA)
• Apache Hadoop’s role in the MDA
• WANdisco’s role in the MDA
• Q&A
© WANdisco 2013
WANdisco Background
• WANdisco: Wide Area Network Distributed Computing
  – Enterprise-ready, high-availability software solutions that enable globally distributed organizations to meet today’s data challenges of secure storage, scalability and availability
• Leader in tools for software engineers
  – Subversion
  – Apache Software Foundation sponsor
• Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND)
• US patent for active-active replication technology granted, November 2012
• Global locations
  – San Ramon (CA)
  – Chengdu (China)
  – Tokyo (Japan)
  – Boston (MA)
  – Sheffield (UK)
  – Belfast (UK)
WANdisco
• Overarching theme: We’re enabling global protection against:
  – Data loss
  – Downtime
  – Loss of Intellectual Property
  – Loss of revenue/time to market
  – Falling behind the competition
Non-Stop Hadoop
Extending HDFS across Data Centers
• Single HDFS that spans multiple Data Centers across the world
• Provides 100% Uptime for Hadoop
• Built as an extension on top of Apache Hadoop HDFS
• 100% HDFS / 100% compatibility with Hadoop applications
  – Applications run unmodified
• Applications can run in any Data Center
• Not Simple Mirroring or a Copy
Distributed Coordination Engine: WANdisco DConE
• WANdisco’s patented WAN-capable Paxos implementation
  – Mathematically proven
  – Provides distributed coordination of file system metadata
    • Active-Active (all locations)
    • Create, Modify, Delete
    • Shared nothing (no leader)
• No restrictions on distance between data centers
  – US patent granted for time-independent implementation of Paxos
• Not based on SAN block device synchronization such as EMC SRDF
  – SAN block replication has distance limits resulting from the inability of file systems such as NTFS and ext4 to tolerate long RTTs to block storage
  – Possible distribution of corrupted blocks
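The active-active idea on this slide can be illustrated with a minimal sketch. It does not implement Paxos or DConE. It assumes the coordination step has already produced one agreed, ordered log of create, modify and delete operations, and shows only why every site that applies that log in the same order ends up with the same namespace. All names here (MetadataOp, SiteNamespace, catch_up, the dc-east and dc-west sites, the paths) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataOp:
    """One namespace change; kind is 'create', 'modify' or 'delete'."""
    kind: str
    path: str
    value: str = ""


@dataclass
class SiteNamespace:
    """Hypothetical stand-in for one data center's copy of the metadata."""
    name: str
    files: dict = field(default_factory=dict)
    applied: int = 0  # index of the next agreed operation to apply

    def catch_up(self, agreed_log):
        """Apply, in order, every agreed operation not yet applied here."""
        for op in agreed_log[self.applied:]:
            if op.kind in ("create", "modify"):
                self.files[op.path] = op.value
            elif op.kind == "delete":
                self.files.pop(op.path, None)
            else:
                raise ValueError(f"unknown operation kind: {op.kind}")
            self.applied += 1


# Assumption: this ordered log is the output of the agreement protocol.
agreed_log = [
    MetadataOp("create", "/logs/a", "v1"),
    MetadataOp("create", "/logs/b", "v1"),
    MetadataOp("modify", "/logs/a", "v2"),
    MetadataOp("delete", "/logs/b"),
]

site_a = SiteNamespace("dc-east")
site_b = SiteNamespace("dc-west")

site_a.catch_up(agreed_log)      # site A applies the whole log at once
site_b.catch_up(agreed_log[:2])  # site B has only seen the first two entries
site_b.catch_up(agreed_log)      # later it applies the rest, in the same order

print(site_a.files == site_b.files)  # both sites converge on one namespace
```

The sketch leaves out everything that makes the real problem hard: reaching agreement over a WAN, handling site failures, and replicating the data blocks themselves.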
Non-Stop Hadoop
• Architecture
  – Non-Intrusive: Not Simple Mirroring or a Copy
  – Does not modify Apache Hadoop
  – Runs on HDP 2 and later
• Provides 100% Uptime for Hadoop
  – Provides Continuous Availability of HDFS Data
  – Guarantees 100% Uptime of HDFS During all 4 Categories of Failures
• Enables HDFS to be Deployed Globally
  – Across the WAN
  – Extends HDFS Across Multiple Data Centers
  – Unifies the HDFS Namespace
  – Exceeds Business Continuity Requirements for SLAs and Compliance
• Load Balances NameNode Traffic for Increased Scalability
Use Cases for Non-Stop Hadoop with Hortonworks
• Disaster Recovery
  – Data is as current as possible (no periodic synchronizations)
  – Virtually zero downtime to recover from regional data center failure
  – Regulatory compliance
• Load Balancing
• Multi Data Center Ingest
  – Information doesn’t need to be sent to one DC and then copied back to the other using DistCp
  – Parallel ingest methods don’t require redirected data streams
• Global MapReduce
  – Global Click Stream Analysis
  – Global Log Analysis
  – Etc.
• Maximize Resource Utilization
  – All data centers can be used to run different jobs concurrently
Non-Stop Hadoop for Hortonworks: Key Takeaways
• Non-Stop Hadoop makes Hadoop Enterprise/Production Ready
• Load balancing eliminates the bottleneck of a single NameNode
• Active-Active replication solves the Hadoop high availability issue
• No job restarts or lost time for NameNode failures (Continuous Availability)
• Single HDFS across multiple data centers
  – No out of sync issues
  – No Load Balancer maintenance problems
• Data Centers can be located at any distance from each other
• If any Data Center fails, applications can be run on any other replicated Data Center
• If a Data Center is completely lost, any other replica of that Data Center can be used to restore it
Next Steps:
More about Non-Stop Hadoop for Hortonworks http://www.wandisco.com/hadoop/non-stop-hadoop-hortonworks
Get started on Hadoop with Hortonworks Sandbox http://hortonworks.com/hadoop-tutorial/
Try Non-Stop Hadoop for Hortonworks Contact us: [email protected]