Page 1 © Hortonworks Inc. 2014
Discover HDP 2.2: Apache HBase with YARN & Slider for Fast NoSQL Access
Hortonworks. We do Hadoop.
Page 2 © Hortonworks Inc. 2014
Speakers
Justin Sears
Hortonworks Product Marketing Manager
Carter Shanklin
Hortonworks Director of Product Management & PM for Apache HBase in Hortonworks Data Platform
Enis Soztutar
Hortonworks Engineer, Apache HBase Committer & PMC Member
Page 3 © Hortonworks Inc. 2014
Agenda
• Introduction to Apache HBase
• New HBase Innovation in HDP 2.2
– HBase HA
– Support for rolling upgrades
– HBase on YARN using Apache Slider
• Q & A
We’ll move quickly:
• Attendee phone lines are muted
• Text any questions to Enis Soztutar using Webex chat
• Questions answered at the end
• Unanswered questions and answers in upcoming blog post
Page 4 © Hortonworks Inc. 2014
Big Data, Hadoop & Data Center Re-platforming
Business Drivers
• From reactive analytics to proactive interactions
• Insights that drive competitive advantage & optimal returns
Financial Drivers
• Cost of data systems, as % of IT spend, continues to grow
• Cost advantages of commodity hardware & open source software
Technical Drivers
• Data is growing exponentially & existing systems overwhelmed
• Predominantly driven by NEW types of data that can inform analytics
There is an inequitable balance between vendor and customer in the market
Page 5 © Hortonworks Inc. 2014
New Types of Data | Hadoop Value
Clickstream | Capture and analyze website visitors’ data trails and optimize your website
Sensors | Discover patterns in data streaming automatically from remote sensors and machines
Server Logs | Research logs to diagnose process failures and prevent security breaches
Sentiment | Understand how your customers feel about your brand and products – right now
Geographic | Analyze location-based data to manage operations where they occur
Unstructured | Understand patterns in files across millions of web pages, emails, and documents
Page 6 © Hortonworks Inc. 2014
A Shift from Reactive to Proactive Interactions
HDP and Hadoop allow organizations to use data to shift interactions from…
Reactive (Post Transaction) to Proactive (Pre Decision)
• A shift in Advertising: From mass branding …to 1x1 Targeting
• A shift in Financial Services: From Educated Investing …to Automated Algorithms
• A shift in Healthcare: From mass treatment …to Designer Medicine
• A shift in Retail: From static branding …to Real-time Personalization
• A shift in Telco: From break then fix …to repair before break
Page 7 © Hortonworks Inc. 2014
Enterprise Goals for the Modern Data Architecture
• Consolidate siloed data sets, structured and unstructured
• Central data set on a single cluster
• Multiple workloads across batch, interactive and real time
• Central services for security, governance and operation
• Preserve existing investment in current tools and platforms
• Single view of the customer, product, supply chain
Diagram: Modern Data Architecture
Applications: Business Analytics, Custom Applications, Packaged Applications
Data System: RDBMS, EDW, MPP; YARN: Data Operating System (Interactive, Real-Time, Batch; nodes 1 to N) on HDFS (Hadoop Distributed File System)
Sources: Existing Systems (CRM, ERP, Other); Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
Page 8 © Hortonworks Inc. 2014
YARN Transformed Hadoop & Opened a New Era
YARN The Architectural Center of Hadoop
• Common data platform, many applications
• Support multi-tenant access & processing
• Batch, interactive & real-time use cases
Diagram: Batch, Interactive & Real-Time Data Access on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System)
• Script: Pig (on Tez)
• SQL: Hive (on Tez)
• Java, Scala: Cascading (on Tez)
• NoSQL: HBase, Accumulo (on Slider)
• Stream: Storm (on Slider)
• Search: Solr
• In-Memory: Spark
• Others: ISV Engines
Page 9 © Hortonworks Inc. 2014
YARN Extends Hadoop to Other Data Center Leaders
YARN The Architectural Center of Hadoop
• Common data platform, many applications
• Support multi-tenant access & processing
• Batch, interactive & real-time use cases
• Supports 3rd-party ISV tools (e.g. SAS, Syncsort, Actian)
YARN Ready Applications: facilitate ongoing innovation and enterprise adoption via an ecosystem of new and existing “YARN Ready” solutions
Diagram: same data access stack on YARN and HDFS as on the previous page.
Page 10 © Hortonworks Inc. 2014
Enterprise Hadoop: Central Set of Services
Enables Apache Hadoop to be an Enterprise Data Platform with centralized services for:
• Governance
• Operations
• Security
Everything that plugs into Hadoop inherits these services
Governance: Load data and manage according to policy
Operations: Deploy and effectively manage the platform (Provision, Manage & Monitor: Ambari, Zookeeper; Scheduling: Oozie)
Security: Provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection
Diagram: Security, Governance and Operations services alongside the Batch, Interactive & Real-Time Data Access engines (Pig, Hive, Cascading, Storm, Solr, HBase, Accumulo, Spark, ISV Engines; with Tez and Slider), all on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System)
Page 11 © Hortonworks Inc. 2014
Hortonworks Data Platform 2.2
HDP Delivers Enterprise Hadoop
Diagram: HDP 2.2 components on YARN: Data Operating System (Cluster Resource Management) and HDFS (Hadoop Distributed File System)
Batch, Interactive & Real-Time Data Access:
• Script: Pig (on Tez)
• SQL: Hive (on Tez)
• Java, Scala: Cascading (on Tez)
• NoSQL: HBase, Accumulo (on Slider)
• Stream: Storm (on Slider)
• Search: Solr
• In-Memory: Spark
• Others: ISV Engines
Governance (Data Workflow, Lifecycle & Governance): Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS
Security (Authentication, Authorization, Audit, Data Protection): Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox
Operations: Provision, Manage & Monitor: Ambari, Zookeeper; Scheduling: Oozie
Deployment Choice: Linux, Windows, Cloud, On-Premises
YARN is the architectural center of HDP
• Common data set across all applications
• Batch, interactive & real-time workloads
• Multi-tenant access & processing
Provides comprehensive enterprise capabilities
• Governance
• Security
• Operations
Enables broad ecosystem adoption
• ISVs can plug directly into Hadoop
The widest range of deployment options
• Linux & Windows
• On premises & cloud
Page 12 © Hortonworks Inc. 2014
Hortonworks Data Platform 2.2
HDP Delivers Enterprise Hadoop
Diagram and capability summary: same as on the previous page (HDP 2.2 stack on YARN and HDFS, with the NoSQL engines HBase and Accumulo on Slider).
Page 13 © Hortonworks Inc. 2014
Introduction to Apache HBase
Page 14 © Hortonworks Inc. 2014
What Is Apache HBase?
• Flexible Schema
• Extreme Low Latency
• SQL and NoSQL Interfaces
• Store and Process Petabytes of Data
• Scale out on Commodity Servers
• Integrated with YARN
• 100% Open Source
• Directly Integrated with Hadoop
Diagram: HBase RegionServers running across nodes 1 to N on YARN: Data Operating System, with HDFS (Permanent Data Storage) underneath.
Page 15 © Hortonworks Inc. 2014
New in HDP 2.2: HBase HA
Page 16 © Hortonworks Inc. 2014
HBase HA: 3 Levels of Protection
RegionServer | Primary Keys (Read Write) | Standby Keys (Read Only)
HBase RegionServer 1 | 1-100 | 101-200, 201-300
HBase RegionServer 2 | 101-200 | 201-300, 301-400
HBase RegionServer 3 | 201-300 | 301-400, 1-100
HBase RegionServer 4 | 301-400 | 1-100, 101-200
HDFS (3 Copies of All Data, Available to all RegionServers)
1. HBase keys are range partitioned across servers; a node failure affects 1 key range, the rest remain available.
2. HBase HA stores read-only copies in separate RegionServers. Data can still be read if a node fails.
3. 3 copies of all data are stored in HDFS. Data from failed nodes is automatically recovered on other nodes.
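The three levels of protection can be illustrated with a small simulation. This is only a sketch of the idea, not HBase code: the key ranges mirror the diagram, while `LAYOUT`, `range_for`, `read_servers` and `write_server` are hypothetical names introduced here for illustration.

```python
# Toy model of the HBase HA diagram: each RegionServer is primary (read/write)
# for one key range and standby (read-only) for two others.
# All names are hypothetical; this is not the HBase API.
LAYOUT = {
    "RegionServer 1": {"primary": (1, 100), "standby": [(101, 200), (201, 300)]},
    "RegionServer 2": {"primary": (101, 200), "standby": [(201, 300), (301, 400)]},
    "RegionServer 3": {"primary": (201, 300), "standby": [(301, 400), (1, 100)]},
    "RegionServer 4": {"primary": (301, 400), "standby": [(1, 100), (101, 200)]},
}


def range_for(key):
    """Return the (low, high) key range that contains key."""
    for info in LAYOUT.values():
        low, high = info["primary"]
        if low <= key <= high:
            return (low, high)
    raise KeyError(key)


def read_servers(key, failed=()):
    """Servers that can still serve a read for key, primary first."""
    target = range_for(key)
    primaries = [s for s, i in LAYOUT.items() if i["primary"] == target]
    standbys = [s for s, i in LAYOUT.items() if target in i["standby"]]
    return [s for s in primaries + standbys if s not in failed]


def write_server(key, failed=()):
    """Only the primary accepts writes. Returns None while the primary is
    down, i.e. until the range has been recovered on another node."""
    target = range_for(key)
    for server, info in LAYOUT.items():
        if info["primary"] == target and server not in failed:
            return server
    return None


if __name__ == "__main__":
    print(read_servers(150))
    print(read_servers(150, failed=("RegionServer 2",)))
    print(write_server(150, failed=("RegionServer 2",)))
```

With RegionServer 2 marked as failed, reads for key 150 are still served by the two standby holders of range 101-200, while writes for that range have no target in this toy model until recovery from HDFS would reassign it.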
Page 17 © Hortonworks Inc. 2014
Comparing HBase HA Phase 1 Versus 2
Item | HA Phase 1 / HDP 2.1 | HA Phase 2 / HDP 2.2
Data Staleness | > 30s | Near Zero
HA in Scans | Unsupported | Supported
Region Split/Merge | Disabled | Supported
META Table Highly Available | Unsupported | Supported
HBCK check for common HA problems | Unsupported | Supported
Page 18 © Hortonworks Inc. 2014
New in HDP 2.2: Rolling Upgrades
Page 19 © Hortonworks Inc. 2014
Rolling Upgrade Goals
• Zero downtime upgrades
• Roll forward and roll backward
• Update clients and servers independently
Page 20 © Hortonworks Inc. 2014
HBase Rolling Upgrade: Component Overview
New Package Format: Install multiple versions of Hadoop software on a single node or cluster.
hdp-select Utility: Choose the component version you want, roll forward or backward.
Decoupled Clients and Servers: Upgrade servers independently of clients.
Page 21 © Hortonworks Inc. 2014
HBase Rolling Upgrade: Directory Layout
Directory Layout: /usr/hdp
Multiple versions of the HDP stack:
[root@cluster1 hdp]# pwd
/usr/hdp
[root@cluster1 hdp]# ls -l
drwxr-xr-x. 19 root root 4096 Nov 15 07:26 2.2.0.0-1995
drwxr-xr-x. 2 root root 4096 Dec 7 01:22 2.2.0.1-2217
drwxr-xr-x. 2 root root 4096 Dec 6 22:57 current
Within /usr/hdp/current:
[root@cluster1 current]# pwd
/usr/hdp/current
[root@cluster1 current]# ls -l | grep hbase
lrwxrwxrwx. 1 root root 27 Dec 6 22:57 hbase-client -> /usr/hdp/2.2.0.0-1995/hbase
lrwxrwxrwx. 1 root root 27 Dec 6 22:57 hbase-master -> /usr/hdp/2.2.0.0-1995/hbase
lrwxrwxrwx. 1 root root 27 Dec 6 22:57 hbase-regionserver -> /usr/hdp/2.2.0.0-1995/hbase
Page 22 © Hortonworks Inc. 2014
HBase Rolling Upgrade: Upgrade One Component
hdp-select
[root@cluster1 hdp]# hdp-select status | grep hbase
hbase-client - 2.2.0.0-1995
hbase-master - 2.2.0.0-1995
hbase-regionserver - 2.2.0.0-1995
Upgrade Servers Before Clients
[root@cluster1 hdp]# hdp-select set hbase-master 2.2.0.1-2217
[root@cluster1 current]# pwd
/usr/hdp/current
[root@cluster1 current]# ls -l | grep hbase
lrwxrwxrwx. 1 root root 27 Dec 6 22:57 hbase-client -> /usr/hdp/2.2.0.0-1995/hbase
lrwxrwxrwx. 1 root root 27 Dec 7 02:23 hbase-master -> /usr/hdp/2.2.0.1-2217/hbase
lrwxrwxrwx. 1 root root 27 Dec 6 22:57 hbase-regionserver -> /usr/hdp/2.2.0.0-1995/hbase
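The effect that hdp-select set has on the symlinks listed above can be mimicked in a few lines. The sketch below is an illustration under assumptions, not the hdp-select implementation: it builds a throwaway tree in a temporary directory (assuming a POSIX file system) and repoints one current/<component> link. The helper names `make_stack`, `select` and `status` are hypothetical.

```python
import os
import tempfile

# Version and component names are taken from the listings in the slides.
VERSIONS = ["2.2.0.0-1995", "2.2.0.1-2217"]
COMPONENTS = ["hbase-client", "hbase-master", "hbase-regionserver"]


def select(root, component, version):
    """Repoint <root>/current/<component> at <root>/<version>/hbase."""
    target = os.path.join(root, version, "hbase")
    if not os.path.isdir(target):
        raise ValueError("version not installed: " + version)
    link = os.path.join(root, "current", component)
    tmp_link = link + ".tmp"
    os.symlink(target, tmp_link)
    # Renaming the new link over the old one swaps it in a single step.
    os.replace(tmp_link, link)


def make_stack(root):
    """Create <root>/<version>/hbase for each version and point every
    component link in <root>/current at the first version."""
    for version in VERSIONS:
        os.makedirs(os.path.join(root, version, "hbase"))
    os.makedirs(os.path.join(root, "current"))
    for component in COMPONENTS:
        select(root, component, VERSIONS[0])


def status(root):
    """Map each component to the version its symlink points at."""
    result = {}
    for component in COMPONENTS:
        target = os.readlink(os.path.join(root, "current", component))
        result[component] = os.path.basename(os.path.dirname(target))
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_stack(root)
        select(root, "hbase-master", "2.2.0.1-2217")
        for component, version in status(root).items():
            print(component, "-", version)
```

Because only the hbase-master link is repointed, the client and RegionServer links keep resolving to the older version, which is the same state the second listing shows.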
Page 23 © Hortonworks Inc. 2014
Rolling Upgrade Contracts
• Rolling Upgrade works for minor upgrades. Example: HDP 2.2.0 to HDP 2.2.1.
• Wire compatibility guaranteed between clients and servers.
• Binary compatibility guaranteed, e.g. for coprocessors.
• Data format compatibility guaranteed.
Page 24 © Hortonworks Inc. 2014
Rolling Upgrade Benefits
• Upgrade with zero downtime.
• Roll forward and roll backward.
• Instant switchover / restart preserve data locality when upgrading HBase.
• Update servers and clients independently.
Page 25 © Hortonworks Inc. 2014
New in HDP 2.2: HBase on YARN via Slider
Page 26 © Hortonworks Inc. 2014
Deploying HBase with Slider
What is it?
• Deploy HBase into the Hadoop cluster using YARN.
Benefit | Details
Simplified Deployment | No need to deploy HBase or its configuration to individual cluster nodes.
Lifecycle Management | Start / stop / process management handled automatically.
Multitenancy | Different users can run HBase clusters within one Hadoop cluster.
Multiple Versions | Run different versions of HBase (e.g. 0.98 and 1.0) on the same cluster.
Elasticity | Cluster size is a parameter and easily changed.
Co-located Analytics | HBase resource usage is known to YARN, so nodes running HBase will not be used as heavily to satisfy MapReduce or Tez jobs.
Page 27 © Hortonworks Inc. 2014
HBase / Slider Sample
Configure HBase settings in appConfig.json and resources.json
Sample Slider Command:
slider create mycluster \
  --template appConfig.json \
  --resources resources.json
{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": { },
  "global": {
    "site.hbase-site.hbase.hstore.flush.retries.number": "120",
    "site.hbase-site.hbase.client.keyvalue.maxsize": "10485760",
    "site.hbase-site.hbase.hstore.compactionThreshold": "3",
    "site.hbase-site.hbase.rootdir": "${DEFAULT_DATA_DIR}/data",
    "site.hbase-site.hbase.stagingdir": "${DEFAULT_DATA_DIR}/staging",
    "site.hbase-site.hbase.regionserver.handler.count": "60",
    ...
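The keys in the sample appear to follow a site.<config file>.<property> naming pattern, so that `site.hbase-site.hbase.rootdir` would set `hbase.rootdir` in hbase-site. The sketch below assumes that reading of the keys and shows how such flattened entries could be grouped back into per-file property maps. The helper `site_properties` is hypothetical and is not Slider's own parsing code; the values are copied from the sample, with the schema and metadata entries left out.

```python
import json

# Entries copied from the "global" section of the sample appConfig.json.
APP_CONFIG = {
    "global": {
        "site.hbase-site.hbase.hstore.flush.retries.number": "120",
        "site.hbase-site.hbase.client.keyvalue.maxsize": "10485760",
        "site.hbase-site.hbase.hstore.compactionThreshold": "3",
        "site.hbase-site.hbase.rootdir": "${DEFAULT_DATA_DIR}/data",
        "site.hbase-site.hbase.stagingdir": "${DEFAULT_DATA_DIR}/staging",
        "site.hbase-site.hbase.regionserver.handler.count": "60",
    }
}


def site_properties(app_config):
    """Group 'site.<file>.<property>' keys into {file: {property: value}}.

    Assumes the naming pattern described in the lead-in; keys that do not
    start with 'site.' are ignored.
    """
    grouped = {}
    for key, value in app_config.get("global", {}).items():
        parts = key.split(".", 2)  # prefix, config file, remaining property
        if len(parts) == 3 and parts[0] == "site":
            _, config_file, prop = parts
            grouped.setdefault(config_file, {})[prop] = value
    return grouped


if __name__ == "__main__":
    print(json.dumps(site_properties(APP_CONFIG), indent=2))
```

Splitting on only the first two dots matters here, because the HBase property names themselves contain dots.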
Page 28 © Hortonworks Inc. 2014
Q & A
Page 29 © Hortonworks Inc. 2014
Thank you! Learn more at: hortonworks.com/hadoop/hbase/
Register for the last Discover HDP 2.2 Webinar: Hortonworks.com/webinars