State of the Database
@HBase http://hbase.apache.org
2015-09-28
Nick Dimiduk (@xefyr) http://n10k.com
#apachebigdata
Agenda
o State of the Project
o State of the Software
o State of the Ecosystem
o Latest Releases
o Q & A
Project: Vision
Simple, steady, and powerful:
“A first-class, high-performance, horizontally scalable data storage engine for Big Data, suitable as the store of record for mission-critical data.”
J.G. Keenan, Elementary Theory of Gas Turbines and Jet Propulsion (1946)
State of the Project
o Data access for medium- and high-scale services
o Hundreds of enterprises and startups
o Some of the largest Internet companies in the world
o Running major production workloads since 2011
o Use-cases: messaging, security, measurement/“IoT”, collaboration, digital media, digital advertising, telecommunications, computational biology, clinical informatics/healthcare, insurance
Project: Goals
o Availability: always more, always faster
o Stability and operability
o Scaling up, scaling down
o Up-to-date with NextGen “commodity” hardware
o Multi-tenancy
o Diversity of ecosystem
State of the Software
o Mature codebase
  o >100 contributors
  o 1.1M lines of code (each active branch)
  o est. 1200+ human-years’ effort
o Cluster sizes from 10 to 1000+ machines (that we know of)
o Runs on HDFS, MapR, Gluster, GPFS, Lustre
o HBase as a Service: AWS/EMR, HDInsight, Qubole, Google (sort-of)
Software: Releases
Software: Active Development
o Smaller regions, more regions
  o Less write amplification
  o 1M+ region clusters
o Stability
  o ProcedureV2
  o Assignment improvements/stability
o Backup, restore tools
  o Built on snapshots, easier operations
Software: Active Development
o Adaptation: workloads
  o HBase as Medium Object Store (MOB)
  o Tunable availability
    o Region replicas
    o TIMELINE consistency
o Coprocessor API stability
o Profile-driven optimization
  o Less GC, more RAM (off-heap)
Software: Active Development
o Multi-tenancy
  o Table groups
  o Quotas
  o Priorities
o Improved machine utilization
  o More RAM (100s of GB)
  o IOPS
  o All of the CPUs
Ecosystem
o OpenTSDB
o Transaction managers
  o Themis, Tephra, Omid2, LeanXcale
o Graph engines
  o Titan, Giraph, Zen, S2Graph (+ loads of custom solutions)
o Myriad SQL engines
o Other Hadoop components
o Google Cloud Bigtable
Ecosystem: SQL
Ecosystem: Hadoop Components
o YARN-2928: Application Timeline Service
o HIVE-9452: HBase to store Hive metadata
o AMBARI-5707: Ambari Metrics System
Release: 0.94
o Last (final?) release: 0.94.27, 2015-03-26
o “ancient history”
o No new deployments
o Existing users highly encouraged to upgrade
o Requires downtime to upgrade 😫 😡 (╯°□°)╯︵ ┻━┻
Release: 0.98
o Last release: 0.98.14, 2015-08-31
o “legacy”
o Most production deploys (probably)
o Largest production clusters (probably)
o New features backported when possible
Release: 1.x
o Last release: 1.1.2, 2015-09-01
o “stable”
o Production deploys moving here
o Active development
o Rolling upgrade from 0.98.x 😄 😍 ヽ(´ー`)ノ
Release: 1.0
o Released 1.0.0, 2015-02-24
o Adopting semantic versioning
  o MAJOR.MINOR.PATCH[-identifier]
  o Patch releases don’t quite follow the spec yet
o Client / Server API cleanup
  o Interfaces, builder pattern, @InterfaceAudience
o Region Replicas
  o Trade consistency and resources for availability
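The versioning scheme above has a practical consequence for tooling: release strings have to be compared field by field, not as text. A minimal sketch, assuming a simplified rule set (numeric MAJOR, MINOR, PATCH compared in order; a version carrying an identifier such as "-SNAPSHOT" sorts before the plain release, loosely following SemVer precedence). The class `HBaseVersion` and the accepted identifier characters are hypothetical, not part of HBase.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of HBase): parses and orders MAJOR.MINOR.PATCH[-identifier].
final class HBaseVersion implements Comparable<HBaseVersion> {
    // e.g. "1.1.2" or "1.2.0-SNAPSHOT"; the identifier character set is an assumption.
    private static final Pattern FORMAT =
            Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)(?:-([0-9A-Za-z.-]+))?");

    final int major, minor, patch;
    final String identifier; // null when the version has no "-identifier" suffix

    private HBaseVersion(int major, int minor, int patch, String identifier) {
        this.major = major;
        this.minor = minor;
        this.patch = patch;
        this.identifier = identifier;
    }

    static HBaseVersion parse(String s) {
        Matcher m = FORMAT.matcher(s);
        if (!m.matches()) {
            throw new IllegalArgumentException("not MAJOR.MINOR.PATCH[-identifier]: " + s);
        }
        return new HBaseVersion(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)), m.group(4));
    }

    @Override
    public int compareTo(HBaseVersion o) {
        // Numeric, field-by-field comparison: 1.1.10 is newer than 1.1.2.
        int c = Integer.compare(major, o.major);
        if (c == 0) c = Integer.compare(minor, o.minor);
        if (c == 0) c = Integer.compare(patch, o.patch);
        if (c != 0) return c;
        // Simplified SemVer precedence: a tagged build sorts before the plain release,
        // and two tags are compared as plain strings.
        if (identifier == null) return o.identifier == null ? 0 : 1;
        if (o.identifier == null) return -1;
        return identifier.compareTo(o.identifier);
    }

    @Override
    public String toString() {
        return major + "." + minor + "." + patch + (identifier == null ? "" : "-" + identifier);
    }
}
```

Sorting release strings with plain `String.compareTo` would put "1.1.10" before "1.1.2"; parsing the fields first avoids that.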
Region Replicas
o Multiple Region Servers host each region
  o Primary + N read replicas (usually 2)
  o Primary is authority on reads and writes
  o Replicas tail replicated edits, offer TIMELINE view
o Client’s choice
  o Read primary only for “classic” strong consistency
  o Fan-out reads for faster, potentially TIMELINE results
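In the HBase 1.0 client this choice is made per request, by setting `Consistency.TIMELINE` on a `Get` and checking `Result.isStale()` on the answer. The toy model below (not HBase code) shows the trade-off in isolation: copy 0 is the primary and takes all writes, replicas only see edits after an explicit `replicate()` call, STRONG reads may only be served by the primary, and TIMELINE reads fall back to the first live replica and flag the result as stale. The class, its methods, and the "primary first, then replicas in id order" response rule are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical toy model (not HBase code) of one region: copy 0 is the primary, 1..N are read replicas.
final class ReplicatedRegion {
    // Names mirror the two read modes on the slide; this enum is local to the sketch.
    enum Consistency { STRONG, TIMELINE }

    static final class ReadResult {
        final String value;   // null if the row is absent on the copy that answered
        final boolean stale;  // true when a secondary replica answered
        ReadResult(String value, boolean stale) {
            this.value = value;
            this.stale = stale;
        }
    }

    private final List<Map<String, String>> copies = new ArrayList<>();
    private final List<Boolean> live = new ArrayList<>();
    private final List<String[]> unshipped = new ArrayList<>(); // edits replicas have not seen yet

    ReplicatedRegion(int readReplicas) {
        for (int i = 0; i <= readReplicas; i++) {
            copies.add(new HashMap<>());
            live.add(true);
        }
    }

    // Writes go only to the primary, the single authority for the region.
    void put(String row, String value) {
        copies.get(0).put(row, value);
        unshipped.add(new String[] {row, value});
    }

    // Replicas tail the primary's edits asynchronously; this model ships them on demand.
    void replicate() {
        for (String[] edit : unshipped) {
            for (int i = 1; i < copies.size(); i++) copies.get(i).put(edit[0], edit[1]);
        }
        unshipped.clear();
    }

    void setLive(int copyId, boolean isLive) { live.set(copyId, isLive); }

    // STRONG: only the primary may answer. TIMELINE: fan out; the primary is preferred,
    // otherwise the first live replica "responds first" and its answer is flagged stale.
    Optional<ReadResult> get(String row, Consistency consistency) {
        int candidates = consistency == Consistency.STRONG ? 1 : copies.size();
        for (int i = 0; i < candidates; i++) {
            if (live.get(i)) return Optional.of(new ReadResult(copies.get(i).get(row), i != 0));
        }
        return Optional.empty(); // nobody eligible could answer
    }
}
```

With the primary down, a STRONG read gets no answer at all, while a TIMELINE read still returns the last replicated value, marked stale: availability bought with consistency.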
Release: 1.1
o Released 1.1.0, 2015-05-15
o Async RPC client
o Scanner improvements
  o RPC chunking, heartbeat messages, API
o ProcedureV2
  o Improved operational reliability
o RPC throttling
  o Quotas per user, table, namespace
o Compaction throttling, monitoring
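RPC throttling means one request may be subject to several quotas at once (its user, its table, its namespace) and is admitted only if all of them have room. A minimal sketch of that rule using fixed time windows; the scope strings such as "user:bob", the window scheme, and the `RequestThrottle` class are hypothetical and are not HBase's quota implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-scope request throttle with fixed time windows (not HBase's quota code).
final class RequestThrottle {
    private final Map<String, Long> limits = new HashMap<>();      // scope -> max requests per window
    private final Map<String, Long> used = new HashMap<>();        // scope -> requests in current window
    private final Map<String, Long> windowStart = new HashMap<>(); // scope -> start of current window
    private final long windowMillis;

    RequestThrottle(long windowMillis) { this.windowMillis = windowMillis; }

    // e.g. setLimit("user:bob", 100) or setLimit("table:t1", 1000)
    void setLimit(String scope, long maxRequests) { limits.put(scope, maxRequests); }

    private long usedInWindow(String scope, long now) {
        Long start = windowStart.get(scope);
        if (start == null || now - start >= windowMillis) {
            windowStart.put(scope, now); // open a fresh window and reset the counter
            used.put(scope, 0L);
        }
        return used.get(scope);
    }

    // Admit only if every scope with a quota has capacity; charge counters only on success.
    boolean tryAcquire(long nowMillis, String... scopes) {
        for (String s : scopes) {
            Long limit = limits.get(s);
            if (limit != null && usedInWindow(s, nowMillis) >= limit) return false; // throttled
        }
        for (String s : scopes) {
            if (limits.containsKey(s)) used.merge(s, 1L, Long::sum);
        }
        return true;
    }
}
```

Checking every scope before charging any counter keeps a rejected request from eating into the quotas it passed.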
ProcedureV2
o Distributed, fault-tolerant operations
  o Multiple steps on multiple machines
  o Roll-back in case of failure
o Coordination of long-running procedures
  o Compactions, splits, &c.
o Progress tracking
  o Notifications across multiple machines
  o Current status inquiries
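The core idea behind roll-back on failure can be shown in a single process: run the steps in order, remember which ones completed, and on an error undo those in reverse order, keeping a log that doubles as a progress and status trail. The sketch below is a toy under that assumption, not the ProcedureV2 framework, which also has to persist its state so that a procedure survives a Master crash. `StepProcedure`, `Step`, and the step names in the usage are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-memory multi-step procedure with roll-back (not HBase's ProcedureV2).
final class StepProcedure {
    interface Step {
        String name();
        void execute();   // may throw a RuntimeException to signal failure
        void rollback();  // undoes the effect of a successful execute()
    }

    private final List<Step> steps = new ArrayList<>();
    private final List<String> log = new ArrayList<>(); // progress/status trail

    StepProcedure add(Step s) {
        steps.add(s);
        return this;
    }

    List<String> log() { return log; }

    // Run steps in order; on failure, roll back completed steps newest-first.
    boolean run() {
        Deque<Step> done = new ArrayDeque<>();
        for (Step s : steps) {
            try {
                s.execute();
                done.push(s);
                log.add("done " + s.name() + " (" + done.size() + "/" + steps.size() + ")");
            } catch (RuntimeException e) {
                log.add("failed " + s.name() + ": " + e.getMessage());
                while (!done.isEmpty()) {
                    Step undo = done.pop();
                    undo.rollback();
                    log.add("rolled back " + undo.name());
                }
                return false;
            }
        }
        return true;
    }

    // Convenience factory for steps built from lambdas.
    static Step step(String name, Runnable exec, Runnable undo) {
        return new Step() {
            public String name() { return name; }
            public void execute() { exec.run(); }
            public void rollback() { undo.run(); }
        };
    }
}
```

Using a stack for completed steps guarantees the undo order is the exact reverse of the execution order, which is what makes multi-step changes safe to abandon halfway.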
Branch-1.2
o Next up in 1.x line
  o “any day now”
o Java 8 support
  o formally, thoroughly, officially
o Native CRC checksums
  o perf!
o SyncTable
  o rsync for HBase tables
o Region normalizer
  o Balancer for region size
o Flush-per-store
  o on by default
o ProcV2 all the things!
o (More) compaction improvements
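"rsync for HBase tables" describes the trick that keeps SyncTable cheap: hash ranges of rows on both sides and ship only the ranges whose hashes differ. The sketch below applies that idea to two in-memory sorted maps standing in for tables. The batch boundaries (every `batchSize`-th source row key), the SHA-256 digest, and the `RangeSync` class are assumptions for illustration; the real tool works on HBase tables with a separate hashing job.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical rsync-style sync of two sorted "tables": compare range hashes, copy mismatches only.
final class RangeSync {
    // Rows in [start, end); end == null means "to the end of the table".
    private static NavigableMap<String, String> range(NavigableMap<String, String> t, String start, String end) {
        return end == null ? t.tailMap(start, true) : t.subMap(start, true, end, false);
    }

    static byte[] hashRange(NavigableMap<String, String> table, String start, String end) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> e : range(table, start, end).entrySet()) {
                md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator so ("ab","c") and ("a","bc") hash differently
                md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex); // SHA-256 is available on every Java platform
        }
    }

    // Make target match source; returns how many ranges had to be copied.
    static int sync(NavigableMap<String, String> source, NavigableMap<String, String> target, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        List<String> bounds = new ArrayList<>();
        bounds.add(""); // the empty key sorts first, so range 0 also covers target-only low keys
        List<String> keys = new ArrayList<>(source.keySet());
        for (int i = batchSize; i < keys.size(); i += batchSize) bounds.add(keys.get(i));
        int copied = 0;
        for (int b = 0; b < bounds.size(); b++) {
            String start = bounds.get(b);
            String end = b + 1 < bounds.size() ? bounds.get(b + 1) : null;
            if (MessageDigest.isEqual(hashRange(source, start, end), hashRange(target, start, end))) continue;
            NavigableMap<String, String> dst = range(target, start, end);
            dst.clear();                              // drop the target's rows in this range
            dst.putAll(range(source, start, end));    // and replace them with the source's rows
            copied++;
        }
        return copied;
    }
}
```

Only the hashes travel for ranges that already match, so two nearly identical tables cost little more than one pass of hashing on each side.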
Region Normalizer
o Anti-entropy for region size
  o Converge towards uniform size
  o Complements balancer working toward uniform distribution
o Managed by Master, runs in the background (like balancer)
o Pluggable normalization strategies (“simple” default)
o Use-cases
  o Merge away regions from expired time-series data
  o Smooth uneven bulk loads
  o Correct operators’ initial split guesses
  o Ease upgrades from ancient versions (0.92: 1 GB regions vs. today: 20 GB)
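A size-based strategy in the spirit of the "simple" default can be stated in two rules: split a region that is much larger than the table's average, and merge two adjacent regions whose combined size is still below the average (regions must be adjacent because their key ranges have to be contiguous). The factor of 2 for splits, the "combined < average" merge test, and the class `SimpleNormalizerSketch` are assumptions made for this sketch, not a description of HBase's shipped normalizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical size-normalization planner (not HBase's RegionNormalizer implementation).
final class SimpleNormalizerSketch {
    // sizesMb: region sizes in table key order. Returns readable plan entries such as "split 3".
    static List<String> plan(long[] sizesMb) {
        List<String> actions = new ArrayList<>();
        if (sizesMb.length < 2) return actions; // nothing to compare against
        double avg = 0;
        for (long s : sizesMb) avg += s;
        avg /= sizesMb.length;
        for (int i = 0; i < sizesMb.length; i++) {
            if (sizesMb[i] > 2 * avg) {
                actions.add("split " + i);                  // far above average: split it
            } else if (i + 1 < sizesMb.length && sizesMb[i] + sizesMb[i + 1] < avg) {
                actions.add("merge " + i + "+" + (i + 1));  // two neighbours smaller than one average region
                i++;                                        // region i+1 is consumed by this merge
            }
        }
        return actions;
    }
}
```

For example, a time-series table whose oldest regions have emptied after data expiry would see those tiny neighbours paired up for merging, while one oversized bulk-loaded region would be marked for a split.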
Thanks!