Post on 21-Jun-2020
Google Cloud Bigtable
Jen Tong
Developer Advocate
Google Cloud Platform
@MimmingCodes
Agenda
1 Research
2 A story about bigness
3 How it works
4 When it's awesome
Google Research Publications
• Bigtable
• Flume
• Dremel
Managed Cloud Versions
• Bigtable → Cloud Bigtable
• Flume → Dataflow
• Dremel → BigQuery
Bigness
Google Internal Bigtable in Numbers
• Storage: 100s of PB
• Throughput: 1,000,000s of QPS
• Bandwidth: 100s of GB/sec
How much is that?
Several Datas' worth
Photo credit: jdhancock
How much is that?
Millennia of DVD video
Photo credit: illinoislibrary
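As a rough sanity check on the DVD comparison, the arithmetic can be sketched as below. The slides give only "100s of PB"; the 100 PB lower bound, the 4.7 GB single-layer disc size, and the figure of about two hours of video per disc are all assumptions made for illustration.

```python
# Back-of-the-envelope check: how many years of DVD video fit in 100 PB?
# Assumptions (not from the slides): decimal units, a 4.7 GB single-layer
# DVD, and roughly 2 hours of video per disc.

PETABYTE = 10**15
GIGABYTE = 10**9


def years_of_dvd_video(storage_bytes, dvd_bytes=4.7 * GIGABYTE, hours_per_dvd=2.0):
    """Return how many years of continuous playback the storage would hold."""
    discs = storage_bytes / dvd_bytes
    hours = discs * hours_per_dvd
    return hours / (24 * 365)


if __name__ == "__main__":
    years = years_of_dvd_video(100 * PETABYTE)
    print(f"{years:,.0f} years of DVD video")
```

Under these assumptions the result comes out at a few thousand years, which is in line with "millennia".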
Bigtable
Plus Hundreds of Internal Services
Influence
Google is not affiliated with or endorsed by any of these companies. Apache HBase, Apache Cassandra, and Apache Accumulo are trademarks of The Apache Software Foundation. Hypertable is a trademark of Hypertable Inc.
Engineering
Hundreds of engineer-years' worth
Bigtable - The early years
• Jeff and Sanjay decided to build a database service that could scale linearly across thousands and thousands of commodity servers
○ Systems will fail, retain performance at scale
• Abandon traditional relational model
• The first generation was about:
○ Prototype and build the service through its first round of scaling
○ Migrate initial applications to Bigtable
○ Figure out replication, and first multi-tenant version of Bigtable
Bigtable - Stabilized
• From batch only, to serving web traffic
○ Low latency for 99th percentile of requests
• Polish the Bigtable service
○ React better to abusive usage
○ Mixed media clusters - mixture of SSD + HDD storage with configurable affinity
○ Bring tablet server recovery time from 10s of seconds to 1 second or less
○ Easier replication
Google Cloud Bigtable
• A fully-managed service
• Focus more on your business, less on infrastructure
• Straightforward pricing model
Data Model
Data model
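The slide itself carries only a title, so the following is a minimal sketch of the model as described in the published Bigtable paper: a sparse, sorted map from (row key, column, timestamp) to a value. The class name `TinyTable` and its methods are hypothetical and are not part of any Bigtable or HBase API.

```python
from bisect import bisect_left, insort


class TinyTable:
    """Toy in-memory model: (row key, "family:qualifier", timestamp) -> value."""

    def __init__(self):
        self._row_keys = []  # row keys, kept in sorted order
        self._rows = {}      # row key -> {column -> [(timestamp, value), ...]}

    def put(self, row_key, column, timestamp, value):
        """Store one version of a cell; versions are kept newest first."""
        if row_key not in self._rows:
            insort(self._row_keys, row_key)
            self._rows[row_key] = {}
        versions = self._rows[row_key].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row_key, column):
        """Return the newest value for a cell, or None if it is absent."""
        versions = self._rows.get(row_key, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start_key, end_key):
        """Yield (row_key, columns) for start_key <= row_key < end_key, in key order."""
        i = bisect_left(self._row_keys, start_key)
        while i < len(self._row_keys) and self._row_keys[i] < end_key:
            key = self._row_keys[i]
            yield key, self._rows[key]
            i += 1
```

Because rows are kept in key order, a range scan is a single walk over adjacent keys, which is why row key design matters so much in this kind of store.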
How it works
HBase Architecture
• HBase Client
• ZooKeeper
• HBase Cluster: Master and Region Servers
○ Each Region Server: Regions, Bloomfilter, Memory Table, WAL, Block Cache
• HDFS
Bigtable Architecture
• HBase Client
• Chubby
• Bigtable Cell: Master and Tablet servers
○ Each Tablet server: Tablets, Bloomfilter, Memtable, Shared log, Block Cache
• Colossus
Life of Bigtable data
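The diagrams for these slides did not survive extraction. As a hedged sketch of how the components named in the architecture slides (shared log, memtable, Bloom filter, files on Colossus) typically fit together, following the write and read paths described in the published Bigtable paper: a write is logged, applied to the memtable, and eventually flushed to an immutable sorted file; a read checks the memtable and then the files, newest first. The class `TinyTablet` and all of its names are hypothetical, and a plain set stands in for a real, probabilistic Bloom filter.

```python
class TinyTablet:
    """Toy single-tablet store: log + memtable + immutable sorted files."""

    def __init__(self, memtable_limit=2):
        self.log = []        # stand-in for the shared log
        self.memtable = {}   # recent writes, held in memory
        self.sstables = []   # flushed files, newest last: (key_filter, sorted_items)
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.log.append((key, value))  # 1. record the mutation first
        self.memtable[key] = value     # 2. then apply it in memory
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        """Freeze the memtable into a sorted, immutable file."""
        items = sorted(self.memtable.items())
        key_filter = frozenset(self.memtable)  # exact set in place of a Bloom filter
        self.sstables.append((key_filter, items))
        self.memtable = {}
        self.log.clear()  # simplification: flushed writes no longer need replay

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for key_filter, items in reversed(self.sstables):  # newest file wins
            if key not in key_filter:
                continue  # filter says the key is not in this file: skip it
            return dict(items)[key]
        return None
```

The filter check is what lets a read skip files that cannot contain the key. A real Bloom filter can report false positives, so a production read path would still have to handle a miss inside the file.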
When it's awesome
Management
● Who in the audience has used HBase before?
● Things you will not see in Cloud Bigtable:
○ Compactions
○ Pre-splitting
○ Lots of configuration settings
○ 1-minute regionserver outages
○ Coprocessors (for now)
Throughput
[Charts: Write Throughput (MB/s); Mixed Read/Write Throughput (MB/s)]
Latency
[Chart: latency (ms) at 99%, for read and update]
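For readers unfamiliar with the metric on this chart, "latency at 99%" is the value that 99% of requests come in at or under. A minimal sketch using the nearest-rank method is below; the function name and the sample latencies are made up for illustration and are not data from the talk.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Hypothetical request latencies in milliseconds.
    latencies_ms = [4, 5, 5, 6, 6, 7, 7, 8, 9, 120]
    print("p50:", percentile(latencies_ms, 50))
    print("p99:", percentile(latencies_ms, 99))
```

With these made-up samples the median is 6 ms while the 99th percentile is 120 ms, which shows why a serving system is judged on its tail rather than its average.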
Cloud Bigtable Use Cases
• Financial Services: Faster risk analysis, credit card fraud/abuse
• Marketing / Digital Media: User engagement, clickstream analysis, real-time adaptive content
• Internet of Things: Sensor data dashboards and anomaly detection
• Telecommunications: Sampled traffic patterns, metric collection and reporting
• Energy: Oil well sensors, anomaly detection, predictive modeling
• Biomedical: Genomics sequencing data analysis
When not to use it
• Relational joins, like for online transaction processing - Cloud SQL
• Interactive querying - BigQuery
• Blobs over 10MB - Cloud Storage
• ACID transactions - Datastore
• Automatic cross-zone replication - Datastore
• You don't have much data yet - Datastore, Firebase, or Cloud SQL
Thank you!
Jen Tong
Developer Advocate
Google Cloud Platform
@MimmingCodes
little418.com