OpenTSDB + Bigtable
Integrating a time series database with Google Cloud Bigtable
Danil Zburivsky, Big Data Practice Lead - @zburivsky
Christos Soulios, Big Data Architect - @c_soulios
Pythian specializes in design, implementation, and management of systems that directly contribute to revenue and business success.
History
19 years in business
Growing at 30+% per year
400+ employees
300+ customers worldwide
HQ Ottawa, Canada - global reach
Technology agnostic = trusted advisor
Deep expertise: Oracle, Oracle Apps, MySQL, AWS, SQL Server, Cassandra/DataStax, Azure, PostgreSQL, Cloudera, MapR, Hortonworks etc.
Google Premier Partner Status (as of end Aug)
5 Certified Developers (soon to be 12)
Dedicated Google Technical Champion
Launch partner for: Kubernetes, Dataflow, Cloud SQL, Dataproc
Integrated OpenTSDB with Bigtable
DW Explorers Program Partner
Upcoming BigQuery & Cloud ML Launch Partner
• (time, metric, value)
• OS and apps metrics
• Industrial equipment
• Web traffic
Time series data
• Volume can be explosive
• Data arrival and access patterns are different
Storing time series data is a challenge
• NoSQL
• Data model and storage optimized for time series
• Separate query language
Better alternatives — specialized stores
• Open source
• Uses HBase as a data store
• Data model optimized for time series
• REST API
OpenTSDB
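As a rough illustration of the REST API bullet, the sketch below builds a single-data-point write for OpenTSDB's HTTP `/api/put` endpoint using only the Python standard library. The TSD address (`localhost:4242`), the metric name, and the tag are assumptions chosen for illustration; the request is built but not sent, so the snippet runs without a TSD.

```python
import json
import time
import urllib.request


def build_put_request(base_url, metric, value, tags, timestamp=None):
    """Build (but do not send) an HTTP POST for OpenTSDB's /api/put endpoint."""
    point = {
        "metric": metric,
        "timestamp": int(time.time()) if timestamp is None else int(timestamp),
        "value": value,
        "tags": tags,  # each data point carries at least one tag
    }
    body = json.dumps(point).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/put",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical TSD address, metric, and tag, for illustration only.
request = build_put_request(
    "http://localhost:4242", "sys.cpu.user", 42.5, {"host": "web01"},
    timestamp=1456789000,
)
# To send it to a running TSD: urllib.request.urlopen(request)
```

Each data point is the (time, metric, value) triple from the earlier slide, plus tags that identify the source.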
<metric_uid><timestamp><tagk1><tagv1>[...<tagkN><tagvN>]
<col_t+1>[...<col_t+N>]
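The row key and column layout above can be sketched in a few lines. This is a simplified illustration, not OpenTSDB's actual encoder: it assumes 3-byte UIDs, a 4-byte timestamp rounded down to the hour in the row key, and a column identified by the offset in seconds within that hour. The real column qualifier also carries format flags, which are omitted here, and all UIDs below are hypothetical.

```python
import struct

UID_WIDTH = 3            # assumed UID width in bytes
ROW_SPAN_SECONDS = 3600  # assumed: one row per series per hour


def uid_bytes(uid):
    """Encode an integer UID as a fixed-width big-endian byte string."""
    return uid.to_bytes(UID_WIDTH, "big")


def row_key(metric_uid, timestamp, tag_uids):
    """Build <metric_uid><base_timestamp><tagk1><tagv1>...<tagkN><tagvN>."""
    base_time = timestamp - (timestamp % ROW_SPAN_SECONDS)
    key = uid_bytes(metric_uid) + struct.pack(">I", base_time)
    # Sort tag pairs so the same set of tags always yields the same key.
    for tagk, tagv in sorted(tag_uids.items()):
        key += uid_bytes(tagk) + uid_bytes(tagv)
    return key


def column_offset(timestamp):
    """Seconds since the start of the row; selects the column within the row."""
    return timestamp % ROW_SPAN_SECONDS


# Hypothetical UIDs: metric 1, tag key 1 mapped to tag value 2.
key = row_key(metric_uid=1, timestamp=1456789000, tag_uids={1: 2})
offset = column_offset(1456789000)
```

Because the metric UID comes first and the timestamp second, all points of one metric for one time range sit next to each other, which keeps range scans over a time window cheap.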
OpenTSDB Architecture
[Diagram: Servers -> TSDs (TSD RPC) -> HBase (HBase RPC); Web UI and Scripts/Alerting -> TSDs (HTTP, TSD RPC)]
• HBase requires a full Hadoop setup (3xZK, 2xNN, 3xDN, 2xHMaster, 3xHRegion)
• HBase tuning is a job for the brave (HFiles, WAL, MemStore, BucketCache, BlockCache)
HBase can be too much
But all I wanted was a time series database
Google Cloud Bigtable
• Highly Scalable NoSQL database
• Low latency, high throughput
• Powers most Google products
• Available as a Google Cloud Service
Migrate HBase apps to Cloud Bigtable
• The Bigtable client is API compatible with HBase client
• Only replace hbase-client.jar with bigtable-hbase.jar
• No code changes required!
Migrate OpenTSDB to Cloud Bigtable
• OpenTSDB does not use standard hbase-client.jar
• OpenTSDB is based on AsyncHBase library
AsyncHBase library
• Open source HBase client library
• Thread-safe: multiple threads share the same client instance
• Fully asynchronous, non-blocking
• Implements the low level HBase RPCs
Detour: Asynchronous programming
Detour: Why asynchronous?
• Efficient thread usage
• Fewer threads = less memory
• CPU scheduler friendly
• Extremely high concurrency
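To make the points above concrete, here is a minimal sketch using Python's `asyncio` rather than AsyncHBase itself. `fake_rpc` is a hypothetical stand-in for a non-blocking network call. The sketch starts 1000 requests concurrently and records which threads served them, showing that a single thread can keep many requests in flight.

```python
import asyncio
import threading


async def fake_rpc(request_id, thread_ids):
    """Stand-in for a non-blocking network call (hypothetical, no real I/O)."""
    thread_ids.add(threading.get_ident())
    # While this request "waits", the thread is free to run other requests.
    await asyncio.sleep(0.01)
    return request_id * 2


async def run_many(n):
    """Start n requests at once and wait for all of them to complete."""
    thread_ids = set()
    results = await asyncio.gather(*(fake_rpc(i, thread_ids) for i in range(n)))
    return results, thread_ids


results, thread_ids = asyncio.run(run_many(1000))
print(len(results), "requests served by", len(thread_ids), "thread(s)")
```

A thread-per-request design would need 1000 threads, each with its own stack, to reach the same concurrency.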
AsyncHBase library
http://www.tsunanet.net/~tsuna/asynchbase/benchmark/viz.html
AsyncHBase library
“AsyncHBase client differs significantly from HBase's client. Switching to it is not easy as it requires to rewrite all the code that was interacting with any HBase API”
AsyncHBase documentation
AsyncBigtable library
● Complete rewrite of AsyncHBase API
● Uses standard hbase-client for Bigtable access
● Compatible with the bigtable-hbase API
AsyncBigtable challenges
● OpenTSDB jar dependencies
● AsyncBigtable is not async!
● BufferedMutator + Threadpool to emulate async
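The thread pool half of that workaround can be sketched as follows. This is an illustration in Python, not the AsyncBigtable code: `BlockingStore` is a hypothetical stand-in for a synchronous client, and `AsyncFacade` hands each blocking call to a pool and returns a future immediately. The write buffering that BufferedMutator provides is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class BlockingStore:
    """Hypothetical stand-in for a synchronous client; put() blocks until done."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:
            self._rows[key] = value
        return key

    def get(self, key):
        with self._lock:
            return self._rows.get(key)


class AsyncFacade:
    """Offers a non-blocking API by running blocking calls on a thread pool."""

    def __init__(self, store, workers=4):
        self._store = store
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def put(self, key, value):
        # Returns at once; the future completes when the blocking call does.
        return self._pool.submit(self._store.put, key, value)

    def shutdown(self):
        self._pool.shutdown(wait=True)


store = BlockingStore()
client = AsyncFacade(store)
acked = []
futures = [client.put(f"row{i}", i) for i in range(10)]
for future in futures:
    future.add_done_callback(lambda done: acked.append(done.result()))
client.shutdown()  # waits for all submitted writes to finish
```

The caller sees an asynchronous interface, but every in-flight call still occupies a pool thread, which is why the emulation is "not async" in the sense of the previous slides.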
AsyncBigtable library
● Merged upstream in OpenTSDB v2.3.0
● http://opentsdb.net/docs/build/html/user_guide/backends/bigtable.html
● https://github.com/OpenTSDB/asyncbigtable
Future work
● Native Bigtable API
● Fully asynchronous
● Improve performance
● Add more unit tests
Questions?
https://github.com/opentsdb/asyncbigtable