Scaling VividCortex's Big Data Systems on MySQL
-
Upload
vividcortex -
Category
Technology
-
view
578 -
download
3
Transcript of Scaling VividCortex's Big Data Systems on MySQL
![Page 1: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/1.jpg)
SCALING VIVIDCORTEX'S BIG DATA SYSTEMS ON
MYSQLBARON SCHWARTZ
SCALE FEBRUARY 2015
![Page 2: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/2.jpg)
ABOUT VIVIDCORTEX
VIVIDCORTEX IS THE BEST WAY TO SEE WHAT YOUR PRODUCTION MYSQL SERVERS ARE DOING
CAPTURES THOUSANDS OF METRICS IN ONE-SECOND RESOLUTION FROM YOUR PRODUCTION SYSTEMS
NO MORE SLOW-QUERY-LOG ANALYSIS AND PAINFUL MANUAL CONFIGURATION — GET INSIGHT IN SECONDS, NOT HOURS
AWESOME USER INTERFACE
FREE TRIAL, NO-RISK: VIVIDCORTEX.COM/
![Page 3: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/3.jpg)
![Page 4: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/4.jpg)
WHAT IS TIME-SERIES DATA?ANY MEASUREMENTS TAKEN AT A SPECIFIC POINT IN TIME
STOCK TICKERS, WEATHER DATA, TWEETS (?)
FOR TODAY'S PURPOSES, LOTS AND LOTS OF:
A MEASUREMENT (VALUE)
OF A SPECIFIC METRIC OF INTEREST
FROM A PARTICULAR HOST/SOURCE
AT A SPECIFIC MOMENT IN TIME
![Page 5: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/5.jpg)
POPULAR TIME-SERIES DATABASES
RRDTOOL
GRAPHITE (WHISPER)
HBASE, CASSANDRA, OPENTSDB, ETC
INFLUXDB
HOMEGROWN
![Page 6: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/6.jpg)
VIVIDCORTEX’S TIME-SERIES DATA
METRICS: {HOST, METRIC, TIMESTAMP, VALUE}
E.G. {83, “OS.CPU.UTILIZATION”, 1418143666, 18.2%}
QUERY METRICS
DITTO, BUT THE METRIC NAME IS RELATED TO THE QUERY FAMILY
E.G. “HOST.QUERIES.C.1374C6821EAD6F47.TPUT”
METRICS PER-USER, PER-PROCESS, PER-DATABASE, ETC
QUERY SAMPLES, EVENTS, FAULTS, SYSTEM VARIABLE CHANGES, ETC
OUT OF SCOPE TODAY; SEE HERE
![Page 7: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/7.jpg)
DENSE AND SPARSE METRICS
DENSE METRICS
ALWAYS EXIST AT EVERY POINT IN TIME
EXAMPLE: SYSTEM FREE MEMORY
EXAMPLE: CPU UTLIZATION
SPARSE METRICS
MAY ONLY OCCUR OCCASIONALLY
EXAMPLE: METRICS RELATED TO A SPECIFIC QUERY
![Page 8: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/8.jpg)
WHAT’S UNUSUAL AT VIVIDCORTEX
HIGH RESOLUTION: EVERYTHING IN 1-SECOND GRANULARITY
LARGE NUMBER OF METRICS (CARDINALITY, AND RATE)
MANY METRICS ARE HIGHLY SPARSE
![Page 9: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/9.jpg)
QUESTIONS WE ASKRETRIEVE METRIC A FROM TIMESTAMP B TO C AT RESOLUTION D
RANK ALL METRICS MATCHING PATTERN X FROM B TO C, LIMIT N
![Page 10: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/10.jpg)
SCHEMA DESIGN + INDEXING
MULTI-TENANT, SHARDED ARCHITECTURE
EACH CUSTOMER’S DATA STORED IN A SEPARATE DATABASE
STRONG ENCRYPTION IN-FLIGHT AND AT-REST (SEE BLOG POST)
DATA IS PARTITIONED BY TIME
WE USE INNODB STORAGE ENGINE (TRANSACTIONAL, CRASH AND CORRUPTION RESISTANT, CLUSTERED INDEXES)
![Page 11: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/11.jpg)
SCHEMA DESIGN + INDEXING
METRIC-FIRST OR TIMESTAMP-FIRST, THAT IS THE QUESTION.
FOR THIS PURPOSE, A HOST/SOURCE IS ESSENTIALLY A METRIC PREFIX.
![Page 12: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/12.jpg)
METRIC-FIRST
ADVANTAGES:
OPTIMIZED FOR FAST READS OF DENSE METRICS
DRAWBACKS:
ENUMERATING / READING LARGE CATEGORIES OF METRICS
![Page 13: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/13.jpg)
TIMESTAMP-FIRST
ADVANTAGES:
OPTIMIZED FOR WRITING METRICS
OPTIMIZED FOR READING ALL METRICS FOR A TIME RANGE
DRAWBACKS:
PENALIZES READING A DENSE METRIC FOR A TIME RANGE
NOT OPTIMAL FOR STREAMING BY METRIC BY TIMESTAMP
![Page 14: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/14.jpg)
SECONDARY INDEXING?BENEFITS:
OPTIMIZED FOR BOTH USE CASES, THEORETICALLY
HOWEVER, NO SIGNIFICANT DIFFERENCE IN OUR TESTS
DRAWBACKS:
WRITE AMPLIFICATION, SPACE AMPLIFICATION
STILL DOESN’T COVER ALL NEEDED SCENARIOS (WE’D NEED AT LEAST SIX INDEXES)
CREATES RANDOM ACCESS LOOKUPS IN THE PRIMARY KEY
HMMMM…. TOKUDB? SOME OPERATIONAL CHALLENGES.
![Page 15: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/15.jpg)
PARTITIONING
ADVANTAGES:
COARSE-GRAINED TIMESTAMP-FIRST INDEXING
EASY PURGE OF OLD DATA
TRANSPARENT TO THE APPLICATION
DRAWBACKS:
PARTITION MAINTENANCE CAN BE A DRAG
OPERATIONAL HASSLES FOR ALTER TABLE AND SO FORTH
IMPROVEMENTS IN MYSQL 5.6 ARE VERY HELPFUL THOUGH
![Page 16: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/16.jpg)
PARTITIONING, NO MORE
DEATH BY ALTER TABLE
DEATH BY WHOLE-TABLE LOCKING
NOW, TABLE-PER-TIME-RANGE
TABLE’S NAME ENCODES TIME RANGE, SCHEMA VERSION
OBSERVATION_1_S_1424444400_1424448000
THE GO PROGRAMMING LANGUAGE MAKES IT EASY FOR US TO PARALLELIZE QUERIES ACROSS SERVERS AND TABLES
![Page 17: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/17.jpg)
BY THE WAY, YOU SHOULD USE GO :-)VIVIDCORTEX.COM/RESOURCES/BUILDING-DATABASE-DRIVEN-APPS-WITH-GO/
![Page 18: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/18.jpg)
CHALLENGE #1: HIGH INGEST RATE
LARGE NUMBER OF METRICS/SEC ARRIVING AT OUR SYSTEMS
CURRENTLY ABOUT 100K METRICS/SEC PER SHARD
WRITE WORKLOAD, SPACE USAGE
![Page 19: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/19.jpg)
CHALLENGE #1: HIGH INGEST RATE
SOLUTION: BATCH METRICS INTO VECTORS
DRAWBACK: LOSE ABILITY TO QUERY WITH SQL
COMPROMISE: AGGREGATE METADATA PER VECTOR
SOLUTION: STORE METRIC IDS, NOT NAMES, WITH VECTORS
DRAWBACK: MUST “JOIN” TO METRIC DICTIONARY FOR PATTERN-MATCHING ETC
SOLUTION (IN PROGRESS): CATEGORIZE METRIC PATTERNS
![Page 20: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/20.jpg)
CHALLENGE #2: SPARSE METRICS
HUGE CARDINALITY OF METRICS X HOSTS
CAN BE TENS OF MILLIONS OF METRICS PER HOST
MOST OF THEM INACTIVE DURING ANY GIVEN TIME RANGE
QUERYING FOR ALL IS INEFFICIENT; MUST FILTER OUT INACTIVE
NEED: TIMESTAMP-BASED INDEX OF “METRIC HAS DATA”
INEFFICIENT IN MYSQL, WORKS WELL IN REDIS
FUTURE GOAL: ERADICATE REDIS THROUGH CLEVER DESIGN
![Page 21: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/21.jpg)
HOW WELL DOES IT WORK?DATA IS REASONABLY COMPACT, EVEN THOUGH NOT COMPRESSED
FOR VIVIDCORTEX’S 50 PRODUCTION HOSTS:
FOR 10 DAYS OF 1-SECOND DATA AND 90 DAYS OF 1-MIN
80GB OF TOTAL DATA
MOST DATA IS IN QUERY SAMPLES, EVENT DATA, ETC (BLOBS)
![Page 22: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/22.jpg)
HOW IS PERFORMANCE?WE USE “WEAK” AWS EC2 SERVERS; 8CPU, 26GB MEMORY
WE INGEST ~28 BILLION METRICS PER DAY (332K/SEC)
THESE ARE ESSENTIALLY HANDLED 100% BY 3 SERVERS
(WE HAVE PASSIVE STANDBY SERVERS IN-REGION, CROSS-REGION, BACKUPS, ETC).
![Page 23: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/23.jpg)
WHAT’S GOOD?RAW EFFICIENCY PER SERVER IS REASONABLY HIGH
OUR INFRASTRUCTURE IS FAIRLY HOMOGENEOUS
WE’RE RUNNING PRETTY LEAN
![Page 24: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/24.jpg)
WHAT’S NOT SO GOOD?PROGRAMMER EFFICIENCY COULD BE BETTER
CAN’T AD-HOC QUERY THE TIMESERIES DATA
MUST USE INTERNAL TIMESERIES SERVICE INSTEAD
MYSQL IS STILL NOT AS EFFICIENT AS I WANT
INNODB OVERHEAD
BLOB STORAGE
![Page 25: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/25.jpg)
ALTERNATIVES?CASSANDRA, CASSANDRA+SPARK, ELASTICSEARCH, INFLUXDB, HBASE, OPENTSDB, DRUID…?
PROBLEMS: COMPLEXITY, PERFORMANCE, IMMATURITY, INEFFICIENCY, UNRELIABILITY...
VENDOR PITCHES ARE OFTEN FAIRLY ABSURD
RIGHT NOW, MYSQL’S RAW EFFICIENCY IS ENOUGH TO COMPENSATE FOR SOME OTHER SHORTCOMINGS. BETTER THE DEVIL YOU KNOW THAN THE DEVIL YOU DON’T?
![Page 26: Scaling VividCortex's Big Data Systems on MySQL](https://reader038.fdocuments.us/reader038/viewer/2022102903/55a524a01a28abf10e8b45b0/html5/thumbnails/26.jpg)
QUESTIONS?CONTACT INFO
- @VIVIDCORTEX AND @XAPRB
- VIVIDCORTEX.COM
- VISIT OUR BOOTH
HTTPS://WWW.FLICKR.COM/PHOTOS/OREGONDOT/14721613997/